Fix VLM metric structured output by davidberenstein1957 · Pull Request #594 · PrunaAI/pruna

davidberenstein1957 · 2026-03-24T17:24:43Z

Summary

- Fall back to cpu during metric construction for CPU-only stateful VLM metrics.
- Route transformers structured outputs through outlines for existing pydantic schemas.
- Add focused regression tests plus a manual VLM smoke script.

Verification

uv run --extra dev pytest -q tests/evaluation/test_task.py::test_vlm_metrics_fallback_to_cpu_on_auto_device tests/evaluation/test_vlm_metrics.py::test_transformers_generate_routes_pydantic_response_format_to_outlines tests/evaluation/test_vlm_metrics.py::test_transformers_outlines_result_serialization tests/evaluation/test_vlm_metrics.py::test_evaluation_agent_update_stateful_metrics_with_stub_vlm
uv run --extra dev python scripts/smoke_vlm_metrics.py --stub

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Registry crashes when metric_device is None
- I added a guard in MetricRegistry so device normalization/fallback only runs when a stateful metric device is provided, allowing None to pass through to StatefulMetric initialization as before.

Preview (b308d27d2e)

diff --git a/src/pruna/evaluation/metrics/registry.py b/src/pruna/evaluation/metrics/registry.py
--- a/src/pruna/evaluation/metrics/registry.py
+++ b/src/pruna/evaluation/metrics/registry.py
@@ -137,9 +137,10 @@
         elif isclass(metric_cls):
             if issubclass(metric_cls, StatefulMetric):
                 metric_device = stateful_metric_device if stateful_metric_device else device
-                requested_device, _ = split_device(device_to_string(metric_device), strict=False)
-                if requested_device not in metric_cls.runs_on and "cpu" in metric_cls.runs_on:
-                    metric_device = "cpu"
+                if metric_device is not None:
+                    requested_device, _ = split_device(device_to_string(metric_device), strict=False)
+                    if requested_device not in metric_cls.runs_on and "cpu" in metric_cls.runs_on:
+                        metric_device = "cpu"
                 kwargs["device"] = metric_device
             elif issubclass(metric_cls, BaseMetric):
                 kwargs["device"] = inference_device if inference_device else device

cursor · 2026-03-24T17:30:38Z

src/pruna/evaluation/metrics/registry.py

            if issubclass(metric_cls, StatefulMetric):
-                kwargs["device"] = stateful_metric_device if stateful_metric_device else device
+                metric_device = stateful_metric_device if stateful_metric_device else device
+                requested_device, _ = split_device(device_to_string(metric_device), strict=False)


Registry crashes when metric_device is None

High Severity

device_to_string(metric_device) raises ValueError when metric_device is None. This happens when MetricRegistry.get_metric is called for a StatefulMetric subclass without providing device, stateful_metric_device, or inference_device kwargs. The previous code simply passed None through to the metric constructor, which handled it gracefully via set_to_best_available_device(None). At least one existing caller (base_tester.py) invokes get_metric(metric) with no device args for arbitrary metrics.
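
The failure mode and the guarded behavior can be sketched in isolation. This is a minimal illustration, not the code in registry.py: `device_to_string` and `split_device` below are simplified stand-ins written for this example (the real pruna helpers take other arguments, such as `strict`, and may behave differently), and `resolve_metric_device` is a hypothetical name for the logic that the diff inlines in `MetricRegistry`.

```python
from typing import Optional, Sequence, Tuple


def device_to_string(device: Optional[str]) -> str:
    """Stand-in helper (assumption): rejects None, like the behavior described above."""
    if device is None:
        raise ValueError("device must not be None")
    return str(device)


def split_device(device: str) -> Tuple[str, Optional[int]]:
    """Stand-in helper (assumption): 'cuda:1' -> ('cuda', 1), 'cpu' -> ('cpu', None)."""
    kind, _, index = device.partition(":")
    return kind, int(index) if index else None


def resolve_metric_device(
    metric_device: Optional[str], runs_on: Sequence[str]
) -> Optional[str]:
    """Hypothetical wrapper around the guarded fallback shown in the diff."""
    # Guard: leave None untouched so the metric constructor can pick a device itself.
    if metric_device is None:
        return None
    requested_device, _ = split_device(device_to_string(metric_device))
    # Fall back to cpu only when the metric cannot run on the requested device
    # but does declare cpu support.
    if requested_device not in runs_on and "cpu" in runs_on:
        return "cpu"
    return metric_device


# Without the guard, a None device reaches device_to_string and raises.
try:
    device_to_string(None)
except ValueError as err:
    print(f"unguarded path fails: {err}")

print(resolve_metric_device(None, ["cpu"]))
print(resolve_metric_device("cuda:0", ["cpu"]))
print(resolve_metric_device("cuda:0", ["cpu", "cuda"]))
```

Keeping None as None, rather than mapping it to a default here, leaves device selection to the metric constructor, which matches the pre-change behavior described in the comment.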

Fix VLM metric structured output

015af72

davidberenstein1957 merged commit 20f59c9 into feat/metrics-vlm-support Mar 24, 2026
1 check passed

cursor bot reviewed Mar 24, 2026

davidberenstein1957 merged 1 commit into feat/metrics-vlm-support from davidberenstein1957/vlm-metrics-review
