Wire VLM benchmarks end to end #595

Merged
davidberenstein1957 merged 1 commit into feat/metrics-vlm-support from davidberenstein1957/vlm-metrics-review
Mar 24, 2026

@davidberenstein1957
Member

Summary

  • wire benchmark registry entries to the VLM metrics they now support
  • default VLM metrics to CPU and to outlines-enabled structured generation for local models
  • replace the stub smoke script with benchmark-path end-to-end test coverage
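
The CPU default described above can be sketched roughly as follows. This is only an illustration of the intended behavior, not the code in this PR: `VLMMetricDefaults`, its field names, and `resolve_vlm_device` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VLMMetricDefaults:
    """Hypothetical container for the defaults named in the summary."""

    device: str = "cpu"              # VLM metrics default to CPU
    structured_output: bool = True   # structured local generation is on by default
    use_outlines: bool = True        # assumed flag: structured output goes through outlines


def resolve_vlm_device(requested: Optional[str]) -> str:
    """Map an unset or "auto" device request to CPU; pass explicit devices through."""
    if requested is None or requested == "auto":
        return VLMMetricDefaults().device
    return requested
```

Under these assumptions, `resolve_vlm_device("auto")` yields `"cpu"`, while an explicit request such as `"cuda:0"` is returned unchanged.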

Verification

  • uv run --extra dev pytest -q tests/evaluation/test_task.py::test_vlm_metrics_fallback_to_cpu_on_auto_device tests/evaluation/test_vlm_metrics.py::test_vlm_metric_defaults_enable_structured_local_generation tests/evaluation/test_vlm_metrics.py::test_benchmark_vlm_metrics_end_to_end tests/evaluation/test_vlm_metrics.py::test_transformers_generate_routes_pydantic_response_format_to_outlines tests/evaluation/test_vlm_metrics.py::test_transformers_outlines_result_serialization tests/evaluation/test_vlm_metrics.py::test_evaluation_agent_update_stateful_metrics_with_stub_vlm

davidberenstein1957 merged commit e54a9c2 into feat/metrics-vlm-support on Mar 24, 2026
1 check passed