fix(cloud-eval): lift grader execution errors into RowMetric.error by placerda · Pull Request #202 · Azure/agentops

placerda · 2026-05-29T05:23:17Z

Hotfix mirror of #201 onto main.

Source-controlled trace of the same fix already merged to develop as commit 6bea832.

Summary

When a Foundry azure_ai_evaluator grader fails to execute (most commonly because the evaluator service principal lacks Cognitive Services OpenAI User on the target model deployment), the per-metric score comes back null and the real cause is buried in result.sample.error.message. The cloud-results parser now lifts that message into RowMetric.error (including the error code prefix when present), so the actionable error appears in results.json and report.md instead of operators only seeing actual=missing in the threshold table. The orchestrator's "0 usable metric scores" warning also quotes the first grader error so CI logs carry the signal without operators having to download the raw artifact.

Files

src/agentops/pipeline/cloud_results.py — new _extract_grader_error + _normalize_error_payload helpers; _metric_from_result now lifts grader errors.
src/agentops/pipeline/orchestrator.py — new _first_metric_error helper; the "0 usable scores" warning quotes the first error.
tests/unit/test_cloud_results.py — 3 new tests covering the RBAC failure shape, top-level error dict, and "real score wins over stale sample.error".
CHANGELOG.md — entry under [Unreleased] ▸ Fixed.

Verification

pytest tests/unit/test_cloud_results.py → 14 passed (locally on develop before merge).
pytest tests/ → 792 passed, 3 skipped (locally on develop before merge).
All 12 CI checks green on fix(cloud-eval): lift grader execution errors into RowMetric.error #201 (lint, build-vsix, coverage, 6× pytest matrix).

Notes

The amend ensures source files retain CRLF line endings consistent with main's existing convention (no .gitattributes here yet).
After this merges, sync develop ← main (no-op for code; merges only the branch pointer).

Closes the gap left by #195/#196 (parser widening) and #197/#199 (artifact upload): those probed for scores that never existed because the grader had errored. This PR finally surfaces the error.

) When a Foundry azure_ai_evaluator grader fails to execute (e.g., the evaluator service principal lacks Cognitive Services OpenAI User on the model deployment), the per-metric score returns null and the real cause is buried in result.sample.error.message. Without surfacing it, operators see only actual=missing in the threshold table and have to dig into cloud_output_items.json to find the RBAC failure. The parser now extracts sample.error.message (and top-level error dicts), prefixing the error code when present. The orchestrator's 0-usable-scores warning quotes the first grader error so CI logs carry the actionable cause. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

placerda merged commit 723f7a8 into main May 29, 2026
1 check passed

placerda deleted the hotfix/cloud-eval-grader-error-surface branch May 29, 2026 05:25

placerda mentioned this pull request May 29, 2026

docs(skill): require Cognitive Services OpenAI User as prereq RBAC role #203

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(cloud-eval): lift grader execution errors into RowMetric.error#202

fix(cloud-eval): lift grader execution errors into RowMetric.error#202
placerda merged 1 commit into
mainfrom
hotfix/cloud-eval-grader-error-surface

placerda commented May 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

placerda commented May 29, 2026

Summary

Files

Verification

Notes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant