Skip to content

fix(cloud-eval): lift grader execution errors into RowMetric.error#202

Merged
placerda merged 1 commit into
mainfrom
hotfix/cloud-eval-grader-error-surface
May 29, 2026
Merged

fix(cloud-eval): lift grader execution errors into RowMetric.error#202
placerda merged 1 commit into
mainfrom
hotfix/cloud-eval-grader-error-surface

Conversation

@placerda
Copy link
Copy Markdown
Contributor

Hotfix mirror of #201 onto main.

Source-controlled trace of the same fix already merged to develop as commit 6bea832.

Summary

When a Foundry azure_ai_evaluator grader fails to execute (most commonly because the evaluator service principal lacks Cognitive Services OpenAI User on the target model deployment), the per-metric score comes back null and the real cause is buried in result.sample.error.message. The cloud-results parser now lifts that message into RowMetric.error (including the error code prefix when present), so the actionable error appears in results.json and report.md instead of operators only seeing actual=missing in the threshold table. The orchestrator's "0 usable metric scores" warning also quotes the first grader error so CI logs carry the signal without operators having to download the raw artifact.

Files

  • src/agentops/pipeline/cloud_results.py — new _extract_grader_error + _normalize_error_payload helpers; _metric_from_result now lifts grader errors.
  • src/agentops/pipeline/orchestrator.py — new _first_metric_error helper; the "0 usable scores" warning quotes the first error.
  • tests/unit/test_cloud_results.py — 3 new tests covering the RBAC failure shape, top-level error dict, and "real score wins over stale sample.error".
  • CHANGELOG.md — entry under [Unreleased]Fixed.

Verification

Notes

  • The amend ensures source files retain CRLF line endings consistent with main's existing convention (no .gitattributes here yet).
  • After this merges, sync develop ← main (no-op for code; merges only the branch pointer).

Closes the gap left by #195/#196 (parser widening) and #197/#199 (artifact upload): those probed for scores that never existed because the grader had errored. This PR finally surfaces the error.

)

When a Foundry azure_ai_evaluator grader fails to execute (e.g., the evaluator service principal lacks Cognitive Services OpenAI User on the model deployment), the per-metric score returns null and the real cause is buried in result.sample.error.message. Without surfacing it, operators see only actual=missing in the threshold table and have to dig into cloud_output_items.json to find the RBAC failure.

The parser now extracts sample.error.message (and top-level error dicts), prefixing the error code when present. The orchestrator's 0-usable-scores warning quotes the first grader error so CI logs carry the actionable cause.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@placerda placerda merged commit 723f7a8 into main May 29, 2026
1 check passed
@placerda placerda deleted the hotfix/cloud-eval-grader-error-surface branch May 29, 2026 05:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant