fix(eval overview): hide non-output metrics for evaluator steps#3897

Merged
bekossy merged 4 commits into release/v0.94.2 from
fix/evaluator-overview-output-metrics
Mar 12, 2026
Conversation

@mmabrouk
Member

@mmabrouk mmabrouk commented Mar 3, 2026

Summary

  • Restrict evaluator metrics in Overview to evaluator output namespaces (attributes.ag.data.outputs.* and normalized equivalents)
  • Filter both live run metrics and fallback evaluator metric definitions using the same namespace check
  • Prevent annotation infra metrics (duration, cost, tokens, errors) from showing as evaluator metrics in the Overview section

Testing

  • Not run (frontend-only filtering change)
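
The filtering described in the summary can be sketched as follows. This is a minimal TypeScript sketch, not the PR's actual code: the helper names and the exact prefix list are assumptions, and only the attributes.ag.data.outputs.* namespace is confirmed by the summary above.

```typescript
// Assumed prefix list; the summary only confirms the
// "attributes.ag.data.outputs." namespace ("and normalized equivalents").
const EVALUATOR_OUTPUT_PATH_PREFIXES: string[] = [
    "attributes.ag.data.outputs.",
];

// A metric path belongs in the evaluator Overview only if it sits in an
// evaluator output namespace.
function isEvaluatorOutputMetric(path: string): boolean {
    return EVALUATOR_OUTPUT_PATH_PREFIXES.some((prefix) =>
        path.startsWith(prefix),
    );
}

// The same namespace check is applied to both live run metrics and the
// fallback evaluator metric definitions, so annotation infra metrics
// (duration, cost, tokens, errors) never render as evaluator metrics.
function filterOverviewMetricPaths(paths: string[]): string[] {
    return paths.filter(isEvaluatorOutputMetric);
}
```
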


Filter evaluator overview metrics by output namespaces so annotation infra metrics (duration, cost, tokens, errors) are not displayed as evaluator metrics.
@vercel

vercel bot commented Mar 3, 2026

The latest updates on your projects.

Project | Deployment | Actions | Updated (UTC)
agenta-documentation | Ready | Preview, Comment | Mar 11, 2026 9:09pm


@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Mar 3, 2026
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.


Contributor


🟡 For the outputs. prefix, normalizeMetricPath produces paths that isEvaluatorOutputMetric will always reject

When normalizeMetricPath receives a path starting with outputs. (e.g., "outputs.score"), it produces "attributes.ag.outputs.score". The new isEvaluatorOutputMetric filter then checks this against EVALUATOR_OUTPUT_PATH_PREFIXES, but none of them match "attributes.ag.outputs." — note the missing data. segment.

Root Cause

At evaluatorMetrics.ts:161, normalizeMetricPath maps outputs.X → attributes.ag.outputs.X:

if (trimmed.startsWith("outputs.")) return `attributes.ag.${trimmed}`

But the EVALUATOR_OUTPUT_PATH_PREFIXES at evaluatorMetrics.ts:30-35 does not include "attributes.ag.outputs." — it only includes "attributes.ag.data.outputs." (with the data. segment). So isEvaluatorOutputMetric("attributes.ag.outputs.score") returns false, and the metric is silently dropped at line 179.

Compare with the data. prefix handling at line 160: normalizeMetricPath("data.outputs.score") → "attributes.ag.data.outputs.score", which correctly passes the filter.

This inconsistency means any evaluator definition whose metric path starts with outputs. (e.g., "outputs.score") will have that metric silently excluded from fallback metrics.

Impact: In practice, the standard extractMetrics flow (evaluators.ts:87-100) produces bare key names like "score" which hit the default branch of normalizeMetricPath and correctly get prefixed with attributes.ag.data.outputs.. So this bug would only manifest if an evaluator definition provides a metric path explicitly prefixed with outputs., which is a supported but apparently uncommon code path in normalizeMetricPath.

(Refers to line 161)
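
The inconsistency reported above can be reproduced with a small sketch. The function names mirror the report (evaluatorMetrics.ts), but the bodies are reconstructed assumptions based on the review's description, not the actual source.

```typescript
// Per the report, the prefix list includes only the "data." variant;
// "attributes.ag.outputs." (without the "data." segment) is missing.
const EVALUATOR_OUTPUT_PATH_PREFIXES: string[] = [
    "attributes.ag.data.outputs.",
];

// Reconstructed from the branches the review describes.
function normalizeMetricPath(path: string): string {
    const trimmed = path.trim();
    if (trimmed.startsWith("attributes.")) return trimmed;
    if (trimmed.startsWith("data.")) return `attributes.ag.${trimmed}`;
    if (trimmed.startsWith("outputs.")) return `attributes.ag.${trimmed}`;
    // Bare keys like "score" (the common extractMetrics case) get the
    // full evaluator output prefix and pass the filter.
    return `attributes.ag.data.outputs.${trimmed}`;
}

function isEvaluatorOutputMetric(path: string): boolean {
    return EVALUATOR_OUTPUT_PATH_PREFIXES.some((p) => path.startsWith(p));
}

// "outputs.score" normalizes to "attributes.ag.outputs.score", which the
// prefix list rejects, so the metric is silently dropped.
const dropped = !isEvaluatorOutputMetric(normalizeMetricPath("outputs.score"));
// "data.outputs.score" and bare "score" both normalize to paths that pass.
const kept = isEvaluatorOutputMetric(normalizeMetricPath("data.outputs.score"));
```

Adding "attributes.ag.outputs." to the prefix list (or normalizing outputs.X to attributes.ag.data.outputs.X) would make the two helpers consistent.
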



@github-actions
Contributor

github-actions bot commented Mar 3, 2026

Railway Preview Environment

Status Destroyed (PR closed)

Updated at 2026-03-12T14:36:12.125Z

@mmabrouk mmabrouk requested review from ardaerzin March 3, 2026 22:46
@mmabrouk mmabrouk marked this pull request as draft March 12, 2026 10:23
@mmabrouk mmabrouk marked this pull request as ready for review March 12, 2026 10:23
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Mar 12, 2026
@bekossy bekossy changed the base branch from main to release/v0.94.2 March 12, 2026 14:35
@bekossy bekossy merged commit 43bf65a into release/v0.94.2 Mar 12, 2026
19 of 22 checks passed

Labels

Evaluation Frontend lgtm This PR has been approved by a maintainer size:S This PR changes 10-29 lines, ignoring generated files.


3 participants