
fix(api): align evaluation invocation step key with SDK slug#3862

Merged
mmabrouk merged 2 commits into release/v0.87.2 from fix/eval-step-key-mismatch
Feb 27, 2026
Conversation


@mmabrouk mmabrouk commented Feb 27, 2026

Summary

Evaluation outputs were missing in the Eval Run Details UI because invocation results could not be matched to their invocation column. The match key is step_key.

This PR aligns API step key generation with SDK behavior for application invocation steps.

Root cause

For each evaluation run:

  1. The API builds run metadata (steps and mappings) in api/oss/src/core/evaluations/service.py.
  2. The SDK logs per-scenario results with a step_key in sdk/agenta/sdk/evaluations/preview/evaluate.py.
  3. The frontend joins run mappings and results by exact string equality on step_key.
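The join in step 3 can be sketched as follows. This is a minimal Python illustration of the frontend's matching logic, not the actual frontend code (which is TypeScript and not shown in this PR); the key values are made up:

```python
def join_results(mappings: dict[str, dict], results: list[dict]) -> dict[str, list[dict]]:
    """Group logged result entries under their invocation column by step_key."""
    joined: dict[str, list[dict]] = {key: [] for key in mappings}
    for result in results:
        key = result.get("step_key")
        if key in joined:  # exact string match; no normalization or parsing
            joined[key].append(result)
        # Results with an unmatched step_key are silently dropped, which is
        # why a key mismatch shows up as missing outputs rather than an error.
    return joined

mappings = {"application-my-app": {}}
results = [
    {"step_key": "application-my-app", "output": "ok"},
    {"step_key": "application-other", "output": "dropped"},
]
joined = join_results(mappings, results)
print(joined)  # {'application-my-app': [{'step_key': 'application-my-app', 'output': 'ok'}]}
```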

The SDK uses the revision slug directly:

  • application_slug = application_revision.slug
  • step_key = "application-" + application_slug

The API previously recomputed a value with:

  • get_slug_from_name_and_id(str(application_revision.slug), application_revision.id)

That recomputation can diverge from the stored revision slug, depending on how the revision slug was created. When they diverge, frontend joins fail and invocation outputs do not render.
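A hypothetical illustration of that divergence. The stand-in below assumes get_slug_from_name_and_id slugifies its name argument and appends an id-derived suffix; the real helper's implementation is not shown in this PR, and the slug and id values are invented:

```python
from uuid import UUID

def get_slug_from_name_and_id(name: str, id: UUID) -> str:
    # Stand-in only: assumed to slugify the name and append part of the id.
    return name.lower().replace(" ", "-") + "-" + id.hex[:8]

revision_slug = "6b39282c8e87"  # slug already stored on the revision (made up)
revision_id = UUID("11111111-2222-3333-4444-555555555555")  # made up

sdk_key = "application-" + revision_slug
api_key = "application-" + get_slug_from_name_and_id(str(revision_slug), revision_id)

print(sdk_key)  # application-6b39282c8e87
print(api_key)  # application-6b39282c8e87-11111111
# The keys differ, so the frontend's exact-match join finds no results.
```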

Fix

In api/oss/src/core/evaluations/service.py:

  • Stop recomputing application invocation step keys with get_slug_from_name_and_id.
  • Use the same source as the SDK: application_revision.slug.
  • Add a defensive guard that returns early if the revision slug is missing.

New behavior:

  • step_key = "application-" + application_revision.slug

This guarantees parity with SDK result logging.
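A minimal sketch of the new behavior, including the defensive guard. The function name and surrounding shape are assumptions for illustration; only the key construction itself mirrors the PR:

```python
from types import SimpleNamespace
from typing import Optional

def build_invocation_step_key(application_revision) -> Optional[str]:
    # Defensive guard: return early when the revision slug is missing.
    slug = getattr(application_revision, "slug", None)
    if not slug:
        return None
    # Same source as the SDK: the stored revision slug, used verbatim.
    return "application-" + slug

rev = SimpleNamespace(slug="6b39282c8e87")
key = build_invocation_step_key(rev)
print(key)  # application-6b39282c8e87
```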

Why this is safe

  • Step keys are opaque identifiers used only for joining run mappings to results.
  • The frontend does not parse their internal format. It performs exact string matches.
  • Aligning API to SDK removes fragile recomputation and makes key generation deterministic.
  • Existing runs keep their historical keys. New runs generated after this change will consistently match SDK-logged results.

Research notes

During debugging, we verified:

  • Invocation output values exist in traces at attributes.ag.data.outputs.
  • UI renderers already handle object outputs correctly.
  • Missing output display was a join failure, not a rendering failure.
  • The failure point was key mismatch between API run mappings and SDK result entries.

The API generated evaluation step keys using application_revision.slug (a short hex string like '6b39282c8e87') while the SDK used application_revision.name (a human-readable string like 'RAG E2E Pipeline'). This mismatch caused the frontend to fail to match results to their columns, so evaluation outputs were never displayed.

Changed the API to use application_revision.name, consistent with the SDK and every other call site of get_slug_from_name_and_id.

@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Feb 27, 2026
@dosubot dosubot bot added the bug Something isn't working label Feb 27, 2026
@devin-ai-integration devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.


@mmabrouk mmabrouk requested a review from jp-agenta February 27, 2026 16:39

@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Feb 27, 2026
@mmabrouk mmabrouk changed the title fix(api): use revision name instead of slug for evaluation step keys fix(api): align evaluation invocation step key with SDK slug Feb 27, 2026

@junaway junaway changed the base branch from main to release/v0.87.2 February 27, 2026 17:40

junaway commented Feb 27, 2026

Missing from the migration. Thanks, @mmabrouk. lgtm!

@mmabrouk mmabrouk merged commit bb7633b into release/v0.87.2 Feb 27, 2026
11 checks passed

Labels

bug Something isn't working size:S This PR changes 10-29 lines, ignoring generated files.
