
fix(api): align evaluation invocation step key with SDK slug#3862

Merged
mmabrouk merged 2 commits into release/v0.87.2 from fix/eval-step-key-mismatch
Feb 27, 2026
Conversation


@mmabrouk mmabrouk commented Feb 27, 2026

Summary

Evaluation outputs were missing in the Eval Run Details UI because invocation results could not be matched to their invocation column. The match key is step_key.

This PR aligns API step key generation with SDK behavior for application invocation steps.

Root cause

For each evaluation run:

  1. The API builds run metadata (steps and mappings) in api/oss/src/core/evaluations/service.py.
  2. The SDK logs per-scenario results with a step_key in sdk/agenta/sdk/evaluations/preview/evaluate.py.
  3. The frontend joins run mappings and results by exact string equality on step_key.
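The join in step 3 can be sketched as follows. This is a minimal Python illustration of the frontend's matching logic, not the actual frontend code (which is TypeScript and not shown in this PR); the key values are made up:

```python
def join_results(mappings: dict[str, dict], results: list[dict]) -> dict[str, list[dict]]:
    """Group logged result entries under their invocation column by step_key."""
    joined: dict[str, list[dict]] = {key: [] for key in mappings}
    for result in results:
        key = result.get("step_key")
        if key in joined:  # exact string match; no normalization or parsing
            joined[key].append(result)
        # Results with an unmatched step_key are silently dropped, which is
        # why a key mismatch shows up as missing outputs rather than an error.
    return joined

mappings = {"application-my-app": {}}
results = [
    {"step_key": "application-my-app", "output": "ok"},
    {"step_key": "application-other", "output": "dropped"},
]
joined = join_results(mappings, results)
print(joined)  # {'application-my-app': [{'step_key': 'application-my-app', 'output': 'ok'}]}
```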

The SDK uses the revision slug directly:

  • application_slug = application_revision.slug
  • step_key = "application-" + application_slug

The API previously recomputed a value with:

  • get_slug_from_name_and_id(str(application_revision.slug), application_revision.id)

That recomputation can diverge from the stored revision slug, depending on how the revision slug was created. When they diverge, frontend joins fail and invocation outputs do not render.
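A hypothetical illustration of that divergence. The stand-in below assumes get_slug_from_name_and_id slugifies its name argument and appends an id-derived suffix; the real helper's implementation is not shown in this PR, and the slug and id values are invented:

```python
from uuid import UUID

def get_slug_from_name_and_id(name: str, id: UUID) -> str:
    # Stand-in only: assumed to slugify the name and append part of the id.
    return name.lower().replace(" ", "-") + "-" + id.hex[:8]

revision_slug = "6b39282c8e87"  # slug already stored on the revision (made up)
revision_id = UUID("11111111-2222-3333-4444-555555555555")  # made up

sdk_key = "application-" + revision_slug
api_key = "application-" + get_slug_from_name_and_id(str(revision_slug), revision_id)

print(sdk_key)  # application-6b39282c8e87
print(api_key)  # application-6b39282c8e87-11111111
# The keys differ, so the frontend's exact-match join finds no results.
```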

Fix

In api/oss/src/core/evaluations/service.py:

  • Stop recomputing application invocation step keys with get_slug_from_name_and_id.
  • Use the same source as the SDK: application_revision.slug.
  • Add a defensive guard that returns early if the revision slug is missing.

New behavior:

  • step_key = "application-" + application_revision.slug

This guarantees parity with SDK result logging.
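A minimal sketch of the new behavior, including the defensive guard. The function name and surrounding shape are assumptions for illustration; only the key construction itself mirrors the PR:

```python
from types import SimpleNamespace
from typing import Optional

def build_invocation_step_key(application_revision) -> Optional[str]:
    # Defensive guard: return early when the revision slug is missing.
    slug = getattr(application_revision, "slug", None)
    if not slug:
        return None
    # Same source as the SDK: the stored revision slug, used verbatim.
    return "application-" + slug

rev = SimpleNamespace(slug="6b39282c8e87")
key = build_invocation_step_key(rev)
print(key)  # application-6b39282c8e87
```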

Why this is safe

  • Step keys are opaque identifiers used only for joining run mappings to results.
  • The frontend does not parse their internal format. It performs exact string matches.
  • Aligning API to SDK removes fragile recomputation and makes key generation deterministic.
  • Existing runs keep their historical keys. New runs generated after this change will consistently match SDK-logged results.

Research notes

During debugging, we verified:

  • Invocation output values exist in traces at attributes.ag.data.outputs.
  • UI renderers already handle object outputs correctly.
  • Missing output display was a join failure, not a rendering failure.
  • The failure point was key mismatch between API run mappings and SDK result entries.

The API generated evaluation step keys using application_revision.slug (a short hex string like '6b39282c8e87') while the SDK used application_revision.name (a human-readable string like 'RAG E2E Pipeline'). This mismatch caused the frontend to fail to match results to their columns, so evaluation outputs were never displayed.

Changed the API to use application_revision.name, consistent with the SDK and every other call site of get_slug_from_name_and_id.

@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Feb 27, 2026
@dosubot dosubot bot added the bug Something isn't working label Feb 27, 2026
@devin-ai-integration devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.


@mmabrouk mmabrouk requested a review from jp-agenta February 27, 2026 16:39

@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Feb 27, 2026
@mmabrouk mmabrouk changed the title fix(api): use revision name instead of slug for evaluation step keys fix(api): align evaluation invocation step key with SDK slug Feb 27, 2026

@junaway junaway changed the base branch from main to release/v0.87.2 February 27, 2026 17:40

junaway commented Feb 27, 2026

Missing from the migration. Thanks, @mmabrouk. lgtm!

@mmabrouk mmabrouk merged commit bb7633b into release/v0.87.2 Feb 27, 2026
11 checks passed

Labels

bug Something isn't working size:S This PR changes 10-29 lines, ignoring generated files.
