[FE Fix] hide string-type evaluator metrics from scenario table & filters by ardaerzin · Pull Request #4477 · Agenta-AI/agenta

ardaerzin · 2026-05-28T16:21:03Z

Summary

String-typed evaluator outputs (e.g. an LLM-judge reasoning field) resolve through the metric layer as {type: "string", count, ...} stats blobs. unwrapStatsForCompare only unwraps binary and numeric variants, so the raw stats object falls through to JSON.stringify in the cell renderer and is shown as {"type":"string","count":...} in every row. The same blob makes filter predicates impossible — equality checks would compare a user string against a stats object and never match.

Drop string-typed leaves at column-build time in useEtlColumns and from the filter dropdown in ScenarioFilterBar by reusing the existing buildColumnValueTypeResolver (the metricType-from-evaluator-schema lookup that already powers the filter bar's value-type detection). The focus drawer is unaffected — it reads columnResult.groups/columns directly via useFocusDrawerSections, bypassing both code paths, so string metrics still appear there.

Testing

Verified locally

reasoning column is not shown

QA follow-up

test different output metrics

Checklist

I have included a video or screen recording for UI changes, or marked Demo as N/A
Relevant tests pass locally
Relevant linting and formatting pass locally
I have signed the CLA, or I will sign it when the bot prompts me

Contributor Resources

… and filter String-typed evaluator outputs (e.g. an LLM-judge `reasoning` field) resolve through the metric layer as `{type: "string", count, ...}` stats blobs. `unwrapStatsForCompare` only unwraps `binary` and `numeric` variants, so the raw stats object falls through to `JSON.stringify` in the cell renderer and is shown as `{"type":"string","count":...}` in every row. The same blob makes filter predicates impossible — equality checks would compare a user string against a stats object and never match. Drop string-typed leaves at column-build time in `useEtlColumns` and from the filter dropdown in `ScenarioFilterBar` by reusing the existing `buildColumnValueTypeResolver` (the metricType-from-evaluator-schema lookup that already powers the filter bar's value-type detection). The focus drawer is unaffected — it reads `columnResult.groups/columns` directly via `useFocusDrawerSections`, bypassing both code paths, so string metrics still appear there.

vercel · 2026-05-28T16:21:09Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	May 28, 2026 8:35pm

coderabbitai · 2026-05-28T16:21:15Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 7b6e44b2-c4cc-4356-b134-0184576aa150

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

String-typed evaluator outputs in ETL evaluation tables are now detected as backend placeholders and resolved from trace annotations; EtlResolvedCell materializes all referenced traces and ScenarioFilterBar excludes string-typed evaluator fields from the filter UI.

Changes

String-type evaluator field handling

Layer / File(s)	Summary
String-type metric placeholder resolver `web/packages/agenta-entities/src/evaluationRun/etl/resolveMappings.ts`, `web/packages/agenta-entities/src/evaluationRun/etl/__tests__/resolveMappings.test.ts`	A new `isStringTypeMetricPlaceholder` helper recognizes `{type: "string", count: N}` placeholder shapes. `resolveFromMetric` returns `null` for such placeholders so composed resolvers fall back to trace-based annotation resolution. Two tests verify fallback when metric is a placeholder and non-fallback when metric includes distribution (`freq`).
All-trace materialization & clamping `web/oss/src/components/EvalRunDetails/etl/cells/EtlResolvedCell.tsx`, `web/oss/src/components/EvalRunDetails/etl/useEtlColumns.tsx`	`EtlResolvedCell` now depends on the `traces` slice for evaluator columns, collects all trace IDs from cached results, requests missing trace entities, and keeps the cell loading until every required trace entity is present (or failed). Per-row clamping constants are retuned and column-level `ellipsis` is removed in favor of cell-level clamping.
Filter UI hiding for string-type evaluators `web/oss/src/components/EvalRunDetails/etl/ScenarioFilterBar.tsx`	`ScenarioFilterBar` filters the field list to exclude evaluator fields with `valueType: "string"`, preventing generation of UI filter rows for those evaluator outputs.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly describes the main change: hiding string-type evaluator metrics from both the scenario table and filters, which aligns with the primary objectives and changeset.
Description check	✅ Passed	The description is well-related to the changeset, explaining the problem (string-typed evaluator outputs rendered as JSON blobs), the solution (drop them from columns and filters), and testing/verification steps.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fe-fix/eval-scenarios-table-metric-columns-fix

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5ca5e55f-0f27-4e12-a419-219b62617f7e

📥 Commits

Reviewing files that changed from the base of the PR and between 0b9012d and 42ac683.

📒 Files selected for processing (3)

web/oss/src/components/EvalRunDetails/Table.tsx
web/oss/src/components/EvalRunDetails/etl/ScenarioFilterBar.tsx
web/oss/src/components/EvalRunDetails/etl/useEtlColumns.tsx

github-actions · 2026-05-28T16:43:28Z

Railway Preview Environment


Status	Destroyed (PR closed)

Updated at 2026-05-28T20:36:18.635Z

`buildColumnValueTypeResolver` falls back to a column-name-only lookup, so without a kind guard a same-named testset / application / metrics column could inherit an evaluator output's string `metricType` and be incorrectly hidden from the scenario table and filter dropdown. Restrict the string-suppression check to `evaluator` leaves in both `useEtlColumns` and `ScenarioFilterBar`.

coderabbitai · 2026-05-28T17:20:01Z

Actionable comments posted: 0

mmabrouk

Thanks @ardaerzin

The fix is not correct from a product level. We should not hide the evaluator reasoning column from the table, just show the string like we used to.

I am working on the other fixes, I will merge as is if no update until then, we can fix it later

Per product feedback: hiding string-type evaluator outputs from the scenario table is the wrong move. The column should stay visible and render the actual string (rendering fix lands separately). The filter dropdown should keep these fields selectable for the same reason. Reverts the table-column filter from `useEtlColumns` and the filter-dropdown filter from `ScenarioFilterBar`, restoring both files to their main-branch state. Drawer behavior is unchanged either way (it reads `columnResult` directly).

…lter fields" This reverts commit 0a50ab3.

…able String-typed evaluator outputs (e.g. an LLM-judge's `reasoning` field) only land in the metric layer as a `{type: "string", count: N}` placeholder — the backend can't build a distribution over arbitrary text, so it only records that *some* string was emitted. The actual string lives on the annotation trace; the focus drawer already resolves it from there. The scenario table was hitting the placeholder via `resolveFromMetric` first, short-circuiting the composed `resolveFromTrace` fallback, and rendering the raw stats blob via `JSON.stringify`. Three coordinated changes: - `resolveFromMetric` (agenta-entities/evaluationRun/etl) detects the bare `{type: "string", count}` placeholder shape and returns `null` so the composed `resolveFromTrace` resolver picks the real string out of the annotation trace. Distribution-bearing `{type: "string", freq: …}` shapes are returned as-is (they carry real data). Covered by two new unit tests in `resolveMappings.test.ts`. - `EtlResolvedCell.SLICES_BY_KIND.evaluator` now includes `"traces"` so the annotation trace gets materialized for evaluator cells. Both the cell-level materializer request and the slice-loading check now iterate every result's `trace_id`, not just the first — a scenario carries multiple traces (invocation, annotation, …) and the annotation isn't always result[0]. - Reverts the column-build hiding in `useEtlColumns` + the `columnResult` plumbing in `Table.tsx` introduced earlier. The column stays visible and renders the actual string now that the resolver chain works correctly. Filter dropdown hiding in `ScenarioFilterBar` is intentionally kept — filtering on free-form LLM-generated reasoning isn't a useful product operation.

coderabbitai · 2026-05-28T20:22:12Z

Actionable comments posted: 0

…uncation Two coordinated fixes so long strings (e.g. an evaluator's `reasoning`) fill the cell across multiple lines and truncate cleanly at the bottom: - Drop `ellipsis: true` from the `useEtlColumns` column spec. Antd applies `white-space: nowrap` when it is set, forcing all cell content onto a single line and overriding the `EtlResolvedCell` body's `-webkit-line-clamp`. The column header has its own ellipsis-ing inside the `Tooltip` span, so the header is unaffected. - Re-tune `MAX_LINES_BY_HEIGHT` in `EtlResolvedCell` to match what actually fits inside `.scenario-table-cell` at each row-height variant (3 / 6 / 12 lines, down from 4 / 9 / 18). With the previous values the clamp point sat past the parent's `overflow: hidden` cut, so the ellipsis was never visible. Matching the line count to the visible area places the ellipsis on the last fully-rendered line.

ardaerzin · 2026-05-28T20:24:46Z

Thanks @ardaerzin

The fix is not correct from a product level. We should not hide the evaluator reasoning column from the table, just show the string like we used to.

I am working on the other fixes, I will merge as is if no update until then, we can fix it later

done

coderabbitai · 2026-05-28T20:28:21Z

Actionable comments posted: 0

`computeColumnGroup`'s default label for an evaluator group is `slugToTitle(evaluatorSlug)`, which leaks the slug shape into the header (e.g. "with-reasoning-jifn" rendered as "With Reasoning Jifn") whenever a real evaluator name is known. The testset and application headers were already resolving real entity names via reference query atoms; the evaluator branch fell through to the slug-derived fallback. `EtlColumnHeader` now also subscribes to `evaluationEvaluatorsByRunQueryAtomFamily(runId)` and, for evaluator-kind groups, matches by `refs.evaluator.id` first and then by slug (`refs.evaluator.slug` / `refs.evaluator_revision.slug`), surfacing the evaluator's real `name`. Falls back to the existing slug-titled label when no match is found (run not loaded yet, ad-hoc evaluator with no definition, etc). `useEtlColumns` now threads `runId` into each header.

Headers now read as self-describing pairs: "Testset: completion_testset" "Application: comp-1" "Evaluator: With Reasoning" Each kind branch builds the label from `Kind: ` + the resolved entity name. The slug-derived fallback for testset / application reads from `group.slug` (not `group.label`) — the default `computeColumnGroup` label already embeds the kind word, so reusing it here would render "Testset: Testset completion-tst".

mmabrouk · 2026-05-28T20:36:07Z

Thanks @ardaerzin

coderabbitai Bot reviewed May 28, 2026

View reviewed changes

Comment thread web/oss/src/components/EvalRunDetails/etl/useEtlColumns.tsx Outdated

ardaerzin marked this pull request as ready for review May 28, 2026 16:33

dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. Frontend labels May 28, 2026

vercel Bot deployed to Preview May 28, 2026 17:20 View deployment

mmabrouk reviewed May 28, 2026

View reviewed changes

dosubot Bot added size:XS This PR changes 0-9 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels May 28, 2026

vercel Bot deployed to Preview May 28, 2026 20:03 View deployment

Revert "revert(frontend): unhide string-type evaluator columns and fi…

79f7afd

…lter fields" This reverts commit 0a50ab3.

dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels May 28, 2026

vercel Bot deployed to Preview May 28, 2026 20:07 View deployment

vercel Bot deployed to Preview May 28, 2026 20:19 View deployment

dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels May 28, 2026

vercel Bot deployed to Preview May 28, 2026 20:24 View deployment

Merge branch 'main' into fe-fix/eval-scenarios-table-metric-columns-fix

92c8309

ardaerzin requested a review from mmabrouk May 28, 2026 20:25

vercel Bot deployed to Preview May 28, 2026 20:26 View deployment

mmabrouk changed the base branch from main to release/v0.100.6 May 28, 2026 20:30

vercel Bot deployed to Preview May 28, 2026 20:33 View deployment

vercel Bot deployed to Preview May 28, 2026 20:35 View deployment

mmabrouk merged commit b8f48ac into release/v0.100.6 May 28, 2026
20 of 21 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FE Fix] hide string-type evaluator metrics from scenario table & filters#4477

[FE Fix] hide string-type evaluator metrics from scenario table & filters#4477
mmabrouk merged 9 commits into
release/v0.100.6from
fe-fix/eval-scenarios-table-metric-columns-fix

ardaerzin commented May 28, 2026

Uh oh!

vercel Bot commented May 28, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 28, 2026 •

edited

Loading

Review skipped

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

github-actions Bot commented May 28, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 28, 2026

Uh oh!

mmabrouk left a comment

Uh oh!

coderabbitai Bot commented May 28, 2026

Uh oh!

ardaerzin commented May 28, 2026

Uh oh!

coderabbitai Bot commented May 28, 2026

Uh oh!

Uh oh!

mmabrouk commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ardaerzin commented May 28, 2026

Summary

Testing

Verified locally

QA follow-up

Checklist

Contributor Resources

Uh oh!

vercel Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Railway Preview Environment

Uh oh!

coderabbitai Bot commented May 28, 2026

Uh oh!

mmabrouk left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot commented May 28, 2026

Uh oh!

ardaerzin commented May 28, 2026

Uh oh!

coderabbitai Bot commented May 28, 2026

Uh oh!

Uh oh!

mmabrouk commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vercel Bot commented May 28, 2026 •

edited

Loading

coderabbitai Bot commented May 28, 2026 •

edited

Loading

github-actions Bot commented May 28, 2026 •

edited

Loading