[FE Fix] hide string-type evaluator metrics from scenario table & filters #4477
… and filter
String-typed evaluator outputs (e.g. an LLM-judge `reasoning` field) resolve
through the metric layer as `{type: "string", count, ...}` stats blobs.
`unwrapStatsForCompare` only unwraps `binary` and `numeric` variants, so the
raw stats object falls through to `JSON.stringify` in the cell renderer and is
shown as `{"type":"string","count":...}` in every row. The same blob makes
filter predicates impossible — equality checks would compare a user string
against a stats object and never match.
Drop string-typed leaves at column-build time in `useEtlColumns` and from the
filter dropdown in `ScenarioFilterBar` by reusing the existing
`buildColumnValueTypeResolver` (the metricType-from-evaluator-schema lookup
that already powers the filter bar's value-type detection). The focus drawer
is unaffected — it reads `columnResult.groups/columns` directly via
`useFocusDrawerSections`, bypassing both code paths, so string metrics still
appear there.
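The failure mode described above can be illustrated with a minimal sketch. The `Stats` union, `unwrapSketch`, and `renderCellSketch` are hypothetical stand-ins written for this example (only the `type` and `count` fields come from the description); they are not the real `unwrapStatsForCompare` or cell renderer.

```typescript
// Hypothetical stats shapes; only `type` and `count` are taken from the text above.
type Stats =
    | {type: "binary"; value: boolean}
    | {type: "numeric"; value: number}
    | {type: "string"; count: number}

// Stand-in for `unwrapStatsForCompare`: unwraps binary and numeric variants only.
function unwrapSketch(stats: Stats): unknown {
    if (stats.type === "binary" || stats.type === "numeric") return stats.value
    return stats // the string variant falls through untouched
}

// Stand-in for the cell renderer's last-resort formatting of object values.
function renderCellSketch(value: unknown): string {
    if (typeof value === "object" && value !== null) return JSON.stringify(value)
    return String(value)
}

const stringStats: Stats = {type: "string", count: 3}

// The raw blob is what ends up in the cell.
const rendered = renderCellSketch(unwrapSketch(stringStats))

// An equality filter compares a user string against the stats object and never matches.
const filterMatches = unwrapSketch(stringStats) === "looks correct"
```

Under these assumptions, `rendered` is the serialized blob and `filterMatches` is always `false`, which is the behaviour the commit message reports.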
Walkthrough
String-typed evaluator outputs in ETL evaluation tables are now detected as backend placeholders and resolved from trace annotations; `EtlResolvedCell` materializes all referenced traces and `ScenarioFilterBar` excludes string-typed evaluator fields from the filter UI.
Changes: String-type evaluator field handling
📒 Files selected for processing (3)
- web/oss/src/components/EvalRunDetails/Table.tsx
- web/oss/src/components/EvalRunDetails/etl/ScenarioFilterBar.tsx
- web/oss/src/components/EvalRunDetails/etl/useEtlColumns.tsx
Railway Preview Environment
Updated at 2026-05-28T20:36:18.635Z |
`buildColumnValueTypeResolver` falls back to a column-name-only lookup, so without a kind guard a same-named testset / application / metrics column could inherit an evaluator output's string `metricType` and be incorrectly hidden from the scenario table and filter dropdown. Restrict the string-suppression check to `evaluator` leaves in both `useEtlColumns` and `ScenarioFilterBar`.
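A minimal sketch of the collision and the kind guard, assuming a name-only lookup table. `metricTypeByName`, `resolveValueType`, and the `Leaf` shape are hypothetical names for this example; the real `buildColumnValueTypeResolver` signature is not shown in this PR text.

```typescript
type LeafKind = "testset" | "application" | "evaluator" | "metrics"

interface Leaf {
    kind: LeafKind
    name: string
}

// Hypothetical stand-in for the name-only fallback lookup.
const metricTypeByName: Record<string, string> = {
    reasoning: "string",
    score: "numeric",
}
const resolveValueType = (leaf: Leaf): string | undefined => metricTypeByName[leaf.name]

// Without a kind guard, any leaf that shares a name with a string output is suppressed.
const isSuppressedUnguarded = (leaf: Leaf): boolean => resolveValueType(leaf) === "string"

// With the guard, only evaluator leaves can be suppressed.
const isSuppressedGuarded = (leaf: Leaf): boolean =>
    leaf.kind === "evaluator" && resolveValueType(leaf) === "string"

const leaves: Leaf[] = [
    {kind: "testset", name: "reasoning"}, // same name as the evaluator output
    {kind: "evaluator", name: "reasoning"},
    {kind: "evaluator", name: "score"},
]

const visibleUnguarded = leaves.filter((leaf) => !isSuppressedUnguarded(leaf))
const visibleGuarded = leaves.filter((leaf) => !isSuppressedGuarded(leaf))
```

In this sketch the unguarded check also hides the testset column named `reasoning`, while the guarded check keeps it and hides only the evaluator leaf.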
mmabrouk left a comment
Thanks @ardaerzin
The fix is not correct at the product level. We should not hide the evaluator reasoning column from the table; we should just show the string like we used to.
I am working on the other fixes. If there is no update by then, I will merge as is and we can fix it later.
Per product feedback: hiding string-type evaluator outputs from the scenario table is the wrong move. The column should stay visible and render the actual string (rendering fix lands separately). The filter dropdown should keep these fields selectable for the same reason. Reverts the table-column filter from `useEtlColumns` and the filter-dropdown filter from `ScenarioFilterBar`, restoring both files to their main-branch state. Drawer behavior is unchanged either way (it reads `columnResult` directly).
…lter fields" This reverts commit 0a50ab3.
…able
String-typed evaluator outputs (e.g. an LLM-judge's `reasoning` field) only
land in the metric layer as a `{type: "string", count: N}` placeholder —
the backend can't build a distribution over arbitrary text, so it only
records that *some* string was emitted. The actual string lives on the
annotation trace; the focus drawer already resolves it from there. The
scenario table was hitting the placeholder via `resolveFromMetric` first,
short-circuiting the composed `resolveFromTrace` fallback, and rendering
the raw stats blob via `JSON.stringify`.
Three coordinated changes:
- `resolveFromMetric` (agenta-entities/evaluationRun/etl) detects the
bare `{type: "string", count}` placeholder shape and returns `null` so
the composed `resolveFromTrace` resolver picks the real string out of
the annotation trace. Distribution-bearing `{type: "string", freq: …}`
shapes are returned as-is (they carry real data). Covered by two new
unit tests in `resolveMappings.test.ts`.
- `EtlResolvedCell.SLICES_BY_KIND.evaluator` now includes `"traces"` so
the annotation trace gets materialized for evaluator cells. Both the
cell-level materializer request and the slice-loading check now
iterate every result's `trace_id`, not just the first — a scenario
carries multiple traces (invocation, annotation, …) and the
annotation isn't always result[0].
- Reverts the column-build hiding in `useEtlColumns` + the
`columnResult` plumbing in `Table.tsx` introduced earlier. The column
stays visible and renders the actual string now that the resolver
chain works correctly.
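The resolver fallback described in the first bullet can be sketched as follows. The function names ending in `Sketch`, the record-based inputs, and the "no `freq` key" test for the bare placeholder are assumptions made for illustration; the real `resolveFromMetric` and `resolveFromTrace` signatures are not shown here.

```typescript
// Assumption: a bare placeholder is an object with type "string" and no `freq` distribution.
function isBareStringPlaceholder(value: unknown): boolean {
    if (typeof value !== "object" || value === null) return false
    const rec = value as Record<string, unknown>
    return rec.type === "string" && !("freq" in rec)
}

// Stand-in for `resolveFromMetric`: returns null for the placeholder so the chain continues.
function resolveFromMetricSketch(metrics: Record<string, unknown>, key: string): unknown {
    const value = metrics[key]
    if (value === undefined || isBareStringPlaceholder(value)) return null
    return value
}

// Stand-in for `resolveFromTrace`: reads the real value off the annotation trace.
function resolveFromTraceSketch(trace: Record<string, unknown>, key: string): unknown {
    const value = trace[key]
    return value === undefined ? null : value
}

// Composed resolver: metric first, trace as the fallback.
function resolveComposedSketch(
    metrics: Record<string, unknown>,
    trace: Record<string, unknown>,
    key: string,
): unknown {
    const fromMetric = resolveFromMetricSketch(metrics, key)
    return fromMetric !== null ? fromMetric : resolveFromTraceSketch(trace, key)
}

const trace = {reasoning: "The answer cites the source."}
const fromPlaceholder = resolveComposedSketch({reasoning: {type: "string", count: 2}}, trace, "reasoning")
const withFreq = {type: "string", freq: {yes: 2, no: 1}}
const fromDistribution = resolveComposedSketch({reasoning: withFreq}, trace, "reasoning")
```

With these inputs, the placeholder case resolves to the trace string, while the distribution-bearing shape is returned unchanged.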
Filter dropdown hiding in `ScenarioFilterBar` is intentionally kept —
filtering on free-form LLM-generated reasoning isn't a useful product
operation.
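The second change in the list above (iterating every result's `trace_id` rather than only the first) might look roughly like this. `ScenarioResultSketch` and `collectTraceIds` are hypothetical names; only the `trace_id` field is taken from the commit message.

```typescript
// Hypothetical result shape; only `trace_id` comes from the commit message.
interface ScenarioResultSketch {
    trace_id?: string | null
}

// Collect every distinct trace id so the annotation trace is requested
// even when it is not the first result.
function collectTraceIds(results: ScenarioResultSketch[]): string[] {
    const seen = new Set<string>()
    for (const result of results) {
        if (result.trace_id) seen.add(result.trace_id)
    }
    return Array.from(seen)
}

const traceIds = collectTraceIds([
    {trace_id: "invocation-trace"},
    {trace_id: null},
    {trace_id: "annotation-trace"},
    {trace_id: "invocation-trace"},
])
```

Reading only `results[0].trace_id` would have returned the invocation trace here and missed the annotation trace that carries the string.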
…uncation
Two coordinated fixes so long strings (e.g. an evaluator's `reasoning`) fill the cell across multiple lines and truncate cleanly at the bottom:
- Drop `ellipsis: true` from the `useEtlColumns` column spec. Antd applies `white-space: nowrap` when it is set, forcing all cell content onto a single line and overriding the `EtlResolvedCell` body's `-webkit-line-clamp`. The column header has its own ellipsis-ing inside the `Tooltip` span, so the header is unaffected.
- Re-tune `MAX_LINES_BY_HEIGHT` in `EtlResolvedCell` to match what actually fits inside `.scenario-table-cell` at each row-height variant (3 / 6 / 12 lines, down from 4 / 9 / 18). With the previous values the clamp point sat past the parent's `overflow: hidden` cut, so the ellipsis was never visible. Matching the line count to the visible area places the ellipsis on the last fully-rendered line.
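A rough sketch of the two truncation changes, assuming three row-height variants. The variant names (`small`, `medium`, `large`), the `clampStyle` helper, and the column object are hypothetical; only the 3 / 6 / 12 line counts and the removal of `ellipsis: true` come from the commit message.

```typescript
// Hypothetical variant names; the line counts are the re-tuned values from the commit message.
type RowHeight = "small" | "medium" | "large"

const MAX_LINES_BY_HEIGHT: Record<RowHeight, number> = {
    small: 3,
    medium: 6,
    large: 12,
}

// Multi-line clamp style for the cell body; the key names follow React's camelCase style props.
function clampStyle(height: RowHeight): Record<string, string | number> {
    return {
        display: "-webkit-box",
        WebkitBoxOrient: "vertical",
        WebkitLineClamp: MAX_LINES_BY_HEIGHT[height],
        overflow: "hidden",
    }
}

// Column spec without `ellipsis: true`, so the single-line nowrap rule is not applied
// and the clamp above can take effect.
const reasoningColumn: Record<string, unknown> = {
    title: "reasoning",
    dataIndex: "reasoning",
}

const smallStyle = clampStyle("small")
```

The design point is that the clamp count has to match the number of lines that are actually visible inside the cell; otherwise the ellipsis lands below the parent's clipped area.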
done
`computeColumnGroup`'s default label for an evaluator group is `slugToTitle(evaluatorSlug)`, which leaks the slug shape into the header (e.g. "with-reasoning-jifn" rendered as "With Reasoning Jifn") whenever a real evaluator name is known. The testset and application headers were already resolving real entity names via reference query atoms; the evaluator branch fell through to the slug-derived fallback.
`EtlColumnHeader` now also subscribes to `evaluationEvaluatorsByRunQueryAtomFamily(runId)` and, for evaluator-kind groups, matches by `refs.evaluator.id` first and then by slug (`refs.evaluator.slug` / `refs.evaluator_revision.slug`), surfacing the evaluator's real `name`. Falls back to the existing slug-titled label when no match is found (run not loaded yet, ad-hoc evaluator with no definition, etc). `useEtlColumns` now threads `runId` into each header.
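The id-then-slug matching described above can be sketched like this. `EvaluatorDefSketch`, `EvaluatorRefsSketch`, `slugToTitleSketch`, and `resolveEvaluatorName` are hypothetical names for this example; the real atom subscription and `slugToTitle` implementation are not reproduced.

```typescript
// Hypothetical shapes modelled on the ref paths named in the commit message.
interface EvaluatorDefSketch {
    id: string
    slug: string
    name?: string
}

interface EvaluatorRefsSketch {
    evaluator?: {id?: string; slug?: string}
    evaluator_revision?: {slug?: string}
}

// Stand-in for `slugToTitle`: "with-reasoning-jifn" becomes "With Reasoning Jifn".
function slugToTitleSketch(slug: string): string {
    return slug
        .split("-")
        .filter((part) => part.length > 0)
        .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
        .join(" ")
}

// Match by id first, then by either slug; fall back to the slug-derived title.
function resolveEvaluatorName(
    refs: EvaluatorRefsSketch,
    evaluators: EvaluatorDefSketch[],
    fallbackSlug: string,
): string {
    const id = refs.evaluator ? refs.evaluator.id : undefined
    const slugs = [
        refs.evaluator ? refs.evaluator.slug : undefined,
        refs.evaluator_revision ? refs.evaluator_revision.slug : undefined,
    ].filter((slug): slug is string => typeof slug === "string" && slug.length > 0)

    const byId = id ? evaluators.find((e) => e.id === id) : undefined
    const match = byId || evaluators.find((e) => slugs.indexOf(e.slug) !== -1)
    return match && match.name ? match.name : slugToTitleSketch(fallbackSlug)
}

const evaluators: EvaluatorDefSketch[] = [{id: "ev-1", slug: "with-reasoning-jifn", name: "With Reasoning"}]
const resolvedById = resolveEvaluatorName({evaluator: {id: "ev-1"}}, evaluators, "with-reasoning-jifn")
const resolvedFallback = resolveEvaluatorName({}, [], "with-reasoning-jifn")
```

When the run's evaluator list has not loaded yet, the empty array produces the slug-titled fallback, matching the behaviour the commit message describes.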
Headers now read as self-describing pairs:
- "Testset: completion_testset"
- "Application: comp-1"
- "Evaluator: With Reasoning"
Each kind branch builds the label from `Kind: ` + the resolved entity name. The slug-derived fallback for testset / application reads from `group.slug` (not `group.label`): the default `computeColumnGroup` label already embeds the kind word, so reusing it here would render "Testset: Testset completion-tst".
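A small sketch of the label construction, assuming a group object with `slug` and `label` fields. `headerLabel`, `KIND_WORD`, and the `GroupSketch` shape are hypothetical; whether the real fallback further formats the slug is not stated, so this sketch uses it verbatim.

```typescript
type GroupKind = "testset" | "application" | "evaluator"

// Hypothetical group shape; `label` is the default label that already embeds the kind word.
interface GroupSketch {
    slug: string
    label: string
}

const KIND_WORD: Record<GroupKind, string> = {
    testset: "Testset",
    application: "Application",
    evaluator: "Evaluator",
}

// Build "Kind: name", falling back to the slug (not the label) when no name is resolved.
function headerLabel(kind: GroupKind, resolvedName: string | undefined, group: GroupSketch): string {
    const name = resolvedName !== undefined ? resolvedName : group.slug
    return KIND_WORD[kind] + ": " + name
}

const testsetGroup: GroupSketch = {slug: "completion-tst", label: "Testset completion-tst"}
const resolvedLabel = headerLabel("testset", "completion_testset", testsetGroup)
const fallbackLabel = headerLabel("testset", undefined, testsetGroup)
// Reusing the default label instead would duplicate the kind word.
const duplicatedLabel = KIND_WORD.testset + ": " + testsetGroup.label
```

Here `fallbackLabel` is "Testset: completion-tst", whereas `duplicatedLabel` shows the "Testset: Testset completion-tst" result that reading from `group.label` would produce.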
Thanks @ardaerzin
Testing
Verified locally
reasoning column is not shown