feat(frontend): Run-on mode selector for the evaluator playground#4553
Merged
mmabrouk merged 1 commit into fe-fix/app-workflow-router-unification-regression-fix on Jun 5, 2026
Actionable comments posted: 1
📒 Files selected for processing (6)
web/oss/src/components/Evaluators/components/ConfigureEvaluator/EvaluatorPlaygroundHeader.tsx
web/oss/src/components/Evaluators/components/ConfigureEvaluator/RunOnSelector.tsx
web/oss/src/components/Evaluators/components/ConfigureEvaluator/SelectAppEmptyState.tsx
web/oss/src/components/Evaluators/components/ConfigureEvaluator/atoms.ts
web/oss/src/components/Evaluators/components/ConfigureEvaluator/index.tsx
web/oss/src/components/pages/evaluations/NewEvaluation/Components/CreateEvaluatorDrawer/index.tsx
b2797ea to e67b2f0
Adds a 'Run on' control to the evaluator (LLM-as-a-judge) playground header so the first/empty state explains itself instead of leaving the user with two disconnected loaders. Three modes, each drawing its own data-flow:
- Run directly on a test case (Data -> Evaluator -> Score)
- Run on an app output (Data -> App -> Output -> Evaluator -> Score) - default
- Run on a trace (Trace -> Evaluator -> Score) - disabled for now

The mode is persisted per project; a connected app forces effective 'app' mode. In app mode with no app connected, the run panel hides the testcases and shows a centered 'Select an app' empty state (shared with the evaluator-creation drawer).

All colors come from the antd theme token so it follows light/dark mode. Prompt playground is intentionally untouched.
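The mode rules in the commit message above could be sketched roughly as follows. This is an illustration only, not the code in atoms.ts: every name here (`RunOnMode`, `resolveEffectiveMode`, `showSelectAppEmptyState`, `describeFlow`) is hypothetical, and falling back to the default when the stored mode is the disabled 'trace' mode is an assumption the PR text does not state.

```typescript
// Hypothetical sketch of the 'Run on' mode rules; names are not taken from the PR.
type RunOnMode = "testcase" | "app" | "trace";

// 'app' is the default mode per the commit message.
const DEFAULT_MODE: RunOnMode = "app";

// 'trace' is listed as disabled for now.
const MODE_DISABLED: Record<RunOnMode, boolean> = {
  testcase: false,
  app: false,
  trace: true,
};

// Data-flow drawn for each mode, as listed in the commit message.
const FLOW_STEPS: Record<RunOnMode, readonly string[]> = {
  testcase: ["Data", "Evaluator", "Score"],
  app: ["Data", "App", "Output", "Evaluator", "Score"],
  trace: ["Trace", "Evaluator", "Score"],
};

// A connected app forces effective 'app' mode; otherwise use the stored
// per-project mode. Assumption: a missing or disabled stored mode falls
// back to the default.
function resolveEffectiveMode(
  storedMode: RunOnMode | null,
  connectedAppId: string | null,
): RunOnMode {
  if (connectedAppId !== null) return "app";
  if (storedMode === null || MODE_DISABLED[storedMode]) return DEFAULT_MODE;
  return storedMode;
}

// In app mode with no app connected, the run panel shows the
// 'Select an app' empty state instead of the testcases.
function showSelectAppEmptyState(
  effectiveMode: RunOnMode,
  connectedAppId: string | null,
): boolean {
  return effectiveMode === "app" && connectedAppId === null;
}

// Renders the flow for a mode as a single arrow-separated string.
function describeFlow(mode: RunOnMode): string {
  return FLOW_STEPS[mode].join(" -> ");
}
```

For example, under these assumptions a project that stored 'testcase' and then connects an app would resolve to 'app', and a project with no stored mode and no app would resolve to 'app' and show the empty state.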
e67b2f0 to f0d60d1
ardaerzin approved these changes on Jun 5, 2026
Merged commit 72c020f into fe-fix/app-workflow-router-unification-regression-fix
42 of 46 checks passed
Why
The evaluator (LLM-as-a-judge) playground's empty/first state doesn't explain itself. The header offers two disconnected loaders — an app picker and a test-set picker — with no indication of how they relate or what you're supposed to do first. New users open it and stall: do I load a test set? an app? both? And the relationship ("the app runs, then the evaluator grades its output") is invisible.
What
Adds a single Run on control to the evaluator playground header that names the data source and draws the resulting data-flow. Three modes:
Data → Evaluator → Score
Data → App → Output → Evaluator → Score
Trace → Evaluator → Score (disabled for now)
Notes on behavior:
Notes on the implementation
Scope
Demo (with sound)
CleanShot.2026-06-05.at.11.06.47.mp4