feat(frontend): Run-on mode selector for the evaluator playground by mmabrouk · Pull Request #4553 · Agenta-AI/agenta

mmabrouk · 2026-06-05T08:44:57Z

Why

The evaluator (LLM-as-a-judge) playground's empty/first state doesn't explain itself. The header offers two disconnected loaders — an app picker and a test-set picker — with no indication of how they relate or what you're supposed to do first. New users open it and stall: do I load a test set? an app? both? And the relationship ("the app runs, then the evaluator grades its output") is invisible.

What

Adds a single Run on control to the evaluator playground header that names the data source and draws the resulting data-flow. Three modes:

Run directly on a test case — Data → Evaluator → Score
Run on an app output (default) — Data → App → Output → Evaluator → Score
Run on a trace — Trace → Evaluator → Score (disabled for now)
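For readers who want the three modes in code form, here is a minimal sketch of how they could be described as data. All names (`RunOnMode`, `RUN_ON_OPTIONS`, `describeFlow`, `selectableModes`) are hypothetical and chosen for illustration; the PR's actual types live in `RunOnSelector.tsx` and `atoms.ts` and may look different.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the PR's code.
type RunOnMode = "testcase" | "app" | "trace"

interface RunOnOption {
    mode: RunOnMode
    label: string
    flow: string[] // ordered stages drawn next to the option
    disabled?: boolean
}

const RUN_ON_OPTIONS: RunOnOption[] = [
    {mode: "testcase", label: "Run directly on a test case", flow: ["Data", "Evaluator", "Score"]},
    {mode: "app", label: "Run on an app output", flow: ["Data", "App", "Output", "Evaluator", "Score"]},
    // Trace mode is listed but not selectable yet.
    {mode: "trace", label: "Run on a trace", flow: ["Trace", "Evaluator", "Score"], disabled: true},
]

// "Run on an app output" is the default mode.
const DEFAULT_MODE: RunOnMode = "app"

// Render the data-flow of a mode as a single arrow-separated string.
function describeFlow(mode: RunOnMode): string {
    const option = RUN_ON_OPTIONS.find((o) => o.mode === mode)
    if (!option) throw new Error(`Unknown mode: ${mode}`)
    return option.flow.join(" → ")
}

// Modes the user can currently pick (everything not marked disabled).
function selectableModes(): RunOnMode[] {
    return RUN_ON_OPTIONS.filter((o) => !o.disabled).map((o) => o.mode)
}
```

Keeping the flow as an array rather than a preformatted string would let the selector draw each stage as its own element, which is one plausible way to render the diagram.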

Notes on behavior:

I made sure that the default mode is "Run on an app output", where we show a centered "Select an app" empty state. The evaluator can't run until you pick one.
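The gating described above can be sketched as a small pure function. This is an illustration under assumptions: `runPanelView`, `isRunBlocked`, and the `connectedAppId` parameter are hypothetical names, not identifiers taken from the PR.

```typescript
// Hypothetical sketch of the empty-state decision; not the PR's implementation.
type RunOnMode = "testcase" | "app" | "trace"
type RunPanelView = "select-app-empty-state" | "testcases"

// In app mode with no app picked, the run panel shows the "Select an app"
// empty state instead of the testcases.
function runPanelView(mode: RunOnMode, connectedAppId: string | null): RunPanelView {
    if (mode === "app" && !connectedAppId) return "select-app-empty-state"
    return "testcases"
}

// The evaluator can't run in app mode until an app has been picked.
function isRunBlocked(mode: RunOnMode, connectedAppId: string | null): boolean {
    return mode === "app" && !connectedAppId
}
```

For example, `runPanelView("app", null)` yields the empty state, while `runPanelView("testcase", null)` goes straight to the testcases.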

Notes on the implementation

@ardaerzin The mode is persisted per project. This is, however, imo not the best behavior. I considered keying the mode by evaluator id instead, but the app and test-set selections are themselves per project, so moving only the mode would make the three inconsistent. The right behavior is probably to scope all three per app/evaluator; it might be worth a follow-up PR to fix.
One known issue: when the user selects from a test set, we show inputs and outputs as Form. However, you cannot fill in the inputs as Form; you need to switch to JSON first. I did not want to touch the defaults there since I was afraid of breaking something else. Maybe we would want to change the default to JSON always if the data is empty? wdyt @ardaerzin
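To make the per-project scoping concrete, here is a rough sketch of what keying the mode by project id could look like. It is an assumption-laden illustration: the storage interface, the key format `evaluator-run-on-mode:<projectId>`, and the function names are all hypothetical, and the PR itself keeps this state in `atoms.ts`, whose contents are not shown here.

```typescript
// Hypothetical sketch of per-project persistence; key format and names are assumptions.
type RunOnMode = "testcase" | "app" | "trace"
const DEFAULT_MODE: RunOnMode = "app"

// Minimal storage interface so the sketch runs outside a browser.
// The browser's localStorage has the same two methods.
interface KeyValueStore {
    getItem(key: string): string | null
    setItem(key: string, value: string): void
}

class MemoryStore implements KeyValueStore {
    private data = new Map<string, string>()
    getItem(key: string): string | null {
        return this.data.get(key) ?? null
    }
    setItem(key: string, value: string): void {
        this.data.set(key, value)
    }
}

// The key contains only the project id, so every evaluator in a project shares one mode.
const modeKey = (projectId: string): string => `evaluator-run-on-mode:${projectId}`

function isRunOnMode(value: string | null): value is RunOnMode {
    return value === "testcase" || value === "app" || value === "trace"
}

// Fall back to the default when nothing (or something invalid) is stored.
function readMode(store: KeyValueStore, projectId: string): RunOnMode {
    const raw = store.getItem(modeKey(projectId))
    return isRunOnMode(raw) ? raw : DEFAULT_MODE
}

function writeMode(store: KeyValueStore, projectId: string, mode: RunOnMode): void {
    store.setItem(modeKey(projectId), mode)
}
```

Scoping per app/evaluator, as suggested for a follow-up, would amount to adding that id to the key for the mode and for the app and test-set selections together, so the three stay consistent.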

Scope

Evaluator playground only. The prompt/completion playground is intentionally untouched.

Demo (with sound)

CleanShot.2026-06-05.at.11.06.47.mp4

vercel · 2026-06-05T08:45:03Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jun 5, 2026 9:05am

coderabbitai · 2026-06-05T08:45:05Z

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: introducing a 'Run-on mode selector' feature for the evaluator playground header, which is the primary focus of the changeset.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The PR description clearly explains the motivation (unclear first state), what is being added (Run on control with three modes), behavioral notes, implementation notes, and scope limits.


coderabbitai

Actionable comments posted: 1

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c20e40d3-e678-4106-ae9f-f8b1f487c6f8

📥 Commits

Reviewing files that changed from the base of the PR and between de548da and b2797ea.

📒 Files selected for processing (6)

web/oss/src/components/Evaluators/components/ConfigureEvaluator/EvaluatorPlaygroundHeader.tsx
web/oss/src/components/Evaluators/components/ConfigureEvaluator/RunOnSelector.tsx
web/oss/src/components/Evaluators/components/ConfigureEvaluator/SelectAppEmptyState.tsx
web/oss/src/components/Evaluators/components/ConfigureEvaluator/atoms.ts
web/oss/src/components/Evaluators/components/ConfigureEvaluator/index.tsx
web/oss/src/components/pages/evaluations/NewEvaluation/Components/CreateEvaluatorDrawer/index.tsx

Adds a 'Run on' control to the evaluator (LLM-as-a-judge) playground header so the first/empty state explains itself instead of leaving the user with two disconnected loaders.

Three modes, each drawing its own data-flow:
- Run directly on a test case (Data -> Evaluator -> Score)
- Run on an app output (Data -> App -> Output -> Evaluator -> Score) - default
- Run on a trace (Trace -> Evaluator -> Score) - disabled for now

The mode is persisted per project; a connected app forces effective 'app' mode. In app mode with no app connected, the run panel hides the testcases and shows a centered 'Select an app' empty state (shared with the evaluator-creation drawer). All colors come from the antd theme token so it follows light/dark mode. Prompt playground is intentionally untouched.
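The rule that a connected app forces the effective mode to 'app' can be sketched as a derived value over the stored mode. The names below (`effectiveMode`, `storedMode`, `connectedAppId`) are hypothetical; this is an illustration of the described rule, not the code from `atoms.ts`.

```typescript
// Hypothetical sketch: derive the effective mode from the stored mode and the connection state.
type RunOnMode = "testcase" | "app" | "trace"

function effectiveMode(storedMode: RunOnMode, connectedAppId: string | null): RunOnMode {
    // A connected app wins over whatever mode was persisted for the project.
    return connectedAppId ? "app" : storedMode
}
```

Deriving the effective mode instead of overwriting the stored one would leave the user's persisted choice intact for when no app is connected; whether the PR does this is not stated in the commit message.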

github-actions · 2026-06-05T09:11:08Z

Railway Preview Environment


Status	Destroyed (PR closed)

Updated at 2026-06-05T11:42:28.018Z

dosubot Bot added the size:XL label (this PR changes 500-999 lines, ignoring generated files) Jun 5, 2026

dosubot Bot added the Feature Request (new feature or request) and Frontend labels Jun 5, 2026

vercel Bot deployed to Preview June 5, 2026 08:45 View deployment

coderabbitai Bot reviewed Jun 5, 2026


Comment thread web/oss/src/components/Evaluators/components/ConfigureEvaluator/RunOnSelector.tsx

mmabrouk force-pushed the fe-feat/evaluator-run-on-mode branch from b2797ea to e67b2f0 Compare June 5, 2026 09:01

vercel Bot deployed to Preview June 5, 2026 09:03 View deployment

mmabrouk force-pushed the fe-feat/evaluator-run-on-mode branch from e67b2f0 to f0d60d1 Compare June 5, 2026 09:04

vercel Bot deployed to Preview June 5, 2026 09:05 View deployment

mmabrouk requested a review from ardaerzin June 5, 2026 09:09

ardaerzin approved these changes Jun 5, 2026


mmabrouk merged commit 72c020f into fe-fix/app-workflow-router-unification-regression-fix Jun 5, 2026
42 of 46 checks passed

mmabrouk mentioned this pull request Jun 5, 2026

feat(frontend): Run-on modes in the evaluator creation drawer (shared controls) #4557

Open

mmabrouk merged 1 commit into fe-fix/app-workflow-router-unification-regression-fix from fe-feat/evaluator-run-on-mode
