
feat(frontend): Run-on mode selector for the evaluator playground#4553

Merged
mmabrouk merged 1 commit into fe-fix/app-workflow-router-unification-regression-fix from fe-feat/evaluator-run-on-mode
Jun 5, 2026

@mmabrouk mmabrouk commented Jun 5, 2026

Why

The evaluator (LLM-as-a-judge) playground's empty/first state doesn't explain itself. The header offers two disconnected loaders — an app picker and a test-set picker — with no indication of how they relate or what you're supposed to do first. New users open it and stall: do I load a test set? an app? both? And the relationship ("the app runs, then the evaluator grades its output") is invisible.

What

Adds a single Run on control to the evaluator playground header that names the data source and draws the resulting data-flow. Three modes:

  • Run directly on a test case: Data → Evaluator → Score
  • Run on an app output (default): Data → App → Output → Evaluator → Score
  • Run on a trace: Trace → Evaluator → Score (disabled for now)
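A minimal TypeScript sketch of how the three modes and their data-flows could be modeled. The names (`RunOnMode`, `RUN_ON_OPTIONS`, `describeFlow`, `selectableModes`) are hypothetical and not taken from `RunOnSelector.tsx`; only the labels, the flows, and the disabled trace mode come from the list above.

```typescript
// Hypothetical model of the three "Run on" modes; names are illustrative,
// not the identifiers used in the PR.
type RunOnMode = "testcase" | "app" | "trace";

interface RunOnOption {
  label: string;
  flow: readonly string[]; // data-flow drawn next to the option
  disabled: boolean;
}

const RUN_ON_OPTIONS: Record<RunOnMode, RunOnOption> = {
  testcase: {
    label: "Run directly on a test case",
    flow: ["Data", "Evaluator", "Score"],
    disabled: false,
  },
  app: {
    label: "Run on an app output",
    flow: ["Data", "App", "Output", "Evaluator", "Score"],
    disabled: false,
  },
  trace: {
    label: "Run on a trace",
    flow: ["Trace", "Evaluator", "Score"],
    disabled: true, // not available yet
  },
};

// Render a mode's data-flow as the arrow string shown in the selector.
function describeFlow(mode: RunOnMode): string {
  return RUN_ON_OPTIONS[mode].flow.join(" → ");
}

// Only enabled modes can be picked in the selector.
function selectableModes(): RunOnMode[] {
  return (Object.keys(RUN_ON_OPTIONS) as RunOnMode[]).filter(
    (mode) => !RUN_ON_OPTIONS[mode].disabled,
  );
}
```

Keeping the flow as data rather than hard-coded markup means each option can render its own diagram from the same source of truth.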

Notes on behavior:

  1. I made sure the default mode is "Run on an app output". Until an app is picked, we show a centered "Select an app" empty state, and the evaluator can't run.
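A sketch of the gating rule, assuming it can be expressed as a simple predicate over the selected mode and app. `DEFAULT_RUN_ON_MODE` and `canRunEvaluator` are hypothetical names, not the PR's code.

```typescript
// Hypothetical sketch; names are illustrative, not taken from the PR.
type RunOnMode = "testcase" | "app" | "trace";

// "Run on an app output" is the default mode.
const DEFAULT_RUN_ON_MODE: RunOnMode = "app";

// Whether the evaluator may run, given the mode and the selected app (if any).
function canRunEvaluator(mode: RunOnMode, selectedAppId: string | null): boolean {
  if (mode === "trace") return false; // trace mode is disabled for now
  if (mode === "app") return selectedAppId !== null; // blocked until an app is picked
  return true; // "testcase": test-case data goes straight to the evaluator
}

// With the default mode and no app selected, the evaluator cannot run yet.
canRunEvaluator(DEFAULT_RUN_ON_MODE, null); // false
```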

Notes on the implementation

  • @ardaerzin The mode is persisted per project. IMO this is not the best behavior. I considered keying the mode by evaluator id instead, but the app and test-set selections are themselves persisted per project, so moving only the mode would make the three inconsistent. The right behavior is probably to scope all three per app/evaluator; that might be worth a follow-up PR.
  • One known issue: when the user loads data from a test set, we show inputs and outputs as Form. However, you cannot fill in the inputs in the Form view; you need to switch to JSON first. I did not want to touch the defaults there because I was afraid of breaking something else. Maybe we should default to JSON whenever the data is empty? wdyt @ardaerzin
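A sketch of per-project persistence with a fallback to the default mode, plus the evaluator-scoped key a follow-up might use. The key format, the `KeyValueStore` interface, and all function names are hypothetical; a `Map` stands in for whatever storage the real atoms use.

```typescript
// Hypothetical sketch of per-project persistence for the "Run on" mode.
// Key format, interface, and function names are illustrative assumptions.
type RunOnMode = "testcase" | "app" | "trace";

const DEFAULT_RUN_ON_MODE: RunOnMode = "app";

// Minimal storage interface; a Map<string, string> satisfies it.
interface KeyValueStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

function isRunOnMode(value: string | undefined): value is RunOnMode {
  return value === "testcase" || value === "app" || value === "trace";
}

// Current behavior: one stored mode per project.
function projectKey(projectId: string): string {
  return `evaluator-playground:run-on:${projectId}`;
}

// Possible follow-up: additionally scope the mode by evaluator.
function evaluatorKey(projectId: string, evaluatorId: string): string {
  return `${projectKey(projectId)}:${evaluatorId}`;
}

// Read the stored mode, falling back to the default when the entry
// is missing or holds an unrecognized value.
function loadMode(store: KeyValueStore, key: string): RunOnMode {
  const raw = store.get(key);
  return isRunOnMode(raw) ? raw : DEFAULT_RUN_ON_MODE;
}

function saveMode(store: KeyValueStore, key: string, mode: RunOnMode): void {
  store.set(key, mode);
}

// Usage: a Map stands in for browser storage.
const store = new Map<string, string>();
saveMode(store, projectKey("project-a"), "testcase");
loadMode(store, projectKey("project-a")); // "testcase"
loadMode(store, projectKey("project-b")); // "app" (default)
```

Falling back on unrecognized values keeps a stale or hand-edited entry from leaving the selector in an invalid state. Switching to evaluator scope would only change which key builder the app, test-set, and mode selections share.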

Scope

  • Evaluator playground only. The prompt/completion playground is intentionally untouched.

Demo (with sound)

CleanShot.2026-06-05.at.11.06.47.mp4


vercel Bot commented Jun 5, 2026

The latest updates on your projects.

Project | Deployment | Actions | Updated (UTC)
agenta-documentation | Ready | Preview, Comment | Jun 5, 2026 9:05am


@dosubot dosubot Bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Jun 5, 2026

coderabbitai Bot commented Jun 5, 2026

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

  • Title check: ✅ Passed. The title accurately summarizes the main change: introducing a 'Run-on mode selector' feature for the evaluator playground header, which is the primary focus of the changeset.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Description check: ✅ Passed. The PR description clearly explains the motivation (unclear first state), what is being added (Run on control with three modes), behavioral notes, implementation notes, and scope limits.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c20e40d3-e678-4106-ae9f-f8b1f487c6f8

📥 Commits

Reviewing files that changed from the base of the PR and between de548da and b2797ea.

📒 Files selected for processing (6)
  • web/oss/src/components/Evaluators/components/ConfigureEvaluator/EvaluatorPlaygroundHeader.tsx
  • web/oss/src/components/Evaluators/components/ConfigureEvaluator/RunOnSelector.tsx
  • web/oss/src/components/Evaluators/components/ConfigureEvaluator/SelectAppEmptyState.tsx
  • web/oss/src/components/Evaluators/components/ConfigureEvaluator/atoms.ts
  • web/oss/src/components/Evaluators/components/ConfigureEvaluator/index.tsx
  • web/oss/src/components/pages/evaluations/NewEvaluation/Components/CreateEvaluatorDrawer/index.tsx

Adds a 'Run on' control to the evaluator (LLM-as-a-judge) playground header
so the first/empty state explains itself instead of leaving the user with two
disconnected loaders. Three modes, each drawing its own data-flow:

- Run directly on a test case  (Data -> Evaluator -> Score)
- Run on an app output         (Data -> App -> Output -> Evaluator -> Score) - default
- Run on a trace               (Trace -> Evaluator -> Score) - disabled for now

The mode is persisted per project; a connected app forces effective 'app' mode.
In app mode with no app connected, the run panel hides the testcases and shows a
centered 'Select an app' empty state (shared with the evaluator-creation drawer).
All colors come from the antd theme token so it follows light/dark mode.

Prompt playground is intentionally untouched.
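A sketch of the effective-mode rule and the run-panel decision described in the commit message above. `effectiveMode`, `runPanelContent`, and the panel values are hypothetical names, and the disabled trace mode is not modeled separately.

```typescript
// Hypothetical sketch of the commit message's rules; names are illustrative,
// not the PR's actual atoms or components.
type RunOnMode = "testcase" | "app" | "trace";

type RunPanelContent = "select-app-empty-state" | "testcases";

// A connected app forces the effective mode to "app",
// regardless of the mode persisted for the project.
function effectiveMode(storedMode: RunOnMode, connectedAppId: string | null): RunOnMode {
  return connectedAppId !== null ? "app" : storedMode;
}

// In app mode with no app connected, hide the testcases and show the
// centered "Select an app" empty state; otherwise show the testcases.
// (Trace mode is disabled, so it is not given its own panel here.)
function runPanelContent(storedMode: RunOnMode, connectedAppId: string | null): RunPanelContent {
  const mode = effectiveMode(storedMode, connectedAppId);
  return mode === "app" && connectedAppId === null ? "select-app-empty-state" : "testcases";
}
```

Deriving the effective mode instead of overwriting the stored one means disconnecting the app returns the user to whatever mode they last chose.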

github-actions Bot commented Jun 5, 2026

Railway Preview Environment

Status Destroyed (PR closed)

Updated at 2026-06-05T11:42:28.018Z

@mmabrouk mmabrouk merged commit 72c020f into fe-fix/app-workflow-router-unification-regression-fix Jun 5, 2026
42 of 46 checks passed

Labels

  • Feature Request (New feature or request)
  • Frontend
  • size:XL (This PR changes 500-999 lines, ignoring generated files.)
