fix(frontend): re-enable full-page playground for evaluator workflows #4474

Open
ardaerzin wants to merge 5 commits into release/v0.100.7 from fe-fix/app-workflow-router-unification-regression-fix

Conversation

@ardaerzin
Contributor

Summary

PR #4384 disabled EVALUATOR_FULL_PAGE_NAV_ENABLED because the app-style playground was a regression for evaluators (lost the upstream-app connection) and app-scoped observability defaulted to "invocation" instead of "annotation" for evaluator workflows. This change addresses both blockers and re-enables the flow by default.

Playground

  • added app chaining for evaluator workflows
  • minor UI fixes

Observability

  • fixed and improved filtering for evaluator workflows

QA follow-up

  • full app-pages router tests for evaluator workflows, checked against the reasons this feature was disabled after its initial release

Demo

Checklist

  • I have included a video or screen recording for UI changes, or marked Demo as N/A
  • Relevant tests pass locally
  • Relevant linting and formatting pass locally
  • I have signed the CLA, or I will sign it when the bot prompts me

Contributor Resources

PR #4384 disabled EVALUATOR_FULL_PAGE_NAV_ENABLED because the app-style
playground was a regression for evaluators (lost the upstream-app
connection) and app-scoped observability defaulted to "invocation"
instead of "annotation" for evaluator workflows. This change addresses
both blockers and re-enables the flow by default.

Playground
- ConfigureEvaluatorPage: upstream app workflow can be connected via
  EntityPicker (skip-variant adapter, filtered to non-evaluator
  non-feedback workflows). Disconnect affordance on the picker
  trigger and as a popup footer.
- Standalone evaluator runs no longer require an upstream app
  (TestsetDropdown is always available; runDisabled gate removed).
- Playground chain traces now write evaluator references
  (evaluator / evaluator_variant / evaluator_revision slots) so the
  per-evaluator observability page can find them. EntityPicker
  search bar respects a new parentLabel option so app pickers no
  longer show "Search evaluator..."
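
The evaluator reference slots described in the last Playground item could be assembled roughly as follows. This is a minimal sketch, not the PR's code: only the three slot names (evaluator, evaluator_variant, evaluator_revision) come from the commit message, while the `RefSketch` and `EvaluatorInfoSketch` shapes and both function names are assumptions made for illustration.

```typescript
// Hypothetical reference shape; the real schema may carry other fields.
interface RefSketch {
    id?: string
    slug?: string
}

// Hypothetical view of what the runner knows about the evaluator being run.
interface EvaluatorInfoSketch {
    workflowId?: string
    workflowSlug?: string
    variantId?: string
    variantSlug?: string
    revisionId?: string
}

// Returns undefined when neither identifier is known, so empty slots are skipped.
function makeRefSketch(id?: string, slug?: string): RefSketch | undefined {
    if (!id && !slug) return undefined
    const ref: RefSketch = {}
    if (id) ref.id = id
    if (slug) ref.slug = slug
    return ref
}

// Fill only the slots for which some identifier is available.
function buildEvaluatorRefsSketch(info: EvaluatorInfoSketch): Record<string, RefSketch> {
    const refs: Record<string, RefSketch> = {}
    const evaluator = makeRefSketch(info.workflowId, info.workflowSlug)
    const variant = makeRefSketch(info.variantId, info.variantSlug)
    const revision = makeRefSketch(info.revisionId)
    if (evaluator) refs.evaluator = evaluator
    if (variant) refs.evaluator_variant = variant
    if (revision) refs.evaluator_revision = revision
    return refs
}

// Combine references inherited from the upstream app with the evaluator's own.
function mergeRefsSketch(
    upstream: Record<string, RefSketch>,
    own: Record<string, RefSketch>,
): Record<string, RefSketch> {
    return {...upstream, ...own}
}
```

With slots filled this way, a trace produced by a chained run carries both the upstream application reference and the evaluator's own references, which is what lets a per-evaluator query match it.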

Observability filters
- Per-workflow-kind trace_type default extracted into
  @agenta/entities (defaultTraceTypeForWorkflow): annotation for
  evaluators, invocation otherwise. Pure helper unit-tested with
  vitest.
- References scope filter adapts to the effective trace_type:
  evaluators with trace_type=annotation pin to references.evaluator,
  invocation pins to references.application, and "no trace_type"
  ORs across both slots so all traces mentioning the evaluator
  surface.
- Dialog reconciliation: live label flip while editing trace_type
  in the filter dialog ("Application ID" / "Evaluator ID") via an
  opt-in reconcileFilterRows callback on Filters; observability
  page provides an evaluator-workflow-aware reconciler.
- Filter persistence across reloads: per-app via atomWithStorage
  under "agenta:observability:filters", with __global__ fallback
  for project-level pages. Both userFilters and traceTypeChoice
  share one packed storage atom.
- Cleaner state machine for trace_type intent: tagged union
  (default / value / cleared) replaces the dual-atom dance that
  could silently revert.
- application_id URL param dropped for evaluator workflows; the
  query is gated on workflow context being settled to avoid
  firing with the wrong scope.
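
The default-per-workflow-kind rule and the default / value / cleared tagged union from the list above can be sketched together. This is an illustration under assumptions, not the code in @agenta/entities: the actual signature of defaultTraceTypeForWorkflow is not shown in this PR, so the `isEvaluator` input and every name ending in `Sketch` are hypothetical. The `kind` discriminator follows the `{kind: "cleared"}` form quoted in the follow-up commit.

```typescript
type TraceTypeSketch = "annotation" | "invocation"

// The user's intent for the trace_type filter, modeled as a tagged union.
type TraceTypeChoiceSketch =
    | {kind: "default"} // the user has not touched the filter
    | {kind: "value"; value: TraceTypeSketch} // the user picked a trace type
    | {kind: "cleared"} // the user explicitly removed the filter

// Soft default: annotation for evaluators, invocation otherwise.
function defaultTraceTypeSketch(workflow: {isEvaluator: boolean}): TraceTypeSketch {
    return workflow.isEvaluator ? "annotation" : "invocation"
}

// Resolve the stored choice against the workflow default.
// null means "apply no trace_type filter".
function effectiveTraceTypeSketch(
    choice: TraceTypeChoiceSketch,
    workflow: {isEvaluator: boolean},
): TraceTypeSketch | null {
    switch (choice.kind) {
        case "default":
            return defaultTraceTypeSketch(workflow)
        case "value":
            return choice.value
        case "cleared":
            return null
    }
}
```

Because "cleared" is a distinct state rather than the absence of a value, an explicit removal cannot be confused with "never set" and fall back to the default, which appears to be the silent revert the commit message refers to.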

Tests
- vitest unit tests for defaultTraceTypeForWorkflow.
- Playwright acceptance for full-page playground: post-create
  nav, row click for LLM and declarative evaluators, direct URL,
  sidebar switcher; fixes the previously broken
  select-app-and-run test for the new flow.
@vercel

vercel Bot commented May 28, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agenta-documentation | Ready | Preview, Comment | May 29, 2026 12:48pm |

@coderabbitai

coderabbitai Bot commented May 28, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: b479207d-0410-4d83-af93-b4dfc2944ce8

Walkthrough

This PR enables Phase 5 evaluator full-page playground navigation. It flips EVALUATOR_FULL_PAGE_NAV_ENABLED to true and updates routing, state management, and observability filter behavior to support evaluators as first-class playground entities with full-page rendering, app connection controls, and workflow-aware trace type defaults.
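
The routing decision summarized here can be read as a single predicate. The sketch below is illustrative only: the `"evaluator"` kind, the feedback exclusion, and the two component names appear elsewhere in this PR, but the `WorkflowRouteInfoSketch` shape, the `"app"` kind, the function name, and the assumption that the flag is checked at this point are hypothetical.

```typescript
// Hypothetical, simplified view of what the router knows about a workflow.
interface WorkflowRouteInfoSketch {
    workflowKind: "app" | "evaluator"
    isFeedback: boolean
}

type PlaygroundTargetSketch = "ConfigureEvaluatorPage" | "Playground"

// Evaluators get the full-page evaluator UI only when the flag is on and
// the evaluator is not a feedback (human) evaluator, which has nothing to run.
function pickPlaygroundTargetSketch(
    flagEnabled: boolean,
    workflow: WorkflowRouteInfoSketch,
): PlaygroundTargetSketch {
    const isRunnableEvaluator = workflow.workflowKind === "evaluator" && !workflow.isFeedback
    return flagEnabled && isRunnableEvaluator ? "ConfigureEvaluatorPage" : "Playground"
}
```

Keeping the check in one function means the sidebar switcher and the registry row click could reuse the same rule instead of re-deriving it, though whether the PR shares it that way is not stated.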

Changes

Evaluator Full-Page Navigation & Observability Integration

| Layer | File(s) | Summary |
| --- | --- | --- |
| Trace type defaults helper and workflow schema extensions | web/packages/agenta-entities/src/workflow/core/traceTypeDefault.ts, web/packages/agenta-entities/src/workflow/core/schema.ts, web/packages/agenta-entities/src/workflow/core/index.ts, web/packages/agenta-entities/src/workflow/index.ts, web/packages/agenta-entities/tests/unit/traceTypeDefault.test.ts | New module defines soft-default trace_type behavior ("annotation" for evaluator/traces, "invocation" for app workflows, null otherwise); schema docs clarify workflow_slug/workflow_variant_slug; exports and tests added. |
| Feature flag and router navigation | web/oss/src/state/workflow/flags.ts, web/oss/src/components/PlaygroundRouter/index.tsx | EVALUATOR_FULL_PAGE_NAV_ENABLED flips to true and PlaygroundRouter conditionally renders ConfigureEvaluatorPage when workflowKind is "evaluator" (excluding feedback evaluators). |
| App connection state management | web/oss/src/components/Evaluators/components/ConfigureEvaluator/atoms.ts | selectedAppLabelAtom becomes derived from node graph depth; connectAppToEvaluatorAtom persists only after graph mutations succeed; new disconnectAppFromEvaluatorAtom clears selection and removes downstream node. |
| Evaluator header UI with app controls | web/oss/src/components/Evaluators/components/ConfigureEvaluator/EvaluatorPlaygroundHeader.tsx | Adds app disconnect button (in popover footer and as icon), manages disconnect callback, and renders TestsetDropdown unconditionally with updated rationale. |
| ConfigureEvaluatorPage for full-page mode | web/oss/src/components/Evaluators/components/ConfigureEvaluator/index.tsx | Removes run-disabled gating and inline app-picker prompt; wires handleAppSelect for app connection; sets parentLabel: "Application" on workflow adapter. |
| Evaluators registry row-click simplified | web/oss/src/components/Evaluators/index.tsx | Removes hasFullPagePlaygroundUX predicate; routes non-archived evaluators directly to full-page when flag enabled and workflowId present. |
| Drawer navigation and post-create flow | web/oss/src/components/WorkflowRevisionDrawerWrapper/index.tsx, web/oss/src/components/pages/evaluations/NewEvaluation/Components/CreateEvaluatorDrawer/index.tsx | Restores persisted app via connectApp (not direct label writes); simplifies post-create eligibility to flag + presence checks; adds parentLabel: "Application" to drawer configs. |
| Sidebar evaluator switcher gating | web/oss/src/components/Sidebar/components/WorkflowEntityCard.tsx | Uses nonArchivedEvaluatorsAtom directly gated by feature flag instead of removed fullPagePlaygroundEvaluatorsAtom. |
| Trace type state persistence and derivation | web/oss/src/state/newObservability/atoms/controls.ts | Introduces TraceTypeChoice (default/value/cleared) and effectiveTraceTypeAtomFamily derived from stored choice plus workflow defaults; persists per-app/per-tab in filtersByAppAtom. |
| Filter regeneration and scope composition | web/oss/src/state/newObservability/atoms/controls.ts, web/oss/src/state/newObservability/atoms/queries.ts | filtersAtomFamily regenerates permanent scope filter (with evaluator-specific reference mapping), appends derived trace_type row, then user filters; tracesQueryAtom uses effectiveAppId and blocks while resolving. |
| Filter UI reconciliation for evaluators | web/oss/src/components/Filters/Filters.tsx, web/oss/src/components/Filters/types.d.ts, web/oss/src/components/pages/observability/assets/filters/fieldAdapter.ts, web/oss/src/components/pages/observability/components/ObservabilityHeader/index.tsx | Filters.tsx adds reconcileFilterRows prop for display-only projection; field adapter adds referenceCategory and de-duplicates values; ObservabilityHeader implements reconciler to remap reference categories based on derived trace_type for evaluator workflows. |
| Workflow adapter labeling | web/packages/agenta-entity-ui/src/selection/adapters/workflowRevisionRelationAdapter.ts, web/oss/src/components/Playground/Components/PlaygroundVariantConfig/assets/PlaygroundVariantConfigHeader.tsx | CreateWorkflowRevisionAdapterOptions accepts parentLabel (defaults "Evaluator"); applied as "Application" in evaluator contexts to customize UI labels and messages. |
| Evaluator trace reference construction | web/packages/agenta-playground/src/state/execution/executionRunner.ts | Adds buildEvaluatorSelfReferences helper to construct references.evaluator* fields; merges with upstream references for non-root execution stages. |
| Playwright test coverage | web/oss/tests/playwright/acceptance/evaluators/tests.ts, web/oss/tests/playwright/acceptance/evaluators/index.ts | Exports new test ID constants and LLM-as-a-judge template name; rewrites playground test for full-page flow; adds comprehensive acceptance tests for post-create/row-click navigation, direct URLs, and sidebar switcher. |
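
The filter composition described in the walkthrough (permanent scope filter first, then the derived trace_type row, then user filters) might look like the sketch below. The mapping from trace type to reference slot follows the PR description; the `FilterRowSketch` and `ScopeSketch` shapes, the `"is"` operator name, and the function names are assumptions and do not reflect the actual filter schema.

```typescript
type TraceKindSketch = "annotation" | "invocation"

// Hypothetical filter row; the real filter type likely has more fields.
interface FilterRowSketch {
    field: string
    operator: "is" | "in"
    value: string | string[]
}

// A scope is either one pinned row or an OR across several rows.
type ScopeSketch = FilterRowSketch | {any: FilterRowSketch[]}

// Pick the reference slot that matches the effective trace type.
function evaluatorScopeSketch(evaluatorId: string, traceType: TraceKindSketch | null): ScopeSketch {
    const asEvaluator: FilterRowSketch = {field: "references.evaluator", operator: "is", value: evaluatorId}
    const asApplication: FilterRowSketch = {field: "references.application", operator: "is", value: evaluatorId}
    if (traceType === "annotation") return asEvaluator
    if (traceType === "invocation") return asApplication
    // No trace type: match traces that mention the evaluator in either slot.
    return {any: [asEvaluator, asApplication]}
}

// Order: permanent scope, derived trace_type row (if any), then user filters.
function composeFiltersSketch(
    evaluatorId: string,
    traceType: TraceKindSketch | null,
    userFilters: FilterRowSketch[],
): ScopeSketch[] {
    const rows: ScopeSketch[] = [evaluatorScopeSketch(evaluatorId, traceType)]
    if (traceType) rows.push({field: "trace_type", operator: "is", value: traceType})
    return [...rows, ...userFilters]
}
```

Regenerating the scope row from the effective trace type on every read, rather than storing it, keeps the pinned slot from drifting out of sync when the user edits trace_type.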

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Agenta-AI/agenta#4384: Directly continues this PR's predecessor by flipping EVALUATOR_FULL_PAGE_NAV_ENABLED to true and implementing gated routing/navigation changes across PlaygroundRouter, EvaluatorsRegistry, WorkflowRevisionDrawerWrapper, and sidebar evaluator switcher.
  • Agenta-AI/agenta#4274: Touches evaluator/create flow wiring near WorkflowRevisionDrawerWrapper; changes are related to persisted app selection and drawer commit handling.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 60.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title clearly and concisely describes the main change: re-enabling the full-page playground for evaluator workflows, which is the primary objective stated in the PR description and embodied across multiple file changes. |
| Description check | ✅ Passed | The PR description is well-structured and directly related to the changeset, explaining the rationale (fixing regressions from PR #4384), the changes made (app chaining, UI fixes, improved filtering), and QA follow-up items. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@ardaerzin ardaerzin marked this pull request as ready for review May 28, 2026 11:20
@dosubot dosubot Bot added the size:XXL (This PR changes 1000+ lines, ignoring generated files.) and Frontend labels May 28, 2026
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
web/oss/src/components/Evaluators/components/ConfigureEvaluator/atoms.ts (1)

165-178: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Persisted app selection can get stale on failed connect/disconnect edge paths.

persistedAppSelectionAtom is written before the primary-node swap succeeds, and disconnect exits early without clearing persisted state when no downstream node is found. That can rehydrate an app selection that is no longer actually connected.

Proposed fix
 export const connectAppToEvaluatorAtom = atom(
@@
-        // Persist across sessions. The picker display label is derived from
-        // the depth-0 node's `label` via `selectedAppLabelAtom`, so no extra
-        // write needed here.
-        set(persistedAppSelectionAtom, {appRevisionId, appLabel})
-
         // Replace primary node with app
         const nodeId = set(playgroundController.actions.changePrimaryNode, {
             type: "workflow",
             id: appRevisionId,
             label: appLabel,
         })
 
         if (!nodeId) return
+        // Persist only after graph mutation succeeds.
+        set(persistedAppSelectionAtom, {appRevisionId, appLabel})
@@
 export const disconnectAppFromEvaluatorAtom = atom(null, (get, set) => {
     const nodes = get(playgroundController.selectors.nodes())
     const downstreamEvaluator = nodes.find((n) => n.depth > 0)
-    if (!downstreamEvaluator) return
+    if (!downstreamEvaluator) {
+        set(persistedAppSelectionAtom, null)
+        return
+    }

Also applies to: 208-225


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: ce60569f-f33c-480b-a472-4ceb822d0b1e

📥 Commits

Reviewing files that changed from the base of the PR and between 0b9012d and 048d662.

📒 Files selected for processing (25)
  • web/oss/src/components/Evaluators/components/ConfigureEvaluator/EvaluatorPlaygroundHeader.tsx
  • web/oss/src/components/Evaluators/components/ConfigureEvaluator/atoms.ts
  • web/oss/src/components/Evaluators/components/ConfigureEvaluator/index.tsx
  • web/oss/src/components/Evaluators/index.tsx
  • web/oss/src/components/Filters/Filters.tsx
  • web/oss/src/components/Filters/types.d.ts
  • web/oss/src/components/Playground/Components/PlaygroundVariantConfig/assets/PlaygroundVariantConfigHeader.tsx
  • web/oss/src/components/PlaygroundRouter/index.tsx
  • web/oss/src/components/Sidebar/components/WorkflowEntityCard.tsx
  • web/oss/src/components/WorkflowRevisionDrawerWrapper/index.tsx
  • web/oss/src/components/pages/evaluations/NewEvaluation/Components/CreateEvaluatorDrawer/index.tsx
  • web/oss/src/components/pages/observability/assets/filters/fieldAdapter.ts
  • web/oss/src/components/pages/observability/components/ObservabilityHeader/index.tsx
  • web/oss/src/state/newObservability/atoms/controls.ts
  • web/oss/src/state/newObservability/atoms/queries.ts
  • web/oss/src/state/workflow/flags.ts
  • web/oss/tests/playwright/acceptance/evaluators/index.ts
  • web/oss/tests/playwright/acceptance/evaluators/tests.ts
  • web/packages/agenta-entities/src/workflow/core/index.ts
  • web/packages/agenta-entities/src/workflow/core/schema.ts
  • web/packages/agenta-entities/src/workflow/core/traceTypeDefault.ts
  • web/packages/agenta-entities/src/workflow/index.ts
  • web/packages/agenta-entities/tests/unit/traceTypeDefault.test.ts
  • web/packages/agenta-entity-ui/src/selection/adapters/workflowRevisionRelationAdapter.ts
  • web/packages/agenta-playground/src/state/execution/executionRunner.ts

Comment thread web/oss/src/components/PlaygroundRouter/index.tsx Outdated
Comment thread web/oss/src/components/Sidebar/components/WorkflowEntityCard.tsx Outdated
Comment thread web/oss/src/state/newObservability/atoms/controls.ts
Comment thread web/oss/tests/playwright/acceptance/evaluators/index.ts Outdated
CodeRabbit flagged 5 issues on the evaluator-full-page rollout PR.
This commit addresses each:

1. PlaygroundRouter — `is_feedback` evaluators skip the full-page swap.
   `workflowKind === "evaluator"` was too broad. Human/feedback
   evaluators are drawer-only in /evaluators (they capture human input,
   they don't run), so routing them to ConfigureEvaluatorPage produced
   a run-controls UI for a workflow with nothing to run. Added a
   `flags.is_feedback` exclusion next to the workflowKind check.

2. Sidebar — switcher filters out `is_feedback` evaluators.
   `nonArchivedEvaluatorsAtom` only filters by `deleted_at` and
   includes human evaluators; the switcher was exposing entries that,
   when clicked, would land on the (now-correctly-gated) generic
   <Playground /> for a feedback workflow. Filtered the list at the
   switcher boundary.

3. controls.ts — handle array-valued `trace_type` for in/not_in.
   The dialog dispatches `{operator: "in", value: ["annotation"]}` for
   the IN operator family, but the intent setter only normalized
   scalars — so the user's choice was silently dropped to
   `{kind: "cleared"}`. Normalize to an array, filter to enum values,
   and collapse single-value arrays back to a scalar. Multi-value
   selections (which mean "no filter" for a 2-value enum) still map
   to `cleared`.

4. Playwright — drop stale `[data-row-key]` poll in select-app-and-run.
   The test asserted post-create navigation to /apps/<id>/playground
   AFTER polling for the new row in the evaluators table. The redirect
   wins that race, the table disappears, and the poll becomes a
   timing-dependent failure. Removed the registry-side wait;
   evaluator-in-registry assertion is covered by the
   post-create-row-click test alongside.

5. ConfigureEvaluator/atoms.ts — fix persistedAppSelectionAtom race.
   `connectAppToEvaluatorAtom` persisted the app selection BEFORE
   `changePrimaryNode` ran, so a failed swap (returns `null` with no
   primary to swap from) left a stale localStorage record that the
   next mount re-hydrated into a phantom "connected" state. Moved the
   persist call to after both graph mutations succeed.
   `disconnectAppFromEvaluatorAtom` early-returned on no-downstream
   without clearing the persisted state, allowing the same phantom
   record to survive a disconnect attempt. Clear it on that branch
   too.
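
The normalization in item 3 can be illustrated with a small pure function. This is a sketch of the described steps (normalize to an array, filter to enum values, collapse a single value to a scalar, otherwise clear), not the setter in controls.ts; the type and function names are hypothetical.

```typescript
type TraceTypeValueSketch = "annotation" | "invocation"

type TraceTypeIntentSketch =
    | {kind: "value"; value: TraceTypeValueSketch}
    | {kind: "cleared"}

// Type guard for the two-value trace_type enum.
function isTraceTypeSketch(v: unknown): v is TraceTypeValueSketch {
    return v === "annotation" || v === "invocation"
}

// Accepts a scalar (e.g. from "is") or an array (e.g. from "in" / "not_in").
function normalizeTraceTypeIntentSketch(raw: unknown): TraceTypeIntentSketch {
    // Step 1: treat scalars and arrays uniformly.
    const asArray: unknown[] = Array.isArray(raw) ? raw : [raw]
    // Step 2: drop anything that is not a known enum value.
    const valid = asArray.filter(isTraceTypeSketch)
    // Step 3: exactly one value is a real choice; zero or several mean no filter.
    return valid.length === 1 ? {kind: "value", value: valid[0]} : {kind: "cleared"}
}
```

With this shape, `["annotation"]` and `"annotation"` resolve to the same intent, so a choice made through the IN operator is no longer dropped.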

No behavior change for the happy-path full-page flow; these changes
all address narrow edge cases the reviewer flagged.
@coderabbitai

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

…ssion-fix

Resolves a single conflict in
`web/packages/agenta-entities/src/workflow/core/schema.ts` —
release v0.100.4 added `artifact_slug` / `variant_slug` to the
revision schema alongside the `workflow_slug` /
`workflow_variant_slug` fields this branch had introduced for
emitting evaluator references on playground chain runs.

Both sides added `workflow_slug` and `workflow_variant_slug`
with overlapping intent; resolution keeps all four fields
and merges the two doc comments into one that covers both
purposes (parent-workflow identification for ID-less callers
+ evaluator chain-trace emission).

No source behavior change — schema is additive on both sides.
@coderabbitai

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

@github-actions
Contributor

github-actions Bot commented May 28, 2026

Railway Preview Environment

Preview URL https://gateway-production-e120.up.railway.app/w
Image tag pr-4474-0748c8b
Status Failed
Railway logs Open logs
Logs View workflow run
Updated at 2026-05-29T12:55:30.139Z

@junaway junaway changed the base branch from main to release/v0.100.7 May 29, 2026 12:46