feat(studio): real-time eval progress via SSE push #997

@christso

Summary

AgentV Studio currently polls `/api/runs` every 5 seconds to pick up new results. This means:

  • Runs started from the CLI appear after up to a 5-second delay
  • There is no per-test-case live progress — you only see a run after it fully completes
  • No visual indication that an eval is running

Convex Evals shows pending status while evals are in-flight and updates stats in real time as each test case finishes. We should match this.

Motivation

  • A user starts `agentv eval run` in the terminal — Studio should immediately show the run as in-progress and update pass/fail counts as each test resolves
  • Eliminates the awkward wait after pressing ▶ Run Eval in Studio where nothing appears to happen for several seconds
  • Makes Studio feel live and trustworthy rather than stale

Design

Server-Sent Events endpoint

Add a `GET /api/events` SSE endpoint in `serve.ts` that pushes events to connected Studio clients:

event: run_started
data: {"run_id": "...", "eval_file": "...", "target": "...", "total": 12}

event: test_result
data: {"run_id": "...", "test_id": "...", "score": 0.95, "status": "PASS", "passed": 5, "total": 12}

event: run_completed
data: {"run_id": "...", "passed": 10, "failed": 2, "pass_rate": 0.833}

The orchestrator already emits per-test results — hook into those to broadcast SSE events.
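As a minimal sketch of the server side, the framing and fan-out could look like the following. The names `StudioEvent`, `formatSseEvent`, and `SseHub` are hypothetical and not taken from the AgentV codebase. The payload fields mirror the three events listed above, and `status` is typed as a plain string because only `"PASS"` appears in this issue.

```typescript
// Hypothetical types and names for illustration; not from the AgentV codebase.
type StudioEvent =
  | { type: "run_started"; data: { run_id: string; eval_file: string; target: string; total: number } }
  | { type: "test_result"; data: { run_id: string; test_id: string; score: number; status: string; passed: number; total: number } }
  | { type: "run_completed"; data: { run_id: string; passed: number; failed: number; pass_rate: number } };

// Serialize one event using text/event-stream framing: an "event:" line,
// a "data:" line, and a blank line that terminates the event.
// JSON.stringify without an indent argument emits no newlines, so one data line suffices.
function formatSseEvent(event: StudioEvent): string {
  return "event: " + event.type + "\ndata: " + JSON.stringify(event.data) + "\n\n";
}

// A client is represented only by a function that writes a chunk to its open response.
type Send = (chunk: string) => void;

class SseHub {
  private clients: Send[] = [];

  // Register a connected client; the returned function unsubscribes it on disconnect.
  subscribe(send: Send): () => void {
    this.clients.push(send);
    return () => {
      this.clients = this.clients.filter((c) => c !== send);
    };
  }

  // Push one event to every connected client.
  broadcast(event: StudioEvent): void {
    const frame = formatSseEvent(event);
    this.clients.forEach((send) => send(frame));
  }

  clientCount(): number {
    return this.clients.length;
  }
}
```

In `serve.ts`, the `GET /api/events` handler would presumably respond with `Content-Type: text/event-stream`, call `subscribe` with a function that writes to the response, and call the returned unsubscribe function when the connection closes. The orchestrator's per-test callback would then call `broadcast`. How this wires into the existing server framework is left open here.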

Studio client

Replace the 5-second polling interval on `runListOptions` with an SSE listener. On `run_started`, immediately add an in-progress row to the run list. On `test_result`, update the row's pass/fail counts live. On `run_completed`, mark the row as done and trigger a full refresh.

Fall back to 5-second polling if the SSE connection drops or is unavailable.
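The run-list updates described above could be modeled as a pure function over the list state, which keeps the logic testable without a browser. This is a sketch only: `RunRow` and `applyEvent` are hypothetical names, and since the `test_result` payload above carries `passed` and `total` but no failed count, the sketch assumes any status other than `"PASS"` counts as one failure.

```typescript
// Hypothetical client-side shapes for illustration; not from the Studio codebase.
interface RunRow {
  runId: string;
  status: "in_progress" | "done";
  passed: number;
  failed: number;
  total: number;
  passRate?: number;
}

type RunEvent =
  | { type: "run_started"; data: { run_id: string; eval_file: string; target: string; total: number } }
  | { type: "test_result"; data: { run_id: string; test_id: string; score: number; status: string; passed: number; total: number } }
  | { type: "run_completed"; data: { run_id: string; passed: number; failed: number; pass_rate: number } };

// Return a new run list with the event applied; the input array is not mutated.
function applyEvent(rows: RunRow[], event: RunEvent): RunRow[] {
  switch (event.type) {
    case "run_started": {
      const d = event.data;
      // Ignore a duplicate start for a run that is already listed.
      if (rows.some((r) => r.runId === d.run_id)) return rows;
      const row: RunRow = { runId: d.run_id, status: "in_progress", passed: 0, failed: 0, total: d.total };
      return [row].concat(rows); // newest run first
    }
    case "test_result": {
      const d = event.data;
      // Assumption: every non-"PASS" status counts as one failure.
      const failedDelta = d.status === "PASS" ? 0 : 1;
      return rows.map((r): RunRow =>
        r.runId === d.run_id
          ? { ...r, passed: d.passed, total: d.total, failed: r.failed + failedDelta }
          : r
      );
    }
    case "run_completed": {
      const d = event.data;
      return rows.map((r): RunRow =>
        r.runId === d.run_id
          ? { ...r, status: "done", passed: d.passed, failed: d.failed, passRate: d.pass_rate }
          : r
      );
    }
    default:
      return rows;
  }
}
```

In the browser, an `EventSource` opened on `/api/events` could register one `addEventListener` per event name, parse `e.data` as JSON, and feed the result through `applyEvent`. Its `onerror` handler is a natural place to re-enable the 5-second polling interval.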

Pending status indicator

While a run is in-progress, show a spinner or pulsing indicator in the run list row instead of the ✓/✗ status dot.

Acceptance Signals

All signals must be verified manually using `agent-browser` — no mocking.

  • Start `agentv studio` and open it in agent-browser. Run `agentv eval run` from a separate terminal. Within 1 second of the CLI command starting, a new in-progress row appears in Studio's Recent Runs tab without any manual refresh.
  • As each test case completes, the Passed/Failed/Total counts on the in-progress row update live (verified by agent-browser snapshotting the row mid-run and confirming counts change between snapshots).
  • When the eval run finishes, the row transitions from in-progress to a final ✓/✗ status dot and the Pass Rate pill shows the final score.
  • If the SSE connection is lost (kill and restart the studio server), the client falls back to polling and the run list still updates within 10 seconds.
  • Pressing ▶ Run Eval inside Studio shows an in-progress row immediately on submit — no visible delay.
  • Opening Studio while a run is already mid-flight shows the in-progress row with current partial counts (server broadcasts current state on SSE connect).
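For the last signal, one possible approach is to keep the frames of each in-flight run on the server and replay them to a client when it connects. This is one design among several, since a single summary event would also work. `ReplayBuffer` is a hypothetical name, and the sketch assumes every payload carries `run_id`, as in the events above.

```typescript
// Hypothetical helper for illustration; not from the AgentV codebase.
// Assumes every event payload includes a run_id field.
type EventPayload = { run_id: string; [key: string]: unknown };

class ReplayBuffer {
  // Frames already sent for each in-flight run, keyed by run_id.
  private frames: { [runId: string]: string[] } = {};

  // Format the event as an SSE frame, remember it while the run is in flight,
  // and return the frame so the caller can broadcast it to connected clients.
  record(eventName: string, data: EventPayload): string {
    const frame = "event: " + eventName + "\ndata: " + JSON.stringify(data) + "\n\n";
    if (eventName === "run_completed") {
      // A finished run is served by the regular run list, so drop its history.
      delete this.frames[data.run_id];
    } else {
      const existing = this.frames[data.run_id] || [];
      existing.push(frame);
      this.frames[data.run_id] = existing;
    }
    return frame;
  }

  // Frames to send to a newly connected client, in original order per run.
  snapshot(): string[] {
    const out: string[] = [];
    Object.keys(this.frames).forEach((id) => {
      (this.frames[id] || []).forEach((f) => out.push(f));
    });
    return out;
  }
}
```

A new SSE connection would first be sent everything in `snapshot()` and then receive live frames. Replaying the full history keeps the client logic identical for late and early joiners, at the cost of holding one frame per completed test case until the run finishes.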

Labels: wui (relates to the browser dashboard / web UI runtime)
