feat(studio): Stop run button + graceful CLI interrupt by christso · Pull Request #1228 · EntityProcess/agentv

christso · 2026-05-07T06:00:10Z

Closes #1222.

Simplified scope

Per the user's revision: make Ctrl+C work, expose the same kill via HTTP, add a button that calls it. No AbortSignal threading, no "let in-flight tests finish", no staged shutdown, no new status words.

- CLI: process.on('SIGINT'|'SIGTERM') walks a child-tracker registry that providers populate, sends SIGTERM to each tracked child, and exits. The partial index.jsonl is already durable row by row.
- Studio API: POST /api/eval/run/:id/stop calls child.kill('SIGTERM'). The existing child.on('close') handler flips the status. Idempotent: terminal runs return {stopped: false, reason: 'already_terminal'}.
- Studio UI: neutral-styled Stop button (⏸ Stop) on /jobs/:runId with an optimistic "Stopping…" label. It is part of the stop→resume workflow, not a destructive cancel.
- Resume detection for partial runs: persist planned_test_count in benchmark.json.metadata at run start; the client compares results.length < planned_test_count. No is_resumable/resume_reason fields.

Plan: docs/plans/1222-stop-run.md.

Drive-by: fixed two pre-existing biome format errors that were blocking pre-push hooks.

Test plan

- CLI: SIGINT during a multi-test eval kills tracked providers; partial index.jsonl preserved.
- Server: POST /api/eval/run/:id/stop returns 200 / 403 / 404 paths; idempotent on terminal runs.
- UI: Stop button visibility matrix; clicking shows "Stopping…" until status flips.
- UI: Run with 5/10 ok rows shows Resume; complete run does not.
- Manual red/green UAT documented.

🤖 Generated with Claude Code

Track long-lived provider subprocesses (claude, codex, pi, copilot, vscode) in a per-process registry and walk it from a top-level signal handler in cli.ts. Without this, Studio's child.kill('SIGTERM') against the CLI orphans grandchildren — the Node parent exits but the OS does not propagate the signal. Plan in docs/plans/1222-stop-run.md.

- Pre-existing format errors in two studio routes block any push; fixed with biome --write so CI passes.
- child-tracker uses a structural Killable type to avoid a tsup dts resolution failure on `node:child_process` re-export.
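
For readers who have not opened the diff, here is a minimal sketch of the child-tracker idea described in this commit message. It is not the PR's actual module: `trackChild`, `killTrackedChildren`, `installSignalHandlers`, and the `SignalSource` interface are hypothetical names, and the explicit exit codes (130 for SIGINT, 143 for SIGTERM) are an assumption based on the values seen in the UAT output.

```typescript
// Hypothetical sketch; names are illustrative, not taken from the PR's source.

/** Structural stand-in for ChildProcess: anything exposing kill(). */
interface Killable {
  kill(signal?: string): boolean;
}

/** Per-process registry of live provider subprocesses. */
const trackedChildren = new Set<Killable>();

/** Register a child; the returned function untracks it (call it on 'close'). */
function trackChild(child: Killable): () => void {
  trackedChildren.add(child);
  return () => {
    trackedChildren.delete(child);
  };
}

/** Send `signal` to every tracked child; returns how many accepted it. */
function killTrackedChildren(signal: string = "SIGTERM"): number {
  let signalled = 0;
  for (const child of trackedChildren) {
    try {
      if (child.kill(signal)) signalled += 1;
    } catch {
      // The child may already have exited; keep walking the registry.
    }
  }
  trackedChildren.clear();
  return signalled;
}

/** Minimal view of `process`, so the handler can be exercised with a fake. */
interface SignalSource {
  on(event: string, listener: () => void): unknown;
  exit(code?: number): void;
}

/** Top-level handler: kill tracked children, then exit with 128 + signal number. */
function installSignalHandlers(proc: SignalSource): void {
  // Assumption: conventional shell exit codes for SIGINT (2) and SIGTERM (15).
  const exitCodes: Record<string, number> = { SIGINT: 130, SIGTERM: 143 };
  for (const name of Object.keys(exitCodes)) {
    proc.on(name, () => {
      killTrackedChildren("SIGTERM");
      proc.exit(exitCodes[name]);
    });
  }
}
```

In a CLI entry point this would presumably be wired up once with the real `process` object, and each provider would call `trackChild` right after spawning its subprocess. Typing the registry against `Killable` rather than `ChildProcess` mirrors the structural-type choice mentioned above.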

cloudflare-workers-and-pages · 2026-05-07T06:00:43Z

Deploying agentv with Cloudflare Pages

Latest commit:	`e1cfcc0`
Status:	✅ Deploy successful!
Preview URL:	https://2bc6ebb2.agentv.pages.dev
Branch Preview URL:	https://feat-1222-stop-run.agentv.pages.dev

- DELETE /api/eval/run/:id (and benchmark-scoped variant) SIGTERMs the spawned CLI. The CLI's own signal handler walks the child registry added in the previous commit and kills grandchildren before exiting. Existing child.on('close') flips status; there is no new 'stopping' state.
- StopRunButton on /jobs/:runId, hidden when terminal or read-only. Optimistic 'Stopping…' label until the next status poll observes a terminal state.
- planned_test_count persisted in benchmark.json.metadata via a stub written at run start. Resume-action visibility now triggers when results.length < planned_test_count even with no execution_error rows, which covers the Stop-button / Ctrl+C cases.
- Narrow tests: shouldShowStopButton matrix, partial-run resume helper, DELETE 404/403 routes for both base + benchmark-scoped paths. Happy path of DELETE→SIGTERM is covered by manual UAT.

Stop is part of the stop → resume → complete workflow, not a destructive cancel — DELETE semantics were wrong. Switched to POST /api/eval/run/:id/stop (and benchmark-scoped variant), kept the idempotent-on-terminal behavior so clients can fire-and-forget. UI: removed red destructive styling on the Stop button. Now neutral gray with a pause glyph to signal "this is a pause, not a kill."
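
The Stop-button rules from these two commit messages (hidden when terminal or read-only, optimistic label until a poll sees a terminal status) can be sketched as two small pure helpers. This is an illustration only. `shouldShowStopButton` is named in the commit message but its real signature is not shown here, `stopButtonLabel` is a hypothetical helper, and of the terminal statuses below only `"failed"` appears in the UAT output; `"finished"` is a guess.

```typescript
// Assumption: "failed" is confirmed by the UAT output; "finished" is hypothetical.
const TERMINAL_STATUSES = new Set<string>(["finished", "failed"]);

/** Stop is offered only for non-terminal runs in a writable Studio. */
function shouldShowStopButton(status: string, isReadOnly: boolean): boolean {
  if (isReadOnly) return false;
  return !TERMINAL_STATUSES.has(status);
}

/**
 * Optimistic label: once the user has clicked Stop, show "Stopping…" until
 * a status poll reports a terminal state. No extra 'stopping' server status.
 */
function stopButtonLabel(status: string, stopRequested: boolean): string {
  const stillRunning = !TERMINAL_STATUSES.has(status);
  return stopRequested && stillRunning ? "Stopping…" : "⏸ Stop";
}
```

Keeping "Stopping…" as purely client-side state is what lets the server avoid a new status word: the label falls away on its own when the existing close handler flips the run to a terminal status.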

christso · 2026-05-07T08:38:09Z

Manual red/green UAT — evidence

Red baseline (origin/main @ `0bab7a3`, before these changes)

$ curl -X POST http://localhost:7013/api/eval/run/$RUNID/stop  →  HTTP 404
$ curl -X DELETE http://localhost:7013/api/eval/run/$RUNID     →  HTTP 404

No /stop endpoint exists. Killing the run via terminal (only escape on main) leaves the run dir with only index.jsonl — benchmark.json is never written for an interrupted run because the write is end-of-run-only:

$ ls .agentv/results/runs/uat-red/
index.jsonl                # benchmark.json absent

shouldShowResumeActions on main returns false for any partial run that has no execution_error rows:

export function shouldShowResumeActions(results, isReadOnly): boolean {
  if (isReadOnly) return false;
  return results.some((r) => r.executionStatus === 'execution_error');
}

So a run interrupted after a few clean passes is invisible to Resume. This is the gap the issue called out.

Green (this branch)

G1 — CLI Ctrl+C kills providers, partial JSONL preserved, stub benchmark.json carries planned_test_count:

$ bun apps/cli/src/cli.ts eval examples/features/basic --target azure --workers 1 \
    --output .agentv/results/runs/uat-g1 &
PID=...
1/7   ✅ code-review-javascript | azure | 100% PASS | 3925/5941ms
1/7   🔄 code-gen-python-comprehensive | azure
$ kill -INT $PID
[PID] Exit 130                  # SIGINT-conventional exit code
$ wc -l .agentv/results/runs/uat-g1/index.jsonl
1
$ cat .agentv/results/runs/uat-g1/benchmark.json
{
  "metadata": {
    "eval_file": ".../basic/evals/dataset.eval.yaml",
    "planned_test_count": 7,
    ...
  },
  ...
}
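
The stub shown in G1 can be thought of as a write at run start plus a defensive read later. The sketch below covers only those two steps and stays free of file I/O. `buildBenchmarkStub` and `readPlannedTestCount` are hypothetical names, and the stub is reduced to the two metadata fields visible in the output above; the real file carries more.

```typescript
/** Only the fields visible in the UAT output; the real metadata has more. */
interface BenchmarkStub {
  metadata: { eval_file: string; planned_test_count: number };
}

/** Serialize the stub that would be written before the first test starts. */
function buildBenchmarkStub(evalFile: string, plannedTestCount: number): string {
  const stub: BenchmarkStub = {
    metadata: { eval_file: evalFile, planned_test_count: plannedTestCount },
  };
  return JSON.stringify(stub, null, 2);
}

/** Read planned_test_count back; undefined for older or malformed files. */
function readPlannedTestCount(benchmarkJson: string): number | undefined {
  try {
    const parsed: unknown = JSON.parse(benchmarkJson);
    const count = (parsed as { metadata?: { planned_test_count?: unknown } } | null)
      ?.metadata?.planned_test_count;
    return typeof count === "number" && Number.isInteger(count) && count >= 0
      ? count
      : undefined;
  } catch {
    // Unparseable JSON is treated the same as a missing field.
    return undefined;
  }
}
```

Returning `undefined` rather than throwing keeps runs recorded before this change readable: they have no planned count, so they simply never qualify as partial.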

G2 — POST /api/eval/run/:id/stop terminates the spawned CLI:

$ curl -X POST http://localhost:7011/api/eval/run \
    -d '{"suite_filter":"examples/features/basic","target":"azure","workers":1,"output":".agentv/results/runs/uat-g2"}'
{"id":"studio-20260507-103458-3uq2","status":"running"}
# wait 12s — 1 test passes
$ curl -X POST http://localhost:7011/api/eval/run/studio-20260507-103458-3uq2/stop
{"stopped":true,"status":"running"}
# wait 3s for close handler
$ curl http://localhost:7011/api/eval/status/studio-20260507-103458-3uq2
{"status":"failed","exit_code":143,...}      # SIGTERM-conventional exit code
$ wc -l .agentv/results/runs/uat-g2/index.jsonl
1                                            # partial JSONL preserved

Idempotency / 404 / 403 on POST /stop:

# Terminal run — idempotent, returns 200 not 409
$ curl -X POST http://localhost:7011/api/eval/run/$RUNID/stop
{"stopped":false,"reason":"already_terminal","status":"failed"}

# Unknown id
$ curl -o /dev/null -w 'HTTP %{http_code}\n' \
    -X POST http://localhost:7011/api/eval/run/no-such-id/stop
HTTP 404

# Read-only mode (both base and benchmark-scoped paths)
$ curl -o /dev/null -w 'HTTP %{http_code}\n' \
    -X POST http://localhost:7012/api/eval/run/anything/stop
HTTP 403
$ curl -o /dev/null -w 'HTTP %{http_code}\n' \
    -X POST http://localhost:7012/api/benchmarks/x/eval/run/anything/stop
HTTP 403
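
The responses above are consistent with a handler that checks read-only mode first, then looks up the run, then treats terminal runs as a no-op. A framework-free sketch of that decision order follows. `handleStopRequest`, `TrackedRun`, and the error bodies are hypothetical, and treating every status other than `"running"` as terminal is an assumption.

```typescript
interface TrackedRun {
  status: string; // e.g. "running" or "failed"
  child: { kill(signal?: string): boolean };
}

interface StopResponse {
  httpStatus: 200 | 403 | 404;
  body: Record<string, unknown>;
}

/** Decide the outcome of POST /api/eval/run/:id/stop for one request. */
function handleStopRequest(
  runs: Map<string, TrackedRun>,
  runId: string,
  readOnly: boolean,
): StopResponse {
  // Read-only is checked before lookup: unknown ids still get 403 above.
  if (readOnly) return { httpStatus: 403, body: { error: "read_only" } };

  const run = runs.get(runId);
  if (!run) return { httpStatus: 404, body: { error: "not_found" } };

  // Idempotent on terminal runs: 200 with stopped=false, not 409.
  if (run.status !== "running") {
    return {
      httpStatus: 200,
      body: { stopped: false, reason: "already_terminal", status: run.status },
    };
  }

  // The status stays "running" here; the existing close handler flips it later.
  run.child.kill("SIGTERM");
  return { httpStatus: 200, body: { stopped: true, status: run.status } };
}
```

Because a second call on an already-stopped run returns 200, a client can fire the request without first checking the status, which matches the fire-and-forget intent stated in the commit message.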

Run-detail API surfaces planned_test_count for the partial run, and the recorded row is ok (proving the new partial-run path triggered Resume, NOT the existing execution_error path):

$ curl http://localhost:7011/api/runs/uat-g2
results count: 1   planned_test_count: 7   suite_filter: .../dataset.eval.yaml   run_dir: .agentv/results/runs/uat-g2

$ python3 -c 'import json; print(json.loads(open(".../uat-g2/index.jsonl").readline()))'
execution_status: ok   test_id: code-review-javascript   score: 1

shouldShowResumeActions(results, isReadOnly=false, plannedTestCount=7) → true because 1 < 7. Unit-tested in apps/studio/src/components/resume-run-helpers.test.ts.
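
For comparison with the version on main quoted in the red baseline, the extended helper might look like the sketch below. It is a reconstruction from the behavior described here, not a copy of the branch's code. The three-argument signature follows the call shown above, while the `ResultRow` type and the optional third parameter are assumptions.

```typescript
/** Only the field the helper needs; real rows carry more. */
interface ResultRow {
  executionStatus: string;
}

function shouldShowResumeActions(
  results: ResultRow[],
  isReadOnly: boolean,
  plannedTestCount?: number,
): boolean {
  if (isReadOnly) return false;
  // Existing path: any row that failed to execute can be retried.
  if (results.some((r) => r.executionStatus === "execution_error")) return true;
  // New path: fewer rows than planned means the run was interrupted.
  return plannedTestCount !== undefined && results.length < plannedTestCount;
}
```

With one `ok` row and a planned count of 7 this returns `true`, and with seven `ok` rows it returns `false`, which matches the "5/10 shows Resume; complete run does not" item in the test plan.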

Pre-push hooks

Build: ✅
Typecheck: ✅
Lint: ✅
Test (2400+): ✅
Validate eval YAMLs: ✅

christso added 2 commits May 7, 2026 07:49

christso added 2 commits May 7, 2026 08:16

christso marked this pull request as ready for review May 7, 2026 08:38

christso wants to merge 4 commits into main from feat/1222-stop-run
