feat(studio): Stop run button + graceful CLI interrupt#1228

Open
christso wants to merge 4 commits into main from feat/1222-stop-run

Conversation

christso (Collaborator) commented May 7, 2026

Closes #1222.

Simplified scope

Per the user's revision: make Ctrl+C work, expose the same kill via HTTP, add a button that calls it. No AbortSignal threading, no "let in-flight tests finish", no staged shutdown, no new status words.

  1. CLI: process.on('SIGINT'|'SIGTERM') walks a child-tracker registry that providers populate, sends SIGTERM to each tracked child, and exits. The partial index.jsonl is already durable row by row.
  2. Studio API: POST /api/eval/run/:id/stop calls child.kill('SIGTERM'). Existing child.on('close') flips status. Idempotent — terminal runs return {stopped: false, reason: 'already_terminal'}.
  3. Studio UI: Neutral-styled Stop button (⏸ Stop) on /jobs/:runId with optimistic "Stopping…" label. Part of the stop→resume workflow, not a destructive cancel.
  4. Resume detection for partial runs: persist planned_test_count in benchmark.json.metadata at run start; client-side comparison results.length < planned_test_count. No is_resumable/resume_reason.

Plan: docs/plans/1222-stop-run.md.

Drive-by: fixed two pre-existing biome format errors that were blocking pre-push hooks.

Test plan

  • CLI: SIGINT during a multi-test eval kills tracked providers; partial index.jsonl preserved.
  • Server: POST /api/eval/run/:id/stop returns 200 / 403 / 404 paths; idempotent on terminal runs.
  • UI: Stop button visibility matrix; clicking shows "Stopping…" until status flips.
  • UI: Run with 5/10 ok rows shows Resume; complete run does not.
  • Manual red/green UAT documented.

🤖 Generated with Claude Code

christso added 2 commits May 7, 2026 07:49
Track long-lived provider subprocesses (claude, codex, pi, copilot,
vscode) in a per-process registry and walk it from a top-level signal
handler in cli.ts. Without this, Studio's child.kill('SIGTERM') against
the CLI orphans grandchildren — the Node parent exits but the OS does
not propagate the signal.

Plan in docs/plans/1222-stop-run.md.
- Pre-existing format errors in two studio routes block any push; fixed
  with biome --write so CI passes.
- child-tracker uses a structural Killable type to avoid a tsup dts
  resolution failure on `node:child_process` re-export.
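
For readers who want a picture of the child-tracker idea, here is a minimal sketch. It is not the module from this PR: the names `Killable`, `trackChild`, and `killAllTracked` and the `StopSignal` union are hypothetical, and the only thing taken from the commit message is the approach (a per-process registry of things that can be killed, described by a structural type instead of an imported `node:child_process` type).

```typescript
// Hypothetical sketch of a per-process child registry.
// Signals this sketch knows how to send (an assumption, not the PR's list).
type StopSignal = "SIGTERM" | "SIGINT" | "SIGKILL";

// Structural type: anything with a compatible kill() method can be tracked,
// so no type needs to be imported from node:child_process.
export interface Killable {
  kill(signal?: StopSignal): boolean;
}

const tracked = new Set<Killable>();

/** Register a child and return a function that removes it again. */
export function trackChild(child: Killable): () => void {
  tracked.add(child);
  return () => {
    tracked.delete(child);
  };
}

/** Send `signal` to every tracked child; returns how many accepted it. */
export function killAllTracked(signal: StopSignal = "SIGTERM"): number {
  let delivered = 0;
  for (const child of tracked) {
    try {
      if (child.kill(signal)) delivered += 1;
    } catch {
      // The child may already have exited; keep walking the registry.
    }
  }
  tracked.clear();
  return delivered;
}
```

In a CLI, a top-level `process.on('SIGINT', ...)` / `process.on('SIGTERM', ...)` handler would call something like `killAllTracked('SIGTERM')` before exiting, and a provider would call the function returned by `trackChild` once its subprocess closes.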
cloudflare-workers-and-pages Bot commented May 7, 2026

Deploying agentv with Cloudflare Pages

Latest commit: e1cfcc0
Status: ✅  Deploy successful!
Preview URL: https://2bc6ebb2.agentv.pages.dev
Branch Preview URL: https://feat-1222-stop-run.agentv.pages.dev

christso added 2 commits May 7, 2026 08:16
- DELETE /api/eval/run/:id (and benchmark-scoped variant) SIGTERMs the
  spawned CLI. The CLI's own signal handler walks the child registry
  added in the previous commit and kills grandchildren before exiting.
  Existing child.on('close') flips status — no new 'stopping' state.

- StopRunButton on /jobs/:runId, hidden when terminal or read-only.
  Optimistic 'Stopping…' label until the next status poll observes a
  terminal state.

- planned_test_count persisted in benchmark.json.metadata via a stub
  written at run start. Resume-action visibility now triggers when
  results.length < planned_test_count even with no execution_error
  rows — covers Stop-button / Ctrl+C cases.

- Narrow tests: shouldShowStopButton matrix, partial-run resume helper,
  DELETE 404/403 routes for both base + benchmark-scoped paths. Happy
  path of DELETE→SIGTERM is covered by manual UAT.
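
The visibility rule and optimistic label described above can be sketched as two small pure functions. This is an illustration under assumptions, not the component's code: the signature of `shouldShowStopButton`, the helper `stopButtonLabel`, and the status name "finished" are hypothetical (only "running" and "failed" appear in this PR's evidence).

```typescript
// Assumed set of terminal statuses; "finished" is a placeholder name.
const TERMINAL_STATUSES: ReadonlySet<string> = new Set(["finished", "failed"]);

/** Hidden when the run is terminal or Studio is read-only. */
export function shouldShowStopButton(status: string, isReadOnly: boolean): boolean {
  return !isReadOnly && !TERMINAL_STATUSES.has(status);
}

/**
 * Optimistic label: once the user has clicked Stop, show "Stopping…"
 * until a status poll observes a terminal state.
 */
export function stopButtonLabel(stopRequested: boolean, status: string): string {
  const stillRunning = !TERMINAL_STATUSES.has(status);
  return stopRequested && stillRunning ? "Stopping…" : "⏸ Stop";
}
```

Keeping both rules as pure functions is what makes a "narrow" matrix test cheap: each combination of status and read-only flag is one assertion, with no component rendering involved.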
Stop is part of the stop → resume → complete workflow, not a
destructive cancel — DELETE semantics were wrong. Switched to
POST /api/eval/run/:id/stop (and benchmark-scoped variant), kept the
idempotent-on-terminal behavior so clients can fire-and-forget.

UI: removed red destructive styling on the Stop button. Now neutral
gray with a pause glyph to signal "this is a pause, not a kill."
christso (Collaborator, Author) commented May 7, 2026

Manual red/green UAT — evidence

Red baseline (origin/main @ 0bab7a3, before these changes)

$ curl -X POST http://localhost:7013/api/eval/run/$RUNID/stop  →  HTTP 404
$ curl -X DELETE http://localhost:7013/api/eval/run/$RUNID     →  HTTP 404

No /stop endpoint exists. Killing the run from the terminal (the only escape on main) leaves the run dir with only index.jsonl; benchmark.json is never written for an interrupted run because the write is end-of-run-only:

$ ls .agentv/results/runs/uat-red/
index.jsonl                # benchmark.json absent

shouldShowResumeActions on main returns false for any partial run that has no execution_error rows:

export function shouldShowResumeActions(results, isReadOnly): boolean {
  if (isReadOnly) return false;
  return results.some((r) => r.executionStatus === 'execution_error');
}

So a run interrupted after a few clean passes is invisible to Resume. This is the gap the issue called out.

Green (this branch)

G1 — CLI Ctrl+C kills providers, partial JSONL preserved, stub benchmark.json carries planned_test_count:

$ bun apps/cli/src/cli.ts eval examples/features/basic --target azure --workers 1 \
    --output .agentv/results/runs/uat-g1 &
PID=...
1/7   ✅ code-review-javascript | azure | 100% PASS | 3925/5941ms
1/7   🔄 code-gen-python-comprehensive | azure
$ kill -INT $PID
[PID] Exit 130                  # SIGINT-conventional exit code
$ wc -l .agentv/results/runs/uat-g1/index.jsonl
1
$ cat .agentv/results/runs/uat-g1/benchmark.json
{
  "metadata": {
    "eval_file": ".../basic/evals/dataset.eval.yaml",
    "planned_test_count": 7,
    ...
  },
  ...
}
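
To make the stub concrete, here is a rough sketch of writing and reading back `planned_test_count`. Only the two field names `eval_file` and `planned_test_count` under `metadata` come from the output above; the function names `buildBenchmarkStub` and `readPlannedTestCount` are hypothetical, and the real stub carries more fields than shown.

```typescript
// Hypothetical shape covering only the fields visible in the G1 output.
interface BenchmarkStub {
  metadata: { eval_file: string; planned_test_count: number };
}

/** Build the stub that would be written at run start. */
export function buildBenchmarkStub(evalFile: string, plannedTestCount: number): BenchmarkStub {
  if (!Number.isInteger(plannedTestCount) || plannedTestCount < 0) {
    throw new RangeError("plannedTestCount must be a non-negative integer");
  }
  return { metadata: { eval_file: evalFile, planned_test_count: plannedTestCount } };
}

/** Read planned_test_count from benchmark.json text, or undefined if absent or unparsable. */
export function readPlannedTestCount(benchmarkJson: string): number | undefined {
  try {
    const parsed: unknown = JSON.parse(benchmarkJson);
    const count = (parsed as { metadata?: { planned_test_count?: unknown } } | null)
      ?.metadata?.planned_test_count;
    return typeof count === "number" ? count : undefined;
  } catch {
    // A missing or truncated file should not break resume detection.
    return undefined;
  }
}
```

Returning `undefined` for runs without the field keeps older run directories on the existing `execution_error` path.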

G2 — POST /api/eval/run/:id/stop terminates the spawned CLI:

$ curl -X POST http://localhost:7011/api/eval/run \
    -d '{"suite_filter":"examples/features/basic","target":"azure","workers":1,"output":".agentv/results/runs/uat-g2"}'
{"id":"studio-20260507-103458-3uq2","status":"running"}
# wait 12s — 1 test passes
$ curl -X POST http://localhost:7011/api/eval/run/studio-20260507-103458-3uq2/stop
{"stopped":true,"status":"running"}
# wait 3s for close handler
$ curl http://localhost:7011/api/eval/status/studio-20260507-103458-3uq2
{"status":"failed","exit_code":143,...}      # SIGTERM-conventional exit code
$ wc -l .agentv/results/runs/uat-g2/index.jsonl
1                                            # partial JSONL preserved
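
For reference, the "conventional" exit codes in G1 and G2 follow the shell convention of 128 plus the signal number. A small sketch of that arithmetic, with the helper name `conventionalExitCode` being illustrative and the signal numbers being the usual Linux/macOS values:

```typescript
// Signal numbers on Linux and macOS.
const SIGNAL_NUMBERS = { SIGINT: 2, SIGTERM: 15 } as const;

type StopSignal = keyof typeof SIGNAL_NUMBERS;

/** Shell convention: a process ended by signal N reports exit status 128 + N. */
export function conventionalExitCode(signal: StopSignal): number {
  return 128 + SIGNAL_NUMBERS[signal];
}

// conventionalExitCode("SIGINT")  -> 130 (matches G1)
// conventionalExitCode("SIGTERM") -> 143 (matches G2)
```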

Idempotency / 404 / 403 on POST /stop:

# Terminal run — idempotent, returns 200 not 409
$ curl -X POST http://localhost:7011/api/eval/run/$RUNID/stop
{"stopped":false,"reason":"already_terminal","status":"failed"}

# Unknown id
$ curl -o /dev/null -w 'HTTP %{http_code}\n' \
    -X POST http://localhost:7011/api/eval/run/no-such-id/stop
HTTP 404

# Read-only mode (both base and benchmark-scoped paths)
$ curl -o /dev/null -w 'HTTP %{http_code}\n' \
    -X POST http://localhost:7012/api/eval/run/anything/stop
HTTP 403
$ curl -o /dev/null -w 'HTTP %{http_code}\n' \
    -X POST http://localhost:7012/api/benchmarks/x/eval/run/anything/stop
HTTP 403

Run-detail API surfaces planned_test_count for the partial run, and the recorded row is ok (proving the new partial-run path triggered Resume, NOT the existing execution_error path):

$ curl http://localhost:7011/api/runs/uat-g2
results count: 1   planned_test_count: 7   suite_filter: .../dataset.eval.yaml   run_dir: .agentv/results/runs/uat-g2

$ python3 -c 'import json; print(json.loads(open(".../uat-g2/index.jsonl").readline()))'
execution_status: ok   test_id: code-review-javascript   score: 1

shouldShowResumeActions(results, isReadOnly=false, plannedTestCount=7) → true because 1 < 7. Unit-tested in apps/studio/src/components/resume-run-helpers.test.ts.
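
As a sketch of how the extended helper could look next to the version on main quoted earlier: the third parameter name `plannedTestCount` comes from the call above, but the body and the `ResultRow` type are assumptions, and the actual implementation lives in the branch.

```typescript
// Minimal row shape; only the field used by the helper is modelled.
interface ResultRow {
  executionStatus: string;
}

export function shouldShowResumeActions(
  results: readonly ResultRow[],
  isReadOnly: boolean,
  plannedTestCount?: number,
): boolean {
  if (isReadOnly) return false;
  // Existing path: any execution_error row makes the run resumable.
  if (results.some((r) => r.executionStatus === "execution_error")) return true;
  // New path: fewer recorded rows than planned means the run was interrupted.
  return plannedTestCount !== undefined && results.length < plannedTestCount;
}
```

Making `plannedTestCount` optional means runs recorded before this change, which have no `planned_test_count`, keep the old behavior.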

Pre-push hooks

  • Build: ✅
  • Typecheck: ✅
  • Lint: ✅
  • Test (2400+): ✅
  • Validate eval YAMLs: ✅

christso marked this pull request as ready for review May 7, 2026 08:38
Development

Successfully merging this pull request may close these issues.

feat(studio): add Stop run button + graceful CLI interrupt — pairs with eval resume
