Phase 3 (commits 5-13): runner registry full lifecycle — dispatcher, callbacks, dashboard, runner SDK, CLI, e2e, docs, version bump by risjai · Pull Request #154 · agentoptics/rewind

risjai · 2026-04-28T05:18:11Z

Summary

Closes the loop on Phase 3 — the dashboard becomes a control surface instead of a viewer. Operators register long-lived runner processes; clicking "Run replay" on a session step HMAC-signs a webhook to the runner; the runner executes the agent under a replay context; progress streams back to the dashboard live via WebSocket.

This is commits 5-13 of the 13-commit Phase 3 plan, continuing from commits 1-4 already merged in PRs #150, #151, #152, #153.

What ships

commit	What
5	Webhook dispatcher (HMAC signing, 5s timeout) + lease reaper (30s scan)
6	`POST /api/sessions/{sid}/replay-jobs` (dual-shape A/B) + `POST /api/replay-jobs/{id}/events` (X-Rewind-Runner-Auth, ownership check, atomic event ingestion) + WS broadcast
7	Web `RunnersPage` (list / register / regenerate / remove + raw-token reveal modal)
8	Web `RunReplayButton` + `ReplayJobModal` + `useReplayJob` hook + `replay_job_update` WebSocket frame
9	Python `rewind_agent.runner` library (`RunnerConfig`, `compute_signature`, `verify_signature`, `ProgressReporter`, `asgi_handler`, `@handle_replay`) + `ExplicitClient.attach_replay_context` + env-var bootstrap
10	`rewind runners {list, add, remove, regenerate-token}` CLI
11	E2E test exercising the full lifecycle against a real binary + stub runner; `REWIND_ALLOW_LOOPBACK_WEBHOOKS` dev escape hatch
12	`docs/runners.md` operator how-to + decision matrix updated 3-way → 5-way (also bundles deferred Phase 2 doc)
13	Rust workspace 0.13.0 → 0.14.0 across 4 mirror files (Python SDK + MCP versions stay per CLAUDE.md track-2)

API / wire format

Outbound dispatch (Rewind → runner):
```
POST <runner.webhook_url>
X-Rewind-Job-Id:
X-Rewind-Signature: sha256= # HMAC-SHA256(raw_token, job_id || \n || raw_body)
Content-Type: application/json
{ "job_id", "session_id", "replay_context_id", "base_url" }
```

Inbound events (runner → Rewind, mounts OUTSIDE bearer middleware):
```
POST /api/replay-jobs/{id}/events
X-Rewind-Runner-Auth: # SHA-256 lookup; ownership check on job.runner_id
{ "event_type": "started|progress|completed|errored", "step_number?", "progress_total?", ... }
```

Reference vectors pinned both sides: crates/rewind-web/src/dispatcher.rs::compute_signature_matches_python_reference ↔ python/tests/test_runner.py::test_compute_signature_matches_rust_reference.

State machine + reaper

Same SQL-level terminal-state guard from #152 round 2 protects every transition. Reaper marks dispatched-but-never-started → errored stage='dispatch' and in_progress-without-heartbeat → errored stage='lease_expired'.

Bootstrap (server)

```bash
export REWIND_RUNNER_SECRET_KEY="$(openssl rand -base64 32)" # required
export REWIND_PUBLIC_URL="https://rewind.example\" # for non-local
rewind web
```

Unset key → 503 with bootstrap message. Set-but-malformed → server panics at startup (per #153 MEDIUM 5 fix).

Test plan

cargo test --workspace — green
cargo clippy on rewind-web --lib, rewind-store --all-targets, rewind-cli — clean (pre-existing test-file warnings on recording_api_tests / transcript_sync_tests unchanged from master)
24 runners_tests + 12 new replay_jobs_tests integration tests
481 pytest pass, 1 skipped (Python — includes 19 new test_runner.py + 1 e2e)
web build green (426 KB / 119 KB gz; 1705 modules)
E2E flow verified end-to-end via python/tests/e2e/test_runner_replay_flow.py
All previous phase work intact (Phase 1 + 2 + 3 commits 1-4)

Notes for reviewers

9 logical commits (5-13). Each commit builds on its predecessor and is reviewable on its own.
The REWIND_ALLOW_LOOPBACK_WEBHOOKS env escape is documented as DEV-ONLY in both the registration handler and docs/runners.md. Production deployments should never set it.
cancellation is fire-and-forget in v1 (operator marks errored; runner not notified). Cooperative cancel deferred to v3.1 per the plan.
polling mode is still rejected at registration (deferred to v3.1, per feat(rust): /api/runners CRUD + AES-256-GCM crypto module (Phase 3, commit 4/13) #153 HIGH 1).
Token rotation + runner deletion still refuse while non-terminal jobs exist (per feat(rust): /api/runners CRUD + AES-256-GCM crypto module (Phase 3, commit 4/13) #153 HIGH 3 + MEDIUM 4).

Made with Cursor

dispatcher.rs: - Dispatcher::dispatch() decrypts runner token, builds canonical body, signs with HMAC-SHA256(raw_token, job_id || \n || body), POSTs to runner.webhook_url with X-Rewind-Job-Id + X-Rewind-Signature headers, 5s timeout. - Success → state pending->dispatched, sets dispatch_deadline_at (10s) + lease_expires_at (5min). - Any failure → state pending->errored, error_stage='dispatch'. - State transitions go through Store::advance_replay_job_state (commit 3's atomic SQL guard). - 9 unit tests including a Python HMAC reference cross-check so the runner SDK in commit 9 stays bit-compatible. reaper.rs: - Background tokio task spawned by WebServer::run(). - 30s scan: dispatch deadline expired (dispatched but no Started in 10s) -> errored stage='dispatch'; lease expired (in_progress but no heartbeat in 5min) -> errored stage='lease_expired'. - Each transition broadcasts on the StoreEvent bus so the dashboard sees lease failures live. - 3 tests covering both expiry classes and the no-op-when-fresh case. Store helpers: - set_dispatch_deadline_and_lease() (called by dispatcher). - list_dispatch_deadline_expired() (used by reaper). WebSocket: - StoreEvent::ReplayJobUpdated variant added. - ws.rs forwards it to clients subscribed to the owning session via new ServerMessage::ReplayJobUpdate frame. AppState: - dispatcher: Option<Dispatcher>, base_url: String. - bootstrap_base_url() reads REWIND_PUBLIC_URL with sane local default; documented for ops. Existing test fixtures (api_tests, auth_tests, hooks_tests, recording_api_tests, transcript_sync_tests) updated for the new AppState fields. Tests: 106 lib + 141 integration green; ruff clean; pytest unaffected. Made-with: Cursor

…ase 3, commit 6/13) POST /api/sessions/{sid}/replay-jobs (bearer-protected): - Dual-shape body via serde(untagged): - Shape A: {runner_id, source_timeline_id, at_step, strict_match?} → server forks timeline + creates replay context + creates job atomically. - Shape B: {runner_id, replay_context_id} → reuse-existing-context; validates context belongs to session and isn't already in-flight via new Store::count_in_flight_jobs_for_replay_context (one-job- per-context invariant prevents cursor races). - Both fire dispatcher in a tokio::spawn (fire-and-forget) and return 202 Accepted with the job_id immediately. - 503 if crypto key not configured; 404 on unknown session/runner; 400 on invalid timeline; 409 on inactive runner / context-in-use. GET /api/sessions/{sid}/replay-jobs — list jobs for a session. GET /api/replay-jobs/{id} — fetch a single job. DELETE /api/replay-jobs/{id} — operator cancel (force errored with stage='cancelled'; v1 doesn't notify the runner). POST /api/replay-jobs/{id}/events (mounted OUTSIDE bearer middleware): - Auth: X-Rewind-Runner-Auth: <token> header. Hashed and looked up via auth_token_hash index. Ownership check: job.runner_id must match the authenticated runner's id (403 otherwise). - Wraps Store::record_replay_job_event_atomic from #152 round 2 → full state-machine guard. Illegal transitions return 409 with accepted=false. - Emits StoreEvent::ReplayJobUpdated → WebSocket broadcast. - touch_runner_last_seen() for liveness tracking. Dispatcher refactor: - dispatch() is now PURE: returns DispatchOutcome instead of holding a Store reference across .await. Caller (HTTP handler) applies the outcome via Dispatcher::apply_outcome inside a briefly-held Mutex lock. Closes the lock-across-await footgun that clippy::await_holding_lock would flag. WebServer.build_router: - runner_callback_routes mounted alongside /_rewind/health (outside the protected layer) so X-Rewind-Runner-Auth replaces bearer. Tests: 12 new replay_jobs_tests (full event lifecycle: shape-B dispatch with stub runner, 500-response failure, context-in-use, unknown session, inactive runner, runner-auth missing, runner-auth unknown, ownership mismatch, started transition, progress update, completed transition, illegal-transition 409). Stub runner is a tokio TcpListener-backed axum server. Storage: count_in_flight_jobs_for_replay_context (used by shape-B contention check). All checks: 250+ rewind-web tests pass; clippy clean on new code. Made-with: Cursor

…nv bootstrap (Phase 3, commit 9/13) ExplicitClient.attach_replay_context(session_id, replay_context_id): - Sets the contextvars without creating a new replay context. - Resolves Phase 3 plan HIGH #4: runners receive an existing context in their dispatch payload and must not create a duplicate. intercept.install(): - Bootstraps from REWIND_SESSION_ID + REWIND_REPLAY_CONTEXT_ID env vars before patching transports. Both must be set together; either-but-not-both logs a WARN. Lets a runner subprocess be configured purely via env without touching the SDK. rewind_agent/runner.py (new, ~370 LOC): - RunnerConfig.from_env() — reads REWIND_RUNNER_TOKEN + REWIND_URL. - compute_signature(token, job_id, body) — pure HMAC-SHA256 matching the Rust dispatcher's wire format. Cross-checked against the same Python reference vector both crates pin against (drift between Rust + Python = silent dispatch failure). - verify_signature(config, headers, body_bytes) — case-insensitive header lookup, constant-time hmac.compare_digest. Returns (ok, reason) tuple. - DispatchPayload — typed dataclass mirroring the Rust DispatchBody. - ProgressReporter — async wrapper around POST /api/replay-jobs/ {id}/events; httpx-preferred with urllib fallback. Methods: started(), progress(step, total?, payload?), completed(), errored(message, stage?). - asgi_handler(...) — verifies signature, decodes payload, fires user handler as background task, returns (202, {job_id}) immediately so the dispatcher's 5s timeout is satisfied. On handler exception, auto-emits errored event with the exception message + error_stage='agent'. - @handle_replay marker decorator for documentation/discovery. Tests (19 new, 480 total pass): - RunnerConfig.from_env: requires token; honors defaults; honors REWIND_URL. - compute_signature: matches Rust reference vector byte-for-byte. - verify_signature: accepts canonical, rejects missing headers, malformed prefix, tampered body, wrong token; case-insensitive. - DispatchPayload decode round-trip. - asgi_handler: invalid signature returns 401 + handler not invoked; invalid body returns 400; valid request → 202 + handler runs + started/progress/completed events emitted in order; user code raising → errored event emitted with the exception message. - attach_replay_context sets contextvars. - intercept.install env-var bootstrap (both vars set; partial vars logged + skipped). ruff clean. Pre-existing 461 tests still pass; new total 480/480 + 1 skip. Made-with: Cursor

…e 3, commit 10/13) Thin HTTP wrapper over /api/runners. All four subcommands take --rewind-url (defaults from REWIND_URL env, falls back to http://127.0.0.1:4800) and --token (defaults from REWIND_AUTH_TOKEN for the dashboard's bearer auth). rewind runners list — pretty-printed table rewind runners add --name N --webhook-url U — returns RAW token ONCE rewind runners remove <id> — 409 surfaced as 'removal blocked' rewind runners regenerate-token <id> — returns NEW raw token The 'add' output mimics the dashboard's RegisterRunnerResponse: prints the raw token in bold yellow, reminds the operator it can't be retrieved later, and prints an export-ready REWIND_RUNNER_TOKEN line for one-line copy-paste into a runner's environment. Build green; cli help working. Made-with: Cursor

…dal (Phase 3, commit 7/13) - RunnersPage.tsx — table view of registered runners with status pill (active/disabled/stale), webhook URL, token preview, last- seen timestamp. TanStack Query keeps it auto-refreshed every 10s. - RegisterRunnerModal — form with name + webhook_url; surfaces server-side validation errors (SSRF rejection, polling-deferred, malformed URL) inline. - RawTokenModal — one-shot reveal of the raw auth token with Copy-to-clipboard button and a yellow 'cannot be retrieved' warning. Shows export-ready REWIND_RUNNER_TOKEN line for paste. - Sidebar — new 'Runners' nav entry (Server icon). - App.tsx + use-store.ts — 'runners' added to view union. - lib/api.ts — runner CRUD + replay-job endpoints + types (RunnerView, RegisterRunnerResponse, ReplayJobView, CreateReplayJobBody — re-exported for commit 8 consumers). Web build green (419 KB gz 117 KB; 1703 modules). Made-with: Cursor

…se 3, commit 8/13) RunReplayButton.tsx exports both the inline button and the ReplayJobModal. Modal flow: pick active runner -> optional strict_match -> POST shape A (server forks + creates context atomically) -> live progress. use-replay-job.ts: hook backing the modal. dispatch() POSTs the job, subscribes via useWebSocket for replay_job_update frames matching this job_id, exposes state/progress/error that update live. use-websocket.ts: new ReplayJobUpdateData type + onReplayJobUpdate callback. StepTimeline.tsx: per-step Run-replay button (cyan, hover-revealed) alongside the existing Fork button. SessionView.tsx: modalState union extended with 'runReplay'. Both StepTimeline and ActivityTimeline forward onRunReplay (ActivityTimeline forwarding is a placeholder; UI integration deferred). UX: - No-runners fallback shows CLI invocation + link to Runners page. - In-flight: progress bar, state pill, fork-timeline-id surfaced. - Terminal: success/error icon swap, modal stays open for inspection. - Cancel: DELETE /api/replay-jobs/{id} (operator cancel; v1 fire-and-forget). Web build green (426 KB / 119 KB gz; 1705 modules). Made-with: Cursor

…ev escape (Phase 3, commit 11/13) python/tests/e2e/test_runner_replay_flow.py exercises the complete dispatch + callback lifecycle against a real Rewind binary: 1. Spawn the rewind release binary with REWIND_RUNNER_SECRET_KEY + REWIND_AUTH_TOKEN + REWIND_ALLOW_LOOPBACK_WEBHOOKS=1. 2. Stand up a stub runner HTTP server on a free port. 3. Register the runner via POST /api/runners with bearer auth; capture raw_token from one-shot response. 4. Seed a session via POST /api/sessions/start + record one llm-call so a fork target exists. 5. Dispatch shape A (server forks + creates context) via POST /api/sessions/{sid}/replay-jobs at_step=1. 6. Wait for stub runner to receive the dispatch; verify X-Rewind-Job-Id + X-Rewind-Signature headers + reconstructed HMAC against the wire format the dispatcher uses. 7. Stub posts started -> progress(1/2) -> progress(2/2) -> completed events back via POST /api/replay-jobs/{id}/events with X-Rewind-Runner-Auth header. 8. Poll job state until completed; assert progress_step=2, progress_total=2. Test skips when ./target/release/rewind isn't built (handles multiple working directory layouts via _resolve_rewind_bin). REWIND_ALLOW_LOOPBACK_WEBHOOKS=1 escape hatch in the register handler bypasses the SSRF guard for localhost webhooks. NEVER set in production - any operator who can reach the dashboard could then dispatch webhooks at localhost / cloud-metadata IPs. Strictly for dev/CI where the runner stub lives on localhost. 481 python tests pass (was 480, +1 e2e). Made-with: Cursor

…ase 3, commit 12/13) docs/runners.md — full operator-facing guide: - Bootstrap (REWIND_RUNNER_SECRET_KEY + REWIND_PUBLIC_URL). - Register from dashboard or CLI; raw-token surface discipline. - Implement runner side (FastAPI/asgi_handler example); REWIND_RUNNER_TOKEN env; alternative env-var bootstrap path. - Dispatch from dashboard or curl; both API shapes (A: server-fork, B: reuse-context). - Full state machine table (pending -> dispatched -> in_progress -> completed/errored). - Security model: outbound HMAC recipe, inbound runner-auth + ownership check, SSRF guard + REWIND_ALLOW_LOOPBACK_WEBHOOKS dev escape. - Lease + reaper semantics + dispatch deadline. - Troubleshooting: SSRF rejection, polling-deferred, active-jobs block on remove/rotate, signature mismatch, ownership 403. docs/recording.md — decision matrix updated from 'three ways to record' to **'five ways to record + replay'**: 1. Direct mode (Python init()) 2. HTTP intercept 3. cached_llm_call decorator (bundled from deferred Phase 2 doc) 4. Proxy mode (any language) 5. Dashboard runner (Phase 3) docs/getting-started.md — terminal pointer updated to highlight the 5-way matrix and link runners.md. Made-with: Cursor

…ommit 13/13) Per CLAUDE.md track-1 versioning rules: - Rust workspace + crates: 0.13.0 -> 0.14.0 (v0.13.0 was released as GitHub tag on Apr 27 and crates/ + web/src/ both changed in this PR — schema migration in commit 3, dispatcher + reaper in commit 5, dispatch + event endpoints in commit 6, web UI in 7+8). Cargo.toml + Cargo.lock + python/rewind_cli.py CLI_VERSION + python-mcp/rewind_mcp_cli.py CLI_VERSION all moved together. Per CLAUDE.md track-2 (Python SDK): - python/pyproject.toml stays at 0.15.0 (PyPI is at 0.14.8 — 0.15.0 is unpublished, so changes ride the existing version). - python-mcp/pyproject.toml stays at 0.13.0 (no MCP API change AND PyPI is at 0.12.10 — unpublished, no bump needed). Also: ruff --fix on three unused-import warnings in the new python/tests/e2e/test_runner_replay_flow.py (asyncio, sys, typing.Optional). All 481 pytest pass, web build green, Rust tests green, clippy clean on the touched crates. Made-with: Cursor

vercel · 2026-04-28T05:18:15Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
rewind	Ready	Preview, Comment	Apr 28, 2026 7:03am

risjai

Santa Verdict: NAUGHTY

Both independent Santa reviewers returned FAIL. CI is green now, and this PR covers a lot of the Phase 3 lifecycle, but there are still blocking correctness/security issues in the runner replay path.

Blocking Findings

Shape A accepts strict_match but does not apply it. CreateReplayJobShapeA includes strict_match, but after create_replay_context(...) the code never calls set_replay_context_strict_match. The dashboard can request strict replay and still get warn-on-divergence behavior. Add the setter and an integration test proving strict runner replay returns/raises on divergence.
Runner replay does not bind live recordings to the fork timeline. ExplicitClient.attach_replay_context() only sets _session_id and _replay_context_id; it does not set _timeline_id. The dispatch payload also omits fork_timeline_id, and the SDK does not resolve the replay context's timeline before running the agent. Cache misses after the fork can therefore record new live steps into the root timeline instead of the replay fork. Include the fork/context timeline in the payload or make attach_replay_context() fetch context metadata and set _timeline_id before any live miss can record.
Shape B reuse does not validate cursor state or context expiry. It checks session ownership and in-flight jobs, but does not reject replay contexts whose cursor was already advanced (current_step != from_step) or expired/near-expired contexts. A reused context can start from the wrong ordinal. Add current-step and TTL validation as described in the plan.
Runner SDK ignores the dispatch payload base_url for callbacks. The payload includes base_url, but ProgressReporter builds event URLs from RunnerConfig.rewind_base_url / REWIND_URL. Dispatches only report progress correctly if the runner separately has matching config. Default the reporter to payload.base_url, with config override only when explicitly requested.
Outbound webhook HMACs are replayable. The signature covers job_id || body, but has no timestamp, delivery id, expiry window, or runner-side job-id/delivery dedupe before scheduling user code. A captured signed webhook can be replayed to launch the agent again even if later event callbacks are rejected. Add X-Rewind-Timestamp + tolerance window and runner-side idempotency keyed by job_id or delivery_id before invoking the handler.
SSRF validation is registration-time only and is vulnerable to DNS rebinding. validate_export_endpoint() is called when the runner URL is registered, but Dispatcher later uses reqwest against the stored URL and explicitly does not revalidate. If DNS changes after registration, the dispatcher can POST to private/link-local/metadata IPs. Revalidate/pin resolution at dispatch time, or use a connector/resolver that rejects private/reserved targets on connect. Strongly consider requiring HTTPS for non-loopback webhooks.
Dashboard progress can miss fast job updates. The hook/modal subscribes after job creation, and the WebSocket callback setup can miss transient ReplayJobUpdated broadcasts for fast jobs that complete before the modal is listening for the returned job_id. The UI should fetch current job state after dispatch and/or subscribe before dispatch with a correlation token; WebSocket events should be treated as invalidation, not the only source of truth.

Also Noted

The implementation reintroduces cancellation via API/UI despite the revised plan explicitly deferring cancellation until a cooperative runner protocol exists. Remove it for v1 or make it clear that this is only server-side abandonment and cannot stop the runner/agent.
The E2E path uses a custom stub runner that reads payload.base_url directly, so it does not exercise the shipped rewind_agent.runner SDK behavior where the callback URL bug lives.

Passes

CI is green.
Versioning looks consistent with the repo rules.
Runner token registration/regeneration and basic callback ownership checks are covered better than the earlier PRs.

Recommended Fix Set

Wire strict_match into Shape A replay context creation.
Ensure runner replay sets the fork timeline context before live misses can record.
Validate Shape B current_step == from_step and context freshness/TTL.
Make ProgressReporter use dispatch base_url by default.
Add dispatch freshness/idempotency protections to HMAC webhooks.
Revalidate/pin webhook destinations at dispatch time against SSRF policy.
Add tests for strict Shape A, fork-timeline live miss, Shape B consumed/expired contexts, SDK callback base URL, HMAC replay prevention, DNS rebinding/dispatch-time SSRF, and fast UI completion updates.

…ase_url, HMAC replay, dispatch SSRF, fast UI updates, remove cancel 7 blocking + 2 noted findings from the Santa review. All addressed. F1 — strict_match accepted but ignored. Shape A now calls set_replay_context_strict_match after create_replay_context when the request body has strict_match=true. New tests: shape_a_strict_match_true_sets_replay_context_strict_flag and shape_a_strict_match_omitted_defaults_to_warn_only. F2 — fork timeline binding. Dispatch payload now carries replay_context_timeline_id (server-side). Shape B return also populates fork_timeline_id with the existing context's timeline (was null previously). Python ExplicitClient.attach_replay_context takes an optional timeline_id and sets the _timeline_id contextvar so live cache misses record into the fork timeline rather than the root. DispatchPayload tolerates missing field for back-compat. F3 — Shape B cursor + TTL validation. Refuses contexts whose cursor != from_step (already mid-replay) and contexts older than REPLAY_CONTEXT_TTL_SECS (1h, mirrors WebServer cleanup task). ReplayContextRow gains last_accessed_at; Store helper _test_set_replay_context_last_accessed enables fast TTL-expiry tests. New: shape_b_rejects_context_whose_cursor_already_advanced, shape_b_rejects_expired_context. F4 — ProgressReporter ignored payload.base_url. Constructor now takes optional base_url; asgi_handler builds the reporter from payload.base_url so events POST back to the dispatcher's server URL (which may differ from the runner's local REWIND_URL behind a proxy). New: test_progress_reporter_prefers_explicit_base_url_over_config + test_progress_reporter_falls_back_to_config_when_base_url_omitted. F5 — HMAC webhooks replayable. Signed input now includes a unix- seconds timestamp: HMAC-SHA256(token, ts || \\n || job_id || \\n || body). New X-Rewind-Timestamp header sent + verified. Runner SDK enforces ±5min tolerance window AND a process-local seen-cache of recent job_ids — captured signed dispatches inside the tolerance window can't relaunch the agent (200 + duplicate=true on second arrival). compute_signature gains a timestamp parameter; legacy two-line input still works for backward compat. Reference vectors re-pinned in both Rust + Python tests. F6 — SSRF only at registration. Dispatcher::dispatch now re-runs url_guard::validate_export_endpoint at dispatch time, closing the DNS-rebind window between registration and dispatch. Residual race between this check and reqwest's connection-time re-resolve documented as a known limitation requiring a custom hyper connector (deferred). REWIND_ALLOW_LOOPBACK_WEBHOOKS dev escape applies to both checks. F7 — UI race on fast jobs. useReplayJob hook now treats WebSocket frames as INVALIDATION (refetch GET /api/replay-jobs/{id}) rather than the source of truth. After dispatch returns 202, immediately fetch the current job state to seed UI in case the job already advanced past pending before the WS subscription wired up. Fixes the missing-fast-update class of bugs. N1 — cancellation removed (deferred to v3.1 per the plan). Dropped DELETE /api/replay-jobs/{id}, the cancel button, the API client method, and updated docs/runners.md. Operators kill the runner process externally; the lease reaper marks the job errored. N2 — E2E now verifies the F2 + F5 wire-format additions (replay_context_timeline_id field + X-Rewind-Timestamp header + 3-line signature recipe). The SDK code paths are exercised by unit tests in test_runner.py (test_progress_reporter_prefers_*, test_verify_signature_rejects_stale_timestamp, test_asgi_handler_replay_attack_short_circuits_with_duplicate) which run faster + more reliably than spinning a dedicated event loop per HTTP request inside HTTPServer threads. Tests: 16 replay_jobs_tests pass (was 12, +4); 28 runner unit tests pass (was 19, +9 for F4/F5/F2 coverage); 108 rewind-web lib tests (adds 2 new dispatcher tests covering timestamp variation + 3-line signing input); 102 rewind-store tests (the new _test_set_replay_context_last_accessed helper unused at unit-level); 490 pytest pass; clippy + ruff clean; web build green. Made-with: Cursor

risjai · 2026-04-28T05:56:22Z

All 7 blockers + 2 noted findings addressed in 27bdf09.

#	Severity	Finding → Fix
F1	BLOCKER	strict_match accepted but ignored → Shape A now calls `set_replay_context_strict_match` after creating the context. Tests: `shape_a_strict_match_true_sets_replay_context_strict_flag`, `shape_a_strict_match_omitted_defaults_to_warn_only`.
F2	BLOCKER	Live recordings not bound to fork timeline → dispatch payload now carries `replay_context_timeline_id`; Shape B response also returns it. Python `attach_replay_context(session_id, replay_context_id, timeline_id=...)` sets `_timeline_id` contextvar. Tolerant decode for back-compat.
F3	BLOCKER	Shape B doesn't validate cursor/TTL → refuses contexts whose `current_step != from_step` or older than 1h TTL. New `Store::_test_set_replay_context_last_accessed` helper for fast TTL-expiry tests. Tests: `shape_b_rejects_context_whose_cursor_already_advanced`, `shape_b_rejects_expired_context`.
F4	BLOCKER	ProgressReporter ignored payload base_url → constructor takes optional `base_url`; `asgi_handler` builds it from `payload.base_url`. Falls back to `config.rewind_base_url` for direct/test usage. Tests: `test_progress_reporter_prefers_explicit_base_url_over_config`, `test_progress_reporter_falls_back_to_config_when_base_url_omitted`.
F5	BLOCKER	HMAC webhooks replayable → signed input now `HMAC-SHA256(token, timestamp \\n job_id \\n body)`. New `X-Rewind-Timestamp` header with ±5min tolerance enforced by SDK. Process-local seen-cache of recent `job_id`s defeats replays inside the tolerance window (200 + `duplicate=true`). Reference vectors re-pinned both sides. Tests: `test_verify_signature_rejects_stale_timestamp`, `test_verify_signature_rejects_future_timestamp`, `test_asgi_handler_replay_attack_short_circuits_with_duplicate` + matching Rust `compute_signature_changes_when_timestamp_changes`.
F6	BLOCKER	Registration-time SSRF only → `Dispatcher::dispatch` re-runs `url_guard::validate_export_endpoint` at dispatch time. Closes the DNS-rebind window between registration and dispatch. Residual race against reqwest's connect-time re-resolve documented as a known limitation requiring a custom hyper connector (deferred to v3.1).
F7	BLOCKER	UI misses fast updates → `useReplayJob` treats WebSocket frames as INVALIDATION (refetch `GET /api/replay-jobs/{id}`) rather than source-of-truth. After dispatch returns 202, immediately fetches current job state to seed UI. Out-of-order / missing frames both resolve to "latest server state".
N1	NOTED	Cancellation shouldn't have shipped (deferred to v3.1) → removed `DELETE /api/replay-jobs/{id}` endpoint, the dashboard cancel button, the API client method, and updated `docs/runners.md` to point operators at "kill the runner process; lease reaper marks errored".
N2	NOTED	E2E doesn't exercise SDK base_url logic → SDK code paths now have explicit unit-test coverage in `test_runner.py` (the F4 reporter tests above + replay-attack idempotency). The e2e was rewritten to verify the F2 (`replay_context_timeline_id`) + F5 (`X-Rewind-Timestamp` + 3-line signing recipe) wire-format additions; the raw stub stays for test-runtime stability (a dedicated event loop per HTTPServer request was flaky).

Test posture after the fix:

16 replay_jobs_tests pass (was 12, +4 for F1 + F3)
28 runner unit tests pass (was 19, +9 for F4 + F5 + F2 coverage)
108 rewind-web lib tests (adds 2 new dispatcher tests covering timestamp variation + 3-line signing input)
102 rewind-store tests (_test_set_replay_context_last_accessed helper exposed publicly with _test_ prefix + docstring contract)
490 pytest pass (1 skip), ruff clean, clippy clean on touched crates, web build green

Wire-format changes (operators take note):

Outbound webhooks now carry X-Rewind-Timestamp and the signed input includes it. Pre-Phase 3 (commits 5-13): runner registry full lifecycle — dispatcher, callbacks, dashboard, runner SDK, CLI, e2e, docs, version bump #154 runners verifying with the 2-line input will see signature mismatches on dispatches from a v0.14.x server. The Python SDK falls back to the legacy 2-line input only when the timestamp header is absent (defensive — production dispatchers always emit it).
Dispatch payload gains replay_context_timeline_id (not breaking; old runners ignore unknown fields).

CI will pick up the push. Ready for re-review.

risjai

Santa Re-Review: still NAUGHTY

Most of the previous blockers are addressed and CI is green, but I still see one remaining correctness blocker plus one important subprocess/bootstrap gap.

Remaining Blocker

Runner callback race can still drop the started event and leave jobs stuck. The Python runner asgi_handler() schedules _run() before returning 202; _run() immediately calls reporter.started(). The server only transitions pending -> dispatched after the dispatcher receives the runner’s 202 and applies the outcome. If started arrives while the job is still pending, the state machine rejects it (Started is only legal from dispatched). ProgressReporter only logs the 409, so the handler continues and later progress/completed events can also be rejected or leave the job in dispatched until the reaper marks it errored.

Suggested fixes:

Move job to dispatched before sending the webhook, then revert/mark errored if dispatch fails; or
Allow authenticated started to transition pending -> in_progress for this dispatch race; or
Change the runner SDK to send started only after a server-ack/state poll confirms dispatched.

Also Still Concerning

Env-var bootstrap still doesn’t carry the timeline id. _install.py reads REWIND_SESSION_ID and REWIND_REPLAY_CONTEXT_ID, but not a replay context timeline id. So subprocess-based agents using env bootstrap can still miss _timeline_id, meaning live misses may record into the wrong timeline. The direct SDK path supports attach_replay_context(..., timeline_id=...), but the env path does not.

Please add REWIND_REPLAY_CONTEXT_TIMELINE_ID (or equivalent) to the dispatch/subprocess environment and pass it through to attach_replay_context() during bootstrap.

Two findings from the second Santa pass on PR #154. BLOCKER — runner callback race drops `started`. The Python SDK's asgi_handler() schedules _run() (which immediately calls reporter.started()) before the dispatcher's apply_outcome flips state pending->dispatched. Under the strict state machine from #152 round 2, the server rejected `Started` from `Pending` as illegal: runner saw 409, ProgressReporter only logged it, the run proceeded anyway, and the job sat in `pending` until the reaper marked it errored. Fix: transition pending->dispatched SYNCHRONOUSLY in the HTTP handler before tokio::spawn. The runner now ALWAYS finds the job in `dispatched` when it tries to call back. apply_outcome is restructured: success path only sets deadline+lease (state already advanced); failure path transitions dispatched->errored (legal startup-failure transition per the existing state machine). If `started` lands first (very fast runners), the failure-path transition becomes a no-op (SQL guard refuses Dispatched->Errored when current state is in_progress) and the run proceeds. The response body's `state` field also flips from `pending` to `dispatched` accordingly — dashboards see the real state immediately without polling. CONCERNING — env-var bootstrap missing timeline. _install.py recognized REWIND_SESSION_ID + REWIND_REPLAY_CONTEXT_ID but not REWIND_REPLAY_CONTEXT_TIMELINE_ID, so subprocess-based agents using env bootstrap re-introduced the F2 bug for that path. intercept._install._bootstrap_replay_context_from_env now reads the env var and forwards it to attach_replay_context(timeline_id=...). A WARN log fires when the env var is missing. Tests: - 18 replay_jobs_tests pass (was 16, +2 new): create_replay_job_response_state_is_dispatched_not_pending and started_event_accepted_immediately_after_dispatch_response prove the race is closed. - 30 runner unit tests pass (was 28, +1 new env-var-with-timeline test; existing test updated to assert _timeline_id stays None when REWIND_REPLAY_CONTEXT_TIMELINE_ID is absent). - 108 rewind-web lib tests, 102 rewind-store, 491 pytest, ruff + clippy clean, web build green. docs/runners.md updated with REWIND_REPLAY_CONTEXT_TIMELINE_ID recommendation in the env-var bootstrap example. Made-with: Cursor

risjai · 2026-04-28T06:59:58Z

Both findings from the round-2 Santa pass addressed in f795f2b.

#	Severity	Finding → Fix
BLOCKER	Runner callback race drops `started`	`create_replay_job` now transitions `pending → dispatched` SYNCHRONOUSLY before tokio::spawn. The runner ALWAYS finds the job in `dispatched` when it tries to call back. apply_outcome restructured: success path only sets deadline+lease; failure path transitions `dispatched → errored` (legal startup-failure transition per #152 round 2). If `started` lands first (very fast runners), the failure-path transition becomes a no-op (SQL guard refuses `Dispatched → Errored` when current state is `in_progress`) and the run proceeds. Response `state` field flips from `pending` to `dispatched` accordingly — dashboards see the real state immediately.
CONCERNING	Env-var bootstrap missing timeline	`_install._bootstrap_replay_context_from_env` now reads `REWIND_REPLAY_CONTEXT_TIMELINE_ID` and forwards it to `attach_replay_context(timeline_id=...)`. WARN log fires when the env var is missing (subprocess-bootstrap path matches the direct SDK path). Documented in `docs/runners.md` with example `subprocess.run(env=...)` snippet.

Picked option (a) from the suggested fix set ("Move job to dispatched before sending the webhook, then revert/mark errored if dispatch fails") — preserves the strict state machine from #152 round 2 without re-widening it. Options (b) and (c) would either weaken the state machine (b) or require runner-side polling (c, additional round-trip per dispatch).

Subtle property: if a very fast runner emits started AFTER the pre-spawn transition but BEFORE the dispatcher's HTTP call returns, the apply_outcome failure transition (Dispatched → Errored) becomes a no-op because the SQL guard WHERE state NOT IN ('completed','errored') matches but the current state is now in_progress, not dispatched — so the UPDATE matches in_progress rows too. Wait, that means a started-then-failed-dispatch combo could mark in_progress jobs errored. Let me re-verify the guard...

Actually the guard is WHERE id = ? AND state NOT IN ('completed', 'errored') — it matches pending, dispatched, or in_progress. So a failure-path Errored transition WOULD overwrite an in_progress job. That's a small follow-up to consider: tighten the failure-path guard to only target dispatched state. Filed for follow-up; non-blocking for this PR since the practical race window (between pre-spawn transition and apply_outcome) is the duration of the dispatch HTTP call, ~tens to hundreds of ms.

Test posture:

18 replay_jobs_tests pass (was 16, +2): create_replay_job_response_state_is_dispatched_not_pending and started_event_accepted_immediately_after_dispatch_response prove the race is closed.
30 runner unit tests pass (was 28, +1): new test_install_bootstraps_with_timeline_id_from_env covers the env-var path; existing test_install_bootstraps_from_env updated to assert _timeline_id stays None when the env var is absent.
491 pytest pass; clippy + ruff clean; web build green.

CI will pick up the push.

…view #154 round 2 follow-up) Self-caught follow-up to f795f2b. The pre-spawn pending->dispatched transition opened a race where the runner's `started` event could land while the dispatcher's HTTP call is still in flight, advancing the job to in_progress. If that HTTP call THEN failed, the failure-path apply_outcome's advance_replay_job_state(Errored) matched the in_progress row (the existing SQL guard only excludes terminal states) and clobbered legitimate started state. Fix: new Store::mark_dispatched_job_as_errored helper with the strict guard `WHERE id = ? AND state = 'dispatched'`. The failure-path apply_outcome now uses this. UPDATE is a no-op when state has advanced to in_progress, preserving the legitimate run. Includes a regression test (failed_dispatch_does_not_clobber_already_in_progress_job) that inserts an in_progress row directly, calls mark_dispatched_job_as_errored, and asserts the row is unchanged. 19 replay_jobs_tests pass (was 18, +1). Full suite still green. Made-with: Cursor

risjai · 2026-04-28T07:02:57Z

Self-caught follow-up to my previous comment, pushed as 023bd3f.

I called out the in_progress-clobber risk in my reply ("that means a started-then-failed-dispatch combo could mark in_progress jobs errored"), then realized I should fix it now rather than file as follow-up. The risk is real — pre-spawn transition + fast runner + failed dispatch HTTP call would have the failure path's advance_replay_job_state(Errored) match the in_progress row.

Fix: new Store::mark_dispatched_job_as_errored helper with the strict guard WHERE id = ? AND state = 'dispatched'. The failure-path apply_outcome uses this; UPDATE is a no-op when state has already advanced to in_progress, so the legitimate run continues.

New test failed_dispatch_does_not_clobber_already_in_progress_job inserts an in_progress row directly, calls the helper, and asserts the row is unchanged.

19 replay_jobs_tests pass (was 18, +1). Full suite remains green.

risjai added 9 commits April 28, 2026 10:00

risjai commented Apr 28, 2026

View reviewed changes

vercel Bot deployed to Preview April 28, 2026 05:56 View deployment

risjai commented Apr 28, 2026

View reviewed changes

vercel Bot deployed to Preview April 28, 2026 06:59 View deployment

vercel Bot deployed to Preview April 28, 2026 07:03 View deployment

risjai merged commit 7fe91f8 into master Apr 28, 2026
7 checks passed

risjai deleted the feat/phase-3-rest branch April 28, 2026 07:05

risjai mentioned this pull request May 11, 2026

fix(replay): dispatch source_timeline_id so handler reads edited steps #167

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Phase 3 (commits 5-13): runner registry full lifecycle — dispatcher, callbacks, dashboard, runner SDK, CLI, e2e, docs, version bump#154

Phase 3 (commits 5-13): runner registry full lifecycle — dispatcher, callbacks, dashboard, runner SDK, CLI, e2e, docs, version bump#154
risjai merged 12 commits into
masterfrom
feat/phase-3-rest

risjai commented Apr 28, 2026

Uh oh!

vercel Bot commented Apr 28, 2026 •

edited

Loading

Uh oh!

risjai left a comment

Uh oh!

risjai commented Apr 28, 2026

Uh oh!

risjai left a comment

Uh oh!

risjai commented Apr 28, 2026

Uh oh!

risjai commented Apr 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

risjai commented Apr 28, 2026

Summary

What ships

API / wire format

State machine + reaper

Bootstrap (server)

Test plan

Notes for reviewers

Uh oh!

vercel Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

risjai left a comment

Choose a reason for hiding this comment

Santa Verdict: NAUGHTY

Blocking Findings

Also Noted

Passes

Recommended Fix Set

Uh oh!

risjai commented Apr 28, 2026

Uh oh!

risjai left a comment

Choose a reason for hiding this comment

Santa Re-Review: still NAUGHTY

Remaining Blocker

Also Still Concerning

Uh oh!

risjai commented Apr 28, 2026

Uh oh!

risjai commented Apr 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vercel Bot commented Apr 28, 2026 •

edited

Loading