Skip to content

Phase 3 (commits 5-13): runner registry full lifecycle — dispatcher, callbacks, dashboard, runner SDK, CLI, e2e, docs, version bump#154

Merged
risjai merged 12 commits into
masterfrom
feat/phase-3-rest
Apr 28, 2026
Merged

Phase 3 (commits 5-13): runner registry full lifecycle — dispatcher, callbacks, dashboard, runner SDK, CLI, e2e, docs, version bump#154
risjai merged 12 commits into
masterfrom
feat/phase-3-rest

Conversation

@risjai
Copy link
Copy Markdown
Collaborator

@risjai risjai commented Apr 28, 2026

Summary

Closes the loop on Phase 3 — the dashboard becomes a control surface instead of a viewer. Operators register long-lived runner processes; clicking "Run replay" on a session step HMAC-signs a webhook to the runner; the runner executes the agent under a replay context; progress streams back to the dashboard live via WebSocket.

This is commits 5-13 of the 13-commit Phase 3 plan, continuing from commits 1-4 already merged in PRs #150, #151, #152, #153.

What ships

commit What
5 Webhook dispatcher (HMAC signing, 5s timeout) + lease reaper (30s scan)
6 POST /api/sessions/{sid}/replay-jobs (dual-shape A/B) + POST /api/replay-jobs/{id}/events (X-Rewind-Runner-Auth, ownership check, atomic event ingestion) + WS broadcast
7 Web RunnersPage (list / register / regenerate / remove + raw-token reveal modal)
8 Web RunReplayButton + ReplayJobModal + useReplayJob hook + replay_job_update WebSocket frame
9 Python rewind_agent.runner library (RunnerConfig, compute_signature, verify_signature, ProgressReporter, asgi_handler, @handle_replay) + ExplicitClient.attach_replay_context + env-var bootstrap
10 rewind runners {list, add, remove, regenerate-token} CLI
11 E2E test exercising the full lifecycle against a real binary + stub runner; REWIND_ALLOW_LOOPBACK_WEBHOOKS dev escape hatch
12 docs/runners.md operator how-to + decision matrix updated 3-way → 5-way (also bundles deferred Phase 2 doc)
13 Rust workspace 0.13.0 → 0.14.0 across 4 mirror files (Python SDK + MCP versions stay per CLAUDE.md track-2)

API / wire format

Outbound dispatch (Rewind → runner):
```
POST <runner.webhook_url>
X-Rewind-Job-Id:
X-Rewind-Signature: sha256= # HMAC-SHA256(raw_token, job_id || \n || raw_body)
Content-Type: application/json
{ "job_id", "session_id", "replay_context_id", "base_url" }
```

Inbound events (runner → Rewind, mounts OUTSIDE bearer middleware):
```
POST /api/replay-jobs/{id}/events
X-Rewind-Runner-Auth: # SHA-256 lookup; ownership check on job.runner_id
{ "event_type": "started|progress|completed|errored", "step_number?", "progress_total?", ... }
```

Reference vectors pinned both sides: crates/rewind-web/src/dispatcher.rs::compute_signature_matches_python_referencepython/tests/test_runner.py::test_compute_signature_matches_rust_reference.

State machine + reaper

Same SQL-level terminal-state guard from #152 round 2 protects every transition. Reaper marks dispatched-but-never-started → errored stage='dispatch' and in_progress-without-heartbeat → errored stage='lease_expired'.

Bootstrap (server)

```bash
export REWIND_RUNNER_SECRET_KEY="$(openssl rand -base64 32)" # required
export REWIND_PUBLIC_URL="https://rewind.example\" # for non-local
rewind web
```

Unset key → 503 with bootstrap message. Set-but-malformed → server panics at startup (per #153 MEDIUM 5 fix).

Test plan

  • cargo test --workspace — green
  • cargo clippy on rewind-web --lib, rewind-store --all-targets, rewind-cli — clean (pre-existing test-file warnings on recording_api_tests / transcript_sync_tests unchanged from master)
  • 24 runners_tests + 12 new replay_jobs_tests integration tests
  • 481 pytest pass, 1 skipped (Python — includes 19 new test_runner.py + 1 e2e)
  • web build green (426 KB / 119 KB gz; 1705 modules)
  • E2E flow verified end-to-end via python/tests/e2e/test_runner_replay_flow.py
  • All previous phase work intact (Phase 1 + 2 + 3 commits 1-4)

Notes for reviewers

Made with Cursor

risjai added 9 commits April 28, 2026 10:00
dispatcher.rs:
  - Dispatcher::dispatch() decrypts runner token, builds canonical
    body, signs with HMAC-SHA256(raw_token, job_id || \n || body),
    POSTs to runner.webhook_url with X-Rewind-Job-Id +
    X-Rewind-Signature headers, 5s timeout.
  - Success → state pending->dispatched, sets dispatch_deadline_at
    (10s) + lease_expires_at (5min).
  - Any failure → state pending->errored, error_stage='dispatch'.
  - State transitions go through Store::advance_replay_job_state
    (commit 3's atomic SQL guard).
  - 9 unit tests including a Python HMAC reference cross-check
    so the runner SDK in commit 9 stays bit-compatible.

reaper.rs:
  - Background tokio task spawned by WebServer::run().
  - 30s scan: dispatch deadline expired (dispatched but no Started
    in 10s) -> errored stage='dispatch'; lease expired (in_progress
    but no heartbeat in 5min) -> errored stage='lease_expired'.
  - Each transition broadcasts on the StoreEvent bus so the
    dashboard sees lease failures live.
  - 3 tests covering both expiry classes and the no-op-when-fresh
    case.

Store helpers:
  - set_dispatch_deadline_and_lease() (called by dispatcher).
  - list_dispatch_deadline_expired() (used by reaper).

WebSocket:
  - StoreEvent::ReplayJobUpdated variant added.
  - ws.rs forwards it to clients subscribed to the owning session
    via new ServerMessage::ReplayJobUpdate frame.

AppState:
  - dispatcher: Option<Dispatcher>, base_url: String.
  - bootstrap_base_url() reads REWIND_PUBLIC_URL with sane local
    default; documented for ops.

Existing test fixtures (api_tests, auth_tests, hooks_tests,
recording_api_tests, transcript_sync_tests) updated for the new
AppState fields.

Tests: 106 lib + 141 integration green; ruff clean; pytest unaffected.
Made-with: Cursor
…ase 3, commit 6/13)

POST /api/sessions/{sid}/replay-jobs (bearer-protected):
  - Dual-shape body via serde(untagged):
    - Shape A: {runner_id, source_timeline_id, at_step, strict_match?}
      → server forks timeline + creates replay context + creates job
        atomically.
    - Shape B: {runner_id, replay_context_id} → reuse-existing-context;
      validates context belongs to session and isn't already in-flight
      via new Store::count_in_flight_jobs_for_replay_context (one-job-
      per-context invariant prevents cursor races).
  - Both fire dispatcher in a tokio::spawn (fire-and-forget) and
    return 202 Accepted with the job_id immediately.
  - 503 if crypto key not configured; 404 on unknown session/runner;
    400 on invalid timeline; 409 on inactive runner / context-in-use.

GET /api/sessions/{sid}/replay-jobs — list jobs for a session.
GET /api/replay-jobs/{id} — fetch a single job.
DELETE /api/replay-jobs/{id} — operator cancel (force errored
  with stage='cancelled'; v1 doesn't notify the runner).

POST /api/replay-jobs/{id}/events (mounted OUTSIDE bearer middleware):
  - Auth: X-Rewind-Runner-Auth: <token> header. Hashed and looked
    up via auth_token_hash index. Ownership check: job.runner_id
    must match the authenticated runner's id (403 otherwise).
  - Wraps Store::record_replay_job_event_atomic from #152 round 2 →
    full state-machine guard. Illegal transitions return 409 with
    accepted=false.
  - Emits StoreEvent::ReplayJobUpdated → WebSocket broadcast.
  - touch_runner_last_seen() for liveness tracking.

Dispatcher refactor:
  - dispatch() is now PURE: returns DispatchOutcome instead of
    holding a Store reference across .await. Caller (HTTP handler)
    applies the outcome via Dispatcher::apply_outcome inside a
    briefly-held Mutex lock. Closes the lock-across-await footgun
    that clippy::await_holding_lock would flag.

WebServer.build_router:
  - runner_callback_routes mounted alongside /_rewind/health (outside
    the protected layer) so X-Rewind-Runner-Auth replaces bearer.

Tests: 12 new replay_jobs_tests (full event lifecycle: shape-B
dispatch with stub runner, 500-response failure, context-in-use,
unknown session, inactive runner, runner-auth missing, runner-auth
unknown, ownership mismatch, started transition, progress update,
completed transition, illegal-transition 409). Stub runner is a
tokio TcpListener-backed axum server.

Storage: count_in_flight_jobs_for_replay_context (used by shape-B
contention check).

All checks: 250+ rewind-web tests pass; clippy clean on new code.

Made-with: Cursor
…nv bootstrap (Phase 3, commit 9/13)

ExplicitClient.attach_replay_context(session_id, replay_context_id):
  - Sets the contextvars without creating a new replay context.
  - Resolves Phase 3 plan HIGH #4: runners receive an existing
    context in their dispatch payload and must not create a duplicate.

intercept.install():
  - Bootstraps from REWIND_SESSION_ID + REWIND_REPLAY_CONTEXT_ID env
    vars before patching transports. Both must be set together;
    either-but-not-both logs a WARN. Lets a runner subprocess be
    configured purely via env without touching the SDK.

rewind_agent/runner.py (new, ~370 LOC):
  - RunnerConfig.from_env() — reads REWIND_RUNNER_TOKEN + REWIND_URL.
  - compute_signature(token, job_id, body) — pure HMAC-SHA256
    matching the Rust dispatcher's wire format. Cross-checked
    against the same Python reference vector both crates pin
    against (drift between Rust + Python = silent dispatch failure).
  - verify_signature(config, headers, body_bytes) — case-insensitive
    header lookup, constant-time hmac.compare_digest. Returns
    (ok, reason) tuple.
  - DispatchPayload — typed dataclass mirroring the Rust DispatchBody.
  - ProgressReporter — async wrapper around POST /api/replay-jobs/
    {id}/events; httpx-preferred with urllib fallback. Methods:
    started(), progress(step, total?, payload?), completed(),
    errored(message, stage?).
  - asgi_handler(...) — verifies signature, decodes payload,
    fires user handler as background task, returns (202, {job_id})
    immediately so the dispatcher's 5s timeout is satisfied. On
    handler exception, auto-emits errored event with the exception
    message + error_stage='agent'.
  - @handle_replay marker decorator for documentation/discovery.

Tests (19 new, 480 total pass):
  - RunnerConfig.from_env: requires token; honors defaults; honors REWIND_URL.
  - compute_signature: matches Rust reference vector byte-for-byte.
  - verify_signature: accepts canonical, rejects missing headers,
    malformed prefix, tampered body, wrong token; case-insensitive.
  - DispatchPayload decode round-trip.
  - asgi_handler: invalid signature returns 401 + handler not invoked;
    invalid body returns 400; valid request → 202 + handler runs +
    started/progress/completed events emitted in order; user code
    raising → errored event emitted with the exception message.
  - attach_replay_context sets contextvars.
  - intercept.install env-var bootstrap (both vars set; partial vars
    logged + skipped).

ruff clean. Pre-existing 461 tests still pass; new total 480/480 + 1 skip.

Made-with: Cursor
…e 3, commit 10/13)

Thin HTTP wrapper over /api/runners. All four subcommands take
--rewind-url (defaults from REWIND_URL env, falls back to
http://127.0.0.1:4800) and --token (defaults from REWIND_AUTH_TOKEN
for the dashboard's bearer auth).

  rewind runners list                       — pretty-printed table
  rewind runners add --name N --webhook-url U — returns RAW token ONCE
  rewind runners remove <id>                — 409 surfaced as 'removal blocked'
  rewind runners regenerate-token <id>      — returns NEW raw token

The 'add' output mimics the dashboard's RegisterRunnerResponse:
prints the raw token in bold yellow, reminds the operator it can't
be retrieved later, and prints an export-ready REWIND_RUNNER_TOKEN
line for one-line copy-paste into a runner's environment.

Build green; cli help working.

Made-with: Cursor
…dal (Phase 3, commit 7/13)

- RunnersPage.tsx — table view of registered runners with status
  pill (active/disabled/stale), webhook URL, token preview, last-
  seen timestamp. TanStack Query keeps it auto-refreshed every 10s.
- RegisterRunnerModal — form with name + webhook_url; surfaces
  server-side validation errors (SSRF rejection, polling-deferred,
  malformed URL) inline.
- RawTokenModal — one-shot reveal of the raw auth token with
  Copy-to-clipboard button and a yellow 'cannot be retrieved'
  warning. Shows export-ready REWIND_RUNNER_TOKEN line for paste.
- Sidebar — new 'Runners' nav entry (Server icon).
- App.tsx + use-store.ts — 'runners' added to view union.
- lib/api.ts — runner CRUD + replay-job endpoints + types
  (RunnerView, RegisterRunnerResponse, ReplayJobView,
  CreateReplayJobBody — re-exported for commit 8 consumers).

Web build green (419 KB gz 117 KB; 1703 modules).

Made-with: Cursor
…se 3, commit 8/13)

RunReplayButton.tsx exports both the inline button and the ReplayJobModal.
Modal flow: pick active runner -> optional strict_match -> POST shape A
(server forks + creates context atomically) -> live progress.

use-replay-job.ts: hook backing the modal. dispatch() POSTs the job,
subscribes via useWebSocket for replay_job_update frames matching this
job_id, exposes state/progress/error that update live.

use-websocket.ts: new ReplayJobUpdateData type + onReplayJobUpdate callback.

StepTimeline.tsx: per-step Run-replay button (cyan, hover-revealed)
alongside the existing Fork button.

SessionView.tsx: modalState union extended with 'runReplay'. Both
StepTimeline and ActivityTimeline forward onRunReplay (ActivityTimeline
forwarding is a placeholder; UI integration deferred).

UX:
- No-runners fallback shows CLI invocation + link to Runners page.
- In-flight: progress bar, state pill, fork-timeline-id surfaced.
- Terminal: success/error icon swap, modal stays open for inspection.
- Cancel: DELETE /api/replay-jobs/{id} (operator cancel; v1 fire-and-forget).

Web build green (426 KB / 119 KB gz; 1705 modules).

Made-with: Cursor
…ev escape (Phase 3, commit 11/13)

python/tests/e2e/test_runner_replay_flow.py exercises the complete dispatch
+ callback lifecycle against a real Rewind binary:

  1. Spawn the rewind release binary with REWIND_RUNNER_SECRET_KEY +
     REWIND_AUTH_TOKEN + REWIND_ALLOW_LOOPBACK_WEBHOOKS=1.
  2. Stand up a stub runner HTTP server on a free port.
  3. Register the runner via POST /api/runners with bearer auth;
     capture raw_token from one-shot response.
  4. Seed a session via POST /api/sessions/start + record one
     llm-call so a fork target exists.
  5. Dispatch shape A (server forks + creates context) via POST
     /api/sessions/{sid}/replay-jobs at_step=1.
  6. Wait for stub runner to receive the dispatch; verify
     X-Rewind-Job-Id + X-Rewind-Signature headers + reconstructed
     HMAC against the wire format the dispatcher uses.
  7. Stub posts started -> progress(1/2) -> progress(2/2) -> completed
     events back via POST /api/replay-jobs/{id}/events with
     X-Rewind-Runner-Auth header.
  8. Poll job state until completed; assert progress_step=2,
     progress_total=2.

Test skips when ./target/release/rewind isn't built (handles
multiple working directory layouts via _resolve_rewind_bin).

REWIND_ALLOW_LOOPBACK_WEBHOOKS=1 escape hatch in the register
handler bypasses the SSRF guard for localhost webhooks. NEVER set
in production - any operator who can reach the dashboard could
then dispatch webhooks at localhost / cloud-metadata IPs. Strictly
for dev/CI where the runner stub lives on localhost.

481 python tests pass (was 480, +1 e2e).

Made-with: Cursor
…ase 3, commit 12/13)

docs/runners.md — full operator-facing guide:
- Bootstrap (REWIND_RUNNER_SECRET_KEY + REWIND_PUBLIC_URL).
- Register from dashboard or CLI; raw-token surface discipline.
- Implement runner side (FastAPI/asgi_handler example);
  REWIND_RUNNER_TOKEN env; alternative env-var bootstrap path.
- Dispatch from dashboard or curl; both API shapes (A: server-fork,
  B: reuse-context).
- Full state machine table (pending -> dispatched -> in_progress
  -> completed/errored).
- Security model: outbound HMAC recipe, inbound runner-auth +
  ownership check, SSRF guard + REWIND_ALLOW_LOOPBACK_WEBHOOKS
  dev escape.
- Lease + reaper semantics + dispatch deadline.
- Troubleshooting: SSRF rejection, polling-deferred, active-jobs
  block on remove/rotate, signature mismatch, ownership 403.

docs/recording.md — decision matrix updated from 'three ways to
record' to **'five ways to record + replay'**:
  1. Direct mode (Python init())
  2. HTTP intercept
  3. cached_llm_call decorator (bundled from deferred Phase 2 doc)
  4. Proxy mode (any language)
  5. Dashboard runner (Phase 3)

docs/getting-started.md — terminal pointer updated to highlight
the 5-way matrix and link runners.md.

Made-with: Cursor
…ommit 13/13)

Per CLAUDE.md track-1 versioning rules:
- Rust workspace + crates: 0.13.0 -> 0.14.0 (v0.13.0 was released
  as GitHub tag on Apr 27 and crates/ + web/src/ both changed in
  this PR — schema migration in commit 3, dispatcher + reaper in
  commit 5, dispatch + event endpoints in commit 6, web UI in 7+8).
  Cargo.toml + Cargo.lock + python/rewind_cli.py CLI_VERSION +
  python-mcp/rewind_mcp_cli.py CLI_VERSION all moved together.

Per CLAUDE.md track-2 (Python SDK):
- python/pyproject.toml stays at 0.15.0 (PyPI is at 0.14.8 — 0.15.0
  is unpublished, so changes ride the existing version).
- python-mcp/pyproject.toml stays at 0.13.0 (no MCP API change AND
  PyPI is at 0.12.10 — unpublished, no bump needed).

Also: ruff --fix on three unused-import warnings in the new
python/tests/e2e/test_runner_replay_flow.py (asyncio, sys,
typing.Optional). All 481 pytest pass, web build green, Rust tests
green, clippy clean on the touched crates.

Made-with: Cursor
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 28, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
rewind Ready Ready Preview, Comment Apr 28, 2026 7:03am

Copy link
Copy Markdown
Collaborator Author

@risjai risjai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Santa Verdict: NAUGHTY

Both independent Santa reviewers returned FAIL. CI is green now, and this PR covers a lot of the Phase 3 lifecycle, but there are still blocking correctness/security issues in the runner replay path.

Blocking Findings

  • Shape A accepts strict_match but does not apply it. CreateReplayJobShapeA includes strict_match, but after create_replay_context(...) the code never calls set_replay_context_strict_match. The dashboard can request strict replay and still get warn-on-divergence behavior. Add the setter and an integration test proving strict runner replay returns/raises on divergence.

  • Runner replay does not bind live recordings to the fork timeline. ExplicitClient.attach_replay_context() only sets _session_id and _replay_context_id; it does not set _timeline_id. The dispatch payload also omits fork_timeline_id, and the SDK does not resolve the replay context's timeline before running the agent. Cache misses after the fork can therefore record new live steps into the root timeline instead of the replay fork. Include the fork/context timeline in the payload or make attach_replay_context() fetch context metadata and set _timeline_id before any live miss can record.

  • Shape B reuse does not validate cursor state or context expiry. It checks session ownership and in-flight jobs, but does not reject replay contexts whose cursor was already advanced (current_step != from_step) or expired/near-expired contexts. A reused context can start from the wrong ordinal. Add current-step and TTL validation as described in the plan.

  • Runner SDK ignores the dispatch payload base_url for callbacks. The payload includes base_url, but ProgressReporter builds event URLs from RunnerConfig.rewind_base_url / REWIND_URL. Dispatches only report progress correctly if the runner separately has matching config. Default the reporter to payload.base_url, with config override only when explicitly requested.

  • Outbound webhook HMACs are replayable. The signature covers job_id || body, but has no timestamp, delivery id, expiry window, or runner-side job-id/delivery dedupe before scheduling user code. A captured signed webhook can be replayed to launch the agent again even if later event callbacks are rejected. Add X-Rewind-Timestamp + tolerance window and runner-side idempotency keyed by job_id or delivery_id before invoking the handler.

  • SSRF validation is registration-time only and is vulnerable to DNS rebinding. validate_export_endpoint() is called when the runner URL is registered, but Dispatcher later uses reqwest against the stored URL and explicitly does not revalidate. If DNS changes after registration, the dispatcher can POST to private/link-local/metadata IPs. Revalidate/pin resolution at dispatch time, or use a connector/resolver that rejects private/reserved targets on connect. Strongly consider requiring HTTPS for non-loopback webhooks.

  • Dashboard progress can miss fast job updates. The hook/modal subscribes after job creation, and the WebSocket callback setup can miss transient ReplayJobUpdated broadcasts for fast jobs that complete before the modal is listening for the returned job_id. The UI should fetch current job state after dispatch and/or subscribe before dispatch with a correlation token; WebSocket events should be treated as invalidation, not the only source of truth.

Also Noted

  • The implementation reintroduces cancellation via API/UI despite the revised plan explicitly deferring cancellation until a cooperative runner protocol exists. Remove it for v1 or make it clear that this is only server-side abandonment and cannot stop the runner/agent.
  • The E2E path uses a custom stub runner that reads payload.base_url directly, so it does not exercise the shipped rewind_agent.runner SDK behavior where the callback URL bug lives.

Passes

  • CI is green.
  • Versioning looks consistent with the repo rules.
  • Runner token registration/regeneration and basic callback ownership checks are covered better than the earlier PRs.

Recommended Fix Set

  • Wire strict_match into Shape A replay context creation.
  • Ensure runner replay sets the fork timeline context before live misses can record.
  • Validate Shape B current_step == from_step and context freshness/TTL.
  • Make ProgressReporter use dispatch base_url by default.
  • Add dispatch freshness/idempotency protections to HMAC webhooks.
  • Revalidate/pin webhook destinations at dispatch time against SSRF policy.
  • Add tests for strict Shape A, fork-timeline live miss, Shape B consumed/expired contexts, SDK callback base URL, HMAC replay prevention, DNS rebinding/dispatch-time SSRF, and fast UI completion updates.

…ase_url, HMAC replay, dispatch SSRF, fast UI updates, remove cancel

7 blocking + 2 noted findings from the Santa review. All addressed.

F1 — strict_match accepted but ignored. Shape A now calls
set_replay_context_strict_match after create_replay_context when
the request body has strict_match=true. New tests:
shape_a_strict_match_true_sets_replay_context_strict_flag and
shape_a_strict_match_omitted_defaults_to_warn_only.

F2 — fork timeline binding. Dispatch payload now carries
replay_context_timeline_id (server-side). Shape B return also
populates fork_timeline_id with the existing context's timeline
(was null previously). Python ExplicitClient.attach_replay_context
takes an optional timeline_id and sets the _timeline_id contextvar
so live cache misses record into the fork timeline rather than the
root. DispatchPayload tolerates missing field for back-compat.

F3 — Shape B cursor + TTL validation. Refuses contexts whose
cursor != from_step (already mid-replay) and contexts older than
REPLAY_CONTEXT_TTL_SECS (1h, mirrors WebServer cleanup task).
ReplayContextRow gains last_accessed_at; Store helper
_test_set_replay_context_last_accessed enables fast TTL-expiry
tests. New: shape_b_rejects_context_whose_cursor_already_advanced,
shape_b_rejects_expired_context.

F4 — ProgressReporter ignored payload.base_url. Constructor now
takes optional base_url; asgi_handler builds the reporter from
payload.base_url so events POST back to the dispatcher's server
URL (which may differ from the runner's local REWIND_URL behind
a proxy). New: test_progress_reporter_prefers_explicit_base_url_over_config
+ test_progress_reporter_falls_back_to_config_when_base_url_omitted.

F5 — HMAC webhooks replayable. Signed input now includes a unix-
seconds timestamp: HMAC-SHA256(token, ts || \\n || job_id || \\n
|| body). New X-Rewind-Timestamp header sent + verified. Runner SDK
enforces ±5min tolerance window AND a process-local seen-cache of
recent job_ids — captured signed dispatches inside the tolerance
window can't relaunch the agent (200 + duplicate=true on second
arrival). compute_signature gains a timestamp parameter; legacy
two-line input still works for backward compat. Reference vectors
re-pinned in both Rust + Python tests.

F6 — SSRF only at registration. Dispatcher::dispatch now re-runs
url_guard::validate_export_endpoint at dispatch time, closing the
DNS-rebind window between registration and dispatch. Residual race
between this check and reqwest's connection-time re-resolve
documented as a known limitation requiring a custom hyper connector
(deferred). REWIND_ALLOW_LOOPBACK_WEBHOOKS dev escape applies to
both checks.

F7 — UI race on fast jobs. useReplayJob hook now treats WebSocket
frames as INVALIDATION (refetch GET /api/replay-jobs/{id}) rather
than the source of truth. After dispatch returns 202, immediately
fetch the current job state to seed UI in case the job already
advanced past pending before the WS subscription wired up. Fixes
the missing-fast-update class of bugs.

N1 — cancellation removed (deferred to v3.1 per the plan). Dropped
DELETE /api/replay-jobs/{id}, the cancel button, the API client
method, and updated docs/runners.md. Operators kill the runner
process externally; the lease reaper marks the job errored.

N2 — E2E now verifies the F2 + F5 wire-format additions
(replay_context_timeline_id field + X-Rewind-Timestamp header +
3-line signature recipe). The SDK code paths are exercised by
unit tests in test_runner.py (test_progress_reporter_prefers_*,
test_verify_signature_rejects_stale_timestamp,
test_asgi_handler_replay_attack_short_circuits_with_duplicate)
which run faster + more reliably than spinning a dedicated event
loop per HTTP request inside HTTPServer threads.

Tests: 16 replay_jobs_tests pass (was 12, +4); 28 runner unit tests
pass (was 19, +9 for F4/F5/F2 coverage); 108 rewind-web lib tests
(adds 2 new dispatcher tests covering timestamp variation +
3-line signing input); 102 rewind-store tests (the new
_test_set_replay_context_last_accessed helper unused at unit-level);
490 pytest pass; clippy + ruff clean; web build green.

Made-with: Cursor
@risjai
Copy link
Copy Markdown
Collaborator Author

risjai commented Apr 28, 2026

All 7 blockers + 2 noted findings addressed in 27bdf09.

# Severity Finding → Fix
F1 BLOCKER strict_match accepted but ignored → Shape A now calls set_replay_context_strict_match after creating the context. Tests: shape_a_strict_match_true_sets_replay_context_strict_flag, shape_a_strict_match_omitted_defaults_to_warn_only.
F2 BLOCKER Live recordings not bound to fork timeline → dispatch payload now carries replay_context_timeline_id; Shape B response also returns it. Python attach_replay_context(session_id, replay_context_id, timeline_id=...) sets _timeline_id contextvar. Tolerant decode for back-compat.
F3 BLOCKER Shape B doesn't validate cursor/TTL → refuses contexts whose current_step != from_step or older than 1h TTL. New Store::_test_set_replay_context_last_accessed helper for fast TTL-expiry tests. Tests: shape_b_rejects_context_whose_cursor_already_advanced, shape_b_rejects_expired_context.
F4 BLOCKER ProgressReporter ignored payload base_url → constructor takes optional base_url; asgi_handler builds it from payload.base_url. Falls back to config.rewind_base_url for direct/test usage. Tests: test_progress_reporter_prefers_explicit_base_url_over_config, test_progress_reporter_falls_back_to_config_when_base_url_omitted.
F5 BLOCKER HMAC webhooks replayable → signed input now HMAC-SHA256(token, timestamp \\n job_id \\n body). New X-Rewind-Timestamp header with ±5min tolerance enforced by SDK. Process-local seen-cache of recent job_ids defeats replays inside the tolerance window (200 + duplicate=true). Reference vectors re-pinned both sides. Tests: test_verify_signature_rejects_stale_timestamp, test_verify_signature_rejects_future_timestamp, test_asgi_handler_replay_attack_short_circuits_with_duplicate + matching Rust compute_signature_changes_when_timestamp_changes.
F6 BLOCKER Registration-time SSRF only → Dispatcher::dispatch re-runs url_guard::validate_export_endpoint at dispatch time. Closes the DNS-rebind window between registration and dispatch. Residual race against reqwest's connect-time re-resolve documented as a known limitation requiring a custom hyper connector (deferred to v3.1).
F7 BLOCKER UI misses fast updates → useReplayJob treats WebSocket frames as INVALIDATION (refetch GET /api/replay-jobs/{id}) rather than source-of-truth. After dispatch returns 202, immediately fetches current job state to seed UI. Out-of-order / missing frames both resolve to "latest server state".
N1 NOTED Cancellation shouldn't have shipped (deferred to v3.1) → removed DELETE /api/replay-jobs/{id} endpoint, the dashboard cancel button, the API client method, and updated docs/runners.md to point operators at "kill the runner process; lease reaper marks errored".
N2 NOTED E2E doesn't exercise SDK base_url logic → SDK code paths now have explicit unit-test coverage in test_runner.py (the F4 reporter tests above + replay-attack idempotency). The e2e was rewritten to verify the F2 (replay_context_timeline_id) + F5 (X-Rewind-Timestamp + 3-line signing recipe) wire-format additions; the raw stub stays for test-runtime stability (a dedicated event loop per HTTPServer request was flaky).

Test posture after the fix:

  • 16 replay_jobs_tests pass (was 12, +4 for F1 + F3)
  • 28 runner unit tests pass (was 19, +9 for F4 + F5 + F2 coverage)
  • 108 rewind-web lib tests (adds 2 new dispatcher tests covering timestamp variation + 3-line signing input)
  • 102 rewind-store tests (_test_set_replay_context_last_accessed helper exposed publicly with _test_ prefix + docstring contract)
  • 490 pytest pass (1 skip), ruff clean, clippy clean on touched crates, web build green

Wire-format changes (operators take note):

CI will pick up the push. Ready for re-review.

Copy link
Copy Markdown
Collaborator Author

@risjai risjai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Santa Re-Review: still NAUGHTY

Most of the previous blockers are addressed and CI is green, but I still see one remaining correctness blocker plus one important subprocess/bootstrap gap.

Remaining Blocker

  • Runner callback race can still drop the started event and leave jobs stuck. The Python runner asgi_handler() schedules _run() before returning 202; _run() immediately calls reporter.started(). The server only transitions pending -> dispatched after the dispatcher receives the runner’s 202 and applies the outcome. If started arrives while the job is still pending, the state machine rejects it (Started is only legal from dispatched). ProgressReporter only logs the 409, so the handler continues and later progress/completed events can also be rejected or leave the job in dispatched until the reaper marks it errored.

Suggested fixes:

  • Move job to dispatched before sending the webhook, then revert/mark errored if dispatch fails; or
  • Allow authenticated started to transition pending -> in_progress for this dispatch race; or
  • Change the runner SDK to send started only after a server-ack/state poll confirms dispatched.

Also Still Concerning

  • Env-var bootstrap still doesn’t carry the timeline id. _install.py reads REWIND_SESSION_ID and REWIND_REPLAY_CONTEXT_ID, but not a replay context timeline id. So subprocess-based agents using env bootstrap can still miss _timeline_id, meaning live misses may record into the wrong timeline. The direct SDK path supports attach_replay_context(..., timeline_id=...), but the env path does not.

Please add REWIND_REPLAY_CONTEXT_TIMELINE_ID (or equivalent) to the dispatch/subprocess environment and pass it through to attach_replay_context() during bootstrap.

Two findings from the second Santa pass on PR #154.

BLOCKER — runner callback race drops `started`. The Python SDK's
asgi_handler() schedules _run() (which immediately calls
reporter.started()) before the dispatcher's apply_outcome flips
state pending->dispatched. Under the strict state machine from
#152 round 2, the server rejected `Started` from `Pending` as
illegal: runner saw 409, ProgressReporter only logged it, the run
proceeded anyway, and the job sat in `pending` until the reaper
marked it errored.

Fix: transition pending->dispatched SYNCHRONOUSLY in the HTTP
handler before tokio::spawn. The runner now ALWAYS finds the job
in `dispatched` when it tries to call back. apply_outcome is
restructured: success path only sets deadline+lease (state already
advanced); failure path transitions dispatched->errored (legal
startup-failure transition per the existing state machine). If
`started` lands first (very fast runners), the failure-path
transition becomes a no-op (SQL guard refuses Dispatched->Errored
when current state is in_progress) and the run proceeds.

The response body's `state` field also flips from `pending` to
`dispatched` accordingly — dashboards see the real state immediately
without polling.

CONCERNING — env-var bootstrap missing timeline. _install.py
recognized REWIND_SESSION_ID + REWIND_REPLAY_CONTEXT_ID but not
REWIND_REPLAY_CONTEXT_TIMELINE_ID, so subprocess-based agents
using env bootstrap re-introduced the F2 bug for that path.
intercept._install._bootstrap_replay_context_from_env now reads
the env var and forwards it to attach_replay_context(timeline_id=...).
A WARN log fires when the env var is missing.

Tests:
- 18 replay_jobs_tests pass (was 16, +2 new):
  create_replay_job_response_state_is_dispatched_not_pending and
  started_event_accepted_immediately_after_dispatch_response prove
  the race is closed.
- 30 runner unit tests pass (was 28, +1 new env-var-with-timeline
  test; existing test updated to assert _timeline_id stays None
  when REWIND_REPLAY_CONTEXT_TIMELINE_ID is absent).
- 108 rewind-web lib tests, 102 rewind-store, 491 pytest, ruff
  + clippy clean, web build green.

docs/runners.md updated with REWIND_REPLAY_CONTEXT_TIMELINE_ID
recommendation in the env-var bootstrap example.

Made-with: Cursor
@risjai
Copy link
Copy Markdown
Collaborator Author

risjai commented Apr 28, 2026

Both findings from the round-2 Santa pass addressed in f795f2b.

# Severity Finding → Fix
BLOCKER Runner callback race drops started create_replay_job now transitions pending → dispatched SYNCHRONOUSLY before tokio::spawn. The runner ALWAYS finds the job in dispatched when it tries to call back. apply_outcome restructured: success path only sets deadline+lease; failure path transitions dispatched → errored (legal startup-failure transition per #152 round 2). If started lands first (very fast runners), the failure-path transition becomes a no-op (SQL guard refuses Dispatched → Errored when current state is in_progress) and the run proceeds. Response state field flips from pending to dispatched accordingly — dashboards see the real state immediately.
CONCERNING Env-var bootstrap missing timeline _install._bootstrap_replay_context_from_env now reads REWIND_REPLAY_CONTEXT_TIMELINE_ID and forwards it to attach_replay_context(timeline_id=...). WARN log fires when the env var is missing (subprocess-bootstrap path matches the direct SDK path). Documented in docs/runners.md with example subprocess.run(env=...) snippet.

Picked option (a) from the suggested fix set ("Move job to dispatched before sending the webhook, then revert/mark errored if dispatch fails") — preserves the strict state machine from #152 round 2 without re-widening it. Options (b) and (c) would either weaken the state machine (b) or require runner-side polling (c, additional round-trip per dispatch).

Subtle property: if a very fast runner emits started AFTER the pre-spawn transition but BEFORE the dispatcher's HTTP call returns, the apply_outcome failure transition (Dispatched → Errored) becomes a no-op because the SQL guard WHERE state NOT IN ('completed','errored') matches but the current state is now in_progress, not dispatched — so the UPDATE matches in_progress rows too. Wait, that means a started-then-failed-dispatch combo could mark in_progress jobs errored. Let me re-verify the guard...

Actually the guard is WHERE id = ? AND state NOT IN ('completed', 'errored') — it matches pending, dispatched, or in_progress. So a failure-path Errored transition WOULD overwrite an in_progress job. That's a small follow-up to consider: tighten the failure-path guard to only target dispatched state. Filed for follow-up; non-blocking for this PR since the practical race window (between pre-spawn transition and apply_outcome) is the duration of the dispatch HTTP call, ~tens to hundreds of ms.

Test posture:

  • 18 replay_jobs_tests pass (was 16, +2): create_replay_job_response_state_is_dispatched_not_pending and started_event_accepted_immediately_after_dispatch_response prove the race is closed.
  • 30 runner unit tests pass (was 28, +1): new test_install_bootstraps_with_timeline_id_from_env covers the env-var path; existing test_install_bootstraps_from_env updated to assert _timeline_id stays None when the env var is absent.
  • 491 pytest pass; clippy + ruff clean; web build green.

CI will pick up the push.

…view #154 round 2 follow-up)

Self-caught follow-up to f795f2b. The pre-spawn pending->dispatched
transition opened a race where the runner's `started` event could
land while the dispatcher's HTTP call is still in flight, advancing
the job to in_progress. If that HTTP call THEN failed, the
failure-path apply_outcome's advance_replay_job_state(Errored)
matched the in_progress row (the existing SQL guard only excludes
terminal states) and clobbered legitimate started state.

Fix: new Store::mark_dispatched_job_as_errored helper with the
strict guard `WHERE id = ? AND state = 'dispatched'`. The
failure-path apply_outcome now uses this. UPDATE is a no-op when
state has advanced to in_progress, preserving the legitimate run.

Includes a regression test
(failed_dispatch_does_not_clobber_already_in_progress_job) that
inserts an in_progress row directly, calls
mark_dispatched_job_as_errored, and asserts the row is unchanged.

19 replay_jobs_tests pass (was 18, +1). Full suite still green.

Made-with: Cursor
@risjai
Copy link
Copy Markdown
Collaborator Author

risjai commented Apr 28, 2026

Self-caught follow-up to my previous comment, pushed as 023bd3f.

I called out the in_progress-clobber risk in my reply ("that means a started-then-failed-dispatch combo could mark in_progress jobs errored"), then realized I should fix it now rather than file as follow-up. The risk is real — pre-spawn transition + fast runner + failed dispatch HTTP call would have the failure path's advance_replay_job_state(Errored) match the in_progress row.

Fix: new Store::mark_dispatched_job_as_errored helper with the strict guard WHERE id = ? AND state = 'dispatched'. The failure-path apply_outcome uses this; UPDATE is a no-op when state has already advanced to in_progress, so the legitimate run continues.

New test failed_dispatch_does_not_clobber_already_in_progress_job inserts an in_progress row directly, calls the helper, and asserts the row is unchanged.

19 replay_jobs_tests pass (was 18, +1). Full suite remains green.

@risjai risjai merged commit 7fe91f8 into master Apr 28, 2026
7 checks passed
@risjai risjai deleted the feat/phase-3-rest branch April 28, 2026 07:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant