Add direct recording mode — no proxy needed#4
Merged
Conversation
`rewind_agent.init()` now records LLM calls directly to ~/.rewind/ by monkey-patching OpenAI/Anthropic SDK clients in-process. No proxy server, no env vars, no second terminal. Supports sync + async, streaming + non-streaming. Recorded sessions are immediately readable by `rewind show`, `rewind inspect`, and the MCP server — full compatibility with the Rust store format. New files: - store.py: Pure Python store (sqlite3 + SHA-256 blob store) - recorder.py: Monkey-patching recorder with streaming wrappers - tests/: 28 tests covering store, recorder, concurrency, and error resilience Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This was referenced Apr 9, 2026
Merged
risjai
added a commit
that referenced
this pull request
Apr 12, 2026
- Fix #1 (BUG): Thread session_name through _init_proxy so fallthrough creates the session with the caller's name, not hardcoded "default" - Fix #2 (DESIGN): _init_proxy returns bool instead of mutating _mode via hidden side effect; init() handles the mode switch - Fix #4: Remove dead urllib.error import - Fix #5: Gate proxy health endpoint on GET method only - Fix #6: Add version field to proxy health response for consistency with rewind-web - Fix #7: Add 5 new tests — slow proxy timeout, non-Rewind server rejection, session name preservation, uninit after fallthrough, init() mode assertion - Fix #8: Validate health response body (check "status": "ok") to prevent false positives from non-Rewind services on the same port Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
risjai
added a commit
that referenced
this pull request
Apr 12, 2026
- Fix #1 (HIGH): Add `from __future__ import annotations` for Python 3.9 compat — `str | None` syntax is 3.10+ - Fix #2 (MEDIUM): Remove APITimeoutError from detection — timeouts may originate from slow upstream, not dead proxy. False trips when proxy is healthy but LLM is slow. - Fix #3 (MEDIUM): Skip record_success() for streaming calls — stream object returned doesn't mean data has flowed. Proxy could die mid-stream after failure count was reset. - Fix #4 (MEDIUM): Add retry-path test — verifies direct recorder stores steps correctly after circuit trips to OPEN - Fix #5 (LOW): Copy `organization` in throwaway OpenAI client for org-scoped API key compatibility Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
risjai
added a commit
that referenced
this pull request
Apr 12, 2026
* feat: mid-session circuit breaker for proxy mode (Phase 3) Add a ProxyCircuitBreaker that detects proxy failure mid-session and transparently falls through to direct recording. The circuit breaker: - Detects failures from LLM SDK connection errors (APIConnectionError, APITimeoutError) — no per-request health checks, zero latency on the happy path - Trips to OPEN after 2 consecutive connection failures - In OPEN state, preemptively skips the proxy and retries via a throwaway client pointing at the original upstream URL (thread-safe, no shared mutable state) - Creates a local Store + Recorder for direct-mode recording while proxy is down (session named "<original> (proxy-fallback)") - After 30s recovery timeout, probes the proxy in HALF_OPEN state - On successful probe, tears down direct resources and resumes proxy mode Design decisions (from review): - Throwaway client for retries (resolves thread-safety issue with base_url swap — review finding #1) - Preemptive skip in OPEN state (zero latency penalty — finding #2) - New session in local store on trip (simple, no cross-store merge — finding #3) - APITimeoutError included in detection (finding #5) Known limitations: - Mid-stream proxy failure cannot be retried (LLM responses are non-deterministic) - Timeout detection takes SDK timeout period (60-120s) × threshold 23 new tests: error detection (7), state machine (13), integration (3). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address PR #51 review findings - Fix #1 (HIGH): Add `from __future__ import annotations` for Python 3.9 compat — `str | None` syntax is 3.10+ - Fix #2 (MEDIUM): Remove APITimeoutError from detection — timeouts may originate from slow upstream, not dead proxy. False trips when proxy is healthy but LLM is slow. - Fix #3 (MEDIUM): Skip record_success() for streaming calls — stream object returned doesn't mean data has flowed. Proxy could die mid-stream after failure count was reset. - Fix #4 (MEDIUM): Add retry-path test — verifies direct recorder stores steps correctly after circuit trips to OPEN - Fix #5 (LOW): Copy `organization` in throwaway OpenAI client for org-scoped API key compatibility Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: move imports to top of file to satisfy ruff E402 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
risjai
added a commit
that referenced
this pull request
Apr 13, 2026
Fixes all items from PR #56 review: #1 (P0 bug) Retry by OpenAI exception type, not string matching: - New `_is_retryable()` checks `openai.RateLimitError`, `openai.APIStatusError(500/502/503)`, `ConnectionError`, `TimeoutError` - "Token limit 500 exceeded" no longer triggers false retries - Added 7 tests including retry-then-succeed integration test #2 (P0 bug) Add --force flag to bypass stale score cache: - `rewind eval score --force` skips the `get_timeline_score` cache check - Upsert in `create_timeline_score` now reachable after code changes #3 (P1 security) Sanitize API keys from stderr before storing: - `sanitize_secrets()` redacts `sk-[a-zA-Z0-9_-]{10,}` patterns - Applied to LLM judge subprocess stderr before writing to reasoning - 3 tests for single key, multiple keys, and clean text #4 (P2) Remove unused `_session_id` from `extract_timeline_output()`: - Public API now takes `(store, timeline_id)` — cleaner signature - All callers (CLI + tests) updated #5 (P2) Compare each fork against main, not just first vs last: - Delta section now shows one line per fork, each compared to main - Works correctly for 3+ timelines #6 (P3) ValueError exits with code 1 in subprocess: - `ValueError` and `RuntimeError` now exit non-zero so Rust surfaces them as errors, not misleading zero scores - Test added for correctness-without-expected path Tests: 26 Rust + 41 Python passing, clippy clean. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This was referenced Apr 14, 2026
risjai
added a commit
that referenced
this pull request
Apr 21, 2026
…I wiring Addresses the review on #133. Must-fix (reviewer #1, #2): - Bump rand from 0.8 → 0.9 so the workspace ships a single rand major version (previously duplicated alongside quinn/proptest/tungstenite/otel) - Drop the generic `B` and `transmute_body` helper from auth_middleware. axum 0.8's from_fn_with_state passes Request<axum::body::Body> directly. Should-fix (reviewer #3, #4, #6): - Scope `?token=<token>` query-param fallback to /api/ws ONLY. Browsers can't set Authorization on WebSocket upgrades; REST must not accept this form because it leaks via Referer, server logs, and browser history. - Fail-closed now also catches IPv6 :: via is_loopback() already returning false for unspecified addrs (added explicit test). - Token file creation is now atomic: OpenOptions::create_new + mode(0o600). Closes the umask window where fs::write created at 0644 before the subsequent chmod tightened it. Also handles AlreadyExists (concurrent writer won the race → re-read their token). UI wiring for token-enabled deployments: - New web/src/lib/auth.ts — minimal localStorage-backed token holder - api.ts injects Authorization: Bearer on every request; prompts the user and retries once on 401, clears on second 401 - use-websocket.ts appends ?token= to the WS URL when a token is stored - SPA static assets still load without a token (only /api/* and /api/ws are gated), so the UI loads the login prompt fine Nits: - Add TODO for `rewind auth rotate` command (follow-up) - Tighten resolve_with_env to pub(crate) — it's test-only, not API surface - Replace tmp.keep() with a returned TempDir guard so test data dirs are cleaned up on drop - Replace flaky 100ms timeout sentinels with health-probe loops that wait for /_rewind/health to return 200 Tests: - 17/17 auth_tests pass (was 12) — new: IPv6 ::, REST rejects ?token=, WS accepts/rejects with ?token=, concurrent token-file race - Full workspace: 360+ tests, 0 failures - Clippy clean on all modified files - Manual smoke: dashboard loads, Bearer-auth REST works, WS upgrade with ?token= returns 101, without returns 401 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This was referenced Apr 21, 2026
risjai
added a commit
that referenced
this pull request
Apr 21, 2026
…own (#3+4) (#135) * feat(security): blob redaction, hop-by-hop filtering, query_raw lockdown Bundles audit ship-order PRs #3 and #4 into a single PR. HIGH-01 + MEDIUM-06 + HIGH-06 (blob redaction): - New `crates/rewind-proxy/src/redact.rs` module with: - `redact_request_body()` — strips values of sensitive JSON keys (api_key, authorization, token, password, secret, credentials, etc.) from request bodies before blob store write - `redact_secrets()` — regex-based pattern matching on response blobs for OpenAI keys (sk-...), AWS access key IDs (AKIA...), Bearer tokens, and long hex tokens (40+ chars). Applied in both buffered and streaming response paths. - 8 unit tests for redaction logic MEDIUM-08 (hop-by-hop headers): - `redact::HOP_BY_HOP_HEADERS` denylist applied in the proxy header- forwarding loop. Strips transfer-encoding, te, trailer, upgrade, proxy-authorization, proxy-connection, keep-alive, expect per RFC 7230 §6.1. HIGH-02 (query_raw lockdown): - Removed PRAGMA and EXPLAIN from the query_raw allowlist — only SELECT and WITH are now permitted. Many PRAGMAs mutate state (journal_mode, foreign_keys, writable_schema). - New `Store::pragma_table_info()` validates the table name against sqlite_master before constructing the query, replacing the unsafe `format!("PRAGMA table_info({})", name)` pattern in the CLI. Audit doc updated: HIGH-01, HIGH-02, HIGH-06, MEDIUM-06, MEDIUM-08 all marked as fixed in PR #135. Ship order table updated. Tests: full workspace 370+/0, clippy clean on touched files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(security): address PR #135 review — raw SSE redaction, WITH-DML guard, nits Addresses blocking + should-fix items from the code review. 🔴 Blocker resolved: - Raw SSE blob (accumulated_raw) now goes through redact_secrets() before blobs.put. Previously the synthetic response was redacted but the raw copy was written verbatim — a real secret leak on disk. 🟡 Should-fix resolved: - query_raw now checks stmt.readonly() after prepare. Catches WITH-prefixed DML (WITH x AS (...) DELETE/UPDATE/INSERT) that passed the first-word allowlist. SQLite's own read-only classifier is authoritative. - Dropped the over-aggressive [0-9a-f]{40,} regex pattern that was redacting git SHAs, blob store hashes, and request IDs. The three named patterns (sk-, AKIA, Bearer) cover the actual threat. - Binary/compressed response bodies now log a tracing::debug warning instead of silently skipping redaction. Full gzip decompression deferred to a follow-up (requires enabling reqwest gzip feature). 🟢 Nits: - Replaced once_cell::sync::Lazy with std::sync::LazyLock (stdlib); dropped once_cell dep from rewind-proxy - is_hop_by_hop now lowercases internally (matches is_sensitive_header behavior; callers no longer need to pre-lowercase) - Stale doc comment on query_raw cleaned up Tests: 370+/0, clippy clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
risjai
added a commit
that referenced
this pull request
Apr 27, 2026
Both Santa reviewers returned FAIL with 6 blocking issues plus failing CI on Rust clippy. This commit addresses all of them. Verifies clean on `cargo clippy -- -D warnings`, `cargo test --workspace` (350+ tests, all green), and `pytest tests/test_intercept_core.py` (32 tests). ## Fix → Finding mapping ### CI clippy (was the gating signal — both Ubuntu and macOS jobs red) - **type_complexity** at `crates/rewind-store/src/db.rs:709` — extracted the 5-tuple return type into a named `ReplayContextRow` struct in `crates/rewind-store/src/models.rs`. Names also document each field at call sites (`ctx.session_id` vs `ctx.0`). - **manual_div_ceil** at `crates/rewind-store/src/envelope.rs:165` — replaced `(bytes.len() + 2) / 3 * 4` with `bytes.len().div_ceil(3) * 4` in the base64 capacity hint. Same arithmetic, idiomatic. ### #2 — strict_match unusable + cursor advances before validation The flag existed on the schema (`replay_contexts.strict_match`) but wasn't reachable from any client API, and `do_replay_lookup` advanced the ordinal cursor before content validation. Strict-mode 409 would consume an ordinal slot, so a retry with the corrected request would land on the wrong recorded step. - `CreateReplayContextRequest` (`crates/rewind-web/src/api.rs`) gains optional `strict_match` field (default false). The handler calls `set_replay_context_strict_match` after creating the context. The response echoes `strict_match` so callers can verify. - `ExplicitClient.start_replay` and `replay_from_iteration` (`python/rewind_agent/explicit.py`) gain `strict_match: bool = False` parameter. When true, sends the field in the request body; older servers that don't recognize it ignore it (forward-compat). - New `Store::peek_next_replay_step(ctx_id) -> u32` (`crates/rewind-store/src/db.rs`) reads `current_step + 1` without mutating. `do_replay_lookup` peeks → validates → only calls `advance_replay_context` on the success path (cache hit served, with or without warn-divergence). Strict 409 returns early without advancing — caller can retry the corrected request without ordinal drift. ### #3 — replay_cache lacked format column, proxy used JSON-shape sniff The `replay_cache` table didn't carry `response_blob_format`, so `crates/rewind-proxy/src/lib.rs` used a fragile heuristic (`raw[0] == b'{' && contains "status:" && contains "headers:" && contains "body:"`) to guess whether a cached blob was envelope-v1. - New v0.13 migration: `ALTER TABLE replay_cache ADD COLUMN response_blob_format INTEGER NOT NULL DEFAULT 0`. Pre-migration cache rows decode as 0 (legacy naked body) preserving back-compat. - `Store::cache_put` signature gains `response_blob_format: u8`. Proxy callers pass `step.response_blob_format` (typically `FORMAT_ENVELOPE_V1` for v0.13+ writes). - `Store::cache_get` reads the column via Option<u8> defensive read so the migration timing window doesn't crash. CacheEntry struct gains `response_blob_format` field. - Proxy instant-replay cache hit path replaces the sniff heuristic with a direct `ResponseEnvelope::from_blob_bytes(cached.response_blob_format, &raw)` call. Replayed step inherits `cached.response_blob_format`. ### #4 — Web API list/detail/preview paths don't unwrap envelope `extract_preview` and `blob_to_json` in `crates/rewind-web/src/api.rs` read response_blob bytes directly and JSON-parsed. With v0.13's envelope writes, those paths would surface envelope wrapper JSON (`{status, headers, body}`) instead of the inner model response. - Renamed `blob_to_json` → `request_blob_to_json` (request bodies have no envelope wrapper, semantics unchanged) + new `response_blob_to_json` that takes `(store, blob_hash, format)` and unwraps via `ResponseEnvelope::from_blob_bytes` before JSON-parsing. - Internal helper `read_response_blob_unwrapped` shared between `extract_preview` and `response_blob_to_json` so the unwrap policy lives in exactly one place. - `extract_preview` and `extract_preview_from_store` (the public re-export used by `ws.rs`) take `response_blob_format: u8`. All four call sites in `api.rs` and the one in `ws.rs` updated to pass `s.response_blob_format` / `step.response_blob_format`. - Step detail endpoint also updated — the inline ad-hoc parse block (lines 226-240) now goes through the helper. Pre-migration steps (`response_blob_format=0`) round-trip via `ResponseEnvelope::from_blob_bytes`'s legacy fallback (synthetic 200, empty headers, body=raw bytes), so any blob recorded before v0.13 parses identically to today. ### #5 — Header scrubbing incomplete `HOP_BY_HOP_HEADERS` in `crates/rewind-store/src/redact.rs` did not include `Connection` itself, did not strip headers nominated by the `Connection` value (RFC 7230 §6.1), and `scrub_response_headers` did not strip `x-api-key`. - Added `connection` to `HOP_BY_HOP_HEADERS`. - New `redact::connection_nominated_headers(headers)` helper parses the Connection value and returns the lower-cased set of nominated header names. Empty when no Connection header is present. - `envelope::scrub_response_headers` now materializes the input, computes nominated headers, and strips: - Hop-by-hop (incl. Connection itself) - Headers nominated by Connection's value - Set-Cookie, Authorization (existing) - x-api-key (new — same rationale as Authorization for token-based providers like OpenAI / Anthropic / Azure) - Two new tests for Connection-nominated header stripping (header order doesn't matter — Connection can appear before OR after the nominated header). x-api-key added to existing scrubber tests. ### #6 — Python streaming detection fragile for large JSON bodies `json.loads(sniff)` on the truncated 8 KB sniff window would raise JSONDecodeError on any body large enough to exceed the window — even when `"stream": true` was at the very top of the body. The detector silently returned False, meaning streaming requests would not get the synthetic-SSE replay path. - `python/rewind_agent/intercept/_core.py` replaces JSON parsing with a regex match on `(?<![\w_])"stream"\s*:\s*true(?![\w])`. Handles whitespace variants (`"stream":true`, `"stream" : true`, etc.), matches partial bodies, can't false-positive on nested keys (`"upstream"`, `"my_stream"`, `"stream_options"`). - The regex is also faster than json.loads for the negative case (no parsed-object allocation thrown away). - 3 new regression tests: - Truncated body with stream:true at top → still detects True (was False under json.loads) - Nested keys do NOT false-positive (upstream, my_stream, etc.) - Huge 1 MB body with stream:true at top → still detects True ## Other touch points - `python/rewind_agent/explicit.py::start_replay` accepts and forwards `strict_match: bool = False` parameter; only sends the field in the POST body when true so older servers ignore it gracefully. - Replay_cache schema migration runs idempotently — the `let _ = ALTER TABLE` pattern matches the v0.5/0.6/0.12 migrations. ## Tests - `cargo clippy -- -D warnings` — clean (matches CI's invocation; see `.github/workflows/ci.yml`). - `cargo test --workspace` — 350+ tests, all green. - rewind-store: 50 (was 46; added 4 regression tests for Santa #2 and #3 — peek doesn't advance, advance after peek increments, strict_match round-trips, cache_format round-trips). - rewind-web: 80 (no regressions despite the helper rename). - rewind-proxy: 12 + 5 integration (no regressions). - `python -m pytest tests/test_intercept_core.py` — 32 tests, all green (was 30; added 3 regression tests for Santa #6: truncated body with whitespace variants, nested key false-positives, 1 MB body). ## Versions No version bumps — the original PR's 0.13.0 / 0.15.0 still apply. Fix commits don't change semver category. Made-with: Cursor
Merged
6 tasks
risjai
added a commit
that referenced
this pull request
Apr 27, 2026
Self-audit while answering "is Phase 0 complete?" surfaced 7 additional read sites that bypassed the v0.13 envelope unwrap policy. Santa review #4 was scoped to `crates/rewind-web/src/api.rs`, but the broader principle ("unwrap envelope before surfacing response_blob to a user") applies everywhere a `Step.response_blob` is rendered. Without these fixes, v0.13+ proxy-recorded sessions would show {status, headers, body} wrapper JSON in the CLI, MCP, TUI, replay-diff, and OTel export surfaces — a user-visible regression that would land in v0.13 GA. ## Approach: centralized helper, route every read through it Added `Store::read_step_response_body(&Step) -> Option<Vec<u8>>` and `Store::read_step_response_json(&Step) -> Option<serde_json::Value>` in `crates/rewind-store/src/db.rs`. Both consult `step.response_blob_format` and dispatch to `ResponseEnvelope::from_blob_bytes`, which: - format=0 (legacy / explicit-API): returns raw bytes as the body - format=1 (envelope-v1): unwraps to the inner body - unknown: degrades to legacy fallback (forward-compat) Pre-migration data round-trips unchanged because the column DEFAULT 0 combined with the legacy fallback means the helper produces the exact same bytes as a direct `blobs.get(&step.response_blob)` read used to. ## Fixed surfaces (one commit, atomic) - **`rewind-cli` `view` command** (`main.rs:1106-1114`): replaced the manual `blobs.get → utf8 → from_str` chain with `read_step_response_json`. The recorded-session preview line now shows the model response on v0.13+ proxy-recorded data instead of the envelope wrapper. - **`rewind-cli` `failures` command** (`main.rs:3663-3681`): three separate reads (failure_response, preceding_step request, preceding_step response) — request is naked JSON (untouched), response now uses the helper. - **`rewind-mcp` `get_step` tool** (`server.rs:382-385`): MCP responses to LLM agents querying Rewind now contain the actual model response, not the envelope wrapper. Agents that decide based on prior step outputs get correct context. - **`rewind-mcp` `extract_response_preview`** (`server.rs:1547-1582`): changed signature from `(&Store, response_blob: &str)` to `(&Store, step: &Step)` so it can route through the helper. Both call sites updated (`list_steps` summary + nested span steps preview). - **`rewind-tui` Response panel** (`lib.rs:345-349`): introduced `format_response_blob_for_display(step)` that goes through the envelope-aware helper. The legacy `format_blob_for_display(hash, is_request)` retains its raw-bytes path for request blobs (which never carry an envelope) and gains a `debug_assert!` documenting that the `is_request=false` branch should not be used for v0.13+ response data. - **`rewind-replay` `step_summary`** (`lib.rs:307-343`): fork-diff preview text is now extracted from the inner body. Branch fork-diff views show meaningful response previews instead of envelope JSON. - **`rewind-otel` extract pipeline** (`extract.rs`): split the blob resolution loop into request_hashes (naked JSON) + response_refs (hash + format pairs). Response blobs go through `ResponseEnvelope::from_blob_bytes(format, &raw)` so the cached Value handed to `attributes::llm_call_attributes` is the inner model response. `gen_ai.output.messages` and other span attributes exported to Datadog/Jaeger/OTLP collectors now contain real model output instead of envelope wrappers. ## Tests New regression coverage proving the helper's contract and the OTel pipeline behavior. Surface-level tests are unnecessary because every surface now routes through the same helper — the helper tests are the contract for all of them. - `rewind-store` (50 → 54): 4 helper tests - `read_step_response_body_returns_raw_for_legacy_format` — pre-migration format=0 round-trips as raw bytes - `read_step_response_body_unwraps_envelope_format` — format=1 returns inner body; explicit assertions that the wrapper fields (`status`, `headers`, `body`) are NOT in the parsed JSON - `read_step_response_body_returns_none_for_empty_blob` — hook events / in-flight steps with no body return None - `read_step_response_body_degrades_for_unknown_format` — forward-compat: unknown format values fall through to legacy fallback, no panic - `rewind-otel` (54 → 56): 2 extract tests - `extract_unwraps_envelope_format_response_blob` — OTel cache holds inner body; explicit assertions that wrapper fields don't leak into the attribute pipeline - `extract_round_trips_legacy_format_response_blob` — pre-migration data reads identically to v0.12 Total Rust tests: 350+ → 356, all green. Python intercept tests: 32, all green (unchanged). `cargo clippy -- -D warnings` clean. ## Audited (no fix required) - **`rewind-proxy::redact` relocation external compat** (PR test plan item, was unverified): `rewind-proxy/src/lib.rs:5` re-exports the module via `pub use rewind_store::redact;`, so any out-of-tree consumer still imports `rewind_proxy::redact::*` correctly. No in-tree consumers depend on the old path. - **Pre-migration upgrade scenario** (PR test plan item, was unverified): the `read_step_response_body_returns_raw_for_legacy_format` test simulates this exactly — a step with `response_blob_format=0` and `request_hash=None` (the v0.12.x shape) reads identically to before. Combined with the OTel `extract_round_trips_legacy_format` test, the upgrade path is mechanically verified. ## What's still NOT in this PR (deferred to Tier 1) - Tier 1 transport adapters (httpx, requests, aiohttp). Land in next PR off this branch. - Tier 2 decorator (`cached_llm_call` on ExplicitClient). - Tier 3 runner registry + dashboard "Run replay" button. - Documentation rewrites — held until Tier 1 GA so users hit accurate guidance. - Ray-agent port to `intercept.install()`. Made-with: Cursor
risjai
added a commit
that referenced
this pull request
Apr 27, 2026
Santa returned NAUGHTY with 7 blocking findings. This commit addresses all of them and adds 7 directed regression tests. Pre-push verification suite (ruff + pytest local + pytest bare-env + clippy + cargo test) all green — including a CI-equivalent simulated-bare-env pytest run that hides httpx/requests/aiohttp via meta_path blocker (367 passed, 12 skipped, 0 failed in that env). ## Fix → Finding mapping ### #6 — Version bump skipped unpublished 0.15.0 Reverted python/pyproject.toml + python/rewind_agent/__init__.py from 0.16.0 back to 0.15.0. Per CLAUDE.md track 2: master is at 0.15.0, PyPI is at 0.14.8 — 0.15.0 is unpublished, so Phase 1 changes ride with it. Bumping to 0.16.0 would skip the unpublished version. ### #1 follow-up — install tests must work in CI's bare env test_intercept_install.py asserted all three adapters patched. CI doesn't install httpx/requests/aiohttp, so HTTPX_AVAILABLE etc. all flip to False, install() no-ops the missing adapters, and the unconditional asserts fail. Made the assertions conditional on *_AVAILABLE flags. Each adapter is asserted patched if-and-only-if its library is importable. The test_install_with_custom_predicates_applies_to_httpx test gains a self.skipTest() guard since it actually exercises the transport patch with a MockTransport (httpx-required). ### #4 — Strict-match 409 was silently swallowed ExplicitClient._post wraps urllib in `try: ... except Exception: return None`. When the server returns HTTP 409 strict-match divergence, urllib raises HTTPError(409) → swallowed → caller sees None → adapters interpret as cache miss → fall through to live. Strict mode is broken. Fix: new ExplicitClient._post_replay_lookup with explicit 409 handling. HTTP 409 raises a new typed exception RewindReplayDivergenceError (exported from rewind_agent). Other 4xx/5xx errors still degrade to None (cache miss) so transient Rewind outages don't break the agent's normal flow. get_replayed_response / _async use the new method; their docstrings note that strict-mode divergence propagates as an exception. RewindReplayDivergenceError carries the server's diagnostic message plus optional target_step / stored_hash / incoming_hash for caller diagnostics. ### #5 — httpx configured default transport options were dropped When user did `httpx.Client(verify=False, http2=True, ...)` without transport=, our patched __init__ replaced transport=None with a fresh RewindHTTPTransport() BEFORE httpx's __init__ ran. That short- circuited httpx's own configured-default construction, dropping verify / cert / trust_env / http2 / proxies / limits / local_address / retries / socket_options / default_encoding / etc. Fix: two-mode patched_init. Mode (a) user passed transport=X — wrap X (existing behavior). Mode (b) user passed only top-level config — call original_init first (httpx builds its configured default at self._transport), then post-init wrap self._transport with RewindHTTPTransport(_inner=…). Same fix applied to AsyncClient.__init__. The only fragility is that we read self._transport which is httpx's documented internal attribute (stable across 0.x). ### #2 — Live streaming responses pre-read body before passthrough _flow.{handle_intercepted_async, _serve_cache_miss_sync} called `await resp.json()` / resp.json() on the live response BEFORE returning. For streaming responses this: - httpx: raises ResponseNotRead (body not yet streamed) - requests: marks _content_consumed=True, blocking iter_content - aiohttp: closes the connection after read User code's streaming iteration breaks every time. Fix: detect streaming via OR-combined transport hints + body-aware detect_streaming(req) — see #3. For STREAMING misses: pass through the live response immediately, record with placeholder response_value=None and zero tokens. Tee-based stream-recording (matching the Rust proxy's handle_streaming_response) is documented as a follow-up (v1.1) — proper capture without consuming requires a wrapping ByteStream that yields chunks AND captures, fired on stream completion. For NON-streaming misses (the common case for buffered chat completions): pre-read remains correct — httpx/requests/aiohttp buffer the body eagerly anyway. ### #3 — Body-only "stream": true wasn't routed to streaming Adapters set req.stream from transport-level signals (Accept: text/event-stream, library-specific stream=True hints). A request with `{"stream": true}` in the JSON body but no Accept header was treated as non-streaming — buffered cache hit instead of synthetic SSE. Fix: handle_intercepted_{sync,async} now OR-combines is_streaming (adapter signal) with detect_streaming(req) (Phase 0 body-aware helper that recognizes the JSON `"stream": true` field via regex, plus Accept header, plus explicit RewindRequest.stream). ### #7 — docs and ray-agent migration deferred (not in this PR) The plan listed docs/intercept-quickstart.md and the ray-agent port as Phase 1 deliverables. They're absent from this PR and the PR description should say so explicitly. The ray-agent migration files (use-cases/ray-agent/code/rewind_setup.py, use-cases/ray-agent/INTEGRATION-DIFF-V2.md) live in the local working tree but use-cases/ is .gitignore'd in this repo — they're meant to be copied into the ray-agent repo (separate repository). This was already noted in the version-bump commit message but the PR top-level description didn't make it visible. Will update the PR description with a clearer "deferred" section in the next reply to Santa. docs/intercept-quickstart.md is a follow-up; the in-tree rustdoc on intercept.install() and the operator-facing README in the use-cases/ray-agent/ directory cover the immediate need. ## New regression tests (7 in tests/test_intercept_santa_fixes.py) - test_strict_match_409_raises_typed_error — Santa #4 - test_non_409_http_error_is_swallowed_to_cache_miss — Santa #4 boundary - test_verify_false_setting_survives_intercept_install — Santa #5 - test_user_supplied_transport_is_wrapped_not_replaced — Santa #5 mode (a) - test_streaming_miss_passes_through_without_consuming_body — Santa #2 - test_cache_hit_with_body_stream_true_emits_synthetic_sse — Santa #3 - test_install_only_patches_available_adapters — Santa #1 follow-up ## Pre-push verification routine (followed BEFORE this push) User correctly called out that I'd been pushing without running the full lint/test stack locally. Established a 5-stage pre-push routine that mirrors CI exactly: 1. ruff check . — Python lint (CI's exact invocation) 2. pytest tests/ in local env (with all libs installed) 3. pytest tests/ in simulated bare env (httpx/requests/aiohttp blocked via sys.meta_path) — catches CI-only failures locally 4. cargo clippy -- -D warnings — Rust CI's exact invocation 5. cargo test --workspace — Rust CI's exact invocation All 5 stages green for this commit: - ruff: All checks passed! - pytest local: 420 passed, 1 skipped (was 413; +7 Santa regression tests) - pytest bare env: 367 passed, 12 skipped, 0 failed (CI mirror) - cargo clippy: clean - cargo test: all green, 0 FAILED Adding scripts/pre-push-check.sh as a reusable harness in a follow-up commit so future sessions don't have to re-derive this routine. Made-with: Cursor
risjai
added a commit
that referenced
this pull request
Apr 27, 2026
…ness bugs Re-review at commit 1f9a2dd identified four real bugs surviving the first round of Santa fixes. CI was green and prior items were resolved, but mock-transport tests masked the real httpx behavior. This commit fixes all four with directed regression tests using shapes that match real (not mock) library behavior. ## Fix → Re-review finding ### Re-review #1 — httpx ResponseNotRead crashes on real transport ``_flow._read_response_body_{sync,async}`` called ``resp.json()`` directly. Real httpx response bodies are NOT auto-read at the transport layer; they're streamed lazily and ``.json()`` raises ``httpx.ResponseNotRead`` until ``.read()`` / ``.aread()`` runs. The prior tests used ``httpx.MockTransport`` + ``httpx.Response(json=…)`` which auto-reads the body via the constructor — masking the bug. Fix: explicit body materialization before parsing. Sync path: try ``resp.read()`` (httpx) before ``resp.json()``. ``read()`` is idempotent on already-read responses (requests has ``_content`` populated; aiohttp doesn't reach the sync path). Async path: try ``await resp.aread()`` (httpx) first, falling back to ``await resp.read()`` (aiohttp) if aread isn't present. Both libraries diverge on whether the read API is async; we handle both. Broadened the exception-catch in both paths from ``(json.JSONDecodeError, ValueError)`` to bare ``Exception`` so any transport-specific exception (httpx.ResponseNotRead, aiohttp's ContentTypeError, etc.) degrades to None recording rather than propagating up through the user's call site. ### Re-review #2 — RewindHTTPTransport leaks _inner ``RewindHTTPTransport`` and ``RewindAsyncHTTPTransport`` delegated ``handle_request`` / ``handle_async_request`` to ``self._inner`` but didn't override ``close()`` / ``aclose()``. When the user's Client closes (explicitly or via ``__exit__``), only our own resources got released — the wrapped configured/user transport leaked its connection pool, SSL context, and cert verifier for the process lifetime. Fix: explicit close / aclose forwarding. def close(self): if self._inner is not None: try: self._inner.close() except Exception: pass super().close() Async variant uses ``await self._inner.aclose()``. Both swallow exceptions on the inner close to preserve the super().close() invariant — even a buggy inner transport shouldn't leak our own cleanup. ### Re-review #3 — aiohttp base_url + relative paths bypass intercept ``aiohttp_middleware._build_rewind_request`` did ``url = str(str_or_url)`` before aiohttp resolves relative URLs against ``ClientSession._base_url``. So ``session.post("/v1/chat/completions")`` (typical OpenAI SDK pattern when configured with a custom gateway base URL) produced a RewindRequest whose URL was just the path. Host-based predicates (the default — matches ``api.openai.com`` etc.) silently failed, the request bypassed interception, no recording. Fix: new ``_resolve_url(session, str_or_url)`` helper that replicates aiohttp's own resolution logic. If ``session._base_url`` is set and the supplied URL doesn't start with a scheme (``http:``/``https:``/``ws:``/``wss:``), yarl-join the two. yarl is a hard aiohttp dep so importing it inside the helper is safe — we wouldn't be in this code path otherwise. Defensive: if yarl join blows up on a malformed URL, return the bare string. Predicate match will fail (silent bypass) but at least we don't crash the user's request. ### Re-review #4 — PR description metadata stale Code reverted to SDK 0.15.0 in the previous fix commit, but the PR description still said "0.15.0 → 0.16.0" and listed ``./scripts/publish-pypi.sh`` for SDK 0.16.0 in post-merge actions. Will update the PR body via ``gh pr edit`` in the same push. ## Regression tests added (9 new in test_intercept_santa_fixes.py) Total Santa fix tests: 7 (initial) + 9 (re-review) = **16 tests, all green**. Re-review #1 coverage: - test_sync_read_called_before_json_on_unread_response — constructs a real ``httpx.Response`` with ByteStream (not pre-read) and verifies ``_read_response_body_sync`` reads + parses correctly - test_async_aread_called_before_json_on_unread_response — same for async path - test_buffered_cache_miss_through_real_transport_records_tokens — end-to-end through AsyncClient. The MockTransport handler returns ``httpx.Response(stream=ByteStream(body))`` (not the prior ``json=body`` shape that auto-reads) so this test actually exercises the fix. Re-review #2 coverage: - test_sync_close_propagates_to_inner_transport — spy MockTransport counts close() calls; assert >= 1 after Client.close() - test_async_aclose_propagates_to_inner_transport — async equivalent Re-review #3 coverage: - test_relative_path_resolves_against_base_url — direct unit test on _resolve_url with a yarl URL base - test_absolute_url_passes_through_unchanged — absolute URLs aren't mangled even when base_url is set - test_no_base_url_returns_input_unchanged — sessions without base_url stringify the path as-is - test_aiohttp_session_base_url_match_records_via_predicate — end-to-end through aiohttp.ClientSession(base_url=...) + relative POST + default host predicate. Asserts recording fires (= predicate matched the resolved URL). ## Pre-push verification (codified in scripts/pre-push-check.sh) All 5 stages green BEFORE this push: - ruff check . — clean - pytest tests/ (local env, all libs) — 429 passed, 1 skipped (was 420; +9 re-review regression tests) - pytest tests/ (simulated bare env, libs hidden via meta_path) — 367 passed, 12 skipped, 0 failed (CI mirror) - cargo clippy -- -D warnings — clean - cargo test --workspace — all green The local-pre-push routine exists specifically because PR #149's prior CI iterations missed bugs that depended on real-vs-mock behavior; this re-review's #1 finding (ResponseNotRead) was visible ONLY when constructing test responses with the ``stream=ByteStream(...)`` shape that matches real transport output, not the ``Response(json=...)`` shape that auto-buffers. Future regression tests should default to the stream= shape for httpx wherever realism matters. Made-with: Cursor
6 tasks
shivam2199
pushed a commit
to shivam2199/rewind
that referenced
this pull request
Apr 29, 2026
…CAL-02) (agentoptics#133) * feat(auth): fail-closed auth for non-loopback web server binds Addresses CRITICAL-02 from the security audit: when Rewind bound to a non-loopback address (documented K8s pattern with REWIND_BIND_HOST=0.0.0.0), all API/WebSocket/OTLP routes were exposed without authentication. Any reachable host could read full LLM prompts/responses, pollute sessions, or chain into the SSRF via the OTel export endpoint. Changes: - New `rewind-web` auth middleware with constant-time token comparison (subtle::ConstantTimeEq) - `WebServer::run()` refuses to start on non-loopback bind without a configured token; `--no-auth` is an explicit escape hatch - New `--auth-token` / `REWIND_AUTH_TOKEN` CLI flags; auto-generates a 64-char hex token at ~/.rewind/auth_token (chmod 0600) on first non-loopback start - Loopback behavior unchanged (backward compat for MCP/SDK/CLI flows) - /_rewind/health bypasses auth (liveness probe) - 6 unit tests + 12 integration tests covering fail-closed, escape hatch, token precedence, file permissions, and per-route enforcement Scope: loopback deployments remain open by default. Loopback WebSocket-CSRF (MEDIUM-09) is partially addressed — fully closed on non-loopback, still open on loopback (follow-up PR). Plan: /Users/jain.r/.claude/plans/squishy-strolling-bachman.md 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(auth): address PR agentoptics#133 review — rand dedup, WS ?token=, TOCTOU, UI wiring Addresses the review on agentoptics#133. Must-fix (reviewer agentoptics#1, agentoptics#2): - Bump rand from 0.8 → 0.9 so the workspace ships a single rand major version (previously duplicated alongside quinn/proptest/tungstenite/otel) - Drop the generic `B` and `transmute_body` helper from auth_middleware. axum 0.8's from_fn_with_state passes Request<axum::body::Body> directly. Should-fix (reviewer agentoptics#3, agentoptics#4, agentoptics#6): - Scope `?token=<token>` query-param fallback to /api/ws ONLY. Browsers can't set Authorization on WebSocket upgrades; REST must not accept this form because it leaks via Referer, server logs, and browser history. - Fail-closed now also catches IPv6 :: via is_loopback() already returning false for unspecified addrs (added explicit test). - Token file creation is now atomic: OpenOptions::create_new + mode(0o600). Closes the umask window where fs::write created at 0644 before the subsequent chmod tightened it. Also handles AlreadyExists (concurrent writer won the race → re-read their token). UI wiring for token-enabled deployments: - New web/src/lib/auth.ts — minimal localStorage-backed token holder - api.ts injects Authorization: Bearer on every request; prompts the user and retries once on 401, clears on second 401 - use-websocket.ts appends ?token= to the WS URL when a token is stored - SPA static assets still load without a token (only /api/* and /api/ws are gated), so the UI loads the login prompt fine Nits: - Add TODO for `rewind auth rotate` command (follow-up) - Tighten resolve_with_env to pub(crate) — it's test-only, not API surface - Replace tmp.keep() with a returned TempDir guard so test data dirs are cleaned up on drop - Replace flaky 100ms timeout sentinels with health-probe loops that wait for /_rewind/health to return 200 Tests: - 17/17 auth_tests pass (was 12) — new: IPv6 ::, REST rejects ?token=, WS accepts/rejects with ?token=, concurrent token-file race - Full workspace: 360+ tests, 0 failures - Clippy clean on all modified files - Manual smoke: dashboard loads, Bearer-auth REST works, WS upgrade with ?token= returns 101, without returns 401 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(auth): use percent-encoding crate; strengthen loopback test Follow-up to the PR agentoptics#133 second-pass review. Blocker resolved: - Replace the hand-rolled percent-decoder in `extract_token` with `percent_encoding::percent_decode_str` (already transitive via reqwest/ url/axum; now an explicit dep). Deletes ~30 lines and fixes two bugs in the hand-rolled version: 1. `+` was translated to space. That's `application/x-www-form-urlencoded` semantics; RFC 3986 query strings treat `+` as a literal. A token with a `+` would fail comparison. 2. `i + 2 < bytes.len()` was off-by-one, so a valid trailing `%AB` escape was treated as literal chars at the end of the query. Regression tests for both bugs: - `ws_upgrade_accepts_token_with_literal_plus` — token `abc+def+ghi` - `ws_upgrade_accepts_token_with_trailing_percent_escape` — token `abc/` encoded as trailing `%2F` Nit addressed: - `run_allows_loopback_bind_without_token` now also hits `/api/sessions` (not just `/_rewind/health`, which bypasses auth regardless). This actually exercises the backward-compat claim that loopback + no-token leaves normally-protected routes open. Not addressed (flagged as follow-up, non-blocking): - Access-log redaction of `?token=` values — no access-log layer exists yet; address when one is introduced - Replace `window.prompt` with a proper modal UI Tests: 19/19 auth (was 17), full workspace 360+/0, clippy clean. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(ui): update api.test.ts for auth wiring + handle missing localStorage CI build was failing because the existing `api.test.ts` expected bare `fetch(path)` calls, but `api.ts` now always routes through `request()` which adds a `headers` argument (empty when no token). Two fixes: 1. Update the test assertions to match the new call shape. Added coverage for the auth injection path: - Authorization: Bearer is sent when a token is stored - POST requests merge auth with Content-Type correctly - Unauthed (loopback default) still passes through 2. `auth.ts` now has an in-memory fallback when `window.localStorage` is unavailable. jsdom in this vitest setup doesn't expose localStorage, so `setToken`/`getToken` silently no-oped. Real browsers still use localStorage; Safari private mode and jsdom fall through to memory. Tests: - 113/113 UI tests pass (was 111) - 19/19 auth_tests still green - UI build succeeds 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
shivam2199
pushed a commit
to shivam2199/rewind
that referenced
this pull request
Apr 29, 2026
…nv bootstrap (Phase 3, commit 9/13) ExplicitClient.attach_replay_context(session_id, replay_context_id): - Sets the contextvars without creating a new replay context. - Resolves Phase 3 plan HIGH agentoptics#4: runners receive an existing context in their dispatch payload and must not create a duplicate. intercept.install(): - Bootstraps from REWIND_SESSION_ID + REWIND_REPLAY_CONTEXT_ID env vars before patching transports. Both must be set together; either-but-not-both logs a WARN. Lets a runner subprocess be configured purely via env without touching the SDK. rewind_agent/runner.py (new, ~370 LOC): - RunnerConfig.from_env() — reads REWIND_RUNNER_TOKEN + REWIND_URL. - compute_signature(token, job_id, body) — pure HMAC-SHA256 matching the Rust dispatcher's wire format. Cross-checked against the same Python reference vector both crates pin against (drift between Rust + Python = silent dispatch failure). - verify_signature(config, headers, body_bytes) — case-insensitive header lookup, constant-time hmac.compare_digest. Returns (ok, reason) tuple. - DispatchPayload — typed dataclass mirroring the Rust DispatchBody. - ProgressReporter — async wrapper around POST /api/replay-jobs/ {id}/events; httpx-preferred with urllib fallback. Methods: started(), progress(step, total?, payload?), completed(), errored(message, stage?). - asgi_handler(...) — verifies signature, decodes payload, fires user handler as background task, returns (202, {job_id}) immediately so the dispatcher's 5s timeout is satisfied. On handler exception, auto-emits errored event with the exception message + error_stage='agent'. - @handle_replay marker decorator for documentation/discovery. Tests (19 new, 480 total pass): - RunnerConfig.from_env: requires token; honors defaults; honors REWIND_URL. - compute_signature: matches Rust reference vector byte-for-byte. - verify_signature: accepts canonical, rejects missing headers, malformed prefix, tampered body, wrong token; case-insensitive. - DispatchPayload decode round-trip. - asgi_handler: invalid signature returns 401 + handler not invoked; invalid body returns 400; valid request → 202 + handler runs + started/progress/completed events emitted in order; user code raising → errored event emitted with the exception message. - attach_replay_context sets contextvars. - intercept.install env-var bootstrap (both vars set; partial vars logged + skipped). ruff clean. Pre-existing 461 tests still pass; new total 480/480 + 1 skip. Made-with: Cursor
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
rewind_agent.init()now records LLM calls directly to~/.rewind/— no proxy server, no env vars, no second terminalatexithandler finalizes sessions on script exitThe UX change
New files
python/rewind_agent/store.py— Pure Python store (sqlite3 + blob store), format-compatible with Rustpython/rewind_agent/recorder.py— Monkey-patching recorder with 4 streaming wrappers (OpenAI/Anthropic × sync/async)python/tests/test_store.py— 14 tests: schema, blobs, threading, timestampspython/tests/test_recorder.py— 15 tests: recording, errors, concurrency, streaming, cost estimationModified files
python/rewind_agent/patch.py—init()now takesmode="direct"(default) ormode="proxy"python/rewind_agent/hooks.py— One-time warning when annotations used in direct modev1 scope (documented gaps)
responses.createAPI not yet patched (niche, phase 2)@step,@tool) not persisted in direct mode (phase 2)Test plan
rewind show latestreads Python-recorded sessions correctlyshow_sessionreturns Python-recorded data🤖 Generated with Claude Code