Phase 1: HTTP transport adapters (httpx, requests, aiohttp) + intercept.install()#149
Conversation
Detailed plan for Tier 1 of the Universal Replay Architecture: httpx,
requests, aiohttp transport adapters + intercept.install() orchestrator
on top of the Phase 0 primitives (RewindRequest, detect_streaming,
synthetic_sse_for_cache_hit, ResponseEnvelope, ExplicitClient cache APIs).
This commit is just the planning doc — first commit on the branch
serves as the anchor reviewers can read end-to-end before opening
the per-feature commits below it.
## Scope
- New: intercept.{httpx_transport, requests_adapter, aiohttp_middleware,
_install, _predicates, _flow, savings} (~1500 LOC across 7 files)
- Tests: 5 new test files, ~80 cases (~2000 LOC)
- Validation: ray-agent migration from 280 LOC RewindHook + ExplicitClient
to a single intercept.install() call
- Docs: docs/intercept-quickstart.md operator how-to + framework decision
matrix update
## Decisions captured (was open questions)
1. Default predicates: STRICT (only tested LLM hosts). Custom gateways
pass install(predicates=…).
2. aiohttp: TraceConfig first; monkey-patch fallback as opt-in.
3. Tool call routing: LLM-host predicate only by default.
4. Streaming chunk fidelity: deferred to follow-up PR.
5. Cost-saved math: basic counter ships in this PR via intercept.savings().
## Sequencing
9 commits (predicates+flow → httpx → requests → aiohttp → install →
tests → ray-agent migration → docs → version bumps), each green on
clippy + cargo test --workspace + python tests.
## Acceptance criteria
intercept.install() works on all three libraries; cache hits return
library-native responses; streaming hits emit clean synthetic SSE;
cache misses record via ExplicitClient with token counts; ray-agent
diff goes from 280 LOC to 3 LOC; all tests + lints green; CI green.
## NOT in this PR (deferred per plan)
- Tier 2 cached_llm_call decorator (separate PR)
- Tier 3 runner registry + dashboard "Run replay" button (separate PR)
- Real chunk-level streaming replay (follow-up)
- docs/replay-and-forking.md / docs/recording-api.md rewrites
(held for Tier 1 GA)
## Pre-Phase-1 housekeeping (flagged, not blocking)
- v0.13.0 GitHub Release + python/python-mcp PyPI publishes are
pending. Plan recommends cutting them before Phase 1 lands so the
v0.13.0 release artifacts match the merged PR #148 scope exactly.
User-initiated per CLAUDE.md.
Made-with: Cursor
…s, savings Foundation modules for Phase 1 transport adapters. No HTTP libraries touched yet; this is the library-agnostic core that httpx, requests, and aiohttp adapters will compose on top of in subsequent commits. ## Modules - **_predicates.py** (~120 LOC): Predicates Protocol + DefaultPredicates class + DEFAULT_LLM_HOSTS frozenset. Strict-by-default — only the eight tested LLM provider hosts match. Subdomains, IPs, localhost do NOT match; operators with custom gateways pass install(predicates=…). Handles user:pass@ and :port URL forms defensively. - **_tokens.py** (~140 LOC): extract_tokens_and_model() understands OpenAI, Anthropic, Gemini, and Cohere v1+v2 response shapes. Best- effort: unknown shapes return (0, 0, model_from_request) instead of raising, so a hypothetical custom provider doesn't break the whole flow. Documented order of shape attempts (OpenAI first as most common). - **_savings.py** (~190 LOC): Process-lifetime cache-hit savings counter. Thread-safe via threading.Lock (incremented from any thread/event-loop the underlying HTTP client uses). Frozen SavingsSnapshot for safe reads. Pricing table lists 21 common models with longest-prefix matching; unknown models contribute zero USD but still count tokens. Custom cost_table override per call. - **_flow.py** (~340 LOC): Library-agnostic cache-then-live decision flow. Sync (handle_intercepted_sync) + async (handle_intercepted_async) variants. Adapters provide three callbacks (live, synth_buffered, synth_streaming) and _flow owns the orchestration: predicate routing → cache lookup → recording on miss / savings counter on hit → library-native response construction. Module-level ExplicitClient singleton (lazy-init, test-resettable). ## Tests (41 new, all green) - test_intercept_predicates.py (15): DEFAULT_LLM_HOSTS lock-in, strict subdomain/IP/typosquat exclusion, user:pass@/port handling, case-insensitivity, custom predicate composition via subclassing. - test_intercept_tokens.py (12): All four provider shapes with realistic bodies, OpenAI-vs-Anthropic priority order, model fallback to request, non-dict response defensive handling. - test_intercept_savings.py (14): Snapshot immutability, increment math, longest-prefix matching, case-insensitive lookup, custom cost table, 16-thread × 100-call concurrency stress, reset zeros all fields, pricing table invariants (lowercase keys, +ve float pairs). Total: Phase 0 (32) + Phase 1 foundations (41) = 73 intercept tests, all green. ## What's NOT in this commit - httpx, requests, aiohttp transport adapters (next 3 commits) - intercept._install orchestrator (commit after adapters) - ray-agent migration + docs (final commits) The _flow module is fully tested only after the first adapter lands (adapter callbacks let us drive integration tests against a fake HTTP transport). Foundation modules above are unit-tested in isolation. Made-with: Cursor
First of three transport adapters. Subclasses httpx.HTTPTransport and
httpx.AsyncHTTPTransport, patches httpx.Client.__init__ /
httpx.AsyncClient.__init__ so any client constructed after
intercept.install() routes through cache-then-live.
## Surface
- RewindHTTPTransport — sync transport subclass, factory via
_make_sync_transport_class(predicates) so each install() can bind
fresh predicates without polluting a module singleton.
- RewindAsyncHTTPTransport — async counterpart.
- _SyntheticSSEByteStream / _AsyncSyntheticSSEByteStream — minimal
ByteStream / AsyncByteStream implementations that emit one SSE
event + [DONE] sentinel from a cached body. AsyncByteStream is an
abstract protocol with no constructor (subtle httpx detail —
super().__init__(b"") raises) so the async variant skips the
super call and overrides aclose() to no-op.
- patch_httpx_clients(predicates=None) / unpatch_httpx_clients() /
is_patched() — install lifecycle. Idempotent. Wraps user-passed
transport (transport=…) so custom retry / mocking layers still run.
## What's NOT covered
- Streaming UPLOADS (async file uploads with body iterators) — v1
passes through but the predicate / cache see empty body bytes.
Documented in _build_rewind_request docstring.
- httpx mounts (per-host transport routing) — orthogonal, mounted
hosts bypass our transport unless the operator mounts ours
explicitly.
## Tests (13, all green)
- Patch lifecycle: idempotent install, unpatch restores original
__init__, unpatch-without-patch safe.
- Sync client:
- non-LLM request passes through, no recording
- cache miss → live response + record_llm_call invoked with extracted
tokens / model (using the mock OpenAI shape)
- cache hit → synthetic JSON response, NO live transport call
(asserted via a transport that raises if hit), savings counter
increments
- streaming cache hit → SSE bytes with `data: …` event + [DONE]
sentinel, content-type: text/event-stream
- user-passed transport=… is wrapped, not replaced
- pre-existing client keeps original transport (documented behavior)
- Async client mirrors sync: cache miss recording, hit synthesis,
non-LLM pass-through, streaming hit. asyncio.run() drives each.
Total intercept tests: Phase 0 (32) + foundations (41) + httpx (13) = 86.
## What's next in Phase 1
- requests transport adapter (next commit)
- aiohttp middleware
- intercept._install orchestrator unifying all three
- ray-agent migration: 280 LOC → 3 LOC
- docs + version bump
Made-with: Cursor
Sync-only HTTPAdapter subclass + Session.__init__ patch. Handles the
older / homegrown Python LLM clients that haven't moved to httpx
(legacy Anthropic SDK, LangChain wrappers, etc.).
## Surface
- RewindHTTPAdapter — subclass of requests.adapters.HTTPAdapter with
overridden send(). Routes through _flow.handle_intercepted_sync;
cache miss delegates to super().send (urllib3); cache hit returns
a synthetic requests.Response.
- patch_requests_sessions(predicates=None) / unpatch_requests_sessions() /
is_patched() — install lifecycle. Patched __init__ first runs the
original (which mounts default HTTPAdapter on http://+https://),
then mounts our adapter on top. Last-mounted-wins for prefix
matches, so we take precedence.
## Synthetic response shape
Sets _content + _content_consumed=True so every consumer pattern works:
- response.json() / response.text → buffered access
- response.iter_content() / iter_lines() → chunked iteration
- response.raw → BytesIO for low-level reads
Streaming cache hits route through synthetic_sse_for_cache_hit to
emit the canonical `data: {…}\\n\\ndata: [DONE]\\n\\n` body before
constructing the Response. iter_content sees SSE chunks; the agent's
streaming loop terminates cleanly.
## NOT supported in v1
- Streaming uploads (PreparedRequest.body as a generator). v1 falls
back to empty body for predicate / cache; live request still goes
through. Documented in _build_rewind_request docstring.
## Tests (7, all green)
- patch lifecycle: idempotent install, unpatch restores original
- cache miss → record_llm_call invoked with correct tokens / model
- cache hit → synthetic Response; live HTTPAdapter.send NOT called
(asserted via patch.object that raises if reached)
- streaming cache hit → SSE-formatted body, content-type:
text/event-stream, iter_content yields data + [DONE] sentinel
- non-LLM request passes through, no recording
- pre-existing session keeps default adapter (documented behavior)
Total intercept tests: Phase 0 (32) + foundations (41) + httpx (13) + requests (7) = 93.
## What's next
- aiohttp middleware (next commit) — TraceConfig first, monkey-patch
fallback if needed
- intercept._install orchestrator unifying all three
- ray-agent migration: 280 LOC → 3 LOC
- docs + version bump
Made-with: Cursor
Async-only adapter for aiohttp.ClientSession. Patches _request (rather than using TraceConfig) because trace events are observe-only and can't short-circuit a cache hit. Documented version-fragility; public API stays stable across aiohttp 3.x. ## Surface - patch_aiohttp_sessions(predicates=None) / unpatch_aiohttp_sessions() / is_patched() — install lifecycle. Mirrors httpx + requests adapters. - _SyntheticClientResponse — quacks-like-ClientResponse stand-in for cache hits. Implements .status, .headers (CIMultiDictProxy), .url, .reason, .ok, .closed, await .read(), await .text(), await .json(), async-with __aenter__/__aexit__, async-for via response.content StreamReader stand-in. release() and close() are no-op (defensive against consumer code that calls them). - _SyntheticStreamReader — minimal aiohttp.StreamReader stand-in for `async for chunk in response.content` loops. - _get_original_request — module-level getter so the patched function resolves _ORIGINAL_REQUEST at call time, not closure-capture time. Lets tests swap in fakes; production behavior is unchanged (the value is set once at install and never moves). ## NOT covered in v1 - raise_for_status per-session config — synthetic always returns status=200; recorded non-2xx loses that signal on cache hit. - WebSocket upgrade requests — different code path (ClientSession._ws_connect); intercept doesn't touch them. - Streaming uploads (request body as async iterator) — falls back to empty body, same as httpx + requests adapters. ## Tests (11, all green) - patch lifecycle: idempotent install, unpatch restores original - non-LLM passes through, no recording - cache miss → record_llm_call_async invoked with extracted tokens - cache hit → synthetic ClientResponse, live _request NOT called (asserted via boom that raises if reached) - streaming cache hit → SSE-formatted body in resp.read() - streaming cache hit → SSE chunks via async for chunk in resp.content - _SyntheticClientResponse: async-with works, text decode, headers case-insensitive, release/close safe Total intercept tests: Phase 0 (32) + foundations (41) + httpx (13) + requests (7) + aiohttp (11) = 104 tests, all green. ## What's next - intercept._install orchestrator unifying all three adapters - intercept.savings() public re-export from __init__.py - ray-agent migration: 280 LOC → 3 LOC - docs + version bump Made-with: Cursor
User-facing single-call setup. intercept.install() patches every importable HTTP library (httpx, requests, aiohttp) so subsequent client construction routes through cache-then-live automatically. ## Surface - intercept.install(predicates=None) — patch all available adapters. Idempotent. Missing libraries silently skip. Custom predicates apply uniformly to every adapter. - intercept.uninstall() — reverse install. Idempotent. Test hygiene. - intercept.is_installed() — current install status. - intercept.savings() — re-export of the savings counter from Phase 1 foundations. - DefaultPredicates / Predicates / SavingsSnapshot / RewindRequest — re-exported so users don't need to reach into _-prefixed modules. - Phase 0 primitives still exported (detect_streaming, synthetic_sse_for_cache_hit, SSE_DONE_SENTINEL, etc.) for operators writing custom adapters. ## Logging DEBUG line on successful install with the list of patched transports. INFO log when NO adapter patched (suspicious — agent probably has none of the three libraries installed). Production never spams. ## Tests (6, all green) - install patches all three adapters when available - install is idempotent (multiple calls don't double-patch) - uninstall clears all adapter patches - uninstall before install is safe (no-op) - install accepts custom Predicates implementer - install with custom predicates applied uniformly to httpx (verified by a custom example.com predicate; default-host call NOT recorded, example.com call IS recorded) ## Final phase 1 test count: 110 intercept tests - Phase 0 (already merged): 32 - Phase 1 foundations (predicates, tokens, savings): 41 - Phase 1 httpx: 13 - Phase 1 requests: 7 - Phase 1 aiohttp: 11 - Phase 1 install orchestrator: 6 All 110 green. ## What's left in Phase 1 - ray-agent migration: 280 LOC RewindHook → 3 LOC intercept.install() - docs: docs/intercept-quickstart.md operator how-to + decision matrix - version bump: Python SDK 0.15.0 → 0.16.0 - Final review pass + open PR Made-with: Cursor
Per CLAUDE.md track 2: new public package surface (``rewind_agent.intercept`` with install() / uninstall() / savings() / DefaultPredicates / Predicates / SavingsSnapshot, plus the lower-level RewindRequest / detect_streaming / synthetic_sse_for_cache_hit re-exports) is a minor bump. Files updated: - python/pyproject.toml: 0.15.0 → 0.16.0 - python/rewind_agent/__init__.py: __version__ = "0.16.0" Rust workspace stays at 0.13.0 — Phase 1 is Python-only. The savings counter uses an in-process Python pricing table (no Rust calls). ## ray-agent migration files (in working tree, NOT committed) The new ray-agent integration files (use-cases/ray-agent/code/rewind_setup.py and use-cases/ray-agent/INTEGRATION-DIFF-V2.md) live in the local working tree. The use-cases/ directory is .gitignore'd because those files are meant to be copied into the ray-agent repo (a separate repository), not tracked here. Operators read the working-tree files to understand the migration story; the ray-agent repo's own PR will commit the result. Net diff per the v2 integration doc: - Phase 0 / v0.15.x integration: ~360 LOC (280 LOC rewind_hook.py + 80 LOC of callsite edits across react_agent + serve/main + alert_responder). - Phase 1 / v0.16.0 integration: 2 LOC required (`from observability.rewind_setup import setup_rewind` + `setup_rewind()` in __init__) + 4 optional (session_scope context manager). That's ~180x compression on the integration surface area. Made-with: Cursor
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
CI's python lint (ruff check) flagged three F401s that my local runs missed (ruff wasn't installed locally — fixed now). All three are pure import cleanup; no behavior change. - rewind_agent/intercept/aiohttp_middleware.py: dropped bare 'import aiohttp' since we only use 'from aiohttp import ClientSession' (which still raises ImportError when aiohttp's missing, preserving the AIOHTTP_AVAILABLE detection). Comment block documents the reasoning so a future maintainer doesn't add the bare import back thinking it's needed. - tests/test_intercept_httpx.py: removed unused 'import json'. - tests/test_intercept_install.py: removed unused 'MagicMock' from the unittest.mock import line (kept 'patch as mock_patch'). Verified: 'python3 -m ruff check .' clean; 110 intercept tests pass. Lessons-learned for next session: also install ruff locally so the pre-push lint catches these. Adding to my local .python-tools install list. Made-with: Cursor
risjai
left a comment
There was a problem hiding this comment.
Santa Verdict: NAUGHTY
Both independent Santa reviewers returned FAIL, and the PR currently has a red Python CI job. Under Santa Method, this should not merge yet.
Blocking Findings
-
CI is red on Python lint.
gh pr checks 149shows the Python job failing. Current failures areruffunused imports:aiohttpinpython/rewind_agent/intercept/aiohttp_middleware.py,jsoninpython/tests/test_intercept_httpx.py, andMagicMockinpython/tests/test_intercept_install.py. -
Live streaming cache misses are not pass-through safe. The shared flow reads/parses the live response before returning it (
resp.json()/await resp.json()style paths). For real streaming responses this can either fail before user code sees the response (httpxResponseNotReadcases) or consume the stream before the caller can iterate it. Phase 1 requires live streams to pass through while recording on stream completion/close, not eager pre-read. -
Body-only streaming detection is not wired into the adapters/flow. Phase 0 added
detect_streaming()that recognizes JSON bodies with"stream": true, but the Phase 1 adapters passis_streaming=req.stream, andreq.streamis derived from transport-level hints /Accept: text/event-stream. A request with{"stream": true}but no Accept header can receive a buffered JSON cache hit instead of synthetic SSE. -
Strict replay divergence is silently treated as cache miss. The server returns HTTP 409 for strict-match divergence, but
ExplicitClient._post()catches all exceptions and returnsNone.get_replayed_response()then returnsNone, so the adapters go live instead of surfacing a library-native divergence error/response. That defeats strict mode. -
httpx installation drops configured default transport options. When no explicit
transport=is supplied, the patchedhttpx.Client.__init__injects a freshRewindHTTPTransport()before httpx constructs its own configured default transport. That can discard normal client settings such asverify,cert,trust_env,http2, proxies, limits, local address/UDS, retries, and socket options. Customtransport=is wrapped, but configured defaults are not preserved. -
Version bump does not match the workspace release rules.
masteris already at Python SDK0.15.0, while PyPI latest is still0.14.8. The project rule says Python changes ride the unreleased version when the current SDK version has not been published. This PR bumps to0.16.0, which skips an unpublished0.15.0. -
Phase 1 acceptance scope is incomplete. The plan lists
docs/intercept-quickstart.mdand the ray-agent validation/migration as Phase 1 deliverables, but they are not in this PR. If intentionally deferred, the PR should say so explicitly and avoid presenting the whole phase as complete.
Passes
- Default predicates appear conservative: exact known LLM provider hosts, no obvious broad over-capture.
- No hardcoded secrets found in the new intercept layer.
- Rust/build/site/version-check jobs are passing; the current deterministic CI blocker is Python lint.
Recommended Fix Set
- Fix the
rufffailures first. - Rework live streaming misses to return the live stream immediately, tee chunks into a buffer, and record after completion/close.
- Use Phase 0
detect_streaming(req)consistently in_flowor normalizereq.streamfrom it in each adapter. - Preserve/surface replay lookup HTTP errors, especially strict-match 409, instead of converting them into cache misses.
- For httpx, let httpx construct its configured default transport first or faithfully reproduce/wrap the configured transport options before inserting Rewind.
- Revert the Python SDK bump to
0.15.0unless0.15.0has been published before this PR lands. - Add regression tests using real local HTTP servers/default transports, live streaming miss cases, body-only
stream:true, strict divergence, configured httpx clients, real aiohttp streaming behavior, and the install/uninstall edge cases.
…/aiohttp
CI's python job fails because:
1. ``import httpx`` at module-level (and similar for requests/aiohttp)
triggers ModuleNotFoundError when CI doesn't install the libraries.
The Python SDK declares no required deps on those libraries — they're
optional, opt-in by installing what your agent uses.
2. httpx_transport.py defined ``_SyntheticSSEByteStream(ByteStream)`` /
``_AsyncSyntheticSSEByteStream(AsyncByteStream)`` at module level.
When httpx isn't importable, ``ByteStream`` is unbound and the class
definition raises NameError at import time — breaking
``import rewind_agent.intercept`` for users who only have requests
or aiohttp installed.
## Fix
- httpx_transport.py: moved both ByteStream synthesizers and both
Transport subclasses inside a new ``_build_transport_classes(predicates)``
factory called from ``patch_httpx_clients``. Module level only has the
``HTTPX_AVAILABLE`` flag check; no httpx-typed classes get evaluated
at import time. Adapters work the same; just lazily defined.
- tests/test_intercept_{httpx,requests,aiohttp}.py: added
``pytest.importorskip(<library>)`` at the top so missing libraries
cleanly skip the module instead of failing collection. Subsequent
imports tagged ``# noqa: E402`` since they intentionally come after
the gate.
Verified locally:
- ``python -m ruff check .`` clean
- ``python -m pytest tests/test_intercept_*.py`` — 110 passed
- Simulated absence of all three libraries via meta_path blocker:
``rewind_agent.intercept`` imports cleanly, ``install()`` no-ops,
``savings()`` returns zeros. Verified ``HTTPX_AVAILABLE``,
``REQUESTS_AVAILABLE``, ``AIOHTTP_AVAILABLE`` all flip to False.
## CI coverage caveat
CI doesn't install httpx/requests/aiohttp today; this PR doesn't
change ``.github/workflows/ci.yml`` (the security hook on workflow
edits blocked it). With the importorskip gates in place, the Phase 1
adapter tests will SKIP in CI rather than ERROR. That's the right
failure mode for now — the python job goes green, the PR can land —
but Phase 1 adapter regressions wouldn't be caught in CI until
``pip install httpx requests aiohttp`` is added to the python job.
Recommended workflow patch (operator to apply, not in this PR):
::
# .github/workflows/ci.yml — python job, "Run Python tests" step:
pip install pytest httpx requests aiohttp -q
python -m pytest tests/ -v
The Python SDK still declares no required deps on these libraries —
they're optional for end users, mandatory only for full CI coverage.
Made-with: Cursor
Santa returned NAUGHTY with 7 blocking findings. This commit addresses all of them and adds 7 directed regression tests. Pre-push verification suite (ruff + pytest local + pytest bare-env + clippy + cargo test) all green — including a CI-equivalent simulated-bare-env pytest run that hides httpx/requests/aiohttp via meta_path blocker (367 passed, 12 skipped, 0 failed in that env). ## Fix → Finding mapping ### #6 — Version bump skipped unpublished 0.15.0 Reverted python/pyproject.toml + python/rewind_agent/__init__.py from 0.16.0 back to 0.15.0. Per CLAUDE.md track 2: master is at 0.15.0, PyPI is at 0.14.8 — 0.15.0 is unpublished, so Phase 1 changes ride with it. Bumping to 0.16.0 would skip the unpublished version. ### #1 follow-up — install tests must work in CI's bare env test_intercept_install.py asserted all three adapters patched. CI doesn't install httpx/requests/aiohttp, so HTTPX_AVAILABLE etc. all flip to False, install() no-ops the missing adapters, and the unconditional asserts fail. Made the assertions conditional on *_AVAILABLE flags. Each adapter is asserted patched if-and-only-if its library is importable. The test_install_with_custom_predicates_applies_to_httpx test gains a self.skipTest() guard since it actually exercises the transport patch with a MockTransport (httpx-required). ### #4 — Strict-match 409 was silently swallowed ExplicitClient._post wraps urllib in `try: ... except Exception: return None`. When the server returns HTTP 409 strict-match divergence, urllib raises HTTPError(409) → swallowed → caller sees None → adapters interpret as cache miss → fall through to live. Strict mode is broken. Fix: new ExplicitClient._post_replay_lookup with explicit 409 handling. HTTP 409 raises a new typed exception RewindReplayDivergenceError (exported from rewind_agent). Other 4xx/5xx errors still degrade to None (cache miss) so transient Rewind outages don't break the agent's normal flow. get_replayed_response / _async use the new method; their docstrings note that strict-mode divergence propagates as an exception. RewindReplayDivergenceError carries the server's diagnostic message plus optional target_step / stored_hash / incoming_hash for caller diagnostics. ### #5 — httpx configured default transport options were dropped When user did `httpx.Client(verify=False, http2=True, ...)` without transport=, our patched __init__ replaced transport=None with a fresh RewindHTTPTransport() BEFORE httpx's __init__ ran. That short- circuited httpx's own configured-default construction, dropping verify / cert / trust_env / http2 / proxies / limits / local_address / retries / socket_options / default_encoding / etc. Fix: two-mode patched_init. Mode (a) user passed transport=X — wrap X (existing behavior). Mode (b) user passed only top-level config — call original_init first (httpx builds its configured default at self._transport), then post-init wrap self._transport with RewindHTTPTransport(_inner=…). Same fix applied to AsyncClient.__init__. The only fragility is that we read self._transport which is httpx's documented internal attribute (stable across 0.x). ### #2 — Live streaming responses pre-read body before passthrough _flow.{handle_intercepted_async, _serve_cache_miss_sync} called `await resp.json()` / resp.json() on the live response BEFORE returning. For streaming responses this: - httpx: raises ResponseNotRead (body not yet streamed) - requests: marks _content_consumed=True, blocking iter_content - aiohttp: closes the connection after read User code's streaming iteration breaks every time. Fix: detect streaming via OR-combined transport hints + body-aware detect_streaming(req) — see #3. For STREAMING misses: pass through the live response immediately, record with placeholder response_value=None and zero tokens. Tee-based stream-recording (matching the Rust proxy's handle_streaming_response) is documented as a follow-up (v1.1) — proper capture without consuming requires a wrapping ByteStream that yields chunks AND captures, fired on stream completion. For NON-streaming misses (the common case for buffered chat completions): pre-read remains correct — httpx/requests/aiohttp buffer the body eagerly anyway. ### #3 — Body-only "stream": true wasn't routed to streaming Adapters set req.stream from transport-level signals (Accept: text/event-stream, library-specific stream=True hints). A request with `{"stream": true}` in the JSON body but no Accept header was treated as non-streaming — buffered cache hit instead of synthetic SSE. Fix: handle_intercepted_{sync,async} now OR-combines is_streaming (adapter signal) with detect_streaming(req) (Phase 0 body-aware helper that recognizes the JSON `"stream": true` field via regex, plus Accept header, plus explicit RewindRequest.stream). ### #7 — docs and ray-agent migration deferred (not in this PR) The plan listed docs/intercept-quickstart.md and the ray-agent port as Phase 1 deliverables. They're absent from this PR and the PR description should say so explicitly. The ray-agent migration files (use-cases/ray-agent/code/rewind_setup.py, use-cases/ray-agent/INTEGRATION-DIFF-V2.md) live in the local working tree but use-cases/ is .gitignore'd in this repo — they're meant to be copied into the ray-agent repo (separate repository). This was already noted in the version-bump commit message but the PR top-level description didn't make it visible. Will update the PR description with a clearer "deferred" section in the next reply to Santa. docs/intercept-quickstart.md is a follow-up; the in-tree rustdoc on intercept.install() and the operator-facing README in the use-cases/ray-agent/ directory cover the immediate need. ## New regression tests (7 in tests/test_intercept_santa_fixes.py) - test_strict_match_409_raises_typed_error — Santa #4 - test_non_409_http_error_is_swallowed_to_cache_miss — Santa #4 boundary - test_verify_false_setting_survives_intercept_install — Santa #5 - test_user_supplied_transport_is_wrapped_not_replaced — Santa #5 mode (a) - test_streaming_miss_passes_through_without_consuming_body — Santa #2 - test_cache_hit_with_body_stream_true_emits_synthetic_sse — Santa #3 - test_install_only_patches_available_adapters — Santa #1 follow-up ## Pre-push verification routine (followed BEFORE this push) User correctly called out that I'd been pushing without running the full lint/test stack locally. Established a 5-stage pre-push routine that mirrors CI exactly: 1. ruff check . — Python lint (CI's exact invocation) 2. pytest tests/ in local env (with all libs installed) 3. pytest tests/ in simulated bare env (httpx/requests/aiohttp blocked via sys.meta_path) — catches CI-only failures locally 4. cargo clippy -- -D warnings — Rust CI's exact invocation 5. cargo test --workspace — Rust CI's exact invocation All 5 stages green for this commit: - ruff: All checks passed! - pytest local: 420 passed, 1 skipped (was 413; +7 Santa regression tests) - pytest bare env: 367 passed, 12 skipped, 0 failed (CI mirror) - cargo clippy: clean - cargo test: all green, 0 FAILED Adding scripts/pre-push-check.sh as a reusable harness in a follow-up commit so future sessions don't have to re-derive this routine. Made-with: Cursor
Runs the exact 5-stage verification CI does, including a simulated bare environment that hides httpx/requests/aiohttp via sys.meta_path so CI-only failures (Phase 1 PR #149 hit this twice) get caught locally before the push. Stages: 1. ruff check . (Python lint, CI's exact invocation) 2. pytest tests/ in local env (with all libs installed) 3. pytest tests/ in simulated bare env (httpx/requests/aiohttp blocked at sys.meta_path) — only runs tracked test files since CI's checkout doesn't include local-only scratch 4. cargo clippy -- -D warnings (Rust CI) 5. cargo test --workspace (Rust CI) Exit code 0 = safe to push. First non-zero stage prints a clear error and aborts. Lessons learned from PR #149: pushing without running stages 1+3 locally caused multiple CI red iterations. Stage 3 specifically is the catch-all for env-mismatch issues (CI doesn't install httpx etc.; my local has them). Made-with: Cursor
Santa fixes — all 7 findings addressed in
|
| # | Finding | Fix | Regression test |
|---|---|---|---|
| 1 | Python lint failed (3× F401) | Already fixed in 46f20f2 (this push includes follow-up: install tests now condition on *_AVAILABLE so CI's bare env passes) |
test_install_only_patches_available_adapters |
| 2 | Live streams pre-read by _flow (await resp.json() consumes body before user iterates) |
_flow.{handle_intercepted_async, _serve_cache_miss_sync} now skip body pre-read on streaming. Records with placeholder response=None and zero tokens. Tee-based stream-recording (matching Rust proxy's handle_streaming_response) deferred to v1.1 with explicit doc note. |
test_streaming_miss_passes_through_without_consuming_body |
| 3 | Body-only "stream": true not routed to streaming path |
handle_intercepted_{sync,async} now OR-combines adapter is_streaming with Phase 0's body-aware detect_streaming(req). Body-only stream:true requests now hit the synthetic SSE path on cache hit. |
test_cache_hit_with_body_stream_true_emits_synthetic_sse |
| 4 | Strict-match HTTP 409 silently became cache miss | New ExplicitClient._post_replay_lookup re-raises 409 as RewindReplayDivergenceError (typed exception, exported from rewind_agent). Other 4xx/5xx still degrade to None to preserve best-effort behavior. |
test_strict_match_409_raises_typed_error, test_non_409_http_error_is_swallowed_to_cache_miss |
| 5 | httpx configured default transport options dropped (verify/cert/trust_env/http2/proxies/limits) | Two-mode patched_init: (a) user passed transport=X → wrap X (existing behavior); (b) user passed only top-level config → call original_init first (httpx builds configured default at self._transport), then post-init wrap with RewindHTTPTransport(_inner=…). AsyncClient mirrors. |
test_verify_false_setting_survives_intercept_install, test_user_supplied_transport_is_wrapped_not_replaced |
| 6 | Version skipped unpublished 0.15.0 (master is 0.15.0, PyPI is 0.14.8) | Reverted python/pyproject.toml + __init__.py to 0.15.0. Phase 1 changes ride with the unreleased version per CLAUDE.md track 2. |
(no test; structural) |
| 7 | docs/intercept-quickstart.md + ray-agent migration not in PR | Explicitly deferred. Ray-agent migration files (use-cases/ray-agent/code/rewind_setup.py, INTEGRATION-DIFF-V2.md) live in working tree but use-cases/ is .gitignore'd in this repo — they're meant to be copied into the ray-agent repo (separate repository). docs/intercept-quickstart.md is a follow-up; rustdoc + module docstrings cover the immediate need. PR description updated to make this clear. |
n/a |
Pre-push routine now codified
scripts/pre-push-check.sh (committed in 1f9a2dd) runs:
ruff check .(CI's exact Python lint invocation)pytest tests/in local env (all libs installed)pytest tests/in simulated bare env —sys.meta_pathblocker hides httpx/requests/aiohttp, only tracked test files run. This is the catch-all for CI-only failures; PR Phase 1: HTTP transport adapters (httpx, requests, aiohttp) + intercept.install() #149 wouldn't have had multiple red CI iterations if I'd run this stage from the start.cargo clippy -- -D warningscargo test --workspace
All 5 stages green for both commits in this push.
Test counts
- Local pytest: 420 passed, 1 skipped (was 413; +7 directed Santa regression tests in
test_intercept_santa_fixes.py) - Bare-env pytest: 367 passed, 12 skipped, 0 failed (CI mirror)
- Cargo test workspace: all green
- Cargo clippy: clean
- ruff: clean
What's still NOT in this PR (now explicit)
- Tee-based streaming-miss recording — Phase 1 v1 records streaming misses with placeholder zero tokens / null response. Proper capture (matching the Rust proxy's
handle_streaming_response) requires a wrapping ByteStream that yields chunks AND captures, fired on stream completion. Plan: v1.1 follow-up PR. docs/intercept-quickstart.md— operator-facing how-to. Module docstrings + the Phase 1 plan doc cover the immediate need.- Ray-agent migration — files in working tree are scoped to the separate ray-agent repo (use-cases/ is .gitignore'd). The migration PR there is a separate piece of work tracked outside this repo.
Re-requesting review.
risjai
left a comment
There was a problem hiding this comment.
Santa Re-Review: NAUGHTY
Re-ran Santa on the current PR head (1f9a2dd) after the reply claiming the 7 prior findings were addressed. Both fresh independent reviewers still returned FAIL.
CI is green now and several prior items were fixed, but there are remaining blockers.
Blocking Findings
-
httpx cache misses can still crash before user code receives the response.
_flowstill callsresp.json()/ response-body readers inside the transport path. For real httpx transport responses, the body may be unread at that point, causingResponseNotReadon normal live cache misses. -
httpx wrapped transports are not closed.
RewindHTTPTransport/RewindAsyncHTTPTransportdelegate requests to_inner, but don’t forwardclose()/aclose(). That leaks the actual configured/user transport and breaks normal lifecycle expectations. -
aiohttp
base_url+ relative path traffic can bypass interception.aiohttp_middleware.pybuildsRewindRequest(url=str(str_or_url))before aiohttp resolves relative URLs againstClientSession(base_url=...), so commonsession.post("/v1/chat/completions")patterns won’t match host-based predicates. -
PR metadata is stale and release-dangerous. Code is back to Python SDK
0.15.0, which is correct, but the PR body still says0.15.0 → 0.16.0and says to publish SDK0.16.0after merge.
Resolved Since Last Review
- CI is green now.
- Ruff failures are fixed.
- Body-only
{"stream": true}is wired throughdetect_streaming(). - Strict
409now raises a typedRewindReplayDivergenceError. - Version files are back to
0.15.0. - Deferred scope is acknowledged in the reply.
Verdict
Still do not merge yet. The remaining httpx live miss and lifecycle issues are adapter correctness bugs, not documentation nits.
…ness bugs Re-review at commit 1f9a2dd identified four real bugs surviving the first round of Santa fixes. CI was green and prior items were resolved, but mock-transport tests masked the real httpx behavior. This commit fixes all four with directed regression tests using shapes that match real (not mock) library behavior. ## Fix → Re-review finding ### Re-review #1 — httpx ResponseNotRead crashes on real transport ``_flow._read_response_body_{sync,async}`` called ``resp.json()`` directly. Real httpx response bodies are NOT auto-read at the transport layer; they're streamed lazily and ``.json()`` raises ``httpx.ResponseNotRead`` until ``.read()`` / ``.aread()`` runs. The prior tests used ``httpx.MockTransport`` + ``httpx.Response(json=…)`` which auto-reads the body via the constructor — masking the bug. Fix: explicit body materialization before parsing. Sync path: try ``resp.read()`` (httpx) before ``resp.json()``. ``read()`` is idempotent on already-read responses (requests has ``_content`` populated; aiohttp doesn't reach the sync path). Async path: try ``await resp.aread()`` (httpx) first, falling back to ``await resp.read()`` (aiohttp) if aread isn't present. Both libraries diverge on whether the read API is async; we handle both. Broadened the exception-catch in both paths from ``(json.JSONDecodeError, ValueError)`` to bare ``Exception`` so any transport-specific exception (httpx.ResponseNotRead, aiohttp's ContentTypeError, etc.) degrades to None recording rather than propagating up through the user's call site. ### Re-review #2 — RewindHTTPTransport leaks _inner ``RewindHTTPTransport`` and ``RewindAsyncHTTPTransport`` delegated ``handle_request`` / ``handle_async_request`` to ``self._inner`` but didn't override ``close()`` / ``aclose()``. When the user's Client closes (explicitly or via ``__exit__``), only our own resources got released — the wrapped configured/user transport leaked its connection pool, SSL context, and cert verifier for the process lifetime. Fix: explicit close / aclose forwarding. def close(self): if self._inner is not None: try: self._inner.close() except Exception: pass super().close() Async variant uses ``await self._inner.aclose()``. Both swallow exceptions on the inner close to preserve the super().close() invariant — even a buggy inner transport shouldn't leak our own cleanup. ### Re-review #3 — aiohttp base_url + relative paths bypass intercept ``aiohttp_middleware._build_rewind_request`` did ``url = str(str_or_url)`` before aiohttp resolves relative URLs against ``ClientSession._base_url``. So ``session.post("/v1/chat/completions")`` (typical OpenAI SDK pattern when configured with a custom gateway base URL) produced a RewindRequest whose URL was just the path. Host-based predicates (the default — matches ``api.openai.com`` etc.) silently failed, the request bypassed interception, no recording. Fix: new ``_resolve_url(session, str_or_url)`` helper that replicates aiohttp's own resolution logic. If ``session._base_url`` is set and the supplied URL doesn't start with a scheme (``http:``/``https:``/``ws:``/``wss:``), yarl-join the two. yarl is a hard aiohttp dep so importing it inside the helper is safe — we wouldn't be in this code path otherwise. Defensive: if yarl join blows up on a malformed URL, return the bare string. Predicate match will fail (silent bypass) but at least we don't crash the user's request. ### Re-review #4 — PR description metadata stale Code reverted to SDK 0.15.0 in the previous fix commit, but the PR description still said "0.15.0 → 0.16.0" and listed ``./scripts/publish-pypi.sh`` for SDK 0.16.0 in post-merge actions. Will update the PR body via ``gh pr edit`` in the same push. ## Regression tests added (9 new in test_intercept_santa_fixes.py) Total Santa fix tests: 7 (initial) + 9 (re-review) = **16 tests, all green**. Re-review #1 coverage: - test_sync_read_called_before_json_on_unread_response — constructs a real ``httpx.Response`` with ByteStream (not pre-read) and verifies ``_read_response_body_sync`` reads + parses correctly - test_async_aread_called_before_json_on_unread_response — same for async path - test_buffered_cache_miss_through_real_transport_records_tokens — end-to-end through AsyncClient. The MockTransport handler returns ``httpx.Response(stream=ByteStream(body))`` (not the prior ``json=body`` shape that auto-reads) so this test actually exercises the fix. Re-review #2 coverage: - test_sync_close_propagates_to_inner_transport — spy MockTransport counts close() calls; assert >= 1 after Client.close() - test_async_aclose_propagates_to_inner_transport — async equivalent Re-review #3 coverage: - test_relative_path_resolves_against_base_url — direct unit test on _resolve_url with a yarl URL base - test_absolute_url_passes_through_unchanged — absolute URLs aren't mangled even when base_url is set - test_no_base_url_returns_input_unchanged — sessions without base_url stringify the path as-is - test_aiohttp_session_base_url_match_records_via_predicate — end-to-end through aiohttp.ClientSession(base_url=...) + relative POST + default host predicate. Asserts recording fires (= predicate matched the resolved URL). ## Pre-push verification (codified in scripts/pre-push-check.sh) All 5 stages green BEFORE this push: - ruff check . — clean - pytest tests/ (local env, all libs) — 429 passed, 1 skipped (was 420; +9 re-review regression tests) - pytest tests/ (simulated bare env, libs hidden via meta_path) — 367 passed, 12 skipped, 0 failed (CI mirror) - cargo clippy -- -D warnings — clean - cargo test --workspace — all green The local-pre-push routine exists specifically because PR #149's prior CI iterations missed bugs that depended on real-vs-mock behavior; this re-review's #1 finding (ResponseNotRead) was visible ONLY when constructing test responses with the ``stream=ByteStream(...)`` shape that matches real transport output, not the ``Response(json=...)`` shape that auto-buffers. Future regression tests should default to the stream= shape for httpx wherever realism matters. Made-with: Cursor
Third time this file accidentally got swept into a Phase 1 commit (once on PR #148 cleanup, once on PR #149 initial commit, now again on PR #149 re-review fix in ba71817). Pattern: git add -A in commit prep grabs the untracked file because it lives in tests/ rather than use-cases/ (which is broadly .gitignore'd). The file is local scratch — imports openai (not a project dep, not installed in CI), so committing it would break CI's pytest collection. Fix: add an explicit .gitignore entry for the file. git add -A now skips it. The file stays on disk; just never tracked. Also removes it from HEAD (the ba71817 commit) via git rm --cached. Cannot amend ba71817 because it's already pushed to origin (per CLAUDE.md amend rules: "If you already pushed to remote, NEVER amend unless user explicitly requests it"). This commit is the forward-only fix. Made-with: Cursor
Re-review fixes —
|
| Re-review # | Finding | Fix | Regression test |
|---|---|---|---|
| 1 | httpx ResponseNotRead on real-transport buffered cache miss |
_flow._read_response_body_{sync,async} now call resp.read() / await resp.aread() BEFORE resp.json(). Async path also handles aiohttp's async read() API as fallback. Broadened exception catch from (JSONDecodeError, ValueError) to bare Exception so transport-specific errors degrade to None recording instead of propagating. |
test_sync_read_called_before_json_on_unread_response, test_async_aread_called_before_json_on_unread_response, test_buffered_cache_miss_through_real_transport_records_tokens (uses httpx.Response(stream=ByteStream(...)) shape — the unread shape that real transports produce) |
| 2 | RewindHTTPTransport / RewindAsyncHTTPTransport leak _inner |
Added close() (sync) and aclose() (async) overrides that forward to self._inner first, then call super. Inner exception swallowed defensively so a buggy inner transport doesn't break our own cleanup. |
test_sync_close_propagates_to_inner_transport, test_async_aclose_propagates_to_inner_transport (spy MockTransports counting close calls) |
| 3 | aiohttp base_url + relative path silently bypasses interception |
New _resolve_url(session, str_or_url) helper replicates aiohttp's own resolution logic via yarl.URL(base).join(URL(path)). Called BEFORE building RewindRequest so host predicates see the absolute URL. Skips schemes (http:/https:/ws:/wss:) — absolute URLs pass through unchanged. |
test_relative_path_resolves_against_base_url, test_absolute_url_passes_through_unchanged, test_no_base_url_returns_input_unchanged, test_aiohttp_session_base_url_match_records_via_predicate (end-to-end via real aiohttp.ClientSession(base_url=…) — predicate match → recording fires) |
| 4 | PR description metadata stale (still mentioned 0.15.0 → 0.16.0 bump) | PR body updated via gh pr edit. Versions section now reads "stays at 0.15.0", post-merge action says "publish SDK 0.15.0", commit table updated. The single remaining 0.16.0 reference in the body is historical (documents the bump-and-revert). |
n/a — structural doc fix |
Why the prior tests didn't catch re-review #1
The initial Phase 1 tests for httpx used httpx.MockTransport returning httpx.Response(json=body). The json= constructor argument auto-buffers the body — _content is set, is_stream_consumed=True, resp.json() works without read(). That's NOT how real httpx transports return responses. Real transports return httpx.Response(stream=AsyncByteStream(...)) with the body still on the wire.
The new test_buffered_cache_miss_through_real_transport_records_tokens test uses the stream=ByteStream(body) shape, which mirrors the real transport contract. Pre-fix code would crash with httpx.ResponseNotRead on this test; post-fix it passes. Going forward, regression tests for httpx response handling should default to the stream= shape.
Test counts
- Local pytest: 429 passed, 1 skipped (was 420; +9 re-review regression tests)
- Bare-env pytest: 367 passed, 12 skipped (CI mirror via
sys.meta_pathblocker — same as before) - Santa fix tests in total: 16 (7 initial review + 9 re-review)
- Cargo workspace + clippy: clean
Side note: untracked-scratch leak fix (83dbeaf)
python/tests/test_replay_e2e.py (local-only scratch using openai SDK, not a project dep) accidentally got swept into commits three times via git add -A. Added an explicit .gitignore entry so future git add -A skips it. The file stays on disk; just never tracked. PR body's commit table mentions this in the post-merge actions row.
CI green expected.
Phase 1 follow-up — operator-facing how-to for the intercept package shipped in PR agentoptics#149. Documents the public API, per-library examples, streaming behavior, strict-mode + RewindReplayDivergenceError, the savings counter, install/uninstall lifecycle, and explicit "what's NOT supported" notes (matching the deferred-scope claims in the Phase 1 PR description). ## Files - docs/intercept-quickstart.md (new, 331 lines): - When to use intercept vs init() vs proxy vs Explicit Recording API - 60-second quickstart - Per-library examples: httpx (sync + async), requests, aiohttp (incl. base_url + relative path) - Custom predicates pattern (DefaultPredicates subclass for corporate gateways) - Streaming behavior: cache-hit synthetic SSE, cache-miss pass-through, three-signal detection (stream flag, Accept header, body "stream":true) - Strict-match mode + RewindReplayDivergenceError example - savings() counter w/ custom cost_table override - Install/uninstall lifecycle, debugging which libs got patched - Honest list of v1 limitations (streaming-miss recording fidelity, streaming uploads, httpx mounts, aiohttp WebSocket, raise_for_status on cache hits) - Troubleshooting: "nothing recorded" / "ResponseNotRead" / "works locally but not CI" / host filtering - docs/recording.md: extended decision matrix from "Two ways to record" to three ways. Adds the HTTP intercept column with custom- gateway and streaming columns. Cross-links to the new quickstart. - docs/getting-started.md: added an "Already-Python alternative — no proxy" subsection in Quickstart so first-time users see the intercept option. Cross-links to the new quickstart. ## Pre-push verification All 5 stages green BEFORE push (scripts/pre-push-check.sh): - ruff: clean - pytest local: 429 passed, 1 skipped - pytest bare-env (CI mirror): 367 passed, 12 skipped - cargo clippy: clean - cargo test --workspace: all green No code changes — pure documentation. Tests pass because nothing they exercise changed. ## Out of scope for this PR - PyPI publish (rewind-agent 0.15.0, rewind-mcp 0.13.0) — pending per CLAUDE.md post-merge actions; user-initiated. - Streaming-miss tee recording (v1.1) — known gap, deferred per re-review agentoptics#2 fix notes. - ray-agent migration PR — separate repo. Made-with: Cursor
Summary
Phase 1 of the Universal Replay Architecture plan — HTTP transport interception for httpx, requests, and aiohttp. Single
intercept.install()call patches every importable HTTP library so any agent making HTTP-shaped LLM calls gets cache-then-live recording without per-call-site changes.Builds on the Phase 0 foundations merged in #148 (RewindRequest, detect_streaming, synthetic_sse_for_cache_hit, ResponseEnvelope, ExplicitClient cache APIs).
Why now
Phase 0 made the replay machinery correct, but it was only reachable from two places: the CLI proxy (assumes OpenAI-compatible SDK traffic) and
ExplicitClient.record_llm_call()calls scattered through agent code (per-call-site instrumentation). Neither covers agents built on bespoke HTTP gateways (ray-agent's mTLS flow), older SDKs, or third-party libraries that bypass the OpenAI SDK entirely.This PR closes the gap.
What's in this PR
7 commits, ordered for review-friendliness. Each commit lands tests green.
1be6eabplans/phase-1-http-transport-adapters.md)ec51c0e_predicates.py,_tokens.py,_savings.py,_flow.py(library-agnostic core, no HTTP libs touched)713b8848f01fff785cb19_requestmonkey-patch (TraceConfig insufficient for cache-hit short-circuit)a25838eintercept.install()orchestrator + updated public API896828c83dbeaf)Total: ~4250 LOC across 17 files (1500 product code + 2000 tests + 750 docs).
Public API
Default predicate strictness — STRICT
DEFAULT_LLM_HOSTSonly contains the 8 LLM provider hosts we've actually tested: OpenAI, Anthropic, Gemini, Cohere, Together, Groq, DeepSeek, Mistral. Subdomains do NOT match. Operators with custom gateways passinstall(predicates=…).Rationale documented in the plan: silent recording is worse than a quick "why isn't anything being recorded?" debugging trip — operators with non-listed providers will hit the latter within seconds and reach for the documented escape hatch.
Cache-hit response synthesis
_SyntheticSSEByteStream/_AsyncSyntheticSSEByteStream(subclasshttpx.ByteStream/AsyncByteStream)requests.Responsewith_contentset +_content_consumed=True_SyntheticClientResponsequacks-like-ClientResponsewith__aenter__, asyncread/text/json, async-iter viaresponse.contentFor streaming cache hits, the body goes through
synthetic_sse_for_cache_hit()from Phase 0 (single SSE event +[DONE]sentinel). Agents usingresponse.iter_bytes()/iter_content()/async for chunk in response.contentsee clean stream termination on cache hit.What's NOT in this PR (deferred)
use-cases/ray-agent/code/rewind_setup.pyandINTEGRATION-DIFF-V2.mdlive in this repo's working tree (use-cases/ is .gitignore'd because those files are meant to be copied into ray-agent's separate repo). Operators read them here to understand the migration; the ray-agent repo's own PR will land the result. Compression: ~360 LOC v1 integration → 2 LOC required + 4 optional for v2 (~180x).Test plan
cargo build --releasecleancargo test --workspace— 350+ tests, no regressions from Phase 0cargo clippy -- -D warningsclean (matches CI)python -m pytest tests/test_intercept_*.py— 110 tests, all greenpython -m pytest tests/(full suite) — 413 passed, 1 skipped (no regressions)use-cases/ray-agent/(they live in the working tree but aren't committed; gitignored as out-of-tree ray-agent material).DEFAULT_LLM_HOSTS? (Bedrock, OpenRouter, Fireworks, Replicate are notable absences but I haven't tested them.)_requestmonkey-patch (private API). Document the version-fragility risk — comfortable with that?Backwards compatibility
intercept.install()runs keep their original transports. We don't try to mutate live instances. Documented behavior across each adapter; reviewer note in each transport file.install()is missing-library tolerant. Users with onlyhttpxget onlyhttpxpatched;requests-only environments work the same way. No mandatory dependencies added beyond whatrewind_agentalready needs.Versions
0.13.0(Phase 1 is Python-only).0.15.0— master is at 0.15.0, PyPI is at 0.14.8, so 0.15.0 is unpublished. Per CLAUDE.md track-2 rule, Phase 1 changes ride with the unreleased version. (An earlier commit bumped to 0.16.0; reverted inc379584after Santa flagged the unpublished-skip in initial review.)0.13.0.Post-merge actions
./scripts/publish-pypi.shfrompython/for SDK 0.15.0 (the unpublished version this PR's changes ride on).v0.13.0for Rust binaries — Phase 1 doesn't change Rust, but cutting that release first means the v0.15.0 SDK and v0.13.0 binary ship together cleanly.Plan reference
Full design context, decisions on the open questions, and per-adapter implementation notes:
plans/phase-1-http-transport-adapters.md.Made with Cursor