docs: HTTP intercept quickstart + decision-matrix updates by risjai · Pull Request #150 · agentoptics/rewind

risjai · 2026-04-27T12:30:17Z

Summary

Phase 1 follow-up — operator-facing documentation for the intercept package shipped in PR #149. Pure docs change, no code touched.

What's in this PR

docs/intercept-quickstart.md (new, 331 lines) — full how-to with httpx / requests / aiohttp examples, custom predicates, streaming behavior, strict-mode + RewindReplayDivergenceError, savings counter, install/uninstall lifecycle, honest "NOT supported" list, and troubleshooting.
docs/recording.md — extended decision matrix from "Two ways to record" to three ways. Adds the HTTP intercept column.
docs/getting-started.md — adds "Already-Python alternative — no proxy" subsection so first-time users see the intercept option without having to dig.

Test plan

scripts/pre-push-check.sh — all 5 stages green (ruff / pytest local / pytest bare env / cargo clippy / cargo test)
Reviewer: spot-check the example code snippets compile / make sense (no actual code changes in this PR; the snippets reference the public API shipped in PR #149, "Phase 1: HTTP transport adapters (httpx, requests, aiohttp) + intercept.install()").
Reviewer: confirm the "what's NOT supported" list matches the actual limitations; it was pulled from the Phase 1 PR's deferred-scope claims and the re-review #2 streaming-miss fix notes.

Out of scope

PyPI publish (rewind-agent 0.15.0, rewind-mcp 0.13.0) — pending per CLAUDE.md post-merge actions.
Streaming-miss tee recording (v1.1) — known gap deferred from Phase 1.
ray-agent migration PR — happens in the separate ray-agent repo.

Versions

Rust: stays at 0.13.0
Python SDK: stays at 0.15.0 (this PR is pure docs; rides with the unreleased version per CLAUDE.md track-2 rule)

Made with Cursor

Phase 1 follow-up: operator-facing how-to for the intercept package shipped in PR #149. Documents the public API, per-library examples, streaming behavior, strict-mode + RewindReplayDivergenceError, the savings counter, install/uninstall lifecycle, and explicit "what's NOT supported" notes (matching the deferred-scope claims in the Phase 1 PR description).

## Files

- docs/intercept-quickstart.md (new, 331 lines):
  - When to use intercept vs init() vs proxy vs Explicit Recording API
  - 60-second quickstart
  - Per-library examples: httpx (sync + async), requests, aiohttp (incl. base_url + relative path)
  - Custom predicates pattern (DefaultPredicates subclass for corporate gateways)
  - Streaming behavior: cache-hit synthetic SSE, cache-miss pass-through, three-signal detection (stream flag, Accept header, body "stream":true)
  - Strict-match mode + RewindReplayDivergenceError example
  - savings() counter w/ custom cost_table override
  - Install/uninstall lifecycle, debugging which libs got patched
  - Honest list of v1 limitations (streaming-miss recording fidelity, streaming uploads, httpx mounts, aiohttp WebSocket, raise_for_status on cache hits)
  - Troubleshooting: "nothing recorded" / "ResponseNotRead" / "works locally but not CI" / host filtering
- docs/recording.md: extended decision matrix from "Two ways to record" to three ways. Adds the HTTP intercept column with custom-gateway and streaming columns. Cross-links to the new quickstart.
- docs/getting-started.md: added an "Already-Python alternative — no proxy" subsection in Quickstart so first-time users see the intercept option. Cross-links to the new quickstart.

## Pre-push verification

All 5 stages green BEFORE push (scripts/pre-push-check.sh):

- ruff: clean
- pytest local: 429 passed, 1 skipped
- pytest bare-env (CI mirror): 367 passed, 12 skipped
- cargo clippy: clean
- cargo test --workspace: all green

No code changes, pure documentation. Tests pass because nothing they exercise changed.

## Out of scope for this PR

- PyPI publish (rewind-agent 0.15.0, rewind-mcp 0.13.0): pending per CLAUDE.md post-merge actions; user-initiated.
- Streaming-miss tee recording (v1.1): known gap, deferred per re-review #2 fix notes.
- ray-agent migration PR: separate repo.

Made-with: Cursor
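The "three-signal detection" mentioned in the Files list (stream flag, Accept header, body "stream":true) could look roughly like the sketch below. This is an illustration only: the function name `looks_like_streaming`, its parameters, and the choice to treat any single signal as sufficient are assumptions, not the intercept package's actual code.

```python
import json
from typing import Mapping, Optional


def looks_like_streaming(
    stream_flag: bool,
    headers: Mapping[str, str],
    body: Optional[bytes],
) -> bool:
    """Hypothetical helper: True if any of the three signals marks a streaming request."""
    # Signal 1: the caller asked the HTTP library to stream the response.
    if stream_flag:
        return True
    # Signal 2: the Accept header asks for server-sent events.
    for name, value in headers.items():
        if name.lower() == "accept" and "text/event-stream" in value.lower():
            return True
    # Signal 3: the JSON request body carries "stream": true.
    if body:
        try:
            payload = json.loads(body)
        except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
            return False
        return isinstance(payload, dict) and payload.get("stream") is True
    return False
```

Checking the cheap signals first means the body is only parsed when neither the flag nor the header already settles the question.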

vercel · 2026-04-27T12:30:18Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
rewind	Ready	Preview, Comment	Apr 27, 2026 0:30am

Reviewer follow-up on PR #151 caught a precise factual error in the docs. The two _session_id symbols look identical but are different objects:

- patch.py:21 → _session_id = None (plain module variable for direct-mode SDK monkey-patching)
- explicit.py:44 → _session_id: contextvars.ContextVar (what ExplicitClient.record_llm_call checks)

init() sets patch._session_id when it opens its direct-mode session, but never touches explicit._session_id. So the decorator's outer record_llm_call is still a silent no-op even when init() is active: init() makes the inner SDK monkey-patches record, but it does NOT satisfy the decorator's own session precondition.

## Fix

Removed init() from the "Three ways to enter a session" list. The three valid patterns are now:

1. ExplicitClient.session(...) context manager
2. ExplicitClient.ensure_session(...)
3. ExplicitClient.start_replay(...) (for replay flows)

Added a new "init() does NOT enable the decorator" subsection spelling out the gotcha + showing how to compose init() with the decorator (call init() AND enter an ExplicitClient session; they work together via the contextvar that suppresses double-recording on miss).

## Why this is just a docs change

The decorator's behavior is correct as shipped: it records via ExplicitClient and silently no-ops without a session, consistent with the rest of the SDK. The bug was purely that the docs misled users about what counted as "having a session". No tests touched; no code touched; no API changed.

## Pre-push verification

All 5 stages green (scripts/pre-push-check.sh); same code, just doc text changed.

## Open thought (for follow-up)

The same gotcha probably applies to docs/intercept-quickstart.md (PR #150). intercept.install() also records via ExplicitClient, so a user who does `init() + intercept.install()` won't get intercept recordings either. Worth a separate doc-precision pass on PR #150 once it lands.

Made-with: Cursor
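The gotcha above comes down to two unrelated objects sharing one name. A minimal stdlib sketch follows; the `patch` and `explicit` namespaces and the two functions are hypothetical stand-ins that only mirror the behavior described in this commit message, not the SDK's real modules.

```python
import contextvars
import types

# Hypothetical stand-ins for patch.py and explicit.py.
patch = types.SimpleNamespace(_session_id=None)
explicit = types.SimpleNamespace(
    _session_id=contextvars.ContextVar("_session_id", default=None)
)


def init(session_id):
    """Stand-in for init(): sets only the plain module variable."""
    patch._session_id = session_id


def record_llm_call(payload):
    """Stand-in for ExplicitClient.record_llm_call: checks only the ContextVar."""
    session = explicit._session_id.get()
    if session is None:
        return None  # silent no-op: no explicit session is active
    return {"session": session, "payload": payload}


# init() alone does not satisfy the ContextVar-based precondition.
init("direct-mode-session")
without_explicit = record_llm_call("hello")

# Entering an explicit session (simulated here by setting the ContextVar) does.
token = explicit._session_id.set("explicit-session")
try:
    with_explicit = record_llm_call("hello")
finally:
    explicit._session_id.reset(token)
```

Here `without_explicit` is `None` even though `patch._session_id` is set, which is the silent no-op the corrected docs now warn about.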

User reminder on PR #151: pull BEFORE pushing. Codifying this in the pre-push script so a future session can't skip it.

## What changed

scripts/pre-push-check.sh now has 6 stages instead of 5:

    [0/6] git fetch + ahead/behind check   ← NEW
    [1/6] ruff check
    [2/6] pytest tests/ (local env)
    [3/6] pytest tests/ (bare env, CI mirror)
    [4/6] cargo clippy
    [5/6] cargo test --workspace

Stage 0:

- Fetches origin silently
- If branch is BEHIND origin: prints clear error pointing at 'git pull --rebase', exits non-zero so subsequent stages don't waste time running against stale code
- If detached HEAD: errors out (push from a named branch)
- If no upstream branch yet: notes "first push" and continues
- If up to date: prints ahead/behind counts and proceeds

## Why

Last push on PR #151 got rejected because origin/feat/phase-2-... had been auto-merged with master (sibling PR #150 landed) while my local was unchanged. Pulling THEN pushing is the standard flow; codifying it in the pre-push script means I can't forget. Also saves 30+ seconds of running the rest of the suite against stale code only to have GitHub reject the push at the end.

## Verified

./scripts/pre-push-check.sh: all 6 stages green on this branch with origin and local at the same SHA.

Adversarial test (manual): if I rewind HEAD by one commit and re-run, stage 0 detects "behind" and aborts with the clear 'git pull --rebase' message, before any test runs.

Made-with: Cursor
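The stage 0 branching described above can be sketched as a pure decision function. The actual script is shell and its commands are not shown in this commit message, so everything here is an assumption: the name `stage0_decision`, the message wording, and the idea of feeding it the "ahead<TAB>behind" output of `git rev-list --left-right --count HEAD...@{upstream}`.

```python
from typing import Optional, Tuple


def stage0_decision(
    branch: Optional[str], upstream_counts: Optional[str]
) -> Tuple[int, str]:
    """Hypothetical stage 0 logic: return (exit_code, message).

    branch: current branch name, or None for a detached HEAD.
    upstream_counts: "ahead<TAB>behind" text as printed by
        `git rev-list --left-right --count HEAD...@{upstream}`,
        or None when the branch has no upstream yet.
    """
    if branch is None:
        return 1, "detached HEAD: push from a named branch"
    if upstream_counts is None:
        return 0, "first push: no upstream branch yet"
    ahead, behind = (int(n) for n in upstream_counts.split())
    if behind > 0:
        # Abort early so later stages never run against stale code.
        return 1, f"{branch} is {behind} commit(s) behind origin: run 'git pull --rebase'"
    return 0, f"{branch}: ahead {ahead}, behind {behind}"
```

Keeping the decision separate from the git invocation makes the four cases easy to check without a repository.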

Tier 2 of the Universal Replay Architecture. Wraps a Python function; returns the cached value on hit OR calls the function and records the return on miss. Composes cleanly with Phase 1's intercept.install() via a contextvar that suppresses double-recording.

## Public API

    from rewind_agent import cached_llm_call

    @cached_llm_call(
        extract_model=lambda call_args, ret: ret.model,
        extract_tokens=lambda call_args, ret: (ret.usage.prompt_tokens, ret.usage.completion_tokens),
    )
    def chat(question: str) -> dict:
        return openai_client.chat.completions.create(...).model_dump()

Sync + async functions both supported (detected via inspect.iscoroutinefunction). Generator / async-generator functions raise TypeError at decoration: single-return cache contract.

## Why Tier 2 exists

Phase 1's intercept.install() patches the HTTP transport globally, which is powerful but blunt. Tier 2 gives operators per-call-site control:

- Cache the OUTER function that composes multiple inner LLM/tool calls, vs caching individual HTTP calls
- LLM calls that don't go through plain HTTP (Bedrock via boto3, gRPC to self-hosted models): the decorator caches at function-return level, transport-agnostic
- Tests pinning specific functions to known recordings

## Composition with intercept.install()

Both can be active in the same process. The decorator's check fires first (it wraps the user's function). On hit: returns cached, no HTTP call ever happens. On miss: contextvar ``_cached_llm_call_active`` is set during the function call, and intercept._flow checks it to skip its own recording, preventing double-record at two granularities. The contextvar is reset via try/finally so exceptions in the user function don't leak the suppression.

## Cache-key derivation

Default: SHA-256 of f"{fn_qualname}|{json(args, kwargs)}" with _safe_repr fallback for non-JSON-able args.
Operators with unhashable args (clients, file handles) override via cache_key= parameter:

    @cached_llm_call(cache_key=lambda client, q, **kw: q)
    def chat(client, question: str) -> dict: ...

Custom cache_key failure (raises) falls back to default + warning.

## Return-type round-trip

Decorator stores JSON-serializable values in the cache. On hit, you get the JSON-deserialized form back, NOT the original Python type. Documented clearly. Common conversions handled automatically:

- dict / list / primitives → as-is
- model_dump() (Pydantic v2, OpenAI SDK) → called, result stored
- dict() (Pydantic v1) → fallback
- __dict__ → fallback
- pathological → repr() stored, warning logged

## Tests (26 cases, all green)

- Sync + async cache hit / miss / divergence
- Custom extract_model + extract_tokens reach record
- Custom cache_key overrides + failure-fallback
- Default cache key stability (same args → same key, kwargs order invariant)
- Strict-match RewindReplayDivergenceError propagates through decorator
- Generator / async-generator decoration raises TypeError
- Contextvar set during call + reset on exception
- _to_json_serializable: dict, Pydantic v2 model_dump, v1 dict, pathological __slots__ class
- _safe_repr primitives + lists + dict-with-non-str-keys + custom type fallback
- Request payload shape stability + custom-key replacement + custom-key failure fallback

## What's NOT in this PR

- Decision matrix update in docs/recording.md and docs/getting-started.md from "three ways" → "four ways". Requires PR agentoptics#150 (docs/intercept-quickstart.md) to merge first; follow-up commit on this branch will extend the matrix once PR agentoptics#150 lands.
- Auto-detection of return-type → token extraction. Manual extract_tokens only; auto-detect is a v2.1 candidate.
- Generator / async-generator support. Yields don't fit the single-return cache; documented as deferred.
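The default cache-key rule (SHA-256 over the function's qualified name plus JSON-encoded arguments, with a repr fallback) can be illustrated with the stdlib alone. The sketch below is an assumption about the shape, not the shipped implementation: `default_cache_key` is a hypothetical name, `_safe_repr` here is a plain `repr` stand-in, and the exact JSON layout is a guess. Sorting keys is one way to get the kwargs-order invariance the test list mentions.

```python
import hashlib
import json


def _safe_repr(value):
    """Stand-in fallback for arguments that json cannot encode."""
    return repr(value)


def default_cache_key(fn_qualname, args, kwargs):
    """Hypothetical default key: SHA-256 of "<qualname>|<json of args and kwargs>"."""
    payload = json.dumps(
        {"args": list(args), "kwargs": kwargs},
        sort_keys=True,      # makes the key independent of kwargs order
        default=_safe_repr,  # called only for non-JSON-able values
    )
    return hashlib.sha256(f"{fn_qualname}|{payload}".encode("utf-8")).hexdigest()


key_a = default_cache_key("chat", ("hello",), {"temperature": 0, "model": "m"})
key_b = default_cache_key("chat", ("hello",), {"model": "m", "temperature": 0})
key_c = default_cache_key("chat", ("goodbye",), {"model": "m", "temperature": 0})
```

A fallback based on `repr` only gives stable keys for objects whose repr is stable, which is presumably why the commit message points operators with clients or file handles toward the cache_key= override.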
## Pre-push verification

All 5 stages green BEFORE push (scripts/pre-push-check.sh):

- ruff: clean
- pytest local: 455 passed, 1 skipped (was 429; +26 cached_call tests)
- pytest bare-env (CI mirror): 367 passed, 12 skipped, 0 failed
- cargo clippy: clean
- cargo test --workspace: all green

Made-with: Cursor

vercel Bot deployed to Preview April 27, 2026 12:30 View deployment

risjai mentioned this pull request Apr 27, 2026

Phase 2: cached_llm_call decorator (Tier 2) #151

Merged

8 tasks

risjai merged commit d0b0933 into master Apr 27, 2026
7 checks passed

risjai deleted the docs/intercept-quickstart branch April 27, 2026 12:44

risjai mentioned this pull request Apr 28, 2026

Phase 3 (commits 5-13): runner registry full lifecycle — dispatcher, callbacks, dashboard, runner SDK, CLI, e2e, docs, version bump #154

Merged

7 tasks

risjai merged 1 commit into master from docs/intercept-quickstart