docs: HTTP intercept quickstart + decision-matrix updates #150
Merged
Phase 1 follow-up: operator-facing how-to for the intercept package shipped in PR #149. Documents the public API, per-library examples, streaming behavior, strict-mode + RewindReplayDivergenceError, the savings counter, install/uninstall lifecycle, and explicit "what's NOT supported" notes (matching the deferred-scope claims in the Phase 1 PR description).

## Files

- docs/intercept-quickstart.md (new, 331 lines):
  - When to use intercept vs init() vs proxy vs Explicit Recording API
  - 60-second quickstart
  - Per-library examples: httpx (sync + async), requests, aiohttp (incl. base_url + relative path)
  - Custom predicates pattern (DefaultPredicates subclass for corporate gateways)
  - Streaming behavior: cache-hit synthetic SSE, cache-miss pass-through, three-signal detection (stream flag, Accept header, body "stream":true)
  - Strict-match mode + RewindReplayDivergenceError example
  - savings() counter w/ custom cost_table override
  - Install/uninstall lifecycle, debugging which libs got patched
  - Honest list of v1 limitations (streaming-miss recording fidelity, streaming uploads, httpx mounts, aiohttp WebSocket, raise_for_status on cache hits)
  - Troubleshooting: "nothing recorded" / "ResponseNotRead" / "works locally but not CI" / host filtering
- docs/recording.md: extended decision matrix from "Two ways to record" to three ways. Adds the HTTP intercept column with custom-gateway and streaming columns. Cross-links to the new quickstart.
- docs/getting-started.md: added an "Already-Python alternative — no proxy" subsection in Quickstart so first-time users see the intercept option. Cross-links to the new quickstart.

## Pre-push verification

All 5 stages green BEFORE push (scripts/pre-push-check.sh):

- ruff: clean
- pytest local: 429 passed, 1 skipped
- pytest bare-env (CI mirror): 367 passed, 12 skipped
- cargo clippy: clean
- cargo test --workspace: all green

No code changes; this is pure documentation. Tests pass because nothing they exercise changed.
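The three-signal streaming detection listed under Files (stream flag, Accept header, body "stream":true) might look roughly like the sketch below. The function name `wants_stream`, its parameters, and the choice of `text/event-stream` as the Accept value are assumptions for illustration; the actual intercept package may detect streaming differently.

```python
import json


def wants_stream(stream_flag=False, headers=None, body=None):
    """Return True if any of the three signals indicates a streaming request.

    Hypothetical helper, not the intercept package's real API.
    """
    # Signal 1: the caller passed an explicit stream flag.
    if stream_flag:
        return True

    # Signal 2: the Accept header asks for server-sent events (assumed value).
    normalized = {k.lower(): v for k, v in (headers or {}).items()}
    if "text/event-stream" in normalized.get("accept", "").lower():
        return True

    # Signal 3: the JSON request body contains "stream": true.
    if body:
        try:
            parsed = json.loads(body)
        except (ValueError, TypeError):
            return False  # not JSON, so no body-level signal
        return isinstance(parsed, dict) and parsed.get("stream") is True

    return False
```

Checking the cheap signals first means the body is only parsed when neither the flag nor the header has already settled the question.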
## Out of scope for this PR

- PyPI publish (rewind-agent 0.15.0, rewind-mcp 0.13.0): pending per CLAUDE.md post-merge actions; user-initiated.
- Streaming-miss tee recording (v1.1): known gap, deferred per re-review #2 fix notes.
- ray-agent migration PR: separate repo.

Made-with: Cursor
risjai added a commit that referenced this pull request on Apr 27, 2026
Reviewer follow-up on PR #151 caught a precise factual error in the docs. The two _session_id symbols look identical but are different objects:

- patch.py:21 → _session_id = None (plain module variable for direct-mode SDK monkey-patching)
- explicit.py:44 → _session_id: contextvars.ContextVar (what ExplicitClient.record_llm_call checks)

init() sets patch._session_id when it opens its direct-mode session, but never touches explicit._session_id. So the decorator's outer record_llm_call is still a silent no-op even when init() is active: init() makes the inner SDK monkey-patches record, but it does NOT satisfy the decorator's own session precondition.

## Fix

Removed init() from the "Three ways to enter a session" list. The three valid patterns are now:

1. ExplicitClient.session(...) context manager
2. ExplicitClient.ensure_session(...)
3. ExplicitClient.start_replay(...) (for replay flows)

Added a new "init() does NOT enable the decorator" subsection spelling out the gotcha + showing how to compose init() with the decorator (call init() AND enter an ExplicitClient session; they work together via the contextvar that suppresses double-recording on miss).

## Why this is just a docs change

The decorator's behavior is correct as shipped: it records via ExplicitClient and silently no-ops without a session, consistent with the rest of the SDK. The bug was purely that the docs misled users about what counted as "having a session". No tests touched; no code touched; no API changed.

## Pre-push verification

All 5 stages green (scripts/pre-push-check.sh). Same code, just doc text changed.

## Open thought (for follow-up)

The same gotcha probably applies to docs/intercept-quickstart.md (PR #150). intercept.install() also records via ExplicitClient, so a user who does `init() + intercept.install()` won't get intercept recordings either. Worth a separate doc-precision pass on PR #150 once it lands.

Made-with: Cursor
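The gotcha described in this commit (two same-named `_session_id` objects, only one of which `init()` sets) can be reproduced in isolation. The sketch below uses `SimpleNamespace` stand-ins for the `patch` and `explicit` modules and simplified stand-ins for `init()` and `record_llm_call`; none of this is the SDK's real code, only the shape of the two variables is taken from the commit message.

```python
import contextvars
from types import SimpleNamespace

# Stand-ins for patch.py and explicit.py (hypothetical, for illustration only).
patch = SimpleNamespace(_session_id=None)
explicit = SimpleNamespace(
    _session_id=contextvars.ContextVar("_session_id", default=None)
)


def init(session_id: str) -> None:
    """Stand-in for init(): sets only the plain module variable."""
    patch._session_id = session_id


def record_llm_call(payload: dict) -> bool:
    """Stand-in for ExplicitClient.record_llm_call: checks only the ContextVar."""
    if explicit._session_id.get() is None:
        return False  # silent no-op: no explicit session is active
    return True


# init() alone does not satisfy the explicit-session precondition.
init("direct-mode-session")
recorded_after_init = record_llm_call({"q": "hi"})

# Entering an explicit session (here: setting the ContextVar) does.
token = explicit._session_id.set("explicit-session")
try:
    recorded_in_session = record_llm_call({"q": "hi"})
finally:
    explicit._session_id.reset(token)
```

With these stand-ins, `recorded_after_init` is `False` and `recorded_in_session` is `True`, which mirrors why `init()` was removed from the list of ways to enter a session.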
risjai added a commit that referenced this pull request on Apr 27, 2026
User reminder on PR #151: pull BEFORE pushing. Codifying this in the pre-push script so a future session can't skip it.

## What changed

scripts/pre-push-check.sh now has 6 stages instead of 5:

    [0/6] git fetch + ahead/behind check   ← NEW
    [1/6] ruff check
    [2/6] pytest tests/ (local env)
    [3/6] pytest tests/ (bare env, CI mirror)
    [4/6] cargo clippy
    [5/6] cargo test --workspace

Stage 0:

- Fetches origin silently
- If branch is BEHIND origin: prints clear error pointing at 'git pull --rebase', exits non-zero so subsequent stages don't waste time running against stale code
- If detached HEAD: errors out (push from a named branch)
- If no upstream branch yet: notes "first push" and continues
- If up to date: prints ahead/behind counts and proceeds

## Why

Last push on PR #151 got rejected because origin/feat/phase-2-... had been auto-merged with master (sibling PR #150 landed) while my local was unchanged. Pulling THEN pushing is the standard flow; codifying it in the pre-push script means I can't forget. Also saves 30+ seconds of running the rest of the suite against stale code only to have GitHub reject the push at the end.

## Verified

./scripts/pre-push-check.sh: all 6 stages green on this branch with origin and local at the same SHA.

Adversarial test (manual): if I rewind HEAD by one commit and re-run, stage 0 detects "behind" and aborts with the clear 'git pull --rebase' message, before any test runs.

Made-with: Cursor
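The stage 0 logic described above can be sketched as follows. The real check lives in scripts/pre-push-check.sh, which is a shell script; this Python rendering, including the names `classify`, `git`, and `stage_zero` and the exact messages, is an assumption made only to show the decision flow.

```python
import subprocess


def classify(ahead: int, behind: int) -> tuple[int, str]:
    """Map ahead/behind counts to (exit_code, message)."""
    if behind > 0:
        return 1, f"behind origin by {behind} commit(s); run 'git pull --rebase'"
    return 0, f"ahead {ahead}, behind {behind}; proceeding"


def git(*args: str) -> subprocess.CompletedProcess:
    """Run a git command, capturing output instead of printing it."""
    return subprocess.run(["git", *args], capture_output=True, text=True)


def stage_zero() -> tuple[int, str]:
    # symbolic-ref fails when HEAD does not point at a branch (detached HEAD).
    if git("symbolic-ref", "--quiet", "HEAD").returncode != 0:
        return 1, "detached HEAD; push from a named branch"

    git("fetch", "--quiet", "origin")

    # No upstream configured yet means this is the branch's first push.
    if git("rev-parse", "--abbrev-ref", "@{upstream}").returncode != 0:
        return 0, "no upstream branch yet (first push); continuing"

    # Left count = commits only on HEAD (ahead), right = only upstream (behind).
    counts = git("rev-list", "--left-right", "--count", "HEAD...@{upstream}")
    ahead, behind = (int(n) for n in counts.stdout.split())
    return classify(ahead, behind)
```

Keeping the ahead/behind decision in a pure function makes the "behind aborts, ahead proceeds" rule checkable without a git repository.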
shivam2199 pushed a commit to shivam2199/rewind that referenced this pull request on Apr 29, 2026
Tier 2 of the Universal Replay Architecture. Wraps a Python function;
returns the cached value on hit OR calls the function and records the
return on miss. Composes cleanly with Phase 1's intercept.install()
via a contextvar that suppresses double-recording.
## Public API
    from rewind_agent import cached_llm_call

    @cached_llm_call(
        extract_model=lambda call_args, ret: ret.model,
        extract_tokens=lambda call_args, ret: (ret.usage.prompt_tokens,
                                               ret.usage.completion_tokens),
    )
    def chat(question: str) -> dict:
        return openai_client.chat.completions.create(...).model_dump()
Sync + async functions both supported (detected via
inspect.iscoroutinefunction). Generator / async-generator functions
raise TypeError at decoration — single-return cache contract.
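The sync/async dispatch and the decoration-time TypeError described above can be illustrated with a minimal sketch. `cached_call_sketch` is a hypothetical stand-in with an in-memory dict cache and a simplistic repr-based key; it is not the `cached_llm_call` implementation and omits recording, extractors, and replay.

```python
import asyncio
import functools
import inspect


def cached_call_sketch(fn):
    """Hypothetical decorator: in-memory cache, sync and async, no generators."""
    # Reject generators at decoration time: a stream of yields does not fit
    # a single-return cache.
    if inspect.isgeneratorfunction(fn) or inspect.isasyncgenfunction(fn):
        raise TypeError(f"{fn.__qualname__}: generator functions are not supported")

    cache = {}

    def make_key(args, kwargs):
        # Simplistic key for illustration; sorted so kwargs order is irrelevant.
        return repr((args, sorted(kwargs.items())))

    if inspect.iscoroutinefunction(fn):
        @functools.wraps(fn)
        async def async_wrapper(*args, **kwargs):
            key = make_key(args, kwargs)
            if key not in cache:          # miss: call through and store
                cache[key] = await fn(*args, **kwargs)
            return cache[key]             # hit: return the stored value
        return async_wrapper

    @functools.wraps(fn)
    def sync_wrapper(*args, **kwargs):
        key = make_key(args, kwargs)
        if key not in cache:
            cache[key] = fn(*args, **kwargs)
        return cache[key]
    return sync_wrapper


calls = {"n": 0}


@cached_call_sketch
def chat(question: str) -> dict:
    calls["n"] += 1
    return {"answer": question.upper()}


@cached_call_sketch
async def achat(question: str) -> dict:
    return {"answer": question[::-1]}
```

Calling `chat("hi")` twice runs the body once; `asyncio.run(achat("abc"))` exercises the async path.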
## Why Tier 2 exists
Phase 1's intercept.install() patches the HTTP transport globally —
powerful but blunt. Tier 2 gives operators per-call-site control:
- Cache the OUTER function that composes multiple inner LLM/tool
  calls, vs caching individual HTTP calls
- LLM calls that don't go through plain HTTP (Bedrock via boto3,
  gRPC to self-hosted models): the decorator caches at
  function-return level, transport-agnostic
- Tests pinning specific functions to known recordings
## Composition with intercept.install()
Both can be active in the same process. The decorator's check
fires first (it wraps the user's function). On hit: returns cached,
no HTTP call ever happens. On miss: contextvar
``_cached_llm_call_active`` is set during the function call, and
intercept._flow checks it to skip its own recording — preventing
double-record at two granularities.
The contextvar is reset via try/finally so exceptions in the user
function don't leak the suppression.
## Cache-key derivation
Default: SHA-256 of f"{fn_qualname}|{json(args, kwargs)}" with
_safe_repr fallback for non-JSON-able args. Operators with
unhashable args (clients, file handles) override via cache_key=
parameter:
    @cached_llm_call(cache_key=lambda client, q, **kw: q)
    def chat(client, question: str) -> dict: ...
If a custom cache_key raises, the decorator falls back to the default
key and logs a warning.
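A minimal sketch of the key derivation described above, assuming the JSON payload is a dict of args and kwargs serialized with sorted keys. The helper names `default_cache_key` and `resolve_key`, the exact payload layout, and the way a custom key is turned into a string are assumptions; `_safe_repr` here is reduced to a plain `repr()` fallback.

```python
import hashlib
import json
import logging

logger = logging.getLogger(__name__)


def _safe_repr(value):
    # Assumed fallback: repr() for anything the json module cannot encode.
    return repr(value)


def default_cache_key(fn_qualname, args, kwargs):
    """SHA-256 over 'qualname|json(args, kwargs)' (payload layout assumed)."""
    payload = json.dumps(
        {"args": args, "kwargs": kwargs},
        sort_keys=True,          # makes the key independent of kwargs order
        default=_safe_repr,      # non-JSON-able args fall back to repr()
    )
    return hashlib.sha256(f"{fn_qualname}|{payload}".encode("utf-8")).hexdigest()


def resolve_key(fn_qualname, args, kwargs, cache_key=None):
    """Use the custom key if it works; otherwise warn and use the default."""
    if cache_key is not None:
        try:
            return str(cache_key(*args, **kwargs))
        except Exception:
            logger.warning("custom cache_key raised; using default key")
    return default_cache_key(fn_qualname, args, kwargs)


k1 = default_cache_key("chat", ("q",), {"a": 1, "b": 2})
k2 = default_cache_key("chat", ("q",), {"b": 2, "a": 1})
```

A repr-based fallback is unstable for objects whose default repr embeds a memory address, which is one reason to pass an explicit cache_key for clients and file handles.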
## Return-type round-trip
Decorator stores JSON-serializable values in the cache. On hit, you
get the JSON-deserialized form back, NOT the original Python type.
Documented clearly. Common conversions handled automatically:
- dict / list / primitives → as-is
- model_dump() (Pydantic v2, OpenAI SDK) → called, result stored
- dict() (Pydantic v1) → fallback
- __dict__ → fallback
- pathological → repr() stored, warning logged
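The conversion order listed above might be implemented along these lines. `to_json_serializable` is a hypothetical simplification of the internal `_to_json_serializable` helper: it follows the documented order but does not recurse into nested values.

```python
import json
import logging

logger = logging.getLogger(__name__)


def to_json_serializable(value):
    """Hypothetical sketch of the documented conversion order."""
    # dict / list / primitives: stored as-is if json can already encode them.
    try:
        json.dumps(value)
        return value
    except (TypeError, ValueError):
        pass

    # Pydantic v2 / OpenAI SDK objects expose model_dump(); Pydantic v1 dict().
    for method_name in ("model_dump", "dict"):
        method = getattr(value, method_name, None)
        if callable(method):
            return method()

    # Plain objects: fall back to a copy of their attribute dict.
    if hasattr(value, "__dict__"):
        return dict(vars(value))

    # Pathological case (e.g. a __slots__ class): store repr() and warn.
    logger.warning("storing repr() for %s", type(value).__name__)
    return repr(value)


class V2Like:
    def model_dump(self):
        return {"kind": "v2"}


class V1Like:
    def dict(self):
        return {"kind": "v1"}


class Plain:
    def __init__(self):
        self.answer = 42


class Slotted:
    __slots__ = ("x",)

    def __repr__(self):
        return "Slotted()"
```

On a cache hit the caller receives the converted form (for example a dict), not an instance of the original class.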
## Tests (26 cases, all green)
- Sync + async cache hit / miss / divergence
- Custom extract_model + extract_tokens reach record
- Custom cache_key overrides + failure-fallback
- Default cache key stability (same args → same key, kwargs order
  invariant)
- Strict-match RewindReplayDivergenceError propagates through
  decorator
- Generator / async-generator decoration raises TypeError
- Contextvar set during call + reset on exception
- _to_json_serializable: dict, Pydantic v2 model_dump, v1 dict,
  pathological __slots__ class
- _safe_repr primitives + lists + dict-with-non-str-keys + custom
  type fallback
- Request payload shape stability + custom-key replacement +
  custom-key failure fallback
## What's NOT in this PR
- Decision matrix update in docs/recording.md and
  docs/getting-started.md from "three ways" → "four ways". Requires
  PR agentoptics#150 (docs/intercept-quickstart.md) to merge first;
  follow-up commit on this branch will extend the matrix once
  PR agentoptics#150 lands.
- Auto-detection of return-type → token extraction. Manual
  extract_tokens only; auto-detect is a v2.1 candidate.
- Generator / async-generator support. Yields don't fit the
  single-return cache; documented as deferred.
## Pre-push verification
All 5 stages green BEFORE push (scripts/pre-push-check.sh):
- ruff: clean
- pytest local: 455 passed, 1 skipped (was 429; +26 cached_call tests)
- pytest bare-env (CI mirror): 367 passed, 12 skipped, 0 failed
- cargo clippy: clean
- cargo test --workspace: all green
Made-with: Cursor
Summary
Phase 1 follow-up — operator-facing documentation for the intercept package shipped in PR #149. Pure docs change, no code touched.
What's in this PR
- docs/intercept-quickstart.md (new, 331 lines): full how-to with httpx / requests / aiohttp examples, custom predicates, streaming behavior, strict-mode + RewindReplayDivergenceError, savings counter, install/uninstall lifecycle, honest "NOT supported" list, and troubleshooting.
- docs/recording.md: extended decision matrix from "Two ways to record" to three ways. Adds the HTTP intercept column.
- docs/getting-started.md: adds "Already-Python alternative — no proxy" subsection so first-time users see the intercept option without having to dig.
Test plan
- scripts/pre-push-check.sh: all 5 stages green (ruff / pytest local / pytest bare env / cargo clippy / cargo test)
Out of scope
Versions
Made with Cursor