docs: HTTP intercept quickstart + decision-matrix updates#150

Merged
risjai merged 1 commit into master from docs/intercept-quickstart on Apr 27, 2026

risjai commented on Apr 27, 2026

Summary

Phase 1 follow-up — operator-facing documentation for the intercept package shipped in PR #149. Pure docs change, no code touched.

What's in this PR

  • docs/intercept-quickstart.md (new, 331 lines) — full how-to with httpx / requests / aiohttp examples, custom predicates, streaming behavior, strict-mode + RewindReplayDivergenceError, savings counter, install/uninstall lifecycle, honest "NOT supported" list, and troubleshooting.
  • docs/recording.md — extended decision matrix from "Two ways to record" to three ways. Adds the HTTP intercept column.
  • docs/getting-started.md — adds "Already-Python alternative — no proxy" subsection so first-time users see the intercept option without having to dig.

Test plan

Out of scope

  • PyPI publish (rewind-agent 0.15.0, rewind-mcp 0.13.0) — pending per CLAUDE.md post-merge actions.
  • Streaming-miss tee recording (v1.1) — known gap deferred from Phase 1.
  • ray-agent migration PR — happens in the separate ray-agent repo.

Versions

  • Rust: stays at 0.13.0
  • Python SDK: stays at 0.15.0 (this PR is pure docs; rides with the unreleased version per CLAUDE.md track-2 rule)

Made with Cursor

Phase 1 follow-up — operator-facing how-to for the intercept package
shipped in PR #149. Documents the public API, per-library examples,
streaming behavior, strict-mode + RewindReplayDivergenceError, the
savings counter, install/uninstall lifecycle, and explicit "what's
NOT supported" notes (matching the deferred-scope claims in the
Phase 1 PR description).

## Files

- docs/intercept-quickstart.md (new, 331 lines):
  - When to use intercept vs init() vs proxy vs Explicit Recording API
  - 60-second quickstart
  - Per-library examples: httpx (sync + async), requests, aiohttp
    (incl. base_url + relative path)
  - Custom predicates pattern (DefaultPredicates subclass for
    corporate gateways)
  - Streaming behavior: cache-hit synthetic SSE, cache-miss
    pass-through, three-signal detection (stream flag, Accept
    header, body "stream":true)
  - Strict-match mode + RewindReplayDivergenceError example
  - savings() counter w/ custom cost_table override
  - Install/uninstall lifecycle, debugging which libs got patched
  - Honest list of v1 limitations (streaming-miss recording fidelity,
    streaming uploads, httpx mounts, aiohttp WebSocket, raise_for_status
    on cache hits)
  - Troubleshooting: "nothing recorded" / "ResponseNotRead" /
    "works locally but not CI" / host filtering

- docs/recording.md: extended decision matrix from "Two ways to
  record" to three ways. Adds the HTTP intercept column with custom-
  gateway and streaming columns. Cross-links to the new quickstart.

- docs/getting-started.md: added an "Already-Python alternative — no
  proxy" subsection in Quickstart so first-time users see the
  intercept option. Cross-links to the new quickstart.
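
The three-signal streaming detection listed above (stream flag, Accept header, body "stream":true) could look roughly like the sketch below. This is an illustration only, not the code shipped in the intercept package: the function name `looks_like_streaming` and its parameters are hypothetical, and `text/event-stream` is assumed as the Accept value because it is the standard SSE media type.

```python
import json
from typing import Mapping, Optional


def looks_like_streaming(
    stream_flag: bool,
    headers: Mapping[str, str],
    body: Optional[bytes],
) -> bool:
    """Return True if any of the three signals marks the request as streaming."""
    # Signal 1: the caller asked the HTTP library to stream the response.
    if stream_flag:
        return True

    # Signal 2: the Accept header asks for server-sent events (assumed media type).
    for name, value in headers.items():
        if name.lower() == "accept" and "text/event-stream" in value.lower():
            return True

    # Signal 3: the JSON request body carries "stream": true.
    if body:
        try:
            payload = json.loads(body)
        except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
            return False
        return isinstance(payload, dict) and payload.get("stream") is True

    return False
```

Any single signal is enough in this sketch, so a request whose body sets "stream": true is treated as streaming even when the caller did not pass a stream flag.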

## Pre-push verification

All 5 stages green BEFORE push (scripts/pre-push-check.sh):
  - ruff: clean
  - pytest local: 429 passed, 1 skipped
  - pytest bare-env (CI mirror): 367 passed, 12 skipped
  - cargo clippy: clean
  - cargo test --workspace: all green

No code changes — pure documentation. Tests pass because nothing
they exercise changed.

## Out of scope for this PR

- PyPI publish (rewind-agent 0.15.0, rewind-mcp 0.13.0) — pending
  per CLAUDE.md post-merge actions; user-initiated.
- Streaming-miss tee recording (v1.1) — known gap, deferred per
  re-review #2 fix notes.
- ray-agent migration PR — separate repo.

Made-with: Cursor

vercel Bot commented Apr 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| rewind | Ready | Preview, Comment | Apr 27, 2026 0:30am |

risjai merged commit d0b0933 into master on Apr 27, 2026
7 checks passed
risjai deleted the docs/intercept-quickstart branch on April 27, 2026 12:44
risjai added a commit that referenced this pull request Apr 27, 2026
Reviewer follow-up on PR #151 caught a precise factual error in
the docs. The two _session_id symbols look identical but are
different objects:

  - patch.py:21         → _session_id = None  (plain module variable
                          for direct-mode SDK monkey-patching)
  - explicit.py:44      → _session_id: contextvars.ContextVar
                          (what ExplicitClient.record_llm_call checks)

init() sets patch._session_id when it opens its direct-mode session,
but never touches explicit._session_id. So the decorator's outer
record_llm_call is still a silent no-op even when init() is active —
init() makes the inner SDK monkey-patches record, but it does NOT
satisfy the decorator's own session precondition.
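
The gotcha can be reproduced in isolation. The sketch below uses two `SimpleNamespace` objects as stand-ins for the `patch` and `explicit` modules; the `init` and `record_llm_call` functions here are simplified, hypothetical imitations of the behavior described above, not the SDK's code.

```python
import contextvars
from types import SimpleNamespace

# Stand-ins for the two modules (hypothetical, simplified).
patch = SimpleNamespace(_session_id=None)  # plain module-level variable
explicit = SimpleNamespace(
    _session_id=contextvars.ContextVar("_session_id", default=None)
)


def init(session_id: str) -> None:
    """Imitates init(): sets patch._session_id and nothing else."""
    patch._session_id = session_id


def record_llm_call(payload: dict) -> bool:
    """Imitates the decorator's outer record: silent no-op without an explicit session."""
    if explicit._session_id.get() is None:
        return False  # nothing recorded, no error raised
    return True


init("direct-1")
recorded_after_init = record_llm_call({"q": "hi"})      # False: init() did not help

token = explicit._session_id.set("explicit-1")          # what entering an explicit session would do
try:
    recorded_in_session = record_llm_call({"q": "hi"})  # True
finally:
    explicit._session_id.reset(token)
```

The two names are identical, but setting one never affects the other, which is why the recording check keeps failing until the contextvar itself is set.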

## Fix

Removed init() from the "Three ways to enter a session" list. The
three valid patterns are now:

  1. ExplicitClient.session(...) context manager
  2. ExplicitClient.ensure_session(...)
  3. ExplicitClient.start_replay(...)  (for replay flows)

Added a new "init() does NOT enable the decorator" subsection
spelling out the gotcha + showing how to compose init() with the
decorator (call init() AND enter an ExplicitClient session — they
work together via the contextvar that suppresses double-recording
on miss).

## Why this is just a docs change

The decorator's behavior is correct as shipped — it records via
ExplicitClient and silently no-ops without a session, consistent
with the rest of the SDK. The bug was purely that the docs misled
users about what counted as "having a session". No tests touched;
no code touched; no API changed.

## Pre-push verification

All 5 stages green (scripts/pre-push-check.sh) — same code, just
doc text changed.

## Open thought (for follow-up)

The same gotcha probably applies to docs/intercept-quickstart.md
(PR #150). intercept.install() also records via ExplicitClient,
so a user who does `init() + intercept.install()` won't get
intercept recordings either. Worth a separate doc-precision pass
on PR #150 once it lands.

Made-with: Cursor
risjai added a commit that referenced this pull request Apr 27, 2026
User reminder on PR #151: pull BEFORE pushing. Codifying this in
the pre-push script so a future session can't skip it.

## What changed

scripts/pre-push-check.sh now has 6 stages instead of 5:

  [0/6] git fetch + ahead/behind check  ← NEW
  [1/6] ruff check
  [2/6] pytest tests/ (local env)
  [3/6] pytest tests/ (bare env, CI mirror)
  [4/6] cargo clippy
  [5/6] cargo test --workspace

Stage 0:

  - Fetches origin silently
  - If branch is BEHIND origin: prints clear error pointing at
    'git pull --rebase', exits non-zero so subsequent stages don't
    waste time running against stale code
  - If detached HEAD: errors out (push from a named branch)
  - If no upstream branch yet: notes "first push" and continues
  - If up to date: prints ahead/behind counts and proceeds
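
The stage 0 decision logic can be sketched in Python as follows. The real check lives in scripts/pre-push-check.sh and is a shell script, so this is only an illustration of the four outcomes listed above; the function names are hypothetical, and the git invocations are one plausible way to obtain the inputs.

```python
import subprocess
from typing import Optional, Tuple


def classify_sync_state(
    branch: str, upstream: Optional[str], ahead: int, behind: int
) -> Tuple[bool, str]:
    """Return (may_continue, message) for the four cases stage 0 distinguishes."""
    if branch == "HEAD":  # `git rev-parse --abbrev-ref HEAD` prints HEAD when detached
        return False, "detached HEAD: push from a named branch"
    if upstream is None:
        return True, "no upstream branch yet (first push), continuing"
    if behind > 0:
        return False, f"{behind} commit(s) behind {upstream}: run 'git pull --rebase'"
    return True, f"up to date with {upstream} (ahead {ahead}, behind {behind})"


def _git(*args: str) -> subprocess.CompletedProcess:
    return subprocess.run(["git", *args], capture_output=True, text=True)


def check_repo() -> Tuple[bool, str]:
    """Gather the inputs from git, then classify. Must run inside a git repository."""
    _git("fetch", "--quiet", "origin")
    branch = _git("rev-parse", "--abbrev-ref", "HEAD").stdout.strip()
    up = _git("rev-parse", "--abbrev-ref", "--symbolic-full-name", "@{u}")
    upstream = up.stdout.strip() if up.returncode == 0 else None
    ahead = behind = 0
    if upstream is not None:
        # Left count = commits only on HEAD (ahead); right = only on upstream (behind).
        counts = _git("rev-list", "--left-right", "--count", f"HEAD...{upstream}").stdout.split()
        ahead, behind = int(counts[0]), int(counts[1])
    return classify_sync_state(branch, upstream, ahead, behind)
```

A caller would exit non-zero when the first element of the result is False, so that the later stages never run against stale code.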

## Why

Last push on PR #151 got rejected because origin/feat/phase-2-...
had been auto-merged with master (sibling PR #150 landed) while my
local was unchanged. Pulling THEN pushing is the standard flow;
codifying it in the pre-push script means I can't forget. Also
saves 30+ seconds of running the rest of the suite against stale
code only to have GitHub reject the push at the end.

## Verified

  ./scripts/pre-push-check.sh — all 6 stages green on this branch
  with origin and local at the same SHA.

  Adversarial test (manual): if I rewind HEAD by one commit and
  re-run, stage 0 detects "behind" and aborts with the clear
  'git pull --rebase' message, before any test runs.

Made-with: Cursor
shivam2199 pushed a commit to shivam2199/rewind that referenced this pull request Apr 29, 2026
Tier 2 of the Universal Replay Architecture. Wraps a Python function;
returns the cached value on hit OR calls the function and records the
return on miss. Composes cleanly with Phase 1's intercept.install()
via a contextvar that suppresses double-recording.

## Public API

  from rewind_agent import cached_llm_call

  @cached_llm_call(
      extract_model=lambda call_args, ret: ret["model"],
      extract_tokens=lambda call_args, ret: (ret["usage"]["prompt_tokens"],
                                              ret["usage"]["completion_tokens"]),
  )
  def chat(question: str) -> dict:
      return openai_client.chat.completions.create(...).model_dump()

Sync + async functions both supported (detected via
inspect.iscoroutinefunction). Generator / async-generator functions
raise TypeError at decoration — single-return cache contract.
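
A minimal sketch of that dispatch, assuming only the standard `inspect` checks: the decorator name `cached` and the placeholder comments are hypothetical, and the real decorator's cache lookup and recording are omitted.

```python
import functools
import inspect


def cached(fn):
    """Hypothetical stand-in: choose a wrapper by function kind at decoration time."""
    if inspect.isgeneratorfunction(fn) or inspect.isasyncgenfunction(fn):
        # Rejected up front: a stream of yields does not fit a single-return cache.
        raise TypeError(
            f"{fn.__qualname__}: generator functions are not supported "
            "(the cache stores one return value)"
        )

    if inspect.iscoroutinefunction(fn):
        @functools.wraps(fn)
        async def async_wrapper(*args, **kwargs):
            # Cache lookup and recording would go here.
            return await fn(*args, **kwargs)
        return async_wrapper

    @functools.wraps(fn)
    def sync_wrapper(*args, **kwargs):
        # Cache lookup and recording would go here.
        return fn(*args, **kwargs)
    return sync_wrapper
```

Because the check runs when the decorator is applied, an unsupported function fails at import time rather than on its first call.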

## Why Tier 2 exists

Phase 1's intercept.install() patches the HTTP transport globally —
powerful but blunt. Tier 2 gives operators per-call-site control:

- Cache the OUTER function that composes multiple inner LLM/tool
  calls, vs caching individual HTTP calls
- LLM calls that don't go through plain HTTP (Bedrock via boto3,
  gRPC to self-hosted models) — the decorator caches at function-
  return level, transport-agnostic
- Tests pinning specific functions to known recordings

## Composition with intercept.install()

Both can be active in the same process. The decorator's check
fires first (it wraps the user's function). On hit: returns cached,
no HTTP call ever happens. On miss: contextvar
``_cached_llm_call_active`` is set during the function call, and
intercept._flow checks it to skip its own recording — preventing
double-record at two granularities.

The contextvar is reset via try/finally so exceptions in the user
function don't leak the suppression.

## Cache-key derivation

Default: SHA-256 of f"{fn_qualname}|{json(args, kwargs)}" with
_safe_repr fallback for non-JSON-able args. Operators with
unhashable args (clients, file handles) override via cache_key=
parameter:

  @cached_llm_call(cache_key=lambda client, q, **kw: q)
  def chat(client, question: str) -> dict: ...

If a custom cache_key raises, the decorator falls back to the default key and emits a warning.
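
A sketch of this derivation, under stated assumptions: the helper names `default_cache_key` and `resolve_cache_key` are hypothetical, `_safe_repr` is assumed to return a stable type tag (a plain repr() would embed a memory address), and whether the real decorator hashes the custom key further is not stated in the source.

```python
import hashlib
import json
import warnings


def _safe_repr(value):
    # Assumed fallback for values json cannot encode: a stable type tag.
    return f"<{type(value).__name__}>"


def default_cache_key(fn, args, kwargs) -> str:
    payload = json.dumps(
        {"args": list(args), "kwargs": kwargs},
        sort_keys=True,      # kwargs order must not change the key
        default=_safe_repr,  # used only for non-JSON-able values
    )
    raw = f"{fn.__qualname__}|{payload}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def resolve_cache_key(fn, args, kwargs, cache_key=None) -> str:
    """Use the custom cache_key when given; fall back to the default if it raises."""
    if cache_key is not None:
        try:
            return str(cache_key(*args, **kwargs))
        except Exception as exc:
            warnings.warn(f"custom cache_key failed ({exc!r}); using the default key")
    return default_cache_key(fn, args, kwargs)


def chat(client, question, temperature=0.0):
    """Example target function; the body is irrelevant for key derivation."""
```

With `sort_keys=True`, calling with `temperature=0.2, top_p=0.9` and with `top_p=0.9, temperature=0.2` yields the same key, which matches the kwargs-order invariance listed in the tests.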

## Return-type round-trip

Decorator stores JSON-serializable values in the cache. On hit, you
get the JSON-deserialized form back, NOT the original Python type.
Documented clearly. Common conversions handled automatically:

- dict / list / primitives → as-is
- model_dump() (Pydantic v2, OpenAI SDK) → called, result stored
- dict() (Pydantic v1) → fallback
- __dict__ → fallback
- pathological → repr() stored, warning logged
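
The fallback chain above might be sketched like this. The function name `to_json_serializable` mirrors the `_to_json_serializable` helper named in the test list, but the body is an assumption; in particular, this sketch does not recurse into nested values.

```python
import json
import logging

logger = logging.getLogger(__name__)


def to_json_serializable(value):
    """Try each conversion in the documented order; the last resort is repr()."""
    try:
        json.dumps(value)  # dict / list / primitives pass through unchanged
        return value
    except (TypeError, ValueError):
        pass

    model_dump = getattr(value, "model_dump", None)  # Pydantic v2 style
    if callable(model_dump):
        return model_dump()

    to_dict = getattr(value, "dict", None)  # Pydantic v1 style
    if callable(to_dict):
        return to_dict()

    if hasattr(value, "__dict__"):
        return dict(vars(value))

    logger.warning("storing repr() for %s; the original value is not recoverable",
                   type(value).__name__)
    return repr(value)


class Reply:
    def __init__(self, text):
        self.text = text


# Round trip: what comes back on a cache hit is a dict, not a Reply.
stored = json.dumps(to_json_serializable(Reply("hi")))
restored = json.loads(stored)
```

Callers that need the original type back on a hit would have to rebuild it from the restored dict themselves.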

## Tests (26 cases, all green)

- Sync + async cache hit / miss / divergence
- Custom extract_model + extract_tokens reach record
- Custom cache_key overrides + failure-fallback
- Default cache key stability (same args → same key, kwargs order
  invariant)
- Strict-match RewindReplayDivergenceError propagates through
  decorator
- Generator / async-generator decoration raises TypeError
- Contextvar set during call + reset on exception
- _to_json_serializable: dict, Pydantic v2 model_dump, v1 dict,
  pathological __slots__ class
- _safe_repr primitives + lists + dict-with-non-str-keys + custom
  type fallback
- Request payload shape stability + custom-key replacement +
  custom-key failure fallback

## What's NOT in this PR

- Decision matrix update in docs/recording.md and
  docs/getting-started.md from "three ways" → "four ways". Requires
  PR agentoptics#150 (docs/intercept-quickstart.md) to merge first; a
  follow-up commit on this branch will extend the matrix once
  PR agentoptics#150 lands.
- Auto-detection of return-type → token extraction. Manual
  extract_tokens only; auto-detect is a v2.1 candidate.
- Generator / async-generator support. Yields don't fit the
  single-return cache; documented as deferred.

## Pre-push verification

All 5 stages green BEFORE push (scripts/pre-push-check.sh):
  - ruff: clean
  - pytest local: 455 passed, 1 skipped (was 429; +26 cached_call tests)
  - pytest bare-env (CI mirror): 367 passed, 12 skipped, 0 failed
  - cargo clippy: clean
  - cargo test --workspace: all green

Made-with: Cursor