Phase 2: cached_llm_call decorator (Tier 2)#151
Conversation
Detailed plan for Tier 2 of the Universal Replay Architecture: @cached_llm_call decorator on top of Phase 0's ExplicitClient + Phase 1's _flow orchestration. Programmer-facing per-call-site control vs Phase 1's global intercept. ## Scope - New: rewind_agent.cached_call module (~250 LOC) - _flow.py contextvar to skip double-record when used under intercept.install() - Tests: ~25 cases (sync + async × hit/miss/divergence + custom extractors + composition with intercept) - Docs: docs/cached-llm-call.md + decision matrix update ## Key API @cached_llm_call(extract_tokens=lambda args, ret: ..., extract_model=...) def my_llm_call(question: str): -> ChatCompletion Sync and async; cache key derives from fn name + args via safe_repr; returns cached value on hit. Custom cache_key / extract_tokens / extract_model overrides supported. ## Composition with intercept.install() Decorator's check happens FIRST; if hit, no HTTP call ever fires. On miss, contextvar suppresses the inner intercept recording so we don't double-record. Cleanly composable. ## Open questions - Generator / async-gen functions: punt, raise on decoration - Return-type round-trip: document JSON-serialization caveat; ChatCompletion etc. land as dicts on cache hit Made-with: Cursor
Tier 2 of the Universal Replay Architecture. Wraps a Python function;
returns the cached value on hit OR calls the function and records the
return on miss. Composes cleanly with Phase 1's intercept.install()
via a contextvar that suppresses double-recording.
## Public API
from rewind_agent import cached_llm_call
@cached_llm_call(
extract_model=lambda call_args, ret: ret.model,
extract_tokens=lambda call_args, ret: (ret.usage.prompt_tokens,
ret.usage.completion_tokens),
)
def chat(question: str) -> dict:
return openai_client.chat.completions.create(...).model_dump()
Sync + async functions both supported (detected via
inspect.iscoroutinefunction). Generator / async-generator functions
raise TypeError at decoration — single-return cache contract.
## Why Tier 2 exists
Phase 1's intercept.install() patches the HTTP transport globally —
powerful but blunt. Tier 2 gives operators per-call-site control:
- Cache the OUTER function that composes multiple inner LLM/tool
calls, vs caching individual HTTP calls
- LLM calls that don't go through plain HTTP (Bedrock via boto3,
gRPC to self-hosted models) — the decorator caches at function-
return level, transport-agnostic
- Tests pinning specific functions to known recordings
## Composition with intercept.install()
Both can be active in the same process. The decorator's check
fires first (it wraps the user's function). On hit: returns cached,
no HTTP call ever happens. On miss: contextvar
``_cached_llm_call_active`` is set during the function call, and
intercept._flow checks it to skip its own recording — preventing
double-record at two granularities.
The contextvar is reset via try/finally so exceptions in the user
function don't leak the suppression.
## Cache-key derivation
Default: SHA-256 of f"{fn_qualname}|{json(args, kwargs)}" with
_safe_repr fallback for non-JSON-able args. Operators with
unhashable args (clients, file handles) override via cache_key=
parameter:
@cached_llm_call(cache_key=lambda client, q, **kw: q)
def chat(client, question: str) -> dict: ...
Custom cache_key failure (raises) falls back to default + warning.
## Return-type round-trip
Decorator stores JSON-serializable values in the cache. On hit, you
get the JSON-deserialized form back, NOT the original Python type.
Documented clearly. Common conversions handled automatically:
- dict / list / primitives → as-is
- model_dump() (Pydantic v2, OpenAI SDK) → called, result stored
- dict() (Pydantic v1) → fallback
- __dict__ → fallback
- pathological → repr() stored, warning logged
## Tests (26 cases, all green)
- Sync + async cache hit / miss / divergence
- Custom extract_model + extract_tokens reach record
- Custom cache_key overrides + failure-fallback
- Default cache key stability (same args → same key, kwargs order
invariant)
- Strict-match RewindReplayDivergenceError propagates through
decorator
- Generator / async-generator decoration raises TypeError
- Contextvar set during call + reset on exception
- _to_json_serializable: dict, Pydantic v2 model_dump, v1 dict,
pathological __slots__ class
- _safe_repr primitives + lists + dict-with-non-str-keys + custom
type fallback
- Request payload shape stability + custom-key replacement +
custom-key failure fallback
## What's NOT in this PR
- Decision matrix update in docs/recording.md and docs/getting-
started.md from "three ways" → "four ways". Requires PR #150
(docs/intercept-quickstart.md) to merge first; follow-up commit
on this branch will extend the matrix once PR #150 lands.
- Auto-detection of return-type → token extraction. Manual
extract_tokens only; auto-detect is a v2.1 candidate.
- Generator / async-generator support. Yields don't fit the
single-return cache; documented as deferred.
## Pre-push verification
All 5 stages green BEFORE push (scripts/pre-push-check.sh):
- ruff: clean
- pytest local: 455 passed, 1 skipped (was 429; +26 cached_call tests)
- pytest bare-env (CI mirror): 367 passed, 12 skipped, 0 failed
- cargo clippy: clean
- cargo test --workspace: all green
Made-with: Cursor
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
risjai
left a comment
There was a problem hiding this comment.
Review: needs changes before merge
I reviewed PR #151 against the Phase 2 cached_llm_call plan. Python/site/version checks are green, Rust build jobs are still pending, but there are two correctness blockers in the decorator implementation itself.
Blocking Findings
-
@cached_llm_call()does not create or enter a recording session, so the quickstart path records nothing.
The wrapper constructs a freshExplicitClient()and callsrecord_llm_call(), butExplicitClient.record_llm_call()immediately returnsNonewhen_session_idis unset. The docs show:@cached_llm_call() def chat(...): ... result1 = chat("What is 2+2?") # First call: hits OpenAI, records the return value
That is not true unless the caller separately wrapped execution in
client.session(...), calledensure_session(...), or started a replay context. The tests patchExplicitClient.record_llm_calldirectly, so they bypass the real session precondition and miss this. The decorator either needs to start/ensure a session itself (with a clear naming/threading strategy) or the API/docs/tests need to require an explicit activeExplicitClientsession and verify the no-session case. -
Custom
cache_keydoes not actually control the server-side cache identity.
_build_request_payload()includescache_key, but it also includesargs_reprandkwargs_repr. Phase 0 content validation hashes the entire request body, so two calls with the same customcache_keybut different/non-stableargs_reprvalues still produce differentrequest_hashvalues and will miss/diverge. This breaks the documented use case:@cached_llm_call(cache_key=lambda client, q, **_: q) def chat(client, question): ...
The
clientobject's address-bearing repr is still included inargs_repr, so the server hash changes across processes even though the custom key is stable. Either the lookup/record request body must be reduced to stable identity fields only (for example_rewind_decorator,fn_name,cache_key) and display args stored elsewhere, or the server validation needs to hash onlycache_keyfor decorator calls. Add a regression test where two different client objects share the same custom key and replay hits.
Missing / Misleading Tests
- Add an integration-style test with the real
ExplicitClient.record_llm_call()behavior showing what happens with no active session. Today all decorator record tests mock away the session guard. - Add custom-cache-key tests that compare the full request payload/hash inputs, not only
payload["cache_key"]. - Add a test for
cache_keyintentionally ignoring a non-stable object arg (reprcontains different object IDs) and assert the decorator still replays.
Smaller Notes
- The docs are good about return-type fidelity, but they should also state the session requirement if the implementation remains session-external.
- The PR is Python/docs only; the Python and site checks pass, but the Rust build jobs are still pending at the time of review.
Once the session lifecycle and custom-key semantics are fixed, this should be much closer.
Reviewer caught two real correctness bugs in the Phase 2 decorator. Both fixed with directed regression tests using the REAL ExplicitClient guard (not patched away). ## Review #2 — custom cache_key didn't actually control cache identity `_build_request_payload()` included `args_repr` and `kwargs_repr` in the request body. Phase 0's content validation hashes the WHOLE body to derive the request_hash, so two calls with the same custom ``cache_key=lambda client, q: q`` but different client objects (different memory addresses → different reprs) produced different request_hashes. Cache miss every time. The custom cache_key was effectively cosmetic. Fix: identity-only payload. Request body now contains only ``_rewind_decorator``, ``fn_name``, ``cache_key``. Args / kwargs are NOT in the payload — when user passes ``cache_key=lambda client, q: q``, the unstable client repr is correctly invisible to the server hash. For dashboard display, fn_name + cache_key suffices to identify the call. Power users wanting human-readable display in the dashboard can use plain-string cache_keys (e.g. ``f"chat:{model}:{question[:50]}"``) instead of opaque SHA-256 hashes. Regression tests: - ``test_custom_cache_key_payload_is_independent_of_arg_reprs`` — two object() instances with different reprs but same custom key produce IDENTICAL request payloads. - ``test_decorator_with_custom_cache_key_records_stable_request_across_clients`` — end-to-end through the decorator: two recordings with different clients + same key produce equal request bodies. - ``test_default_cache_key_payload_independent_of_unstable_object_args`` — pins the EXPECTED non-stability of default keys (the whole reason custom cache_key exists), so a future "smart" default doesn't silently drift. ## Review #1 — no-session quickstart records nothing `ExplicitClient.record_llm_call()` returns None silently when ``_session_id`` is unset. The quickstart docs claimed "First call hits OpenAI, records the return value" but unless the user separately wrapped execution in ``client.session(...)``, ``ensure_session(...)``, or ``init()``, recording was a silent no-op. Tests bypassed this with ``patch.object(ExplicitClient, "record_llm_call")``, masking the bug. This is consistent with the rest of the SDK (init() / intercept have the same precondition), but the docs need to say so. Auto- session creation in the decorator was considered and rejected: naming, lifecycle, and thread-safety concerns; the explicit-session pattern matches what users already do for other parts of the SDK. Fix: - New "Session requirement" section at the top of docs/cached-llm-call.md with three patterns (scoped session, long-lived ensure_session, init() auto-session). - Quickstart updated to wrap the example in ``with client.session("my-quickstart"):``. - Replay path documented inline. Regression tests: - ``test_no_session_function_runs_and_returns_live_result`` — verify decorator is a silent no-op for recording when no session active; function still runs, returns live result correctly. - ``test_no_session_record_is_silent_no_op`` — calls the REAL ``ExplicitClient.record_llm_call`` (no patch) with no session and asserts it returns None without raising. The test the reviewer asked for: shows the decorator's behavior under the actual production guard. - ``test_active_session_records_via_real_client_path`` — inverse: session-active path reaches _post (verified via patch on _post, not on record_llm_call). Confirms the guard semantics in both directions. ## Test count - cached_call tests: 26 → **32** (+6 review regressions) - Total Python tests: pre-push routine reports green across all 5 stages ## Pre-push verification All 5 stages green BEFORE push (scripts/pre-push-check.sh): - ruff: clean - pytest local: passes - pytest bare-env (CI mirror): passes - cargo clippy: clean - cargo test --workspace: clean Made-with: Cursor
Review fixes —
|
| Finding | Fix | Regression test |
|---|---|---|
#2 Custom cache_key didn't actually control cache identity. _build_request_payload included args_repr + kwargs_repr alongside cache_key; Phase 0 hashes the whole body, so two calls with the same custom key but different unstable arg reprs (object IDs) produced different request_hashes. |
Identity-only payload. Request body now contains exactly {_rewind_decorator, fn_name, cache_key}. Args / kwargs are NOT in the payload — when user passes cache_key=lambda client, q: q, the client object's unstable repr is correctly invisible to the server hash. Dashboard display: power users encode human-readable identity in the cache_key itself (e.g. f"chat:{model}:{question[:50]}") instead of using SHA hashes. |
test_custom_cache_key_payload_is_independent_of_arg_reprs (two object() instances → identical payloads), test_decorator_with_custom_cache_key_records_stable_request_across_clients (end-to-end: two recordings with different clients + same key produce equal request bodies), test_default_cache_key_payload_independent_of_unstable_object_args (pins the EXPECTED non-stability of default keys — the whole reason custom keys exist) |
#1 No-session no-recording. Docs claimed first call records; reality was silent no-op when _session_id was unset. Tests patched record_llm_call directly, bypassing the session guard. |
New "Session requirement" section at the top of docs/cached-llm-call.md with three explicit patterns (scoped client.session(...), long-lived ensure_session, init() auto-session). Quickstart updated to wrap example in with client.session("my-quickstart"):. Decorator behavior unchanged — silent no-op is consistent with the rest of the SDK (init / intercept have same precondition). Auto-session creation in the decorator was considered and rejected: naming + lifecycle + thread-safety concerns. |
test_no_session_function_runs_and_returns_live_result (function works correctly, no exception, no record), test_no_session_record_is_silent_no_op calls the REAL ExplicitClient.record_llm_call with no patch and verifies it returns None — the test you asked for, test_active_session_records_via_real_client_path (inverse: session-active path reaches _post, verified via patch on _post not on record_llm_call) |
What's NOT changed
- Decorator still doesn't auto-create sessions. Reasons documented in the commit message — naming, lifecycle, thread safety. If user feedback wants this in v2.1, the decision flips with explicit semantics.
- The
cache_keyuser-supplied value is still kept verbatim in the payload (not hashed by us) — meaning users can choose human-readable plain strings if they want dashboard legibility. The server still SHA-256s the whole payload for content validation, so cache identity is still uniqueness-of-payload. Documented.
Test count
- cached_call tests: 26 → 32 (+6 directed review regressions)
- All 5 pre-push stages green (
scripts/pre-push-check.sh)
CI should be green on first try this time. Re-requesting review.
risjai
left a comment
There was a problem hiding this comment.
Follow-up Review
The two original blockers are materially addressed in 0b3b8a8:
cache_keyis now the actual identity used in the request payload ({_rewind_decorator, fn_name, cache_key}), so unstableargs_repr/kwargs_reprno longer break custom keys.- The quickstart now uses
ExplicitClient().session(...), and tests cover the real no-session guard instead of patching it away.
One remaining doc issue before merge:
docs/cached-llm-call.mdlistsrewind_agent.init()as a way to enter a session for the decorator. I don’t think that’s accurate.init()usespatch.py’s direct-mode_session_id, whilecached_llm_callrecords throughExplicitClient, which checksexplicit.py’s contextvar_session_id. Soinit()may record inner SDK calls, but it does not make the decorator’s outerrecord_llm_call()work.
Please remove or reword the init() option in the Session requirement section. The valid patterns are ExplicitClient.session(...), ExplicitClient.ensure_session(...), or ExplicitClient.start_replay(...) for replay.
Reviewer follow-up on PR #151 caught a precise factual error in the docs. The two _session_id symbols look identical but are different objects: - patch.py:21 → _session_id = None (plain module variable for direct-mode SDK monkey-patching) - explicit.py:44 → _session_id: contextvars.ContextVar (what ExplicitClient.record_llm_call checks) init() sets patch._session_id when it opens its direct-mode session, but never touches explicit._session_id. So the decorator's outer record_llm_call is still a silent no-op even when init() is active — init() makes the inner SDK monkey-patches record, but it does NOT satisfy the decorator's own session precondition. ## Fix Removed init() from the "Three ways to enter a session" list. The three valid patterns are now: 1. ExplicitClient.session(...) context manager 2. ExplicitClient.ensure_session(...) 3. ExplicitClient.start_replay(...) (for replay flows) Added a new "init() does NOT enable the decorator" subsection spelling out the gotcha + showing how to compose init() with the decorator (call init() AND enter an ExplicitClient session — they work together via the contextvar that suppresses double-recording on miss). ## Why this is just a docs change The decorator's behavior is correct as shipped — it records via ExplicitClient and silently no-ops without a session, consistent with the rest of the SDK. The bug was purely that the docs misled users about what counted as "having a session". No tests touched; no code touched; no API changed. ## Pre-push verification All 5 stages green (scripts/pre-push-check.sh) — same code, just doc text changed. ## Open thought (for follow-up) The same gotcha probably applies to docs/intercept-quickstart.md (PR #150). intercept.install() also records via ExplicitClient, so a user who does `init() + intercept.install()` won't get intercept recordings either. Worth a separate doc-precision pass on PR #150 once it lands. Made-with: Cursor
|
You're right — verified this in code:
Fixed in
Pre-push routine green (5/5 stages). Pure docs change; no tests / code touched. Adjacent thought (not in this PR)The same precision issue probably applies to |
User reminder on PR #151: pull BEFORE pushing. Codifying this in the pre-push script so a future session can't skip it. ## What changed scripts/pre-push-check.sh now has 6 stages instead of 5: [0/6] git fetch + ahead/behind check ← NEW [1/6] ruff check [2/6] pytest tests/ (local env) [3/6] pytest tests/ (bare env, CI mirror) [4/6] cargo clippy [5/6] cargo test --workspace Stage 0: - Fetches origin silently - If branch is BEHIND origin: prints clear error pointing at 'git pull --rebase', exits non-zero so subsequent stages don't waste time running against stale code - If detached HEAD: errors out (push from a named branch) - If no upstream branch yet: notes "first push" and continues - If up to date: prints ahead/behind counts and proceeds ## Why Last push on PR #151 got rejected because origin/feat/phase-2-... had been auto-merged with master (sibling PR #150 landed) while my local was unchanged. Pulling THEN pushing is the standard flow; codifying it in the pre-push script means I can't forget. Also saves 30+ seconds of running the rest of the suite against stale code only to have GitHub reject the push at the end. ## Verified ./scripts/pre-push-check.sh — all 6 stages green on this branch with origin and local at the same SHA. Adversarial test (manual): if I rewind HEAD by one commit and re-run, stage 0 detects "behind" and aborts with the clear 'git pull --rebase' message, before any test runs. Made-with: Cursor
Summary
Tier 2 of the Universal Replay Architecture plan. Programmer-facing per-call-site decorator that wraps a Python function so its return value is cached by Rewind. Composes cleanly with Phase 1's
intercept.install()via a contextvar that prevents double-recording.Why this PR
Phase 1's
intercept.install()patches the HTTP transport layer globally — every httpx/requests/aiohttp call from the process gets evaluated against predicates. That's powerful but blunt: you opt the whole process in or out. Some operators want per-function control:The decorator gives them a single primitive:
@cached_llm_call. Wraps any function (sync or async); returns the cached result on hit or calls the function + records the return on miss.Public API
Async variant works the same way (
@cached_llm_call()detectsinspect.iscoroutinefunctionand emits an async wrapper). Generator / async-generator functions raiseTypeErrorat decoration — single-return cache contract.What's in this PR
plans/phase-2-cached-llm-call-decorator.mdpython/rewind_agent/cached_call.py_to_json_serializable+ safe extractorspython/rewind_agent/__init__.pycached_llm_callfrom the top levelpython/rewind_agent/intercept/_flow.py_is_under_cached_llm_call()to suppress double-recording when the decorator is activepython/tests/test_cached_call.pydocs/cached-llm-call.mdTotal: ~1500 LOC across 6 files (700 product + 460 tests + 180 docs + 226 plan).
Composition with
intercept.install()Both can be active in the same process:
_cached_llm_call_active, calls the user's function (which may make HTTP calls under intercept), records the return at function-level granularity. Intercept's_flowchecks the contextvar — when set, skips its own recording to avoid double-recording at two granularities.Net: ONE recording per cache miss, at the function level. Contextvar reset via try/finally so exceptions don't leak the suppression.
Cache key + return-type round-trip
Cache key default: SHA-256 of
f"{fn_qualname}|{json(args, kwargs)}"with_safe_reprfallback for non-JSON-able args. Override viacache_key=parameter for unhashable args (clients, file handles).Return type: decorator stores JSON-serializable values. On hit, you get the JSON-deserialized form back, NOT the original Python type. Common conversions handled automatically (dict, model_dump, Pydantic v1 dict, dict, repr fallback). Documented clearly with the recommended pattern:
return response.model_dump()in the decorated function.Test plan
scripts/pre-push-check.sh— all 5 stages green:RewindReplayDivergenceErrorpropagatesTypeErrorat decorationintercept.install()(tested via_is_under_cached_llm_callcheck + integration via decorator's contextvar)Out of scope (deferred)
docs/recording.md+docs/getting-started.mdfrom 3-way → 4-way — requires PR #150 (docs) to merge first, then a follow-up commit on this branch will extend the matrix.extract_tokensexplicitly. Auto-detection is a v2.1 candidate.TypeErrorat decoration time.Versions
Plan reference
plans/phase-2-cached-llm-call-decorator.md— full design doc, decision rationale, sequencing, acceptance criteria.Made with Cursor