feat: implement L3 replay primitives — RestoreHook, DriftDetector, schema extensions#137
feat: implement L3 replay primitives — RestoreHook, DriftDetector, schema extensions#137
Conversation
…hema extensions Closes #136 - Add agent_debugger_sdk/drift.py: DriftSeverity enum (WARNING/CRITICAL), DriftEvent dataclass, DriftDetector with compare() for action/tool/confidence drift - Add agent_debugger_sdk/checkpoints/hooks.py: RestoreHook protocol, RESTORE_HOOK_REGISTRY dict, apply_restore_hook() with graceful error handling - Export new symbols from agent_debugger_sdk/checkpoints/__init__.py - Extend api/schemas.py: RestoreRequest.replay_events, RestoreRequest.track_drift, RestoreResponse.replayed_events_count, RestoreResponse.drift_detected Unblocks 21/32 previously-skipped tests in tests/test_replay_depth_l3.py. Remaining 11 skips require AutoReplayManager and TraceContext.restore() extensions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Implements the missing “L3 replay primitives” needed by tests/test_replay_depth_l3.py: a drift-detection utility, a restore-hook registry/applicator, and schema extensions to expose replay/drift options and results.
Changes:
- Added
DriftSeverity,DriftEvent, andDriftDetector.compare()for action/tool/confidence drift detection. - Added restore hook protocol + registry +
apply_restore_hook()with a generic fallback and exception logging. - Extended restore request/response API schemas with replay/drift flags and result fields.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
api/schemas.py |
Adds request flags (replay_events, track_drift) and response fields (replayed_events_count, drift_detected) for restore operations. |
agent_debugger_sdk/drift.py |
Introduces drift event primitives and a comparator to detect divergence between original and replayed events. |
agent_debugger_sdk/checkpoints/hooks.py |
Adds restore-hook protocol/registry and an applicator with fallback copying and logged hook failures. |
agent_debugger_sdk/checkpoints/__init__.py |
Re-exports new restore-hook symbols from the checkpoints package. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| orig_action = orig_data.get("chosen_action") or orig_data.get("action") | ||
| new_action = new_data.get("chosen_action") or new_data.get("action") |
There was a problem hiding this comment.
In compare(), using orig_data.get("chosen_action") or orig_data.get("action") treats valid falsy values (e.g., an empty-string chosen_action) as “missing”, which can mask real action drift (e.g., original "" vs restored "tool_b"). Prefer key-presence checks (e.g., check for "chosen_action" first, then "action") so empty strings don’t get dropped.
| orig_action = orig_data.get("chosen_action") or orig_data.get("action") | |
| new_action = new_data.get("chosen_action") or new_data.get("action") | |
| orig_action = ( | |
| orig_data["chosen_action"] | |
| if "chosen_action" in orig_data | |
| else orig_data.get("action") | |
| ) | |
| new_action = ( | |
| new_data["chosen_action"] | |
| if "chosen_action" in new_data | |
| else new_data.get("action") | |
| ) |
There was a problem hiding this comment.
Addressed in commit 019f622 — replaced or-based lookup with explicit key-presence checks so empty-string chosen_action values are no longer treated as missing.
| orig_tool = orig_data.get("tool_name") or orig_data.get("tool") | ||
| new_tool = new_data.get("tool_name") or new_data.get("tool") |
There was a problem hiding this comment.
Similarly, orig_data.get("tool_name") or orig_data.get("tool") will ignore valid falsy values (like "") and can hide tool drift when one side provides an empty tool name. Use explicit key checks instead of or so empty strings are treated as actual values.
| orig_tool = orig_data.get("tool_name") or orig_data.get("tool") | |
| new_tool = new_data.get("tool_name") or new_data.get("tool") | |
| orig_tool = ( | |
| orig_data["tool_name"] | |
| if "tool_name" in orig_data | |
| else orig_data["tool"] | |
| if "tool" in orig_data | |
| else None | |
| ) | |
| new_tool = ( | |
| new_data["tool_name"] | |
| if "tool_name" in new_data | |
| else new_data["tool"] | |
| if "tool" in new_data | |
| else None | |
| ) |
There was a problem hiding this comment.
Addressed in commit 019f622 — same fix applied to tool_name/tool lookup using explicit key-presence checks.
|
|
||
| # Registry mapping framework name → RestoreHook callable. | ||
| # Users and adapters can register hooks at import time or at runtime. | ||
| RESTORE_HOOK_REGISTRY: dict[str, Any] = {} |
There was a problem hiding this comment.
RESTORE_HOOK_REGISTRY is typed as dict[str, Any], which loses the benefit of the RestoreHook protocol and makes incorrect registrations harder to catch. Consider typing it as dict[str, RestoreHook] (or MutableMapping[str, RestoreHook]) to match the documented contract.
| RESTORE_HOOK_REGISTRY: dict[str, Any] = {} | |
| RESTORE_HOOK_REGISTRY: dict[str, RestoreHook] = {} |
There was a problem hiding this comment.
Addressed in commit 019f622 — RESTORE_HOOK_REGISTRY is now typed as dict[str, RestoreHook].
|
|
||
| if hook is not None: | ||
| try: | ||
| return await hook(checkpoint_state, target) |
There was a problem hiding this comment.
apply_restore_hook() returns whatever the hook returns. If a hook mistakenly returns None, callers will unexpectedly get None (despite the docstring saying the target is returned), which can cascade into attribute errors. Consider guarding this by returning target when the hook result is None (optionally with a warning log).
| return await hook(checkpoint_state, target) | |
| restored_target = await hook(checkpoint_state, target) | |
| if restored_target is None: | |
| logger.warning( | |
| "Restore hook for framework %r returned None; using original target", | |
| framework, | |
| ) | |
| return target | |
| return restored_target |
There was a problem hiding this comment.
Addressed in commit 019f622 — apply_restore_hook now checks if the hook returned None and falls back to returning the original target with a warning log.
- Use key-presence checks instead of `or` for chosen_action/action and tool_name/tool lookups in DriftDetector.compare() so empty-string values are not silently dropped (Copilot review comments) - Type RESTORE_HOOK_REGISTRY as dict[str, RestoreHook] to enforce the protocol at registration time - Guard apply_restore_hook() against hooks returning None by falling back to original target with a warning log Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Addressed all four Copilot review comments in commit 019f622:
All checks pass: |
…hema extensions Implements the three self-contained pieces described in issue #136: - agent_debugger_sdk/checkpoints/hooks.py: add AutoReplayManager class that fetches post-checkpoint events, filters by sequence and importance, and is used by TraceContext.restore when replay_events=True - agent_debugger_sdk/checkpoints/__init__.py: export AutoReplayManager - agent_debugger_sdk/core/context/session_manager.py: store checkpoint_sequence in restored session config so TraceContext can use it for event filtering - agent_debugger_sdk/core/context/trace_context.py: extend restore() with replay_events and importance_threshold params; call apply_restore_hook on the restored framework; set replayed_events and _drift_detector on ctx; initialize _drift_detector and replayed_events in __init__ - tests/sdk/core/test_session_manager.py: update config assertion to include the new checkpoint_sequence key Result: 28 passed in test_replay_depth_l3.py (was 21), 4 skipped gracefully for track_drift/on_replay_event which require deeper integration work. No existing tests broken. Closes #136 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
All four Copilot review comments have been addressed in commit 019f622 ("address review feedback: fix falsy-value masking and type safety"):
|
|
This PR is superseded by #140, which contains the latest and most complete iteration of the L3 replay primitives feature. |
Closes #136
Summary
agent_debugger_sdk/drift.py(new):DriftSeverityenum (WARNING/CRITICAL),DriftEventdataclass,DriftDetectorwithcompare()detecting action/tool/confidence driftagent_debugger_sdk/checkpoints/hooks.py(new):RestoreHookprotocol,RESTORE_HOOK_REGISTRYdict,apply_restore_hook()with graceful error-logging on hook failureagent_debugger_sdk/checkpoints/__init__.py: exports all three new symbolsapi/schemas.py:RestoreRequestgainsreplay_events: boolandtrack_drift: bool;RestoreResponsegainsreplayed_events_count: intanddrift_detected: boolResults
test_replay_depth_l3.pypassingRemaining 11 skips require
AutoReplayManagerandTraceContext.restore(replay_events=…)— tracked separately.Test plan
ruff check .— all checks passedpytest tests/test_replay_depth_l3.py— 21 passed, 11 skipped (no failures)pytest -q(full suite) — 2205 passed, 20 skipped, no regressions🤖 Generated with Claude Code