feat: implement L3 replay primitives — RestoreHook, DriftDetector, schema extensions by acailic · Pull Request #137 · acailic/agent_debugger

acailic · 2026-04-04T02:53:44Z

Closes #136

Summary

agent_debugger_sdk/drift.py (new): DriftSeverity enum (WARNING/CRITICAL), DriftEvent dataclass, DriftDetector with compare() detecting action/tool/confidence drift
agent_debugger_sdk/checkpoints/hooks.py (new): RestoreHook protocol, RESTORE_HOOK_REGISTRY dict, apply_restore_hook() with graceful error-logging on hook failure
agent_debugger_sdk/checkpoints/__init__.py: exports all three new symbols
api/schemas.py: RestoreRequest gains replay_events: bool and track_drift: bool; RestoreResponse gains replayed_events_count: int and drift_detected: bool
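
For readers without the diff open, the schema change can be pictured roughly as follows. This is a minimal sketch, not the code in api/schemas.py: the PR does not show the model base class, the existing fields, or the defaults, so the use of plain dataclasses and the opt-in defaults below are assumptions.

```python
from dataclasses import dataclass


@dataclass
class RestoreRequest:
    # Only the two new flags are shown; existing request fields are omitted.
    replay_events: bool = False  # assumed default, not stated in the PR
    track_drift: bool = False    # assumed default, not stated in the PR


@dataclass
class RestoreResponse:
    # Only the two new result fields are shown.
    replayed_events_count: int = 0  # assumed default, not stated in the PR
    drift_detected: bool = False    # assumed default, not stated in the PR


request = RestoreRequest(replay_events=True, track_drift=True)
response = RestoreResponse(replayed_events_count=3, drift_detected=False)
print(request)
print(response)
```

Defaulting the flags to off would keep existing callers unaffected, which is one plausible reason to add them as optional booleans.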

Results

| | Before | After |
| --- | --- | --- |
| `test_replay_depth_l3.py` passing | 0/32 | 21/32 |
| Full suite | 2155 passed, 59 skipped | 2205 passed, 20 skipped |

Remaining 11 skips require AutoReplayManager and TraceContext.restore(replay_events=…) — tracked separately.

Test plan

ruff check . — all checks passed
pytest tests/test_replay_depth_l3.py — 21 passed, 11 skipped (no failures)
pytest -q (full suite) — 2205 passed, 20 skipped, no regressions

🤖 Generated with Claude Code

…hema extensions

Closes #136

- Add agent_debugger_sdk/drift.py: DriftSeverity enum (WARNING/CRITICAL), DriftEvent dataclass, DriftDetector with compare() for action/tool/confidence drift
- Add agent_debugger_sdk/checkpoints/hooks.py: RestoreHook protocol, RESTORE_HOOK_REGISTRY dict, apply_restore_hook() with graceful error handling
- Export new symbols from agent_debugger_sdk/checkpoints/__init__.py
- Extend api/schemas.py: RestoreRequest.replay_events, RestoreRequest.track_drift, RestoreResponse.replayed_events_count, RestoreResponse.drift_detected

Unblocks 21/32 previously-skipped tests in tests/test_replay_depth_l3.py. Remaining 11 skips require AutoReplayManager and TraceContext.restore() extensions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Implements the missing “L3 replay primitives” needed by tests/test_replay_depth_l3.py: a drift-detection utility, a restore-hook registry/applicator, and schema extensions to expose replay/drift options and results.

Changes:

Added DriftSeverity, DriftEvent, and DriftDetector.compare() for action/tool/confidence drift detection.
Added restore hook protocol + registry + apply_restore_hook() with a generic fallback and exception logging.
Extended restore request/response API schemas with replay/drift flags and result fields.
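
The drift comparison summarized above can be sketched as follows. Only the names DriftSeverity, DriftEvent, DriftDetector, and compare() come from the PR; the DriftEvent fields, the confidence tolerance, the severity assigned to each kind of drift, and the dict-based signature are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class DriftSeverity(Enum):
    WARNING = "warning"
    CRITICAL = "critical"


@dataclass
class DriftEvent:
    kind: str  # hypothetical field: "action", "tool", or "confidence"
    original: Any
    replayed: Any
    severity: DriftSeverity


class DriftDetector:
    def __init__(self, confidence_tolerance: float = 0.2) -> None:
        # Hypothetical threshold; the PR does not state one.
        self.confidence_tolerance = confidence_tolerance

    def compare(self, orig_data: dict[str, Any], new_data: dict[str, Any]) -> list[DriftEvent]:
        """Return one DriftEvent per field that diverged between the two runs."""
        drifts: list[DriftEvent] = []
        # Action and tool changes are treated as CRITICAL here (an assumption).
        for kind in ("action", "tool"):
            if orig_data.get(kind) != new_data.get(kind):
                drifts.append(
                    DriftEvent(kind, orig_data.get(kind), new_data.get(kind), DriftSeverity.CRITICAL)
                )
        # Confidence drift is only checked when both sides report a value.
        old_conf, new_conf = orig_data.get("confidence"), new_data.get("confidence")
        if old_conf is not None and new_conf is not None:
            if abs(old_conf - new_conf) > self.confidence_tolerance:
                drifts.append(DriftEvent("confidence", old_conf, new_conf, DriftSeverity.WARNING))
        return drifts


detector = DriftDetector()
events = detector.compare(
    {"action": "search", "confidence": 0.9},
    {"action": "answer", "confidence": 0.5},
)
for event in events:
    print(event.kind, event.severity.name)
```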

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `api/schemas.py` | Adds request flags (`replay_events`, `track_drift`) and response fields (`replayed_events_count`, `drift_detected`) for restore operations. |
| `agent_debugger_sdk/drift.py` | Introduces drift event primitives and a comparator to detect divergence between original and replayed events. |
| `agent_debugger_sdk/checkpoints/hooks.py` | Adds restore-hook protocol/registry and an applicator with fallback copying and logged hook failures. |
| `agent_debugger_sdk/checkpoints/__init__.py` | Re-exports new restore-hook symbols from the checkpoints package. |


Copilot · 2026-04-04T02:56:58Z

+            orig_action = orig_data.get("chosen_action") or orig_data.get("action")
+            new_action = new_data.get("chosen_action") or new_data.get("action")


In compare(), using orig_data.get("chosen_action") or orig_data.get("action") treats valid falsy values (e.g., an empty-string chosen_action) as “missing”, which can mask real action drift (e.g., original "" vs restored "tool_b"). Prefer key-presence checks (e.g., check for "chosen_action" first, then "action") so empty strings don’t get dropped.

Suggested change

-            orig_action = orig_data.get("chosen_action") or orig_data.get("action")
-            new_action = new_data.get("chosen_action") or new_data.get("action")
+            orig_action = (
+                orig_data["chosen_action"]
+                if "chosen_action" in orig_data
+                else orig_data.get("action")
+            )
+            new_action = (
+                new_data["chosen_action"]
+                if "chosen_action" in new_data
+                else new_data.get("action")
+            )

Addressed in commit 019f622 — replaced or-based lookup with explicit key-presence checks so empty-string chosen_action values are no longer treated as missing.
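
The masking described in this thread can be reproduced in isolation. The two helper functions and the event dicts below are hypothetical and exist only to contrast the two lookup styles; they are not part of the PR.

```python
def lookup_with_or(data):
    # Original pattern: a falsy "chosen_action" (such as "") falls through to "action".
    return data.get("chosen_action") or data.get("action")


def lookup_with_key_check(data):
    # Reviewed pattern: only a missing key falls through to "action".
    return data["chosen_action"] if "chosen_action" in data else data.get("action")


original = {"chosen_action": "", "action": "tool_b"}
replayed = {"chosen_action": "tool_b"}

# With `or`, both sides resolve to "tool_b", so the drift is invisible.
print(lookup_with_or(original) == lookup_with_or(replayed))

# With key checks, "" and "tool_b" differ, so the drift is reported.
print(lookup_with_key_check(original) == lookup_with_key_check(replayed))
```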

Copilot · 2026-04-04T02:56:58Z

+            orig_tool = orig_data.get("tool_name") or orig_data.get("tool")
+            new_tool = new_data.get("tool_name") or new_data.get("tool")


Similarly, orig_data.get("tool_name") or orig_data.get("tool") will ignore valid falsy values (like "") and can hide tool drift when one side provides an empty tool name. Use explicit key checks instead of or so empty strings are treated as actual values.

Suggested change

-            orig_tool = orig_data.get("tool_name") or orig_data.get("tool")
-            new_tool = new_data.get("tool_name") or new_data.get("tool")
+            orig_tool = (
+                orig_data["tool_name"]
+                if "tool_name" in orig_data
+                else orig_data["tool"]
+                if "tool" in orig_data
+                else None
+            )
+            new_tool = (
+                new_data["tool_name"]
+                if "tool_name" in new_data
+                else new_data["tool"]
+                if "tool" in new_data
+                else None
+            )

Addressed in commit 019f622 — same fix applied to tool_name/tool lookup using explicit key-presence checks.
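
Because the same key-presence pattern now appears for both the action and the tool lookups, one option is to factor it into a small helper. The function `first_present` below is hypothetical and not part of the PR; it is a sketch of how the chained conditionals could be expressed once.

```python
from typing import Any, Mapping


def first_present(data: Mapping[str, Any], *keys: str, default: Any = None) -> Any:
    """Return the value of the first key that exists in data, even if that value is falsy."""
    for key in keys:
        if key in data:
            return data[key]
    return default


# An empty tool name is kept as a real value instead of falling through.
print(repr(first_present({"tool_name": "", "tool": "search"}, "tool_name", "tool")))

# A missing "tool_name" falls back to "tool".
print(repr(first_present({"tool": "search"}, "tool_name", "tool")))

# Neither key present: the default is returned.
print(repr(first_present({}, "tool_name", "tool")))
```

A helper like this keeps the key order in one place, at the cost of one extra function call per lookup.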

Copilot · 2026-04-04T02:56:58Z

+
+# Registry mapping framework name → RestoreHook callable.
+# Users and adapters can register hooks at import time or at runtime.
+RESTORE_HOOK_REGISTRY: dict[str, Any] = {}


RESTORE_HOOK_REGISTRY is typed as dict[str, Any], which loses the benefit of the RestoreHook protocol and makes incorrect registrations harder to catch. Consider typing it as dict[str, RestoreHook] (or MutableMapping[str, RestoreHook]) to match the documented contract.

Suggested change

-RESTORE_HOOK_REGISTRY: dict[str, Any] = {}
+RESTORE_HOOK_REGISTRY: dict[str, RestoreHook] = {}

Addressed in commit 019f622 — RESTORE_HOOK_REGISTRY is now typed as dict[str, RestoreHook].
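
A protocol-typed registry of this kind might look like the sketch below. The `__call__` signature is inferred from the `await hook(checkpoint_state, target)` call in the diff; the parameter types, the example hook, and the framework key are assumptions. Note that the annotation is checked by a static type checker, not at runtime.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, Protocol


class RestoreHook(Protocol):
    # Assumed signature, inferred from how the hook is awaited in the diff.
    async def __call__(self, checkpoint_state: dict[str, Any], target: Any) -> Any: ...


RESTORE_HOOK_REGISTRY: dict[str, RestoreHook] = {}


async def restore_example_agent(checkpoint_state: dict[str, Any], target: Any) -> Any:
    # Hypothetical hook: copy a single field onto the target and return it.
    target.memory = checkpoint_state.get("memory", [])
    return target


# A type checker accepts this because the coroutine function matches the protocol;
# registering a plain string or a mismatched callable would be flagged statically.
RESTORE_HOOK_REGISTRY["example_framework"] = restore_example_agent

agent = asyncio.run(
    RESTORE_HOOK_REGISTRY["example_framework"]({"memory": ["hi"]}, SimpleNamespace())
)
print(agent.memory)
```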

Copilot · 2026-04-04T02:56:59Z

+
+    if hook is not None:
+        try:
+            return await hook(checkpoint_state, target)


apply_restore_hook() returns whatever the hook returns. If a hook mistakenly returns None, callers will unexpectedly get None (despite the docstring saying the target is returned), which can cascade into attribute errors. Consider guarding this by returning target when the hook result is None (optionally with a warning log).

Suggested change

-            return await hook(checkpoint_state, target)
+            restored_target = await hook(checkpoint_state, target)
+            if restored_target is None:
+                logger.warning(
+                    "Restore hook for framework %r returned None; using original target",
+                    framework,
+                )
+                return target
+            return restored_target

Addressed in commit 019f622 — apply_restore_hook now checks if the hook returned None and falls back to returning the original target with a warning log.
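
Putting the guard in context, the control flow of apply_restore_hook() can be sketched as follows. Only the None guard is taken from the diff. The function signature, the behavior after a hook raises (log and return the original target), and the generic fallback that copies checkpoint keys onto the target are assumptions based on the PR description, not the actual implementation.

```python
import asyncio
import logging
from types import SimpleNamespace
from typing import Any, Awaitable, Callable

logger = logging.getLogger(__name__)

Hook = Callable[[dict[str, Any], Any], Awaitable[Any]]
RESTORE_HOOK_REGISTRY: dict[str, Hook] = {}


async def apply_restore_hook(framework: str, checkpoint_state: dict[str, Any], target: Any) -> Any:
    hook = RESTORE_HOOK_REGISTRY.get(framework)
    if hook is not None:
        try:
            restored_target = await hook(checkpoint_state, target)
        except Exception:
            # Assumption: a failing hook is logged and the original target is returned.
            logger.exception("Restore hook for framework %r failed", framework)
            return target
        if restored_target is None:
            logger.warning(
                "Restore hook for framework %r returned None; using original target",
                framework,
            )
            return target
        return restored_target
    # Assumed generic fallback: copy checkpoint keys onto the target as attributes.
    for key, value in checkpoint_state.items():
        setattr(target, key, value)
    return target


async def forgetful_hook(checkpoint_state: dict[str, Any], target: Any) -> Any:
    target.step = checkpoint_state["step"]
    # Missing `return target`: the None guard covers this mistake.


RESTORE_HOOK_REGISTRY["forgetful"] = forgetful_hook
agent = SimpleNamespace(step=0)
restored = asyncio.run(apply_restore_hook("forgetful", {"step": 7}, agent))
print(restored is agent, restored.step)
```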

- Use key-presence checks instead of `or` for chosen_action/action and tool_name/tool lookups in DriftDetector.compare() so empty-string values are not silently dropped (Copilot review comments)
- Type RESTORE_HOOK_REGISTRY as dict[str, RestoreHook] to enforce the protocol at registration time
- Guard apply_restore_hook() against hooks returning None by falling back to original target with a warning log

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

acailic · 2026-04-04T04:18:53Z

Addressed all four Copilot review comments in commit 019f622:

drift.py — chosen_action/action lookup (discussion_r3035009235): Replaced or-based fallback with explicit key-presence checks so empty-string values are not silently dropped as falsy.
drift.py — tool_name/tool lookup (discussion_r3035009240): Same fix applied — uses if "tool_name" in orig_data / if "tool" in orig_data chained conditionals.
hooks.py — RESTORE_HOOK_REGISTRY type (discussion_r3035009244): Changed from dict[str, Any] to dict[str, RestoreHook] to enforce the protocol at registration time.
hooks.py — None return guard (discussion_r3035009248): apply_restore_hook() now checks if the hook returns None, logs a warning, and falls back to the original target rather than propagating None to callers.

All checks pass: ruff check clean, pytest -q 236 passed / 32 skipped (pre-existing "not yet implemented" skips, no regressions).

…hema extensions

Implements the three self-contained pieces described in issue #136:

- agent_debugger_sdk/checkpoints/hooks.py: add AutoReplayManager class that fetches post-checkpoint events, filters by sequence and importance, and is used by TraceContext.restore when replay_events=True
- agent_debugger_sdk/checkpoints/__init__.py: export AutoReplayManager
- agent_debugger_sdk/core/context/session_manager.py: store checkpoint_sequence in restored session config so TraceContext can use it for event filtering
- agent_debugger_sdk/core/context/trace_context.py: extend restore() with replay_events and importance_threshold params; call apply_restore_hook on the restored framework; set replayed_events and _drift_detector on ctx; initialize _drift_detector and replayed_events in __init__
- tests/sdk/core/test_session_manager.py: update config assertion to include the new checkpoint_sequence key

Result: 28 passed in test_replay_depth_l3.py (was 21), 4 skipped gracefully for track_drift/on_replay_event which require deeper integration work. No existing tests broken.

Closes #136

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
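
The event filtering attributed to AutoReplayManager in this commit message can be illustrated with a standalone function. This is a sketch only: the `Event` fields, the function name `select_replay_events`, the strict "after the checkpoint" comparison, and the inclusive importance threshold are all assumptions, since the commit message does not show the implementation.

```python
from dataclasses import dataclass


@dataclass
class Event:
    sequence: int      # hypothetical field name
    importance: float  # hypothetical field name
    name: str


def select_replay_events(events, checkpoint_sequence, importance_threshold=0.0):
    """Keep events recorded after the checkpoint whose importance meets the threshold."""
    return [
        event
        for event in sorted(events, key=lambda e: e.sequence)
        if event.sequence > checkpoint_sequence and event.importance >= importance_threshold
    ]


events = [
    Event(1, 0.9, "plan"),
    Event(2, 0.2, "log"),
    Event(3, 0.8, "tool_call"),
    Event(4, 0.7, "answer"),
]
replayed = select_replay_events(events, checkpoint_sequence=2, importance_threshold=0.5)
print([event.name for event in replayed])
```

Storing checkpoint_sequence in the restored session config, as the commit describes, is what would make a cutoff like `checkpoint_sequence=2` available at restore time.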

acailic · 2026-04-04T10:18:14Z

All four Copilot review comments have been addressed in commit 019f622 ("address review feedback: fix falsy-value masking and type safety"):

drift.py — action lookup falsy-value masking: replaced or-chaining with explicit key-presence checks ("chosen_action" in orig_data) so empty-string actions are not silently dropped.
drift.py — tool lookup falsy-value masking: same fix applied to tool_name/tool lookups in the tool-call drift branch.
hooks.py — RESTORE_HOOK_REGISTRY type: changed from dict[str, Any] to dict[str, RestoreHook] to enforce the protocol contract at the type level.
hooks.py — apply_restore_hook() None guard: added a None-return guard after awaiting the hook; logs a warning and falls back to the original target if the hook returns None.

acailic · 2026-04-05T18:48:44Z

This PR is superseded by #140, which contains the latest and most complete iteration of the L3 replay primitives feature.

Copilot AI review requested due to automatic review settings April 4, 2026 02:53

Copilot started reviewing on behalf of acailic April 4, 2026 02:54

Copilot AI reviewed Apr 4, 2026


acailic closed this Apr 5, 2026

acailic deleted the feat/l3-replay-primitives-issue-136 branch April 6, 2026 21:13

acailic wants to merge 3 commits into main from feat/l3-replay-primitives-issue-136


