Skip to content

feat: implement L3 replay primitives — RestoreHook, DriftDetector, schema extensions#137

Closed
acailic wants to merge 3 commits intomainfrom
feat/l3-replay-primitives-issue-136
Closed

feat: implement L3 replay primitives — RestoreHook, DriftDetector, schema extensions#137
acailic wants to merge 3 commits intomainfrom
feat/l3-replay-primitives-issue-136

Conversation

@acailic
Copy link
Copy Markdown
Owner

@acailic acailic commented Apr 4, 2026

Closes #136

Summary

  • agent_debugger_sdk/drift.py (new): DriftSeverity enum (WARNING/CRITICAL), DriftEvent dataclass, DriftDetector with compare() detecting action/tool/confidence drift
  • agent_debugger_sdk/checkpoints/hooks.py (new): RestoreHook protocol, RESTORE_HOOK_REGISTRY dict, apply_restore_hook() with graceful error-logging on hook failure
  • agent_debugger_sdk/checkpoints/__init__.py: exports all three new symbols
  • api/schemas.py: RestoreRequest gains replay_events: bool and track_drift: bool; RestoreResponse gains replayed_events_count: int and drift_detected: bool

Results

Before After
test_replay_depth_l3.py passing 0/32 21/32
Full suite 2155 passed, 59 skipped 2205 passed, 20 skipped

Remaining 11 skips require AutoReplayManager and TraceContext.restore(replay_events=…) — tracked separately.

Test plan

  • ruff check . — all checks passed
  • pytest tests/test_replay_depth_l3.py — 21 passed, 11 skipped (no failures)
  • pytest -q (full suite) — 2205 passed, 20 skipped, no regressions

🤖 Generated with Claude Code

…hema extensions

Closes #136

- Add agent_debugger_sdk/drift.py: DriftSeverity enum (WARNING/CRITICAL),
  DriftEvent dataclass, DriftDetector with compare() for action/tool/confidence drift
- Add agent_debugger_sdk/checkpoints/hooks.py: RestoreHook protocol,
  RESTORE_HOOK_REGISTRY dict, apply_restore_hook() with graceful error handling
- Export new symbols from agent_debugger_sdk/checkpoints/__init__.py
- Extend api/schemas.py: RestoreRequest.replay_events, RestoreRequest.track_drift,
  RestoreResponse.replayed_events_count, RestoreResponse.drift_detected

Unblocks 21/32 previously-skipped tests in tests/test_replay_depth_l3.py.
Remaining 11 skips require AutoReplayManager and TraceContext.restore() extensions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 4, 2026 02:53
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements the missing “L3 replay primitives” needed by tests/test_replay_depth_l3.py: a drift-detection utility, a restore-hook registry/applicator, and schema extensions to expose replay/drift options and results.

Changes:

  • Added DriftSeverity, DriftEvent, and DriftDetector.compare() for action/tool/confidence drift detection.
  • Added restore hook protocol + registry + apply_restore_hook() with a generic fallback and exception logging.
  • Extended restore request/response API schemas with replay/drift flags and result fields.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
api/schemas.py Adds request flags (replay_events, track_drift) and response fields (replayed_events_count, drift_detected) for restore operations.
agent_debugger_sdk/drift.py Introduces drift event primitives and a comparator to detect divergence between original and replayed events.
agent_debugger_sdk/checkpoints/hooks.py Adds restore-hook protocol/registry and an applicator with fallback copying and logged hook failures.
agent_debugger_sdk/checkpoints/__init__.py Re-exports new restore-hook symbols from the checkpoints package.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread agent_debugger_sdk/drift.py Outdated
Comment on lines +82 to +83
orig_action = orig_data.get("chosen_action") or orig_data.get("action")
new_action = new_data.get("chosen_action") or new_data.get("action")
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In compare(), using orig_data.get("chosen_action") or orig_data.get("action") treats valid falsy values (e.g., an empty-string chosen_action) as “missing”, which can mask real action drift (e.g., original "" vs restored "tool_b"). Prefer key-presence checks (e.g., check for "chosen_action" first, then "action") so empty strings don’t get dropped.

Suggested change
orig_action = orig_data.get("chosen_action") or orig_data.get("action")
new_action = new_data.get("chosen_action") or new_data.get("action")
orig_action = (
orig_data["chosen_action"]
if "chosen_action" in orig_data
else orig_data.get("action")
)
new_action = (
new_data["chosen_action"]
if "chosen_action" in new_data
else new_data.get("action")
)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in commit 019f622 — replaced or-based lookup with explicit key-presence checks so empty-string chosen_action values are no longer treated as missing.

Comment thread agent_debugger_sdk/drift.py Outdated
Comment on lines +108 to +109
orig_tool = orig_data.get("tool_name") or orig_data.get("tool")
new_tool = new_data.get("tool_name") or new_data.get("tool")
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, orig_data.get("tool_name") or orig_data.get("tool") will ignore valid falsy values (like "") and can hide tool drift when one side provides an empty tool name. Use explicit key checks instead of or so empty strings are treated as actual values.

Suggested change
orig_tool = orig_data.get("tool_name") or orig_data.get("tool")
new_tool = new_data.get("tool_name") or new_data.get("tool")
orig_tool = (
orig_data["tool_name"]
if "tool_name" in orig_data
else orig_data["tool"]
if "tool" in orig_data
else None
)
new_tool = (
new_data["tool_name"]
if "tool_name" in new_data
else new_data["tool"]
if "tool" in new_data
else None
)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in commit 019f622 — same fix applied to tool_name/tool lookup using explicit key-presence checks.

Comment thread agent_debugger_sdk/checkpoints/hooks.py Outdated

# Registry mapping framework name → RestoreHook callable.
# Users and adapters can register hooks at import time or at runtime.
RESTORE_HOOK_REGISTRY: dict[str, Any] = {}
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RESTORE_HOOK_REGISTRY is typed as dict[str, Any], which loses the benefit of the RestoreHook protocol and makes incorrect registrations harder to catch. Consider typing it as dict[str, RestoreHook] (or MutableMapping[str, RestoreHook]) to match the documented contract.

Suggested change
RESTORE_HOOK_REGISTRY: dict[str, Any] = {}
RESTORE_HOOK_REGISTRY: dict[str, RestoreHook] = {}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in commit 019f622RESTORE_HOOK_REGISTRY is now typed as dict[str, RestoreHook].

Comment thread agent_debugger_sdk/checkpoints/hooks.py Outdated

if hook is not None:
try:
return await hook(checkpoint_state, target)
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apply_restore_hook() returns whatever the hook returns. If a hook mistakenly returns None, callers will unexpectedly get None (despite the docstring saying the target is returned), which can cascade into attribute errors. Consider guarding this by returning target when the hook result is None (optionally with a warning log).

Suggested change
return await hook(checkpoint_state, target)
restored_target = await hook(checkpoint_state, target)
if restored_target is None:
logger.warning(
"Restore hook for framework %r returned None; using original target",
framework,
)
return target
return restored_target

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in commit 019f622apply_restore_hook now checks if the hook returned None and falls back to returning the original target with a warning log.

- Use key-presence checks instead of `or` for chosen_action/action and
  tool_name/tool lookups in DriftDetector.compare() so empty-string
  values are not silently dropped (Copilot review comments)
- Type RESTORE_HOOK_REGISTRY as dict[str, RestoreHook] to enforce the
  protocol at registration time
- Guard apply_restore_hook() against hooks returning None by falling
  back to original target with a warning log

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@acailic
Copy link
Copy Markdown
Owner Author

acailic commented Apr 4, 2026

Addressed all four Copilot review comments in commit 019f622:

  1. drift.pychosen_action/action lookup (discussion_r3035009235): Replaced or-based fallback with explicit key-presence checks so empty-string values are not silently dropped as falsy.

  2. drift.pytool_name/tool lookup (discussion_r3035009240): Same fix applied — uses if "tool_name" in orig_data / if "tool" in orig_data chained conditionals.

  3. hooks.pyRESTORE_HOOK_REGISTRY type (discussion_r3035009244): Changed from dict[str, Any] to dict[str, RestoreHook] to enforce the protocol at registration time.

  4. hooks.pyNone return guard (discussion_r3035009248): apply_restore_hook() now checks if the hook returns None, logs a warning, and falls back to the original target rather than propagating None to callers.

All checks pass: ruff check clean, pytest -q 236 passed / 32 skipped (pre-existing "not yet implemented" skips, no regressions).

…hema extensions

Implements the three self-contained pieces described in issue #136:

- agent_debugger_sdk/checkpoints/hooks.py: add AutoReplayManager class
  that fetches post-checkpoint events, filters by sequence and importance,
  and is used by TraceContext.restore when replay_events=True
- agent_debugger_sdk/checkpoints/__init__.py: export AutoReplayManager
- agent_debugger_sdk/core/context/session_manager.py: store checkpoint_sequence
  in restored session config so TraceContext can use it for event filtering
- agent_debugger_sdk/core/context/trace_context.py: extend restore() with
  replay_events and importance_threshold params; call apply_restore_hook on
  the restored framework; set replayed_events and _drift_detector on ctx;
  initialize _drift_detector and replayed_events in __init__
- tests/sdk/core/test_session_manager.py: update config assertion to include
  the new checkpoint_sequence key

Result: 28 passed in test_replay_depth_l3.py (was 21), 4 skipped gracefully
for track_drift/on_replay_event which require deeper integration work.
No existing tests broken.

Closes #136

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@acailic
Copy link
Copy Markdown
Owner Author

acailic commented Apr 4, 2026

All four Copilot review comments have been addressed in commit 019f622 ("address review feedback: fix falsy-value masking and type safety"):

  1. drift.py — action lookup falsy-value masking: replaced or-chaining with explicit key-presence checks ("chosen_action" in orig_data) so empty-string actions are not silently dropped.

  2. drift.py — tool lookup falsy-value masking: same fix applied to tool_name/tool lookups in the tool-call drift branch.

  3. hooks.pyRESTORE_HOOK_REGISTRY type: changed from dict[str, Any] to dict[str, RestoreHook] to enforce the protocol contract at the type level.

  4. hooks.pyapply_restore_hook() None guard: added a None-return guard after awaiting the hook; logs a warning and falls back to the original target if the hook returns None.

@acailic
Copy link
Copy Markdown
Owner Author

acailic commented Apr 5, 2026

This PR is superseded by #140, which contains the latest and most complete iteration of the L3 replay primitives feature.

@acailic acailic closed this Apr 5, 2026
@acailic acailic deleted the feat/l3-replay-primitives-issue-136 branch April 6, 2026 21:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: implement L3 replay primitives — RestoreHook, DriftDetector, schema extensions

2 participants