fix(hooks): auto_correct matches Hermes payload shape (post_tool_call/patch/write_file)#204
Conversation
…/patch/write_file)
Pre-this-fix, _extract_correction only matched the Claude Code stdin
shape (tool_name in {Edit, Write} with args under 'input'). Hermes Agent
hosts (which compose AGENTS.md-aware coding agents) ship a different
shape: {hook_event_name: post_tool_call, tool_name: 'patch'|'write_file'|
'mcp_Patch'|..., tool_input: {...}}. Every Hermes post_tool_call hook
fired but the extractor silently returned None, so events.jsonl never
recorded a hook-driven correction.
Observed in the wild: oliver-admin brain (Hermes runtime + 27-agent fleet)
went 43h with zero new corrections while the fleet ran thousands of patch
calls. system.db had 15 historical CORRECTION events all sourced from
manual brain.correct() calls, zero from hooks.
Fix expands _extract_correction to:
- Accept args under either 'input' (CC) or 'tool_input' (Hermes)
- Map Hermes/MCP tool names to edit/write classes:
EDIT_TOOLS = {Edit, edit, patch, Patch, str_replace, string_replace}
WRITE_TOOLS = {Write, write, write_file, Write_file, create_file}
- Strip mcp_ prefix so mcp_Patch matches Patch
Verified live: a Hermes-shape stdin POST to the hook now produces
captured=true and enqueues to sync_queue; SyncWorker drains it; cloud
brains.last_sync_at advances; correction_count grows.
Regression test pinned in tests/test_auto_correct_payload_shapes.py
(12 cases covering both shapes + malformed input + integration).
Test plan: pytest tests/test_auto_correct_payload_shapes.py tests/test_hooks_learning.py -xvs
=> 39 passed in 6.78s
Layering check: edit is internal to gradata.hooks; no Layer 0 → 2 import.
Risk: low — function is a pure extractor with branched matching; adding
extra tool-name match arms is additive. Claude Code shape behavior is
unchanged (existing tests still pass).
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
📝 Walkthrough
WalkthroughThis PR extends ChangesPayload shape normalization and testing
🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 OpenGrep (1.21.0)OpenGrep fatal error (exit code 2): �[32m✔�[39m �[1mOpengrep OSS�[0m �[1m Loading rules from local config...�[0m Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Gradata/src/gradata/hooks/auto_correct.py`:
- Around line 67-101: The code assumes payload is a mapping and calls
payload.get(...), which raises for non-object JSON; add a top-level guard in
process_hook_input that returns None when payload is not a dict (e.g., if
payload is None, a list, string, number), before any access to payload; update
the start of the function (before computing tool_name, args, or name_normalized)
to check isinstance(payload, dict) and early-return None so the subsequent
payload.get, payload.get("tool_input")/("input"), and name_normalized logic are
safe.
In `@Gradata/tests/test_auto_correct_payload_shapes.py`:
- Around line 96-105: Add regression tests to cover non-object top-level JSON
shapes for _extract_correction: create tests that call _extract_correction with
parsed JSON values that are a list (e.g., []), a string (e.g., "x"), and a
number (e.g., 1) and assert they return None; place them alongside the existing
malformed-input tests in TestExtractCorrectionMalformedInput (also mirror into
the other test block referenced around lines 117-134) so the fragile path
handling non-object top-level types is exercised and returns None.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a679fdde-e981-4c7b-a89f-651fffdc8913
📒 Files selected for processing (2)
Gradata/src/gradata/hooks/auto_correct.pyGradata/tests/test_auto_correct_payload_shapes.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: pytest (py3.11)
- GitHub Check: pytest (py3.12)
- GitHub Check: pytest windows-latest / py3.11
- GitHub Check: pytest ubuntu-latest / py3.11
- GitHub Check: pytest macos-latest / py3.12
- GitHub Check: pytest ubuntu-latest / py3.12
- GitHub Check: pytest windows-latest / py3.12
- GitHub Check: pytest macos-latest / py3.11
🧰 Additional context used
📓 Path-based instructions (2)
Gradata/src/**/*.py
📄 CodeRabbit inference engine (Gradata/AGENTS.md)
Gradata/src/**/*.py: Prefersentence-transformersfor local embeddings,google-genaifor Gemini embeddings,cryptographyfor AES-GCM encrypted system.db,bm25sfor BM25 rule ranking, andmem0aifor external memory adapters — guard all optional dependency imports withtry / except ImportErrorat the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bareexcept: pass— use typed exceptions or at minimumlogger.warning(...)withexc_info=Trueto avoid silent failure in a memory product
Never import from out-of-scope sibling directories../Sprites/or../Hausgem/withingradata/*code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to../Sprites/,../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from insidegradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes
Files:
Gradata/src/gradata/hooks/auto_correct.py
Gradata/tests/**/*.py
📄 CodeRabbit inference engine (Gradata/AGENTS.md)
Gradata/tests/**/*.py: SetBRAIN_DIRenvironment variable viatmp_pathin conftest.py for test isolation — ensure_paths.pymodule cache refreshes when callingBrain.init()directly inside tests
Add unit tests intests/test_*.pyfor every CI push without LLM calls (deterministic); mark integration tests with@pytest.mark.integrationand skip them by default (they hit real LLM APIs)
Files:
Gradata/tests/test_auto_correct_payload_shapes.py
🔇 Additional comments (2)
Gradata/src/gradata/hooks/auto_correct.py (1)
103-120: LGTM!Gradata/tests/test_auto_correct_payload_shapes.py (1)
1-94: LGTM!
| payload: dict, tool_output: dict | str | None = None | ||
| ) -> tuple[str, str] | None: | ||
| """Extract before/after text from a tool call. | ||
| """Extract before/after text from a tool call payload. | ||
|
|
||
| Handles two stdin shapes: | ||
|
|
||
| 1. **Claude Code** — `{"tool_name": "Edit", "input": {"old_string", "new_string"}}` | ||
| or `{"tool_name": "Write", "input": {"content"}}` with optional | ||
| `tool_output.old_content` for Write diffs. | ||
|
|
||
| 2. **Hermes Agent** (and similar AGENTS.md hosts) — | ||
| `{"hook_event_name": "post_tool_call", "tool_name": "patch", | ||
| "tool_input": {"path", "old_string", "new_string"}, ...}` | ||
| or `{"tool_name": "write_file", "tool_input": {"path", "content"}}`. | ||
|
|
||
| Hermes nests tool args under ``tool_input`` (not ``input``) and uses | ||
| lowercase tool names (``patch``, ``write_file``, ``mcp_Patch``, | ||
| ``mcp_Write_file``) rather than ``Edit``/``Write``. Earlier versions | ||
| of this hook only matched the Claude-Code shape, so every Hermes | ||
| post_tool_call hook fired but silently captured zero corrections — | ||
| the canonical "hooks fire but events.jsonl stays idle for days" | ||
| failure mode. See Gradata/gradata#TBD. | ||
|
|
||
| Handles Edit (old_string/new_string) and Write (checks git diff). | ||
| Returns (draft, final) or None if no meaningful correction. | ||
| """ | ||
| tool_name = tool_input.get("tool_name", "") | ||
|
|
||
| if tool_name == "Edit": | ||
| old = tool_input.get("input", {}).get("old_string", "") | ||
| new = tool_input.get("input", {}).get("new_string", "") | ||
| tool_name = payload.get("tool_name", "") or "" | ||
| # Hermes nests args under "tool_input"; Claude Code under "input". | ||
| args = payload.get("tool_input") or payload.get("input") or {} | ||
| if not isinstance(args, dict): | ||
| args = {} | ||
|
|
||
| # Normalize tool name for matching. Strip mcp_ prefix (e.g. "mcp_Patch" -> "Patch"). | ||
| name_normalized = tool_name | ||
| if name_normalized.startswith("mcp_"): | ||
| name_normalized = name_normalized[4:] |
There was a problem hiding this comment.
Guard top-level payload type before accessing mapping fields.
process_hook_input accepts any valid JSON; if stdin is a non-object JSON value, this code path raises at payload.get(...) instead of returning a non-captured result.
💡 Proposed fix
def _extract_correction(
payload: dict, tool_output: dict | str | None = None
) -> tuple[str, str] | None:
+ if not isinstance(payload, dict):
+ return None
+
- tool_name = payload.get("tool_name", "") or ""
+ raw_tool_name = payload.get("tool_name", "")
+ tool_name = raw_tool_name if isinstance(raw_tool_name, str) else ""
# Hermes nests args under "tool_input"; Claude Code under "input".
- args = payload.get("tool_input") or payload.get("input") or {}
- if not isinstance(args, dict):
- args = {}
+ tool_input_args = payload.get("tool_input")
+ input_args = payload.get("input")
+ if isinstance(tool_input_args, dict):
+ args = tool_input_args
+ elif isinstance(input_args, dict):
+ args = input_args
+ else:
+ args = {}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/hooks/auto_correct.py` around lines 67 - 101, The code
assumes payload is a mapping and calls payload.get(...), which raises for
non-object JSON; add a top-level guard in process_hook_input that returns None
when payload is not a dict (e.g., if payload is None, a list, string, number),
before any access to payload; update the start of the function (before computing
tool_name, args, or name_normalized) to check isinstance(payload, dict) and
early-return None so the subsequent payload.get,
payload.get("tool_input")/("input"), and name_normalized logic are safe.
| class TestExtractCorrectionMalformedInput: | ||
| def test_missing_args_returns_none(self) -> None: | ||
| assert _extract_correction({"tool_name": "Edit"}) is None | ||
|
|
||
| def test_args_not_a_dict_returns_none(self) -> None: | ||
| assert _extract_correction({"tool_name": "Edit", "input": "not a dict"}) is None | ||
|
|
||
| def test_empty_payload_returns_none(self) -> None: | ||
| assert _extract_correction({}) is None | ||
|
|
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Add a regression for non-object JSON stdin shape.
This suite validates malformed args, but not malformed top-level JSON types ([], "x", 1) that still parse and currently exercise the fragile path in _extract_correction.
🧪 Proposed test additions
class TestExtractCorrectionMalformedInput:
@@
def test_empty_payload_returns_none(self) -> None:
assert _extract_correction({}) is None
+
+ def test_process_hook_input_non_object_json_returns_not_captured(self) -> None:
+ result = process_hook_input("[]")
+ assert result.get("captured") is FalseAlso applies to: 117-134
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/tests/test_auto_correct_payload_shapes.py` around lines 96 - 105, Add
regression tests to cover non-object top-level JSON shapes for
_extract_correction: create tests that call _extract_correction with parsed JSON
values that are a list (e.g., []), a string (e.g., "x"), and a number (e.g., 1)
and assert they return None; place them alongside the existing malformed-input
tests in TestExtractCorrectionMalformedInput (also mirror into the other test
block referenced around lines 117-134) so the fragile path handling non-object
top-level types is exercised and returns None.
…op of #204) PR #204 was the band-aid: it extended a single _extract_correction in auto_correct.py to match Hermes' payload shape alongside Claude Code's. That fixed the 43h sync staleness but left a maintenance hazard — a hard-coded tool-name allowlist living far from the host adapter modules that know their host's tool taxonomy. This refactor moves runtime extraction INTO each host adapter: gradata/hooks/adapters/ _base.py — HostAdapter Protocol, dispatch helpers, shared EDIT_TOOL_ALIASES / WRITE_TOOL_ALIASES, extract primitives (extract_from_edit_args, from_write_args) claude_code.py — detect() + extract_correction() for {tool_name:Edit, input:{...}} hermes.py — detect() + extract_correction() for hook_event_name envelope + tool_input nesting codex.py — detect() + extract_correction() for {tool:apply_patch, input:{...}} gemini.py — detect() + extract_correction() for {tool:..., args:{...}} opencode.py — detect() + extract_correction() for {event:preTool, tool:..., input:{...}} cursor.py — detect() always False (MCP-only, no stdin payloads) auto_correct._extract_correction shrinks to a 12-line dispatcher that walks adapters in priority order; first to claim the payload via detect() routes to its extract_correction(). The contract is enforced by tests/test_adapter_extraction_contracts.py (57 cases) which encodes four invariants: 1. Edit payloads MUST capture — owning adapter detects + extracts; dispatcher returns the tuple. 2. Non-edit payloads MUST be None — owning adapter MAY detect them (structural truth, payload IS from this host) but extract_correction returns None for non-edit tools. Capturing a 'ls' command as a correction would corrupt the brain. 3. Cross-host hygiene — no adapter falsely claims another host's EDIT payload (verified for every host × every other host). 4. Dispatcher robustness — garbage input never crashes; a crashing adapter never blocks the rest of the chain. Bugs surfaced by the contract tests (and fixed in this PR): - codex.detect() was claiming Claude Code's 'Edit' payloads (both used 'input' key). Codex now also requires 'tool' field (not 'tool_name'), no 'event' field (rules out OpenCode), no 'args' field (rules out Gemini). Test plan: pytest tests/test_adapter_extraction_contracts.py \ tests/test_auto_correct_payload_shapes.py \ tests/test_hook_adapters.py \ tests/test_hooks_learning.py => 106 passed in 4.99s Live verification on oliver-admin: echo '{"hook_event_name":"post_tool_call","tool_name":"patch", "tool_input":{"old_string":"a","new_string":"b"}}' | \ BRAIN_DIR=~/.gradata/brain python3 -m gradata.hooks.auto_correct => {"captured": true, ...} Layering: edit confined to gradata.hooks.adapters; no Layer 0 → 2 import. Risk: low — same wire behavior as #204 (Claude Code shape + Hermes shape both capture, non-edits return None), but the responsibility has moved to the adapter modules where new hosts can be added without touching auto_correct.py. Series: #204 (band-aid, Hermes shape support) → THIS PR (clean fix, adapter-owned extraction with contract test).
…op of #204) (#205) PR #204 was the band-aid: it extended a single _extract_correction in auto_correct.py to match Hermes' payload shape alongside Claude Code's. That fixed the 43h sync staleness but left a maintenance hazard — a hard-coded tool-name allowlist living far from the host adapter modules that know their host's tool taxonomy. This refactor moves runtime extraction INTO each host adapter: gradata/hooks/adapters/ _base.py — HostAdapter Protocol, dispatch helpers, shared EDIT_TOOL_ALIASES / WRITE_TOOL_ALIASES, extract primitives (extract_from_edit_args, from_write_args) claude_code.py — detect() + extract_correction() for {tool_name:Edit, input:{...}} hermes.py — detect() + extract_correction() for hook_event_name envelope + tool_input nesting codex.py — detect() + extract_correction() for {tool:apply_patch, input:{...}} gemini.py — detect() + extract_correction() for {tool:..., args:{...}} opencode.py — detect() + extract_correction() for {event:preTool, tool:..., input:{...}} cursor.py — detect() always False (MCP-only, no stdin payloads) auto_correct._extract_correction shrinks to a 12-line dispatcher that walks adapters in priority order; first to claim the payload via detect() routes to its extract_correction(). The contract is enforced by tests/test_adapter_extraction_contracts.py (57 cases) which encodes four invariants: 1. Edit payloads MUST capture — owning adapter detects + extracts; dispatcher returns the tuple. 2. Non-edit payloads MUST be None — owning adapter MAY detect them (structural truth, payload IS from this host) but extract_correction returns None for non-edit tools. Capturing a 'ls' command as a correction would corrupt the brain. 3. Cross-host hygiene — no adapter falsely claims another host's EDIT payload (verified for every host × every other host). 4. Dispatcher robustness — garbage input never crashes; a crashing adapter never blocks the rest of the chain. Bugs surfaced by the contract tests (and fixed in this PR): - codex.detect() was claiming Claude Code's 'Edit' payloads (both used 'input' key). Codex now also requires 'tool' field (not 'tool_name'), no 'event' field (rules out OpenCode), no 'args' field (rules out Gemini). Test plan: pytest tests/test_adapter_extraction_contracts.py \ tests/test_auto_correct_payload_shapes.py \ tests/test_hook_adapters.py \ tests/test_hooks_learning.py => 106 passed in 4.99s Live verification on oliver-admin: echo '{"hook_event_name":"post_tool_call","tool_name":"patch", "tool_input":{"old_string":"a","new_string":"b"}}' | \ BRAIN_DIR=~/.gradata/brain python3 -m gradata.hooks.auto_correct => {"captured": true, ...} Layering: edit confined to gradata.hooks.adapters; no Layer 0 → 2 import. Risk: low — same wire behavior as #204 (Claude Code shape + Hermes shape both capture, non-edits return None), but the responsibility has moved to the adapter modules where new hosts can be added without touching auto_correct.py. Series: #204 (band-aid, Hermes shape support) → THIS PR (clean fix, adapter-owned extraction with contract test). Co-authored-by: data-engineer <data-engineer@gradata.ai>
Summary
_extract_correctiononly matched Claude Code stdin shape (tool_name in {Edit, Write}+ args underinput). Hermes Agent (and similar AGENTS.md hosts) sends{hook_event_name: post_tool_call, tool_name: 'patch'|'write_file'|'mcp_Patch'|..., tool_input: {...}}. Every Hermes post_tool_call hook fired but the extractor silently returned None — so events.jsonl never recorded a hook-driven correction.Observed: oliver-admin brain went 43h with zero new corrections while a 27-agent Hermes fleet ran thousands of patch calls. system.db had 15 historical CORRECTION events all sourced
brain.correct(manual), zerohermes-hook.Fix
Expand
_extract_correctionto accept both shapes:input(CC) ortool_input(Hermes)EDIT_TOOLS = {Edit, edit, patch, Patch, str_replace, string_replace}WRITE_TOOLS = {Write, write, write_file, Write_file, create_file}mcp_prefix somcp_PatchmatchesPatchTest plan
12 new test cases pin both stdin shapes + malformed input + end-to-end
process_hook_inputintegration with a real Brain.Verification on live system (oliver-admin)
Before fix:
After fix:
Layering check
Edit is internal to
gradata.hooks; no Layer 0 → 2 import.Risk
Low — pure extractor with additive matching. Claude Code shape behavior unchanged (existing tests pass).