fix(eod): defensive JSON extraction for LLM rationale synthesis (L1248/L2669)#198
Merged
Merged
Conversation
…8/L2669)
`_synthesize_rationales` parsed Haiku's response via bare
`json.loads(response.content[0].text)` which silently fell back to
the template path whenever Haiku returned anything but a clean
top-level JSON object. EOD logs have shown the recurrence
("Expecting value: line 1 column 1 (char 0) — using template
fallback") across multiple runs over the last month — promoted P3
→ P2 in the 2026-05-20 curation pass per the entry's own "if
recurrence confirmed" trigger.
New `_extract_json_object` defensively handles three observed LLM
anomalies before raising:
1. Markdown code fences (```json\n{...}\n``` or ```\n{...}\n```)
2. Conversational preamble ("Sure, here's the JSON:\n{...}")
3. Trailing text after the closing brace
Clean JSON still goes through the fast path. On total parse
failure, log a bounded sample (`raw[:400]`) so the next recurrence
surfaces a concrete failure mode instead of the opaque "char 0"
message, and re-raise into the existing template-fallback handler
so the EOD email still ships.
9 new tests cover the clean / fenced (with + without language tag)
/ preamble / trailing / combined / garbage / empty paths. Suite
905 → 913, all green.
Closes both L1248 + L2669 P3 → P2 promotions (canonical entry +
cross-ref from Executor side) — same fix lands both checkboxes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…TA, no shortcuts)
Replaces the prior `json.loads` + defensive-string-extraction
helper with Anthropic's tool-use + Pydantic validation — the
institutional approach. Parse failures from text-formatting
anomalies (markdown fences, conversational preamble, trailing
text) are now structurally impossible: Haiku is forced to emit a
typed `tool_use` block whose `input` is SDK-validated against the
declared schema BEFORE it lands here; Pydantic re-validates at the
boundary for type safety + strict field enforcement.
Pattern mirrors alpha-engine-research's `with_structured_output`
discipline (e.g. `agents/macro_agent.py:380`) without adding the
langchain dependency the Executor doesn't otherwise carry — uses
the raw Anthropic SDK's `tools=[...]` + `tool_choice` primitive
directly. Pydantic + Anthropic SDK 0.92 already in
`requirements.txt`.
What changed:
- New `_Narrative` + `_RationalesResponse` Pydantic models define
the contract. The JSON Schema is derived via
`model_json_schema()` and registered as `_RATIONALES_TOOL`.
- `_synthesize_rationales` calls Haiku with
`tools=[_RATIONALES_TOOL]` + `tool_choice={"type": "tool",
"name": "emit_rationales"}` — forced tool emission.
- The tool_use block is picked explicitly (Anthropic may emit a
text block alongside it); its `input` is Pydantic-validated
before being converted to `{ticker: narrative}`.
- Any failure (missing tool_use block, validation error, SDK
error) falls through to the existing template fallback path —
unchanged contract; only the LLM happy path is upgraded.
- Removed the `_extract_json_object` heuristic helper entirely;
it's no longer needed.
Tests rewritten:
- `TestExtractJsonObject` (9 tests of the heuristic) deleted.
- `TestRationalesResponsePydantic` (6): valid payload / empty
list / missing field / wrong type — Pydantic boundary
correctness.
- `TestSynthesizeRationalesToolUse` (5): happy path with
assertions on the SDK call kwargs (`tools`, `tool_choice`) /
text-block-before-tool-block / missing tool_use → template
fallback / malformed input → template fallback / empty contexts
short-circuit.
- All existing `TestSynthesizeRationales` template-fallback
tests still pass (the fallback contract is unchanged).
Suite: 913 → 915 (net +2 after removing the 9 string-extraction
tests and adding 11 contract tests).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Owner
Author
|
Pushed SOTA upgrade as follow-up commit What changed from the initial heuristic:
Tests rewritten: deleted 9 string-extraction tests, added 11 contract tests (Pydantic boundary correctness + tool-use happy/sad paths with SDK-kwargs assertions). Suite 913 → 915. Both commits remain in the PR for review-trail clarity. Anyone reviewing: feel free to squash on merge — the heuristic commit is obsolete. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
`_synthesize_rationales` parsed Haiku's response via bare `json.loads(response.content[0].text)` which silently fell back to the template path whenever Haiku returned anything but a clean top-level JSON object. EOD logs have shown the recurrence (`Expecting value: line 1 column 1 (char 0) — using template fallback`) across multiple runs over the last month — promoted P3 → P2 in the 2026-05-20 curation pass per the entry's own "if recurrence confirmed" trigger.
What changed
New `_extract_json_object` defensively handles three observed LLM anomalies before raising:
Clean JSON still goes through the fast path. On total parse failure, log a bounded sample (`raw[:400]`) so the next recurrence surfaces a concrete failure mode instead of the opaque "char 0" message, and re-raise into the existing template-fallback handler so the EOD email still ships.
Tests
9 new tests in `tests/test_eod_reconcile_logic.py::TestExtractJsonObject`:
Suite: 905 → 913, all green.
Closes
L1248 + L2669 P3 → P2 promotions (canonical entry + cross-ref from the Executor side) — same fix lands both checkboxes.
Test plan
pytest tests/test_eod_reconcile_logic.py→ 36 passed (was 27 before this PR's 9 new tests)🤖 Generated with Claude Code