Stage A: harden provider against truncated structured responses #3
Merged
When the model hits its output-token budget mid-plan, the JSON object is left unterminated and the turn dies with a confusing "invalid JSON" error. This makes truncation a first-class, legible failure:

- Add provider.max_output_tokens / num_ctx settings, threaded into both adapters. Ollama gets num_predict/num_ctx options; OpenAI gets max_output_tokens.
- Detect truncation explicitly: Ollama done_reason=="length" and OpenAI status=="incomplete" now raise ProviderErrorCode.TRUNCATED with the partial response_text attached, instead of falling through to json.loads.
- Planner repair retry is truncation-aware: it tells the model its output was cut off and to emit a shorter plan (use content_brief for large file bodies) rather than the generic "not valid JSON" nudge.
- Persist a capped response_text into the EXCEPTION / plan_generation_failed event payloads so the NDJSON log is self-diagnosing without --debug.

Tests: num_predict/num_ctx and max_output_tokens wiring, TRUNCATED on done_reason=length and status=incomplete, and orchestrator recovery from a truncated plan carrying the content_brief hint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 27, 2026
Context
Part 1 of the plan to fix the `Provider returned invalid JSON for a structured response` crash (traced from session `0e54161d`). Root cause: a `file.write` action carries the entire file body inline in the plan JSON; the model hits its output-token budget mid-string, the JSON never closes, and the turn dies with a confusing parse error.

This stage doesn't change where content lives yet (that's Stage B); it makes the truncation failure legible and recoverable so it stops masquerading as an invalid-JSON error.
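The failure mode is easy to reproduce outside the project. The following is a minimal sketch, assuming a made-up plan shape (the `actions`, `type`, `path`, and `content` keys are illustrative, not the project's actual schema): cutting a serialized plan off mid-string makes `json.loads` raise a generic decode error that says nothing about truncation.

```python
import json
from typing import Optional

# Hypothetical plan shape, for illustration only: one file.write action
# carrying the whole file body inline.
plan = {
    "actions": [
        {"type": "file.write", "path": "app.py", "content": "print('hello')\n" * 200}
    ]
}

full_text = json.dumps(plan)

# Simulate the model running out of output tokens partway through the file body.
truncated_text = full_text[: len(full_text) // 2]


def parse_error(text: str) -> Optional[str]:
    """Return the JSON decode error message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return str(exc)
    return None


print(parse_error(full_text))       # None: the complete plan parses
print(parse_error(truncated_text))  # a decode error that does not mention truncation
```

The parser can only report where the text stopped making sense, which is why the provider has to look at the completion metadata rather than the parse error to tell truncation apart from malformed output.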
What changed
- New `provider.max_output_tokens` / `provider.num_ctx` settings, threaded into both adapters (Ollama → `num_predict`/`num_ctx` options; OpenAI → `max_output_tokens`).
- Ollama `done_reason == "length"` and OpenAI `status == "incomplete"` now raise the new `ProviderErrorCode.TRUNCATED` with the partial `response_text` attached, instead of falling through to `json.loads`.
- The planner repair retry is truncation-aware: it tells the model its output was cut off and to emit a shorter plan (use `content_brief` for large bodies; landing in Stage B), rather than the generic "not valid JSON" nudge that previously re-sent the same oversized request.
- A capped `response_text` is now persisted into the `EXCEPTION` / `plan_generation_failed` event payloads, so the NDJSON event log captures the raw bad response without needing `--debug`.

Tests
5 new (385 total, ruff clean): `num_predict`/`num_ctx` + `max_output_tokens` wiring; `TRUNCATED` on `done_reason=length` and `status=incomplete`; orchestrator recovery from a truncated plan carrying the `content_brief` hint.

Sequencing
Stage A of A→B→C→D. Independently shippable; Stage B (decoupling large writes) builds on the `max_output_tokens` and `content_brief` groundwork here.

🤖 Generated with Claude Code
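For reviewers who want the shape of the change without opening the diff, the detection, retry, and logging behavior described under "What changed" can be sketched roughly as below. This is not the code in this PR. Only `done_reason`, `status`, `ProviderErrorCode.TRUNCATED`, `response_text`, and `content_brief` come from the description; `ProviderError`, `INVALID_JSON`, `is_truncated`, `parse_structured`, `repair_hint`, `failure_payload`, and the `MAX_LOGGED_CHARS` value are hypothetical names and values chosen for illustration.

```python
import json
from enum import Enum
from typing import Any, Dict

MAX_LOGGED_CHARS = 2000  # hypothetical cap; the PR does not state the real limit


class ProviderErrorCode(Enum):
    INVALID_JSON = "invalid_json"  # hypothetical member name
    TRUNCATED = "truncated"


class ProviderError(Exception):
    """Hypothetical error type carrying the code and the partial response."""

    def __init__(self, code: ProviderErrorCode, message: str, response_text: str = "") -> None:
        super().__init__(message)
        self.code = code
        self.response_text = response_text


def is_truncated(provider: str, payload: Dict[str, Any]) -> bool:
    """Check the provider-specific truncation signal named in the description."""
    if provider == "ollama":
        return payload.get("done_reason") == "length"
    if provider == "openai":
        return payload.get("status") == "incomplete"
    return False


def parse_structured(provider: str, payload: Dict[str, Any], text: str) -> Any:
    """Raise TRUNCATED before attempting to parse; otherwise parse the JSON."""
    if is_truncated(provider, payload):
        raise ProviderError(
            ProviderErrorCode.TRUNCATED,
            "response hit the output-token budget",
            response_text=text,
        )
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ProviderError(ProviderErrorCode.INVALID_JSON, str(exc), response_text=text) from exc


def repair_hint(err: ProviderError) -> str:
    """Pick the retry nudge based on why the previous attempt failed."""
    if err.code is ProviderErrorCode.TRUNCATED:
        return "Your output was cut off. Emit a shorter plan; use content_brief for large file bodies."
    return "Your previous response was not valid JSON. Respond with valid JSON only."


def failure_payload(err: ProviderError) -> Dict[str, str]:
    """Build an event payload with a capped copy of the raw response."""
    return {"code": err.code.value, "response_text": err.response_text[:MAX_LOGGED_CHARS]}


# Usage: a truncated Ollama-style response is flagged before json.loads runs.
try:
    parse_structured("ollama", {"done_reason": "length"}, '{"actions": [{"content": "abc')
except ProviderError as err:
    print(err.code.name)  # TRUNCATED
    print(repair_hint(err))
```

Checking the completion metadata before parsing is the point of the ordering: a truncated body would also fail `json.loads`, so testing the truncation signal first is what keeps the two failure codes distinct.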