Skip to content

Stage A: harden provider against truncated structured responses#3

Merged
Anmolnoor merged 1 commit into
mainfrom
stage-a/truncation-hardening
May 27, 2026
Merged

Stage A: harden provider against truncated structured responses#3
Anmolnoor merged 1 commit into
mainfrom
stage-a/truncation-hardening

Conversation

@Anmolnoor
Copy link
Copy Markdown
Owner

Context

Part 1 of the plan to fix the Provider returned invalid JSON for a structured response crash (traced from session 0e54161d). Root cause: a file.write action carries the entire file body inline in the plan JSON; the model hits its output-token budget mid-string, the JSON never closes, and the turn dies with a confusing parse error.

This stage doesn't change where content lives yet (that's Stage B) — it makes the truncation failure legible and recoverable so it stops masquerading as an invalid-JSON error.

What changed

  • New settings provider.max_output_tokens / provider.num_ctx, threaded into both adapters (Ollama → num_predict/num_ctx options; OpenAI → max_output_tokens).
  • Explicit truncation detection: Ollama done_reason == "length" and OpenAI status == "incomplete" now raise the new ProviderErrorCode.TRUNCATED with the partial response_text attached, instead of falling through to json.loads.
  • Truncation-aware repair: the planner's retry tells the model its output was cut off and to emit a shorter plan (use content_brief for large bodies — landing in Stage B), rather than the generic "not valid JSON" nudge that previously re-sent the same oversized request.
  • Self-diagnosing logs: a capped (4 KB) response_text is now persisted into the EXCEPTION / plan_generation_failed event payloads, so the NDJSON event log captures the raw bad response without needing --debug.

Tests

5 new (385 total, ruff clean): num_predict/num_ctx + max_output_tokens wiring; TRUNCATED on done_reason=length and status=incomplete; orchestrator recovery from a truncated plan carrying the content_brief hint.

Sequencing

Stage A of A→B→C→D. Independently shippable; Stage B (decoupling large writes) builds on the max_output_tokens and content_brief groundwork here.

🤖 Generated with Claude Code

When the model hits its output-token budget mid-plan, the JSON object is
left unterminated and the turn died with a confusing "invalid JSON" error.
This makes truncation a first-class, legible failure:

- Add provider.max_output_tokens / num_ctx settings, threaded into both
  adapters. Ollama gets num_predict/num_ctx options; OpenAI gets
  max_output_tokens.
- Detect truncation explicitly: Ollama done_reason=="length" and OpenAI
  status=="incomplete" now raise ProviderErrorCode.TRUNCATED with the
  partial response_text attached, instead of falling through to json.loads.
- Planner repair retry is truncation-aware: it tells the model its output
  was cut off and to emit a shorter plan (use content_brief for large file
  bodies) rather than the generic "not valid JSON" nudge.
- Persist a capped response_text into the EXCEPTION / plan_generation_failed
  event payloads so the NDJSON log is self-diagnosing without --debug.

Tests: num_predict/num_ctx and max_output_tokens wiring, TRUNCATED on
done_reason=length and status=incomplete, and orchestrator recovery from a
truncated plan carrying the content_brief hint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant