Skip to content

Stage E: fix deferred-write content generation + jitter repair retries#7

Closed
Anmolnoor wants to merge 1 commit into
stage-d/out-of-scope-escalationfrom
stage-e/content-gen-and-retry-fixes
Closed

Stage E: fix deferred-write content generation + jitter repair retries#7
Anmolnoor wants to merge 1 commit into
stage-d/out-of-scope-escalationfrom
stage-e/content-gen-and-retry-fixes

Conversation

@Anmolnoor
Copy link
Copy Markdown
Owner

Context

Two robustness fixes surfaced by running the report flow against live Ollama Cloud models (gemini / kimi / qwen3.5). The crash fixes (A/B) and questioning (C/D) all worked; these address quality/robustness gaps that only appear against a real model.

Note: based on stage-d/out-of-scope-escalation; merge order is #3#4#5#6 → this. GitHub retargets as bases merge.

1. Deferred-write body came out wrapped in a JSON plan

When Stage B materialized a content_brief via the text call, _generate_file_body replayed the planning conversation (prior plans as assistant turns). That primed the model to keep "planning" — it returned another {actions: [...]} object, so the written file was a JSON blob with the real markdown buried in a content field (and stats were vague/hallucinated because the data arrived as planning turns, not clean reference data).

  • Reframe the call: drop the assistant-plan turns; pass gathered data as a single plain reference block with a hard "you are a file-content writer, not a planner — output raw bytes only" instruction.
  • _unwrap_generated_file_body defensive net: if the model still returns an AssistantPlan-shaped blob, extract the write action's content. Plain text and legitimate JSON files (no actions) pass through untouched.

2. Plan repair retries were deterministic

At temperature=0, a model that emits malformed/empty JSON reproduces it identically on retry — wasting the attempt (exactly what kimi did, breaking the same brace twice). ProviderPrompt gains an optional temperature; the planner sets 0.4 on repair retries (first attempt stays deterministic at 0). Ollama honors it for both JSON and text calls.

Verification

Re-ran the original report request end-to-end against qwen3.5:397b-cloud: the report now writes as clean markdown starting with #, using real GitHub values (31 followers, account created 2019, etc.) instead of a JSON wrapper with invented stats.

Tests

4 new + 1 extended (401 total, ruff clean): _unwrap_generated_file_body extracts wrapped content and leaves plain/real-JSON untouched; the orchestrator unwraps a plan-wrapped generated body; the Ollama adapter honors a prompt temperature; the repair retry carries temperature 0.4.

🤖 Generated with Claude Code

Two robustness fixes found by running the report flow against a live model.

1. Deferred-write body came out wrapped in a JSON plan. _generate_file_body
   replayed the planning conversation (prior plans as assistant turns), which
   primed the model to keep emitting an actions object instead of raw file
   bytes — so the written file was a JSON blob with the real markdown buried in
   a content field, and stats were vague/hallucinated.
   - Reframe the call: drop the assistant-plan turns; pass gathered data as a
     single plain reference block with a hard "you are a file-content writer,
     not a planner — output raw bytes only" instruction.
   - Add _unwrap_generated_file_body as a defensive net: if the model still
     returns an AssistantPlan-shaped blob, extract the write action's content;
     plain text and legitimate JSON files pass through untouched.

2. Plan repair retries were deterministic. At temperature 0 a model that emits
   malformed/empty JSON reproduces it identically on retry, wasting the attempt.
   ProviderPrompt gains an optional temperature; the planner sets 0.4 on repair
   retries (first attempt stays deterministic). Ollama honors it for both JSON
   and text calls.

Verified end-to-end against qwen3.5: the report now writes as clean markdown
using real GitHub values (followers, dates) instead of a JSON wrapper.

Tests: unwrap extracts wrapped content and leaves plain/real-JSON untouched;
orchestrator unwraps a plan-wrapped generated body; provider honors a prompt
temperature; the repair retry carries temperature 0.4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Anmolnoor
Copy link
Copy Markdown
Owner Author

Merged into main via commit 27c97b0 (stages A–G landed as one merge). See the summary on #9.

@Anmolnoor Anmolnoor closed this May 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant