Skip to content

fix(agentic): guard Write against overwrite, allow loop recovery, and harden Write content generation#690

Merged
bobleer merged 4 commits into
GCWing:mainfrom
bobleer:fix/write-guard-and-loop-recovery
May 13, 2026
Merged

fix(agentic): guard Write against overwrite, allow loop recovery, and harden Write content generation#690
bobleer merged 4 commits into
GCWing:mainfrom
bobleer:fix/write-guard-and-loop-recovery

Conversation

@bobleer
Copy link
Copy Markdown
Collaborator

@bobleer bobleer commented May 13, 2026

Summary

Three related improvements that address recurring failure modes we have hit in real agent runs:

  1. Write tool overwrite guard (file_write_tool.rs)

    • Write now refuses to overwrite an existing file and returns an error that points the model at the right alternatives: Edit to modify, or Delete followed by Write to fully rewrite.
    • The tool description is updated to match, so the model is steered toward the correct workflow up front.
    • Motivation: models occasionally regenerate files with incomplete content via Write, silently losing data. Edit is almost always the correct choice; full rewrites remain possible via the explicit Delete + Write sequence.
  2. Recoverable loop detection (execution_engine.rs)

    • Both the consecutive-signature and periodic-signature loop detectors used to terminate the round on first detection.
    • They now inject a <system_reminder> user message describing the detected loop and asking the model to change strategy, granting up to 3 such recovery attempts (shared across both detectors) before falling back to the previous terminate-with-loop_detected behavior.
    • Reminders are persisted via SessionManager so the recovery is visible in transcripts.
    • The existing safety net is preserved — we still stop eventually — while genuine transient stalls become recoverable instead of fatal.
  3. Write content generation: prompt + sanitization + omission warning (round_executor.rs)

    • Prompt is hardened to forbid omission placeholders (e.g. // rest of the code, // existing code unchanged) and stray markdown fences / wrapper XML, while explicitly allowing literal ... when it is genuine file content (XML/JSON/docs). An assistant prefill of <bitfun_contents>\n is added to bias raw-content output.
    • extract_bitfun_contents now sanitizes the body: strips thinking-style XML blocks (<think>, <reasoning>, <reflection>, <analysis>, including the non-standard <think ... > variants some reasoning models emit) and strips outer markdown code fences when they wrap the entire body.
    • A conservative detect_placeholder_patterns warning is added. It matches only comment-style omission phrases that are essentially never legitimate in real source/data files (e.g. // ... rest of the code, <!-- snip -->) and emits a warning only — it never blocks the write, because Write must remain general enough to produce any kind of file (including prose / XML that legitimately mentions those phrases).

Test plan

  • cargo check --workspace
  • cargo test -p bitfun-core --lib agentic::execution::round_executor — 29 tests passing, including new cases:
    • thinking-block stripping with attributes / non-standard close
    • markdown fence stripping (with and without <bitfun_contents> tags)
    • preservation of legitimate XML inside the body
    • placeholder detector positive cases (// ... rest of the code, # existing code unchanged, <!-- snip -->)
    • placeholder detector negative cases (XML data containing ... / "the rest of the story", prose discussing the phrase, plain TODO: / FIXME: comments)
  • Manual smoke against a real agent session to confirm the loop-recovery reminder is delivered as a user message and the model can resume.

bobleer added 3 commits May 13, 2026 08:37
…elete+Write

The Write tool previously overwrote existing files unconditionally, which
made it easy for models to clobber files with incomplete content when they
should have used Edit. Refuse the write when the target file already
exists and update the tool description to explain the intended workflow:
use Edit to modify, or Delete + Write to fully rewrite.

The error message returned to the model also points at both alternatives
so it can self-correct on the next round.
…tected loops

Previously, both the consecutive-signature and periodic-signature loop
detectors terminated the round immediately on the first hit. In practice
the model can often recover if it is told that it is stuck and asked to
change strategy.

Inject a system_reminder user message describing the detected loop and
asking the model to switch approach, give it up to 3 such recovery
attempts (shared across both detectors), and only then fall back to the
existing terminate-with-loop_detected behavior. The reminders are also
persisted via SessionManager so the recovery is visible in transcripts.

This keeps the existing safety net (we still stop eventually) while
making genuine transient stalls recoverable instead of fatal.
…del output

Two related improvements to the two-stage Write flow that asks the model
to emit the full file body inside <bitfun_contents> tags.

1. Prompt hardening
   - Spell out the rule against omission placeholders ("// rest of the
     code", "// existing code unchanged", etc.) and clarify that literal
     "..." is fine when it is genuine file content (XML/JSON/docs).
   - Forbid markdown fences and stray XML wrappers around the body.
   - Add an assistant prefill of "<bitfun_contents>\n" to bias the model
     toward emitting raw content immediately.

2. Output sanitization in extract_bitfun_contents
   - Strip thinking-style XML blocks (<think>, <reasoning>,
     <reflection>, <analysis>) including the non-standard <think ... >
     variants that some reasoning models emit.
   - Strip outer markdown code fences (```lang ... ```) when they wrap
     the entire body.
   - Add a conservative "omission marker" detector that warns when the
     generated body contains comment-style phrases such as
     "// ... rest of the code" or "<!-- snip -->". The detector is
     deliberately strict (only matches phrases that are essentially never
     legitimate in real source/data files) and only emits a warning — it
     never blocks the write, since Write must be able to produce any
     kind of file, including ones that legitimately discuss these
     phrases in prose.

Adds unit tests covering thinking-block stripping, fence stripping,
preservation of legitimate XML, and both positive and negative cases for
the placeholder detector (including XML data and prose mentioning the
phrases, which must not trigger).
@bobleer bobleer marked this pull request as draft May 13, 2026 01:07
…ent retries

- Skip Write content generation when path resolution/policy fails or the file exists
- Reject existing targets in validate_input when content is absent to avoid wasted model calls
- Treat identical Write content on an existing path as successful no-op (already_exists_same_content)
- Add FileWriteTool unit tests and tighten assistant-facing success guidance
@bobleer bobleer marked this pull request as ready for review May 13, 2026 02:17
@bobleer bobleer merged commit e491491 into GCWing:main May 13, 2026
4 checks passed
@bobleer bobleer deleted the fix/write-guard-and-loop-recovery branch May 22, 2026 02:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant