fix(agentic): guard Write against overwrite, allow loop recovery, and harden Write content generation by bobleer · Pull Request #690 · GCWing/BitFun

bobleer · 2026-05-13T00:38:24Z

Summary

Three related improvements that address recurring failure modes we have hit in real agent runs:

Write tool overwrite guard (file_write_tool.rs)
- Write now refuses to overwrite an existing file and returns an error that points the model at the right alternatives: Edit to modify, or Delete followed by Write to fully rewrite.
- The tool description is updated to match, so the model is steered toward the correct workflow up front.
- Motivation: models occasionally regenerate files with incomplete content via Write, silently losing data. Edit is almost always the correct choice; full rewrites remain possible via the explicit Delete + Write sequence.
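A minimal sketch of the guard described above, not the actual code in file_write_tool.rs. The names `guarded_write` and `WriteError` are hypothetical, and the error wording is only an illustration of a message that points at Edit and Delete + Write. It assumes the check can be done with the standard library's `create_new`, which fails if the file already exists.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical error type; the real tool's error shape is not shown in this PR.
#[derive(Debug, PartialEq)]
pub enum WriteError {
    /// The target exists; the message names the two supported alternatives.
    AlreadyExists(String),
    /// Any other I/O failure, kept as text for simplicity.
    Io(String),
}

/// Create `path` with `content`, refusing to overwrite an existing file.
pub fn guarded_write(path: &Path, content: &str) -> Result<(), WriteError> {
    // `create_new(true)` checks for existence and creates the file in one step,
    // so there is no gap between "does it exist?" and "create it".
    let mut file = match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(f) => f,
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => {
            return Err(WriteError::AlreadyExists(format!(
                "{} already exists. Use Edit to modify it, or Delete followed by Write to fully rewrite it.",
                path.display()
            )));
        }
        Err(e) => return Err(WriteError::Io(e.to_string())),
    };
    file.write_all(content.as_bytes())
        .map_err(|e| WriteError::Io(e.to_string()))
}
```

Calling `guarded_write` twice on the same path succeeds the first time and returns `WriteError::AlreadyExists` the second time, leaving the first content untouched.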
Recoverable loop detection (execution_engine.rs)
- Both the consecutive-signature and periodic-signature loop detectors used to terminate the round on first detection.
- They now inject a <system_reminder> user message describing the detected loop and asking the model to change strategy, granting up to 3 such recovery attempts (shared across both detectors) before falling back to the previous terminate-with-loop_detected behavior.
- Reminders are persisted via SessionManager so the recovery is visible in transcripts.
- The existing safety net is preserved — we still stop eventually — while genuine transient stalls become recoverable instead of fatal.
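The shared recovery budget can be sketched as a small counter. This is an illustration under assumptions, not the code in execution_engine.rs: `LoopRecovery`, `LoopKind`, `LoopAction`, and the reminder wording are all hypothetical. Only the limit of 3 and the sharing across both detectors come from the description above.

```rust
/// Shared budget of recovery reminders; the PR description gives the value 3.
const MAX_LOOP_RECOVERIES: u32 = 3;

/// Which detector fired (names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LoopKind {
    Consecutive,
    Periodic,
}

/// What the engine should do after a detection.
#[derive(Debug, PartialEq)]
pub enum LoopAction {
    /// Inject this text as a user message and let the model continue.
    InjectReminder(String),
    /// Budget exhausted: fall back to terminating the round.
    Terminate,
}

/// One counter shared by both detectors.
#[derive(Debug, Default)]
pub struct LoopRecovery {
    used: u32,
}

impl LoopRecovery {
    pub fn on_loop_detected(&mut self, kind: LoopKind) -> LoopAction {
        if self.used >= MAX_LOOP_RECOVERIES {
            return LoopAction::Terminate;
        }
        self.used += 1;
        // Illustrative wording only; the real reminder text is not in the PR description.
        let pattern = match kind {
            LoopKind::Consecutive => "consecutive-signature loop",
            LoopKind::Periodic => "periodic-signature loop",
        };
        LoopAction::InjectReminder(format!(
            "<system_reminder>Detected a {}. Change strategy instead of repeating these calls. Recovery attempt {} of {}.</system_reminder>",
            pattern, self.used, MAX_LOOP_RECOVERIES
        ))
    }
}
```

Because the counter is shared, three detections of any mix of kinds yield reminders and the fourth yields `LoopAction::Terminate`.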
Write content generation: prompt + sanitization + omission warning (round_executor.rs)
- Prompt is hardened to forbid omission placeholders (e.g. // rest of the code, // existing code unchanged) and stray markdown fences / wrapper XML, while explicitly allowing literal ... when it is genuine file content (XML/JSON/docs). An assistant prefill of <bitfun_contents>\n is added to bias raw-content output.
- extract_bitfun_contents now sanitizes the body: strips thinking-style XML blocks (<think>, <reasoning>, <reflection>, <analysis>, including the non-standard <think ... > variants some reasoning models emit) and strips outer markdown code fences when they wrap the entire body.
- A conservative detect_placeholder_patterns warning is added. It matches only comment-style omission phrases that are essentially never legitimate in real source/data files (e.g. // ... rest of the code, ) and emits a warning only — it never blocks the write, because Write must remain general enough to produce any kind of file (including prose / XML that legitimately mentions those phrases).
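The two sanitization steps can be approximated without a regex crate. The sketch below is a simplified stand-in for what extract_bitfun_contents is described as doing. `sanitize_body`, `strip_tag_blocks`, and `strip_outer_fence` are hypothetical names, and the exact matching rules of the real implementation are not known from this PR description.

```rust
/// Remove every `<tag ...> ... </tag>` block for one tag name.
fn strip_tag_blocks(input: &str, tag: &str) -> String {
    let open = format!("<{}", tag);
    let close = format!("</{}", tag);
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find(&open) {
        let after_open = &rest[start + open.len()..];
        // Accept `<think>` and `<think attr="x">`, but not `<thinker>`.
        let is_tag = after_open.starts_with('>') || after_open.starts_with(char::is_whitespace);
        if !is_tag {
            out.push_str(&rest[..start + open.len()]);
            rest = after_open;
            continue;
        }
        // An unclosed block is left untouched rather than guessed at.
        let Some(close_at) = after_open.find(&close) else { break };
        let after_close = &after_open[close_at + close.len()..];
        let Some(gt) = after_close.find('>') else { break };
        out.push_str(&rest[..start]);
        rest = &after_close[gt + 1..];
    }
    out.push_str(rest);
    out
}

/// Strip a markdown fence only when it wraps the entire body.
fn strip_outer_fence(body: &str) -> &str {
    let trimmed = body.trim();
    let Some(rest) = trimmed.strip_prefix("```") else { return body };
    let Some(newline) = rest.find('\n') else { return body };
    if rest[..newline].contains('`') {
        return body;
    }
    let Some(inner) = rest[newline + 1..].strip_suffix("```") else { return body };
    // The closing fence must sit on its own line.
    if !(inner.is_empty() || inner.ends_with('\n')) {
        return body;
    }
    // Any other fence line inside means the outer pair may not wrap everything.
    if inner.lines().any(|line| line.trim_start().starts_with("```")) {
        return body;
    }
    inner.strip_suffix('\n').unwrap_or(inner)
}

/// Apply both steps: thinking blocks first, then the outer fence.
pub fn sanitize_body(raw: &str) -> String {
    let mut body = raw.to_string();
    for tag in ["think", "reasoning", "reflection", "analysis"] {
        body = strip_tag_blocks(&body, tag);
    }
    strip_outer_fence(&body).to_string()
}
```

Ordinary XML such as `<config><item>...</item></config>` passes through unchanged, because only the four listed tag names are removed and a fence is stripped only when it encloses the whole body.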

Test plan

cargo check --workspace
cargo test -p bitfun-core --lib agentic::execution::round_executor — 29 tests passing, including new cases:
- thinking-block stripping with attributes / non-standard close
- markdown fence stripping (with and without <bitfun_contents> tags)
- preservation of legitimate XML inside the body
- placeholder detector positive cases (// ... rest of the code, # existing code unchanged, )
- placeholder detector negative cases (XML data containing ... / "the rest of the story", prose discussing the phrase, plain TODO: / FIXME: comments)
Manual smoke against a real agent session to confirm the loop-recovery reminder is delivered as a user message and the model can resume.
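The positive and negative detector cases listed above suggest a whole-line comment match. The sketch below is one conservative way to get that behavior. It is not the real detect_placeholder_patterns: `detect_placeholder_lines` is a hypothetical name, and the phrase list contains only the two phrases quoted in this PR. It returns line numbers so a caller can log a warning, and it never rejects the body.

```rust
/// Return the 1-based numbers of lines that look like omission placeholders.
/// A line qualifies only if it is a whole-line `//` or `#` comment whose text,
/// ignoring leading and trailing dots, is exactly one of the known phrases.
pub fn detect_placeholder_lines(body: &str) -> Vec<usize> {
    // Phrases quoted in the PR description; the real list may be longer.
    const PHRASES: [&str; 2] = ["rest of the code", "existing code unchanged"];

    body.lines()
        .enumerate()
        .filter_map(|(index, line)| {
            let trimmed = line.trim();
            // Prose or XML that merely mentions a phrase does not start with a
            // comment marker, so it is skipped here.
            let comment = trimmed
                .strip_prefix("//")
                .or_else(|| trimmed.strip_prefix('#'))?;
            let lowered = comment.trim().to_lowercase();
            let text = lowered.trim_matches('.').trim();
            PHRASES.iter().any(|phrase| text == *phrase).then(|| index + 1)
        })
        .collect()
}
```

A body containing `// ... rest of the code` on its second line yields `[2]`, while `// TODO: handle errors` and `<p>the rest of the story...</p>` yield no matches.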

…elete+Write

The Write tool previously overwrote existing files unconditionally, which made it easy for models to clobber files with incomplete content when they should have used Edit. Refuse the write when the target file already exists and update the tool description to explain the intended workflow: use Edit to modify, or Delete + Write to fully rewrite. The error message returned to the model also points at both alternatives so it can self-correct on the next round.

…tected loops

Previously, both the consecutive-signature and periodic-signature loop detectors terminated the round immediately on the first hit. In practice the model can often recover if it is told that it is stuck and asked to change strategy. Inject a system_reminder user message describing the detected loop and asking the model to switch approach, give it up to 3 such recovery attempts (shared across both detectors), and only then fall back to the existing terminate-with-loop_detected behavior. The reminders are also persisted via SessionManager so the recovery is visible in transcripts. This keeps the existing safety net (we still stop eventually) while making genuine transient stalls recoverable instead of fatal.

…del output

Two related improvements to the two-stage Write flow that asks the model to emit the full file body inside <bitfun_contents> tags.

1. Prompt hardening
- Spell out the rule against omission placeholders ("// rest of the code", "// existing code unchanged", etc.) and clarify that literal "..." is fine when it is genuine file content (XML/JSON/docs).
- Forbid markdown fences and stray XML wrappers around the body.
- Add an assistant prefill of "<bitfun_contents>\n" to bias the model toward emitting raw content immediately.

2. Output sanitization in extract_bitfun_contents
- Strip thinking-style XML blocks (<think>, <reasoning>, <reflection>, <analysis>) including the non-standard <think ... > variants that some reasoning models emit.
- Strip outer markdown code fences (```lang ... ```) when they wrap the entire body.
- Add a conservative "omission marker" detector that warns when the generated body contains comment-style phrases such as "// ... rest of the code" or "". The detector is deliberately strict (only matches phrases that are essentially never legitimate in real source/data files) and only emits a warning; it never blocks the write, since Write must be able to produce any kind of file, including ones that legitimately discuss these phrases in prose.

Adds unit tests covering thinking-block stripping, fence stripping, preservation of legitimate XML, and both positive and negative cases for the placeholder detector (including XML data and prose mentioning the phrases, which must not trigger).

…ent retries

- Skip Write content generation when path resolution/policy fails or the file exists
- Reject existing targets in validate_input when content is absent to avoid wasted model calls
- Treat identical Write content on an existing path as successful no-op (already_exists_same_content)
- Add FileWriteTool unit tests and tighten assistant-facing success guidance
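The rules in this commit can be read as a small decision table over "does the target exist?" and "was content supplied?". The sketch below encodes that reading. `WriteDecision` and `decide_write` are hypothetical names, and the real validate_input logic may order or split these checks differently.

```rust
/// Outcome of the pre-write check (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
pub enum WriteDecision {
    /// Target does not exist: go ahead (generating content first if needed).
    Proceed,
    /// Target exists with identical content: report success without writing.
    /// Corresponds to the `already_exists_same_content` outcome named above.
    NoOpSameContent,
    /// Target exists and would change: refuse, steering toward Edit or Delete + Write.
    RejectExists,
}

/// `existing` is the current file content if the target exists;
/// `requested` is the content passed to Write, if any was supplied yet.
pub fn decide_write(existing: Option<&str>, requested: Option<&str>) -> WriteDecision {
    match (existing, requested) {
        (None, _) => WriteDecision::Proceed,
        // Rejecting before content exists avoids a wasted model call.
        (Some(_), None) => WriteDecision::RejectExists,
        (Some(old), Some(new)) if old == new => WriteDecision::NoOpSameContent,
        (Some(_), Some(_)) => WriteDecision::RejectExists,
    }
}
```

Treating identical content as a no-op means a model that retries the same Write after a success does not get an error for work that is already done.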

bobleer added 3 commits May 13, 2026 08:37

bobleer marked this pull request as draft May 13, 2026 01:07

bobleer marked this pull request as ready for review May 13, 2026 02:17

bobleer merged commit e491491 into GCWing:main May 13, 2026
4 checks passed

bobleer deleted the fix/write-guard-and-loop-recovery branch May 22, 2026 02:21

bobleer merged 4 commits into GCWing:main from bobleer:fix/write-guard-and-loop-recovery
