@ammar-agent

Problem

Agents assume file edits succeed and proceed with dependent operations (commits, pushes, builds) before tool results return. During streaming, the agent generates optimistic text like "Now I'll push..." before edit results arrive. When asked later, the agent does know about the failure, because the result is in its context, but by then the dependent operation has already run.

Solution

Move the critical warning to the first line of every file edit tool description and prefix it with a ⚠️ emoji:

⚠️ CRITICAL: Always check tool results - edits WILL fail if old_string is not found or unique. 
Do not proceed with dependent operations (commits, pushes, builds) until confirming success.

Applied to:

  • file_edit_replace_string
  • file_edit_replace_lines
  • file_edit_insert
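
As a rough illustration of the change, the warning could be prepended to each description through one shared constant. This is a minimal sketch, not the actual cmux code: `EDIT_WARNING`, `withLeadingWarning`, `toolDescriptions`, and the placeholder description bodies are all hypothetical; only the warning text and the three tool names come from this PR.

```typescript
// Warning text quoted from the PR description.
const EDIT_WARNING =
  "⚠️ CRITICAL: Always check tool results - edits WILL fail if old_string is not found or unique. " +
  "Do not proceed with dependent operations (commits, pushes, builds) until confirming success.";

// Hypothetical helper: puts the warning on the first line of a description.
function withLeadingWarning(body: string): string {
  return `${EDIT_WARNING}\n${body}`;
}

// Placeholder bodies for illustration only; the real descriptions differ.
const toolDescriptions: Record<string, string> = {
  file_edit_replace_string: withLeadingWarning("Replace an exact string in a file."),
  file_edit_replace_lines: withLeadingWarning("Replace a range of lines in a file."),
  file_edit_insert: withLeadingWarning("Insert text at a position in a file."),
};
```

Sharing one constant keeps the wording identical across the three tools, so a later change to the warning only has to be made in one place.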

Why This Helps

Warnings buried mid-description are easy for streaming models to miss. By placing the warning first with a visual indicator, we increase the likelihood that agents will:

  1. Pause before assuming success
  2. Check tool results explicitly
  3. Only proceed with dependent operations after confirmation
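
The three steps above amount to gating any dependent operation on the edit results. A minimal sketch of that gate follows, assuming a hypothetical `EditResult` shape with a `success` flag; the real tool result format in cmux may differ, and `failedEdits` and `runIfEditsSucceeded` are illustrative names.

```typescript
// Hypothetical result shape; the real tool result format may differ.
interface EditResult {
  tool: string;
  success: boolean;
  error?: string;
}

// Step 2: check tool results explicitly and collect any failures.
function failedEdits(results: EditResult[]): EditResult[] {
  return results.filter((r) => !r.success);
}

// Steps 1 and 3: do not assume success; run the dependent operation
// (commit, push, build) only when every edit is confirmed successful.
function runIfEditsSucceeded(results: EditResult[], dependentOp: () => void): boolean {
  if (failedEdits(results).length > 0) {
    return false; // at least one edit failed, so skip the dependent operation
  }
  dependentOp();
  return true;
}
```

In this sketch a single failed edit blocks the dependent operation entirely, which matches the intent of the warning: nothing is committed, pushed, or built on top of an edit that did not apply.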

Generated with cmux

Move critical warning to first line with ⚠️ emoji for all file edit tools:
- file_edit_replace_string
- file_edit_replace_lines
- file_edit_insert

This addresses the issue where agents stream optimistic text about edits
succeeding before tool results return. The prominent warning at the start
of the description should help agents check results before proceeding with
dependent operations like commits, pushes, or builds.

Generated with `cmux`
@ammario

ammario commented Nov 3, 2025

Seemed to give a 4% improvement to terminal bench scores

@ammario ammario added this pull request to the merge queue Nov 3, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 3, 2025
@ammario ammario merged commit f664385 into main Nov 3, 2025
14 checks passed
@ammario ammario deleted the fix-edits branch November 3, 2025 22:15