@ammar-agent commented Nov 3, 2025

Problem

Terminal-bench agents hit "Redundant path prefix" errors on 55% of tasks (44/80) when using absolute paths like /app/file.txt. This wastes tool calls as agents retry with relative paths, and particularly affects the QEMU (80% fail), Git (83% fail), and Security (75% fail) categories.

Solution

Changed validateNoRedundantPrefix to auto-correct paths while keeping the warning. Tools now operate on the corrected path and include this warning in their response: "Using relative paths like 'file.txt' instead of '/app/file.txt' saves tokens. The path has been auto-corrected for you."

Expected impact: an estimated 5-8 percentage point improvement in pass rate. Combined with the already-merged 30-minute timeout, we are targeting a 60-65% pass rate.

Generated with cmux

Instead of rejecting absolute paths within the workspace, automatically
convert them to relative paths and include a warning to the agent.

**Problem:**
- Oct 31 nightly run showed 55% of terminal-bench tasks (44/80) hit
  'Redundant path prefix' errors
- Agent wastes tool calls retrying with relative paths
- Causes confusion and burns time approaching timeout
- Particularly affects QEMU (80% fail), Git (83% fail), Security (75% fail)

**Solution:**
- validateNoRedundantPrefix now returns {correctedPath, warning} instead of {error}
- Tools (file_read, file_edit_*) auto-correct the path and include warning in response
- Agent still gets feedback to use relative paths, but operation succeeds
- Preserves token-saving goal while removing friction
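
The behavior described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the function signature, the `PrefixCheckResult` type name, the `null` return for paths that need no correction, and the string-based prefix matching are all assumptions. Only the function name, the `{correctedPath, warning}` shape, and the warning wording come from the description.

```typescript
// Hypothetical result shape; the PR states only that the function now
// returns {correctedPath, warning} instead of {error}.
interface PrefixCheckResult {
  correctedPath: string;
  warning: string;
}

// Assumed signature. Returns null when no correction is needed
// (the path is already relative, or it lies outside the workspace).
function validateNoRedundantPrefix(
  filePath: string,
  cwd: string,
): PrefixCheckResult | null {
  // Relative paths carry no redundant prefix.
  if (!filePath.startsWith("/")) return null;

  // Normalize cwd so "/app" and "/app/" behave the same.
  const root = cwd.endsWith("/") ? cwd.slice(0, -1) : cwd;

  let relative: string;
  if (filePath === root) {
    relative = "."; // the workspace directory itself
  } else if (filePath.startsWith(root + "/")) {
    // Strip the workspace prefix; requiring the separator keeps
    // "/application" from matching a cwd of "/app".
    relative = filePath.slice(root.length + 1) || ".";
  } else {
    return null; // absolute path outside the workspace
  }

  return {
    correctedPath: relative,
    warning:
      `Using relative paths like '${relative}' instead of '${filePath}' ` +
      `saves tokens. The path has been auto-corrected for you.`,
  };
}
```

Under these assumptions, calling `validateNoRedundantPrefix("/app/file.txt", "/app")` yields a `correctedPath` of `file.txt` plus the warning, while a relative path such as `src/a.ts` or an unrelated absolute path such as `/application/x` is returned as `null` and left for the caller to handle unchanged.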

**Expected impact:**
- Reduce wasted tool calls in 44/80 tasks
- Estimated +5-8 percentage point improvement in pass rate (48-51% vs current 43%)
- Particularly helps QEMU/Git/Security task categories

**Testing:**
- Updated all tests to expect auto-correction instead of errors
- All 26 fileCommon tests passing

_Generated with `cmux`_
chatgpt-codex-connector[bot]

This comment was marked as resolved.

Add optional warning field to FileReadToolResult and FileEditDiffSuccessBase
to support auto-correction warnings.

Also update file_read test to expect auto-correction instead of rejection.

Eliminates duplication across file_read, file_edit_insert, and file_edit_operation by
creating a validateAndCorrectPath() wrapper in fileCommon.ts.
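
A shared wrapper of this kind might look like the sketch below. It is a guess at the pattern rather than the contents of fileCommon.ts: the `validateAndCorrectPath` signature, the `checkRedundantPrefix` stand-in, the `ReadResultSketch` type, and the `fileReadSketch` tool are hypothetical names introduced for illustration. The optional `warning` field mirrors the one this PR adds to FileReadToolResult and FileEditDiffSuccessBase.

```typescript
// Hypothetical simplified result type with the optional warning field.
interface ReadResultSketch {
  success: true;
  content: string;
  warning?: string;
}

// Simplified stand-in for validateNoRedundantPrefix (assumed behavior):
// returns null unless filePath lies strictly inside cwd.
function checkRedundantPrefix(
  filePath: string,
  cwd: string,
): { correctedPath: string; warning: string } | null {
  const root = cwd.endsWith("/") ? cwd.slice(0, -1) : cwd;
  if (!filePath.startsWith(root + "/")) return null;
  const correctedPath = filePath.slice(root.length + 1) || ".";
  return {
    correctedPath,
    warning:
      `Using relative paths like '${correctedPath}' instead of ` +
      `'${filePath}' saves tokens. The path has been auto-corrected for you.`,
  };
}

// Wrapper shared by every tool: always yields a usable path, plus a
// warning only when a correction was applied.
function validateAndCorrectPath(
  filePath: string,
  cwd: string,
): { path: string; warning?: string } {
  const check = checkRedundantPrefix(filePath, cwd);
  return check
    ? { path: check.correctedPath, warning: check.warning }
    : { path: filePath };
}

// Example tool body; readFile is injected so the sketch stays
// independent of any real filesystem API.
function fileReadSketch(
  filePath: string,
  cwd: string,
  readFile: (p: string) => string,
): ReadResultSketch {
  const { path, warning } = validateAndCorrectPath(filePath, cwd);
  const content = readFile(path);
  // Attach the warning only when present, so clean calls stay unchanged.
  return { success: true, content, ...(warning ? { warning } : {}) };
}
```

Routing every tool through one wrapper keeps the correction logic and the warning text in a single place, so each tool only has to forward the optional `warning` into its result.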
@ammario ammario enabled auto-merge November 3, 2025 02:39
@ammario ammario added this pull request to the merge queue Nov 3, 2025
Merged via the queue into main with commit ce69a5c Nov 3, 2025
18 of 19 checks passed
@ammario ammario deleted the fix-path-validation-auto-correct branch November 3, 2025 03:08