Skip to content

feat: revert agent test edits at grading (swe_lego, swe_rebench_v2)#1212

Merged
rasdani merged 2 commits intomainfrom
feat/swe-revert-test-patch
Apr 21, 2026
Merged

feat: revert agent test edits at grading (swe_lego, swe_rebench_v2)#1212
rasdani merged 2 commits intomainfrom
feat/swe-revert-test-patch

Conversation

@rasdani
Copy link
Copy Markdown
Contributor

@rasdani rasdani commented Apr 20, 2026

Summary

Both swe_lego and swe_rebench_v2 apply test_patch in setup() so agents can read the failing tests from t=0, but then run test_cmd at grading against whatever state the working tree is in. This PR adds the canonical SWE-bench pre-grading dance — revert agent edits to test files, then re-apply test_patch cleanly — closing a reward-hack loophole where an agent could weaken FAIL_TO_PASS assertions mid-rollout and still score reward=1. Agent source edits (their actual fix) survive untouched; only test-file bits get canonicalized.

Why

Upstream SWE-bench's harness does this exact two-step before running the test command (see swebench/harness/test_spec/python.py#L405-L462 and swebench/harness/utils.py::get_modified_files / get_new_files). Our wrappers previously applied test_patch at setup-time only, so any modification the agent made to the test file in between stuck around for grading. This PR ports the upstream pattern.

Scope

  • New module verifiers/envs/experimental/composable/tasksets/swe/_test_patch.py with get_modified_files, get_new_files, and revert_and_reapply_test_patch. The async helper does git checkout <base_commit> -- <modified>, rm -f <new>, then delegates the re-apply to the taskset's own _apply_patch_file so the taskset-specific git apply flags (different between swe_lego and swe_rebench_v2) are preserved. Uses base_commit (not HEAD) so a git commit by the agent can't shift the checkout target onto their tampered snapshot.
  • swe_lego.py::_run_tests calls the helper at the top of the method, before any existing logic.
  • swe_rebench_v2.py::_run_tests calls the helper right after resolving workdir.
  • Setup-time test_patch apply is left in place on both wrappers (idempotent with the grading-time re-apply). No changes to _calculate_reward, setup, validate_instance, _apply_gold_patch, or any other taskset.

Test plan

Static checks

  • uv run ruff check verifiers/envs/experimental/composable/tasksets/swe/ — clean.
  • uv run pre-commit run --files <touched> — clean (ruff-check, ruff-format, sync-AGENTS).
  • Inline parser sanity tests via uv run python -c ...: modified-only patch, new-file-only patch, mixed, empty string, quoted paths-with-spaces, multiple modifieds, dedup, realistic SWE-bench-shaped test_patch — all 8 cases pass.

Live e2e (against real Prime sandboxes)

Happy-path validate (n=1 per wrapper) — confirms the revert-and-reapply doesn't regress the baseline (every validate_instance call now runs the helper; a broken helper would turn previously-green gold patches red):

wrapper instance valid elapsed
swe_lego adamchainz__apig-wsgi-80 ✅ True 19.98s
swe_rebench_v2 elastic__synthetics-316 ✅ True 44.40s

Sweep validate (n=4 per wrapper, concurrency=2) — light regression across 8 distinct rows:

wrapper result notes
swe_lego 4/4 ✅
swe_rebench_v2 4/4 ✅ First run was 3/4 with a transient on crawler-commons__crawler-commons-227 (java, parse_java_mvn); a clean rerun on the identical instance list came back 4/4. valid=False, error=None without reproducibility — likely Maven/JVM first-run flake, no evidence the revert helper was involved.

Tamper test — the actual guard validation. Bypass gold; tamper test file via AST rewrite (replace every test_*() body with pass, preserving syntax); call _run_tests. Monkey-patched validate_instance for orchestration; monkey-patched revert_and_reapply_test_patch to no-op for the control case:

case tamper revert active? reward valid explanation
A (this PR's code) AST replaced 17 test_* bodies with pass ✅ yes 0.0 False revert restored test_apig_wsgi.py + re-applied test_patch → canonical F2P tests ran against buggy code → failed → cheat blocked
B (control) same tamper ❌ no (monkey-patched) 1.0 True tamper survived → 17 def test_*(): pass bodies trivially succeeded → cheat succeeded

Both cases parsed 21 outcomes (tests ran cleanly in both — tamper preserved syntax, only the grading state differed). The reward split is entirely attributable to the revert step: with it, the loophole is structurally closed; without it, the exact reward-hack scenario this PR exists to prevent reproduces end-to-end.

Known limitations

  • Paths quoted with C-style escapes in diff headers (git config core.quotepath) are not fully decoded — we strip surrounding double quotes but leave backslash-escapes alone. SWE-bench test_patch fields in practice don't carry such paths; noted inline in the module docstring.
  • Revert scope matches upstream SWE-bench: only files listed in test_patch are reverted. An agent could in theory write/modify a conftest.py outside test_patch's file list to affect pytest collection — upstream has the same gap; closing it is a separate hardening question.

Mirror upstream SWE-bench's pre-grading dance so agents can't reward-hack
by weakening FAIL_TO_PASS assertions mid-rollout. Before running the
test command: git-checkout HEAD on any test files the agent modified,
rm -f any newly-added test files, then re-apply test_patch cleanly.
Agent source edits (their actual fix) are untouched.

Adds a small helper module with two unified-diff parsers
(get_modified_files, get_new_files) and an async revert/reapply that
delegates the final re-apply to the taskset's own _apply_patch_file so
the taskset-specific git apply flags are preserved.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 578cd8b. Configure here.

Comment thread verifiers/envs/experimental/composable/tasksets/swe/_test_patch.py Outdated
Bugbot catch on #1212: `git checkout HEAD -- <test_file>` is unsafe
if an agent runs `git add && git commit` mid-rollout — HEAD then
points at the agent's commit, so "revert" would restore the agent's
(potentially tampered) test version instead of the pristine base state.
Re-apply afterward might then conflict or 3-way-merge against the
tampered content, leaving the reward-hack loophole this PR exists to
close partially open.

Thread `base_commit` through `revert_and_reapply_test_patch` and use
`git checkout <base_commit> -- <path>`. Matches upstream SWE-bench
(swebench/harness/test_spec/python.py#L420). Both call-sites guard on
`base_commit` being non-empty before invoking the helper.

Includes docstring/comment updates pointing at the `HEAD` → `base_commit`
rationale so the reasoning stays with the code.
@rasdani rasdani requested a review from hallerite April 21, 2026 03:13
@rasdani rasdani merged commit 58b164e into main Apr 21, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant