Add SWE debug environment #1306

Merged
rasdani merged 6 commits into main from codex/swe-debug-env
May 8, 2026

rasdani commented May 7, 2026

Summary

  • add SWEDebugEnv, a no-agent staged debugger for SWE-style SandboxTaskSet instances
  • support optional task setup, one debug_step (none, gold_patch, command, script), and optional test/scoring at exit
  • export SWEDebugEnv from the experimental composable modules and document it in the environment/reference docs
  • remove SWE taskset-specific row filter shortcuts (filter_repos and SWE-rebench-V2 language) so row selection goes through the shared post-processed filter_fn path
  • keep SWE-Smith's profile-coverage filter as a required supportability guard
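The four debug_step values listed above can be pictured as a small dispatch. The following is a minimal sketch only: `DebugConfig`, `plan_debug_action`, and the `command`/`script` field names are hypothetical and are not taken from SWEDebugEnv's actual signature, which this description does not show.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# The four modes named in the summary; everything else here is illustrative.
VALID_STEPS = ("none", "gold_patch", "command", "script")


@dataclass
class DebugConfig:
    # Hypothetical config object, not the real SWEDebugEnv constructor.
    debug_step: str = "none"
    command: Optional[str] = None
    script: Optional[str] = None


def plan_debug_action(
    cfg: DebugConfig, gold_patch: Optional[str] = None
) -> Optional[Tuple[str, str]]:
    """Return (kind, payload) for the single debug action, or None for 'none'."""
    if cfg.debug_step not in VALID_STEPS:
        raise ValueError(f"unknown debug_step: {cfg.debug_step!r}")
    if cfg.debug_step == "none":
        return None
    # Each non-trivial mode needs exactly one payload.
    payloads = {
        "gold_patch": gold_patch,
        "command": cfg.command,
        "script": cfg.script,
    }
    payload = payloads[cfg.debug_step]
    if not payload:
        raise ValueError(f"debug_step={cfg.debug_step!r} needs a non-empty payload")
    return (cfg.debug_step, payload)
```

Validating the mode and its payload up front, before any sandbox exists, is one way to fail fast on a misconfigured run; whether the merged code does this is not stated here.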

Validation

  • uv run ruff check verifiers/envs/experimental/composable/swe_debug_env.py verifiers/envs/experimental/composable/tasksets/swe
  • uv run python -m py_compile verifiers/envs/experimental/composable/swe_debug_env.py
  • git diff --check
  • push hooks: ruff check, ruff format, Sync AGENTS.md from docs, ty (ci parity)

Note

Medium Risk
Introduces a new sandbox-executing environment and changes SWE TaskSet constructor parameters (removing filter_repos/language shortcuts), which may break downstream callers and affects sandbox lifecycle/test execution paths.

Overview
Adds SWEDebugEnv, a no-agent environment that creates an SWE SandboxTaskSet sandbox, optionally runs task setup, performs a single debug action (gold_patch, command, or script), and optionally runs tests/scoring while recording detailed timing and output tails in state.
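As a rough illustration of recording timing and output tails per stage, the sketch below runs named stages in order and stores the results in a plain dict. The names `run_stages` and `tail`, the state keys, and the 2000-character default are assumptions made for the example; the real state layout used by SWEDebugEnv is not described in this overview.

```python
import time
from typing import Callable, Dict, Optional


def tail(text: str, max_chars: int = 2000) -> str:
    """Keep only the last max_chars characters of a stage's output."""
    return text[-max_chars:] if max_chars > 0 else ""


def run_stages(
    stages: Dict[str, Optional[Callable[[], str]]], tail_chars: int = 2000
) -> Dict[str, dict]:
    """Run stages in insertion order; None marks an optional stage as skipped."""
    state: Dict[str, dict] = {}
    for name, fn in stages.items():
        if fn is None:
            state[name] = {"skipped": True}
            continue
        start = time.perf_counter()
        output = fn()
        state[name] = {
            "skipped": False,
            "seconds": time.perf_counter() - start,
            "output_tail": tail(output, tail_chars),
        }
    return state
```

A call such as `run_stages({"setup": run_setup, "debug_step": None, "test": run_tests})`, with hypothetical `run_setup` and `run_tests` callables, would mark the debug step as skipped while still timing setup and tests.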

Exports SWEDebugEnv from verifiers.envs.experimental.composable and documents it in the environment and API reference docs.

Simplifies SWE taskset dataset selection by removing taskset-specific row-filter constructor args (e.g. filter_repos, MultiSWE exclude_langs, SWE-rebench-V2 language) in favor of the shared post-processed filter_fn mechanism.
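For callers that relied on the removed arguments, equivalent row selection could be written as predicates and passed through filter_fn. This is a sketch under stated assumptions: it assumes filter_fn takes one row and returns a bool, and that rows expose `repo` and `language` keys. Neither detail is confirmed by this PR text, and the helper names are hypothetical.

```python
from typing import Any, Callable, Iterable, Mapping

Row = Mapping[str, Any]
RowFilter = Callable[[Row], bool]


def repo_in(allowed: Iterable[str]) -> RowFilter:
    """Stand-in for the removed filter_repos shortcut (assumes a 'repo' key)."""
    allowed_set = set(allowed)
    return lambda row: row.get("repo") in allowed_set


def language_not_in(excluded: Iterable[str]) -> RowFilter:
    """Stand-in for exclude_langs (assumes a 'language' key)."""
    excluded_set = {lang.lower() for lang in excluded}
    return lambda row: str(row.get("language", "")).lower() not in excluded_set


def all_of(*filters: RowFilter) -> RowFilter:
    """Combine predicates so a single callable can be handed to filter_fn."""
    return lambda row: all(f(row) for f in filters)


# Hypothetical combined predicate: one repo, excluding Java rows.
keep_row = all_of(repo_in(["django/django"]), language_not_in(["java"]))
```

Keeping selection logic in one composable predicate means every taskset goes through the same code path, which matches the stated goal of removing taskset-specific shortcuts.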

Reviewed by Cursor Bugbot for commit 1f9179d. Bugbot is set up for automated code reviews on this repo.

Comment thread verifiers/envs/experimental/composable/swe_debug_env.py Outdated
Comment thread verifiers/envs/experimental/composable/swe_debug_env.py

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

Reviewed by Cursor Bugbot for commit aec1f34.

Comment thread verifiers/envs/experimental/composable/tasksets/swe/multi_swe.py Outdated
rasdani requested a review from willccbb May 8, 2026 21:10
rasdani merged commit 3eddb08 into main May 8, 2026
8 checks passed