Remove direct prime-sandboxes dependency from rlm-swe v1 by willccbb · Pull Request #1316 · PrimeIntellect-ai/verifiers

willccbb · 2026-05-08T18:20:17Z

Description

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Test improvement

Testing

All existing tests pass when running uv run pytest locally.
New tests have been added to cover the changes

Checklist

My code follows the style guidelines of this project as outlined in AGENTS.md
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published

Additional Notes

Note

High Risk
High risk because it replaces the rlm_swe_v1 taskset implementation with dataset-driven, sandbox-backed test staging/execution and adds new sandbox file-transfer/background-job APIs that affect runtime interactions.

Overview
Reworks rlm_swe_v1 to remove Harbor/packaged tasks and instead build tasks from the R2E-Gym/R2E-Gym-Subset dataset, including per-row sandbox image selection, environment variable construction, optional repo filtering, and reward based on running run_tests.sh and parsing pytest summaries.
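
The reward step described above (run the test script, then read the pytest summary) can be illustrated with a minimal sketch. This is not the PR's code: the function names `parse_pytest_summary` and `reward_from_summary`, the all-or-nothing scoring rule, and the assumption that the summary line is the `=`-decorated form that pytest prints by default are all illustrative assumptions.

```python
import re

# Status tokens that pytest prints in its final summary line, for example:
# "===== 3 passed, 1 failed, 2 skipped in 0.42s ====="
STATUS_TOKENS = ("passed", "failed", "errors", "error", "skipped", "xfailed", "xpassed")
_COUNT_RE = re.compile(r"(\d+) (" + "|".join(STATUS_TOKENS) + r")\b")


def parse_pytest_summary(output: str) -> dict[str, int]:
    """Return per-status counts from the last '=== ... in N.NNs ===' line."""
    counts: dict[str, int] = {}
    for line in reversed(output.splitlines()):
        stripped = line.strip()
        # Only the decorated summary line is considered (assumption: no -q flag).
        if stripped.startswith("=") and stripped.endswith("=") and " in " in stripped:
            for number, token in _COUNT_RE.findall(stripped):
                key = "error" if token == "errors" else token
                counts[key] = counts.get(key, 0) + int(number)
            break
    return counts


def reward_from_summary(output: str) -> float:
    """Hypothetical scoring rule: 1.0 only if something passed and nothing broke."""
    counts = parse_pytest_summary(output)
    broken = counts.get("failed", 0) + counts.get("error", 0)
    return 1.0 if counts.get("passed", 0) > 0 and broken == 0 else 0.0
```

Counting status tokens rather than matching one fixed summary string keeps the parser tolerant of summaries that mix `passed`, `failed`, `skipped`, and `errors` in any combination.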

Adds rollout setup/cleanup hooks to stage hidden tests by archiving/downloading them out of the sandbox and later re-uploading them for scoring; removes the bundled skills/ and tasks/ smoke content and updates packaging/dependencies accordingly.

Expands several v1 example environments to at least 10 examples (more task rows + higher num_examples in pyprojects), refactors some static task sources to be generated from shared lists, and adds new tests to enforce the 10-example minimum and to cover the new rlm_swe_v1 behavior.

Extends verifiers.v1 sandbox helpers (SandboxLease/SandboxHandle) with upload_file, download_file, and run_background_job methods to support the new SWE workflow.
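
Code that depends on these three methods can be exercised without a real sandbox by using an in-memory stand-in. Only the method names come from the PR summary. The parameter names, return types, synchronous style, and the `SandboxFiles` and `FakeSandbox` classes are assumptions for illustration and may not match the actual `SandboxLease`/`SandboxHandle` signatures.

```python
from typing import Protocol


class SandboxFiles(Protocol):
    """Hypothetical interface shape; real signatures may differ."""

    def upload_file(self, remote_path: str, data: bytes) -> None: ...
    def download_file(self, remote_path: str) -> bytes: ...
    def run_background_job(self, command: str) -> int: ...


class FakeSandbox:
    """In-memory stand-in that records uploads and submitted commands."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}
        self.jobs: list[str] = []

    def upload_file(self, remote_path: str, data: bytes) -> None:
        self.files[remote_path] = data

    def download_file(self, remote_path: str) -> bytes:
        return self.files[remote_path]

    def run_background_job(self, command: str) -> int:
        self.jobs.append(command)
        return 0  # pretend every job exits successfully


def round_trip(sandbox: SandboxFiles, path: str, payload: bytes) -> bytes:
    """Upload a payload and read it back through the same interface."""
    sandbox.upload_file(path, payload)
    return sandbox.download_file(path)
```

A fake of this kind lets hook logic be unit-tested locally, with no sandbox provisioning involved.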

Reviewed by Cursor Bugbot for commit 7eeb50a. Bugbot is set up for automated code reviews on this repo.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 8aa8458.

Remove direct prime-sandboxes dependency from rlm-swe v1

217df8e

willccbb requested review from rasdani and xeophon May 8, 2026 18:20

cursor Bot reviewed May 8, 2026

Comment thread environments/rlm_swe_v1/rlm_swe_v1.py Outdated

Comment thread environments/rlm_swe_v1/rlm_swe_v1.py

Fix rlm-swe v1 CI failures

1b3a0e3

cursor Bot reviewed May 8, 2026

Comment thread environments/rlm_swe_v1/rlm_swe_v1.py

Comment thread environments/rlm_swe_v1/rlm_swe_v1.py

Fix rlm-swe v1 bugbot issues

beb1814

cursor Bot reviewed May 8, 2026

Comment thread tests/test_v1_rlm_swe.py

Comment thread environments/rlm_swe_v1/rlm_swe_v1.py

willccbb added 2 commits May 8, 2026 14:59

Cover rlm-swe runtime hook dispatch

f03e8d9

Merge main into rlm-swe sandbox PR

7e55ba5

cursor Bot reviewed May 8, 2026

Comment thread environments/rlm_swe_v1/rlm_swe_v1.py Outdated

Fix v1 example count checks for mcp search

8aa8458

cursor Bot reviewed May 9, 2026

Comment thread environments/rlm_swe_v1/rlm_swe_v1.py Outdated

willccbb added 2 commits May 8, 2026 18:11

Respect explicit RLM env overrides

223205e

Parse pytest summary status tokens

7eeb50a

willccbb merged commit 3be6614 into main May 9, 2026
8 checks passed

willccbb deleted the codex/audit-v1-env-example-sets branch May 9, 2026 02:05

willccbb merged 8 commits into main from codex/audit-v1-env-example-sets
