Abort rollout when RLM worker recovery fails #891

Merged: snimu merged 1 commit into main from sebastian/rlm-improvements-2026-02-10j (Feb 10, 2026)

@snimu snimu commented Feb 10, 2026

Description

Instead of returning a soft error message to the model when the worker can't be restarted, raise RLMWorkerRecoveryError (a SandboxError subclass) to cleanly abort the rollout. If recovery succeeds, the RLM is told that the REPL state has been reset.
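
A minimal sketch of why subclassing matters here, assuming the rollout loop already treats any SandboxError as fatal. Only the names RLMWorkerRecoveryError and SandboxError come from this PR; the local SandboxError stand-in, run_rollout, and the turn callables are hypothetical and exist only for illustration.

```python
class SandboxError(Exception):
    """Local stand-in for the library's sandbox error base class (assumption)."""


class RLMWorkerRecoveryError(SandboxError):
    """Raised when the worker cannot be restarted after a timeout."""


def run_rollout(turns):
    """Hypothetical rollout loop: each turn is a callable returning tool output."""
    transcript = []
    try:
        for turn in turns:
            transcript.append(turn())
    except SandboxError as exc:
        # Catching the base class also catches the new subclass, so a failed
        # recovery ends the rollout instead of consuming further turns.
        return {"status": "aborted", "error": type(exc).__name__, "transcript": transcript}
    return {"status": "completed", "error": None, "transcript": transcript}


def dead_worker_turn():
    # Simulates a turn whose worker could not be restarted.
    raise RLMWorkerRecoveryError("worker restart failed")


result = run_rollout([lambda: "ok", dead_worker_turn, lambda: "never runs"])
print(result["status"], result["error"])
```

Because the new error is a subclass, any existing handler written for SandboxError should pick it up without changes; whether the real rollout code is structured this way is not shown in this PR.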

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Note

Medium Risk
Changes error-handling control flow for timeouts, which can alter rollout termination behavior and upstream exception handling, but is scoped to the experimental RLM environment.

Overview
RLM rollouts now hard-fail when worker recovery fails after a code-execution timeout.

RLMEnv._execute_code now attempts _recover_from_code_timeout() and, if restart fails, raises the new RLMWorkerRecoveryError (a vf.SandboxError subclass) instead of returning a soft error string; if recovery succeeds, the returned timeout message always states the worker was restarted and REPL state reset.
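
The control flow described above might look roughly like the sketch below. The method names _execute_code and _recover_from_code_timeout and the exception names come from the summary; everything else (ToyEnv, _run_in_worker, the restart_ok flag, the boolean return value, the message wording, and the synchronous style) is an assumption made to keep the example self-contained.

```python
class SandboxError(Exception):
    """Local stand-in for vf.SandboxError (assumption)."""


class RLMWorkerRecoveryError(SandboxError):
    """Raised when worker recovery fails after a code-execution timeout."""


class ToyEnv:
    """Hypothetical environment that always times out when running code."""

    def __init__(self, restart_ok: bool):
        self.restart_ok = restart_ok

    def _run_in_worker(self, code: str) -> str:
        # Simulate a code-execution timeout in the worker.
        raise TimeoutError("code execution timed out")

    def _recover_from_code_timeout(self) -> bool:
        # Assumed to report whether the worker restart succeeded.
        return self.restart_ok

    def _execute_code(self, code: str) -> str:
        try:
            return self._run_in_worker(code)
        except TimeoutError:
            if not self._recover_from_code_timeout():
                # Hard failure: abort the rollout rather than return a soft error string.
                raise RLMWorkerRecoveryError("worker could not be restarted") from None
            # Recovery succeeded: tell the model its REPL state is gone.
            return "Code execution timed out. The worker was restarted and the REPL state was reset."


print(ToyEnv(restart_ok=True)._execute_code("while True: pass"))
```

With restart_ok=False the same call raises RLMWorkerRecoveryError, which is the path that now ends the rollout.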

Written by Cursor Bugbot for commit e707c33. This will update automatically on new commits.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread verifiers/envs/experimental/rlm_env.py Outdated
Instead of returning a soft error message to the model when the worker
can't be restarted, raise RLMWorkerRecoveryError (a SandboxError
subclass) to cleanly abort the rollout. Continuing with a dead REPL
just wastes turns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@snimu snimu force-pushed the sebastian/rlm-improvements-2026-02-10j branch from 573ab88 to e707c33 Compare February 10, 2026 22:01
@snimu snimu merged commit dfd5650 into main Feb 10, 2026
6 checks passed