Remove RLM_DEFAULT_TOOL_NAMES, accept rlm_tools #1223

Merged
snimu merged 4 commits into main from sebastian/rlm-tool-toggle-2026-04-21
Apr 21, 2026

snimu commented Apr 21, 2026

Description

The rlm_harness takes environment_vars, and ComposableEnv merges those with its own. That gives us a unified way to drive both sandbox behavior and verifiers behavior from the harness, while still allowing environment variables that are passed directly to ComposableEnv.

rlm_harness now takes an rlm_tools list and sets the RLM_TOOLS environment variable from it.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes


Note

Medium Risk
Changes how sandbox environment variables are merged for ComposableEnv and introduces harness-owned env vars, which could alter runtime behavior for existing harness/tasksets if keys collide or ordering assumptions existed.

Overview
ComposableEnv now merges harness-provided sandbox environment variables (Harness.environment_vars) into the sandbox env, with validation preventing overrides of protected keys.

rlm_harness is updated to accept an rlm_tools list and uses it as the single source of truth for both tool monitoring (tool_names) and the sandbox’s RLM_TOOLS env var, removing the prior fixed default constant. Documentation is updated to reflect the expanded Harness config surface.

Reviewed by Cursor Bugbot for commit dcb4aa1.

snimu and others added 2 commits April 21, 2026 19:57
Lets callers declare which builtin RLM tools are active so
ToolMonitorRubric tracks exactly that set. Unset keeps today's default
(ipython + summarize). Callers are expected to pass the same list via
ComposableEnv(environment_vars={"RLM_TOOLS": ...}) so the RLM sandbox
advertises the same tools; this pairing is done at the research-env
level.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@snimu snimu requested a review from willccbb April 21, 2026 19:42
Lets a harness own sandbox env vars that are logically paired with
other harness config, so they can't silently desync. Use case:
rlm_harness(rlm_tools=[...]) now sets both Harness.tool_names (for
ToolMonitorRubric) and Harness.environment_vars["RLM_TOOLS"] (for the
sandbox) from the same single kwarg. Research envs stop maintaining a
parallel "RLM_TOOLS": ",".join(...) entry in their own
environment_vars dicts.
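
As a rough illustration of the fan-out described above, the single-kwarg pairing could look like the sketch below. `HarnessSketch` and `rlm_harness_sketch` are hypothetical stand-ins, not the real `Harness` dataclass or `rlm_harness` signature; only the field names `tool_names` and `environment_vars`, the `RLM_TOOLS` key, and the `",".join(...)` encoding come from this PR.

```python
from dataclasses import dataclass, field


@dataclass
class HarnessSketch:
    """Hypothetical stand-in for Harness; only the two paired fields are modeled."""

    tool_names: list[str] = field(default_factory=list)
    environment_vars: dict[str, str] = field(default_factory=dict)


def rlm_harness_sketch(rlm_tools: list[str]) -> HarnessSketch:
    """Derive both the monitored tool names and RLM_TOOLS from one argument."""
    tools = list(rlm_tools)  # copy so later caller mutation cannot desync the pair
    return HarnessSketch(
        tool_names=tools,
        environment_vars={"RLM_TOOLS": ",".join(tools)},
    )


harness = rlm_harness_sketch(["ipython", "summarize"])
```

Because both values are computed from the same list, a research env no longer has a second place where the tool set could drift.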

Merge order in ComposableEnv.build_env_vars (first→last, later wins):
caller-supplied environment_vars → harness.environment_vars →
taskset.get_env_vars() → AGENT_WORKDIR. Harness wins over caller by
design so a stray caller override can't break a harness-level pairing.
Harness keys collide-check against PROTECTED_ENV_VARS + AGENT_WORKDIR,
same as taskset.
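
The merge order above can be sketched as follows. This is an assumption-laden illustration, not the actual `ComposableEnv.build_env_vars`: the function signature, the contents of `PROTECTED_ENV_VARS`, and the error type are all placeholders chosen for the example.

```python
# Placeholder contents; the real protected set is not shown in this PR.
PROTECTED_ENV_VARS = frozenset({"EXAMPLE_PROTECTED_KEY"})


def build_env_vars_sketch(
    caller_vars: dict[str, str],
    harness_vars: dict[str, str],
    taskset_vars: dict[str, str],
    agent_workdir: str,
) -> dict[str, str]:
    """Merge env var sources first to last, with later sources winning."""
    reserved = PROTECTED_ENV_VARS | {"AGENT_WORKDIR"}
    # Harness and taskset keys are both checked against the reserved names.
    for source, source_vars in (("harness", harness_vars), ("taskset", taskset_vars)):
        clash = reserved.intersection(source_vars)
        if clash:
            raise ValueError(f"{source} may not set reserved keys: {sorted(clash)}")

    merged: dict[str, str] = {}
    merged.update(caller_vars)   # lowest precedence
    merged.update(harness_vars)  # harness overrides a stray caller value
    merged.update(taskset_vars)  # taskset overrides the harness
    merged["AGENT_WORKDIR"] = agent_workdir  # always set last
    return merged


env = build_env_vars_sketch(
    caller_vars={"RLM_TOOLS": "stale", "EXTRA": "1"},
    harness_vars={"RLM_TOOLS": "ipython,summarize"},
    taskset_vars={},
    agent_workdir="/workdir",
)
```

In this example the caller's `RLM_TOOLS` value is replaced by the harness's, while the unrelated `EXTRA` key passes through untouched.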

Also adds docstring to rlm_harness making the fan-out explicit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit f14048b.

Comment thread verifiers/envs/experimental/composable/harness.py
The Harness field enumeration in docs/environments.md had drifted from
the dataclass. Adds the 5 missing fields in declaration order:
sandbox_spec, skills_path, get_upload_dirs (pre-existing omissions),
tool_names (pre-existing), and environment_vars (this PR).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@snimu snimu merged commit 2667f26 into main Apr 21, 2026
6 checks passed