
Expose RLM metrics to verifiers #1195

Merged

snimu merged 5 commits into main from sebastian/harness-metrics-in-state-metrics-2026-04-19 on Apr 20, 2026

Conversation

@snimu (Contributor) commented Apr 19, 2026

Description

Previously, ComposableEnv metrics were namespaced but never added to state["metrics"], so they were never made visible. This PR fixes that, with a particular eye on the RLM harness.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes


Note

Medium Risk
Touches scoring/aggregation paths by adding a new cleanup rubric and changing RubricGroup to treat None rewards as 0, which could affect how rewards/metrics are combined across rubrics.

Overview
ComposableEnv now exposes harness-collected numeric metrics via state["metrics"]. It collects prefixed numeric values into state["_harness_metrics"] during post_rollout, then a new HarnessMetricsRubric merges them into state["metrics"] during rubric cleanup when metrics_path is configured.
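The merge step described above can be sketched as follows. This is a minimal illustration assuming a dict-based state; `merge_harness_metrics` and the exact key handling are hypothetical stand-ins for the PR's cleanup hook, not the verifiers API itself.

```python
def merge_harness_metrics(state: dict) -> None:
    """Sketch of the cleanup step: copy harness-collected numeric
    values from state["_harness_metrics"] into state["metrics"].
    Key names follow the PR description; signatures are assumed."""
    harness = state.get("_harness_metrics", {})
    metrics = state.setdefault("metrics", {})
    for key, value in harness.items():
        # Only numeric values are metrics; skip non-numeric entries
        # and never overwrite a metric a rubric already produced.
        if isinstance(value, (int, float)) and key not in metrics:
            metrics[key] = float(value)
```

In the actual PR this logic lives in a `HarnessMetricsRubric` `@cleanup` method so it runs after all scoring rubrics, without touching core verifiers functionality.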

Rubric aggregation is made more robust by treating None reward values as 0.0 in RubricGroup for both score_rollout and score_group, avoiding crashes when a rubric only contributes metrics.
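The None-coercion change amounts to the following one-liner when combining weighted rubric rewards. `combine_rewards` is a hypothetical helper for illustration; the real change is inside RubricGroup's `score_rollout` and `score_group`.

```python
def combine_rewards(rewards: list, weights: list) -> float:
    # Treat a None reward as 0.0 so a rubric that only contributes
    # metrics (and returns no reward) doesn't crash aggregation.
    return sum(w * (r if r is not None else 0.0) for r, w in zip(rewards, weights))
```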

Tests are updated/added to assert RLM harness metrics end up in state["metrics"] after cleanup and that scoring still succeeds with the new metrics rubric.

Reviewed by Cursor Bugbot for commit a3a4f71.

@snimu snimu requested a review from hallerite April 19, 2026 18:10
Comment thread verifiers/envs/experimental/composable/composable_env.py
logger = logging.getLogger(__name__)


class _HarnessMetricsRubric(vf.Rubric):
Member:

i don't like private classes

Contributor Author:

Made public


class _HarnessMetricsRubric(vf.Rubric):
@cleanup
async def merge_harness_metrics(self, state: State) -> None:
Member:

Hmm, doing this as a cleanup func seems like an anti-pattern. Can we not just read harness metrics into metrics directly and not have a HarnessMetricsRubric at all?

Contributor Author:

Discussed internally: this is the only way to do this without touching core verifiers functionality, which we should be very careful about.

@cursor (Bot) left a comment:

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 7e2973c.

Comment thread verifiers/envs/experimental/composable/composable_env.py
@snimu snimu requested review from mikasenghaas and rasdani April 20, 2026 09:06
@snimu snimu merged commit b141428 into main Apr 20, 2026
6 checks passed


3 participants