
Expose RLM metrics to verifiers #1195

Merged

snimu merged 5 commits into main from sebastian/harness-metrics-in-state-metrics-2026-04-19 on Apr 20, 2026

Conversation

@snimu (Contributor) commented Apr 19, 2026

Description

Previously, ComposableEnv metrics were namespaced but never added to state["metrics"], so they were never made visible. This PR fixes that, with a particular eye on the RLM harness.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes


Note

Medium Risk
Touches scoring/aggregation paths by adding a new cleanup rubric and changing RubricGroup to treat None rewards as 0, which could affect how rewards/metrics are combined across rubrics.

Overview
ComposableEnv now exposes harness-collected numeric metrics via state["metrics"]. It collects prefixed numeric values into state["_harness_metrics"] during post_rollout, then a new HarnessMetricsRubric merges them into state["metrics"] during rubric cleanup when metrics_path is configured.
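The merge step described above can be sketched as follows. This is a minimal illustration assuming a dict-based state; `merge_harness_metrics` and the exact key handling are hypothetical stand-ins for the PR's cleanup hook, not the verifiers API itself.

```python
def merge_harness_metrics(state: dict) -> None:
    """Sketch of the cleanup step: copy harness-collected numeric
    values from state["_harness_metrics"] into state["metrics"].
    Key names follow the PR description; signatures are assumed."""
    harness = state.get("_harness_metrics", {})
    metrics = state.setdefault("metrics", {})
    for key, value in harness.items():
        # Only numeric values are metrics; skip non-numeric entries
        # and never overwrite a metric a rubric already produced.
        if isinstance(value, (int, float)) and key not in metrics:
            metrics[key] = float(value)
```

In the actual PR this logic lives in a `HarnessMetricsRubric` `@cleanup` method so it runs after all scoring rubrics, without touching core verifiers functionality.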

Rubric aggregation is made more robust by treating None reward values as 0.0 in RubricGroup for both score_rollout and score_group, avoiding crashes when a rubric only contributes metrics.
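The None-coercion change amounts to the following one-liner when combining weighted rubric rewards. `combine_rewards` is a hypothetical helper for illustration; the real change is inside RubricGroup's `score_rollout` and `score_group`.

```python
def combine_rewards(rewards: list, weights: list) -> float:
    # Treat a None reward as 0.0 so a rubric that only contributes
    # metrics (and returns no reward) doesn't crash aggregation.
    return sum(w * (r if r is not None else 0.0) for r, w in zip(rewards, weights))
```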

Tests are updated/added to assert RLM harness metrics end up in state["metrics"] after cleanup and that scoring still succeeds with the new metrics rubric.

Reviewed by Cursor Bugbot for commit a3a4f71.

@snimu snimu requested a review from hallerite April 19, 2026 18:10
Comment thread verifiers/envs/experimental/composable/composable_env.py
logger = logging.getLogger(__name__)


class _HarnessMetricsRubric(vf.Rubric):
Member:

i don't like private classes

Contributor Author:

Made public


class _HarnessMetricsRubric(vf.Rubric):
@cleanup
async def merge_harness_metrics(self, state: State) -> None:
Member:

Hmm, doing this as a cleanup func seems like an anti-pattern. Can we not just read harness metrics into metrics directly and not have a HarnessMetricsRubric at all?

Contributor Author:

Discussed internally: this is the only way to do this without touching core verifiers functionality, which we should be very careful about.

@cursor (Bot) left a comment:

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 7e2973c.

Comment thread verifiers/envs/experimental/composable/composable_env.py
@snimu snimu requested review from mikasenghaas and rasdani April 20, 2026 09:06
@snimu snimu merged commit b141428 into main Apr 20, 2026
6 checks passed


3 participants