Merged
    logger = logging.getLogger(__name__)

    class _HarnessMetricsRubric(vf.Rubric):
Member:
i don't like private classes
    class _HarnessMetricsRubric(vf.Rubric):
        @cleanup
        async def merge_harness_metrics(self, state: State) -> None:
Member:
Hmm, doing this as a cleanup func seems like an anti-pattern. Can we not just read harness metrics into `metrics` directly and not have a `HarnessMetricsRubric` at all?
Contributor (Author):
Discussed internally: this is the only way to do this without touching core verifiers functionality, which we should be very careful about.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 7e2973c.
rasdani approved these changes on Apr 20, 2026.

Description

Previously, `ComposableEnv` metrics were namespaced but never added to `state["metrics"]` and thus never made visible. This PR fixes that, especially with an eye on the RLM harness.

Type of Change

Testing

Ran `uv run pytest` locally.

Checklist

Additional Notes
Note

Medium Risk

Touches scoring/aggregation paths by adding a new cleanup rubric and changing `RubricGroup` to treat `None` rewards as 0, which could affect how rewards/metrics are combined across rubrics.

Overview

`ComposableEnv` now exposes harness-collected numeric metrics via `state["metrics"]`. It collects prefixed numeric values into `state["_harness_metrics"]` during `post_rollout`; a new `HarnessMetricsRubric` then merges them into `state["metrics"]` during rubric cleanup when `metrics_path` is configured.

Rubric aggregation is made more robust by treating `None` reward values as `0.0` in `RubricGroup` for both `score_rollout` and `score_group`, avoiding crashes when a rubric only contributes metrics.

Tests are updated/added to assert that RLM harness metrics end up in `state["metrics"]` after cleanup and that scoring still succeeds with the new metrics rubric.

Reviewed by Cursor Bugbot for commit a3a4f71. Bugbot is set up for automated code reviews on this repo.