Amanley/reward overrides by alexmanle · Pull Request #869 · NVIDIA/cloudai

alexmanle · 2026-04-13T22:20:38Z

Summary

Refines the agent config overrides for rewards. Previously, only the reward for constraint-check failures could be overridden; now the value used for metric errors can be overridden as well. The new structure also leaves room for future overrides.

New TOML flags:

[Tests.agent_config.rewards]
constraint_failure = -5.0
metric_failure = 0.0
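A minimal sketch of how the nested `rewards` table could map onto a config object. It uses a plain dataclass as a dependency-free stand-in for the Pydantic `RewardOverrides` model. The `-1.0` defaults (the fallback the review comments below describe), the `parse_rewards` helper, and the rejection of unknown keys are assumptions made for illustration, not the merged cloudai code.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class RewardOverrides:
    # Assumed defaults: -1.0 mirrors the previous hard-coded fallback.
    constraint_failure: float = -1.0
    metric_failure: float = -1.0


def parse_rewards(agent_config: Mapping[str, Any]) -> RewardOverrides:
    """Hypothetical helper: read the [Tests.agent_config.rewards] sub-table, if present."""
    table = agent_config.get("rewards", {})
    known = {f.name for f in fields(RewardOverrides)}
    unknown = set(table) - known
    if unknown:
        # Design choice of this sketch: reject typos such as "metric_falure".
        raise ValueError(f"unknown reward override(s): {sorted(unknown)}")
    # TOML integers (e.g. -5) are coerced to float.
    return RewardOverrides(**{k: float(v) for k, v in table.items()})


# The dict a TOML parser would produce at the agent_config level for the snippet above.
parsed = parse_rewards({"rewards": {"constraint_failure": -5.0, "metric_failure": 0.0}})
print(parsed)
```

Omitting the table entirely falls back to the defaults, so existing TOML files keep working.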

Test Plan

Added new tests.

uv run python -m pytest \
  tests/test_handlers.py::test_rewards_nested \
  tests/test_cloudaigym.py::test_constraint_failure \
  -v

Keeps behavior similar to #865 but improves the interface.

Note: Overrides only take effect in DSE mode!

coderabbitai · 2026-04-13T22:20:53Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

📝 Walkthrough

Replaces numeric metric-error sentinel with a singleton sentinel type and MetricValue alias; introduces a nested RewardOverrides model on agent configs and threads it into CloudAIGymEnv; simplifies BaseGym.step signature and env.step calls; updates many report-strategy return types and tests to use the new sentinel/type.
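A minimal sketch of the singleton-sentinel pattern summarized above, assuming a design along these lines. The class body is illustrative, not the cloudai source. In particular, `__reduce__` is an addition of this sketch so that `copy.deepcopy` cannot create a second instance that would silently defeat `is` checks.

```python
from typing import Optional, Union


class MetricErrorSentinel:
    """Marker meaning 'this metric could not be produced' (sketch, not the cloudai class)."""

    _instance: Optional["MetricErrorSentinel"] = None

    def __new__(cls) -> "MetricErrorSentinel":
        # Always hand back the same object so identity comparisons are reliable.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self) -> str:
        return "METRIC_ERROR"

    def __reduce__(self):
        # copy/deepcopy/pickle rebuild via MetricErrorSentinel(), which returns the singleton.
        return (MetricErrorSentinel, ())


METRIC_ERROR = MetricErrorSentinel()
# A metric is either a real number or the error marker.
MetricValue = Union[float, MetricErrorSentinel]


def describe(value: MetricValue) -> str:
    # Identity check: a real metric equal to -1.0 is never mistaken for an error.
    if value is METRIC_ERROR:
        return "error"
    return f"ok: {value}"


print(describe(METRIC_ERROR), describe(-1.0))
```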

Changes

Cohort / File(s)	Summary
Documentation `doc/USER_GUIDE.rst`	Adds `agent_config.rewards` docs, documents `METRIC_ERROR` as `MetricErrorSentinel` and recommends identity checks (`is`) for metric errors; updates TOML example with `[Tests.agent_config.rewards]`.
Core exports `src/cloudai/core.py`	Re-exports `MetricErrorSentinel`, `MetricValue`, and `RewardOverrides` via `__all__`.
Metric sentinel & TestRun `src/cloudai/_core/test_scenario.py`	Replaces numeric `METRIC_ERROR` with a `MetricErrorSentinel()` singleton; adds `MetricValue: TypeAlias = float …`
Reward model & Agent config `src/cloudai/configurator/base_agent.py`	Adds Pydantic `RewardOverrides` (`constraint_failure`, `metric_failure`) and replaces `constraint_reward_override` with `rewards: RewardOverrides` on `BaseAgentConfig`.
Base Gym API `src/cloudai/configurator/base_gym.py`	Removes `constraint_check_reward` parameter from `BaseGym.step` signature and docs.
CloudAIGym implementation `src/cloudai/configurator/cloudai_gym.py`	`CloudAIGymEnv` now accepts/stores `rewards`; `step` no longer takes per-call reward—constraint failures return `self.rewards.constraint_failure`; metric-error slots use `self.rewards.metric_failure`.
CLI & runners `src/cloudai/cli/handlers.py`, `src/cloudai/systems/slurm/single_sbatch_runner.py`	Construct `CloudAIGymEnv(..., rewards=agent_config.rewards)` and simplify `env.step(...)` calls (remove per-step constraint reward arg).
Report generation strategies (typing) `src/cloudai/_core/report_generation_strategy.py`, `src/cloudai/workloads/.../report_generation_strategy.py` (many files)	Change numerous `get_metric`/extractor return annotations from `float` to `MetricValue` and import `MetricValue`; implementations now explicitly return `METRIC_ERROR` or numeric `MetricValue`.
Tests `tests/test_cloudaigym.py`, `tests/test_handlers.py`, `tests/workloads/megatron_run/test_report_gen_strategy.py`	Tests updated to pass `rewards` into env, assert metric errors via identity (`is METRIC_ERROR`), and add `test_rewards_nested` to validate parsing into `RewardOverrides`.
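To make the `CloudAIGymEnv` row concrete, here is a toy environment in which a single `rewards` object, stored at construction, drives both failure paths. Everything here is hypothetical: `ToyGymEnv`, the `run`/`constraint_ok` callables, the Gym-style 4-tuple, the constraint-failure observation shape, and the placeholder reward (first observed metric). A bare `object()` stands in for `METRIC_ERROR`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Mapping, Sequence, Tuple

# Stand-in for cloudai's METRIC_ERROR singleton; only its identity matters here.
METRIC_ERROR = object()


@dataclass(frozen=True)
class RewardOverrides:
    constraint_failure: float = -1.0  # assumed default
    metric_failure: float = -1.0  # assumed default


class ToyGymEnv:
    """Hypothetical env: one `rewards` object, stored at construction, drives both failure paths."""

    def __init__(
        self,
        metrics: Sequence[str],
        run: Callable[[Mapping[str, Any]], Mapping[str, Any]],
        constraint_ok: Callable[[Mapping[str, Any]], bool],
        rewards: RewardOverrides = RewardOverrides(),
    ) -> None:
        self.metrics = list(metrics)
        self.run = run
        self.constraint_ok = constraint_ok
        self.rewards = rewards

    def get_observation(self, results: Mapping[str, Any]) -> List[float]:
        obs = []
        for name in self.metrics:
            value = results.get(name, METRIC_ERROR)  # a missing metric counts as an error
            # Identity check, then substitute the configured metric_failure value.
            obs.append(self.rewards.metric_failure if value is METRIC_ERROR else float(value))
        return obs

    def step(self, action: Mapping[str, Any]) -> Tuple[List[float], float, bool, Dict[str, Any]]:
        if not self.constraint_ok(action):
            # No per-call reward argument: the penalty comes from self.rewards.
            penalty = self.rewards.constraint_failure
            return [penalty] * len(self.metrics), penalty, True, {}
        obs = self.get_observation(self.run(action))
        reward = obs[0]  # placeholder reward for the sketch: the first observed metric
        return obs, reward, True, {}


env = ToyGymEnv(
    metrics=["throughput"],
    run=lambda action: {"throughput": METRIC_ERROR},
    constraint_ok=lambda action: action.get("tp", 1) <= 8,
    rewards=RewardOverrides(constraint_failure=-5.0, metric_failure=0.0),
)
print(env.step({"tp": 4}))   # metric failed: observation slot becomes metric_failure
print(env.step({"tp": 16}))  # constraint failed: reward becomes constraint_failure
```

Storing the overrides on the instance is what lets `step` drop its per-call reward parameter.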

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I found a sentinel, small and neat,
tucked rewards in configs—what a treat!
Steps borrow from one cozy chest,
metrics signal when they're not at rest.
I hop, I proof, I munch a doc-bit sweet.

🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title 'Amanley/reward overrides' is vague and uses a generic naming pattern (author/branch name) rather than describing the actual change made in the pull request.	Replace with a more descriptive title like 'Add reward overrides for metric failures and constraint checks' that clearly indicates the main change.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description check	✅ Passed	The description clearly explains the refactoring of agent config overrides, introduces the new TOML configuration structure for rewards (constraint_failure and metric_failure), provides test commands, and references related work.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/test_cloudaigym.py (1)
161-189: 🛠️ Refactor suggestion | 🟠 Major

Add an env-level test for metric_failure.

This PR’s new behavior is overriding failed-metric observation slots, but the suite only checks config parsing and constraint_failure. Please add a focused CloudAIGymEnv test that hits get_observation()/step() with a missing reporter or METRIC_ERROR metric and asserts the RewardOverrides(metric_failure=...) value is emitted instead of -1.0.

Based on learnings, prefer expressing behavioral documentation through tests rather than docstrings.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_cloudaigym.py` around lines 161 - 189, Add a new unit test
mirroring test_constraint_failure that verifies metric_failure behavior: create
a CloudAIGymEnv with a RewardOverrides(metric_failure=VALUE) and a
TestRun/NeMoRunTestDefinition, then simulate a missing reporter or a
METRIC_ERROR by configuring the runner/system or observation inputs so
get_observation()/step() produces a failed metric; call env.step(...) and assert
the observation slot and returned reward equal VALUE (and done/info behave like
the constraint test). Reference CloudAIGymEnv, get_observation(), step(),
RewardOverrides, METRIC_ERROR, TestRun and reuse the test structure of
test_constraint_failure to keep setup consistent.

🤖 Prompt for all review comments with AI agents

Inline comments:
In `@src/cloudai/cli/handlers.py`:
- Around line 148-152: The single-sbatch DSE path constructs CloudAIGymEnv as
CloudAIGymEnv(next_tr, self) so its get_observation()/compute_reward() use
hard-coded defaults and never see agent_config.rewards; update the
single_sbatch_runner code that creates CloudAIGymEnv to thread the same rewards
config (agent_config.rewards) into the CloudAIGymEnv constructor and ensure any
subsequent calls to get_observation() / compute_reward() operate on that
instance so metric_failure and constraint_failure are computed from the provided
rewards instead of staying at -1.0; locate the constructor call in
single_sbatch_runner.py (currently CloudAIGymEnv(next_tr, self)) and pass the
rewards argument consistent with the other call site where CloudAIGymEnv(...,
rewards=agent_config.rewards) is used.

In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 224-234: The float equality check v ==
self.test_run.get_metric_error_value() is fragile; change to use a unique error
sentinel (or an explicit error flag) instead of a float: update
get_metric_error_value to return a unique object (e.g., METRIC_ERROR = object())
and ensure get_metric_value returns that sentinel when a metric is an error,
then replace the equality check with an identity check (v is
self.test_run.get_metric_error_value()) in the code that builds observation (the
loop using self.test_run.get_metric_value and
self.test_run.get_metric_error_value), or alternatively have get_metric_value
return (value, is_error) and check the is_error flag before appending the
observation.
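To illustrate the fragility described in this comment, here is a small self-contained comparison. It assumes the old error value was `-1.0` (the fallback other comments mention) and uses a bare `object()` in place of `MetricErrorSentinel`. A genuine metric that happens to equal `-1.0` is misread as a failure under equality, but not under identity.

```python
# Old style (assumed): a float doubles as the "metric failed" marker.
FLOAT_ERROR = -1.0
# New style: a unique object that no real metric value can be identical to.
OBJECT_ERROR = object()


def failed_slots_by_equality(values: list) -> list:
    return [i for i, v in enumerate(values) if v == FLOAT_ERROR]


def failed_slots_by_identity(values: list) -> list:
    return [i for i, v in enumerate(values) if v is OBJECT_ERROR]


# Slot 0 is a legitimate measurement of -1.0 (e.g. a signed delta); slot 1 really failed.
print(failed_slots_by_equality([-1.0, FLOAT_ERROR]))   # [0, 1]: slot 0 misread as a failure
print(failed_slots_by_identity([-1.0, OBJECT_ERROR]))  # [1]: only the real failure
```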

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: cc4dbd81-6c28-4ccf-8d84-386e0aa1c053

📥 Commits

Reviewing files that changed from the base of the PR and between 0442ec9 and 50fca59.

📒 Files selected for processing (9)

doc/USER_GUIDE.rst
src/cloudai/_core/test_scenario.py
src/cloudai/cli/handlers.py
src/cloudai/configurator/base_agent.py
src/cloudai/configurator/base_gym.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/core.py
tests/test_cloudaigym.py
tests/test_handlers.py

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Inline comments:
In `@src/cloudai/_core/test_scenario.py`:
- Line 23: Update the typing imports to use built-in generics instead of
deprecated typing aliases: remove List, Set, and Type from the typing import
line (keep TYPE_CHECKING, Any, Optional, TypeAlias, Union) and then update all
usages in this module accordingly (e.g. change List[str] to list[str], change
Set[Type[ReportGenerationStrategy]] to set[type[ReportGenerationStrategy]], and
change Type[ReportGenerationStrategy] to type[ReportGenerationStrategy]) so the
code uses modern built-in type hints.
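PEP 585 (Python 3.9+) allows the built-in collection types to be subscripted directly, which is what this comment asks for. The following is a short before/after sketch; `ReportGenerationStrategy` here is an empty stand-in class, not the cloudai one.

```python
class ReportGenerationStrategy:
    """Stand-in for the cloudai base class; only used as a type here."""


class CsvReport(ReportGenerationStrategy):
    pass


# Before: List[str] and Set[Type[ReportGenerationStrategy]] imported from `typing`.
# After (PEP 585): the built-in types are subscripted directly, no typing import needed.
def strategy_names(strategies: set[type[ReportGenerationStrategy]]) -> list[str]:
    return sorted(cls.__name__ for cls in strategies)


print(strategy_names({CsvReport}))
```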

In `@src/cloudai/systems/slurm/single_sbatch_runner.py`:
- Around line 213-217: If a test supplies agent_config but
registry.agents_map.get(next_tr.test.agent) returns None, fail fast instead of
silently ignoring overrides: check for next_tr.test.agent_config (or truthy
overrides) and if agent_class is None raise a clear error (e.g.,
ValueError/RuntimeError) referencing next_tr.test.agent and that agent_config
cannot be applied; otherwise continue to instantiate agent_config via
agent_class.get_config_class() and read agent_config.rewards as before.
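A minimal sketch of the suggested fail-fast lookup. The registry contents (`AGENTS`, `GridAgent`, `"grid_search"`), `AgentConfig`, and `resolve_rewards` are hypothetical. The one behavior taken from the comment is that a non-empty `agent_config` with no registered agent raises an error instead of being silently ignored. Falling back to defaults when nothing was requested is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping, Optional


@dataclass(frozen=True)
class RewardOverrides:
    constraint_failure: float = -1.0  # assumed default
    metric_failure: float = -1.0  # assumed default


@dataclass
class AgentConfig:
    rewards: RewardOverrides = field(default_factory=RewardOverrides)


class GridAgent:
    """Hypothetical agent whose config class is AgentConfig."""

    @staticmethod
    def get_config_class() -> type:
        return AgentConfig


# Hypothetical registry; cloudai keeps its own agents_map.
AGENTS: Dict[str, type] = {"grid_search": GridAgent}


def resolve_rewards(agent_name: str, agent_config: Optional[Mapping[str, Any]]) -> RewardOverrides:
    agent_class = AGENTS.get(agent_name)
    if agent_class is None:
        if agent_config:
            # Overrides were requested but no agent can consume them: fail fast.
            raise ValueError(
                f"agent {agent_name!r} is not registered, so its agent_config cannot be applied"
            )
        return RewardOverrides()  # nothing requested; defaults are fine
    config_class = agent_class.get_config_class()
    config = config_class(rewards=RewardOverrides(**dict(agent_config or {}).get("rewards", {})))
    return config.rewards
```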

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 14a0b18a-6bd2-4780-92c7-177940e17dc8

📥 Commits

Reviewing files that changed from the base of the PR and between 50fca59 and 587d245.

📒 Files selected for processing (17)

doc/USER_GUIDE.rst
src/cloudai/_core/report_generation_strategy.py
src/cloudai/_core/test_scenario.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/core.py
src/cloudai/systems/slurm/single_sbatch_runner.py
src/cloudai/workloads/ai_dynamo/report_generation_strategy.py
src/cloudai/workloads/aiconfig/report_generation_strategy.py
src/cloudai/workloads/common/llm_serving.py
src/cloudai/workloads/megatron_bridge/report_generation_strategy.py
src/cloudai/workloads/megatron_run/report_generation_strategy.py
src/cloudai/workloads/nccl_test/performance_report_generation_strategy.py
src/cloudai/workloads/nemo_run/report_generation_strategy.py
src/cloudai/workloads/nixl_bench/report_generation_strategy.py
src/cloudai/workloads/nixl_ep/report_generation_strategy.py
src/cloudai/workloads/ucc_test/report_generation_strategy.py
tests/workloads/megatron_run/test_report_gen_strategy.py

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Inline comments:
In `@src/cloudai/_core/report_generation_strategy.py`:
- Around line 33-34: The base implementation of get_metric currently returns 0.0
which falsely signals a successful metric; change get_metric(self, metric: str)
-> MetricValue to return the metric-error sentinel from the MetricValue contract
(i.e., return METRIC_ERROR or MetricValue.METRIC_ERROR depending on how the
sentinel is exposed) so inheritors propagate metric failures correctly instead
of emitting a bogus zero.
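A sketch of why the base default matters: with the sentinel as the default, a strategy that never implements a metric reports an error, and a genuine `0.0` stays distinguishable from "no value". `ReportStrategyBase`, `ThroughputReport`, and the parsed-dict input are hypothetical; a bare `object()` stands in for `METRIC_ERROR`.

```python
from typing import Union

METRIC_ERROR = object()  # stand-in for cloudai's singleton sentinel
MetricValue = Union[float, object]


class ReportStrategyBase:
    """Hypothetical base class mirroring the review suggestion."""

    def get_metric(self, metric: str) -> MetricValue:
        # Default: no metric support, so report an error rather than a fake 0.0.
        return METRIC_ERROR


class ThroughputReport(ReportStrategyBase):
    def __init__(self, parsed: dict) -> None:
        self.parsed = parsed

    def get_metric(self, metric: str) -> MetricValue:
        if metric not in self.parsed:
            return METRIC_ERROR
        return float(self.parsed[metric])


for strategy in (ReportStrategyBase(), ThroughputReport({"throughput": 0.0})):
    value = strategy.get_metric("throughput")
    print(type(strategy).__name__, "error" if value is METRIC_ERROR else value)
```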

In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 214-222: The None checks around rewards are redundant: replace the
block that initializes obs_replace (currently setting -1.0 then checking
self.rewards and self.rewards.metric_failure) with a direct assignment
obs_replace = self.rewards.metric_failure (since RewardOverrides and
metric_failure are always set via CloudAIGymEnv.__init__), and leave the
subsequent test_run.get_metric_value(...)/METRIC_ERROR handling unchanged.

In `@src/cloudai/systems/slurm/single_sbatch_runner.py`:
- Around line 219-221: The reconstruction path in single_sbatch_runner.py is
bypassing CloudAIGymEnv.step(), so when unroll_dse() skips combinations and
handle_dse() writes a trajectory it calls gym.get_observation(...) and
gym.compute_reward(...) which ignores rewards.constraint_failure; change
handle_dse()/this reconstruction block to detect a failed constraint (the same
check used by CloudAIGymEnv.step()) and, when that failure is true or metrics
are missing, set the reward to rewards.constraint_failure instead of calling
compute_reward, or alternatively invoke CloudAIGymEnv.step() for that transition
so the existing constraint application in CloudAIGymEnv.step() is honored
(reference CloudAIGymEnv.step, get_observation, compute_reward, unroll_dse,
handle_dse, and rewards.constraint_failure).
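A minimal sketch of the first option in this comment: the trajectory-reconstruction loop applies the same constraint rule as `step()` rather than always calling `compute_reward`. `rebuild_trajectory`, `TrajectoryStep`, and the callables are hypothetical names, and the constraint check and reward function in the usage are invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Mapping, Sequence


@dataclass(frozen=True)
class RewardOverrides:
    constraint_failure: float = -1.0  # assumed default
    metric_failure: float = -1.0  # assumed default


@dataclass
class TrajectoryStep:
    action: Mapping[str, Any]
    reward: float


def rebuild_trajectory(
    actions: Sequence[Mapping[str, Any]],
    constraint_ok: Callable[[Mapping[str, Any]], bool],
    compute_reward: Callable[[Mapping[str, Any]], float],
    rewards: RewardOverrides,
) -> List[TrajectoryStep]:
    """Hypothetical reconstruction loop that honors rewards.constraint_failure."""
    steps = []
    for action in actions:
        if not constraint_ok(action):
            # Skipped combinations get the configured penalty, matching step().
            reward = rewards.constraint_failure
        else:
            reward = compute_reward(action)
        steps.append(TrajectoryStep(action, reward))
    return steps


traj = rebuild_trajectory(
    [{"tp": 1}, {"tp": 3}],
    constraint_ok=lambda a: a["tp"] <= 2,
    compute_reward=lambda a: 1.0 / a["tp"],
    rewards=RewardOverrides(constraint_failure=-5.0),
)
print([s.reward for s in traj])  # [1.0, -5.0]
```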

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 967377dc-aa26-4f34-9f60-a35cd632462d

📥 Commits

Reviewing files that changed from the base of the PR and between 587d245 and fae011b.

📒 Files selected for processing (14)

doc/USER_GUIDE.rst
src/cloudai/_core/report_generation_strategy.py
src/cloudai/_core/test_scenario.py
src/cloudai/configurator/base_agent.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/systems/slurm/single_sbatch_runner.py
src/cloudai/workloads/aiconfig/report_generation_strategy.py
src/cloudai/workloads/common/llm_serving.py
src/cloudai/workloads/nemo_run/report_generation_strategy.py
src/cloudai/workloads/nixl_bench/report_generation_strategy.py
src/cloudai/workloads/nixl_ep/report_generation_strategy.py
src/cloudai/workloads/ucc_test/report_generation_strategy.py
tests/test_cloudaigym.py
tests/workloads/megatron_run/test_report_gen_strategy.py

coderabbitai

♻️ Duplicate comments (1)

src/cloudai/_core/report_generation_strategy.py (1)

33-33: ⚠️ Potential issue | 🟡 Minor

Mark intentionally unused parameter to satisfy lint.

metric is unused on Line 33 (Ruff ARG002). Rename it to _metric to make intent explicit and keep lint clean.

🔧 Suggested change

-    def get_metric(self, metric: str) -> MetricValue:
+    def get_metric(self, _metric: str) -> MetricValue:
         return METRIC_ERROR

🤖 Prompt for AI Agents

In `@src/cloudai/_core/report_generation_strategy.py` at line 33, Rename the
unused parameter in the method get_metric (MetricValue get_metric) from metric
to _metric to satisfy the Ruff ARG002 lint; update the function signature in
report_generation_strategy.py to use _metric and adjust any internal references
or overrides (if any) to match the new parameter name so callers and subclass
implementations remain consistent.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 1ee7f536-967d-4bf3-b0b1-c0a75c0610f2

📥 Commits

Reviewing files that changed from the base of the PR and between fed3792 and 12bfbfd.

📒 Files selected for processing (2)

src/cloudai/_core/report_generation_strategy.py
src/cloudai/configurator/cloudai_gym.py

alexmanle added 3 commits April 13, 2026 15:09

unify reward overrides

f87d158

fix reward override tests

af94c9a

update user guide with new agent flags

50fca59

alexmanle requested review from jeffnvidia, podkidyshev and srivatsankrishnan as code owners April 13, 2026 22:20

coderabbitai Bot reviewed Apr 13, 2026

Comment thread src/cloudai/cli/handlers.py

Comment thread src/cloudai/configurator/cloudai_gym.py Outdated

alexmanle added 3 commits April 13, 2026 15:30

fix potential issues

a95a3a8

fix float ambiguity across report strategies

906b3a9

doc tweaks

587d245

coderabbitai Bot reviewed Apr 13, 2026

Comment thread src/cloudai/_core/test_scenario.py

Comment thread src/cloudai/systems/slurm/single_sbatch_runner.py Outdated

podkidyshev previously approved these changes Apr 14, 2026

Comment thread src/cloudai/_core/test_scenario.py Outdated

Comment thread src/cloudai/configurator/base_agent.py Outdated

Comment thread src/cloudai/configurator/cloudai_gym.py Outdated

Comment thread src/cloudai/systems/slurm/single_sbatch_runner.py Outdated

Minor tweaks and fix linting

fae011b

alexmanle dismissed podkidyshev’s stale review via fae011b April 14, 2026 15:54

coderabbitai Bot reviewed Apr 14, 2026

Comment thread src/cloudai/_core/report_generation_strategy.py Outdated

Comment thread src/cloudai/configurator/cloudai_gym.py Outdated

Comment thread src/cloudai/systems/slurm/single_sbatch_runner.py

alexmanle added 2 commits April 14, 2026 09:03

fix copyright for ci

fed3792

minor fixes

12bfbfd

coderabbitai Bot reviewed Apr 14, 2026

podkidyshev approved these changes Apr 14, 2026

podkidyshev merged commit 4571758 into NVIDIA:main Apr 14, 2026
4 checks passed
