Skip to content

Amanley/reward overrides#869

Merged
podkidyshev merged 9 commits into
NVIDIA:mainfrom
alexmanle:amanley/reward_overrides
Apr 14, 2026
Merged

Amanley/reward overrides#869
podkidyshev merged 9 commits into
NVIDIA:mainfrom
alexmanle:amanley/reward_overrides

Conversation

@alexmanle
Copy link
Copy Markdown
Contributor

Summary

Refines the agent config overrides for rewards. Originally only constraint check failures' reward was possible to override, now metric errors can be modified. Also, leaves room for future overrides.

New TOML flags:

[Tests.agent_config.rewards]
constraint_failure = -5.0
metric_failure = 0.0

Test Plan

Added new tests.

uv run python -m pytest \
  tests/test_handlers.py::test_rewards_nested \
  tests/test_cloudaigym.py::test_constraint_failure \
  -v

Keeps similar behavior as #865, but improves the interface.

Note: Overrides only take effect in DSE mode!

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Replaces numeric metric-error sentinel with a singleton sentinel type and MetricValue alias; introduces a nested RewardOverrides model on agent configs and threads it into CloudAIGymEnv; simplifies BaseGym.step signature and env.step calls; updates many report-strategy return types and tests to use the new sentinel/type.

Changes

Cohort / File(s) Summary
Documentation
doc/USER_GUIDE.rst
Adds agent_config.rewards docs, documents METRIC_ERROR as MetricErrorSentinel and recommends identity checks (is) for metric errors; updates TOML example with [Tests.agent_config.rewards].
Core exports
src/cloudai/core.py
Re-exports MetricErrorSentinel, MetricValue, and RewardOverrides via __all__.
Metric sentinel & TestRun
src/cloudai/_core/test_scenario.py
Replaces numeric METRIC_ERROR with a MetricErrorSentinel() singleton; adds `MetricValue: TypeAlias = float
Reward model & Agent config
src/cloudai/configurator/base_agent.py
Adds Pydantic RewardOverrides (constraint_failure, metric_failure) and replaces constraint_reward_override with rewards: RewardOverrides on BaseAgentConfig.
Base Gym API
src/cloudai/configurator/base_gym.py
Removes constraint_check_reward parameter from BaseGym.step signature and docs.
CloudAIGym implementation
src/cloudai/configurator/cloudai_gym.py
CloudAIGymEnv now accepts/stores rewards; step no longer takes per-call reward—constraint failures return self.rewards.constraint_failure; metric-error slots use self.rewards.metric_failure.
CLI & runners
src/cloudai/cli/handlers.py, src/cloudai/systems/slurm/single_sbatch_runner.py
Construct CloudAIGymEnv(..., rewards=agent_config.rewards) and simplify env.step(...) calls (remove per-step constraint reward arg).
Report generation strategies (typing)
src/cloudai/_core/report_generation_strategy.py, src/cloudai/workloads/.../report_generation_strategy.py (many files)
Change numerous get_metric/extractor return annotations from float to MetricValue and import MetricValue; implementations now explicitly return METRIC_ERROR or numeric MetricValue.
Tests
tests/test_cloudaigym.py, tests/test_handlers.py, tests/workloads/megatron_run/test_report_gen_strategy.py
Tests updated to pass rewards into env, assert metric errors via identity (is METRIC_ERROR), and add test_rewards_nested to validate parsing into RewardOverrides.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I found a sentinel, small and neat,
tucked rewards in configs—what a treat!
Steps borrow from one cozy chest,
metrics signal when they're not at rest.
I hop, I proof, I munch a doc-bit sweet.

🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Amanley/reward overrides' is vague and uses a generic naming pattern (author/branch name) rather than describing the actual change made in the pull request. Replace with a more descriptive title like 'Add reward overrides for metric failures and constraint checks' that clearly indicates the main change.
✅ Passed checks (1 passed)
Check name Status Explanation
Description check ✅ Passed The description clearly explains the refactoring of agent config overrides, introduces the new TOML configuration structure for rewards (constraint_failure and metric_failure), provides test commands, and references related work.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_cloudaigym.py (1)

161-189: 🛠️ Refactor suggestion | 🟠 Major

Add an env-level test for metric_failure.

This PR’s new behavior is overriding failed-metric observation slots, but the suite only checks config parsing and constraint_failure. Please add a focused CloudAIGymEnv test that hits get_observation()/step() with a missing reporter or METRIC_ERROR metric and asserts the RewardOverrides(metric_failure=...) value is emitted instead of -1.0.

Based on learnings, prefer expressing behavioral documentation through tests rather than docstrings.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_cloudaigym.py` around lines 161 - 189, Add a new unit test
mirroring test_constraint_failure that verifies metric_failure behavior: create
a CloudAIGymEnv with a RewardOverrides(metric_failure=VALUE) and a
TestRun/NeMoRunTestDefinition, then simulate a missing reporter or a
METRIC_ERROR by configuring the runner/system or observation inputs so
get_observation()/step() produces a failed metric; call env.step(...) and assert
the observation slot and returned reward equal VALUE (and done/info behave like
the constraint test). Reference CloudAIGymEnv, get_observation(), step(),
RewardOverrides, METRIC_ERROR, TestRun and reuse the test structure of
test_constraint_failure to keep setup consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cloudai/cli/handlers.py`:
- Around line 148-152: The single-sbatch DSE path constructs CloudAIGymEnv as
CloudAIGymEnv(next_tr, self) so its get_observation()/compute_reward() use
hard-coded defaults and never see agent_config.rewards; update the
single_sbatch_runner code that creates CloudAIGymEnv to thread the same rewards
config (agent_config.rewards) into the CloudAIGymEnv constructor and ensure any
subsequent calls to get_observation() / compute_reward() operate on that
instance so metric_failure and constraint_failure are computed from the provided
rewards instead of staying at -1.0; locate the constructor call in
single_sbatch_runner.py (currently CloudAIGymEnv(next_tr, self)) and pass the
rewards argument consistent with the other call site where CloudAIGymEnv(...,
rewards=agent_config.rewards) is used.

In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 224-234: The float equality check v ==
self.test_run.get_metric_error_value() is fragile; change to use a unique error
sentinel (or an explicit error flag) instead of a float: update
get_metric_error_value to return a unique object (e.g., METRIC_ERROR = object())
and ensure get_metric_value returns that sentinel when a metric is an error,
then replace the equality check with an identity check (v is
self.test_run.get_metric_error_value()) in the code that builds observation (the
loop using self.test_run.get_metric_value and
self.test_run.get_metric_error_value), or alternatively have get_metric_value
return (value, is_error) and check the is_error flag before appending the
observation.

---

Outside diff comments:
In `@tests/test_cloudaigym.py`:
- Around line 161-189: Add a new unit test mirroring test_constraint_failure
that verifies metric_failure behavior: create a CloudAIGymEnv with a
RewardOverrides(metric_failure=VALUE) and a TestRun/NeMoRunTestDefinition, then
simulate a missing reporter or a METRIC_ERROR by configuring the runner/system
or observation inputs so get_observation()/step() produces a failed metric; call
env.step(...) and assert the observation slot and returned reward equal VALUE
(and done/info behave like the constraint test). Reference CloudAIGymEnv,
get_observation(), step(), RewardOverrides, METRIC_ERROR, TestRun and reuse the
test structure of test_constraint_failure to keep setup consistent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: cc4dbd81-6c28-4ccf-8d84-386e0aa1c053

📥 Commits

Reviewing files that changed from the base of the PR and between 0442ec9 and 50fca59.

📒 Files selected for processing (9)
  • doc/USER_GUIDE.rst
  • src/cloudai/_core/test_scenario.py
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/base_agent.py
  • src/cloudai/configurator/base_gym.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/core.py
  • tests/test_cloudaigym.py
  • tests/test_handlers.py

Comment thread src/cloudai/cli/handlers.py
Comment thread src/cloudai/configurator/cloudai_gym.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cloudai/_core/test_scenario.py`:
- Line 23: Update the typing imports to use built-in generics instead of
deprecated typing aliases: remove List, Set, and Type from the typing import
line (keep TYPE_CHECKING, Any, Optional, TypeAlias, Union) and then update all
usages in this module accordingly (e.g. change List[str] to list[str], change
Set[Type[ReportGenerationStrategy]] to set[type[ReportGenerationStrategy]], and
change Type[ReportGenerationStrategy] to type[ReportGenerationStrategy]) so the
code uses modern built-in type hints.

In `@src/cloudai/systems/slurm/single_sbatch_runner.py`:
- Around line 213-217: If a test supplies agent_config but
registry.agents_map.get(next_tr.test.agent) returns None, fail fast instead of
silently ignoring overrides: check for next_tr.test.agent_config (or truthy
overrides) and if agent_class is None raise a clear error (e.g.,
ValueError/RuntimeError) referencing next_tr.test.agent and that agent_config
cannot be applied; otherwise continue to instantiate agent_config via
agent_class.get_config_class() and read agent_config.rewards as before.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 14a0b18a-6bd2-4780-92c7-177940e17dc8

📥 Commits

Reviewing files that changed from the base of the PR and between 50fca59 and 587d245.

📒 Files selected for processing (17)
  • doc/USER_GUIDE.rst
  • src/cloudai/_core/report_generation_strategy.py
  • src/cloudai/_core/test_scenario.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/core.py
  • src/cloudai/systems/slurm/single_sbatch_runner.py
  • src/cloudai/workloads/ai_dynamo/report_generation_strategy.py
  • src/cloudai/workloads/aiconfig/report_generation_strategy.py
  • src/cloudai/workloads/common/llm_serving.py
  • src/cloudai/workloads/megatron_bridge/report_generation_strategy.py
  • src/cloudai/workloads/megatron_run/report_generation_strategy.py
  • src/cloudai/workloads/nccl_test/performance_report_generation_strategy.py
  • src/cloudai/workloads/nemo_run/report_generation_strategy.py
  • src/cloudai/workloads/nixl_bench/report_generation_strategy.py
  • src/cloudai/workloads/nixl_ep/report_generation_strategy.py
  • src/cloudai/workloads/ucc_test/report_generation_strategy.py
  • tests/workloads/megatron_run/test_report_gen_strategy.py

Comment thread src/cloudai/_core/test_scenario.py
Comment thread src/cloudai/systems/slurm/single_sbatch_runner.py Outdated
podkidyshev
podkidyshev previously approved these changes Apr 14, 2026
Comment thread src/cloudai/_core/test_scenario.py Outdated
Comment thread src/cloudai/configurator/base_agent.py Outdated
Comment thread src/cloudai/configurator/cloudai_gym.py Outdated
Comment thread src/cloudai/systems/slurm/single_sbatch_runner.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cloudai/_core/report_generation_strategy.py`:
- Around line 33-34: The base implementation of get_metric currently returns 0.0
which falsely signals a successful metric; change get_metric(self, metric: str)
-> MetricValue to return the metric-error sentinel from the MetricValue contract
(i.e., return METRIC_ERROR or MetricValue.METRIC_ERROR depending on how the
sentinel is exposed) so inheritors propagate metric failures correctly instead
of emitting a bogus zero.

In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 214-222: The None checks around rewards are redundant: replace the
block that initializes obs_replace (currently setting -1.0 then checking
self.rewards and self.rewards.metric_failure) with a direct assignment
obs_replace = self.rewards.metric_failure (since RewardOverrides and
metric_failure are always set via CloudAIGymEnv.__init__), and leave the
subsequent test_run.get_metric_value(...)/METRIC_ERROR handling unchanged.

In `@src/cloudai/systems/slurm/single_sbatch_runner.py`:
- Around line 219-221: The reconstruction path in single_sbatch_runner.py is
bypassing CloudAIGymEnv.step(), so when unroll_dse() skips combinations and
handle_dse() writes a trajectory it calls gym.get_observation(...) and
gym.compute_reward(...) which ignores rewards.constraint_failure; change
handle_dse()/this reconstruction block to detect a failed constraint (the same
check used by CloudAIGymEnv.step()) and, when that failure is true or metrics
are missing, set the reward to rewards.constraint_failure instead of calling
compute_reward, or alternatively invoke CloudAIGymEnv.step() for that transition
so the existing constraint application in CloudAIGymEnv.step() is honored
(reference CloudAIGymEnv.step, get_observation, compute_reward, unroll_dse,
handle_dse, and rewards.constraint_failure).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 967377dc-aa26-4f34-9f60-a35cd632462d

📥 Commits

Reviewing files that changed from the base of the PR and between 587d245 and fae011b.

📒 Files selected for processing (14)
  • doc/USER_GUIDE.rst
  • src/cloudai/_core/report_generation_strategy.py
  • src/cloudai/_core/test_scenario.py
  • src/cloudai/configurator/base_agent.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/systems/slurm/single_sbatch_runner.py
  • src/cloudai/workloads/aiconfig/report_generation_strategy.py
  • src/cloudai/workloads/common/llm_serving.py
  • src/cloudai/workloads/nemo_run/report_generation_strategy.py
  • src/cloudai/workloads/nixl_bench/report_generation_strategy.py
  • src/cloudai/workloads/nixl_ep/report_generation_strategy.py
  • src/cloudai/workloads/ucc_test/report_generation_strategy.py
  • tests/test_cloudaigym.py
  • tests/workloads/megatron_run/test_report_gen_strategy.py

Comment thread src/cloudai/_core/report_generation_strategy.py Outdated
Comment thread src/cloudai/configurator/cloudai_gym.py Outdated
Comment thread src/cloudai/systems/slurm/single_sbatch_runner.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/cloudai/_core/report_generation_strategy.py (1)

33-33: ⚠️ Potential issue | 🟡 Minor

Mark intentionally unused parameter to satisfy lint.

metric is unused on Line 33 (Ruff ARG002). Rename it to _metric to make intent explicit and keep lint clean.

🔧 Suggested change
-    def get_metric(self, metric: str) -> MetricValue:
+    def get_metric(self, _metric: str) -> MetricValue:
         return METRIC_ERROR
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cloudai/_core/report_generation_strategy.py` at line 33, Rename the
unused parameter in the method get_metric (MetricValue get_metric) from metric
to _metric to satisfy the Ruff ARG002 lint; update the function signature in
report_generation_strategy.py to use _metric and adjust any internal references
or overrides (if any) to match the new parameter name so callers and subclass
implementations remain consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/cloudai/_core/report_generation_strategy.py`:
- Line 33: Rename the unused parameter in the method get_metric (MetricValue
get_metric) from metric to _metric to satisfy the Ruff ARG002 lint; update the
function signature in report_generation_strategy.py to use _metric and adjust
any internal references or overrides (if any) to match the new parameter name so
callers and subclass implementations remain consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 1ee7f536-967d-4bf3-b0b1-c0a75c0610f2

📥 Commits

Reviewing files that changed from the base of the PR and between fed3792 and 12bfbfd.

📒 Files selected for processing (2)
  • src/cloudai/_core/report_generation_strategy.py
  • src/cloudai/configurator/cloudai_gym.py

@podkidyshev podkidyshev merged commit 4571758 into NVIDIA:main Apr 14, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants