
Conversation


@yuki-97 yuki-97 commented Oct 17, 2025

Previously we forgot to multiply the dataloader length by max_num_epochs when calculating train_iters.
This PR fixes it.
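The fix can be sketched as follows. This is a minimal illustration, not the actual NeMo-RL code: `grpo_config` and `dataloader` here are stand-ins for the corresponding objects in `nemo_rl/algorithms/grpo.py`.

```python
# Sketch of the corrected train_iters calculation (illustrative only).
def total_train_iters(grpo_config: dict, dataloader) -> int:
    # Buggy version: min(grpo_config["max_num_steps"], len(dataloader))
    # capped every run at a single pass over the dataloader.
    # Fixed version: multiply the dataloader length by max_num_epochs
    # before taking the minimum with max_num_steps.
    return min(
        grpo_config["max_num_steps"],
        grpo_config["max_num_epochs"] * len(dataloader),
    )

# Example: a 100-batch dataloader, 3 epochs, step cap of 250.
cfg = {"max_num_steps": 250, "max_num_epochs": 3}
print(total_train_iters(cfg, range(100)))  # -> 250 (capped by max_num_steps)
```

With the old formula, the same configuration would have returned 100, silently ignoring the two remaining epochs.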

Summary by CodeRabbit

  • Bug Fixes
    • Refined the calculation of total training iterations to provide more granular control over training duration through epoch-based configuration, allowing better flexibility in managing training sessions and step limits.

Signed-off-by: Yuki Huang <yukih@nvidia.com>
@yuki-97 yuki-97 requested a review from a team as a code owner October 17, 2025 06:41
@yuki-97 yuki-97 added the CI:L1 Run doctests, unit tests, and functional tests label Oct 17, 2025
@yuki-97 yuki-97 requested review from ashors1 and terrykong October 17, 2025 06:41

coderabbitai bot commented Oct 17, 2025

📝 Walkthrough

The calculation of total_train_iters in the setup method is updated to incorporate max_num_epochs. The new formula multiplies the dataloader length by max_num_epochs before taking the minimum with max_num_steps, providing epoch-based iteration control instead of a direct step cap.

Changes

Training iteration calculation (nemo_rl/algorithms/grpo.py):
Modified the total_train_iters calculation to use min(grpo_config["max_num_steps"], grpo_config["max_num_epochs"] * len(dataloader)) instead of min(grpo_config["max_num_steps"], len(dataloader)), enabling epoch-based control.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Test Results For Major Changes ⚠️ Warning This PR is a bug fix that modifies the train_iters calculation in the GRPO algorithm's setup function to correctly include the max_num_epochs multiplier. While the change itself is straightforward and logically correct, the CI workflow shows a critical failure: "CI quality check (No tests run: Label CI:L*): FAILURE", indicating that no tests were executed for this PR. The PR description contains no test results, convergence verification, or evidence demonstrating the absence of regressions. Since this change affects the number of training iterations the algorithm may execute (potentially increasing it via the max_num_epochs factor), it could impact training convergence and numerical outcomes, requiring explicit verification before merge. The PR must document that appropriate tests have been run and have passed. Specifically, the PR description should include: (1) confirmation that the CI tests passed (currently showing "CI quality check: FAILURE"), (2) results from the existing unit tests in tests/unit/algorithms/test_grpo.py and functional tests in tests/functional/grpo.sh, and (3) evidence that training with the fix does not cause regressions (e.g., comparable convergence curves or loss values). At minimum, the PR author should apply the appropriate CI labels to trigger test execution and document the test results in the PR description before merging.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "fix: fix mcore train_iters in grpo" is clearly related to the main change in the changeset. The PR specifically fixes the calculation of total_train_iters in the GRPO algorithm, and the title accurately identifies both the component being modified (train_iters) and the file/module (grpo). While the title doesn't elaborate on the specific nature of the fix (adding max_num_epochs multiplication) or explain what "mcore" refers to, it is sufficiently clear and specific that a teammate reviewing the commit history would understand the PR addresses a train_iters issue in grpo. The title is not vague, misleading, or off-topic.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dee3fd9 and b0e62e7.

📒 Files selected for processing (1)
  • nemo_rl/algorithms/grpo.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

  • nemo_rl/algorithms/grpo.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

  • nemo_rl/algorithms/grpo.py
🧬 Code graph analysis (1)
nemo_rl/algorithms/grpo.py (1)
tests/check_metrics.py (1)
  • min (24-26)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Docs_Tests
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Post automodel integration comment / Comment on PR
  • GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (1)
nemo_rl/algorithms/grpo.py (1)

423-426: LGTM! The fix correctly calculates total training iterations.

The updated formula now properly accounts for multiple epochs by multiplying the dataloader length by max_num_epochs. This aligns with the training loop behavior (line 650), which iterates through epochs up to max_num_epochs while respecting the max_num_steps limit. The min() ensures we don't exceed the configured maximum steps.
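The loop behavior described above can be sketched as a toy model (hypothetical names; the real training loop in grpo.py is more involved): iterate through epochs up to max_num_epochs, stopping once max_num_steps total steps have been taken.

```python
# Toy model of the training loop: epochs bounded by max_num_epochs,
# total steps bounded by max_num_steps. Illustrative only.
def run_training(max_num_epochs: int, max_num_steps: int, dataloader) -> int:
    step = 0
    for _epoch in range(max_num_epochs):
        for _batch in dataloader:
            if step >= max_num_steps:
                return step
            step += 1  # one training iteration per batch
    return step
```

The number of steps this loop performs is exactly min(max_num_steps, max_num_epochs * len(dataloader)), which is why the corrected total_train_iters formula matches the loop.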



@ashors1 ashors1 left a comment


Thanks for the fix Yuki!

@terrykong

Tests passed, but setting to docs since I needed to add the r0.4.0 label https://github.com/NVIDIA-NeMo/RL/actions/runs/18584732148?pr=1383

@terrykong terrykong added r0.4.0 CI:docs Run doctest and removed CI:L1 Run doctests, unit tests, and functional tests labels Oct 17, 2025
@terrykong terrykong enabled auto-merge (squash) October 17, 2025 16:34
@terrykong terrykong merged commit 905a224 into main Oct 17, 2025
70 of 72 checks passed
@terrykong terrykong deleted the yukih/fix-train-iters branch October 17, 2025 16:41
chtruong814 pushed a commit that referenced this pull request Oct 17, 2025
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
lbliii pushed a commit that referenced this pull request Nov 3, 2025
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
PrinsYin pushed a commit to PrinsYin/RL that referenced this pull request Nov 30, 2025
Signed-off-by: Yuki Huang <yukih@nvidia.com>
4 participants