
fix: Release gradient memory after policy training#1147

Merged
terrykong merged 1 commit into main from jseppanen/free-grad-memory on Oct 2, 2025

Conversation

@jseppanen
Contributor

@jseppanen jseppanen commented Sep 17, 2025

What does this PR do?

Release gradient memory after policy training to avoid OOM during rollouts.

Issues

Currently, training is likely to OOM at the beginning of the second step's rollouts because gradient memory is not freed after the training phase.
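The fix can be sketched with a toy model; this is a hypothetical, minimal reproduction of the pattern (the model, optimizer, and shapes are made up, not taken from the worker code). Passing `set_to_none=True` drops the per-parameter `.grad` tensors entirely, so the CUDA caching allocator can reuse that memory for rollout activations; in recent PyTorch releases this is already the default for `zero_grad()`.

```python
import torch

# illustrative stand-ins for the policy model and its optimizer
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# training phase (one microbatch shown)
loss = model(torch.randn(4, 8)).sum()
loss.backward()
optimizer.step()

# release gradient memory before rollouts: set_to_none=True frees the
# .grad tensors instead of overwriting them with zeros
optimizer.zero_grad(set_to_none=True)
assert all(p.grad is None for p in model.parameters())
```

With gradients set to `None`, their memory is returned to the allocator immediately rather than sitting idle as zero-filled tensors through the generation phase.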

Summary by CodeRabbit

  • Bug Fixes
    • Reduced GPU memory usage during multi-rollout training by resetting gradients between rollouts, preventing unintended accumulation.
    • Lowered the risk of out-of-memory errors and improved stability on long training runs for dtensor policy workers (v1 and v2).
    • Slight performance improvement from earlier memory release; training behavior and results remain unchanged.
    • No changes to user-facing APIs or configuration.

@coderabbitai
Contributor

coderabbitai bot commented Sep 17, 2025

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which is sufficient (required threshold: 80.00%).
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title accurately and concisely describes the primary change of adding a gradient memory release after policy training to prevent OOM errors, directly reflecting the modifications in the train methods of the policy workers.


Contributor

Copilot AI left a comment


Pull Request Overview

This PR fixes a memory issue by releasing gradient memory after policy training to prevent out-of-memory (OOM) errors during rollouts. The fix addresses OOM problems that occur at the beginning of the second step rollouts when gradient memory accumulates without being freed.

  • Adds optimizer.zero_grad() calls after training loops to release gradient memory
  • Prevents OOM errors during subsequent rollout phases

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

  • nemo_rl/models/policy/dtensor_policy_worker_v2.py — Adds gradient memory cleanup after training loop completion
  • nemo_rl/models/policy/dtensor_policy_worker.py — Adds gradient memory cleanup after training loop completion


@github-actions

ℹ️ File Consistency Check

Check based on commit: 1571f57 (PR #1147 from jseppanen/free-grad-memory)

This is a test comment


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (6)
nemo_rl/models/policy/dtensor_policy_worker_v2.py (3)

804-805: Use zero_grad(set_to_none=True) to actually free gradient memory.

This better matches the PR goal and reduces allocator pressure vs writing zeros.

-            # release gradient memory before rollouts
-            self.optimizer.zero_grad()
+            # release gradient memory before rollouts
+            self.optimizer.zero_grad(set_to_none=True)

542-543: Make the earlier zero_grad consistent for memory release.

Same rationale; this prevents carrying grads between rollouts.

-                self.optimizer.zero_grad()
+                self.optimizer.zero_grad(set_to_none=True)
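The behavioral difference these two suggestions rely on can be checked with a toy model (the names here are illustrative, not from the PR): `set_to_none=False` keeps zero-filled gradient tensors allocated, while `set_to_none=True` releases them.

```python
import torch

model = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
model(torch.randn(1, 2)).sum().backward()

# set_to_none=False: gradient tensors stay allocated, just zeroed out
opt.zero_grad(set_to_none=False)
assert all(
    p.grad is not None and p.grad.abs().sum().item() == 0
    for p in model.parameters()
)

model(torch.randn(1, 2)).sum().backward()
# set_to_none=True: the tensors are dropped, releasing their memory
opt.zero_grad(set_to_none=True)
assert all(p.grad is None for p in model.parameters())
```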

777-780: Guard metric accumulation for dummy microbatches.

num_valid_samples can carry over from a prior microbatch; only append metrics for real microbatches.

-                    if num_valid_samples > 0:
-                        mb_losses.append(loss.item())
-                        all_mb_metrics.append(loss_metrics)
+                    # Only keep metrics for non-dummy microbatches with valid samples
+                    if mb_idx < iterator_len and loss_metrics.get("num_valid_samples", 0) > 0:
+                        mb_losses.append(loss.item())
+                        all_mb_metrics.append(loss_metrics)
nemo_rl/models/policy/dtensor_policy_worker.py (3)

860-861: Use zero_grad(set_to_none=True) for real gradient memory release.

Aligns with the PR intent and PyTorch best practice.

-            # release gradient memory before rollouts
-            self.optimizer.zero_grad()
+            # release gradient memory before rollouts
+            self.optimizer.zero_grad(set_to_none=True)

598-599: Apply set_to_none=True at the start of each rollout as well.

Ensures grads from prior step aren’t retained.

-                self.optimizer.zero_grad()
+                self.optimizer.zero_grad(set_to_none=True)

833-836: Avoid leaking num_valid_samples across dummy microbatches.

Append metrics only for non-dummy microbatches.

-                if num_valid_samples > 0:
-                    mb_losses.append(loss.item())
-                    all_mb_metrics.append(loss_metrics)
+                if mb_idx < iterator_len and loss_metrics.get("num_valid_samples", 0) > 0:
+                    mb_losses.append(loss.item())
+                    all_mb_metrics.append(loss_metrics)
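The guard suggested in both hunks can be sketched without the worker internals; the data below is made up, and `iterator_len` / `num_valid_samples` are the names assumed from the diff, not verified against the full source.

```python
# Hypothetical sketch: when the microbatch iterator is padded with dummy
# entries, a stale num_valid_samples can leak into them, so metrics are
# kept only for real microbatches that actually had valid samples.
iterator_len = 3  # number of real microbatches; anything beyond is padding
mb_metrics = [
    {"loss": 0.5, "num_valid_samples": 4},
    {"loss": 0.4, "num_valid_samples": 4},
    {"loss": 0.3, "num_valid_samples": 2},
    {"loss": 0.3, "num_valid_samples": 2},  # dummy pad carrying a stale count
]
kept = [
    m for mb_idx, m in enumerate(mb_metrics)
    if mb_idx < iterator_len and m.get("num_valid_samples", 0) > 0
]
assert len(kept) == 3
```

Without the `mb_idx < iterator_len` check, the padded entry would be counted twice and skew the averaged loss.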
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 5a9f7ac and 1571f57.

📒 Files selected for processing (2)
  • nemo_rl/models/policy/dtensor_policy_worker.py (1 hunks)
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Post submodule check comment / Comment on PR

Contributor

@parthchadha parthchadha left a comment


LGTM, thanks for the fix.

@terrykong terrykong added the CI:L1 Run doctests, unit tests, and functional tests label Oct 2, 2025
Signed-off-by: Jarno Seppänen <jseppanen@nvidia.com>
@terrykong terrykong force-pushed the jseppanen/free-grad-memory branch from 1571f57 to cef1c63 on October 2, 2025 06:10
@terrykong terrykong requested a review from a team as a code owner October 2, 2025 06:10
@terrykong terrykong added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Oct 2, 2025
@terrykong terrykong enabled auto-merge (squash) October 2, 2025 06:10
@github-actions

github-actions bot commented Oct 2, 2025

ℹ️ File Consistency Check

Check based on commit: cef1c63 (PR #1147 from jseppanen/free-grad-memory)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/dtensor_policy_worker.py
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@terrykong terrykong merged commit 43928aa into main Oct 2, 2025
40 of 41 checks passed
@terrykong terrykong deleted the jseppanen/free-grad-memory branch October 2, 2025 12:35
chtruong814 pushed a commit that referenced this pull request Oct 2, 2025
Signed-off-by: Jarno Seppänen <jseppanen@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
PrinsYin pushed a commit to PrinsYin/RL that referenced this pull request Nov 30, 2025
Signed-off-by: Jarno Seppänen <jseppanen@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
Signed-off-by: Jarno Seppänen <jseppanen@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>

Labels

CI:L1 Run doctests, unit tests, and functional tests r0.4.0
