fix: Release gradient memory after policy training #1147
Conversation
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Pull Request Overview
This PR fixes a memory issue by releasing gradient memory after policy training, preventing out-of-memory (OOM) errors that occur at the beginning of second-step rollouts when accumulated gradients are never freed.
- Adds `optimizer.zero_grad()` calls after training loops to release gradient memory
- Prevents OOM errors during subsequent rollout phases
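The review below recommends passing `set_to_none=True` so that gradient buffers are actually dropped rather than overwritten with zeros. The difference can be sketched with a minimal pure-Python mock of PyTorch's optimizer semantics (`_MockParam` and `_MockOptimizer` are illustrative stand-ins, not real APIs):

```python
class _MockParam:
    """Stand-in for a parameter tensor; .grad models the gradient buffer."""
    def __init__(self):
        self.grad = [0.0] * 4  # pretend gradient buffer left over from backward()

class _MockOptimizer:
    """Minimal stand-in for torch.optim.Optimizer.zero_grad semantics."""
    def __init__(self, params):
        self.params = params

    def zero_grad(self, set_to_none=False):
        for p in self.params:
            if set_to_none:
                p.grad = None                     # buffer is released to the allocator
            elif p.grad is not None:
                p.grad = [0.0] * len(p.grad)      # buffer stays allocated, just zeroed

params = [_MockParam() for _ in range(3)]
opt = _MockOptimizer(params)

opt.zero_grad()                  # zeros in place: memory is still held
held = sum(p.grad is not None for p in params)
opt.zero_grad(set_to_none=True)  # drops the buffers: memory is released
freed = sum(p.grad is None for p in params)
print(held, freed)  # 3 3
```

In real PyTorch, `set_to_none=True` has been the default since 2.0, but passing it explicitly documents the intent of freeing memory before rollouts.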
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| nemo_rl/models/policy/dtensor_policy_worker_v2.py | Adds gradient memory cleanup after training loop completion |
| nemo_rl/models/policy/dtensor_policy_worker.py | Adds gradient memory cleanup after training loop completion |
ℹ️ File Consistency Check based on commit: 1571f57 (PR #1147). This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
Actionable comments posted: 0
🧹 Nitpick comments (6)
nemo_rl/models/policy/dtensor_policy_worker_v2.py (3)
804-805: Use `zero_grad(set_to_none=True)` to actually free gradient memory. This better matches the PR goal and reduces allocator pressure vs writing zeros.

```diff
-# release gradient memory before rollouts
-self.optimizer.zero_grad()
+# release gradient memory before rollouts
+self.optimizer.zero_grad(set_to_none=True)
```
542-543: Make the earlier `zero_grad` consistent for memory release. Same rationale; this prevents carrying grads between rollouts.

```diff
-self.optimizer.zero_grad()
+self.optimizer.zero_grad(set_to_none=True)
```
777-780: Guard metric accumulation for dummy microbatches. `num_valid_samples` can bleed from a prior mb; only append for real mbs.

```diff
-if num_valid_samples > 0:
-    mb_losses.append(loss.item())
-    all_mb_metrics.append(loss_metrics)
+# Only keep metrics for non-dummy microbatches with valid samples
+if mb_idx < iterator_len and loss_metrics.get("num_valid_samples", 0) > 0:
+    mb_losses.append(loss.item())
+    all_mb_metrics.append(loss_metrics)
```

nemo_rl/models/policy/dtensor_policy_worker.py (3)
860-861: Use `zero_grad(set_to_none=True)` for real gradient memory release. Aligns with the PR intent and PyTorch best practice.

```diff
-# release gradient memory before rollouts
-self.optimizer.zero_grad()
+# release gradient memory before rollouts
+self.optimizer.zero_grad(set_to_none=True)
```
598-599: Apply `set_to_none=True` at the start of each rollout as well. Ensures grads from the prior step aren't retained.

```diff
-self.optimizer.zero_grad()
+self.optimizer.zero_grad(set_to_none=True)
```
833-836: Avoid leaking `num_valid_samples` across dummy microbatches. Append metrics only for non-dummy mbs.

```diff
-if num_valid_samples > 0:
-    mb_losses.append(loss.item())
-    all_mb_metrics.append(loss_metrics)
+if mb_idx < iterator_len and loss_metrics.get("num_valid_samples", 0) > 0:
+    mb_losses.append(loss.item())
+    all_mb_metrics.append(loss_metrics)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- nemo_rl/models/policy/dtensor_policy_worker.py (1 hunks)
- nemo_rl/models/policy/dtensor_policy_worker_v2.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Post submodule check comment / Comment on PR
parthchadha
left a comment
LGTM, thanks for the fix.
Signed-off-by: Jarno Seppänen <jseppanen@nvidia.com>
Force-pushed 1571f57 to cef1c63
ℹ️ File Consistency Check based on commit: cef1c63 (PR #1147)
✅ DTensor Policy Worker Synchronization Check: both DTensor policy worker files were modified in this PR. Please ensure that the changes are consistent between both files where applicable. This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
Signed-off-by: Jarno Seppänen <jseppanen@nvidia.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Jarno Seppänen <jseppanen@nvidia.com>
Signed-off-by: Jarno Seppänen <jseppanen@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
What does this PR do ?
Release gradient memory after policy training to avoid OOM during rollouts.
Issues
Currently, training is likely to OOM at the beginning of the second-step rollouts because gradient memory is not freed.
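The failure mode can be illustrated with a toy memory-budget model: the first step's rollout fits, but the second step's rollout exceeds the budget because the first step's gradients were never released. All numbers here are illustrative, not measurements from the real run:

```python
BUDGET = 100   # pretend total GPU memory units
GRADS = 40     # gradient memory allocated by a training phase
ROLLOUT = 50   # transient memory needed by a rollout phase

def run_step(resident, free_grads_after_train):
    """One train+rollout step; returns memory still resident afterwards."""
    resident += GRADS                    # backward() allocates gradients
    if free_grads_after_train:
        resident -= GRADS                # models optimizer.zero_grad(set_to_none=True)
    if resident + ROLLOUT > BUDGET:      # rollout peak exceeds the budget
        raise MemoryError("OOM during rollout")
    return resident

# Without the fix: step 1 fits (40 + 50 <= 100), but step 2's rollout OOMs
# because step-1 gradients are still resident (80 + 50 > 100).
resident, oom_step = 0, None
for step in (1, 2):
    try:
        resident = run_step(resident, free_grads_after_train=False)
    except MemoryError:
        oom_step = step
        break
print("without fix, OOM at step:", oom_step)  # 2

# With the fix, every rollout starts from a clean budget.
resident = 0
for step in (1, 2):
    resident = run_step(resident, free_grads_after_train=True)
print("with fix: both steps complete")
```

This matches the symptom described above: the OOM appears only at the start of the second step's rollouts, once stale gradients stack on top of rollout memory.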