
fix: use swap_tensors #1734

Merged
yfw merged 1 commit into test-fix-4 from yukih/pr-1726
Jan 7, 2026

Conversation

Contributor

@yuki-97 yuki-97 commented Jan 7, 2026

  1. v.data = v.data.to(device) causes the following error for gemma3: https://github.com/NVIDIA-NeMo/RL/pull/1563/changes#r2626166513

    RuntimeError: Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cpu". This is no longer allowed; the devices must match.

  2. v = v.to(device) only rebinds the name v, so the old tensor on the GPU is never released, which trips the following assertion. https://github.com/NVIDIA-NeMo/RL/actions/runs/20744518055/job/59648725289?pr=1726

    assert current_allocated == 0.0, "Memory should be 0 after refit completed"

torch.utils.swap_tensors(v, v.to(device)) solves both errors above.

Signed-off-by: Yuki Huang <yukih@nvidia.com>
@yuki-97 yuki-97 requested review from a team as code owners January 7, 2026 09:44
@yuki-97 yuki-97 added the CI:L1 Run doctests, unit tests, and functional tests label Jan 7, 2026
@github-actions

github-actions bot commented Jan 7, 2026

ℹ️ File Consistency Check

Check based on commit: 9fb51bc (PR #1734 from yukih/pr-1726)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@yuki-97 yuki-97 changed the title use swap_tensors fix: use swap_tensors Jan 7, 2026
@yuki-97 yuki-97 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Jan 7, 2026
@yuki-97 yuki-97 changed the base branch from test-fix-4 to main January 7, 2026 10:12
@yuki-97 yuki-97 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Jan 7, 2026
@yuki-97 yuki-97 changed the base branch from main to test-fix-4 January 7, 2026 14:51
@yfw yfw added the r0.5.0 label Jan 7, 2026
@yfw yfw removed the r0.5.0 label Jan 7, 2026
@yfw yfw merged commit b9b98f4 into test-fix-4 Jan 7, 2026
34 of 36 checks passed
@yfw yfw deleted the yukih/pr-1726 branch January 7, 2026 17:30
sharonyu-115 added a commit to sharonyu-115/RL that referenced this pull request Apr 17, 2026
The Automodel submodule now tracks the fix/gemma4-moe-gate-double-norm
branch on the shuangy fork, which is rebased on upstream main
(bd942f20) and carries only the single MoE-gate double-norm fix plus
its regression tests. This drops the three transformers 5.5 compat
patches that have since landed upstream (NVIDIA-NeMo#1734, NVIDIA-NeMo#1769, NVIDIA-NeMo#1764) and
collapses our carry-stack from four patches down to one.

gemma4-support is preserved on the fork as an A/B fallback — flip
.gitmodules branch + re-checkout the submodule to swap.

Signed-off-by: Shuang Yu <shuangy@nvidia.com>
sharonyu-115 added commits to sharonyu-115/RL that referenced this pull request Apr 18, 2026 (same message as above)
