
fix: use swap_tensors #1734

Merged
yfw merged 1 commit into test-fix-4 from yukih/pr-1726
Jan 7, 2026

Conversation

Contributor

@yuki-97 yuki-97 commented Jan 7, 2026

  1. v.data = v.data.to(device) causes the following error for gemma3: https://github.com/NVIDIA-NeMo/RL/pull/1563/changes#r2626166513

    RuntimeError: Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cpu". This is no longer allowed; the devices must match.

  2. v = v.to(device) only rebinds the name v, so the old tensor on the GPU is never released, which trips the following assertion. https://github.com/NVIDIA-NeMo/RL/actions/runs/20744518055/job/59648725289?pr=1726

    assert current_allocated == 0.0, "Memory should be 0 after refit completed"

torch.utils.swap_tensors(v, v.to(device)) solves both errors above.

Signed-off-by: Yuki Huang <yukih@nvidia.com>
@yuki-97 yuki-97 requested review from a team as code owners January 7, 2026 09:44
@yuki-97 yuki-97 added the CI:L1 Run doctests, unit tests, and functional tests label Jan 7, 2026
@github-actions

github-actions bot commented Jan 7, 2026

ℹ️ File Consistency Check

Check based on commit: 9fb51bc (PR #1734 from yukih/pr-1726)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@yuki-97 yuki-97 changed the title use swap_tensors fix: use swap_tensors Jan 7, 2026
@yuki-97 yuki-97 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Jan 7, 2026
@yuki-97 yuki-97 changed the base branch from test-fix-4 to main January 7, 2026 10:12
@yuki-97 yuki-97 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Jan 7, 2026
@yuki-97 yuki-97 changed the base branch from main to test-fix-4 January 7, 2026 14:51
@yfw yfw added the r0.5.0 label Jan 7, 2026
@yfw yfw removed the r0.5.0 label Jan 7, 2026
@yfw yfw merged commit b9b98f4 into test-fix-4 Jan 7, 2026
34 of 36 checks passed
@yfw yfw deleted the yukih/pr-1726 branch January 7, 2026 17:30
sharonyu-115 added a commit to sharonyu-115/RL that referenced this pull request Apr 17, 2026
The Automodel submodule now tracks the fix/gemma4-moe-gate-double-norm
branch on the shuangy fork, which is rebased on upstream main
(bd942f20) and carries only the single MoE-gate double-norm fix plus
its regression tests. This drops the three transformers 5.5 compat
patches that have since landed upstream (NVIDIA-NeMo#1734, NVIDIA-NeMo#1769, NVIDIA-NeMo#1764) and
collapses our carry-stack from four patches down to one.

gemma4-support is preserved on the fork as an A/B fallback — flip
.gitmodules branch + re-checkout the submodule to swap.

Signed-off-by: Shuang Yu <shuangy@nvidia.com>
sharonyu-115 added commits to sharonyu-115/RL that referenced this pull request Apr 18, 2026 (same message as above)
