Add Nemotron 3 Nano 30B multi-node training tutorial#699
Merged
bxyu-nvidia merged 14 commits into main on Feb 19, 2026
Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>
bxyu-nvidia
requested changes
Feb 15, 2026
Contributor
bxyu-nvidia
left a comment
this looks amazing, thank you so much! one minor request
```bash
# Set workspace directory (adjust to your cluster's large storage)
# Examples: /scratch/$USER, /work/$USER, /data/$USER, /lustre/.../users/$USER
export WORKSPACE=/path/to/large/storage/$USER
```
Contributor
can we avoid exporting this and instead leave it as a local bash variable?
Contributor
Author
removed export
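The distinction behind this request can be sketched as follows: a plain assignment stays in the current shell, while `export` copies the variable into the environment of every child process. This is a minimal illustration, not the tutorial's actual script. The path is the placeholder from the snippet under review, and the `child_sees_*` names are hypothetical.

```bash
# Start clean in case WORKSPACE is already exported in this environment.
unset WORKSPACE

# Plain assignment: visible in this shell only, not in child processes.
WORKSPACE=/path/to/large/storage/$USER
child_sees_local=$(sh -c 'echo "${WORKSPACE:-unset}"')

# Prefix assignment: passes the value to this one child without exporting it.
child_sees_prefix=$(WORKSPACE="$WORKSPACE" sh -c 'echo "${WORKSPACE:-unset}"')

# export: every child process started from here on inherits the variable.
export WORKSPACE
child_sees_exported=$(sh -c 'echo "${WORKSPACE:-unset}"')

echo "plain assignment:  $child_sees_local"
echo "prefix assignment: $child_sees_prefix"
echo "after export:      $child_sees_exported"
```

Keeping the variable local avoids leaking it into unrelated child processes. Anything that genuinely needs the value can receive it through a prefix assignment or an explicit argument.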
Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>
Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>
Force-pushed from b82cd51 to 600316a
Contributor
Could we move the tutorial to a new toctree section?
@lbliii can you help offer guidance on tutorial header formatting (goals, duration, pre-reqs, etc.)?
jbaczek
reviewed (6 times)
Feb 16, 2026
Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>
…rocessing, safetensors verification Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>
Force-pushed from 95904d1 to 51f7037
Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>
Force-pushed from f6db5a3 to 15e7293
…0b-tutorial Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>
bxyu-nvidia
reviewed
Feb 17, 2026
Contributor
re: Move just the new tutorial doc from `docs/training-tutorials/nemo-rl-grpo/nemotron-3-nano-30b-multi-node.md` to `docs/model-recipes/nemotron-3-nano.md`.
For links, we can keep the Workplace Assistant doc holistically as a prerequisite; just frame it as the entire tutorial.
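A relocation like this could be done with `git mv`, which moves the file and stages the change in one step. The sketch below runs in a throwaway repository so that it is self-contained. The scratch directory, placeholder file content, and demo commit identity are all illustrative assumptions; only the two paths come from the comment above.

```bash
# Scratch repository standing in for the real checkout (illustrative only).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

old_path=docs/training-tutorials/nemo-rl-grpo/nemotron-3-nano-30b-multi-node.md
new_path=docs/model-recipes/nemotron-3-nano.md

# Placeholder file so there is something tracked to move.
mkdir -p "$(dirname "$old_path")"
echo "# placeholder tutorial" > "$old_path"
git add "$old_path"
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add tutorial"

# The actual move: create the target directory, then let git relocate the file.
mkdir -p "$(dirname "$new_path")"
git mv "$old_path" "$new_path"

# Show what is staged after the move.
git status --short
```

In the real repository only the `mkdir -p` and `git mv` lines would apply, followed by updating any links and toctree entries that still point at the old path.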
bxyu-nvidia
reviewed
Feb 17, 2026
…d GitHub links Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>
…0b-tutorial Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>
Force-pushed from feb7332 to 8e4b717
…tainer.sh) to avoid the subshell export issue Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>
Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>
<img width="2560" height="1368" alt="image" src="https://github.com/user-attachments/assets/e849b65e-35f5-48f8-a180-62cf4fae0e8d" /> Signed-off-by: Brian Yu <bxyu@nvidia.com>
bxyu-nvidia
approved these changes
Feb 19, 2026
fsiino-nvidia
pushed a commit
that referenced
this pull request
Feb 21, 2026
## Summary

Adds a new tutorial for training Nemotron 3 Nano 30B on 32 nodes with GRPO, building on the existing multi-node training guide.

## Changes

**New tutorial**: `nemotron-3-nano-30b-multi-node.md` for 32-node (256 GPU) training
**Updated**: `multi-node-training.md` to link to the new advanced tutorial

## Testing

Tested with:

- 2-node jobs (16 GPUs)
- 32-node jobs (256 GPUs)
- Both completed successfully with proper Ray cluster formation

## Documentation Flow

Single Node Training -> Multi-Node Training -> **Nemotron 3 Nano 30B** (new) -> Custom Environment

Closes #389

---------

Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Co-authored-by: bxyu-nvidia <bxyu@nvidia.com>
fsiino-nvidia
pushed a commit
that referenced
this pull request
Feb 21, 2026
fsiino-nvidia
pushed a commit
that referenced
this pull request
Feb 21, 2026
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
abubakaria56
pushed a commit
to abubakaria56/Gym
that referenced
this pull request
Mar 2, 2026
abubakaria56
pushed a commit
to abubakaria56/Gym
that referenced
this pull request
Mar 2, 2026
Summary
Adds a new tutorial for training Nemotron 3 Nano 30B on 32 nodes with GRPO, building on the existing multi-node training guide.
Changes
New tutorial: nemotron-3-nano-30b-multi-node.md for 32-node (256 GPU) training
Updated: multi-node-training.md to link to the new advanced tutorial
Testing
Tested with:
- 2-node jobs (16 GPUs)
- 32-node jobs (256 GPUs)
- Both completed successfully with proper Ray cluster formation
Documentation Flow
Single Node Training -> Multi-Node Training -> Nemotron 3 Nano 30B (new) -> Custom Environment
Closes #389
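The node and GPU counts quoted in this PR (2 nodes with 16 GPUs, 32 nodes with 256 GPUs) are consistent with 8 GPUs per node. That per-node figure is inferred from the quoted totals and is not stated in the PR. A small sketch of the arithmetic, with hypothetical variable names:

```bash
# Assumption: 8 GPUs per node, inferred from 256 GPUs / 32 nodes.
gpus_per_node=8

small_job_gpus=$((2 * gpus_per_node))    # 2-node test job
large_job_gpus=$((32 * gpus_per_node))   # 32-node training job

echo "2 nodes  -> $small_job_gpus GPUs"
echo "32 nodes -> $large_job_gpus GPUs"
```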