Add Nemotron 3 Nano 30B multi-node training tutorial by srogawski-nvidia · Pull Request #699 · NVIDIA-NeMo/Gym

srogawski-nvidia · 2026-02-15T00:03:16Z

Summary

Adds a new tutorial for training Nemotron 3 Nano 30B on 32 nodes with GRPO, building on the existing multi-node training guide.

Changes

New tutorial: nemotron-3-nano-30b-multi-node.md for 32-node (256 GPU) training
Updated: multi-node-training.md to link to the new advanced tutorial

Testing

Tested with:

2-node jobs (16 GPUs)
32-node jobs (256 GPUs)
Both completed successfully with proper Ray cluster formation
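
The two tested scales imply 8 GPUs per node (16 / 2 and 256 / 32). A minimal sanity check of that arithmetic, where `gpus_per_node` is an assumption inferred from the numbers above rather than a value stated in this PR:

```bash
# Assumption: 8 GPUs per node, inferred from 16 GPUs on 2 nodes.
gpus_per_node=8

# Print the total GPU count for each tested node count.
for nodes in 2 32; do
  echo "$nodes nodes x $gpus_per_node GPUs/node = $((nodes * gpus_per_node)) GPUs"
done
```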

Documentation Flow

Single Node Training -> Multi-Node Training -> Nemotron 3 Nano 30B (new) -> Custom Environment

Closes #389

Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>

copy-pr-bot · 2026-02-15T00:03:20Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

bxyu-nvidia

this looks amazing, thank you so much! one minor request

bxyu-nvidia · 2026-02-15T01:22:42Z

+```bash
+# Set workspace directory (adjust to your cluster's large storage)
+# Examples: /scratch/$USER, /work/$USER, /data/$USER, /lustre/.../users/$USER
+export WORKSPACE=/path/to/large/storage/$USER


can we avoid exporting this and instead leave it as a local bash variable?

removed export
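
For readers following this thread, a small sketch of what the request changes in practice: an unexported shell variable stays in the current shell, while child processes only see it if it is exported or passed explicitly. The path below is a hypothetical placeholder, not the tutorial's value.

```bash
# Start from a clean state so a previously exported WORKSPACE does not interfere.
unset WORKSPACE

# Plain (unexported) shell variable; the path is a hypothetical placeholder.
WORKSPACE=/tmp/example-workspace

# A child process does not inherit an unexported variable.
plain_result=$(bash -c 'echo "${WORKSPACE:-unset}"')

# A per-command assignment passes the value to one child only,
# without adding it to the environment of later commands.
scoped_result=$(WORKSPACE="$WORKSPACE" bash -c 'echo "${WORKSPACE:-unset}"')

echo "child without export:          $plain_result"
echo "child with per-command value:  $scoped_result"
```

The trade-off is that any script launched as a separate process needs the value passed in or defined on its own.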

cwing-nvidia · 2026-02-16T02:59:42Z

Could we move the tutorial to a new toctree section Model Recipes? I think that will improve discoverability

@lbliii can you help offer guidance on tutorial header formatting (goals, duration, pre-reqs etc)

bxyu-nvidia · 2026-02-17T17:29:35Z

re: Move just the new tutorial doc location from docs/training-tutorials/nemo-rl-grpo/nemotron-3-nano-30b-multi-node.md -> docs/model-recipes/nemotron-3-nano.md

For links, we can keep the Workplace Assistant doc holistically as a prerequisite, just frame it as the entire tutorial

Add Nemotron 3 Nano 30B multi-node training tutorial

80fcb90

Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>

srogawski-nvidia requested a review from bxyu-nvidia February 15, 2026 00:09

bxyu-nvidia requested changes Feb 15, 2026

Use local variable instead of export for WORKSPACE

df90f1f

Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>

srogawski-nvidia requested a review from bxyu-nvidia February 15, 2026 23:32

adds link to SECTION_JOB-STATE-CODES

600316a

Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>

srogawski-nvidia force-pushed the add-nemotron-3-nano-30b-tutorial branch from b82cd51 to 600316a Compare February 16, 2026 00:38

cwing-nvidia self-requested a review February 16, 2026 02:56

cwing-nvidia reviewed Feb 16, 2026

Comment thread docs/training-tutorials/nemo-rl-grpo/nemotron-3-nano-30b-multi-node.md Outdated

jbaczek reviewed Feb 16, 2026

Comment thread docs/training-tutorials/nemo-rl-grpo/nemotron-3-nano-30b-multi-node.md Outdated

jbaczek reviewed Feb 16, 2026

Comment thread docs/training-tutorials/nemo-rl-grpo/nemotron-3-nano-30b-multi-node.md Outdated

jbaczek reviewed Feb 16, 2026

Comment thread docs/model-recipes/nemotron-3-nano.md

jbaczek reviewed Feb 16, 2026

Comment thread docs/model-recipes/nemotron-3-nano.md

jbaczek reviewed Feb 16, 2026

Comment thread docs/training-tutorials/nemo-rl-grpo/nemotron-3-nano-30b-multi-node.md Outdated

jbaczek reviewed Feb 16, 2026

Comment thread docs/training-tutorials/nemo-rl-grpo/nemotron-3-nano-30b-multi-node.md Outdated

srogawski-nvidia added 2 commits February 16, 2026 20:31

Specify exact container version in prerequisites

9cfeceb

Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>

Address PR feedback: container version, sbatch params, compute node p…

51f7037

…rocessing, safetensors verification Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>

srogawski-nvidia force-pushed the add-nemotron-3-nano-30b-tutorial branch from 95904d1 to 51f7037 Compare February 17, 2026 05:25

srogawski-nvidia added 2 commits February 16, 2026 21:26

Merge remote-tracking branch 'origin/main' into add-nemotron-3-nano-3…

ec2032e

…0b-tutorial

Make 2-node default config, add instructions for scaling to 32 nodes

15e7293

Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>

srogawski-nvidia force-pushed the add-nemotron-3-nano-30b-tutorial branch from f6db5a3 to 15e7293 Compare February 17, 2026 16:47

Merge remote-tracking branch 'origin/main' into add-nemotron-3-nano-3…

4917cfc

…0b-tutorial Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>

bxyu-nvidia reviewed Feb 17, 2026

Comment thread docs/training-tutorials/nemo-rl-grpo/nemotron-3-nano-30b-multi-node.md Outdated

bxyu-nvidia reviewed Feb 17, 2026

Comment thread docs/training-tutorials/nemo-rl-grpo/nemotron-3-nano-30b-multi-node.md Outdated

srogawski-nvidia added 2 commits February 17, 2026 15:58

Relocate Nemotron 3 Nano 30B to model-recipes, update prerequisite an…

966f011

…d GitHub links Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>

Merge remote-tracking branch 'origin/main' into add-nemotron-3-nano-3…

8e4b717

…0b-tutorial Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>

srogawski-nvidia force-pushed the add-nemotron-3-nano-30b-tutorial branch from feb7332 to 8e4b717 Compare February 18, 2026 00:00

srogawski-nvidia added 2 commits February 17, 2026 16:13

Fixed - WORKSPACE is now defined inside prepare_data.sh (and pull_con…

02811aa

…tainer.sh) to avoid the subshell export issue Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>
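
A minimal sketch of the pattern this commit describes, assuming the issue was that a script run as a separate process cannot see an unexported variable from the calling shell. The script body and path here are hypothetical stand-ins, not the actual contents of `prepare_data.sh`:

```bash
# Write a stand-in script that defines WORKSPACE itself (hypothetical contents).
script=$(mktemp)
cat > "$script" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
# Defined inside the script, so it does not depend on the caller's environment.
WORKSPACE="/tmp/example-workspace/${USER:-user}"
echo "using workspace: $WORKSPACE"
EOF

# The caller has no WORKSPACE at all, exported or otherwise.
unset WORKSPACE

# The script still resolves its own value when run as a child process.
script_output=$(bash "$script")
rm -f "$script"
echo "$script_output"
```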

Added comment to export HF_TOKEN

e432f55

Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>

srogawski-nvidia requested review from bxyu-nvidia, cwing-nvidia and jbaczek February 18, 2026 00:30

srogawski-nvidia and others added 2 commits February 18, 2026 19:28

Merge branch 'main' into add-nemotron-3-nano-30b-tutorial

b3d6541

docs: Add index page; populate in sidebar navigation (#728)

61a20a2

<img width="2560" height="1368" alt="image" src="https://github.com/user-attachments/assets/e849b65e-35f5-48f8-a180-62cf4fae0e8d" /> Signed-off-by: Brian Yu <bxyu@nvidia.com>

bxyu-nvidia approved these changes Feb 19, 2026

bxyu-nvidia merged commit 5a5b6e1 into main Feb 19, 2026
5 checks passed

bxyu-nvidia deleted the add-nemotron-3-nano-30b-tutorial branch February 19, 2026 04:23

bxyu-nvidia merged 14 commits into main from add-nemotron-3-nano-30b-tutorial
