
@scsudhakaran

No description provided.

@scsudhakaran requested a review from ashbhandare on May 22, 2025 08:51

@ashbhandare left a comment

LGTM

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
@scsudhakaran force-pushed the scsudhakaran/nemotron-h branch from 7e75d8a to 3c4f81d on May 28, 2025 18:39
@ashbhandare merged commit e0e7da0 into llmb-nemo-r2.3.0 on May 28, 2025
32 checks passed
@ashbhandare deleted the scsudhakaran/nemotron-h branch on May 28, 2025 20:45
nv-mollys pushed a commit that referenced this pull request Jul 8, 2025
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
nv-mollys added a commit that referenced this pull request Sep 3, 2025
* Set attention backend to "auto" for Nemotron-H (#14042)

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

* Adding TFLOPS per GPU Support for Finetuning (#14048)

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Enable optimizations for Nemotron-H (#13915)

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

* Disable checkpointing for Nemotron-H (#14001)

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

* Cherry pick ea4b47f (#13896)

* perf scripts updates (#13456)

* gb200 recommended cfgs csv fix

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* 495b h100 fix

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* gb200 79b bf16 20 layers recompute

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* csv format fix

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* csv format fix

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* 70b, 340b no fsdp

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* dsv3 perf mode

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>

* dsv3 perf mode peft

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>

* dsv3 callback

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>

* dsv3 callback

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* cudagraphs

Signed-off-by: Malay Nagda <malayn@nvidia.com>

---------

Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Signed-off-by: malay-nagda <malayn@nvidia.com>
Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com>

* import missing callbacks in deepseek recipe

---------

Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Signed-off-by: malay-nagda <malayn@nvidia.com>
Co-authored-by: malay-nagda <malayn@nvidia.com>
Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com>
Co-authored-by: mollys <mollys@mollys.nvidia.com>

* Onboard LLAMA4 Maverick Finetuning(SFT) with SQUAD Dataset Download Fix (#13926)

* Onboard LLAMA4 Maverick Finetuning(SFT) with SQUAD Dataset Download Fix

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Tweaks for llama4_e128

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Adding flags for skipping the separate SLURM jobs

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Arg parse changes and tweaks to remove squad dataset check

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Reverting the args for separate SLURM jobs as there is a dependency (run.Partial) with the finetune job

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Removing NullTokenizer due to compatibility issues

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Separate args to have control over the 3 SLURM jobs

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Enabling TokenDropCallback and tp_comm_overlap

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Changes to introduce flags for enabling/disabling the 3 SLURM jobs

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Changing the exp_name based on the SLURM job being run

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Argparse Changes

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Fix for standalone checkpoint and dataload jobs

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Removing NUMA Factor error for dataset and checkpoint download job

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Fix for CUDA Graph error in this version

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Changes to peft_scheme

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Changes to set peft_scheme to None

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Replacing the file name from finetune_ to sft_

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Updates to exp_name format

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Reverting defaults for --finetuning arg to also include lora

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Tweak comment(s) in the finetune llama4 e128 script

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Reverting the recommended config order change for b200

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com>

---------

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com>
Co-authored-by: rhmukundan <rhmukundan@users.noreply.github.com>

* Add profiling changes (#13484)

* add profiling changes

Signed-off-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>

* More model changes

Signed-off-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>

---------

Signed-off-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>
Co-authored-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: ashbhandare <ashbhandare@users.noreply.github.com>

* Port nemotron 25.04 patch to r3.2.0 based llmb-nemo (#13533)

* port nemotron patch to r3.2.0 based llmb-nemo

Signed-off-by: Barys Dubauski <bdubauski@nvdia.com>

* update template for experiment names

Signed-off-by: Barys Dubauski <bdubauski@nvdia.com>

* review based updates

Signed-off-by: Barys Dubauski <bdubauski@nvdia.com>

---------

Signed-off-by: Barys Dubauski <bdubauski@nvdia.com>
Co-authored-by: Barys Dubauski <bdubauski@nvdia.com>

* Port run-ai patch to llmb-nemo branch (#13573)

* Port run-ai patch to llmb-nemo branch

Signed-off-by: Barys Dubauski <bdubauski@nvdia.com>

* Apply isort and black reformatting

Signed-off-by: bdubauski <bdubauski@users.noreply.github.com>

---------

Signed-off-by: Barys Dubauski <bdubauski@nvdia.com>
Signed-off-by: bdubauski <bdubauski@users.noreply.github.com>
Co-authored-by: Barys Dubauski <bdubauski@nvdia.com>
Co-authored-by: bdubauski <bdubauski@users.noreply.github.com>

* Add grok recipe (#13586)

* Add grok recipe

Signed-off-by: mollys <mollys@mollys.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: nv-mollys <nv-mollys@users.noreply.github.com>
Signed-off-by: mollys <mollys@mollys.nvidia.com>

* Add copyright header

Signed-off-by: mollys <mollys@mollys.nvidia.com>

---------

Signed-off-by: mollys <mollys@mollys.nvidia.com>
Signed-off-by: nv-mollys <nv-mollys@users.noreply.github.com>
Co-authored-by: mollys <mollys@mollys.nvidia.com>
Co-authored-by: nv-mollys <nv-mollys@users.noreply.github.com>

* transformers_offline=0 and profile changes to llama3.1 405b (#13655)

* transformers_offline=0 and profile changes to llama3.1 405b

Signed-off-by: Sebastian Alberdi <salberdi@salberdi-mlt.client.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

* nccl added

Signed-off-by: Sebastian Alberdi <salberdi@salberdi-mlt.client.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

---------

Signed-off-by: Sebastian Alberdi <salberdi@salberdi-mlt.client.nvidia.com>
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: Sebastian Alberdi <salberdi@login-preos02.a51.clusters.nvidia.com>
Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>

* Add perf recipe script for Nemotron-H-56B (#13691)

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

* Pretraining Deepseek changes for LLMB (#13752)

* working changes

Signed-off-by: ashbhandare <abhandare@nvidia.com>

* cleanup

Signed-off-by: ashbhandare <abhandare@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>
Signed-off-by: ashbhandare <abhandare@nvidia.com>

* make profiling steps overridable

Signed-off-by: ashbhandare <abhandare@nvidia.com>

* add nccl trace ability, cleanup

Signed-off-by: ashbhandare <abhandare@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>
Signed-off-by: ashbhandare <abhandare@nvidia.com>

---------

Signed-off-by: ashbhandare <abhandare@nvidia.com>
Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>
Co-authored-by: Aishwarya Bhandare <abhandare@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: ashbhandare <ashbhandare@users.noreply.github.com>

* Adding FP8 Default Configs for LLAMA4 Maverick (#13698)

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Changing the tokenizer from Scout to Maverick in the pretrain LLAMA4 LLM Recipe (#13664)

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Tweaking LLama4 Maverick PreTrain file to adapt to the user configs parameter format (#13690)

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Grok Nvbug 5311566 (#13765)

* remove unnecessary nemo root check

* remove comment and unused packages

---------

Co-authored-by: mollys <mollys@mollys.nvidia.com>

* Grok nccl trace fix (#13769)

* remove unnecessary nemo root check

* remove comment and unused packages

* transformers online

* fix env vars

* setting transformers offline here doesn't work

---------

Co-authored-by: mollys <mollys@mollys.nvidia.com>

* Fix for config params in pretrain llama4 e128 (#13764)

* Fix for config params in pretrain llama4 e128

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Ignoring unrelated configs

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Cleanup of configs

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Adding all the params in get_user_configs func

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

---------

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Nsys Tweaks to llama4 pretrain  (#13778)

* Removing hardcoding of nsys profiling ranges

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Adding NCCL Trace support for pretrain recipe (llama4)

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com>

---------

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com>
Co-authored-by: rhmukundan <rhmukundan@users.noreply.github.com>

* Disable checkpointing for Nemotron-H (#13786)

* Disable checkpointing for Nemotron-H

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

* Nemotron-H NCCL trace support

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com>

---------

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
Co-authored-by: scsudhakaran <scsudhakaran@users.noreply.github.com>

* Llmb nemo r2.3.0 (#13806)

* set NCCL_NET_GDR_LEVEL=PHB for deepseekv3, grok1_314b, llama31_405b, llama4_e128, nemotron4_15b+340b, nemotronh_56b

Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

---------

Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

* add all environment variables to container environment (#13808)

Co-authored-by: mollys <mollys@mollys.nvidia.com>

* fix numactl (#13809)

Co-authored-by: mollys <mollys@mollys.nvidia.com>

* Llmb nemo r2.3.0 (#13807)

* set NCCL_NET_GDR_LEVEL=PHB for deepseekv3, grok1_314b, llama31_405b, llama4_e128, nemotron4_15b+340b, nemotronh_56b

Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

* made experiment naming match standard

Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

* standardized exp_name for relevant workloads

Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

---------

Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

* fixing QA checkpoint bug for nemotron4 (#13843)

* fixing QA checkpoint bug for nemotron4

* Apply isort and black reformatting

Signed-off-by: sshiddib <sshiddib@users.noreply.github.com>

* arg name change

* Apply isort and black reformatting

Signed-off-by: sshiddib <sshiddib@users.noreply.github.com>

---------

Signed-off-by: sshiddib <sshiddib@users.noreply.github.com>
Co-authored-by: Sharada Shiddibhavi <sshiddibhavi@cw-dfw-cs-001-vscode-02.cm.cluster>
Co-authored-by: sshiddib <sshiddib@users.noreply.github.com>

* Add gpu metrics option (#13882)

* gpu metrics option

Signed-off-by: ashbhandare <abhandare@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>

* specify nemo run commit

Signed-off-by: ashbhandare <abhandare@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>

* Fix linting error

Signed-off-by: ashbhandare <abhandare@nvidia.com>

---------

Signed-off-by: ashbhandare <abhandare@nvidia.com>
Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: ashbhandare <ashbhandare@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>

* LLAMA4 Maverick SFT Recipe + SQUAD Dataset Download Error Fix

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Revert "LLAMA4 Maverick SFT Recipe + SQUAD Dataset Download Error Fix"

This reverts commit 755fd36.

* fix nemo/collections/llm/recipes/__init__.py

* fix nemo/collections/llm/recipes/deepseek_v3.py

* new line

* fix nemo/collections/llm/recipes/llama4_e128.py

* fix scripts/performance/llm/finetune_llama4_e128.py

* small updates for grok

* modified: scripts/performance/llm/pretrain_grok1_314b.py and scripts/performance/llm/pretrain_nemotron4_340b.py

* manually add util changes to helpers.py and executors.py

* Fix in Nemotron-H script (#14251)

* Fix in Nemotron-H script

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>

* Fix in Nemotron-H perf script

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

---------

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>

* updated with some things from NeMo main (double_buffer) (#14305)

* updated with some things from NeMo main (double_buffer)

Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>

---------

Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>

* Took out cuDNN lines because of a regression with the cuDNN normalization kernel (#14360)

* added conditional cudnn to align with nemo main (#14324)

* added conditional cudnn to align with nemo main

Signed-off-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster>

* fixed num optimizer instances bug

Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>

---------

Signed-off-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster>
Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster>
Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>

* Adds pyxis container writable and no mount home flags (#14386)

* Add pyxis flags for writable and no-mount home.

Signed-off-by: Alex Filby <afilby@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: sudostock <sudostock@users.noreply.github.com>

---------

Signed-off-by: Alex Filby <afilby@nvidia.com>
Signed-off-by: sudostock <sudostock@users.noreply.github.com>
Co-authored-by: sudostock <sudostock@users.noreply.github.com>

* Update DeepSeek-V3 perf scripts (#14377)

* Fix callbacks in DSV3 script (#14350)

Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

* Changes to grok to alleviate error: TypeError: '>' not supported betw… (#14326)

* Changes to grok to alleviate error: TypeError: '>' not supported between instances of 'str' and 'int'

* Apply isort and black reformatting

Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>

* Made the changes so that default values are not hard-coded; users can change them through the CLI

* Apply isort and black reformatting

Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>

* made suggested changes. Verified successful.

* Apply isort and black reformatting

Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>

* Made suggested change.

---------

Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>
Co-authored-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>

* Make VBoost activation conditional  (#14453)

* Refactor performance scripts to use build_perf_env_plugin function

* Replaced direct instantiation of PerfEnvPlugin with build_perf_env_plugin in multiple LLM finetuning and pretraining scripts for consistency and maintainability.
* Added build_perf_env_plugin function to helpers.py to streamline performance environment setup based on GPU and pipeline parallelism settings.

This change enhances code readability and reduces redundancy across scripts.

* control vboost enablement via cli

* Update finetune_llama4_e128.py to import build_perf_env_plugin function

* Added the build_perf_env_plugin import to enhance performance environment setup consistency across scripts.

This change aligns with recent refactoring efforts to streamline performance script management.

---------

Co-authored-by: Barys Dubauski <bdubauski@nvdia.com>

* turned off tp overlap comms for >128 gpus on gb200 so jobs are functi… (#14460)

* turned off tp overlap comms for >128 gpus on gb200 so jobs are functional

Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

---------

Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com>
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

* Remove NCCL tracing option and clean up imports in performance scripts (#14467)

* Remove NCCL tracing option and clean up imports in performance scripts. Updated multiple LLM finetuning and pretraining scripts to eliminate the use of PerfEnvPlugin, enhancing consistency and maintainability.

* Apply isort and black reformatting

Signed-off-by: bdubauski <bdubauski@users.noreply.github.com>

---------

Signed-off-by: bdubauski <bdubauski@users.noreply.github.com>
Co-authored-by: Barys Dubauski <bdubauski@nvdia.com>
Co-authored-by: bdubauski <bdubauski@users.noreply.github.com>

* Disable tp_comm_overlap for 512 gpus on GB200 (#14474)

...to fix functionality issue

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

* Workaround for MXFP8 functionality issue (#14426)

* Workaround for MXFP8 functionality issue

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com>

---------

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
Co-authored-by: scsudhakaran <scsudhakaran@users.noreply.github.com>

* previous commit was buggy (#14477)

* previous was buggy

Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

---------

Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com>
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

* checkpoint save/load functionality with HF token (#14538)

* checkpoint save/load functionality with HF token

* Apply isort and black reformatting

Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>

* using use_hf_tokenizer

* reverting back to hf_token

---------

Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>
Co-authored-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>

* added hf import for 15b/340b pretrain (#14565)

* Llmb nemo r2.4.0 (#14607)

* Update mixed_precision.py

Signed-off-by: guyueh1 <140554423+guyueh1@users.noreply.github.com>

* Fix reuse_grad_buf_for_mxfp8_param_ag for mxfp8

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

---------

Signed-off-by: guyueh1 <140554423+guyueh1@users.noreply.github.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Co-authored-by: guyueh1 <140554423+guyueh1@users.noreply.github.com>
Co-authored-by: Guyue Huang <guyueh@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: nv-mollys <nv-mollys@users.noreply.github.com>

---------

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Signed-off-by: malay-nagda <malayn@nvidia.com>
Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com>
Signed-off-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>
Signed-off-by: Barys Dubauski <bdubauski@nvdia.com>
Signed-off-by: bdubauski <bdubauski@users.noreply.github.com>
Signed-off-by: mollys <mollys@mollys.nvidia.com>
Signed-off-by: nv-mollys <nv-mollys@users.noreply.github.com>
Signed-off-by: Sebastian Alberdi <salberdi@salberdi-mlt.client.nvidia.com>
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Signed-off-by: ashbhandare <abhandare@nvidia.com>
Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Signed-off-by: sshiddib <sshiddib@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Signed-off-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster>
Signed-off-by: Alex Filby <afilby@nvidia.com>
Signed-off-by: sudostock <sudostock@users.noreply.github.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>
Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com>
Signed-off-by: guyueh1 <140554423+guyueh1@users.noreply.github.com>
Signed-off-by: nv-mollys <149841089+nv-mollys@users.noreply.github.com>
Co-authored-by: scsudhakaran <scsudhakaran@nvidia.com>
Co-authored-by: rhmukundan <102543536+rhmukundan@users.noreply.github.com>
Co-authored-by: malay-nagda <malayn@nvidia.com>
Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com>
Co-authored-by: mollys <mollys@mollys.nvidia.com>
Co-authored-by: rhmukundan <rhmukundan@users.noreply.github.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: ashbhandare <ashbhandare@users.noreply.github.com>
Co-authored-by: bdubauski <80418713+bdubauski@users.noreply.github.com>
Co-authored-by: Barys Dubauski <bdubauski@nvdia.com>
Co-authored-by: bdubauski <bdubauski@users.noreply.github.com>
Co-authored-by: nv-mollys <nv-mollys@users.noreply.github.com>
Co-authored-by: salberdi-nvidia <salberdi@nvidia.com>
Co-authored-by: Sebastian Alberdi <salberdi@login-preos02.a51.clusters.nvidia.com>
Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: ashbhandare <abhandare@nvidia.com>
Co-authored-by: Aishwarya Bhandare <abhandare@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
Co-authored-by: Sharada Shiddibhavi <sshiddibhavi@nvidia.com>
Co-authored-by: Sharada Shiddibhavi <sshiddibhavi@cw-dfw-cs-001-vscode-02.cm.cluster>
Co-authored-by: sshiddib <sshiddib@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
Co-authored-by: rsalagame-nvidia <rsalagame@nvidia.com>
Co-authored-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster>
Co-authored-by: Alex Filby <alexfilby@gmail.com>
Co-authored-by: sudostock <sudostock@users.noreply.github.com>
Co-authored-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>
Co-authored-by: guyueh1 <140554423+guyueh1@users.noreply.github.com>
Co-authored-by: Guyue Huang <guyueh@nvidia.com>
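Several of the squashed commits above gate performance settings on the workload, GPU type, and job size: `NCCL_NET_GDR_LEVEL=PHB` for a named set of workloads (#13806), TP communication overlap turned off for runs above 128 GPUs on GB200 (#14460, #14474), and VBoost enablement moved to a CLI switch (#14453). The following is a minimal, hypothetical Python sketch of that kind of gating, not the code in these commits. The helper names `build_perf_settings` and `parse_args`, the `--enable_vboost` flag spelling, the workload key strings, and VBoost defaulting to off are all assumptions.

```python
import argparse
from typing import Dict, List, Optional

# Workloads named in the NCCL_NET_GDR_LEVEL=PHB commit. The exact key strings
# are assumptions ("nemotron4_15b+340b" is split into two keys here).
GDR_PHB_WORKLOADS = {
    "deepseekv3",
    "grok1_314b",
    "llama31_405b",
    "llama4_e128",
    "nemotron4_15b",
    "nemotron4_340b",
    "nemotronh_56b",
}

# Threshold taken from the commit title ">128 gpus on gb200" (hypothetical constant).
GB200_TP_OVERLAP_MAX_GPUS = 128


def build_perf_settings(workload: str, gpu: str, num_gpus: int, enable_vboost: bool) -> Dict[str, object]:
    """Hypothetical helper: collect env vars and toggles for one run."""
    env: Dict[str, str] = {}
    if workload in GDR_PHB_WORKLOADS:
        # PHB: NCCL uses GPUDirect RDMA when the GPU and NIC are on the same NUMA node.
        env["NCCL_NET_GDR_LEVEL"] = "PHB"

    # Disable TP communication overlap for GB200 jobs larger than the threshold.
    tp_comm_overlap = not (gpu.lower() == "gb200" and num_gpus > GB200_TP_OVERLAP_MAX_GPUS)

    return {"env": env, "tp_comm_overlap": tp_comm_overlap, "enable_vboost": enable_vboost}


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical launcher flags; VBoost is opt-in in this sketch."""
    parser = argparse.ArgumentParser(description="Hypothetical perf launcher flags")
    parser.add_argument("--workload", default="nemotronh_56b")
    parser.add_argument("--gpu", default="h100")
    parser.add_argument("--num_gpus", type=int, default=8)
    # The real default for VBoost is not stated in the commit log.
    parser.add_argument("--enable_vboost", action="store_true")
    return parser.parse_args(argv)
```

For example, `build_perf_settings("nemotronh_56b", "gb200", 512, False)` sets `NCCL_NET_GDR_LEVEL=PHB` and turns TP overlap off, while a 128-GPU GB200 run keeps overlap on. Keeping the decision in one helper mirrors the intent of the `build_perf_env_plugin` refactor described in #14453, which replaced per-script setup with a shared function.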
nv-mollys added a commit that referenced this pull request Sep 3, 2025
* Set attention backend to "auto" for Nemotron-H (#14042)

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

* Adding TFLOPS per GPU Support for Finetuning (#14048)

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Enable optimizations for Nemotron-H (#13915)

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

* Disable checkpointing for Nemotron-H (#14001)

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

* Cherry pick ea4b47f (#13896)

* perf scripts updates (#13456)

* gb200 recommended cfgs csv fix

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* 495b h100 fix

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* gb200 79b bf16 20 layers recompute

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* csv format fix

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* csv format fix

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* 70b, 340b no fsdp

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* dsv3 perf mode

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>

* dsv3 perf mode peft

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>

* dsv3 callback

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>

* dsv3 callback

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* cudagraphs

Signed-off-by: Malay Nagda <malayn@nvidia.com>

---------

Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Signed-off-by: malay-nagda <malayn@nvidia.com>
Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com>

* import missing callbacks in deepseek recipe

---------

Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Signed-off-by: malay-nagda <malayn@nvidia.com>
Co-authored-by: malay-nagda <malayn@nvidia.com>
Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com>
Co-authored-by: mollys <mollys@mollys.nvidia.com>

* Onboard LLAMA4 Maverick Finetuning(SFT) with SQUAD Dataset Download Fix (#13926)

* Onboard LLAMA4 Maverick Finetuning(SFT) with SQUAD Dataset Download Fix

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Tweaks for llama4_e128

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Adding flags for skipping the separate SLURM jobs

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Arg parse changes and tweaks to remove squad dataset check

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Reverting the args for separate SLURM jobs as there is a dependency (run.Partial) with the finetune job

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Removing NullTokenizer due to compatibility

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Separate args to have control over the 3 SLURM jobs

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Enabling TokenDropCallback and tp_comm_overlap

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Changes to introduce flags for enabling/disabling the 3 SLURM jobs

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Changing the exp_name based on the SLURM job being run

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Argparse Changes

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Fix for standalone checkpoint and dataload jobs

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Removing NUMA Factor error for dataset and checkpoint download job

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Fix for CUDA Graph error in this version

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Changes to peft_scheme

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Changes to set peft_scheme to None

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Replacing the file name from finetune_ to sft_

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Updates to exp_name format

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Reverting defaults for --finetuning arg to also include lora

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Tweak comment(s) in the finetune llama4 e128 script

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Reverting the recommended config order change for b200

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com>

---------

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com>
Co-authored-by: rhmukundan <rhmukundan@users.noreply.github.com>

* Add profiling changes (#13484)

* add profiling changes

Signed-off-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>

* More model changes

Signed-off-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>

---------

Signed-off-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>
Co-authored-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: ashbhandare <ashbhandare@users.noreply.github.com>

* Port nemotron 25.04 patch to r3.2.0 based llmb-nemo (#13533)

* port nemotron patch to r3.2.0 based llmb-nemo

Signed-off-by: Barys Dubauski <bdubauski@nvdia.com>

* update template for experiment names

Signed-off-by: Barys Dubauski <bdubauski@nvdia.com>

* review based updates

Signed-off-by: Barys Dubauski <bdubauski@nvdia.com>

---------

Signed-off-by: Barys Dubauski <bdubauski@nvdia.com>
Co-authored-by: Barys Dubauski <bdubauski@nvdia.com>

* Port run-ai patch to llmb-nemo branch (#13573)

* Port run-ai patch to llmb-nemo branch

Signed-off-by: Barys Dubauski <bdubauski@nvdia.com>

* Apply isort and black reformatting

Signed-off-by: bdubauski <bdubauski@users.noreply.github.com>

---------

Signed-off-by: Barys Dubauski <bdubauski@nvdia.com>
Signed-off-by: bdubauski <bdubauski@users.noreply.github.com>
Co-authored-by: Barys Dubauski <bdubauski@nvdia.com>
Co-authored-by: bdubauski <bdubauski@users.noreply.github.com>

* Add grok recipe (#13586)

* Add grok recipe

Signed-off-by: mollys <mollys@mollys.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: nv-mollys <nv-mollys@users.noreply.github.com>
Signed-off-by: mollys <mollys@mollys.nvidia.com>

* Add copyright header

Signed-off-by: mollys <mollys@mollys.nvidia.com>

---------

Signed-off-by: mollys <mollys@mollys.nvidia.com>
Signed-off-by: nv-mollys <nv-mollys@users.noreply.github.com>
Co-authored-by: mollys <mollys@mollys.nvidia.com>
Co-authored-by: nv-mollys <nv-mollys@users.noreply.github.com>

* transformers_offline=0 and profile changes to llama3.1 405b (#13655)

* transformers_offline=0 and profile changes to llama3.1 405b

Signed-off-by: Sebastian Alberdi <salberdi@salberdi-mlt.client.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

* nccl added

Signed-off-by: Sebastian Alberdi <salberdi@salberdi-mlt.client.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

---------

Signed-off-by: Sebastian Alberdi <salberdi@salberdi-mlt.client.nvidia.com>
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: Sebastian Alberdi <salberdi@login-preos02.a51.clusters.nvidia.com>
Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>

* Add perf recipe script for Nemotron-H-56B (#13691)

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

* Pretraining Deepseek changes for LLMB (#13752)

* working changes

Signed-off-by: ashbhandare <abhandare@nvidia.com>

* cleanup

Signed-off-by: ashbhandare <abhandare@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>
Signed-off-by: ashbhandare <abhandare@nvidia.com>

* make profiling steps overridable

Signed-off-by: ashbhandare <abhandare@nvidia.com>

* add nccl trace ability, cleanup

Signed-off-by: ashbhandare <abhandare@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>
Signed-off-by: ashbhandare <abhandare@nvidia.com>

---------

Signed-off-by: ashbhandare <abhandare@nvidia.com>
Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>
Co-authored-by: Aishwarya Bhandare <abhandare@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: ashbhandare <ashbhandare@users.noreply.github.com>

* Adding FP8 Default Configs for LLAMA4 Maverick (#13698)

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Changing the tokenizer from Scout to Maverick in the pretrain LLAMA4 LLM Recipe (#13664)

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Tweaking LLAMA4 Maverick pretrain file to adapt to the user configs parameter format (#13690)

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Grok Nvbug 5311566 (#13765)

* remove unnecessary nemo root check

* remove comment and unused packages

---------

Co-authored-by: mollys <mollys@mollys.nvidia.com>

* Grok nccl trace fix (#13769)

* remove unnecessary nemo root check

* remove comment and unused packages

* transformers online

* fix env vars

* setting transformers offline here doesn't work

---------

Co-authored-by: mollys <mollys@mollys.nvidia.com>

* Fix for config params in pretrain llama4 e128 (#13764)

* Fix for config params in pretrain llama4 e128

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Ignoring unrelated configs

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Cleanup of configs

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Adding all the params in get_user_configs func

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

---------

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Nsys Tweaks to llama4 pretrain (#13778)

* Removing hardcoding of nsys profiling ranges

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Adding NCCL Trace support for pretrain recipe (llama4)

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com>

---------

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com>
Co-authored-by: rhmukundan <rhmukundan@users.noreply.github.com>

* Disable checkpointing for Nemotron-H (#13786)

* Disable checkpointing for Nemotron-H

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

* Nemotron-H NCCL trace support

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com>

---------

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
Co-authored-by: scsudhakaran <scsudhakaran@users.noreply.github.com>

* Llmb nemo r2.3.0 (#13806)

* set NCCL_NET_GDR_LEVEL=PHB for deepseekv3, grok1_314b, llama31_405b, llama4_e128, nemotron4_15b+340b, nemotronh_56b

Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

---------

Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

* add all environment variables to container environment (#13808)

Co-authored-by: mollys <mollys@mollys.nvidia.com>

* fix numactl (#13809)

Co-authored-by: mollys <mollys@mollys.nvidia.com>

* Llmb nemo r2.3.0 (#13807)

* set NCCL_NET_GDR_LEVEL=PHB for deepseekv3, grok1_314b, llama31_405b, llama4_e128, nemotron4_15b+340b, nemotronh_56b

Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

* made experiment naming match standard

Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

* standardized exp_name for relevant workloads

Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

---------

Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

* fixing QA checkpoint bug for nemotron4 (#13843)

* fixing QA checkpoint bug for nemotron4

* Apply isort and black reformatting

Signed-off-by: sshiddib <sshiddib@users.noreply.github.com>

* arg name change

* Apply isort and black reformatting

Signed-off-by: sshiddib <sshiddib@users.noreply.github.com>

---------

Signed-off-by: sshiddib <sshiddib@users.noreply.github.com>
Co-authored-by: Sharada Shiddibhavi <sshiddibhavi@cw-dfw-cs-001-vscode-02.cm.cluster>
Co-authored-by: sshiddib <sshiddib@users.noreply.github.com>

* Add gpu metrics option (#13882)

* gpu metrics option

Signed-off-by: ashbhandare <abhandare@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>

* specify nemo run commit

Signed-off-by: ashbhandare <abhandare@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>

* Fix linting error

Signed-off-by: ashbhandare <abhandare@nvidia.com>

---------

Signed-off-by: ashbhandare <abhandare@nvidia.com>
Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: ashbhandare <ashbhandare@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>

* LLAMA4 Maverick SFT Recipe + SQUAD Dataset Download Error Fix

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>

* Revert "LLAMA4 Maverick SFT Recipe + SQUAD Dataset Download Error Fix"

This reverts commit 755fd36.

* fix nemo/collections/llm/recipes/__init__.py

* fix nemo/collections/llm/recipes/deepseek_v3.py

* new line

* fix nemo/collections/llm/recipes/llama4_e128.py

* fix scripts/performance/llm/finetune_llama4_e128.py

* small updates for grok

* Modified scripts/performance/llm/pretrain_grok1_314b.py and scripts/performance/llm/pretrain_nemotron4_340b.py

* manually add util changes to helpers.py and executors.py

* Fix in Nemotron-H script (#14251)

* Fix in Nemotron-H script

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>

* Fix in Nemotron-H perf script

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

---------

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>

* updated with some things from NeMo main (double_buffer) (#14305)

* updated with some things from NeMo main (double_buffer)

Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>

---------

Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>

* Took out cuDNN lines because of a regression with the cuDNN normalization kernel (#14360)

* added conditional cudnn to align with nemo main (#14324)

* added conditional cudnn to align with nemo main

Signed-off-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster>

* fixed num optimizer instances bug

Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>

---------

Signed-off-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster>
Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster>
Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>

* Adds pyxis container writable and no mount home flags (#14386)

* Add pyxis flags for writable and no-mount home.

Signed-off-by: Alex Filby <afilby@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: sudostock <sudostock@users.noreply.github.com>

---------

Signed-off-by: Alex Filby <afilby@nvidia.com>
Signed-off-by: sudostock <sudostock@users.noreply.github.com>
Co-authored-by: sudostock <sudostock@users.noreply.github.com>

* Update DeepSeek-V3 perf scripts (#14377)

* Fix callbacks in DSV3 script (#14350)

Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

* Changes to grok to alleviate error: TypeError: '>' not supported betw… (#14326)

* Changes to grok to alleviate error: TypeError: '>' not supported between instances of 'str' and 'int'

* Apply isort and black reformatting

Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>

* Made changes so that default values are not hard-coded; users can change them through the CLI

* Apply isort and black reformatting

Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>

* Made suggested changes; verified successfully.

* Apply isort and black reformatting

Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>

* Made suggested change.

---------

Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>
Co-authored-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>

* Make VBoost activation conditional (#14453)

* Refactor performance scripts to use build_perf_env_plugin function

* Replaced direct instantiation of PerfEnvPlugin with build_perf_env_plugin in multiple LLM finetuning and pretraining scripts for consistency and maintainability.
* Added build_perf_env_plugin function to helpers.py to streamline performance environment setup based on GPU and pipeline parallelism settings.

This change enhances code readability and reduces redundancy across scripts.

* control vboost enablement via cli

* Update finetune_llama4_e128.py to import build_perf_env_plugin function

* Added the build_perf_env_plugin import to enhance performance environment setup consistency across scripts.

This change aligns with recent refactoring efforts to streamline performance script management.

---------

Co-authored-by: Barys Dubauski <bdubauski@nvdia.com>

* turned off tp overlap comms for >128 gpus on gb200 so jobs are functi… (#14460)

* turned off tp overlap comms for >128 gpus on gb200 so jobs are functional

Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

---------

Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com>
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

* Remove NCCL tracing option and clean up imports in performance scripts (#14467)

* Remove NCCL tracing option and clean up imports in performance scripts. Updated multiple LLM finetuning and pretraining scripts to eliminate the use of PerfEnvPlugin, enhancing consistency and maintainability.

* Apply isort and black reformatting

Signed-off-by: bdubauski <bdubauski@users.noreply.github.com>

---------

Signed-off-by: bdubauski <bdubauski@users.noreply.github.com>
Co-authored-by: Barys Dubauski <bdubauski@nvdia.com>
Co-authored-by: bdubauski <bdubauski@users.noreply.github.com>

* Disable tp_comm_overlap for 512 gpus on GB200 (#14474)

...to fix a functionality issue

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

* Workaround for MXFP8 functionality issue (#14426)

* Workaround for MXFP8 functionality issue

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com>

---------

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
Co-authored-by: scsudhakaran <scsudhakaran@users.noreply.github.com>

* previous commit was buggy (#14477)

* previous was buggy

Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

---------

Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com>
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>

* checkpoint save/load functionality with HF token (#14538)

* checkpoint save/load functionality with HF token

* Apply isort and black reformatting

Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>

* using use_hf_tokenizer

* reverting back to hf_token

---------

Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>
Co-authored-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>

* added hf import for 15b/340b pretrain (#14565)

* Llmb nemo r2.4.0 (#14607)

* Update mixed_precision.py

Signed-off-by: guyueh1 <140554423+guyueh1@users.noreply.github.com>

* Fix reuse_grad_buf_for_mxfp8_param_ag for mxfp8

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

---------

Signed-off-by: guyueh1 <140554423+guyueh1@users.noreply.github.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Co-authored-by: guyueh1 <140554423+guyueh1@users.noreply.github.com>
Co-authored-by: Guyue Huang <guyueh@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: nv-mollys <nv-mollys@users.noreply.github.com>

---------

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Signed-off-by: malay-nagda <malayn@nvidia.com>
Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com>
Signed-off-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>
Signed-off-by: Barys Dubauski <bdubauski@nvdia.com>
Signed-off-by: bdubauski <bdubauski@users.noreply.github.com>
Signed-off-by: mollys <mollys@mollys.nvidia.com>
Signed-off-by: nv-mollys <nv-mollys@users.noreply.github.com>
Signed-off-by: Sebastian Alberdi <salberdi@salberdi-mlt.client.nvidia.com>
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Signed-off-by: ashbhandare <abhandare@nvidia.com>
Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Signed-off-by: sshiddib <sshiddib@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Signed-off-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster>
Signed-off-by: Alex Filby <afilby@nvidia.com>
Signed-off-by: sudostock <sudostock@users.noreply.github.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>
Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com>
Signed-off-by: guyueh1 <140554423+guyueh1@users.noreply.github.com>
Signed-off-by: nv-mollys <149841089+nv-mollys@users.noreply.github.com>
Co-authored-by: scsudhakaran <scsudhakaran@nvidia.com>
Co-authored-by: rhmukundan <102543536+rhmukundan@users.noreply.github.com>
Co-authored-by: malay-nagda <malayn@nvidia.com>
Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com>
Co-authored-by: mollys <mollys@mollys.nvidia.com>
Co-authored-by: rhmukundan <rhmukundan@users.noreply.github.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: ashbhandare <ashbhandare@users.noreply.github.com>
Co-authored-by: bdubauski <80418713+bdubauski@users.noreply.github.com>
Co-authored-by: Barys Dubauski <bdubauski@nvdia.com>
Co-authored-by: bdubauski <bdubauski@users.noreply.github.com>
Co-authored-by: nv-mollys <nv-mollys@users.noreply.github.com>
Co-authored-by: salberdi-nvidia <salberdi@nvidia.com>
Co-authored-by: Sebastian Alberdi <salberdi@login-preos02.a51.clusters.nvidia.com>
Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: ashbhandare <abhandare@nvidia.com>
Co-authored-by: Aishwarya Bhandare <abhandare@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
Co-authored-by: Sharada Shiddibhavi <sshiddibhavi@nvidia.com>
Co-authored-by: Sharada Shiddibhavi <sshiddibhavi@cw-dfw-cs-001-vscode-02.cm.cluster>
Co-authored-by: sshiddib <sshiddib@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
Co-authored-by: rsalagame-nvidia <rsalagame@nvidia.com>
Co-authored-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster>
Co-authored-by: Alex Filby <alexfilby@gmail.com>
Co-authored-by: sudostock <sudostock@users.noreply.github.com>
Co-authored-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>
Co-authored-by: guyueh1 <140554423+guyueh1@users.noreply.github.com>
Co-authored-by: Guyue Huang <guyueh@nvidia.com>