Add perf recipe script for Nemotron-H-56B #13691
Merged
ashbhandare
approved these changes
May 28, 2025
Collaborator
ashbhandare
left a comment
LGTM
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
7e75d8a to
3c4f81d
nv-mollys
pushed a commit
that referenced
this pull request
Jul 8, 2025
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
nv-mollys
added a commit
that referenced
this pull request
Sep 3, 2025
* Set attention backend to "auto" for Nemotron-H (#14042) Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> * Adding TFLOPS per GPU Support for Finetuning (#14048) Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Enable optimizations for Nemotron-H (#13915) Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> * Disable checkpointing for Nemotron-H (#14001) Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> * Cherry pick ea4b47f (#13896) * perf scripts updates (#13456) * gb200 recommended cfgs csv fix Signed-off-by: Malay Nagda <malayn@nvidia.com> * 495b h100 fix Signed-off-by: Malay Nagda <malayn@nvidia.com> * gb200 79b bf16 20 layers recompute Signed-off-by: Malay Nagda <malayn@nvidia.com> * csv format fix Signed-off-by: Malay Nagda <malayn@nvidia.com> * csv format fix Signed-off-by: Malay Nagda <malayn@nvidia.com> * 70b, 340b no fsdp Signed-off-by: Malay Nagda <malayn@nvidia.com> * dsv3 perf mode Signed-off-by: Malay Nagda <malayn@nvidia.com> * Apply isort and black reformatting Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com> * dsv3 perf mode peft Signed-off-by: Malay Nagda <malayn@nvidia.com> * Apply isort and black reformatting Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com> * dsv3 callback Signed-off-by: Malay Nagda <malayn@nvidia.com> * Apply isort and black reformatting Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com> * dsv3 callback Signed-off-by: Malay Nagda <malayn@nvidia.com> * cudagraphs Signed-off-by: Malay Nagda <malayn@nvidia.com> --------- Signed-off-by: Malay Nagda <malayn@nvidia.com> Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com> Signed-off-by: malay-nagda <malayn@nvidia.com> Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com> * import missing callbacks in deepseek recipe --------- Signed-off-by: Malay Nagda <malayn@nvidia.com> Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com> Signed-off-by: 
malay-nagda <malayn@nvidia.com> Co-authored-by: malay-nagda <malayn@nvidia.com> Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com> Co-authored-by: mollys <mollys@mollys.nvidia.com> * Onboard LLAMA4 Maverick Finetuning(SFT) with SQUAD Dataset Download Fix (#13926) * Onboard LLAMA4 Maverick Finetuning(SFT) with SQUAD Dataset Download Fix Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Tweaks for llama4_e128 Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Adding flags for skipping the separate SLURM jobs Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Arg parse changes and tweaks to remove squad dataset check Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Reverting the args for separate SLURM jobs as there is a dependency (run.Partial) with the finetune job Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Removing NullTokenizer due to compatability Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Separate args to have control over the 3 SLURM jobs Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Enabling TokenDropCallback and tp_comm_overlap Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Changes to introduce flags for enabling/disabling the 3 SLURM jobs Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Changing the exp_name based on the SLURM job being run Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Argparse Changes Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Fix for standalone checkpoint and dataload jobs Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Removing NUMA Factor error for dataset and checkpoint download job Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Fix for CUDA Graph error in this version Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * 
Changes to peft_scheme Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Changes to set peft_scheme to None Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Replacing the file name from finetune_ to sft_ Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Updates to exp_name format Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Reverting defaults for --finetuning arg to also include lora Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Tweak comment(s) in the finetuine llama4 e128 script Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Reverting the recommended config order change in for b200 Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Apply isort and black reformatting Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com> --------- Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com> Co-authored-by: rhmukundan <rhmukundan@users.noreply.github.com> * Add profiling changes (#13484) * add profiling changes Signed-off-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com> * Apply isort and black reformatting Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com> * More model changes Signed-off-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com> * Apply isort and black reformatting Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com> --------- Signed-off-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com> Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com> Co-authored-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com> Co-authored-by: ashbhandare <ashbhandare@users.noreply.github.com> * Port nemotron 25.04 patch to r3.2.0 based llmb-nemo (#13533) * port nemotron patch to r3.2.0 based llmb-nemo Signed-off-by: Barys Dubauski 
<bdubauski@nvdia.com> * update template for experiment names Signed-off-by: Barys Dubauski <bdubauski@nvdia.com> * review based updates Signed-off-by: Barys Dubauski <bdubauski@nvdia.com> --------- Signed-off-by: Barys Dubauski <bdubauski@nvdia.com> Co-authored-by: Barys Dubauski <bdubauski@nvdia.com> * Port run-ai patch to llmb-nemo branch (#13573) * Port run-ai patch to llmb-nemo branch Signed-off-by: Barys Dubauski <bdubauski@nvdia.com> * Apply isort and black reformatting Signed-off-by: bdubauski <bdubauski@users.noreply.github.com> --------- Signed-off-by: Barys Dubauski <bdubauski@nvdia.com> Signed-off-by: bdubauski <bdubauski@users.noreply.github.com> Co-authored-by: Barys Dubauski <bdubauski@nvdia.com> Co-authored-by: bdubauski <bdubauski@users.noreply.github.com> * Add grok recipe (#13586) * Add grok recipe Signed-off-by: mollys <mollys@mollys.nvidia.com> * Apply isort and black reformatting Signed-off-by: nv-mollys <nv-mollys@users.noreply.github.com> Signed-off-by: mollys <mollys@mollys.nvidia.com> * Add copyright header Signed-off-by: mollys <mollys@mollys.nvidia.com> --------- Signed-off-by: mollys <mollys@mollys.nvidia.com> Signed-off-by: nv-mollys <nv-mollys@users.noreply.github.com> Co-authored-by: mollys <mollys@mollys.nvidia.com> Co-authored-by: nv-mollys <nv-mollys@users.noreply.github.com> * transformers_offline=0 and profile changes to llama3.1 405b (#13655) * transformers_offline=0 and profile changes to llama3.1 405b Signed-off-by: Sebastian Alberdi <salberdi@salberdi-mlt.client.nvidia.com> * Apply isort and black reformatting Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> * nccl added Signed-off-by: Sebastian Alberdi <salberdi@salberdi-mlt.client.nvidia.com> * Apply isort and black reformatting Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> --------- Signed-off-by: Sebastian Alberdi <salberdi@salberdi-mlt.client.nvidia.com> Signed-off-by: salberdi-nvidia 
<salberdi-nvidia@users.noreply.github.com> Co-authored-by: Sebastian Alberdi <salberdi@login-preos02.a51.clusters.nvidia.com> Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> * Add perf recipe script for Nemotron-H-56B (#13691) Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> * Pretraining Deepseek changes for LLMB (#13752) * working changes Signed-off-by: ashbhandare <abhandare@nvidia.com> * cleanup Signed-off-by: ashbhandare <abhandare@nvidia.com> * Apply isort and black reformatting Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com> Signed-off-by: ashbhandare <abhandare@nvidia.com> * make profiling steps overridable Signed-off-by: ashbhandare <abhandare@nvidia.com> * add nccl trace ability, cleanup Signed-off-by: ashbhandare <abhandare@nvidia.com> * Apply isort and black reformatting Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com> Signed-off-by: ashbhandare <abhandare@nvidia.com> --------- Signed-off-by: ashbhandare <abhandare@nvidia.com> Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com> Co-authored-by: Aishwarya Bhandare <abhandare@login-ptyche01.ptyche.clusters.nvidia.com> Co-authored-by: ashbhandare <ashbhandare@users.noreply.github.com> * Adding FP8 Default Configs for LLAMA4 Maverick (#13698) Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Changing the tokenizer from Scout to Maverick in the pretrain LLAMA4 LLM Recipe (#13664) Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Tweaking LLama4 Maverick PreTrain file to adapt to the user configs parameter format (#13690) Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Grok Nvbug 5311566 (#13765) * remove unnecessary nemo root check * remove comment and unused packages --------- Co-authored-by: mollys <mollys@mollys.nvidia.com> * Grok nccl trace fix (#13769) * remove unnecessary 
nemo root check * remove comment and unused packages * transformers online * fix env vars * setting transformers offline here doesn't work --------- Co-authored-by: mollys <mollys@mollys.nvidia.com> * Fix for config params in pretrain llama4 e128 (#13764) * Fix for config params in pretrain llama4 e128 Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Ignoring unrelated configs Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Cleanup of configs Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Adding all the params in get_user_configs func Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> --------- Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Nsys Tweaks to llama4 pretrain (#13778) * Removign hardcoding of nsys profiling ranges Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Adding NCCL Trace support for pretrain recipe (llama4) Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Apply isort and black reformatting Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com> --------- Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com> Co-authored-by: rhmukundan <rhmukundan@users.noreply.github.com> * Disable checkpointing for Nemotron-H (#13786) * Disable checkpointing for Nemotron-H Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> * Nemotron-H NCCL trace support Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> * Apply isort and black reformatting Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com> --------- Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com> Co-authored-by: scsudhakaran <scsudhakaran@users.noreply.github.com> * Llmb nemo r2.3.0 (#13806) * set NCCL_NET_GDR_LEVEL=PHB for deepseekv3, grok1_314b, llama31_405b, llama4_e128, 
nemotron4_15b+340b, nemotronh_56b Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> * Apply isort and black reformatting Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> --------- Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> * add all environment variables to container environment (#13808) Co-authored-by: mollys <mollys@mollys.nvidia.com> * fix numactl (#13809) Co-authored-by: mollys <mollys@mollys.nvidia.com> * Llmb nemo r2.3.0 (#13807) * set NCCL_NET_GDR_LEVEL=PHB for deepseekv3, grok1_314b, llama31_405b, llama4_e128, nemotron4_15b+340b, nemotronh_56b Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> * Apply isort and black reformatting Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> * made experiment naming match standard Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> * Apply isort and black reformatting Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> * standardized exp_name for relevant workloads Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> * Apply isort and black reformatting Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> --------- Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> Co-authored-by: Sebastian 
Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> * fixing QA checkpoint bug for nemotron4 (#13843) * fixing QA checkpoint bug for nemotron4 * Apply isort and black reformatting Signed-off-by: sshiddib <sshiddib@users.noreply.github.com> * arg name change * Apply isort and black reformatting Signed-off-by: sshiddib <sshiddib@users.noreply.github.com> --------- Signed-off-by: sshiddib <sshiddib@users.noreply.github.com> Co-authored-by: Sharada Shiddibhavi <sshiddibhavi@cw-dfw-cs-001-vscode-02.cm.cluster> Co-authored-by: sshiddib <sshiddib@users.noreply.github.com> * Add gpu metrics option (#13882) * gpu metrics option Signed-off-by: ashbhandare <abhandare@nvidia.com> * Apply isort and black reformatting Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com> * specify nemo run commit Signed-off-by: ashbhandare <abhandare@nvidia.com> * Apply isort and black reformatting Signed-off-by: artbataev <artbataev@users.noreply.github.com> * Fix linting error Signed-off-by: ashbhandare <abhandare@nvidia.com> --------- Signed-off-by: ashbhandare <abhandare@nvidia.com> Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com> Signed-off-by: artbataev <artbataev@users.noreply.github.com> Co-authored-by: ashbhandare <ashbhandare@users.noreply.github.com> Co-authored-by: artbataev <artbataev@users.noreply.github.com> * LLAMA4 Maverick SFT Recipe + SQUAD Dataset Download Error Fix Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Revert "LLAMA4 Maverick SFT Recipe + SQUAD Dataset Download Error Fix" This reverts commit 755fd36. 
* fix nemo/collections/llm/recipes/__init__.py * fix nemo/collections/llm/recipes/deepseek_v3.py * new line * fix nemo/collections/llm/recipes/llama4_e128.py * fix scripts/performance/llm/finetune_llama4_e128.py * small updates for grok * modified: scripts/performance/llm/pretrain_grok1_314b.py modified: scripts/performance/llm/pretrain_nemotron4_340b.py * manually add util changes to helpers.py and executors.py * Fix in Nemotron-H script (#14251) * Fix in Nemotron-H script Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> * Apply isort and black reformatting Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: artbataev <artbataev@users.noreply.github.com> * Fix in Nemotron-H perf script Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> --------- Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com> Signed-off-by: artbataev <artbataev@users.noreply.github.com> Co-authored-by: scsudhakaran <scsudhakaran@users.noreply.github.com> Co-authored-by: artbataev <artbataev@users.noreply.github.com> * updated with some things from NeMo main (double_buffer) (#14305) * updated with some things from NeMo main (double_buffer) Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> * Apply isort and black reformatting Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: artbataev <artbataev@users.noreply.github.com> --------- Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> Signed-off-by: artbataev <artbataev@users.noreply.github.com> Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> Co-authored-by: 
artbataev <artbataev@users.noreply.github.com> * Took out cudNN lines b/c of regression with cuDNN normalization kernel (#14360) * added conditional cudnn to align with nemo main (#14324) * added conditional cudnn to align with nemo main Signed-off-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster> * fixed num optimizer instances bug Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> --------- Signed-off-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster> Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> Co-authored-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster> Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> * Adds pyxis container writable and no mount home flags (#14386) * Add pyxis flags for writable and no-mount home. Signed-off-by: Alex Filby <afilby@nvidia.com> * Apply isort and black reformatting Signed-off-by: sudostock <sudostock@users.noreply.github.com> --------- Signed-off-by: Alex Filby <afilby@nvidia.com> Signed-off-by: sudostock <sudostock@users.noreply.github.com> Co-authored-by: sudostock <sudostock@users.noreply.github.com> * Update DeepSeek-V3 perf scripts (#14377) * Fix callbacks in DSV3 script (#14350) Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> * Changes to grok to alleviate error: TypeError: '>' not supported betw… (#14326) * Changes to grok to alleviate error: TypeError: '>' not supported between instances of 'str' and 'int' * Apply isort and black reformatting Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com> * Made the changes where it's not default values hard coded. User can change thru cli * Apply isort and black reformatting Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com> * made suggested changes. Verified successful. 
* Apply isort and black reformatting Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com> * Made suggested change. --------- Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com> Co-authored-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com> * Make VBoost activation conditional (#14453) * Refactor performance scripts to use build_perf_env_plugin function * Replaced direct instantiation of PerfEnvPlugin with build_perf_env_plugin in multiple LLM finetuning and pretraining scripts for consistency and maintainability. * Added build_perf_env_plugin function to helpers.py to streamline performance environment setup based on GPU and pipeline parallelism settings. This change enhances code readability and reduces redundancy across scripts. * control vboost enablement via cli * Update finetune_llama4_e128.py to import build_perf_env_plugin function * Added the build_perf_env_plugin import to enhance performance environment setup consistency across scripts. This change aligns with recent refactoring efforts to streamline performance script management. --------- Co-authored-by: Barys Dubauski <bdubauski@nvdia.com> * turned off tp overlap comms for >128 gpus on gb200 so jobs are functi… (#14460) * turned off tp overlap comms for >128 gpus on gb200 so jobs are functional Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com> * Apply isort and black reformatting Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> --------- Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com> Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> * Remove NCCL tracing option and clean up imports in performance scripts (#14467) * Remove NCCL tracing option and clean up imports in performance scripts. 
Updated multiple LLM finetuning and pretraining scripts to eliminate the use of PerfEnvPlugin, enhancing consistency and maintainability. * Apply isort and black reformatting Signed-off-by: bdubauski <bdubauski@users.noreply.github.com> --------- Signed-off-by: bdubauski <bdubauski@users.noreply.github.com> Co-authored-by: Barys Dubauski <bdubauski@nvdia.com> Co-authored-by: bdubauski <bdubauski@users.noreply.github.com> * Disable tp_comm_overlap for 512 gpus on GB200 (#14474) ...to fix functionality issue Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> * Workaround for MXFP8 functionality issue (#14426) * Workaround for MXFP8 functionality issue Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> * Apply isort and black reformatting Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com> --------- Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com> Co-authored-by: scsudhakaran <scsudhakaran@users.noreply.github.com> * previous commit was buggy (#14477) * previous was buggy Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com> * Apply isort and black reformatting Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> --------- Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com> Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> * checkpoint save/load functionality with HF token (#14538) * checkpoint save/load functionality with HF token * Apply isort and black reformatting Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com> * using use_hf_tokenizer * reverting back to hf_token --------- Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com> Co-authored-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com> * added hf import for 15b/340b pretrain (#14565) * Llmb nemo r2.4.0 (#14607) * 
Update mixed_precision.py Signed-off-by: guyueh1 <140554423+guyueh1@users.noreply.github.com> * Fix reuse_grad_buf_for_mxfp8_param_ag for mxfp8 Signed-off-by: Guyue Huang <guyueh@nvidia.com> --------- Signed-off-by: guyueh1 <140554423+guyueh1@users.noreply.github.com> Signed-off-by: Guyue Huang <guyueh@nvidia.com> Co-authored-by: guyueh1 <140554423+guyueh1@users.noreply.github.com> Co-authored-by: Guyue Huang <guyueh@nvidia.com> * Apply isort and black reformatting Signed-off-by: nv-mollys <nv-mollys@users.noreply.github.com> --------- Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> Signed-off-by: Malay Nagda <malayn@nvidia.com> Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com> Signed-off-by: malay-nagda <malayn@nvidia.com> Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com> Signed-off-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com> Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com> Signed-off-by: Barys Dubauski <bdubauski@nvdia.com> Signed-off-by: bdubauski <bdubauski@users.noreply.github.com> Signed-off-by: mollys <mollys@mollys.nvidia.com> Signed-off-by: nv-mollys <nv-mollys@users.noreply.github.com> Signed-off-by: Sebastian Alberdi <salberdi@salberdi-mlt.client.nvidia.com> Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> Signed-off-by: ashbhandare <abhandare@nvidia.com> Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com> Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> Signed-off-by: sshiddib <sshiddib@users.noreply.github.com> Signed-off-by: artbataev <artbataev@users.noreply.github.com> Signed-off-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster> Signed-off-by: Alex Filby <afilby@nvidia.com> Signed-off-by: sudostock <sudostock@users.noreply.github.com> Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: 
rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com> Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com> Signed-off-by: guyueh1 <140554423+guyueh1@users.noreply.github.com> Signed-off-by: nv-mollys <149841089+nv-mollys@users.noreply.github.com> Co-authored-by: scsudhakaran <scsudhakaran@nvidia.com> Co-authored-by: rhmukundan <102543536+rhmukundan@users.noreply.github.com> Co-authored-by: malay-nagda <malayn@nvidia.com> Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com> Co-authored-by: mollys <mollys@mollys.nvidia.com> Co-authored-by: rhmukundan <rhmukundan@users.noreply.github.com> Co-authored-by: ashbhandare <ash.bhandare@gmail.com> Co-authored-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com> Co-authored-by: ashbhandare <ashbhandare@users.noreply.github.com> Co-authored-by: bdubauski <80418713+bdubauski@users.noreply.github.com> Co-authored-by: Barys Dubauski <bdubauski@nvdia.com> Co-authored-by: bdubauski <bdubauski@users.noreply.github.com> Co-authored-by: nv-mollys <nv-mollys@users.noreply.github.com> Co-authored-by: salberdi-nvidia <salberdi@nvidia.com> Co-authored-by: Sebastian Alberdi <salberdi@login-preos02.a51.clusters.nvidia.com> Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> Co-authored-by: ashbhandare <abhandare@nvidia.com> Co-authored-by: Aishwarya Bhandare <abhandare@login-ptyche01.ptyche.clusters.nvidia.com> Co-authored-by: scsudhakaran <scsudhakaran@users.noreply.github.com> Co-authored-by: Sharada Shiddibhavi <sshiddibhavi@nvidia.com> Co-authored-by: Sharada Shiddibhavi <sshiddibhavi@cw-dfw-cs-001-vscode-02.cm.cluster> Co-authored-by: sshiddib <sshiddib@users.noreply.github.com> Co-authored-by: artbataev <artbataev@users.noreply.github.com> Co-authored-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> Co-authored-by: rsalagame-nvidia <rsalagame@nvidia.com> Co-authored-by: Sebastian 
Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster> Co-authored-by: Alex Filby <alexfilby@gmail.com> Co-authored-by: sudostock <sudostock@users.noreply.github.com> Co-authored-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com> Co-authored-by: guyueh1 <140554423+guyueh1@users.noreply.github.com> Co-authored-by: Guyue Huang <guyueh@nvidia.com>
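The "Make VBoost activation conditional (#14453)" entry in the commit message above describes a `build_perf_env_plugin` function in helpers.py that sets up the performance environment from the GPU and pipeline-parallelism settings, with VBoost enablement controlled via the CLI. The pattern might look roughly like the sketch below. Everything in it is an assumption for illustration: `PerfEnvSettings` is a hypothetical stand-in dataclass rather than NeMo's `PerfEnvPlugin`, and the `--gpu` and `--enable_vboost` flag names, the field names, and the chunk-size value are not taken from the repository.

```python
import argparse
from dataclasses import dataclass
from typing import Optional


@dataclass
class PerfEnvSettings:
    """Hypothetical stand-in for the plugin object; NOT NeMo's PerfEnvPlugin."""

    enable_vboost: bool
    gpu_sm100_or_newer: bool
    nccl_pp_comm_chunksize: Optional[int]


def build_perf_env_settings(
    args: argparse.Namespace, pp_size: Optional[int] = None
) -> PerfEnvSettings:
    """Derive perf-environment settings in one place instead of in every script.

    All names and values here are illustrative assumptions.
    """
    pp_size = pp_size or 1
    gpu = getattr(args, "gpu", "").lower()
    return PerfEnvSettings(
        # VBoost is opt-in from the command line rather than always on.
        enable_vboost=bool(getattr(args, "enable_vboost", False)),
        # Assumed rule: treat B200/GB200 as the newer GPU generation.
        gpu_sm100_or_newer=gpu in ("b200", "gb200"),
        # Assumed rule: only set a chunk size when pipeline parallelism is used.
        nccl_pp_comm_chunksize=2 * 1024 * 1024 if pp_size > 1 else None,
    )


def parse_args(argv=None) -> argparse.Namespace:
    """Minimal CLI with hypothetical flag names."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--gpu", default="h100")
    parser.add_argument("--enable_vboost", action="store_true")
    return parser.parse_args(argv)
```

A script would call `build_perf_env_settings(parse_args(), pp_size=...)` once and hand the result to its executor setup, so a change to the VBoost or pipeline-parallel rules is made in a single helper rather than in each finetuning or pretraining script.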
<salberdi-nvidia@users.noreply.github.com> Co-authored-by: Sebastian Alberdi <salberdi@login-preos02.a51.clusters.nvidia.com> Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> * Add perf recipe script for Nemotron-H-56B (#13691) Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> * Pretraining Deepseek changes for LLMB (#13752) * working changes Signed-off-by: ashbhandare <abhandare@nvidia.com> * cleanup Signed-off-by: ashbhandare <abhandare@nvidia.com> * Apply isort and black reformatting Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com> Signed-off-by: ashbhandare <abhandare@nvidia.com> * make profiling steps overridable Signed-off-by: ashbhandare <abhandare@nvidia.com> * add nccl trace ability, cleanup Signed-off-by: ashbhandare <abhandare@nvidia.com> * Apply isort and black reformatting Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com> Signed-off-by: ashbhandare <abhandare@nvidia.com> --------- Signed-off-by: ashbhandare <abhandare@nvidia.com> Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com> Co-authored-by: Aishwarya Bhandare <abhandare@login-ptyche01.ptyche.clusters.nvidia.com> Co-authored-by: ashbhandare <ashbhandare@users.noreply.github.com> * Adding FP8 Default Configs for LLAMA4 Maverick (#13698) Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Changing the tokenizer from Scout to Maverick in the pretrain LLAMA4 LLM Recipe (#13664) Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Tweaking LLama4 Maverick PreTrain file to adapt to the user configs parameter format (#13690) Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Grok Nvbug 5311566 (#13765) * remove unnecessary nemo root check * remove comment and unused packages --------- Co-authored-by: mollys <mollys@mollys.nvidia.com> * Grok nccl trace fix (#13769) * remove unnecessary 
nemo root check * remove comment and unused packages * transformers online * fix env vars * setting transformers offline here doesn't work --------- Co-authored-by: mollys <mollys@mollys.nvidia.com> * Fix for config params in pretrain llama4 e128 (#13764) * Fix for config params in pretrain llama4 e128 Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Ignoring unrelated configs Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Cleanup of configs Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Adding all the params in get_user_configs func Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> --------- Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Nsys Tweaks to llama4 pretrain (#13778) * Removign hardcoding of nsys profiling ranges Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Adding NCCL Trace support for pretrain recipe (llama4) Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Apply isort and black reformatting Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com> --------- Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com> Co-authored-by: rhmukundan <rhmukundan@users.noreply.github.com> * Disable checkpointing for Nemotron-H (#13786) * Disable checkpointing for Nemotron-H Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> * Nemotron-H NCCL trace support Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> * Apply isort and black reformatting Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com> --------- Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com> Co-authored-by: scsudhakaran <scsudhakaran@users.noreply.github.com> * Llmb nemo r2.3.0 (#13806) * set NCCL_NET_GDR_LEVEL=PHB for deepseekv3, grok1_314b, llama31_405b, llama4_e128, 
nemotron4_15b+340b, nemotronh_56b Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> * Apply isort and black reformatting Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> --------- Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> * add all environment variables to container environment (#13808) Co-authored-by: mollys <mollys@mollys.nvidia.com> * fix numactl (#13809) Co-authored-by: mollys <mollys@mollys.nvidia.com> * Llmb nemo r2.3.0 (#13807) * set NCCL_NET_GDR_LEVEL=PHB for deepseekv3, grok1_314b, llama31_405b, llama4_e128, nemotron4_15b+340b, nemotronh_56b Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> * Apply isort and black reformatting Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> * made experiment naming match standard Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> * Apply isort and black reformatting Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> * standardized exp_name for relevant workloads Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> * Apply isort and black reformatting Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> --------- Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> Co-authored-by: Sebastian 
Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> * fixing QA checkpoint bug for nemotron4 (#13843) * fixing QA checkpoint bug for nemotron4 * Apply isort and black reformatting Signed-off-by: sshiddib <sshiddib@users.noreply.github.com> * arg name change * Apply isort and black reformatting Signed-off-by: sshiddib <sshiddib@users.noreply.github.com> --------- Signed-off-by: sshiddib <sshiddib@users.noreply.github.com> Co-authored-by: Sharada Shiddibhavi <sshiddibhavi@cw-dfw-cs-001-vscode-02.cm.cluster> Co-authored-by: sshiddib <sshiddib@users.noreply.github.com> * Add gpu metrics option (#13882) * gpu metrics option Signed-off-by: ashbhandare <abhandare@nvidia.com> * Apply isort and black reformatting Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com> * specify nemo run commit Signed-off-by: ashbhandare <abhandare@nvidia.com> * Apply isort and black reformatting Signed-off-by: artbataev <artbataev@users.noreply.github.com> * Fix linting error Signed-off-by: ashbhandare <abhandare@nvidia.com> --------- Signed-off-by: ashbhandare <abhandare@nvidia.com> Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com> Signed-off-by: artbataev <artbataev@users.noreply.github.com> Co-authored-by: ashbhandare <ashbhandare@users.noreply.github.com> Co-authored-by: artbataev <artbataev@users.noreply.github.com> * LLAMA4 Maverick SFT Recipe + SQUAD Dataset Download Error Fix Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> * Revert "LLAMA4 Maverick SFT Recipe + SQUAD Dataset Download Error Fix" This reverts commit 755fd36. 
* fix nemo/collections/llm/recipes/__init__.py * fix nemo/collections/llm/recipes/deepseek_v3.py * new line * fix nemo/collections/llm/recipes/llama4_e128.py * fix scripts/performance/llm/finetune_llama4_e128.py * small updates for grok * modified: scripts/performance/llm/pretrain_grok1_314b.py modified: scripts/performance/llm/pretrain_nemotron4_340b.py * manually add util changes to helpers.py and executors.py * Fix in Nemotron-H script (#14251) * Fix in Nemotron-H script Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> * Apply isort and black reformatting Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: artbataev <artbataev@users.noreply.github.com> * Fix in Nemotron-H perf script Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> --------- Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com> Signed-off-by: artbataev <artbataev@users.noreply.github.com> Co-authored-by: scsudhakaran <scsudhakaran@users.noreply.github.com> Co-authored-by: artbataev <artbataev@users.noreply.github.com> * updated with some things from NeMo main (double_buffer) (#14305) * updated with some things from NeMo main (double_buffer) Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> * Apply isort and black reformatting Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: artbataev <artbataev@users.noreply.github.com> --------- Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> Signed-off-by: artbataev <artbataev@users.noreply.github.com> Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> Co-authored-by: 
artbataev <artbataev@users.noreply.github.com> * Took out cudNN lines b/c of regression with cuDNN normalization kernel (#14360) * added conditional cudnn to align with nemo main (#14324) * added conditional cudnn to align with nemo main Signed-off-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster> * fixed num optimizer instances bug Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> --------- Signed-off-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster> Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> Co-authored-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster> Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> * Adds pyxis container writable and no mount home flags (#14386) * Add pyxis flags for writable and no-mount home. Signed-off-by: Alex Filby <afilby@nvidia.com> * Apply isort and black reformatting Signed-off-by: sudostock <sudostock@users.noreply.github.com> --------- Signed-off-by: Alex Filby <afilby@nvidia.com> Signed-off-by: sudostock <sudostock@users.noreply.github.com> Co-authored-by: sudostock <sudostock@users.noreply.github.com> * Update DeepSeek-V3 perf scripts (#14377) * Fix callbacks in DSV3 script (#14350) Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> * Changes to grok to alleviate error: TypeError: '>' not supported betw… (#14326) * Changes to grok to alleviate error: TypeError: '>' not supported between instances of 'str' and 'int' * Apply isort and black reformatting Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com> * Made the changes where it's not default values hard coded. User can change thru cli * Apply isort and black reformatting Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com> * made suggested changes. Verified successful. 
* Apply isort and black reformatting Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com> * Made suggested change. --------- Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com> Co-authored-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com> * Make VBoost activation conditional (#14453) * Refactor performance scripts to use build_perf_env_plugin function * Replaced direct instantiation of PerfEnvPlugin with build_perf_env_plugin in multiple LLM finetuning and pretraining scripts for consistency and maintainability. * Added build_perf_env_plugin function to helpers.py to streamline performance environment setup based on GPU and pipeline parallelism settings. This change enhances code readability and reduces redundancy across scripts. * control vboost enablement via cli * Update finetune_llama4_e128.py to import build_perf_env_plugin function * Added the build_perf_env_plugin import to enhance performance environment setup consistency across scripts. This change aligns with recent refactoring efforts to streamline performance script management. --------- Co-authored-by: Barys Dubauski <bdubauski@nvdia.com> * turned off tp overlap comms for >128 gpus on gb200 so jobs are functi… (#14460) * turned off tp overlap comms for >128 gpus on gb200 so jobs are functional Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com> * Apply isort and black reformatting Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> --------- Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com> Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> * Remove NCCL tracing option and clean up imports in performance scripts (#14467) * Remove NCCL tracing option and clean up imports in performance scripts. 
Updated multiple LLM finetuning and pretraining scripts to eliminate the use of PerfEnvPlugin, enhancing consistency and maintainability. * Apply isort and black reformatting Signed-off-by: bdubauski <bdubauski@users.noreply.github.com> --------- Signed-off-by: bdubauski <bdubauski@users.noreply.github.com> Co-authored-by: Barys Dubauski <bdubauski@nvdia.com> Co-authored-by: bdubauski <bdubauski@users.noreply.github.com> * Disable tp_comm_overlap for 512 gpus on GB200 (#14474) ...to fix functionality issue Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> * Workaround for MXFP8 functionality issue (#14426) * Workaround for MXFP8 functionality issue Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> * Apply isort and black reformatting Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com> --------- Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com> Co-authored-by: scsudhakaran <scsudhakaran@users.noreply.github.com> * previous commit was buggy (#14477) * previous was buggy Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com> * Apply isort and black reformatting Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> --------- Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com> Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> * checkpoint save/load functionality with HF token (#14538) * checkpoint save/load functionality with HF token * Apply isort and black reformatting Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com> * using use_hf_tokenizer * reverting back to hf_token --------- Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com> Co-authored-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com> * added hf import for 15b/340b pretrain (#14565) * Llmb nemo r2.4.0 (#14607) * 
Update mixed_precision.py Signed-off-by: guyueh1 <140554423+guyueh1@users.noreply.github.com> * Fix reuse_grad_buf_for_mxfp8_param_ag for mxfp8 Signed-off-by: Guyue Huang <guyueh@nvidia.com> --------- Signed-off-by: guyueh1 <140554423+guyueh1@users.noreply.github.com> Signed-off-by: Guyue Huang <guyueh@nvidia.com> Co-authored-by: guyueh1 <140554423+guyueh1@users.noreply.github.com> Co-authored-by: Guyue Huang <guyueh@nvidia.com> * Apply isort and black reformatting Signed-off-by: nv-mollys <nv-mollys@users.noreply.github.com> --------- Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com> Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> Signed-off-by: Malay Nagda <malayn@nvidia.com> Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com> Signed-off-by: malay-nagda <malayn@nvidia.com> Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com> Signed-off-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com> Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com> Signed-off-by: Barys Dubauski <bdubauski@nvdia.com> Signed-off-by: bdubauski <bdubauski@users.noreply.github.com> Signed-off-by: mollys <mollys@mollys.nvidia.com> Signed-off-by: nv-mollys <nv-mollys@users.noreply.github.com> Signed-off-by: Sebastian Alberdi <salberdi@salberdi-mlt.client.nvidia.com> Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> Signed-off-by: ashbhandare <abhandare@nvidia.com> Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com> Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> Signed-off-by: sshiddib <sshiddib@users.noreply.github.com> Signed-off-by: artbataev <artbataev@users.noreply.github.com> Signed-off-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster> Signed-off-by: Alex Filby <afilby@nvidia.com> Signed-off-by: sudostock <sudostock@users.noreply.github.com> Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: 
rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com> Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com> Signed-off-by: guyueh1 <140554423+guyueh1@users.noreply.github.com> Signed-off-by: nv-mollys <149841089+nv-mollys@users.noreply.github.com> Co-authored-by: scsudhakaran <scsudhakaran@nvidia.com> Co-authored-by: rhmukundan <102543536+rhmukundan@users.noreply.github.com> Co-authored-by: malay-nagda <malayn@nvidia.com> Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com> Co-authored-by: mollys <mollys@mollys.nvidia.com> Co-authored-by: rhmukundan <rhmukundan@users.noreply.github.com> Co-authored-by: ashbhandare <ash.bhandare@gmail.com> Co-authored-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com> Co-authored-by: ashbhandare <ashbhandare@users.noreply.github.com> Co-authored-by: bdubauski <80418713+bdubauski@users.noreply.github.com> Co-authored-by: Barys Dubauski <bdubauski@nvdia.com> Co-authored-by: bdubauski <bdubauski@users.noreply.github.com> Co-authored-by: nv-mollys <nv-mollys@users.noreply.github.com> Co-authored-by: salberdi-nvidia <salberdi@nvidia.com> Co-authored-by: Sebastian Alberdi <salberdi@login-preos02.a51.clusters.nvidia.com> Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com> Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com> Co-authored-by: ashbhandare <abhandare@nvidia.com> Co-authored-by: Aishwarya Bhandare <abhandare@login-ptyche01.ptyche.clusters.nvidia.com> Co-authored-by: scsudhakaran <scsudhakaran@users.noreply.github.com> Co-authored-by: Sharada Shiddibhavi <sshiddibhavi@nvidia.com> Co-authored-by: Sharada Shiddibhavi <sshiddibhavi@cw-dfw-cs-001-vscode-02.cm.cluster> Co-authored-by: sshiddib <sshiddib@users.noreply.github.com> Co-authored-by: artbataev <artbataev@users.noreply.github.com> Co-authored-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> Co-authored-by: rsalagame-nvidia <rsalagame@nvidia.com> Co-authored-by: Sebastian 
Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster> Co-authored-by: Alex Filby <alexfilby@gmail.com> Co-authored-by: sudostock <sudostock@users.noreply.github.com> Co-authored-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com> Co-authored-by: guyueh1 <140554423+guyueh1@users.noreply.github.com> Co-authored-by: Guyue Huang <guyueh@nvidia.com>
No description provided.