
GB200 LLM performance scripts tuning #12791

Merged
erhoo82 merged 17 commits into NVIDIA-NeMo:main from guyueh1:gb200_perf_scripts
Apr 7, 2025

@guyueh1 guyueh1 commented Mar 26, 2025

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do?

Tune the performance benchmarks of LLMs for GB200.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

guyueh1 added 11 commits March 26, 2025 11:33
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 marked this pull request as ready for review April 1, 2025 05:31
guyueh1 added 4 commits April 1, 2025 16:38
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
parser.add_argument(
    "-fsdp",
    "--use_mcore_fsdp",
    help="Enable Mcore FSDP. Disabled by default",

Suggest changing the help text to "Enable Megatron Core (Mcore) FSDP. Disabled by default".
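For illustration, here is a minimal, self-contained sketch of how the flag might look with the expanded help text. Only the option strings and the help wording come from this thread; the `build_parser` helper, the parser description, and the `action="store_true"` choice are assumptions made for the example and may not match the actual perf script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser containing only the flag discussed in this review.

    `build_parser` is a hypothetical helper for this sketch, not a function
    from the perf scripts.
    """
    parser = argparse.ArgumentParser(description="Toy perf-script argument parser")
    parser.add_argument(
        "-fsdp",
        "--use_mcore_fsdp",
        help="Enable Megatron Core (Mcore) FSDP. Disabled by default",
        # Assumption: a boolean switch that defaults to False when omitted.
        action="store_true",
    )
    return parser


# Parse explicit argument lists so the sketch does not depend on sys.argv.
default_args = build_parser().parse_args([])
fsdp_args = build_parser().parse_args(["--use_mcore_fsdp"])
print(default_args.use_mcore_fsdp, fsdp_args.use_mcore_fsdp)
```

With `store_true`, omitting the flag leaves `use_mcore_fsdp` as `False`, which matches the "Disabled by default" wording in the help string.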

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

ko3n1g commented Apr 6, 2025

Stopping the pipeline because of a merge conflict, given the limited time before code freeze.

Signed-off-by: guyueh1 <140554423+guyueh1@users.noreply.github.com>
@ko3n1g ko3n1g added Run CICD and removed Run CICD labels Apr 6, 2025

guyueh1 commented Apr 7, 2025

@erhoo82 all checks passed, we can merge it if you approve

@erhoo82 erhoo82 left a comment

LGTM

@erhoo82 erhoo82 merged commit 9657892 into NVIDIA-NeMo:main Apr 7, 2025
243 of 244 checks passed
@erhoo82 erhoo82 added the r2.3.0 Pick this label for auto-cherrypicking into v2.3.0 label Apr 7, 2025
ko3n1g pushed a commit that referenced this pull request Apr 7, 2025
* Add gb200 config, support FSDP and recompute in perf script

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

* Support offloading; change recipes

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

* Fix tp overlap config for gb200

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

* Fix csv column name

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

* Fix ckpt layers in gb200 config

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

* Make Nemotron FP8 B200 use ring-exchange for GEMM+RS

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

* Tune configs for 405b bf16 and gpt

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

* Use pp config for nemotron 340b

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

* Fix

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

* Fix tp overlap error by aggregate=False

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

* Make nemotron-340b bf16 use fsdp

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

* Handle segment for GB200

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

* Disable CG when using FSDP

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

* Rm dp_size from B200 csv

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

* Address comments

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

---------

Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: guyueh1 <140554423+guyueh1@users.noreply.github.com>
jomitchellnv pushed a commit to jomitchellnv/NeMo that referenced this pull request Apr 8, 2025
Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>
chtruong814 pushed a commit that referenced this pull request Apr 10, 2025
Co-authored-by: guyueh1 <140554423+guyueh1@users.noreply.github.com>
rrbale-nvidia pushed a commit to rrbale-nvidia/NeMo that referenced this pull request Apr 11, 2025
Signed-off-by: Ritvik Bale <rbale@nvidia.com>
Labels

r2.3.0 (Pick this label for auto-cherrypicking into v2.3.0), Run CICD
