fix: fix gb200 release/performance #2189

Merged
terrykong merged 13 commits into main from yukih/release-perf-fix-gb200 on Apr 10, 2026

Conversation

yuki-97 (Contributor) commented Apr 2, 2026

All Clear (with Megatron-Bridge version in #2223)

  1. deepseek related: fix: fix dsv3 by disable mtp #2191
    • grpo-deepseek-v3-32n4g
    • grpo-deepseek-v3-64n4g-async-1off
    • grpo-dapomath17k-dsv3-32n4g-megatron: 90faf90
  2. logprob_batch_size related: fix: revert logprob_batch_size to keep same perf as before #2192
    • grpo-qwen3-30ba3b-8n4g-megatron: f99d1f3
    • grpo-qwen3-32b-4n4g: c7ee7aa
  3. missing tb: 6725805
    • grpo-qwen3-235b-16n4g
    • grpo-qwen3-235b-32n4g-async-1off
  4. grpo-gemma3-27b-it-8n4g-fsdp2tp4-actckpt-long: fix: fix gemma3 #2185
  5. grpo-gptoss-20b-8n4g-megatron: [model] fix: fix gpt-oss down_proj weight handling Megatron-Bridge#3162, 3e02d25

copy-pr-bot (Bot) commented Apr 2, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

yuki-97 force-pushed the yukih/release-perf-fix-gb200 branch 3 times, most recently from e0a0adf to f99d1f3 (April 3, 2026 09:37)
yuki-97 force-pushed the yukih/release-perf-fix branch from 3875c8a to 40c2dc0 (April 7, 2026 15:01)
yuki-97 force-pushed the yukih/release-perf-fix-gb200 branch from 3e02d25 to 4839e92 (April 7, 2026 15:08)
yuki-97 force-pushed the yukih/release-perf-fix branch from 40c2dc0 to 0945485 (April 8, 2026 06:13)
yuki-97 force-pushed the yukih/release-perf-fix-gb200 branch from 4839e92 to f5ea355 (April 8, 2026 06:27)
yuki-97 force-pushed the yukih/release-perf-fix branch 2 times, most recently from ae11a64 to e0ee956 (April 8, 2026 13:29)
yuki-97 force-pushed the yukih/release-perf-fix-gb200 branch 3 times, most recently from 1e927fb to 4974d23 (April 9, 2026 11:13)
yuki-97 force-pushed the yukih/release-perf-fix branch from 3f93034 to 4fa69d0 (April 10, 2026 03:06)
yuki-97 force-pushed the yukih/release-perf-fix-gb200 branch from 4974d23 to 30818bc (April 10, 2026 03:21)
Comment thread examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml Outdated
Comment thread examples/configs/recipes/llm/grpo-gptoss-20b-8n4g-megatron.yaml
Comment thread examples/configs/recipes/llm/grpo-qwen3-30ba3b-8n4g-megatron.yaml
Comment thread examples/configs/recipes/llm/grpo-gptoss-20b-8n4g-megatron.yaml
Base automatically changed from yukih/release-perf-fix to main April 10, 2026 04:26
yuki-97 added 9 commits April 9, 2026 23:04
Signed-off-by: Yuki Huang <yukih@nvidia.com>
- grpo-qwen2.5-7b-instruct-4n4g-megatron
- sft-llama3.1-70b-8n4g-tp2pp2-long-megatron
- sft-llama3.1-8b-1n4g-fsdp2tp1-long

Signed-off-by: Yuki Huang <yukih@nvidia.com>
- distillation-qwen3-32b-to-4b-base-2n4g-fsdp2tp1-long.v1
- dapo-qwen2.5-7b-16n4g-fsdp2cp2
- grpo-llama3.1-8b-instruct-4n4g-fsdp2tp1-long.v3

Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
…-32b-4n4g)

Signed-off-by: Yuki Huang <yukih@nvidia.com>
…qwen3-32b-4n8g

Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
- revert logprob_batch_size
- update test time

Signed-off-by: Yuki Huang <yukih@nvidia.com>
yuki-97 added 3 commits April 9, 2026 23:04
- update parallel setting
- add NCCL_NVLS_ENABLE
- update test time and metrics

Signed-off-by: Yuki Huang <yukih@nvidia.com>
…settings and limit metric

Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
yuki-97 force-pushed the yukih/release-perf-fix-gb200 branch from 30818bc to 89af640 (April 10, 2026 06:04)
Signed-off-by: Yuki Huang <yukih@nvidia.com>
yuki-97 marked this pull request as ready for review April 10, 2026 06:19
yuki-97 requested review from a team as code owners April 10, 2026 06:19
terrykong enabled auto-merge (squash) April 10, 2026 06:32
yuki-97 added the CI:Lfast label (runs a fast test suite and re-uses the nightly `main` container, but syncs dependencies to the PR's version) Apr 10, 2026
yuki-97 (Contributor, Author) commented Apr 10, 2026

/ok to test 16312e5

terrykong (Collaborator) commented

/ok to test 16312e5

terrykong merged commit 4e03c8c into main Apr 10, 2026
27 checks passed
terrykong deleted the yukih/release-perf-fix-gb200 branch April 10, 2026 07:10