[Dev] add support for deepep/hybridep dispatcher under thd format training #4816
@HaochenYuan Could you please add a UT to guard this and submit a PR to main? cc @Autumn1998 for review.
0450ef9 to f61785f
5e73b40 to 38db167
UT added.
/claude strict-review
Code Review Summary
PR: Add support for deepep/hybridep dispatcher under THD format training
Overview
This PR enables HybridEP and DeepEP MoE dispatcher backends to work with THD-format sequence packing (variable-length token counts per rank). The core approach (all-reduce MAX to find the group-wide padded token count, pad routing metadata and hidden states before dispatch, trim after combine) is sound. The gradient flow through pad/trim is correct since The assertion update in
Findings
IMPORTANT:
SUGGESTION:
Risk Assessment: Low-Medium
The core dispatcher changes are well-contained in
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
…nto thd_e2e_deepep
yaox12 left a comment:
LGTM. Please fix the CI failure.
🔄 Merge queue validation started! You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/26200974444
🔄 Merge queue validation started! You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/26206185854
What does this PR do?
A previous PR added support for the THD format in training, but for the MoE dispatcher only the all2all type was supported. This PR adds the deepep and hybridep backends.
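The pad-before-dispatch, trim-after-combine approach summarized in the review above can be illustrated with a minimal single-process sketch. It is not the code from this PR: every function name here (`group_max_tokens`, `pad_tokens`, `trim_tokens`, `fake_dispatch_combine`, `run_rank`) is hypothetical, plain Python lists stand in for tensors, `max()` stands in for an all-reduce with a MAX op across the expert-parallel group, and padding rows are assumed to carry all-zero routing probabilities.

```python
from typing import List, Tuple

Rows = List[List[float]]  # [num_tokens][feature] stand-in for a 2-D tensor


def group_max_tokens(tokens_per_rank: List[int]) -> int:
    """Stand-in for an all-reduce(MAX) over the per-rank token counts."""
    return max(tokens_per_rank)


def pad_tokens(hidden: Rows, probs: Rows, target: int,
               hidden_size: int, num_experts: int) -> Tuple[Rows, Rows]:
    """Pad hidden states and routing probabilities up to `target` rows.

    Assumption of this sketch: padding rows use zero routing probability,
    so they contribute nothing to any expert output.
    """
    num_pad = target - len(hidden)
    if num_pad < 0:
        raise ValueError("target is smaller than the local token count")
    padded_hidden = hidden + [[0.0] * hidden_size for _ in range(num_pad)]
    padded_probs = probs + [[0.0] * num_experts for _ in range(num_pad)]
    return padded_hidden, padded_probs


def trim_tokens(combined: Rows, num_real: int) -> Rows:
    """Drop the padding rows again after the combine step."""
    return combined[:num_real]


def fake_dispatch_combine(hidden: Rows, probs: Rows) -> Rows:
    """Hypothetical placeholder for dispatch -> experts -> combine.

    Each token is scaled by the sum of its routing probabilities.
    """
    return [[x * sum(p) for x in tok] for tok, p in zip(hidden, probs)]


def run_rank(hidden: Rows, probs: Rows, target: int,
             hidden_size: int, num_experts: int) -> Rows:
    num_real = len(hidden)
    h, p = pad_tokens(hidden, probs, target, hidden_size, num_experts)
    assert len(h) == target  # every rank now presents the same token count
    return trim_tokens(fake_dispatch_combine(h, p), num_real)


# Two simulated ranks with different packed token counts (3 and 1).
rank_hidden = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[7.0, 8.0]],
]
rank_probs = [
    [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]],
    [[0.25, 0.75]],
]
target = group_max_tokens([len(h) for h in rank_hidden])
outputs = [
    run_rank(h, p, target, hidden_size=2, num_experts=2)
    for h, p in zip(rank_hidden, rank_probs)
]
```

Each simulated rank ends up with exactly as many output rows as it had real tokens, while the placeholder dispatch step always sees the same group-wide row count. In a real multi-rank run the count would come from a collective rather than a local `max()`.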
Issue tracking
For PRs from open-source community contributors:
Linked issue:
Contribution process
Pre-checks
Code review
Feel free to message or comment the @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!
All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.
Step 1: Mark PR as "Ready for Review"
.github/CODEOWNERS.
Final Review might get declined if these requirements are not fulfilled.
Step 2: Final Review
For PRs that change `megatron/core`, once all expert reviewers have approved, the `Final Review` label is applied automatically and final reviewers are assigned.
For PRs outside `megatron/core`, this step is skipped.
Step 3: Approved
Once all required reviewers have approved, the `Approved` label is applied automatically.
Merge
Any member of mcore-engineers will be able to merge your PR.
For MRs into `dev` branch
The proposed review process for the `dev` branch is under active discussion.
MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.