[None][feat] Support DeepSeek-V4 model #14751
Draft
lfr-0531 wants to merge 59 commits into
Signed-off-by: Yuhang He <58161490+heyuhhh@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit 344b9e9) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
….llm_args (NVIDIA#13568) Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com> Co-authored-by: Yuewei Na <nv-yna@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit 085e2e1) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit 3303f1c) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit 7a7935c) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com> Co-authored-by: Mingyang Hao <mingyangHao@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit b3f45bb) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit eb85528) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com> Signed-off-by: Qi Zhang (qizh) <10434017+Tracin@users.noreply.github.com> Co-authored-by: OpenAI Codex <codex@openai.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit eb97997) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Filtered out disaggregation serving/router test changes for user/fanrongl/dsv4_model. Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…DIA#13628) Signed-off-by: Shicheng Li <shicli@nvidia.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit 096f86e) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…q_b_proj+norm) (NVIDIA#13629) Signed-off-by: Shicheng Li <shicli@nvidia.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit 7a8c7c9) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com> Co-authored-by: Mingyang Hao <mingyangHao@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit 7f0224b) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Yuhang He <58161490+heyuhhh@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit e43a9c8) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit 9ceb421) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…up shapes (NVIDIA#13657) Signed-off-by: Lance Liao <laliao@login-bia01.bia.clusters.nvidia.com> Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Co-authored-by: Lance Liao <laliao@login-bia01.bia.clusters.nvidia.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit 8e8f37d) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit fa1e55e) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com> Co-authored-by: Mingyang Hao <mingyangHao@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit d2ea57d) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Yuhang He <58161490+heyuhhh@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit 3f1313b) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…ed MoE (NVIDIA#13767) Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> (cherry picked from commit 1a52b72) Signed-off-by: Yuhang He <58161490+heyuhhh@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit 7a9b0ca) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…IA#13646) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Co-authored-by: peihengh <259410613+peihu-nv@users.noreply.github.com> (cherry picked from commit a3f3775) Signed-off-by: Yuhang He <58161490+heyuhhh@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit aa67e3e) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com> Co-authored-by: Mingyang Hao <mingyangHao@users.noreply.github.com> (cherry picked from commit b33d2dc) Signed-off-by: Yuhang He <58161490+heyuhhh@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit 0956712) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…r mode Signed-off-by: Yuhang He <58161490+heyuhhh@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit 044c13c) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Oseltamivir <bryansg2013@gmail.com> Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com> Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: Oseltamivir <bryansg2013@gmail.com> Co-authored-by: Mingyang Hao <mingyangHao@users.noreply.github.com> (cherry picked from commit 9aa3715) Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit e62b6c2) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
) Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com> Co-authored-by: Mingyang Hao <mingyangHao@users.noreply.github.com> (cherry picked from commit 7c83907) Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit 4a42788) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> Co-authored-by: Fanrong Li <lfr-0531@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit 5da90ad) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit 189e659) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> Co-authored-by: Fanrong Li <lfr-0531@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit 6504890) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> Co-authored-by: Fanrong Li <lfr-0531@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit e41dccb) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Yuhang He <58161490+heyuhhh@users.noreply.github.com> (cherry picked from commit f1ebebb) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Yuhang He <58161490+heyuhhh@users.noreply.github.com> (cherry picked from commit aeed21e4af5db02c7739c0b139089002a7ef8ff8) (cherry picked from commit 0778c48) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
NVIDIA#14124) Signed-off-by: Mingyang Hao <mingyangh@nvidia.com> (cherry picked from commit 82ebfc7) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Mingyang Hao <mingyangh@nvidia.com> (cherry picked from commit 4aad70b) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Co-authored-by: Fanrong Li <lfr-0531@users.noreply.github.com> Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> (cherry picked from commit e6339b5) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…cy test (NVIDIA#14212) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> Co-authored-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit 37c2e05) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> (cherry picked from commit 7fbe349) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com> Signed-off-by: Mingyang Hao <mingyangh@nvidia.com> Co-authored-by: Mingyang Hao <mingyangHao@users.noreply.github.com> (cherry picked from commit f833ad7) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Shicheng Li <shicli@nvidia.com> (cherry picked from commit d7d9036) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> Co-authored-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit 5af1511) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…IDIA#14238) Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> (cherry picked from commit 552de9e) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…ymmBuffer (NVIDIA#14213) Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com> (cherry picked from commit 5a72fb3) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…VIDIA#14219) Signed-off-by: longcheng-nv <243710427+longcheng-nv@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> (cherry picked from commit bef645d) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…ttn (NVIDIA#14321) Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> (cherry picked from commit 349f087) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Filtered out disaggregation and AutoDeploy-only changes for user/fanrongl/dsv4_model. Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> (cherry picked from commit 325152c) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…eam` (NVIDIA#14245) Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Signed-off-by: peihengh <259410613+peihu-nv@users.noreply.github.com> Co-authored-by: peihengh <259410613+peihu-nv@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…4241) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…ety (NVIDIA#14297) Signed-off-by: longcheng-nv <243710427+longcheng-nv@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
e1c544f to aa1c243
Collaborator (Author): /bot run --disable-fail-fast
Collaborator: PR_Github #51063 [ run ] triggered by Bot. Commit:
Collaborator: PR_Github #51063 [ run ] completed with state
aa1c243 to 2608a8d
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
2608a8d to 94a2f25
Collaborator (Author): /bot run --disable-fail-fast
Collaborator: PR_Github #51198 [ run ] triggered by Bot. Commit:
Collaborator: PR_Github #51198 [ run ] completed with state
@coderabbitai summary
Description
This PR adds end-to-end support for the DeepSeek-V4 model on the PyTorch backend.
DeepSeek-V4 introduces sparse-MLA attention, and this change brings in the full path:
the compressor / indexer kernels and sparse KV-cache management, the QK-norm fused op,
MoE enablement (MEGAMOE DeepGEMM, EPLB), tokenizer chat-template handling, and the
tool / reasoning parsers. The change set is intentionally large; please refer to the
commit history for the per-feature breakdown.
Test Coverage
unittest/_torch/attention/sparse/deepseek_v4/*: sparse MLA, cache manager, indices transform, o_proj, compressor kernel / module / tf32
unittest/_torch/modeling/test_modeling_deepseekv4.py
unittest/llmapi/test_deepseek_v4_tokenizer.py
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_bfloat16_2_model_mtp
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths.
If PR introduces API changes, an appropriate PR label is added.
Any new dependencies have been scanned for license and vulnerabilities.
CODEOWNERS updated if ownership changes.
Documentation updated as needed.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.