
[None][feat] Support DeepSeek-V4 model#14751

Draft
lfr-0531 wants to merge 59 commits into
NVIDIA:mainfrom
lfr-0531:user/fanrongl/dsv4_model


@lfr-0531
Collaborator

@lfr-0531 lfr-0531 commented May 29, 2026

@coderabbitai summary

Description

This PR adds end-to-end support for the DeepSeek-V4 model on the PyTorch backend.
DeepSeek-V4 introduces sparse-MLA attention, and this change brings in the full path:
the compressor / indexer kernels and sparse KV-cache management, the QK-norm fused op,
MoE enablement (MEGAMOE DeepGEMM, EPLB), tokenizer chat-template handling, and the
tool / reasoning parsers. The change set is intentionally large; please refer to the
commit history for the per-feature breakdown.
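For readers new to the idea, the following is a toy, pure-Python sketch of index-selected sparse attention. It assumes the indexer's role is to score cached positions so that attention runs over only the top-k of them, which the description above does not spell out. It is not the kernel code in this PR, and every name in it (`topk_indices`, `sparse_attention`, `index_scores`) is hypothetical.

```python
import math


def topk_indices(scores, k):
    """Return the indices of the k largest scores, in ascending index order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sparse_attention(query, keys, values, index_scores, k):
    """Attend only over the k cached positions selected by index_scores.

    index_scores stands in for whatever an indexer would produce; here it
    is just a list of floats supplied by the caller (an assumption).
    """
    selected = topk_indices(index_scores, k)
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in selected]

    # Numerically stable softmax over the selected positions only.
    peak = max(logits)
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    weights = [w / total for w in weights]

    dim = len(values[0])
    output = [
        sum(w * values[i][d] for w, i in zip(weights, selected))
        for d in range(dim)
    ]
    return selected, output
```

The point of the sketch is only that positions outside the selected set never enter the softmax, which is why the KV cache can be managed sparsely.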

Test Coverage

  • unittest/_torch/attention/sparse/deepseek_v4/* — sparse MLA, cache manager,
    indices transform, o_proj, compressor kernel / module / tf32
  • unittest/_torch/modeling/test_modeling_deepseekv4.py
  • unittest/llmapi/test_deepseek_v4_tokenizer.py
  • accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_bfloat16_2_model_mtp
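The targets listed above can be turned into pytest invocations with a small helper. This is a minimal sketch: the `tests_root` prefix is an assumption (the unit tests and the accuracy test may live under different roots in the repository), and `build_pytest_commands` is a hypothetical helper, not part of this PR.

```python
import shlex

# Test targets copied from the Test Coverage list above.
TEST_TARGETS = [
    "unittest/_torch/attention/sparse/deepseek_v4/",
    "unittest/_torch/modeling/test_modeling_deepseekv4.py",
    "unittest/llmapi/test_deepseek_v4_tokenizer.py",
    "accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_bfloat16_2_model_mtp",
]


def build_pytest_commands(targets, tests_root="tests", extra_args=("-v",)):
    """Return one pytest argv list per target.

    tests_root is an assumed directory prefix; adjust it per target if the
    repository layout differs.
    """
    root = tests_root.rstrip("/")
    return [["pytest", *extra_args, f"{root}/{target}"] for target in targets]


if __name__ == "__main__":
    # Print shell-quoted command lines so they can be reviewed before running.
    for argv in build_pytest_commands(TEST_TARGETS):
        print(shlex.join(argv))
```

Printing the commands rather than executing them keeps the helper safe to run on a machine without GPUs; each printed line can be pasted into a shell once the paths are confirmed.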

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why.

  • PR follows the TRT-LLM coding guidelines to the best of your knowledge.

  • Test cases are provided for new code paths.

  • If PR introduces API changes, an appropriate PR label is added.

  • Any new dependencies have been scanned for license and vulnerabilities.

  • CODEOWNERS updated if ownership changes.

  • Documentation updated as needed.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

heyuhhh and others added 29 commits May 29, 2026 16:14
Signed-off-by: Yuhang He <58161490+heyuhhh@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 344b9e9)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
….llm_args (NVIDIA#13568)

Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com>
Co-authored-by: Yuewei Na <nv-yna@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 085e2e1)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 3303f1c)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 7a7935c)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
Co-authored-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit b3f45bb)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit eb85528)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Signed-off-by: Qi Zhang (qizh) <10434017+Tracin@users.noreply.github.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit eb97997)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Filtered out disaggregation serving/router test changes for user/fanrongl/dsv4_model.

Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…DIA#13628)

Signed-off-by: Shicheng Li <shicli@nvidia.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 096f86e)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…q_b_proj+norm) (NVIDIA#13629)

Signed-off-by: Shicheng Li <shicli@nvidia.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 7a8c7c9)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
Co-authored-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 7f0224b)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Yuhang He <58161490+heyuhhh@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit e43a9c8)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 9ceb421)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…up shapes (NVIDIA#13657)

Signed-off-by: Lance Liao <laliao@login-bia01.bia.clusters.nvidia.com>
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lance Liao <laliao@login-bia01.bia.clusters.nvidia.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 8e8f37d)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit fa1e55e)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
Co-authored-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit d2ea57d)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Yuhang He <58161490+heyuhhh@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 3f1313b)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…ed MoE (NVIDIA#13767)

Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
(cherry picked from commit 1a52b72)
Signed-off-by: Yuhang He <58161490+heyuhhh@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 7a9b0ca)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…IA#13646)

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: peihengh <259410613+peihu-nv@users.noreply.github.com>
(cherry picked from commit a3f3775)
Signed-off-by: Yuhang He <58161490+heyuhhh@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit aa67e3e)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
Co-authored-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
(cherry picked from commit b33d2dc)
Signed-off-by: Yuhang He <58161490+heyuhhh@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 0956712)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…r mode

Signed-off-by: Yuhang He <58161490+heyuhhh@users.noreply.github.com>

Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 044c13c)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Oseltamivir <bryansg2013@gmail.com>
Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: Oseltamivir <bryansg2013@gmail.com>
Co-authored-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
(cherry picked from commit 9aa3715)
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit e62b6c2)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
)

Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
Co-authored-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
(cherry picked from commit 7c83907)
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 4a42788)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Co-authored-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 5da90ad)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 189e659)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Co-authored-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 6504890)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Co-authored-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit e41dccb)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Yuhang He <58161490+heyuhhh@users.noreply.github.com>
(cherry picked from commit f1ebebb)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Yuhang He <58161490+heyuhhh@users.noreply.github.com>
(cherry picked from commit aeed21e4af5db02c7739c0b139089002a7ef8ff8)
(cherry picked from commit 0778c48)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
mingyangHao and others added 19 commits May 29, 2026 16:15
NVIDIA#14124)

Signed-off-by: Mingyang Hao <mingyangh@nvidia.com>
(cherry picked from commit 82ebfc7)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Mingyang Hao <mingyangh@nvidia.com>
(cherry picked from commit 4aad70b)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Co-authored-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
(cherry picked from commit e6339b5)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…A#14000)

Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
(cherry picked from commit 12b8105)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…cy test (NVIDIA#14212)

Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Co-authored-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 37c2e05)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
(cherry picked from commit 7fbe349)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
Signed-off-by: Mingyang Hao <mingyangh@nvidia.com>
Co-authored-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
(cherry picked from commit f833ad7)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Shicheng Li <shicli@nvidia.com>
(cherry picked from commit d7d9036)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Co-authored-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 5af1511)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…IDIA#14238)

Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
(cherry picked from commit 552de9e)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…ymmBuffer (NVIDIA#14213)

Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
(cherry picked from commit 5a72fb3)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…VIDIA#14219)

Signed-off-by: longcheng-nv <243710427+longcheng-nv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit bef645d)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…ttn (NVIDIA#14321)

Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
(cherry picked from commit 349f087)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…#14299)

Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Co-authored-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 5d0a30e)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Filtered out disaggregation and AutoDeploy-only changes for user/fanrongl/dsv4_model.

Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
(cherry picked from commit 325152c)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…eam` (NVIDIA#14245)

Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: peihengh <259410613+peihu-nv@users.noreply.github.com>
Co-authored-by: peihengh <259410613+peihu-nv@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…4241)

Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
…ety (NVIDIA#14297)

Signed-off-by: longcheng-nv <243710427+longcheng-nv@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
@lfr-0531 lfr-0531 force-pushed the user/fanrongl/dsv4_model branch from e1c544f to aa1c243 Compare May 29, 2026 16:16
@lfr-0531 lfr-0531 added the api-compatible Accepted LLM API contract change that is backwards-compatible label May 29, 2026
@lfr-0531
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #51063 [ run ] triggered by Bot. Commit: aa1c243 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #51063 [ run ] completed with state SUCCESS. Commit: aa1c243
/LLM/main/L0_MergeRequest_PR pipeline #40508 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis


@lfr-0531 lfr-0531 force-pushed the user/fanrongl/dsv4_model branch from aa1c243 to 2608a8d Compare May 30, 2026 02:02
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
@lfr-0531 lfr-0531 force-pushed the user/fanrongl/dsv4_model branch from 2608a8d to 94a2f25 Compare May 30, 2026 02:14
@lfr-0531
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #51198 [ run ] triggered by Bot. Commit: 94a2f25 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #51198 [ run ] completed with state FAILURE. Commit: 94a2f25
/LLM/main/L0_MergeRequest_PR pipeline #40626 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis



Labels

api-compatible Accepted LLM API contract change that is backwards-compatible
