[TRTLLM-12316][feat] Integrate FP4 indexer for DSv4 #13575

Merged
lfr-0531 merged 8 commits into NVIDIA:feat/deepseek_v4 from mikeiovine:fp4-indexer on May 18, 2026

@mikeiovine
Collaborator

@mikeiovine mikeiovine commented Apr 28, 2026

Description

Integrate the FP4 indexer kernels from DeepGEMM into DSv4. Enable them by setting `DeepSeekV4SparseAttentionConfig(indexer_k_dtype="fp4")`.

Test Coverage

Existing tests plus a few new unit tests. I also manually tested accuracy and confirmed that there is no significant drop in gsm8k for either Flash or Pro (a small drop is expected due to the lower precision). Results for Flash:

Flash (FP8 Blockwise Indexer)
|Tasks|Version|     Filter     |n-shot|  Metric   |   | Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|------:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |94.9962|±  |0.6005|
|     |       |strict-match    |     5|exact_match|↑  |95.0720|±  |0.5962|

Flash (FP4 Indexer)
|Tasks|Version|     Filter     |n-shot|  Metric   |   | Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|------:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |94.3897|±  |0.6339|
|     |       |strict-match    |     5|exact_match|↑  |94.4655|±  |0.6298|
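
As a rough sanity check on the tables above, the sketch below compares the two flexible-extract scores against their reported standard errors. It assumes the two runs are independent and that the stderr values are on the same percentage scale as the scores; the helper name `score_gap_in_stderrs` is illustrative and not part of any evaluation harness.

```python
import math


def score_gap_in_stderrs(a, a_se, b, b_se):
    """Return (gap, combined_se, z) for two scores assumed to be independent."""
    gap = a - b
    # Standard error of a difference of independent estimates:
    # sqrt(a_se**2 + b_se**2)
    combined_se = math.hypot(a_se, b_se)
    return gap, combined_se, gap / combined_se


# gsm8k flexible-extract values copied from the two tables above
fp8_score, fp8_se = 94.9962, 0.6005
fp4_score, fp4_se = 94.3897, 0.6339

gap, combined_se, z = score_gap_in_stderrs(fp8_score, fp8_se, fp4_score, fp4_se)
print(f"gap={gap:.4f} combined_se={combined_se:.4f} z={z:.2f}")
```

Under these assumptions the FP8-to-FP4 gap is smaller than one combined standard error, which is consistent with describing the drop as small.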

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@lfr-0531 lfr-0531 force-pushed the feat/deepseek_v4 branch from 67b0b17 to 9c57516 May 7, 2026 16:29
@lfr-0531 lfr-0531 force-pushed the feat/deepseek_v4 branch from 0a93d10 to 118e7a5 May 14, 2026 07:44
@mikeiovine mikeiovine requested a review from lfr-0531 May 14, 2026 19:19
@mikeiovine mikeiovine changed the title from "[TRTLLM-12316][feat] Support FP4 indexer" to "[TRTLLM-12316][feat] Integrate FP4 indexer for DSv4" May 14, 2026
@mikeiovine mikeiovine marked this pull request as ready for review May 14, 2026 19:45
@mikeiovine mikeiovine requested review from a team as code owners May 14, 2026 19:45
@mikeiovine mikeiovine requested review from QiJune, liji-nv and schetlur-nv and removed request for a team May 14, 2026 19:45
@mikeiovine
Collaborator Author

/bot run --disable-fail-fast

Comment thread tests/unittest/_torch/attention/sparse/rocketkv/test_rocketkv.py
@tensorrt-cicd
Collaborator

PR_Github #48423 [ run ] triggered by Bot. Commit: 9f6f0d5 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48423 [ run ] completed with state SUCCESS. Commit: 9f6f0d5
/LLM/main/L0_MergeRequest_PR pipeline #38224 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Comment thread tensorrt_llm/_torch/attention_backend/sparse/deepseek_v4/cache_manager.py Outdated
Comment thread tensorrt_llm/_torch/attention_backend/sparse/deepseek_v4/cache_manager.py Outdated
Comment thread tensorrt_llm/_torch/model_config.py Outdated
Comment thread tensorrt_llm/_torch/modules/attention.py Outdated
Signed-off-by: Mike Iovine <miovine@nvidia.com>
@tensorrt-cicd
Collaborator

PR_Github #48610 [ run ] completed with state ABORTED. Commit: d6d986c

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48633 [ run ] completed with state FAILURE. Commit: ba16ae9
/LLM/main/L0_MergeRequest_PR pipeline #38415 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@mikeiovine
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48648 [ run ] triggered by Bot. Commit: ba16ae9 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48648 [ run ] completed with state FAILURE. Commit: ba16ae9
/LLM/main/L0_MergeRequest_PR pipeline #38429 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@mikeiovine
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48693 [ run ] triggered by Bot. Commit: ba16ae9 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48693 [ run ] completed with state FAILURE. Commit: ba16ae9
/LLM/main/L0_MergeRequest_PR pipeline #38467 completed with status: 'ABORTED'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Comment thread tests/integration/test_lists/waives.txt Outdated
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
@mikeiovine
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48701 [ run ] triggered by Bot. Commit: fded951 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48701 [ run ] completed with state FAILURE. Commit: fded951
/LLM/main/L0_MergeRequest_PR pipeline #38472 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@mikeiovine
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48730 [ run ] triggered by Bot. Commit: fded951 Link to invocation

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
@lfr-0531
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48734 [ run ] triggered by Bot. Commit: d07d383 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48730 [ run ] completed with state ABORTED. Commit: fded951

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48734 [ run ] completed with state SUCCESS. Commit: d07d383
/LLM/main/L0_MergeRequest_PR pipeline #38500 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@lfr-0531
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48772 [ run ] triggered by Bot. Commit: d07d383 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48772 [ run ] completed with state FAILURE. Commit: d07d383
/LLM/main/L0_MergeRequest_PR pipeline #38539 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@mikeiovine
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48782 [ run ] triggered by Bot. Commit: d07d383 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48782 [ run ] completed with state SUCCESS. Commit: d07d383
/LLM/main/L0_MergeRequest_PR pipeline #38547 completed with status: 'SUCCESS'

CI Report

Link to invocation

@lfr-0531 lfr-0531 merged commit 7fbe349 into NVIDIA:feat/deepseek_v4 May 18, 2026
6 checks passed
@mikeiovine mikeiovine deleted the fp4-indexer branch May 18, 2026 14:50
lfr-0531 added a commit to lfr-0531/TensorRT-LLM that referenced this pull request May 29, 2026
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
(cherry picked from commit 7fbe349)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
5 participants