[None][feat] Enable 2 DSv4 perf optimizations by default #14120

Merged
lfr-0531 merged 1 commit into NVIDIA:feat/deepseek_v4 from lishicheng1996-nv:feat/dsv4-enable-default-opts
May 18, 2026

Conversation

@lishicheng1996-nv
Collaborator

@lishicheng1996-nv lishicheng1996-nv commented May 14, 2026

Description

Flip the default of two previously opt-in DSv4 perf flags so users get them out of the box. Both remain user-disableable via the same env vars they always had.

| Default flipped | File | Source | Disable |
|---|---|---|---|
| TRTLLM_FUSED_FP8_QUANT_PACK "0" → "1" | tensorrt_llm/_torch/custom_ops/torch_custom_ops.py | PR #13628: fused FP8 1×128 quantize + UE8M0 pack on SM100 | TRTLLM_FUSED_FP8_QUANT_PACK=0 |
| TRTLLM_MLA_EXTRA_OVERLAP "0" → "1" | tensorrt_llm/_torch/modules/attention.py | PR #13629: MLA dependency-aware overlap on DSv4 | TRTLLM_MLA_EXTRA_OVERLAP=0 |
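
The default flip in the table above can be sketched as follows. This is a minimal illustration, not the code in the listed files: the helper `env_flag_enabled` is hypothetical, and the assumption that each flag is compared against the string "1" is inferred only from the "0" → "1" notation used in this description.

```python
import os


def env_flag_enabled(name: str, default: str = "1") -> bool:
    """Hypothetical helper: the flag counts as on only when its value is "1"."""
    return os.environ.get(name, default) == "1"


# With the variable unset, the new default ("1") turns the optimization on.
os.environ.pop("TRTLLM_FUSED_FP8_QUANT_PACK", None)
fused_default = env_flag_enabled("TRTLLM_FUSED_FP8_QUANT_PACK")

# Setting the variable to "0" opts out, as listed in the Disable column.
os.environ["TRTLLM_MLA_EXTRA_OVERLAP"] = "0"
overlap_disabled = env_flag_enabled("TRTLLM_MLA_EXTRA_OVERLAP")
```

Under this assumption, a user who never set either variable moves from the old path to the optimized one on upgrade, while a user who exports `=0` keeps the previous behavior.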

A third originally proposed flip (use_cute_dsl_blockscaling_bmm from False to True) is dropped from this PR and deferred. The cute_dsl FP8 BMM path also fires for DSv3 K/V absorption BMMs on Blackwell with FP8 block scales: the fp8_block_scaling_bmm_out dispatcher at tensorrt_llm/_torch/modules/attention.py:1161 is not gated on is_deepseek_v4, so flipping the default would silently change DSv3 perf behavior (the K/V BMMs would switch from torch.bmm against pre-dequantized bf16 weights to cute_dsl_fp8_bmm_blackwell consuming native FP8). DSv3 on Blackwell with FP8 block scales has not been re-benchmarked, so a separate PR will re-propose the flip once a DSv3 smoke test confirms no regression.
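
The concern about the ungated dispatcher can be illustrated with a toy model. Everything in this sketch is hypothetical: `ModelInfo`, `pick_bmm_path`, the `gate_on_dsv4` switch, and the returned labels are stand-ins written for illustration and do not reproduce the real `fp8_block_scaling_bmm_out` logic. Only the flag name `use_cute_dsl_blockscaling_bmm` and the `is_deepseek_v4` condition come from the description.

```python
from dataclasses import dataclass


@dataclass
class ModelInfo:
    # Hypothetical stand-in for whatever the real dispatcher inspects.
    is_deepseek_v4: bool
    is_blackwell_fp8_block_scales: bool


def pick_bmm_path(model: ModelInfo, use_cute_dsl_blockscaling_bmm: bool,
                  gate_on_dsv4: bool) -> str:
    """Return a label for the BMM path a toy dispatcher would choose."""
    eligible = (model.is_blackwell_fp8_block_scales
                and use_cute_dsl_blockscaling_bmm)
    if gate_on_dsv4:
        # A gated dispatcher would take the cute_dsl path only for DSv4.
        eligible = eligible and model.is_deepseek_v4
    return "cute_dsl_fp8_bmm" if eligible else "torch_bmm_bf16"


dsv3 = ModelInfo(is_deepseek_v4=False, is_blackwell_fp8_block_scales=True)

# Ungated, as the description says of the current dispatcher: flipping the
# flag from False to True changes the path that DSv3 takes.
ungated_before = pick_bmm_path(dsv3, False, gate_on_dsv4=False)
ungated_after = pick_bmm_path(dsv3, True, gate_on_dsv4=False)

# With a hypothetical DSv4 gate, DSv3 would be unaffected by the flip.
gated_after = pick_bmm_path(dsv3, True, gate_on_dsv4=True)
```

In this toy model the DSv3 path changes only in the ungated case, which is the situation the description cites as the reason for deferring the flip.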

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@lishicheng1996-nv lishicheng1996-nv requested review from a team as code owners May 14, 2026 07:03
@lishicheng1996-nv lishicheng1996-nv requested review from dongxuy04, pengbowang-nv and yizhang-nv and removed request for a team May 14, 2026 07:03
@lfr-0531 lfr-0531 force-pushed the feat/deepseek_v4 branch from 0a93d10 to 118e7a5 Compare May 14, 2026 07:44
@lfr-0531 lfr-0531 requested review from a team as code owners May 14, 2026 07:44
@lfr-0531 lfr-0531 requested review from mzweilz and yiqingy0 and removed request for a team May 14, 2026 07:44
@lishicheng1996-nv lishicheng1996-nv force-pushed the feat/dsv4-enable-default-opts branch from 340938e to aae02ea Compare May 14, 2026 07:53
@lishicheng1996-nv
Collaborator Author

/bot run --add-multi-gpu-test

@tensorrt-cicd
Collaborator

PR_Github #48336 [ run ] triggered by Bot. Commit: aae02ea Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48336 [ run ] completed with state SUCCESS. Commit: aae02ea
/LLM/main/L0_MergeRequest_PR pipeline #38144 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@lfr-0531
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48399 [ run ] triggered by Bot. Commit: aae02ea Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48399 [ run ] completed with state SUCCESS. Commit: aae02ea
/LLM/main/L0_MergeRequest_PR pipeline #38202 completed with status: 'FAILURE'

@lishicheng1996-nv lishicheng1996-nv force-pushed the feat/dsv4-enable-default-opts branch 2 times, most recently from 30ef664 to a871c4e Compare May 15, 2026 01:06
@lishicheng1996-nv
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48471 [ run ] triggered by Bot. Commit: a871c4e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48471 [ run ] completed with state SUCCESS. Commit: a871c4e
/LLM/main/L0_MergeRequest_PR pipeline #38266 completed with status: 'FAILURE'

@lishicheng1996-nv
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48552 [ run ] triggered by Bot. Commit: a871c4e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48552 [ run ] completed with state SUCCESS. Commit: a871c4e
/LLM/main/L0_MergeRequest_PR pipeline #38341 completed with status: 'FAILURE'

@lishicheng1996-nv
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48724 [ run ] triggered by Bot. Commit: a871c4e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48724 [ run ] completed with state SUCCESS. Commit: a871c4e
/LLM/main/L0_MergeRequest_PR pipeline #38493 completed with status: 'FAILURE'

@lfr-0531
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48735 [ run ] triggered by Bot. Commit: a871c4e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48735 [ run ] completed with state SUCCESS. Commit: a871c4e
/LLM/main/L0_MergeRequest_PR pipeline #38501 completed with status: 'FAILURE'

@lfr-0531
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48773 [ run ] triggered by Bot. Commit: a871c4e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48773 [ run ] completed with state SUCCESS. Commit: a871c4e
/LLM/main/L0_MergeRequest_PR pipeline #38536 completed with status: 'SUCCESS'

CI Report

Link to invocation

Switch defaults so DeepSeek-V4 inference paths run with two previously
opt-in optimizations enabled out of the box. Both remain user-disableable
via the same env vars.

  1. PR NVIDIA#13628 fused FP8 1x128 quantize + UE8M0 pack on SM100
     - tensorrt_llm/_torch/custom_ops/torch_custom_ops.py
     - Env: TRTLLM_FUSED_FP8_QUANT_PACK (default '0' -> '1')
     - Disable: TRTLLM_FUSED_FP8_QUANT_PACK=0
  2. PR NVIDIA#13629 MLA dependency-aware overlap on DSv4
     - tensorrt_llm/_torch/modules/attention.py
     - Env: TRTLLM_MLA_EXTRA_OVERLAP (default '0' -> '1')
     - Disable: TRTLLM_MLA_EXTRA_OVERLAP=0

The third originally-proposed flip (use_cute_dsl_blockscaling_bmm) is
dropped from this PR. The cute_dsl FP8 BMM path is also invoked for
DSv3 K/V absorption BMMs on Blackwell + FP8 block-scales (the
fp8_block_scaling_bmm_out dispatcher at attention.py:1161 is not gated
on is_deepseek_v4), so flipping the default would change DSv3 perf
behavior silently. Defer that flip until a DSv3 Blackwell-FP8 smoke
confirms no regression.

Signed-off-by: Shicheng Li <shicli@nvidia.com>
@lishicheng1996-nv lishicheng1996-nv force-pushed the feat/dsv4-enable-default-opts branch from a871c4e to c2a0ed6 Compare May 18, 2026 07:16
@lishicheng1996-nv lishicheng1996-nv changed the title [None][feat] Enable 3 DSv4 perf optimizations by default [None][feat] Enable 2 DSv4 perf optimizations by default May 18, 2026
@lfr-0531 lfr-0531 merged commit d7d9036 into NVIDIA:feat/deepseek_v4 May 18, 2026
7 checks passed
Thachnh added a commit to deepinfra/TensorRT-LLM that referenced this pull request May 19, 2026
…uashed)

Enables by default:
- TRTLLM_FUSED_FP8_QUANT_PACK
- TRTLLM_MLA_EXTRA_OVERLAP
- use_cute_dsl_blockscaling_bmm

Conflicts resolved by keeping pr14120's new defaults but preserving the
fused CUDA q-norm path from NVIDIA#13975 (the older inline reshape path is
gone).

Source: NVIDIA#14120 (open PR)
lfr-0531 pushed a commit to lfr-0531/TensorRT-LLM that referenced this pull request May 29, 2026
Signed-off-by: Shicheng Li <shicli@nvidia.com>
(cherry picked from commit d7d9036)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>

3 participants