[None][feat] Enable 2 DSv4 perf optimizations by default by lishicheng1996-nv · Pull Request #14120 · NVIDIA/TensorRT-LLM

lishicheng1996-nv · 2026-05-14T07:03:52Z

Description

Flip the default of two previously opt-in DSv4 perf flags so users get them out of the box. Both remain user-disableable via the same env vars they always had.

| Default flipped | File | Source | Disable |
| --- | --- | --- | --- |
| `TRTLLM_FUSED_FP8_QUANT_PACK` `"0" → "1"` | `tensorrt_llm/_torch/custom_ops/torch_custom_ops.py` | PR #13628: fused FP8 1×128 quantize + UE8M0 pack on SM100 | `TRTLLM_FUSED_FP8_QUANT_PACK=0` |
| `TRTLLM_MLA_EXTRA_OVERLAP` `"0" → "1"` | `tensorrt_llm/_torch/modules/attention.py` | PR #13629: MLA dependency-aware overlap on DSv4 | `TRTLLM_MLA_EXTRA_OVERLAP=0` |
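
To make the effect of a default flip concrete, here is a minimal sketch. The helper name `env_flag_enabled` and its parsing rule are assumptions for illustration, not the actual code in `torch_custom_ops.py` or `attention.py`. It only shows that an unset variable now resolves to "enabled", while an explicit `=0` still turns the feature off.

```python
import os


def env_flag_enabled(name: str, default: str = "1") -> bool:
    """Return True when the env var (or its default) equals "1".

    Hypothetical helper for illustration; the real parsing in
    TensorRT-LLM may differ.
    """
    return os.environ.get(name, default) == "1"


# Unset variable: the new default "1" applies, so the optimization is on.
os.environ.pop("TRTLLM_FUSED_FP8_QUANT_PACK", None)
fused_default = env_flag_enabled("TRTLLM_FUSED_FP8_QUANT_PACK")

# Explicit opt-out: the user exports the variable as "0".
os.environ["TRTLLM_MLA_EXTRA_OVERLAP"] = "0"
overlap_opt_out = env_flag_enabled("TRTLLM_MLA_EXTRA_OVERLAP")

print(fused_default, overlap_opt_out)  # True False
```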

A third originally proposed flip (`use_cute_dsl_blockscaling_bmm` from `False` to `True`) is dropped from this PR and deferred. The cute_dsl FP8 BMM path also fires for DSv3 K/V absorption BMMs on Blackwell + FP8 block-scales: the `fp8_block_scaling_bmm_out` dispatcher at `tensorrt_llm/_torch/modules/attention.py:1161` is not gated on `is_deepseek_v4`, so flipping the default would silently change DSv3 perf behavior (the K/V BMMs would switch from `torch.bmm` against pre-dequantized bf16 weights to `cute_dsl_fp8_bmm_blackwell` consuming native FP8). DSv3 on Blackwell with FP8 block-scales hasn't been re-benchmarked, so a separate PR will re-propose the flip after a DSv3 smoke test confirms no regression.
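
The gating concern can be sketched with a toy dispatcher. Everything here is a hypothetical stand-in: `ToyConfig`, `pick_bmm_path_ungated`, and `pick_bmm_path_gated` are not TensorRT-LLM APIs, and the returned strings merely name the two paths mentioned above. The sketch shows why an ungated dispatcher lets a flipped default reach DSv3.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    # Hypothetical stand-ins for the real model/config attributes.
    is_deepseek_v4: bool
    is_blackwell_fp8_block_scales: bool
    use_cute_dsl_blockscaling_bmm: bool = False


def pick_bmm_path_ungated(cfg: ToyConfig) -> str:
    """Mimics a dispatcher that never checks is_deepseek_v4."""
    if cfg.is_blackwell_fp8_block_scales and cfg.use_cute_dsl_blockscaling_bmm:
        return "cute_dsl_fp8_bmm_blackwell"
    return "torch.bmm"


def pick_bmm_path_gated(cfg: ToyConfig) -> str:
    """Hypothetical variant that only lets DSv4 reach the cute_dsl path."""
    if cfg.is_deepseek_v4:
        return pick_bmm_path_ungated(cfg)
    return "torch.bmm"


# A DSv3 model on Blackwell + FP8 block-scales, after the default is flipped.
dsv3_after_flip = ToyConfig(
    is_deepseek_v4=False,
    is_blackwell_fp8_block_scales=True,
    use_cute_dsl_blockscaling_bmm=True,
)

print(pick_bmm_path_ungated(dsv3_after_flip))  # cute_dsl_fp8_bmm_blackwell
print(pick_bmm_path_gated(dsv3_after_flip))    # torch.bmm
```

With the ungated version, the DSv3 configuration changes path purely because the default changed, which is the silent behavior change this PR avoids.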

Test Coverage

Existing DSv4 unit + integration tests already exercise both flipped defaults (they were the opt-in path in PR #13628 and PR #13629). No new tests needed for the flips themselves.
Manual bench: DSv4-Pro 8k-1k on GB300 with both defaults on (the bench configuration we've been running for the past two weeks) is the practical regression baseline; no perf drop observed.
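
For anyone who wants to compare against the old behavior, a small context manager can temporarily force a flag off. This is a sketch under assumptions: `env_override` is a hypothetical helper, and it only helps if the flag is read at call time. If the library reads the variable at import time, the override would have to be in place before the import.

```python
import os
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def env_override(name: str, value: Optional[str]) -> Iterator[None]:
    """Temporarily set (or unset, when value is None) an environment variable."""
    previous = os.environ.get(name)
    try:
        if value is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = value
        yield
    finally:
        # Restore whatever was there before, including "not set".
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


FLAGS = ("TRTLLM_FUSED_FP8_QUANT_PACK", "TRTLLM_MLA_EXTRA_OVERLAP")

before = {flag: os.environ.get(flag) for flag in FLAGS}
seen = []
for flag in FLAGS:
    with env_override(flag, "0"):
        # A real comparison would run the benchmark or test body here.
        seen.append((flag, os.environ[flag]))
after = {flag: os.environ.get(flag) for flag in FLAGS}
```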

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

lishicheng1996-nv · 2026-05-14T08:06:05Z

/bot run --add-multi-gpu-test

tensorrt-cicd · 2026-05-14T08:13:23Z

PR_Github #48336 [ run ] triggered by Bot. Commit: aae02ea Link to invocation

tensorrt-cicd · 2026-05-14T11:26:26Z

PR_Github #48336 [ run ] completed with state SUCCESS. Commit: aae02ea
/LLM/main/L0_MergeRequest_PR pipeline #38144 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

lfr-0531 · 2026-05-14T15:51:07Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-14T15:57:10Z

PR_Github #48399 [ run ] triggered by Bot. Commit: aae02ea Link to invocation

tensorrt-cicd · 2026-05-14T19:44:35Z

PR_Github #48399 [ run ] completed with state SUCCESS. Commit: aae02ea
/LLM/main/L0_MergeRequest_PR pipeline #38202 completed with status: 'FAILURE'

lishicheng1996-nv · 2026-05-15T01:08:27Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-15T01:15:47Z

PR_Github #48471 [ run ] triggered by Bot. Commit: a871c4e Link to invocation

tensorrt-cicd · 2026-05-15T06:29:11Z

PR_Github #48471 [ run ] completed with state SUCCESS. Commit: a871c4e
/LLM/main/L0_MergeRequest_PR pipeline #38266 completed with status: 'FAILURE'

lishicheng1996-nv · 2026-05-15T06:57:13Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-15T07:04:12Z

PR_Github #48552 [ run ] triggered by Bot. Commit: a871c4e Link to invocation

tensorrt-cicd · 2026-05-15T09:41:21Z

PR_Github #48552 [ run ] completed with state SUCCESS. Commit: a871c4e
/LLM/main/L0_MergeRequest_PR pipeline #38341 completed with status: 'FAILURE'

lishicheng1996-nv · 2026-05-17T01:47:01Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-17T01:54:14Z

PR_Github #48724 [ run ] triggered by Bot. Commit: a871c4e Link to invocation

tensorrt-cicd · 2026-05-17T04:29:14Z

PR_Github #48724 [ run ] completed with state SUCCESS. Commit: a871c4e
/LLM/main/L0_MergeRequest_PR pipeline #38493 completed with status: 'FAILURE'

lfr-0531 · 2026-05-17T05:07:33Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-17T05:13:07Z

PR_Github #48735 [ run ] triggered by Bot. Commit: a871c4e Link to invocation

tensorrt-cicd · 2026-05-17T06:08:45Z

PR_Github #48735 [ run ] completed with state SUCCESS. Commit: a871c4e
/LLM/main/L0_MergeRequest_PR pipeline #38501 completed with status: 'FAILURE'

lfr-0531 · 2026-05-17T16:00:37Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-17T16:07:24Z

PR_Github #48773 [ run ] triggered by Bot. Commit: a871c4e Link to invocation

tensorrt-cicd · 2026-05-17T16:39:38Z

PR_Github #48773 [ run ] completed with state SUCCESS. Commit: a871c4e
/LLM/main/L0_MergeRequest_PR pipeline #38536 completed with status: 'SUCCESS'

CI Report

Link to invocation

Switch defaults so DeepSeek-V4 inference paths run with two previously opt-in optimizations enabled out of the box. Both remain user-disableable via the same env vars.

1. PR NVIDIA#13628 fused FP8 1x128 quantize + UE8M0 pack on SM100
   - tensorrt_llm/_torch/custom_ops/torch_custom_ops.py
   - Env: TRTLLM_FUSED_FP8_QUANT_PACK (default '0' -> '1')
   - Disable: TRTLLM_FUSED_FP8_QUANT_PACK=0
2. PR NVIDIA#13629 MLA dependency-aware overlap on DSv4
   - tensorrt_llm/_torch/modules/attention.py
   - Env: TRTLLM_MLA_EXTRA_OVERLAP (default '0' -> '1')
   - Disable: TRTLLM_MLA_EXTRA_OVERLAP=0

The third originally-proposed flip (use_cute_dsl_blockscaling_bmm) is dropped from this PR. The cute_dsl FP8 BMM path is also invoked for DSv3 K/V absorption BMMs on Blackwell + FP8 block-scales (the fp8_block_scaling_bmm_out dispatcher at attention.py:1161 is not gated on is_deepseek_v4), so flipping the default would change DSv3 perf behavior silently. Defer that flip until a DSv3 Blackwell-FP8 smoke confirms no regression.

Signed-off-by: Shicheng Li <shicli@nvidia.com>

…uashed)

Enables by default:
- TRTLLM_FUSED_FP8_QUANT_PACK
- TRTLLM_MLA_EXTRA_OVERLAP
- use_cute_dsl_blockscaling_bmm

Conflicts resolved by keeping pr14120's new defaults but preserving the fused CUDA q-norm path from NVIDIA#13975 (the older inline reshape path is gone).

Source: NVIDIA#14120 (open PR)

Signed-off-by: Shicheng Li <shicli@nvidia.com>
(cherry picked from commit d7d9036)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>

lishicheng1996-nv requested review from a team as code owners May 14, 2026 07:03

lishicheng1996-nv requested review from dongxuy04, pengbowang-nv and yizhang-nv and removed request for a team May 14, 2026 07:03

github-actions Bot assigned lishicheng1996-nv May 14, 2026

lfr-0531 force-pushed the feat/deepseek_v4 branch from 0a93d10 to 118e7a5 Compare May 14, 2026 07:44

lfr-0531 requested review from a team as code owners May 14, 2026 07:44

lfr-0531 requested review from mzweilz and yiqingy0 and removed request for a team May 14, 2026 07:44

lishicheng1996-nv force-pushed the feat/dsv4-enable-default-opts branch from 340938e to aae02ea Compare May 14, 2026 07:53

lishicheng1996-nv force-pushed the feat/dsv4-enable-default-opts branch 2 times, most recently from 30ef664 to a871c4e Compare May 15, 2026 01:06

lishicheng1996-nv force-pushed the feat/dsv4-enable-default-opts branch from a871c4e to c2a0ed6 Compare May 18, 2026 07:16

lishicheng1996-nv changed the title ~~[None][feat] Enable 3 DSv4 perf optimizations by default~~ [None][feat] Enable 2 DSv4 perf optimizations by default May 18, 2026

lfr-0531 merged commit d7d9036 into NVIDIA:feat/deepseek_v4 May 18, 2026
7 checks passed

lfr-0531 approved these changes May 18, 2026

lfr-0531 added the deepseek-v4 label May 19, 2026

3 participants
