[None][fix] Use compressed lengths for DeepSeek-V4 indexer by mingyangHao · Pull Request #13802 · NVIDIA/TensorRT-LLM

mingyangHao · 2026-05-06T07:38:50Z

Description

Fix DeepSeek-V4 DSA indexer metadata handling for compressed KV cache.

This PR updates the indexer compression-ratio handling so both 0 and 1 are treated as disabled compression, matching model configs where 0 means no
compressor. For compressed paths, indexer KV lengths and max sequence length remain in compressed KV-token space.
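
A minimal sketch of the ratio handling described above, not the actual code in dsa.py. The helper names are hypothetical, and floor division is an assumption: the PR text does not say how compressed lengths are rounded.

```python
from typing import List


def is_compression_disabled(compress_ratio: int) -> bool:
    # Hypothetical helper: both 0 and 1 mean "no compressor".
    return compress_ratio in (0, 1)


def indexer_kv_lens(kv_lens: List[int], compress_ratio: int) -> List[int]:
    # Uncompressed path: lengths stay in raw token space.
    if is_compression_disabled(compress_ratio):
        return list(kv_lens)
    # Compressed path: lengths are expressed in compressed KV-token space.
    # Assumption: floor division; the real rounding rule may differ.
    return [n // compress_ratio for n in kv_lens]


def indexer_max_seq_len(max_seq_len: int, compress_ratio: int) -> int:
    # The max sequence length follows the same space as the KV lengths.
    if is_compression_disabled(compress_ratio):
        return max_seq_len
    return max_seq_len // compress_ratio
```

With these definitions, `indexer_kv_lens([8, 13], 0)` and `indexer_kv_lens([8, 13], 1)` both return the input lengths unchanged, while a ratio of 4 yields lengths in compressed space.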

It also avoids recomputing get_indexer_kv_lens() in the decode forward path by reusing the generation indexer KV lengths prepared together with scheduler
metadata. This prevents introducing an extra element-wise CUDA division kernel in the hot path.
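
The precompute-and-reuse pattern can be illustrated with a toy metadata class. This is a sketch under stated assumptions: `ToyIndexerMetadata`, `prepare`, and `decode_forward` are hypothetical names, plain Python lists stand in for CUDA tensors, and the counter exists only to show that the length computation runs once per prepare step rather than once per decode call.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyIndexerMetadata:
    # Hypothetical stand-in for the real indexer metadata object.
    compress_ratio: int
    kv_lens: List[int] = field(default_factory=list)
    gen_indexer_kv_lens: Optional[List[int]] = None
    num_len_computations: int = 0  # instrumentation for this sketch only

    def _compute_indexer_kv_lens(self) -> List[int]:
        # In the real code this would be an element-wise GPU operation.
        self.num_len_computations += 1
        if self.compress_ratio in (0, 1):
            return list(self.kv_lens)
        # Assumption: floor division into compressed KV-token space.
        return [n // self.compress_ratio for n in self.kv_lens]

    def prepare(self, kv_lens: List[int]) -> None:
        # Compute the generation lengths once, alongside other metadata.
        self.kv_lens = list(kv_lens)
        self.gen_indexer_kv_lens = self._compute_indexer_kv_lens()


def decode_forward(meta: ToyIndexerMetadata) -> List[int]:
    # Hot path: reuse the cached lengths instead of recomputing them.
    if meta.gen_indexer_kv_lens is None:
        raise RuntimeError("prepare() must run before decode_forward()")
    return meta.gen_indexer_kv_lens
```

Calling `prepare` once and `decode_forward` several times leaves `num_len_computations` at 1, which mirrors the goal of keeping the extra division out of the decode path.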

Test Coverage

pre-commit run --show-diff-on-failure --files tensorrt_llm/_torch/attention_backend/sparse/dsa.py tests/unittest/_torch/attention/sparse/dsa/test_dsa_indexer.py
B300 container:
- test_indexer_compress_ratio_zero_or_one_means_uncompressed
- test_indexer_decode_with_paged_kv_cache

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

mingyangHao · 2026-05-06T10:51:28Z

/bot run

lfr-0531 · 2026-05-07T00:58:11Z

/bot run

tensorrt-cicd · 2026-05-07T01:05:12Z

PR_Github #47059 [ run ] triggered by Bot. Commit: e606f5d Link to invocation

lfr-0531 · 2026-05-07T01:05:28Z

/bot kill

Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com>

lfr-0531 · 2026-05-07T03:10:23Z

/bot run

tensorrt-cicd · 2026-05-07T03:17:37Z

PR_Github #47093 [ run ] triggered by Bot. Commit: 5f2f2fb Link to invocation

tensorrt-cicd · 2026-05-07T06:30:52Z

PR_Github #47093 [ run ] completed with state SUCCESS. Commit: 5f2f2fb
/LLM/main/L0_MergeRequest_PR pipeline #37061 completed with status: 'SUCCESS'

CI Report

Link to invocation

Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com> Co-authored-by: Mingyang Hao <mingyangHao@users.noreply.github.com> (cherry picked from commit 7c83907) Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>

Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com> Co-authored-by: Mingyang Hao <mingyangHao@users.noreply.github.com> (cherry picked from commit 7c83907) Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>

) Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com> Co-authored-by: Mingyang Hao <mingyangHao@users.noreply.github.com> (cherry picked from commit 7c83907) Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit 4a42788) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>

mingyangHao requested review from jiaganc and lfr-0531 May 6, 2026 07:38

mingyangHao self-assigned this May 6, 2026

mingyangHao requested a review from a team as a code owner May 6, 2026 07:38

mingyangHao requested review from pengbowang-nv and removed request for a team May 6, 2026 07:38

mingyangHao added the deepseek-v4 label May 6, 2026

mingyangHao removed the request for review from pengbowang-nv May 6, 2026 07:38

mingyangHao requested a review from a team as a code owner May 6, 2026 08:01

lfr-0531 reviewed May 6, 2026

Comment thread tensorrt_llm/_torch/attention_backend/sparse/dsa.py Outdated

mingyangHao force-pushed the mingyangh/deepseek-v4-indexer-compressed-cache-fix branch from 7672e13 to e606f5d Compare May 6, 2026 09:37

lfr-0531 approved these changes May 6, 2026

mingyangHao added 4 commits May 6, 2026 19:17

[None][fix] Use compressed lengths for DeepSeek-V4 indexer

93f926d

Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com>

[None][test] Add DeepSeek-V4 chunked prefill coverage

f0a0ad0

Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com>

[None][fix] Support disabled DSA indexer compression

b3a8fb6

Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com>

[None][fix] Precompute DSA indexer KV lengths

5f2f2fb

Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com>

mingyangHao force-pushed the mingyangh/deepseek-v4-indexer-compressed-cache-fix branch from e606f5d to 5f2f2fb Compare May 7, 2026 02:20

lfr-0531 merged commit 7c83907 into NVIDIA:feat/deepseek_v4 May 7, 2026
5 checks passed
