
[None][fix] Use compressed lengths for DeepSeek-V4 indexer#13802

Merged
lfr-0531 merged 4 commits into NVIDIA:feat/deepseek_v4 from mingyangHao:mingyangh/deepseek-v4-indexer-compressed-cache-fix on May 7, 2026

Conversation

mingyangHao (Collaborator) commented May 6, 2026

@coderabbitai summary

Description

Fix DeepSeek-V4 DSA indexer metadata handling for compressed KV cache.

This PR updates the indexer compression-ratio handling so that both 0 and 1 are treated as "compression disabled", matching model configs in which 0 means there is no compressor. On compressed paths, the indexer KV lengths and the maximum sequence length stay in compressed KV-token space.
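The compression-ratio rule above can be sketched in plain Python. This is a minimal illustration, not the dsa.py code: the helpers effective_compress_ratio, is_compressed, compressed_kv_lens and indexer_metadata are hypothetical, and the sketch assumes each compressed KV entry covers `ratio` complete tokens (floor division), which may differ from how the real compressor handles a partial trailing block.

```python
from typing import List, Tuple


def effective_compress_ratio(compress_ratio: int) -> int:
    """Hypothetical helper: normalize the configured ratio.

    0 (no compressor) and 1 both mean compression is disabled,
    represented here as an effective ratio of 1.
    """
    if compress_ratio < 0:
        raise ValueError(f"compress_ratio must be non-negative, got {compress_ratio}")
    return 1 if compress_ratio in (0, 1) else compress_ratio


def is_compressed(compress_ratio: int) -> bool:
    # Only ratios greater than 1 take the compressed path.
    return effective_compress_ratio(compress_ratio) > 1


def compressed_kv_lens(seq_lens: List[int], compress_ratio: int) -> List[int]:
    """Convert per-request token counts into compressed KV-token counts.

    Assumption: a partial trailing block is not counted (floor division).
    """
    ratio = effective_compress_ratio(compress_ratio)
    return [n // ratio for n in seq_lens]


def indexer_metadata(seq_lens: List[int], compress_ratio: int) -> Tuple[List[int], int]:
    kv_lens = compressed_kv_lens(seq_lens, compress_ratio)
    # The max is taken over the compressed lengths, so it lives in the
    # same compressed KV-token space as kv_lens.
    max_kv_len = max(kv_lens, default=0)
    return kv_lens, max_kv_len
```

With seq_lens [10, 17, 3] and ratio 4 this yields ([2, 4, 0], 4), while ratios 0 and 1 both return the uncompressed lengths ([10, 17, 3], 17).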

It also avoids recomputing get_indexer_kv_lens() in the decode forward path by reusing the generation-phase indexer KV lengths that are prepared together with the scheduler metadata. This keeps an extra element-wise CUDA division kernel out of the hot path.
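The decode-path change follows a compute-once, reuse-many pattern. The sketch below models it with plain Python lists standing in for GPU tensors; DecodeMetadataSketch and its methods are hypothetical names that do not reproduce the signature of get_indexer_kv_lens(), and the counter only illustrates that the per-element division runs once in prepare() rather than on every decode forward call.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DecodeMetadataSketch:
    """Hypothetical stand-in for per-step attention metadata (not the TensorRT-LLM class)."""
    gen_seq_lens: List[int]
    compress_ratio: int
    # Filled once in prepare(); reused by every decode forward call.
    gen_indexer_kv_lens: Optional[List[int]] = None
    num_length_computations: int = 0

    def _compute_indexer_kv_lens(self) -> List[int]:
        # Per-element division; on GPU tensors this would launch an
        # element-wise kernel, which is what the hot path should avoid.
        self.num_length_computations += 1
        ratio = 1 if self.compress_ratio in (0, 1) else self.compress_ratio
        return [n // ratio for n in self.gen_seq_lens]

    def prepare(self) -> None:
        # Scheduler metadata would also be built in this step; only the
        # indexer length computation is modeled here.
        self.gen_indexer_kv_lens = self._compute_indexer_kv_lens()

    def decode_forward_kv_lens(self) -> List[int]:
        # Hot path: read the cached lengths instead of dividing again.
        if self.gen_indexer_kv_lens is None:
            raise RuntimeError("prepare() must run before the decode forward pass")
        return self.gen_indexer_kv_lens
```

Calling prepare() once and then decode_forward_kv_lens() on each step returns the same cached list while num_length_computations stays at 1.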

Test Coverage

  • pre-commit run --show-diff-on-failure --files tensorrt_llm/_torch/attention_backend/sparse/dsa.py tests/unittest/_torch/attention/sparse/dsa/test_dsa_indexer.py
  • B300 container:
    • test_indexer_compress_ratio_zero_or_one_means_uncompressed
    • test_indexer_decode_with_paged_kv_cache

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

mingyangHao requested review from jiaganc and lfr-0531 May 6, 2026 07:38
mingyangHao self-assigned this May 6, 2026
mingyangHao requested a review from a team as a code owner May 6, 2026 07:38
mingyangHao requested review from pengbowang-nv and removed request for a team May 6, 2026 07:38
mingyangHao removed the request for review from pengbowang-nv May 6, 2026 07:38
mingyangHao requested a review from a team as a code owner May 6, 2026 08:01
Review comment thread on tensorrt_llm/_torch/attention_backend/sparse/dsa.py (outdated)
mingyangHao force-pushed the mingyangh/deepseek-v4-indexer-compressed-cache-fix branch from 7672e13 to e606f5d May 6, 2026 09:37
mingyangHao (Collaborator, Author):

/bot run

lfr-0531 (Collaborator) commented May 7, 2026:

/bot run

tensorrt-cicd (Collaborator):

PR_Github #47059 [ run ] triggered by Bot. Commit: e606f5d

lfr-0531 (Collaborator) commented May 7, 2026:

/bot kill

Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
mingyangHao force-pushed the mingyangh/deepseek-v4-indexer-compressed-cache-fix branch from e606f5d to 5f2f2fb May 7, 2026 02:20
lfr-0531 (Collaborator) commented May 7, 2026:

/bot run

tensorrt-cicd (Collaborator):

PR_Github #47093 [ run ] triggered by Bot. Commit: 5f2f2fb

tensorrt-cicd (Collaborator):

PR_Github #47093 [ run ] completed with state SUCCESS. Commit: 5f2f2fb
/LLM/main/L0_MergeRequest_PR pipeline #37061 completed with status: 'SUCCESS'

lfr-0531 merged commit 7c83907 into NVIDIA:feat/deepseek_v4 May 7, 2026
5 checks passed
lfr-0531 pushed a commit that referenced this pull request May 7, 2026
Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
Co-authored-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
(cherry picked from commit 7c83907)
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
lfr-0531 pushed a commit that referenced this pull request May 14, 2026
Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
Co-authored-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
(cherry picked from commit 7c83907)
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
lfr-0531 pushed a commit to lfr-0531/TensorRT-LLM that referenced this pull request May 29, 2026

Signed-off-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
Co-authored-by: Mingyang Hao <mingyangHao@users.noreply.github.com>
(cherry picked from commit 7c83907)
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 4a42788)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>