[https://nvbugs/6032056][fix] Clamp block indices to prevent OOB in DSA with MTP by sunnyqgg · Pull Request #12657 · NVIDIA/TensorRT-LLM

sunnyqgg · 2026-04-01T11:31:31Z

Summary

Fix out-of-bounds block index access in `_compute_slot_mappings` when using DSA (DeepSeek Sparse Attention) with MTP (Multi-Token Prediction) during CUDA graph capture/replay.
When a batch is padded for CUDA graphs, stale token-to-sequence mappings can produce block indices that exceed `block_offsets.shape[1]`, which causes an illegal memory access. This fix clamps the indices on the GPU so they stay within bounds.
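
To make the failure mode concrete, here is a minimal plain-Python sketch of how an out-of-range block index can arise. It is an illustration under stated assumptions, not the code in `dsa.py`: the names `TOKENS_PER_BLOCK`, `block_table`, and `block_index_in_seq` are hypothetical, and the sketch assumes a block index is obtained by dividing a token's position within its sequence by the block size.

```python
# Hypothetical illustration; the names and the index formula are assumptions,
# not the actual _compute_slot_mappings implementation.
TOKENS_PER_BLOCK = 4

# One row per sequence; each row lists the cache blocks allotted to it.
block_table = [
    [10, 11, 12],  # sequence 0
    [20, 21, 22],  # sequence 1
]
max_blocks = len(block_table[0])  # plays the role of block_offsets.shape[1]


def block_index_in_seq(token_pos: int) -> int:
    """Return the logical block that holds the token at token_pos."""
    return token_pos // TOKENS_PER_BLOCK


# A live token at position 9 lands in logical block 2, which exists.
valid_idx = block_index_in_seq(9)

# A padded slot that still carries an old mapping may imply position 17,
# i.e. logical block 4, which is past the end of a 3-column table.
stale_idx = block_index_in_seq(17)

valid_in_bounds = valid_idx < max_blocks
stale_in_bounds = stale_idx < max_blocks
```

Indexing the table with `stale_idx` raises an `IndexError` in plain Python; on a GPU the equivalent gather reads outside the allocation instead.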

Changes

`tensorrt_llm/_torch/attention_backend/sparse/dsa.py`: Compute `max_blocks` before the device branch so both paths can use it. For CUDA tensors, clamp `block_indices_in_seq` to `[0, max_blocks - 1]` instead of skipping the bounds check entirely. The CPU path keeps the existing assertion.
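
The branch described above can be sketched roughly as follows. This is a hedged approximation, not the code from the PR: the helper names `clamp_block_indices` and `checked_block_indices` are hypothetical, and only `max_blocks`, `block_indices_in_seq`, and `block_offsets` come from the description. The sketch assumes the CUDA path avoids an assertion because checking tensor values on the host would require a device-to-host synchronization, which is not permitted during CUDA graph capture.

```python
import torch


def clamp_block_indices(block_indices_in_seq: torch.Tensor,
                        max_blocks: int) -> torch.Tensor:
    """Force every index into [0, max_blocks - 1] without leaving the device."""
    return block_indices_in_seq.clamp(min=0, max=max_blocks - 1)


def checked_block_indices(block_indices_in_seq: torch.Tensor,
                          block_offsets: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper mirroring the described CUDA/CPU split."""
    # Computed before the branch so both paths share the same bound.
    max_blocks = block_offsets.shape[1]

    if block_indices_in_seq.is_cuda:
        # No host-side check here (assumed to avoid a sync during capture);
        # clamp on the device so the later gather stays in bounds.
        return clamp_block_indices(block_indices_in_seq, max_blocks)

    # CPU path: fail loudly on an invalid index.
    in_range = (block_indices_in_seq >= 0) & (block_indices_in_seq < max_blocks)
    assert bool(in_range.all()), "block index out of range"
    return block_indices_in_seq
```

For example, `clamp_block_indices(torch.tensor([0, 4, -1]), 3)` maps the two invalid entries to the nearest valid block index. Note the trade-off this implies: clamping makes the access safe but no longer reports an invalid index, whereas the CPU assertion still does.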

Test plan

Run MTP + DSA throughput benchmark with GLM-5-NVFP4 on 4 GPUs (TP=4, EP=4)
Verify no OOB errors during CUDA graph capture/replay
CI pre-merge tests pass

Summary by CodeRabbit

Bug Fixes

Improved sparse attention operation handling on CUDA devices to prevent out-of-bounds errors during graph execution.

sunnyqgg · 2026-04-01T11:34:44Z

/bot run

coderabbitai · 2026-04-01T11:35:55Z

Walkthrough

The change modifies boundary checking logic in _compute_slot_mappings to handle CUDA tensor indexing during graph capture. For CUDA tensors, the code now clamps indices to valid bounds; for non-CUDA tensors, it retains explicit assertions.

Changes

Cohort / File(s)	Summary
Sparse DSA CUDA Graph Safe Clamping `tensorrt_llm/_torch/attention_backend/sparse/dsa.py`	Modified `_compute_slot_mappings` to unconditionally compute `max_blocks` and apply index clamping for CUDA tensors, while preserving explicit bounds assertions for non-CUDA tensors. Addresses out-of-bounds indexing caused by stale token→sequence mappings during CUDA graph capture and replay.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly identifies the specific issue (OOB in DSA with MTP), the fix approach (clamping block indices), and includes the NVBugs ticket identifier.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description check	✅ Passed	The PR description clearly explains the issue, solution, and test plan with sufficient detail and follows the repository's guidelines.

tensorrt-cicd · 2026-04-01T11:41:19Z

PR_Github #41192 [ run ] triggered by Bot. Commit: a05184e Link to invocation

tensorrt-cicd · 2026-04-01T13:32:54Z

PR_Github #41192 [ run ] completed with state SUCCESS. Commit: a05184e
/LLM/main/L0_MergeRequest_PR pipeline #32155 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

sunnyqgg · 2026-04-01T14:21:28Z

/bot run

tensorrt-cicd · 2026-04-01T14:27:35Z

PR_Github #41203 [ run ] triggered by Bot. Commit: a05184e Link to invocation

NVShreyas

LGTM

tensorrt-cicd · 2026-04-01T15:32:37Z

PR_Github #41203 [ run ] completed with state SUCCESS. Commit: a05184e
/LLM/main/L0_MergeRequest_PR pipeline #32164 completed with status: 'FAILURE'

sunnyqgg · 2026-04-02T02:39:03Z

/bot run

tensorrt-cicd · 2026-04-02T02:47:14Z

PR_Github #41311 [ run ] triggered by Bot. Commit: a05184e Link to invocation

tensorrt-cicd · 2026-04-02T03:50:16Z

PR_Github #41311 [ run ] completed with state SUCCESS. Commit: a05184e
/LLM/main/L0_MergeRequest_PR pipeline #32264 completed with status: 'FAILURE'

longlee0622 · 2026-04-02T04:21:45Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-02T04:27:14Z

PR_Github #41334 [ run ] triggered by Bot. Commit: 398d76f Link to invocation

pengbowang-nv

LGTM.

tensorrt-cicd · 2026-04-02T11:03:18Z

PR_Github #41334 [ run ] completed with state SUCCESS. Commit: 398d76f
/LLM/main/L0_MergeRequest_PR pipeline #32282 completed with status: 'FAILURE'

longlee0622 · 2026-04-02T11:24:03Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-02T11:29:50Z

PR_Github #41423 [ run ] triggered by Bot. Commit: 398d76f Link to invocation

tensorrt-cicd · 2026-04-02T13:45:46Z

PR_Github #41423 [ run ] completed with state SUCCESS. Commit: 398d76f
/LLM/main/L0_MergeRequest_PR pipeline #32356 completed with status: 'SUCCESS'

CI Report

Link to invocation

sunnyqgg requested a review from a team as a code owner April 1, 2026 11:31

sunnyqgg requested a review from pengbowang-nv April 1, 2026 11:31

github-actions bot assigned sunnyqgg Apr 1, 2026

sunnyqgg requested a review from NVShreyas April 1, 2026 14:25

NVShreyas approved these changes Apr 1, 2026

[https://nvbugs/6032056][fix] Clamp block indices to prevent OOB in D…

398d76f

…SA with MTP during CUDA graph capture Signed-off-by: qgai <qgai@nvidia.com>

longlee0622 force-pushed the bug_6032056 branch from a05184e to 398d76f Compare April 2, 2026 04:21

pengbowang-nv approved these changes Apr 2, 2026

sunnyqgg merged commit c60615a into NVIDIA:main Apr 2, 2026
5 checks passed

bmarimuthu-nv mentioned this pull request Apr 2, 2026

[None][feat] Add support for Gemma3n and sharedKV cache attention in AutoDeploy #12205

Closed

1 task

karen-sy pushed a commit to karen-sy/TensorRT-LLM that referenced this pull request Apr 7, 2026

[https://nvbugs/6032056][fix] Clamp block indices to prevent OOB in D…

5752586

…SA with MTP (NVIDIA#12657) Signed-off-by: qgai <qgai@nvidia.com>

5 participants