[https://nvbugs/6032056][fix] Clamp block indices to prevent OOB in DSA with MTP #12657
sunnyqgg merged 1 commit into NVIDIA:main
Conversation
/bot run
Walkthrough: The change modifies boundary checking logic in …
Estimated code review effort: 2 (Simple), ~12 minutes. Pre-merge checks: 3 passed.
PR_Github #41192 [ run ] triggered by Bot. Commit:
PR_Github #41192 [ run ] completed with state
/bot run
PR_Github #41203 [ run ] triggered by Bot. Commit:
PR_Github #41203 [ run ] completed with state
/bot run
PR_Github #41311 [ run ] triggered by Bot. Commit:
PR_Github #41311 [ run ] completed with state
…SA with MTP during CUDA graph capture Signed-off-by: qgai <qgai@nvidia.com>
/bot run --disable-fail-fast
PR_Github #41334 [ run ] triggered by Bot. Commit:
PR_Github #41334 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #41423 [ run ] triggered by Bot. Commit:
PR_Github #41423 [ run ] completed with state
…SA with MTP (NVIDIA#12657) Signed-off-by: qgai <qgai@nvidia.com>
Summary
Fixes an out-of-bounds (OOB) access in _compute_slot_mappings when using DSA (Dense Sparse Attention) with MTP (Multi-Token Prediction) during CUDA graph capture/replay.
Block indices could exceed block_offsets.shape[1], causing illegal memory access. This fix clamps the indices on GPU to stay within bounds.
Changes
tensorrt_llm/_torch/attention_backend/sparse/dsa.py: Move the max_blocks computation before the branch. On CUDA tensors, clamp block_indices_in_seq to [0, max_blocks-1] instead of skipping the check entirely. The CPU path retains the assertion.
Test plan
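As a rough illustration of the change described above, and of the kind of check a test could make, the sketch below contrasts the two paths using plain Python lists in place of tensors. It is not the code from dsa.py: the function name compute_slot_mapping, the clamp flag, the tokens_per_block parameter, and the slot formula are all assumptions made for the example. In PyTorch the clamping step would presumably be a torch.clamp call on block_indices_in_seq.

```python
def compute_slot_mapping(token_positions, block_row, tokens_per_block, clamp):
    """Map token positions in one sequence to cache slots (illustrative only).

    block_row stands in for one row of block_offsets; its length plays the
    role of max_blocks (block_offsets.shape[1] in the PR description).
    """
    max_blocks = len(block_row)  # computed once, before branching
    slots = []
    for pos in token_positions:
        block_idx = pos // tokens_per_block
        if clamp:
            # Device-style path: no host-side assert, so clamp into range.
            block_idx = max(0, min(block_idx, max_blocks - 1))
        else:
            # Host-style path: fail loudly on an out-of-range index.
            assert 0 <= block_idx < max_blocks, "block index out of range"
        slots.append(block_row[block_idx] * tokens_per_block + pos % tokens_per_block)
    return slots


block_row = [7, 3]     # two allocated blocks for this sequence
positions = [0, 5, 9]  # position 9 would need a third block, which does not exist
print(compute_slot_mapping(positions, block_row, tokens_per_block=4, clamp=True))
# prints [28, 13, 13]
```

With clamp=False the same input trips the assertion, which mirrors the CPU path. Note that the clamped entry for position 9 no longer refers to a real slot; treating that as harmless assumes such positions are padding or warmup tokens, which the PR description does not state explicitly.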