Fix shared memory layout mismatch in 2-bit residual kernel #6

Merged
DD-DuDa merged 1 commit into OpenBitSys:e2e from Iridescent-gcrace:BitDecoding-e2e
May 9, 2026

@Iridescent-gcrace

Summary

This PR fixes an invalid shared memory access in the 2-bit residual decode path.

The residual kernel was using Kernel_traits::SharedStorage to interpret dynamic shared memory, while the launch side allocates shared memory according to Kernel_traits::SharedStorage_residual.

This mismatch can cause the kernel to compute shared-memory field offsets using the packed/split-kernel layout, even though the actual allocated shared-memory region follows the residual layout. In the 2-bit residual path, this leads to an invalid __shared__ write detected by compute-sanitizer.

Root Cause

Before this patch, the residual kernel used:

using SharedStorage = typename Kernel_traits::SharedStorage;

However, the residual kernel should use the residual shared-memory layout:

using SharedStorage = typename Kernel_traits::SharedStorage_residual;

The bug is not always visible in the 4-bit path because the accessed shared-memory range can happen to stay within the allocated region. In the 2-bit residual path, the residual block/layout pressure is larger, so offsets computed from the wrong layout land outside the allocation and the mismatch triggers an out-of-bounds shared-memory write.
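The mismatch can be illustrated on the host with plain C++. This is a minimal sketch, not the project's code: `PackedLayout` and `ResidualLayout` are hypothetical stand-ins for `Kernel_traits::SharedStorage` and `Kernel_traits::SharedStorage_residual`, and every field name and byte size is invented for illustration. It shows how a field offset taken from one layout can fall outside an allocation sized for the other.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for Kernel_traits::SharedStorage (packed/split-kernel layout).
// Field names and sizes are assumptions, not taken from the repository.
struct PackedLayout {
    unsigned char q[4096];
    unsigned char k_pack[8192];
    unsigned char v_pack[8192];
};

// Hypothetical stand-in for Kernel_traits::SharedStorage_residual.
struct ResidualLayout {
    unsigned char q[4096];
    unsigned char k_res[2048];
    unsigned char v_res[2048];
};

// The launch side sizes dynamic shared memory from the residual layout.
constexpr std::size_t kAllocatedBytes = sizeof(ResidualLayout);

// True when the byte range [offset, offset + size) lies inside the allocation.
constexpr bool fits(std::size_t offset, std::size_t size, std::size_t alloc) {
    return offset + size <= alloc;
}

// Viewing the allocation through the wrong (packed) layout: the last field overflows.
constexpr bool kWrongViewInBounds =
    fits(offsetof(PackedLayout, v_pack), sizeof(PackedLayout::v_pack), kAllocatedBytes);

// Viewing it through the matching (residual) layout: the last field fits.
constexpr bool kRightViewInBounds =
    fits(offsetof(ResidualLayout, v_res), sizeof(ResidualLayout::v_res), kAllocatedBytes);

// Prints both verdicts; call it from main() to inspect them.
inline void print_layout_check() {
    assert(kRightViewInBounds);  // the matching layout must always fit
    std::printf("allocated bytes: %zu\n", kAllocatedBytes);
    std::printf("packed view in bounds: %s\n", kWrongViewInBounds ? "yes" : "no");
    std::printf("residual view in bounds: %s\n", kRightViewInBounds ? "yes" : "no");
}
```

Whether the wrong view overflows depends entirely on the relative field sizes, which is consistent with the 4-bit path sometimes staying in bounds while the 2-bit residual path does not.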

Fix

Change the residual kernel shared-memory alias from:

using SharedStorage = typename Kernel_traits::SharedStorage;

to:

using SharedStorage = typename Kernel_traits::SharedStorage_residual;

This makes the shared-memory layout used by the kernel consistent with the shared-memory size allocated at launch time.
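One way to make this kind of mismatch harder to reintroduce is to derive both the layout alias and the launch-time byte count from a single definition. The following is a host-side C++ sketch under stated assumptions, not part of this patch: `DemoTraits`, `ResidualKernelSmem`, `kBytes`, and `view` are hypothetical names, and only the two alias names `SharedStorage` and `SharedStorage_residual` come from the PR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical traits type. The nested struct sizes are invented for illustration.
struct DemoTraits {
    struct SharedStorage { unsigned char bytes[20480]; };
    struct SharedStorage_residual { unsigned char bytes[8192]; };
};

// Couples the layout used inside the residual kernel with the size requested at launch,
// so the two cannot drift apart.
template <typename Traits>
struct ResidualKernelSmem {
    using SharedStorage = typename Traits::SharedStorage_residual;

    // Byte count the launch side would pass as the dynamic shared-memory size.
    static constexpr std::size_t kBytes = sizeof(SharedStorage);

    // Reinterprets a raw buffer as the residual layout, checking the allocation first.
    static SharedStorage* view(void* raw, std::size_t allocated_bytes) {
        assert(allocated_bytes >= kBytes);  // catches an undersized allocation early
        return static_cast<SharedStorage*>(raw);
    }
};
```

In a real kernel the size check would more likely be a `static_assert` or a host-side check before launch, since the allocation size is known there; the runtime `assert` here only keeps the sketch short.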

Reproduction

After building and installing the package, the issue can be reproduced with:

python evaluation/example.py \
    --model_path xxx \
    --max_length 8192 \
    --num_bits 2 \
    --quant_mode k-channel \
    --attn_backend bit_decoding

@DD-DuDa DD-DuDa merged commit 6bfac90 into OpenBitSys:e2e May 9, 2026