Suggestion Description
The current pa_fwd_asm kernel only supports seqlen_q = 1. However, speculative decoding needs seqlen_q values of 2 or 4, depending on the configuration. We need to add support for these cases.
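To make the request concrete, here is a rough pure-Python reference for what seqlen_q > 1 could mean for a paged-attention decode step. This is only a sketch under assumptions: the function name `paged_attention_ref`, its arguments, the single-head layout, and the causal rule among the new tokens are all hypothetical and are not the pa_fwd_asm interface.

```python
import math


def paged_attention_ref(q, k_cache, v_cache, block_table, context_len, block_size):
    """Hypothetical single-head reference, NOT the pa_fwd_asm API.

    q:            list of seqlen_q query vectors (the newest seqlen_q tokens)
    k_cache:      list of physical blocks, each a list of block_size key vectors
    v_cache:      same layout as k_cache, holding value vectors
    block_table:  maps logical block index -> physical block index
    context_len:  number of tokens in the cache, including the seqlen_q new ones
    """
    seqlen_q = len(q)
    head_dim = len(q[0])
    scale = 1.0 / math.sqrt(head_dim)
    out = []
    for i, qi in enumerate(q):
        # Assumed causal rule: query token i sees every earlier token plus itself.
        visible = context_len - seqlen_q + i + 1
        scores, values = [], []
        for pos in range(visible):
            # Translate the logical token position into a physical block and offset.
            block = block_table[pos // block_size]
            offset = pos % block_size
            key = k_cache[block][offset]
            scores.append(scale * sum(a * b for a, b in zip(qi, key)))
            values.append(v_cache[block][offset])
        # Numerically stable softmax over the visible positions.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        denom = sum(weights)
        out.append([
            sum(w * v[d] for w, v in zip(weights, values)) / denom
            for d in range(head_dim)
        ])
    return out
```

Under these assumptions, a call with seqlen_q = 2 should give the same rows as two seqlen_q = 1 calls with context lengths N-1 and N, which could serve as a correctness check for the new kernel path.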
Thanks!!
cc @carlushuang
Operating System
No response
GPU
MI300
ROCm Component
No response