@sijiac Hello everyone! I have a question about the usage of the CUDA qualifier `__launch_bounds__`.
In the CUDA documentation I have read, `__launch_bounds__()` takes only 2 parameters: maxThreadsPerBlock and minBlocksPerMultiprocessor. However, at line 485 of https://github.com/deepseek-ai/FlashMLA/blob/main/csrc/flash_fwd_mla_kernel.h, `__launch_bounds__` is used with 3 parameters. What is the meaning of the third parameter?
The code snippet: `__global__ void __launch_bounds__(256, 1, 1)`
Code line link: FlashMLA/csrc/flash_fwd_mla_kernel.h, line 485 at commit b31bfe7
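For comparison, here is a minimal sketch of the two-parameter form as I understand it from the documentation. The kernel `scale_kernel` and the host helper `run_scale_demo` are hypothetical names made up for illustration; they are not taken from FlashMLA.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel, not from FlashMLA.
// Two-argument form: maxThreadsPerBlock = 256, minBlocksPerMultiprocessor = 1.
__global__ void __launch_bounds__(256, 1)
scale_kernel(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Hypothetical host helper: fills an array with 1.0f, doubles it on the GPU,
// and returns true only if every CUDA call succeeded and every element is 2.0f.
bool run_scale_demo() {
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    float host[n];
    for (int i = 0; i < n; ++i) {
        host[i] = 1.0f;
    }

    float* dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) {
        return false;
    }
    bool ok = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        // The block size (256) must not exceed maxThreadsPerBlock from the qualifier.
        scale_kernel<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaDeviceSynchronize() == cudaSuccess
          && cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);

    for (int i = 0; ok && i < n; ++i) {
        ok = (host[i] == 2.0f);
    }
    return ok;
}
```

This compiles with `nvcc` and behaves as I expect. The FlashMLA kernel differs only in passing a third argument to the qualifier, which is the part I do not understand.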