Signed-off-by: AlpinDale <alpindale@gmail.com>
Code Review
This pull request introduces a fix to prevent FlexAttention block sizes (block_m and block_n) from becoming too small by enforcing a lower bound of 16. This is a good fix to improve the stability of the attention kernel. My review focuses on improving the code's maintainability by replacing the magic number 16 with a named constant, which makes the code clearer and easier to manage in the future.
        return kernel_options
    else:
        preferred_block = 32 if query.dtype == torch.float32 else 64
        block_lower_bound = 16
The value 16 is a magic number. To improve readability and maintainability, define it as a named constant in UPPER_CASE_WITH_UNDERSCORES, as PEP 8 recommends for constants. This makes the code's intent clearer and future updates easier. The suggestion below introduces the constant and assigns it to block_lower_bound so that the rest of the code in this PR keeps working. Ideally, block_lower_bound should be replaced with MIN_KERNEL_BLOCK_SIZE directly on lines 850-851.
-        block_lower_bound = 16
+        MIN_KERNEL_BLOCK_SIZE = 16
+        block_lower_bound = MIN_KERNEL_BLOCK_SIZE
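For illustration, here is a minimal sketch of how a named lower bound might be applied when choosing a block size. The helper `pick_block_size` and its `is_float32` parameter are hypothetical and stand in for the PR's actual logic on lines 850-851, which is not shown in this review. Only the values 16, 32, and 64 come from the diff above.

```python
# Module-level constant: the smallest block size the kernel should use.
# The value 16 comes from the PR; the placement at module level is an assumption.
MIN_KERNEL_BLOCK_SIZE = 16


def pick_block_size(candidate: int, is_float32: bool) -> int:
    """Hypothetical helper: cap a candidate block size at the preferred
    size, then make sure it never falls below MIN_KERNEL_BLOCK_SIZE."""
    # Mirrors the diff: float32 prefers 32, other dtypes prefer 64.
    preferred_block = 32 if is_float32 else 64
    capped = min(preferred_block, candidate)
    return max(MIN_KERNEL_BLOCK_SIZE, capped)


if __name__ == "__main__":
    # A very small candidate is raised to the lower bound.
    print(pick_block_size(4, is_float32=False))
    # A large candidate is capped at the preferred block size.
    print(pick_block_size(128, is_float32=True))
```

Using the constant directly in the clamp removes the need for the intermediate `block_lower_bound` variable, which is the end state the review comment recommends.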