MQA implementation #832

Merged

aska-0096 merged 1 commit into kaba from MQA on Aug 7, 2023

Conversation

@aska-0096
Contributor

BatchedGemmSoftmaxGemmWmma-based Multi-Query Attention implementation, with an example:
make example_multi_query_attention_forward_wmma_fp16
./bin/example_multi_query_attention_forward_wmma_fp16
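For context, Multi-Query Attention (MQA) differs from standard multi-head attention in that all query heads share a single key/value head. The following is a minimal NumPy sketch of the gemm-softmax-gemm computation such a kernel fuses; the function name, shapes, and test sizes are illustrative, not the composable_kernel API:

```python
# Illustrative sketch of Multi-Query Attention: every query head attends
# against one shared K/V head (hypothetical helper, not the library API).
import numpy as np

def mqa_forward(q, k, v):
    """q: (heads, seq_q, d); k, v: (seq_k, d) -- a single shared K/V head."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # first gemm: (heads, seq_q, seq_k)
    scores -= scores.max(axis=-1, keepdims=True)   # shift for numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)             # softmax over the key axis
    return w @ v                                   # second gemm: (heads, seq_q, d)

heads, seq_q, seq_k, d = 4, 8, 16, 32
rng = np.random.default_rng(0)
out = mqa_forward(rng.normal(size=(heads, seq_q, d)),
                  rng.normal(size=(seq_k, d)),
                  rng.normal(size=(seq_k, d)))
print(out.shape)  # (4, 8, 32)
```

Note that because K and V carry no head axis, the per-head K/V memory traffic is a fraction of regular multi-head attention, which is the main motivation for MQA at inference time.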

@aska-0096 aska-0096 merged commit 3cf4572 into kaba Aug 7, 2023
@illsilin illsilin deleted the MQA branch December 14, 2023 17:08
hyoon1 pushed a commit to hyoon1/composable_kernel that referenced this pull request Mar 19, 2026
…ads starvation when letting ninja decide how many workers to spawn or manual MAX_JOBS "guesses". Logic is to take the min value of MAX_JOBS auto-calculated by two metrics: 1: cpu cores 2: free memory. This should allow flash-attn to compile close to the most efficient manner under any consumer/server env. (ROCm#832)
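The heuristic described in that commit message, taking the minimum of a core-count cap and a free-memory cap, could be sketched as follows; the function name and the per-job memory budget are hypothetical:

```python
# Sketch of the MAX_JOBS heuristic: min(cpu-core cap, free-memory cap).
import os

def auto_max_jobs(free_mem_gib, mem_per_job_gib=4, cpu_count=None):
    # mem_per_job_gib is an assumed memory budget per compile job.
    cores = cpu_count if cpu_count is not None else os.cpu_count() or 1
    mem_jobs = max(1, int(free_mem_gib // mem_per_job_gib))
    return min(cores, mem_jobs)

print(auto_max_jobs(free_mem_gib=16, cpu_count=32))   # memory-bound -> 4
print(auto_max_jobs(free_mem_gib=256, cpu_count=8))   # core-bound -> 8
```

Capping by both metrics keeps ninja from spawning more compiler processes than free memory can sustain, which is what causes the starvation the commit fixes.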
