MQA implementation #832

Merged

aska-0096 merged 1 commit into kaba from MQA on Aug 7, 2023

Conversation

@aska-0096
Contributor

BatchedGemmSoftmaxGemmWmma-based Multi-Query Attention implementation, with an example:
make example_multi_query_attention_forward_wmma_fp16
./bin/example_multi_query_attention_forward_wmma_fp16
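For context, Multi-Query Attention (MQA) differs from standard multi-head attention in that all query heads share a single key/value head. The following is a minimal NumPy sketch of the gemm-softmax-gemm computation such a kernel fuses; the function name, shapes, and test sizes are illustrative, not the composable_kernel API:

```python
# Illustrative sketch of Multi-Query Attention: every query head attends
# against one shared K/V head (hypothetical helper, not the library API).
import numpy as np

def mqa_forward(q, k, v):
    """q: (heads, seq_q, d); k, v: (seq_k, d) -- a single shared K/V head."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # first gemm: (heads, seq_q, seq_k)
    scores -= scores.max(axis=-1, keepdims=True)   # shift for numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)             # softmax over the key axis
    return w @ v                                   # second gemm: (heads, seq_q, d)

heads, seq_q, seq_k, d = 4, 8, 16, 32
rng = np.random.default_rng(0)
out = mqa_forward(rng.normal(size=(heads, seq_q, d)),
                  rng.normal(size=(seq_k, d)),
                  rng.normal(size=(seq_k, d)))
print(out.shape)  # (4, 8, 32)
```

Note that because K and V carry no head axis, the per-head K/V memory traffic is a fraction of regular multi-head attention, which is the main motivation for MQA at inference time.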

@aska-0096 aska-0096 merged commit 3cf4572 into kaba Aug 7, 2023
@illsilin illsilin deleted the MQA branch December 14, 2023 17:08
hyoon1 pushed a commit to hyoon1/composable_kernel that referenced this pull request Mar 19, 2026
…ads starvation when letting ninja decide how many workers to spawn or manual MAX_JOBS "guesses". Logic is to take the min value of MAX_JOBS auto-calculated by two metrics: 1: cpu cores 2: free memory. This should allow flash-attn to compile close to the most efficient manner under any consumer/server env. (ROCm#832)
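The heuristic described in that commit message, taking the minimum of a core-count cap and a free-memory cap, could be sketched as follows; the function name and the per-job memory budget are hypothetical:

```python
# Sketch of the MAX_JOBS heuristic: min(cpu-core cap, free-memory cap).
import os

def auto_max_jobs(free_mem_gib, mem_per_job_gib=4, cpu_count=None):
    # mem_per_job_gib is an assumed memory budget per compile job.
    cores = cpu_count if cpu_count is not None else os.cpu_count() or 1
    mem_jobs = max(1, int(free_mem_gib // mem_per_job_gib))
    return min(cores, mem_jobs)

print(auto_max_jobs(free_mem_gib=16, cpu_count=32))   # memory-bound -> 4
print(auto_max_jobs(free_mem_gib=256, cpu_count=8))   # core-bound -> 8
```

Capping by both metrics keeps ninja from spawning more compiler processes than free memory can sustain, which is what causes the starvation the commit fixes.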
