
[Op] Add flash-attention CUDA kernel#49

Merged
szhengac merged 7 commits into awslabs:main from comaniac:more-attn
Feb 9, 2023

Conversation

@comaniac
Contributor

comaniac commented Feb 9, 2023

Description

The new flash-attention CUDA kernel achieves better latency than the Triton one in the case of no attention mask; attention masks are ignored in both the CUDA and Triton kernels for now.
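
For reference, below is a minimal PyTorch sketch of the unmasked attention computation that a fused flash-attention kernel covers. The function name and tensor shapes are illustrative assumptions, not this repo's actual API, and, matching the current behavior of both kernels, no attention mask is applied.

```python
# Reference (unfused) attention for comparison against a fused flash-attention kernel.
# Illustrative only: names and shapes are assumptions, not the repo's API.
import math
import torch

def attention_reference(q, k, v):
    """q, k, v: [batch, heads, seq_len, head_dim]; no attention mask applied."""
    scale = 1.0 / math.sqrt(q.size(-1))
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale  # [B, H, S, S]
    probs = torch.softmax(scores, dim=-1)
    return torch.matmul(probs, v)  # [B, H, S, head_dim]

# Example usage on random tensors.
q = torch.randn(2, 12, 128, 64)
k = torch.randn(2, 12, 128, 64)
v = torch.randn(2, 12, 128, 64)
out = attention_reference(q, k, v)
print(out.shape)  # torch.Size([2, 12, 128, 64])
```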

Checklist

  • PR's title starts with a category (e.g. [Bugfix], [Model], [Tutorial], etc.)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

szhengac merged commit 23e8bb4 into awslabs:main on Feb 9, 2023
comaniac deleted the more-attn branch on February 9, 2023 at 21:55