
[Op] Add flash-attention CUDA kernel#49

Merged
szhengac merged 7 commits into awslabs:main from comaniac:more-attn
Feb 9, 2023

Conversation

@comaniac
Contributor

comaniac commented Feb 9, 2023

Description

The new flash-attention CUDA kernel achieves better latency than the Triton one in the case of no attention mask; attention masks are ignored in both the CUDA and Triton kernels for now.
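
For reference, below is a minimal PyTorch sketch of the unmasked attention computation that a fused flash-attention kernel covers. The function name and tensor shapes are illustrative assumptions, not this repo's actual API, and, matching the current behavior of both kernels, no attention mask is applied.

```python
# Reference (unfused) attention for comparison against a fused flash-attention kernel.
# Illustrative only: names and shapes are assumptions, not the repo's API.
import math
import torch

def attention_reference(q, k, v):
    """q, k, v: [batch, heads, seq_len, head_dim]; no attention mask applied."""
    scale = 1.0 / math.sqrt(q.size(-1))
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale  # [B, H, S, S]
    probs = torch.softmax(scores, dim=-1)
    return torch.matmul(probs, v)  # [B, H, S, head_dim]

# Example usage on random tensors.
q = torch.randn(2, 12, 128, 64)
k = torch.randn(2, 12, 128, 64)
v = torch.randn(2, 12, 128, 64)
out = attention_reference(q, k, v)
print(out.shape)  # torch.Size([2, 12, 128, 64])
```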

Checklist

  • PR's title starts with a category (e.g. [Bugfix], [Model], [Tutorial], etc.)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

szhengac merged commit 23e8bb4 into awslabs:main on Feb 9, 2023
comaniac deleted the more-attn branch on February 9, 2023 at 21:55