Future support for attention bias or masking #242
As mentioned in the README, we have an experimental implementation in Triton that supports attention bias.
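For reference, here is a minimal sketch of how that experimental Triton interface can be invoked with a bias. It assumes the `flash_attn_func` entry point in `flash_attn/flash_attn_triton.py`; the exact signature and the set of supported bias shapes may differ, so check the source:

```python
# Sketch only: assumes flash_attn_func from flash_attn/flash_attn_triton.py
# accepts an additive attention bias; verify signature against the repo.
import torch
from flash_attn.flash_attn_triton import flash_attn_func

batch, seqlen, nheads, headdim = 2, 128, 8, 64  # Triton path expects headdim 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Additive bias, broadcast over heads: (batch, 1, seqlen_q, seqlen_k).
bias = torch.randn(batch, 1, seqlen, seqlen, device="cuda", dtype=torch.float16)

# The bias is added to scale * (q @ k^T) before the softmax.
out = flash_attn_func(q, k, v, bias)
```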
Thank you! I'll give it a try.
Are there plans to implement gradients for the bias, i.e. for learnt attention biases? Or how difficult do you think this would be to implement?
I'm not planning to work on it, as I don't use attention bias in my work. Implementing it in Triton is probably not hard; all the necessary ingredients are there in the Triton backward pass implementation. I suspect one just has to add some code to save the gradient.
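For anyone attempting this, a plain-PyTorch reference (not the Triton kernel) shows why the ingredients are already there: with S = scale · QKᵀ + bias, the gradient with respect to the bias is exactly dS, the pre-softmax score gradient that the backward pass already materializes. The names and shapes below are illustrative, not from the repo:

```python
import torch

def attention_with_bias(q, k, v, bias, scale):
    # q, k, v: (batch, seqlen, nheads, headdim); bias: (batch, nheads, sq, sk)
    s = torch.einsum("bthd,bshd->bhts", q, k) * scale + bias  # pre-softmax scores S
    p = torch.softmax(s, dim=-1)
    return torch.einsum("bhts,bshd->bthd", p, v)

batch, seqlen, nheads, headdim = 2, 16, 4, 64
q, k, v = (torch.randn(batch, seqlen, nheads, headdim, requires_grad=True)
           for _ in range(3))
bias = torch.randn(batch, nheads, seqlen, seqlen, requires_grad=True)

out = attention_with_bias(q, k, v, bias, headdim ** -0.5)
out.sum().backward()

# bias.grad equals dS (reduced over any broadcast dims). A Triton backward
# pass that already computes dS per block only needs to write it out,
# accumulating if the bias is broadcast over batch or heads.
print(bias.grad.shape)  # torch.Size([2, 4, 16, 16])
```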
Hi, I find the Triton version of FlashAttention is very experimental and only supports a head dimension of 64. Has this changed?
Hi, I noticed this plan for customized attention bias was recently removed. Do you still intend to work on it at some point? I feel FlashAttention is such a great project, and having this feature would make it perfect 😄
40a25c8#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L80