TL;DR the current code for BloomAttention is surprisingly inefficient, with obvious problems that should be easy to fix
The current code for BloomAttention is surprisingly inefficient. It also misses many opportunities for in-place ops that would save memory, but that's the least of its problems.
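To illustrate the in-place-ops point, here is a minimal, hypothetical sketch (not code from BloomAttention itself): the out-of-place version allocates a fresh buffer for every intermediate, while the in-place version reuses the input's memory via numpy's `out=` parameter. The function names are invented for this example.

```python
import numpy as np

def scale_and_shift_out_of_place(x, scale, shift):
    """Each op allocates a new array, doubling peak memory per step."""
    y = x * scale   # new allocation
    y = y + shift   # another new allocation
    return y

def scale_and_shift_in_place(x, scale, shift):
    """Same math, but both ops write back into x's existing buffer."""
    np.multiply(x, scale, out=x)
    np.add(x, shift, out=x)
    return x
```

In PyTorch the same pattern is spelled with trailing-underscore methods (e.g. `mul_`, `add_`); the saving matters most for large activation tensors inside attention.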
Updates on alibi.py: https://gist.github.com/justheuristic/9751e02a2a5604a98a4fe0b6b688e808
Transformers PR developed by @younesbelkada: huggingface/transformers#17759
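For context on what alibi.py computes, here is a hedged sketch of the ALiBi bias following the formulation in the ALiBi paper (Press et al.) for a power-of-two head count; it is illustrative and not necessarily line-for-line identical to the gist or the transformers PR. Because the bias depends only on the head and the relative distance, it can be precomputed once rather than rebuilt on every forward pass.

```python
import math

def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head slopes m_i = 2**(-8*i/n) for i = 1..n, n a power of two."""
    start = 2 ** (-8.0 / num_heads)
    return [start ** (i + 1) for i in range(num_heads)]

def alibi_bias(num_heads: int, seq_len: int) -> list[list[list[float]]]:
    """Additive attention bias: bias[h][i][j] = -slope_h * (i - j) for j <= i.

    The bias penalizes attention to distant past tokens linearly in distance,
    with a per-head slope; future positions (j > i) are left at 0 here since
    they are masked out anyway in causal attention.
    """
    slopes = alibi_slopes(num_heads)
    return [
        [[-m * (i - j) if j <= i else 0.0 for j in range(seq_len)]
         for i in range(seq_len)]
        for m in slopes
    ]
```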
I've copy-pasted the changes from @younesbelkada's PR above. As of right now (aaaf0c2), all four issues are solved.