
Flash Attention 2 - Torch2.1.1+cu121 Ada Lovelace and Hopper Wheels for Python 3.10 and 3.11

@NeedsMoar NeedsMoar released this 17 Dec 01:49
30b8c43

Initial uploads. Ada Lovelace (native) and Hopper (via PTX) binaries only; Volta and Ampere compatible binaries will be uploaded shortly. Make sure you have a recent xformers installed, or install it afterward (xformers doesn't have a hard dependency on this package, but all of the better CUDA attention algorithms live here), then `pip install` the correct wheel for your Python version.
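Picking "the correct wheel for your Python version" means matching the CPython tag embedded in the wheel filename (cp310 for 3.10, cp311 for 3.11). A small sketch of that matching logic; the wheel filenames below are illustrative placeholders, not the actual release assets:

```python
import sys

def cpython_tag(version_info=sys.version_info):
    """Return the CPython wheel tag (e.g. 'cp310') for a version tuple."""
    major, minor = version_info[:2]
    return f"cp{major}{minor}"

# Hypothetical wheel names, for illustration only -- use the real
# filenames from the release assets.
wheels = [
    "flash_attn-2.x-cp310-cp310-win_amd64.whl",
    "flash_attn-2.x-cp311-cp311-win_amd64.whl",
]

# Keep only wheels built for the running interpreter.
tag = cpython_tag()
matching = [w for w in wheels if f"-{tag}-" in w]
```

On Python 3.10 this selects only the cp310 wheel; install that one with `pip install <filename>`.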

That's it. I see a ~2 it/s improvement at 1024x1024 on regular SD1.5 models, which jumps to a 50-60% speed improvement at 2048x2048. At normal generation sizes like 512x512 the gain is smaller. More importantly, it appears to be much more memory efficient than the PyTorch 2 implementation.

Edit: Fixed the descriptive naming. I'd renamed the wheels after building them, and pip will likely reject a wheel with a nonsensical "bad filename" error if its filename doesn't match what it was at build time. Stupid, but whatever.