
Flash Attention 2 - Torch2.1.1+cu121 Ada Lovelace and Hopper Wheels for Python 3.10 and 3.11

@NeedsMoar NeedsMoar released this 17 Dec 01:49
30b8c43

Initial uploads. Ada Lovelace (native) and Hopper (via PTX) binaries only; Volta and Ampere compatible binaries will be uploaded shortly. Make sure you have a recent xformers installed, or install it afterward (xformers doesn't have a hard dependency on this package, but all of the better CUDA attention algorithms live here), then `pip install` the correct wheel for your Python version.
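Picking "the correct wheel for your Python version" means matching the CPython tag embedded in the wheel filename (cp310 for 3.10, cp311 for 3.11). A small sketch of that matching logic; the wheel filenames below are illustrative placeholders, not the actual release assets:

```python
import sys

def cpython_tag(version_info=sys.version_info):
    """Return the CPython wheel tag (e.g. 'cp310') for a version tuple."""
    major, minor = version_info[:2]
    return f"cp{major}{minor}"

# Hypothetical wheel names, for illustration only -- use the real
# filenames from the release assets.
wheels = [
    "flash_attn-2.x-cp310-cp310-win_amd64.whl",
    "flash_attn-2.x-cp311-cp311-win_amd64.whl",
]

# Keep only wheels built for the running interpreter.
tag = cpython_tag()
matching = [w for w in wheels if f"-{tag}-" in w]
```

On Python 3.10 this selects only the cp310 wheel; install that one with `pip install <filename>`.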

That's it. I see a ~2 it/s improvement at 1024x1024 on regular SD1.5 models, which jumps to a 50-60% speed improvement at 2048x2048. At normal generation sizes like 512x512 the gain is smaller. More importantly, it appears to be much more memory efficient than the PyTorch 2 implementation.

Edit: Fixed the descriptive naming. I'd renamed the wheels after building them, and pip will likely reject a wheel with a nonsensical "bad filename" error if its filename doesn't match what it was at build time. Stupid, but whatever.