AOTriton 0.4 Beta
Summary
This is the first release considered sufficiently stable for production use.
Features
- Implemented the Flash Attention v2 algorithm on MI200/MI300
- Implemented most features required by PyTorch's mha_fwd and mha_bwd
- Missing feature: window_size_left and window_size_right
- API can be found at include/aotriton/flash.h
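
For context on the missing window_size_left and window_size_right parameters: in the upstream flash-attn convention they select sliding-window (local) attention, where each query attends only to keys within a band around its own position. The sketch below is illustrative pure Python, not AOTriton code. The function name sliding_window_mask is hypothetical, and it assumes the upstream convention that a negative window size means "unbounded" on that side and that queries are aligned to the end of the key sequence.

```python
def sliding_window_mask(seqlen_q, seqlen_k, window_size_left, window_size_right):
    """Return mask[i][j] == True where query i may attend to key j.

    Assumption: a negative window size means no limit on that side,
    and query i is aligned to key position i + (seqlen_k - seqlen_q).
    """
    offset = seqlen_k - seqlen_q  # align the last query with the last key
    mask = []
    for i in range(seqlen_q):
        center = i + offset
        # Lowest key index visible to query i
        lo = 0 if window_size_left < 0 else max(0, center - window_size_left)
        # Highest key index visible to query i
        if window_size_right < 0:
            hi = seqlen_k - 1
        else:
            hi = min(seqlen_k - 1, center + window_size_right)
        mask.append([lo <= j <= hi for j in range(seqlen_k)])
    return mask


if __name__ == "__main__":
    # One key to the left, none to the right: a causal band of width 2.
    for row in sliding_window_mask(4, 4, 1, 0):
        print("".join("x" if allowed else "." for allowed in row))
```

Under these assumptions, (-1, -1) leaves attention unrestricted and (-1, 0) reduces to an ordinary causal mask, so a release without these parameters still covers the full and causal cases.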