Releases: ROCm/aotriton
AOTriton 0.8.2 Beta
0.8.2b is an emergency fix
We received reports of a bug that causes NaN values when fine-tuning the Llama 3.2 11B Vision model. Anyone using 0.8b or 0.8.1b is advised to upgrade to 0.8.2b.
Note: AOTriton 0.8.1b added support for head dimension 512, so its binary is larger than that of 0.8b.
This is a point release
0.8.2b can be used as a drop-in replacement for the 0.7b shared object file.
What's Changed
- Fix missing batch_index when calculating bias pointer by @xinyazhang in #68
Full Changelog: 0.8.1b...0.8.2b
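To illustrate what a missing batch_index means when computing a bias pointer, here is a minimal sketch of strided addressing. It assumes a contiguous bias tensor laid out as (batch, heads, seqlen_q, seqlen_k); the function names and stride names are hypothetical and are not AOTriton's kernel code. Without the batch term, every batch reads the bias that belongs to batch 0.

```python
# Hypothetical sketch, not AOTriton code: element offset of bias[b, h, 0, 0]
# in a 4-D tensor with assumed layout (batch, heads, seqlen_q, seqlen_k).

def contiguous_strides(shape):
    """Row-major (C-contiguous) element strides for a tensor shape."""
    strides = []
    acc = 1
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))


def bias_offset(batch_index, head_index, strides):
    """Correct offset: includes both the batch and the head terms."""
    stride_bz, stride_bh, _stride_bm, _stride_bn = strides
    return batch_index * stride_bz + head_index * stride_bh


def buggy_bias_offset(batch_index, head_index, strides):
    """Offset with the batch term missing: every batch aliases batch 0."""
    _stride_bz, stride_bh, _stride_bm, _stride_bn = strides
    return head_index * stride_bh


if __name__ == "__main__":
    shape = (2, 4, 8, 8)  # batch, heads, seqlen_q, seqlen_k
    strides = contiguous_strides(shape)
    for b in range(shape[0]):
        print(b, bias_offset(b, 1, strides), buggy_bias_offset(b, 1, strides))
```

For batch 0 both versions agree, which is why such a bug can go unnoticed in tests that use a batch size of 1.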
AOTriton 0.8.1 Beta
What's Changed
- Support Head Dimension 512 by @xinyazhang in #67
Note: this release is not recommended unless you need head dimension 512 support immediately.
Full Changelog: 0.8b...0.8.1b
AOTriton 0.8 Beta
What's Changed
- Add PyTorch compatibility matrix to README.md by @xinyazhang in #41
- Add cmake option AOTRITON_NAME_SUFFIX to resolve name conflicts by @xinyazhang in #42
- Merge improvements of 0.7.1b release into main by @xinyazhang in #46
- Code Clean Up by @xinyazhang in #48
- GQA Support by @xinyazhang in #49
- Kernel Storage V2 by @xinyazhang in #50
- Add docker based package builder and switch to system compiler by @xinyazhang in #51
- Add versioning support in multiple levels. by @xinyazhang in #53
- Restore the support of causal=True and seqlen_q != seqlen_k by @xinyazhang in #55
- Misc changes and performance tuning for 0.8b release by @xinyazhang in #57
Full Changelog: 0.7b...0.8b
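Grouped-query attention (GQA, #49) lets several query heads share a single key/value head. The sketch below shows the common head-mapping convention, assuming the number of query heads is a multiple of the number of KV heads; the function name is hypothetical and not part of AOTriton's API.

```python
def kv_head_for_query_head(q_head, num_q_heads, num_kv_heads):
    """Map a query head index to the KV head it reads under GQA (hypothetical helper)."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads  # query heads per KV head
    return q_head // group_size


if __name__ == "__main__":
    # 8 query heads sharing 2 KV heads: heads 0-3 -> KV 0, heads 4-7 -> KV 1
    print([kv_head_for_query_head(h, 8, 2) for h in range(8)])
```

With num_kv_heads equal to num_q_heads this reduces to ordinary multi-head attention, and with num_kv_heads == 1 it becomes multi-query attention.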
AOTriton 0.7.3 Beta
0.7.3b is an emergency fix for 0.7.2b
0.7.2b has been removed from release due to a correctness bug.
This is a point release
0.7.3b can be used as a drop-in replacement for the 0.7b or 0.7.xb shared object file.
What's Changed (Compared with 0.7.1b)
- Fix varlen-related implementation errors
- Fix NaN output when sm_scale=0.0, a bug introduced in #45
- Fix NaN output caused by large numerical errors. See #54 for more details.
- The fix in 0.7.2b introduced a bug; 0.7.3b revises this fix.
Note: the two NaN fixes may have some performance impact.
Full Changelog: 0.7.1b...0.7.3b
(DO NOT USE) AOTriton 0.7.2 Beta
CAVEAT: DO NOT USE THIS RELEASE
Commit 14d673f introduced a bug.
A fix will be released as 0.7.3b instead.
The binary tarballs have been deleted to prevent accidental use.
This is a point release
0.7.2b can be used as a drop-in replacement for the 0.7b or 0.7.1b shared object file.
What's Changed
- Fix varlen-related implementation errors
- Fix NaN output when sm_scale=0.0, a bug introduced in #45
- Fix NaN output caused by large numerical errors. See #54 for more details.
Note: the two NaN fixes may have some performance impact.
Full Changelog: 0.7.1b...0.7.2b
AOTriton 0.7.1 Beta
This is a point release
0.7.1b can be used as a drop-in replacement for the 0.7b shared object file.
What's Changed
- Ignore colon suffixes in gcnArchName by @xinyazhang in #44
- FA Kernel Update for Accuracy and Performance by @xinyazhang in #45
Full Changelog: 0.7b...0.7.1b
AOTriton 0.7 Beta
What's Changed
- Default to Shared Object by @jithunnair-amd in #33
- Add varlen support to AOTriton's Flash Attention by @xinyazhang in #31
- Switch to upstream Triton compiler, and related changes by @xinyazhang in #36
- Improve Backward Performance and Experimental Navi31 Support by @xinyazhang in #39
  - Introduce new tuning system based on pre-compiled GPU kernels
  - Navi31 support is still experimental
- Support hipGraph usage in PyTorch by @xinyazhang in #40
  - This changes the RNG API used by FA kernels.
  - Switch to new testing scheme to match PyTorch 2.5's changes
New Contributors
- @jithunnair-amd made their first contribution in #33
Full Changelog: 0.6b...0.7b
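Varlen (variable-length) attention (#31) typically packs sequences of different lengths end to end and describes their boundaries with cumulative sequence lengths. The sketch below assumes the FlashAttention-style cu_seqlens convention; it does not reproduce AOTriton's actual varlen interface, and both helper names are hypothetical.

```python
from itertools import accumulate


def cu_seqlens(seqlens):
    """Cumulative offsets [0, l0, l0+l1, ...] for sequences packed end to end."""
    return [0] + list(accumulate(seqlens))


def sequence_rows(cu, i):
    """Row range occupied by sequence i in the packed (total_tokens, ...) tensor."""
    return range(cu[i], cu[i + 1])


if __name__ == "__main__":
    lens = [3, 5, 2]
    cu = cu_seqlens(lens)
    print(cu)
    print(list(sequence_rows(cu, 1)))
```

Packing this way avoids padding short sequences up to the longest one; the kernel only needs the offsets to know where each sequence starts and ends.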
Preview 2 of 0.7b
The tuning database shipped in Preview 1 was generated with a newer Triton kernel that no longer uses block pointers, but Preview 1 itself does not include those kernel changes.
Preview 2 was created to fix this.
Preview 1 of 0.7b
What's Changed
- Switch to Triton upstream compiler
- Improved backward kernel performance with a better tuning database
  - Not fully accomplished in this preview; see Preview 2 for this feature
- Add Navi31 support
- Default to AOTRITON_COMPRESS_KERNEL=ON
  - Requires zstd as a runtime dependency
Known problems
- No Navi32 support
- Missing the library changes (notably ABI breaks) that were used to generate the tuning_database.sqlite3 shipped in this preview
AOTriton 0.4.2 Beta
manylinux_2_28 updates to 0.4.1b