
Releases: ROCm/aotriton

AOTriton 0.8.2 Beta

23 Jan 17:58
b24f43a

0.8.2b is an emergency fix

We received reports of a bug that causes NaN during fine-tuning of the llama3.2-11b vision model. Anyone using 0.8 is advised to upgrade to 0.8.2b.

Note: AOTriton 0.8.1b adds head dimension 512 support, so the binary size is larger than that of 0.8b.

This is a point release

0.8.2b can be used as a drop-in replacement for the 0.7b shared object file.

What's Changed

  • Fix missing batch_index when calculating bias pointer by @xinyazhang in #68
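To illustrate the class of bug this fix addresses, here is a minimal sketch of a flat offset calculation into a bias buffer. It is not the AOTriton kernel code: the function `bias_offset`, the `use_batch` flag, and the contiguous (batch, heads, seqlen_q, seqlen_k) layout are all assumptions made for illustration. When the batch term is left out, every batch reads the bias of batch 0.

```python
def bias_offset(batch_index, head_index, row, col, strides, use_batch=True):
    """Flat element offset into a (batch, heads, seqlen_q, seqlen_k) bias buffer.

    Hypothetical helper for illustration only; not part of AOTriton.
    """
    stride_b, stride_h, stride_m, stride_n = strides
    offset = head_index * stride_h + row * stride_m + col * stride_n
    if use_batch:
        # The term that a "missing batch_index" bug would omit.
        offset += batch_index * stride_b
    return offset


# Assumed contiguous layout: batch=2, heads=3, seqlen_q=4, seqlen_k=5.
batch, heads, seqlen_q, seqlen_k = 2, 3, 4, 5
strides = (heads * seqlen_q * seqlen_k, seqlen_q * seqlen_k, seqlen_k, 1)

# First bias element of batch 1, head 0.
correct = bias_offset(1, 0, 0, 0, strides)
buggy = bias_offset(1, 0, 0, 0, strides, use_batch=False)
print(correct, buggy)
```

With the batch term, batch 1 starts one full batch stride into the buffer. Without it, the offset collapses to the start of batch 0, so a bias that differs per batch is silently replaced by the wrong values.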

Full Changelog: 0.8.1b...0.8.2b

AOTriton 0.8.1 Beta

14 Jan 16:54
3a80554

What's Changed

Note: this release is not recommended unless you need head dimension 512 support immediately.

Full Changelog: 0.8b...0.8.1b

AOTriton 0.8 Beta

26 Nov 19:42
6f8cbca

What's Changed

Full Changelog: 0.7b...0.8b

AOTriton 0.7.3 Beta

20 Nov 00:56

0.7.3b is an emergency fix for 0.7.2b

0.7.2b has been removed from the releases due to a correctness bug.

This is a point release

0.7.3b can be used as a drop-in replacement for the 0.7b or 0.7.xb shared object file.

What's Changed (Compared with 0.7.1b)

  • Fix varlen related implementation errors
  • Fix NaN output when sm_scale=0.0, which was introduced in #45
  • Fix NaN output for large numerical errors. See #54 for more details.
    • The fix in 0.7.2b introduced a bug. 0.7.3b is released to revise this fix.

Note: the two NaN fixes may have some performance impact.

Full Changelog: 0.7.1b...0.7.3b

(DO NOT USE) AOTriton 0.7.2 Beta

08 Nov 20:32

CAVEAT: DO NOT USE THIS RELEASE

Commit 14d673f introduced a bug.
We will release 0.7.3 with a fix instead.

The binary tarballs have been deleted to prevent accidental use.

This is a point release

0.7.2b can be used as a drop-in replacement for the 0.7b or 0.7.1b shared object file.

What's Changed

  • Fix varlen related implementation errors
  • Fix NaN output when sm_scale=0.0, which was introduced in #45
  • Fix NaN output for large numerical errors. See #54 for more details.

Note: the two NaN fixes may have some performance impact.

Full Changelog: 0.7.1b...0.7.2b

AOTriton 0.7.1 Beta

04 Oct 21:50
f6b28a9

This is a point release

0.7.1b can be used as a drop-in replacement for the 0.7b shared object file.

What's Changed

Full Changelog: 0.7b...0.7.1b

AOTriton 0.7 Beta

23 Aug 16:19
9be0406

What's Changed

  • Default to Shared Object by @jithunnair-amd in #33
  • Add varlen support to AOTriton's Flash Attention by @xinyazhang in #31
  • Switch to upstream Triton compiler, and related changes by @xinyazhang in #36
  • Improve Backward Performance and Experimental Navi31 Support by @xinyazhang in #39
    • Introduce new tuning system based on pre-compiled GPU kernels
    • Navi31 support is still experimental
  • Support hipGraph usage in PyTorch by @xinyazhang in #40
    • This changes the RNG API used by FA kernels.
    • Switch to new testing scheme to match PyTorch 2.5's changes

New Contributors

Full Changelog: 0.6b...0.7b

Preview 2 of 0.7b

04 Aug 18:04

The tuning database for Preview 1 was generated with a newer Triton kernel that no longer uses block pointers. However, Preview 1 itself does not include those kernel changes.

Preview 2 was created to fix this.

Preview 1 of 0.7b

04 Aug 08:42

Preview 1 of 0.7b.

What's Changed

  1. Switch to Triton upstream compiler
  2. Improved backward kernel performance with better tuning database (not fully accomplished in Preview 1; see Preview 2 for this feature)
  3. Add Navi31 support
  4. Default to AOTRITON_COMPRESS_KERNEL=ON
  5. Requires zstd as a runtime dependency

Known problems

  1. No Navi32 support
  2. Lacks the changes, especially the ABI breaks to the library, that enable generation of the tuning_database.sqlite3 shipped in the preview version.

AOTriton 0.4.2 Beta

02 Aug 22:08

Manylinux2_28 updates to 0.4.1b