Release v0.2.2

@zhe-pang released this 26 May 08:47

Highlights

This release introduces FMHA local attention with chunk support, extends GDN Decode with state indices and MTP, improves the stability and performance of the mHC prenorm GEMM, adds a new Hash TopK operator, and includes critical bug fixes for DSA, MLA, and FA3.

Note: FMHA local attention with chunk support requires SDK 5.1.0 or later.

What's Changed

FMHA Updates

  • Added support for local attention with chunking (the attention_chunk parameter in the FA3 API).
    • Requires the compiler from SDK 5.1.0 or later.
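
The release notes do not spell out what attention_chunk does. The sketch below shows one common reading of chunked local attention, where a query attends only to keys that sit in the same fixed-size chunk and are not in the future. The function name `chunked_causal_mask`, the causal assumption, and the equal query and key lengths are illustrative assumptions, not this library's API or its exact masking rule.

```python
def chunked_causal_mask(seqlen, attention_chunk):
    """Build a boolean mask where mask[i][j] is True if query i may attend to key j.

    Assumed rule (hypothetical, for illustration only):
      * causal: key j must not come after query i
      * chunked: i and j must fall in the same chunk of size attention_chunk
    """
    if attention_chunk <= 0:
        raise ValueError("attention_chunk must be positive in this sketch")
    mask = []
    for i in range(seqlen):
        row = []
        for j in range(seqlen):
            same_chunk = (i // attention_chunk) == (j // attention_chunk)
            row.append(j <= i and same_chunk)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Print the mask for 5 tokens with chunks of size 2.
    for row in chunked_causal_mask(5, 2):
        print("".join("x" if allowed else "." for allowed in row))
```

Under this reading, each chunk forms an independent causal block, so the attention cost per query is bounded by the chunk size rather than by the full sequence length.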

GDN Updates

  • Added support for state indices and MTP in GDN Decode.
  • Optimized GDN Prefill performance.
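
As a rough illustration of what state indices can mean in a decode step, the sketch below keeps a pool of per-request recurrent states and uses an index list to map each batch row to its slot. It assumes GDN refers to Gated DeltaNet and uses a simplified single-head gated delta rule. The names `gated_delta_step`, `decode_batch`, and `state_pool` are hypothetical and do not come from this library.

```python
def gated_delta_step(S, q, k, v, alpha, beta):
    """One decode step on a d_v x d_k state S (list of rows), updated in place.

    Assumed update (simplified gated delta rule):
        S <- alpha * S + beta * (v - alpha * S k) k^T
    Returns the output S q computed from the updated state.
    """
    # Value predicted by the decayed state for key k.
    pred = [alpha * sum(s * kj for s, kj in zip(row, k)) for row in S]
    for i, row in enumerate(S):
        err = beta * (v[i] - pred[i])
        for j in range(len(row)):
            row[j] = alpha * row[j] + err * k[j]
    return [sum(s * qj for s, qj in zip(row, q)) for row in S]


def decode_batch(state_pool, state_indices, qs, ks, vs, alphas, betas):
    """Run one decode step per batch row, using state_indices to pick each row's slot."""
    outputs = []
    for b, slot in enumerate(state_indices):
        out = gated_delta_step(state_pool[slot], qs[b], ks[b], vs[b], alphas[b], betas[b])
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    # Three state slots, two active requests mapped to slots 2 and 0.
    pool = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)]
    outs = decode_batch(
        pool, [2, 0],
        qs=[[1.0, 0.0], [0.0, 1.0]],
        ks=[[1.0, 0.0], [0.0, 1.0]],
        vs=[[1.0, 2.0], [3.0, 4.0]],
        alphas=[1.0, 1.0], betas=[1.0, 1.0],
    )
    print(outs)
```

The point of the indirection is that the batch order can change between steps without copying states: only the index list changes, and slots that are not referenced stay untouched.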

mHC Updates

  • Improved stability and performance of mHC prenorm GEMM.
  • Added new mHC pre-big-fuse interface.

New Operators

  • Added Hash TopK operator.

Bug Fixes

  • Fixed redundant JIT compilation for DSA in long-context scenarios.
  • Fixed an illegal memory address error in MLA in certain scenarios.
  • Fixed a floating-point exception in FA3 host code when seqlen_k=0.
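
The notes do not say what caused the seqlen_k=0 failure. On Linux, an integer division by zero in native host code is typically reported as a floating-point exception, so one plausible pattern is a tiling or split heuristic that divides by a block count derived from seqlen_k. The sketch below is a hypothetical Python illustration of such a guard. `num_splits`, `block_n`, and `max_splits` are made-up names, not the actual FA3 host code.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def num_splits(seqlen_k, block_n, max_splits):
    """Pick how many splits to use along the key dimension (hypothetical heuristic).

    Without the early return, seqlen_k == 0 gives num_blocks == 0 and
    blocks_per_split == 0, and the final division would divide by zero.
    """
    num_blocks = ceil_div(seqlen_k, block_n)
    if num_blocks == 0:
        # Empty key sequence: nothing to split, so fall back to a single split.
        return 1
    blocks_per_split = ceil_div(num_blocks, max_splits)
    return ceil_div(num_blocks, blocks_per_split)


if __name__ == "__main__":
    for n in (0, 128, 1024):
        print(n, num_splits(n, block_n=128, max_splits=4))
```

In Python the unguarded version would raise ZeroDivisionError instead of crashing the process, but the shape of the fix is the same: handle the empty key sequence before any quantity derived from it is used as a divisor.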