Release v0.2.2
Highlights
This release introduces FMHA local attention with chunk support, enhances GDN Decode capabilities with state indices and MTP, optimizes mHC prenorm GEMM stability and performance, adds a new Hash TopK operator, and includes critical bug fixes for DSA, MLA, and FA3.
Note: FMHA local attention with chunk support requires SDK 5.1.0 or later.
What's Changed
FMHA Updates
- Added support for local attention with chunk functionality (FA3 API attention_chunk parameter).
- Requires SDK 5.1.0+ compiler.
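To illustrate what chunked local attention typically means, here is a minimal pure-Python sketch of a reference mask. It is not this library's implementation. The function name chunked_causal_mask is hypothetical, and the semantics are an assumption, not something these notes state: keys are split into consecutive chunks of attention_chunk tokens, each query attends causally to keys in its own chunk only, and the last query is aligned with the last key.

```python
def chunked_causal_mask(seqlen_q, seqlen_k, attention_chunk):
    """Return mask[i][j] == True where query i may attend to key j.

    Assumed semantics (not confirmed by the release notes): causal
    attention restricted to the chunk that contains the query position.
    """
    if attention_chunk <= 0:
        raise ValueError("attention_chunk must be positive")
    # Assumption: the last query lines up with the last key.
    offset = seqlen_k - seqlen_q
    mask = []
    for i in range(seqlen_q):
        pos = i + offset  # position of query i on the key axis
        chunk_start = (pos // attention_chunk) * attention_chunk
        # Visible keys: from the start of the query's chunk up to the query itself.
        mask.append([chunk_start <= j <= pos for j in range(seqlen_k)])
    return mask


# Print the mask for 4 queries, 4 keys, chunk size 2 ("x" = visible).
for row in chunked_causal_mask(4, 4, 2):
    print("".join("x" if allowed else "." for allowed in row))
# x...
# xx..
# ..x.
# ..xx
```

A mask like this can serve as a slow reference when checking kernel output on small shapes. With seqlen_k=0 it returns empty rows.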
GDN Updates
- Added support for state indices and MTP in GDN Decode.
- Optimized GDN Prefill performance.
mHC Updates
- Improved stability and performance of mHC prenorm GEMM.
- Added new mHC pre-big-fuse interface.
New Operators
- Added Hash TopK operator.
Bug Fixes
- Fixed redundant JIT compilation issue for DSA in long-context scenarios.
- Fixed illegal memory address issue in MLA under certain scenarios.
- Fixed floating-point exception in FA3 host when seqlen_k=0.