Release v0.2.1

@zhe-pang zhe-pang released this 23 May 05:36
· 3 commits to main since this release

Highlights

This release expands FMHA compatibility, improves FP8 output support, adds MODEL1(DS V4) support for DSA, optimizes GDN performance, and introduces new mHC and DeepGEMM MQA Logits capabilities.

Starting from this release, wrapper versions are strictly aligned with the MATE version to prevent incompatible package combinations.

What's Changed

FMHA Updates

Added compatibility support for additional FMHA features:

  • RoPE
  • k_leftpad
  • kv_batch_idx

FP8 SageAttention & FP8 DenseGEMM

Added support for FP8 output and quantization scale outputs.

  • Added FP8 output support.
  • Added quantization scale output support.
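
As a rough illustration of what a quantization scale output carries, the sketch below computes a per-tensor scale and the scaled values that would then be stored in FP8. It is not MATE's implementation: the E4M3 format, the per-tensor granularity, and the function names are assumptions made for this example, and rounding onto the FP8 grid is omitted.

```python
# Largest finite magnitude in the FP8 E4M3 format (assumed format for this sketch).
FP8_E4M3_MAX = 448.0


def quantize_per_tensor(values):
    """Return (scaled_values, scale) such that values ~= scaled_values * scale."""
    amax = max((abs(v) for v in values), default=0.0)
    # Guard against an all-zero (or empty) tensor, where any scale works.
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    # Clamp to the representable range; rounding to FP8 precision is not modeled.
    scaled = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in values]
    return scaled, scale


def dequantize(scaled, scale):
    """Recover approximate original values from scaled values and their scale."""
    return [q * scale for q in scaled]


original = [1.0, -2.0, 4.48]
scaled, scale = quantize_per_tensor(original)
restored = dequantize(scaled, scale)
```

A consumer needs both outputs: the scaled values alone cannot be mapped back to the original range without the scale.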

DeepSeek Sparse Attention

Added MODEL1(DS V4) support for DeepSeek Sparse Attention.

  • Added DSA Prefill support for MODEL1(DS V4).
  • Added DSA Decode support for MODEL1(DS V4).

GDN Updates

Improved GDN performance and expanded Decode capability.

  • Optimized Prefill performance.
  • Optimized Decode performance.
  • Added MTP support for Decode.

mHC

Added support for TF32 mHC pre-norm.

DeepGEMM MQA Logits

Improved paged MQA Logits support for larger batch sizes.

  • Paged MQA Logits now supports larger batch sizes.
  • The maximum supported batch size is limited only by shared memory capacity.

Wrapper Updates

Starting from v0.2.1, wrapper versions are strictly aligned with the MATE version to avoid incompatible package combinations.

  • Use mate check to verify wrapper consistency, including version and commit information.
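
The idea behind strict version alignment can be sketched as a simple comparison of each wrapper's version against the core version. This is only an illustration of the concept, not how mate check works internally: the package names and version values below are hypothetical, and the commit comparison mentioned above is not modeled.

```python
def find_mismatches(core_version, wrapper_versions):
    """Return the wrappers whose version differs from the core version."""
    return {
        name: version
        for name, version in wrapper_versions.items()
        if version != core_version
    }


# Hypothetical package names and versions, for illustration only.
installed = {"deep-gemm-wrapper": "0.2.1", "flash-mla-wrapper": "0.2.0"}
mismatches = find_mismatches("0.2.1", installed)
for name, version in mismatches.items():
    print(f"{name} is at {version}, expected 0.2.1")
```

An empty result means every wrapper matches the core version exactly; under strict alignment, any other result indicates an incompatible combination.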

DeepGEMM Wrapper

Added new interfaces:

  • mHC
  • bf16_gemm_nt

FlashMLA Wrapper

Added support for MODEL1(DS V4) related input arguments.

Bug Fixes

Fixed the following issues:

  • Fixed NaN outputs in Fused MoE Gate under certain scenarios.
  • Fixed illegal memory access (IMA) issues in MQA Logits.
  • Fixed incorrect FA3 backend selection for Softcap scenarios.
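
For readers unfamiliar with the Softcap case in the last fix: soft-capping is commonly defined as scaling a tanh so that attention scores stay within a fixed bound. The sketch below assumes that common form; the release notes do not state the exact formula MATE uses, and the function name is chosen for this example.

```python
import math


def softcap(score, cap):
    """Squash a raw attention score smoothly into the range [-cap, cap]."""
    return cap * math.tanh(score / cap)


# Small scores pass through almost unchanged; large ones saturate near the cap.
small = softcap(0.1, 30.0)
large = softcap(1000.0, 30.0)
```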