Release v0.2.1 · MooreThreads/mate

Highlights

This release expands FMHA compatibility, improves FP8 output support, adds MODEL1(DS V4) support for DSA, optimizes GDN performance, and introduces new mHC and DeepGEMM MQA Logits capabilities.

Starting from this release, wrapper versions are strictly aligned with the MATE version to prevent incompatible package combinations.

What's Changed

FMHA Updates

Added compatibility support for additional FMHA features:

RoPE
k_leftpad
kv_batch_idx

FP8 SageAttention & FP8 DenseGEMM

Added support for FP8 output and quantization scale outputs.

Added FP8 output support.
Added quantization scale output support.
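
To illustrate what a quantization scale output carries, here is a minimal Python sketch of per-tensor scaling into the FP8 E4M3 range, whose largest finite magnitude is 448. It only computes the scale and the clamped, scaled values and does not round to real FP8 bit patterns. The release does not say which FP8 format or scaling granularity MATE uses, so both are assumptions, and the function name is hypothetical.

```python
FP8_E4M3_MAX = 448.0  # largest finite magnitude in the E4M3 (e4m3fn) format


def quantize_per_tensor(values, fp8_max=FP8_E4M3_MAX):
    """Scale values into [-fp8_max, fp8_max] and return (scaled, scale).

    The caller recovers approximate originals as scaled[i] * scale.
    Per-tensor scaling is an assumption made for this sketch.
    """
    amax = max((abs(v) for v in values), default=0.0)
    scale = amax / fp8_max if amax > 0 else 1.0  # avoid dividing by zero
    # Clamp to guard against floating-point overshoot at the range boundary.
    scaled = [max(-fp8_max, min(fp8_max, v / scale)) for v in values]
    return scaled, scale


scaled, scale = quantize_per_tensor([0.5, -2.0, 1.0])
dequantized = [s * scale for s in scaled]
```

A kernel that emits FP8 output has to return the scale alongside the quantized tensor, because a consumer needs it to interpret the values.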

DeepSeek Sparse Attention

Added MODEL1(DS V4) support for DeepSeek Sparse Attention.

Added DSA Prefill support for MODEL1(DS V4).
Added DSA Decode support for MODEL1(DS V4).

GDN Updates

Improved GDN performance and expanded Decode capability.

Optimized Prefill performance.
Optimized Decode performance.
Added MTP support for Decode.

mHC

Added support for TF32 mHC pre-norm.

DeepGEMM MQA Logits

Improved paged MQA Logits support for larger batch sizes.

Paged MQA Logits now supports larger batch sizes.
The maximum supported batch size is now limited only by shared memory capacity.

Wrapper Updates

Starting from v0.2.1, wrapper versions are strictly aligned with the MATE version to avoid incompatible package combinations.

Run the mate check command to verify wrapper consistency, including version and commit information.
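
As a rough illustration of what strict alignment implies, the following Python sketch compares installed distribution versions using the standard library's importlib.metadata. It is not how mate check is implemented. The distribution names in the example are placeholders, and commit information is not checked here.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(dist_names):
    """Return {name: version or None} for each distribution name."""
    found = {}
    for name in dist_names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed
    return found


def find_mismatches(versions, core_name):
    """Return the names whose version differs from the core package's version."""
    core_version = versions.get(core_name)
    return sorted(
        name
        for name, ver in versions.items()
        if name != core_name and ver != core_version
    )


# Example with hard-coded versions (hypothetical wrapper names).
example = {"mate": "0.2.1", "wrapper-a": "0.2.1", "wrapper-b": "0.2.0"}
mismatched = find_mismatches(example, "mate")
print(mismatched)
```

In a real environment, the dictionary would come from installed_versions called with the actual distribution names.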

DeepGEMM Wrapper

Added new interfaces:

mHC
bf16_gemm_nt

FlashMLA Wrapper

Added support for MODEL1(DS V4) related input arguments.

Bug Fixes

Fixed the following issues:

Fixed NaN outputs in Fused MoE Gate under certain scenarios.
Fixed illegal memory access (IMA) issues in MQA Logits.
Fixed incorrect FA3 backend selection for Softcap scenarios.
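
For context on the Softcap fix, softcap in attention usually refers to logit soft-capping, where each attention score s is replaced by cap * tanh(s / cap) before the softmax. The Python sketch below shows that common definition. It is assumed, not confirmed, to be the form MATE uses, and the cap value of 30.0 is only an example.

```python
import math


def softcap(score, cap):
    """Smoothly bound a score to the range [-cap, cap] using tanh."""
    return cap * math.tanh(score / cap)


# Small scores pass through almost unchanged; large scores saturate near the cap.
capped = [softcap(s, 30.0) for s in (0.0, 1.0, 1000.0)]
```

Because the capping changes the scores before the softmax, a backend without softcap support would produce different results, which is why backend selection matters in these scenarios.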
