
Release v0.1.3


@zhe-pang released this on 23 May 05:29

Highlights

This release improves compatibility across FMHA Forward, DeepGEMM, extensions, wrappers, and CLI tooling. It expands FlashAttention 3 scenario coverage, adds more DeepGEMM API support, introduces additional MoE Fused Gate configurations, and provides new debugging utilities through the mate CLI.

What's Changed

FMHA Forward Compatibility

Enhanced FMHA Forward compatibility for broader FlashAttention 3 scenarios.

Supported QKV input modes:

  • Normal
  • Ragged
  • Padded
  • Paged (KV only)
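Upstream FlashAttention's variable-length API describes a ragged batch by concatenating all sequences along the token dimension and passing cumulative sequence lengths (cu_seqlens). The sketch below converts a padded batch into that layout. It assumes MATE's Ragged mode follows the same cu_seqlens convention, and the helper names are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def padded_to_ragged(
    padded: Sequence[Sequence[float]], seqlens: Sequence[int]
) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: drop padding and pack sequences back to back.

    Each token is a single scalar here, standing in for a per-token
    [num_heads, head_dim] slice; only the first seqlens[b] entries of
    padded[b] are real tokens.
    """
    packed: List[float] = []
    for row, n in zip(padded, seqlens):
        if n > len(row):
            raise ValueError("sequence length exceeds padded length")
        packed.extend(row[:n])  # keep the valid prefix, discard padding
    # Sequence b owns packed[cu_seqlens[b]:cu_seqlens[b + 1]]
    cu_seqlens = [0, *accumulate(seqlens)]
    return packed, cu_seqlens


def sequence_slice(packed: Sequence[float], cu_seqlens: Sequence[int], b: int) -> List[float]:
    """Recover sequence b from the ragged layout."""
    return list(packed[cu_seqlens[b]:cu_seqlens[b + 1]])


packed, cu_seqlens = padded_to_ragged([[1, 2, 0, 0], [3, 4, 5, 6], [7, 0, 0, 0]], [2, 4, 1])
print(packed)      # [1, 2, 3, 4, 5, 6, 7]
print(cu_seqlens)  # [0, 2, 6, 7]
```

In the ragged layout no work is spent on padding tokens. The boundary array alone is enough to locate every sequence.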

Supported mask modes:

  • None
  • Causal
  • Local
  • Local with sink
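The None, Causal, and Local modes differ only in which (query, key) pairs are visible. The sketch below spells out those predicates. It assumes the bottom-right alignment and (window_left, window_right) convention of the upstream FlashAttention API, with None standing for an unbounded side. The function names are illustrative. The sink variant is not modeled because its exact semantics are not specified here.

```python
from typing import List, Optional


def allowed(
    i: int,
    j: int,
    seqlen_q: int,
    seqlen_k: int,
    mode: str,
    window_left: Optional[int] = None,
    window_right: Optional[int] = None,
) -> bool:
    """Return True if query i may attend to key j under the given mask mode."""
    # Bottom-right alignment: query i lines up with key i + (seqlen_k - seqlen_q)
    diag = i + seqlen_k - seqlen_q
    if mode == "none":
        return True
    if mode == "causal":
        return j <= diag
    if mode == "local":
        # None means the window is unbounded on that side
        lo_ok = window_left is None or j >= diag - window_left
        hi_ok = window_right is None or j <= diag + window_right
        return lo_ok and hi_ok
    raise ValueError(f"unknown mask mode: {mode}")


def mask(
    seqlen_q: int,
    seqlen_k: int,
    mode: str,
    window_left: Optional[int] = None,
    window_right: Optional[int] = None,
) -> List[List[int]]:
    """Materialize the full 0/1 mask (1 = visible) for inspection."""
    return [
        [int(allowed(i, j, seqlen_q, seqlen_k, mode, window_left, window_right))
         for j in range(seqlen_k)]
        for i in range(seqlen_q)
    ]


local = mask(4, 4, "local", window_left=1, window_right=0)
for row in local:
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [0, 1, 1, 0]
# [0, 0, 1, 1]
```

Under this convention, Causal is simply Local with window_right = 0 and an unbounded left window.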

Supported score modes:

  • None
  • Softcap
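Softcapping, as exposed by upstream FlashAttention's softcap argument, squashes each raw attention score with cap * tanh(score / cap) before the softmax. Upstream treats a value of 0 as disabled. The sketch below assumes MATE's Softcap score mode uses the same formula and the same "0 disables it" convention.

```python
import math
from typing import List, Sequence


def softcap(score: float, cap: float) -> float:
    """Smoothly squash a raw attention score into the open interval (-cap, cap)."""
    if cap <= 0:
        return score  # a non-positive cap behaves like score mode "None"
    return cap * math.tanh(score / cap)


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


raw = [2.0, 80.0, -120.0]
capped = [softcap(s, 30.0) for s in raw]
probs = softmax(capped)
print(capped, probs)
```

Small scores pass through almost unchanged, while outliers such as 80 or -120 are pulled inside (-30, 30). This keeps a single logit from dominating the softmax.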

Supported configurations:

  • PageSize: arbitrary page size is supported; 64 is recommended.
  • DataType: bf16, fp16.
  • HeadDim: arbitrary head dimension up to 512.
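Paged KV storage (the scheme popularized by vLLM's PagedAttention) keeps each sequence's keys and values in fixed-size pages, addressed through a per-sequence page table. The sketch below shows that address computation. The function names are hypothetical, and the assumption is that each page-table entry holds a physical page index.

```python
from typing import Sequence, Tuple

PAGE_SIZE = 64  # the page size recommended in these notes


def locate(token_pos: int, page_table: Sequence[int], page_size: int = PAGE_SIZE) -> Tuple[int, int]:
    """Map a sequence's logical KV position to (physical page, offset within page)."""
    logical_page, offset = divmod(token_pos, page_size)
    if logical_page >= len(page_table):
        raise IndexError("token position beyond the pages allocated to this sequence")
    return page_table[logical_page], offset


def pages_needed(seqlen: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages required to hold seqlen tokens (ceiling division)."""
    return -(-seqlen // page_size)


page_table = [7, 3, 9]  # logical pages 0, 1, 2 live in physical pages 7, 3, 9
print(locate(130, page_table))  # (9, 2)
print(pages_needed(130))        # 3
```

Any page size works with this arithmetic. With a power of two such as 64, the divmod reduces to a shift and a mask.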

Optimization knobs:

  • SplitKV
  • PackGQA
  • SchedulerMetadata
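SplitKV processes chunks of a long KV sequence independently and merges the partial outputs afterwards. The merge is exact if each split also returns the log-sum-exp (LSE) of its scores, as in FlashAttention's split-KV (Flash-Decoding) scheme. The sketch below shows that merge for a single query. The names are illustrative, and it does not model how MATE chooses the number of splits.

```python
import math
from typing import List, Sequence, Tuple


def attend(scores: Sequence[float], values: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """Softmax attention of one query over one KV chunk: returns (output, LSE)."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom for d in range(dim)]
    return out, m + math.log(denom)


def combine(partials: Sequence[Tuple[List[float], float]]) -> List[float]:
    """Merge per-split (output, LSE) pairs into the full-attention output."""
    lses = [lse for _, lse in partials]
    m = max(lses)
    total_lse = m + math.log(sum(math.exp(lse - m) for lse in lses))
    dim = len(partials[0][0])
    # Each split is reweighted by its share of the global softmax denominator
    return [sum(math.exp(lse - total_lse) * out[d] for out, lse in partials)
            for d in range(dim)]


scores = [0.5, 2.0, -1.0, 3.0, 0.0]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [1.0, -1.0], [0.5, 0.5]]
full, _ = attend(scores, values)
split = combine([attend(scores[:2], values[:2]), attend(scores[2:], values[2:])])
print(full, split)
```

Each split only needs to keep one extra scalar per query (its LSE). This allows splits to run on different thread blocks without communicating until the final merge.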

Additional compatibility:

  • ContextParallel: compatible with vLLM-style usage.
  • Compile: JIT enabled.

DeepGEMM Compatibility

Enhanced DeepGEMM compatibility with additional API and edge-case support.

Added support for:

  • m_grouped_bf16_gemm_nt_* APIs
  • m_grouped_fp8_gemm_nt_* APIs
  • k_grouped_fp8_gemm_tn_contiguous
  • FP8 MQA Logits Prefill / Decode
  • NextN=4 scenarios for Decode
  • m/n/k = 0 edge cases
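In DeepGEMM's naming, nt means A is stored row-major as [M, K] and B as [N, K], so each product is D = A @ B^T. The m-grouped "contiguous" variants concatenate the rows of all groups in A. The pure-Python reference below illustrates those semantics and the zero-size edge cases. It assumes a per-row group index (here called m_indices), and the function names are hypothetical. Neither is MATE's or DeepGEMM's API.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def gemm_nt(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """D = A @ B^T with A: [M, K] and B: [N, K]; valid for M, N, or K == 0."""
    # With K == 0 every dot product is an empty sum, so D is an M x N zero matrix
    return [[sum(x * y for x, y in zip(a_row, b_row)) for b_row in b] for a_row in a]


def m_grouped_gemm_nt_contiguous_ref(
    a: Sequence[Sequence[float]],
    b_groups: Sequence[Sequence[Sequence[float]]],
    m_indices: Sequence[int],
) -> Matrix:
    """Hypothetical reference: row i of A is multiplied by group m_indices[i] of B."""
    if len(a) != len(m_indices):
        raise ValueError("need one group index per row of A")
    return [gemm_nt([row], b_groups[g])[0] for row, g in zip(a, m_indices)]


b_groups = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]  # two groups, each N=2, K=2
a = [[1, 2], [3, 4], [5, 6]]                      # three rows spread across groups
print(m_grouped_gemm_nt_contiguous_ref(a, b_groups, [0, 1, 1]))  # [[1, 2], [6, 8], [10, 12]]
```

The K = 0 case is the easiest to get wrong. An empty reduction must produce an M x N matrix of zeros, not an empty or uninitialized output.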

Extensions

Added more MoE Fused Gate expert configurations:

  • 160 experts
  • 384 experts
  • 256 experts with 1 group
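For reference, the sketch below shows DeepSeek-V3-style grouped routing for one token. Experts are scored with a sigmoid, and a bias that affects only selection is added. The best groups are then chosen by the sum of their top-2 scores, and the final top-k experts come only from those groups. These notes do not say whether MATE's fused gate follows this exact scheme, so treat it as an assumption. The parameter names are illustrative, and the routed scaling factor is omitted. Under this scheme, "256 experts with 1 group" reduces to plain top-k over the bias-adjusted scores.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def grouped_topk(
    logits: Sequence[float],
    bias: Sequence[float],
    num_groups: int,
    topk_group: int,
    top_k: int,
) -> Tuple[List[int], List[float]]:
    """Route one token: pick the best groups first, then the best experts inside them."""
    n = len(logits)
    if n % num_groups != 0:
        raise ValueError("experts must split evenly into groups")
    per_group = n // num_groups
    scores = [sigmoid(x) for x in logits]
    # The bias influences which experts are chosen, not their final weights
    choice = [s + b for s, b in zip(scores, bias)]
    # Rank each group by the sum of its two best biased scores
    group_score = []
    for g in range(num_groups):
        members = sorted(choice[g * per_group:(g + 1) * per_group], reverse=True)
        group_score.append(sum(members[:2]))
    kept = sorted(range(num_groups), key=lambda g: group_score[g], reverse=True)[:topk_group]
    candidates = [e for e in range(n) if e // per_group in kept]
    experts = sorted(candidates, key=lambda e: choice[e], reverse=True)[:top_k]
    total = sum(scores[e] for e in experts)
    weights = [scores[e] / total for e in experts]  # renormalize to sum to 1
    return experts, weights


logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -2.0, 2.5]
print(grouped_topk(logits, [0.0] * 8, num_groups=1, topk_group=1, top_k=2)[0])  # [4, 7]
print(grouped_topk(logits, [0.0] * 8, num_groups=4, topk_group=2, top_k=2)[0])  # [4, 1]
```

In the grouped call, expert 7 has the second-highest score but sits in a weak group, so it loses to expert 1. With a single group there is no group filter, which is why the 1-group configuration behaves like ordinary top-k.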

Wrappers

Added compatibility wrappers to simplify migration and integration.

mate-deep-gemm

  • Compatible with the deep-gemm import style.
  • Compatible with existing DeepGEMM API usage patterns.

mate-flash-attention

  • Compatible with the flash-attention3 import style.
  • Compatible with FlashAttention 3 API usage patterns.
  • Extended compatibility for SGL / vLLM FA3 fork usage patterns.

MATE CLI

Introduced the mate CLI for environment inspection and debugging.

New commands:

  • mate show-config

    • Displays environment status, commit ID, and related runtime/build information.
  • mate env

    • Displays available MATE-related environment variables.

Debugging improvements:

  • Added new environment variables for dumping input/output data during debugging.

For more details, refer to the repository documentation or run mate --help.