Release v0.2.3

zhe-pang released this 23 Jun 02:31 · 1 commit to main since this release

Highlights

This release improves documentation, expands FMHA forward with QV and FP8 support, and introduces a TileLang-based FMHA backward implementation for large head dimensions. It adds MUTLASS-based FP8 and BF16 DeepGEMM implementations, the Guard Allocator for memory debugging, and KDA Prefill support with the FlashKDA wrapper. It also improves the performance of Paged MQA Logits and GDN Decode.

What's Changed

Documentation

Improved MATE documentation with clearer usage guides and tutorials.

  • Enhanced documentation structure and usability.
  • Added more comprehensive tutorials and examples.

FMHA Updates

Expanded FMHA functionality and improved runtime performance.

FMHA Forward:

  • Added QV support.
  • Added FP8 support.
    • FP8 performance optimizations require an upcoming compiler release.
  • Improved workload balancing and partitioning in selected scenarios.
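As background on why FP8 inputs usually need explicit scaling, the sketch below maps a tensor's values into the representable range of an FP8 format before casting. It is a generic illustration and not the MATE interface: the function names are hypothetical, the choice of the E4M3 format (largest finite value 448) is an assumption about which FP8 variant is in use, and rounding to the FP8 grid is omitted.

```python
# Conceptual sketch only: hypothetical helpers, not part of the MATE API.
# Assumes the OCP FP8 E4M3 format, whose largest finite value is 448.
E4M3_MAX = 448.0


def per_tensor_scale(values, fp8_max=E4M3_MAX):
    """Return a scale such that values / scale fits inside [-fp8_max, fp8_max]."""
    amax = max((abs(v) for v in values), default=0.0)
    # An all-zero (or empty) tensor needs no scaling.
    return amax / fp8_max if amax > 0.0 else 1.0


def scale_and_clamp(values, scale, fp8_max=E4M3_MAX):
    """Divide by the scale and clamp to the FP8 range (rounding is omitted)."""
    return [max(-fp8_max, min(fp8_max, v / scale)) for v in values]


activations = [0.5, -896.0, 224.0]
scale = per_tensor_scale(activations)          # 896 / 448 = 2.0
scaled = scale_and_clamp(activations, scale)   # [0.25, -448.0, 112.0]
```

The scale is kept alongside the tensor so that results can be multiplied back after the low-precision computation.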

FMHA Backward:

  • Added a TileLang-based implementation for HeadDim 256-256.

DeepGEMM Updates

Added new DeepGEMM implementations and improved performance.

  • Added MUTLASS-based FP8 DeepGEMM implementation.
  • Added MUTLASS-based BF16 DeepGEMM implementation.
  • Improved Paged MQA Logits performance.

GDN Updates

  • Improved GDN Decode performance.

Memory Debugging

Added Guard Allocator for debugging memory-related issues.

  • Helps identify and diagnose illegal memory access problems.
  • Intended for debugging and validation workflows.
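The general idea behind a guard allocator can be sketched in a few lines: each allocation is surrounded by padding filled with a known byte pattern, and a later check reports any padding that was overwritten. The code below is a CPU-side illustration of that technique only. The class name, guard size, and pattern are hypothetical, and it does not reflect how the MATE Guard Allocator is implemented or invoked.

```python
# Conceptual sketch only: hypothetical names, not the MATE Guard Allocator API.
from typing import List

GUARD_BYTE = 0xA5   # known pattern written into the guard regions
GUARD_SIZE = 16     # bytes of padding on each side of the user region


class GuardedBuffer:
    """A byte buffer surrounded by guard regions filled with a known pattern."""

    def __init__(self, size: int) -> None:
        self.size = size
        self._raw = bytearray([GUARD_BYTE]) * (size + 2 * GUARD_SIZE)
        # Zero the user-visible region; the guards keep the pattern.
        self._raw[GUARD_SIZE:GUARD_SIZE + size] = bytes(size)

    def write(self, offset: int, data: bytes) -> None:
        """Unchecked write relative to the user region, like a raw pointer store."""
        start = GUARD_SIZE + offset
        for i, byte in enumerate(data):
            self._raw[start + i] = byte

    def check(self) -> List[str]:
        """Report which guard regions no longer hold the pattern."""
        problems = []
        front = self._raw[:GUARD_SIZE]
        back = self._raw[GUARD_SIZE + self.size:]
        if any(b != GUARD_BYTE for b in front):
            problems.append("underflow")
        if any(b != GUARD_BYTE for b in back):
            problems.append("overflow")
        return problems


buf = GuardedBuffer(8)
buf.write(0, b"12345678")      # stays inside the 8-byte user region
in_bounds = buf.check()        # no guard was touched
buf.write(6, b"abcd")          # last two bytes land in the trailing guard
overrun = buf.check()          # the trailing guard is reported as corrupted
```

A pattern check like this only detects out-of-bounds writes, and only those that land inside the guard region, which is one reason such tooling is typically reserved for debugging and validation runs.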

KDA Support

Added KDA Prefill support.

  • Introduced the KDA Prefill interface.
  • Added the FlashKDA wrapper for easier integration and adoption.

Bug Fixes

Fixed the following issues:

  • An inconsistency between DeepGEMM's default get_alignment behavior and API input parameters.
  • Incorrect robust descriptor configuration in the FA assembly backend.
  • Stride overflow in the FA assembly backend.
  • Performance regressions in DSA under certain scenarios.
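For readers unfamiliar with stride overflow as a class of bug, the sketch below shows how an element offset in a large attention tensor can exceed the signed 32-bit range even when every individual stride fits. It illustrates the general problem only. The helper names and the example shape are hypothetical, and nothing here describes the actual fix in the FA assembly backend.

```python
# Conceptual sketch only: hypothetical helpers illustrating 32-bit offset overflow.
INT32_MAX = 2**31 - 1


def contiguous_strides(shape):
    """Row-major element strides for a shape, e.g. (2, 3, 4) -> (12, 4, 1)."""
    strides = []
    running = 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


def max_offset(shape, strides):
    """Largest element offset reachable by any in-bounds index."""
    return sum((dim - 1) * stride for dim, stride in zip(shape, strides))


def wrap_int32(value):
    """Reinterpret an integer as a signed 32-bit value (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


def needs_64bit_indexing(shape, strides, itemsize=1):
    """True if the largest byte offset does not fit in a signed 32-bit integer."""
    return max_offset(shape, strides) * itemsize > INT32_MAX


# Hypothetical layout: (batch, seqlen, heads, head_dim).
shape = (8, 131072, 32, 128)
strides = contiguous_strides(shape)     # (536870912, 4096, 128, 1)
largest = max_offset(shape, strides)    # 2**32 - 1 elements
wrapped = wrap_int32(largest)           # what a 32-bit register would hold
overflow = needs_64bit_indexing(shape, strides)
```

In this example every stride fits in 32 bits, but the combined offset does not, so the check has to be made on the full offset rather than on each stride.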