Release v0.2.3 · MooreThreads/mate

Highlights

This release improves documentation, expands FMHA capabilities with QV and FP8 support, introduces a TileLang-based FMHA backward implementation for large head dimensions, and adds new debugging and attention components including Guard Allocator and FlashKDA. It also delivers performance improvements across Paged MQA Logits and GDN Decode.

What's Changed

Documentation

Improved MATE documentation with clearer usage guides and tutorials.

Enhanced documentation structure and usability.
Added more comprehensive tutorials and examples.

FMHA Updates

Expanded FMHA functionality and improved runtime performance.

FMHA Forward:

Added QV support.
Added FP8 support. FP8 performance optimizations require an upcoming compiler release.
Improved workload balancing and partitioning in selected scenarios.

FMHA Backward:

Added a TileLang-based implementation for HeadDim 256-256.

DeepGEMM Updates

Added new DeepGEMM implementations and improved performance.

Added MUTLASS-based FP8 DeepGEMM implementation.
Added MUTLASS-based BF16 DeepGEMM implementation.
Improved Paged MQA Logits performance.
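
For readers new to the term, the sketch below illustrates what "paged" addressing means when computing MQA logits. It is a plain-Python reference under stated assumptions, not MATE's kernel or API: the function name `paged_mqa_logits`, the `block_table` layout, and the plain per-head dot product are all hypothetical, and any weighting, activation, or reduction the real kernel applies is not modeled here.

```python
from typing import List


def paged_mqa_logits(
    queries: List[List[float]],         # [num_heads][head_dim]
    kv_pages: List[List[List[float]]],  # [num_pages][page_size][head_dim]
    block_table: List[int],             # logical page index -> physical page index
    context_len: int,
) -> List[List[float]]:
    """Reference logits for one query token (hypothetical helper, not a MATE API).

    In MQA every query head shares the same keys, so each logical token t
    is looked up once through the block table and reused by all heads.
    """
    page_size = len(kv_pages[0])
    logits = []
    for q in queries:
        row = []
        for t in range(context_len):
            physical_page = block_table[t // page_size]  # translate logical page
            k = kv_pages[physical_page][t % page_size]   # slot inside that page
            row.append(sum(a * b for a, b in zip(q, k)))
        logits.append(row)
    return logits


kv_pages = [
    [[1.0, 0.0], [0.0, 1.0]],  # physical page 0
    [[2.0, 2.0], [3.0, 0.0]],  # physical page 1
]
block_table = [1, 0]  # logical page 0 is stored in physical page 1
queries = [[1.0, 0.0], [0.0, 1.0]]

result = paged_mqa_logits(queries, kv_pages, block_table, context_len=3)
print(result)  # [[2.0, 3.0, 1.0], [2.0, 0.0, 0.0]]
```

Because the pages need not be contiguous or in order, the cost of the block-table lookup is one reason this path is a natural target for performance work.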

GDN Updates

Improved GDN Decode performance.

Memory Debugging

Added Guard Allocator for debugging memory-related issues.

Helps identify and diagnose illegal memory access problems.
Intended for debugging and validation workflows.
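
The release notes do not describe how Guard Allocator works internally. As a rough illustration of the general guard-region technique that such debugging allocators commonly use, the sketch below pads a buffer with known sentinel bytes and checks them afterwards. All names (`GuardedBuffer`, `GUARD_BYTE`, `GUARD_SIZE`) are hypothetical and unrelated to MATE's actual interface.

```python
GUARD_BYTE = 0xA5  # sentinel value written into the guard regions (assumption)
GUARD_SIZE = 16    # bytes of padding on each side of the payload (assumption)


class GuardedBuffer:
    """Toy buffer with guard regions before and after the payload."""

    def __init__(self, size: int) -> None:
        self.size = size
        # Fill everything with the sentinel, then zero only the payload.
        self._raw = bytearray([GUARD_BYTE]) * (size + 2 * GUARD_SIZE)
        self._raw[GUARD_SIZE:GUARD_SIZE + size] = bytes(size)

    def write(self, offset: int, data: bytes) -> None:
        # Deliberately unchecked, to mimic a kernel that trusts its indices.
        # The demo keeps writes inside the padded storage so its length is unchanged.
        start = GUARD_SIZE + offset
        self._raw[start:start + len(data)] = data

    def corrupted_guards(self) -> list:
        """Return which guard regions no longer hold the sentinel."""
        bad = []
        if any(b != GUARD_BYTE for b in self._raw[:GUARD_SIZE]):
            bad.append("front")
        if any(b != GUARD_BYTE for b in self._raw[GUARD_SIZE + self.size:]):
            bad.append("back")
        return bad


buf = GuardedBuffer(8)
buf.write(0, b"\x01" * 8)           # in-bounds write
clean = buf.corrupted_guards()
buf.write(6, b"\x02" * 4)           # runs 2 bytes past the end of the payload
after_overrun = buf.corrupted_guards()
print(clean, after_overrun)         # [] ['back']
```

The padding and checks cost memory and time, which is consistent with the note that the feature targets debugging and validation rather than production runs.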

KDA Support

Added KDA Prefill support.

Introduced the KDA Prefill interface.
Added the FlashKDA wrapper for easier integration and adoption.

Bug Fixes

Fixed the following issues:

Fixed an inconsistency between DeepGEMM's default get_alignment behavior and API input parameters.
Fixed incorrect robust descriptor configuration in the FA assembly backend.
Fixed stride overflow issues in the FA assembly backend.
Fixed performance regressions in DSA under certain scenarios.
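
The notes do not say which stride computation overflowed in the FA assembly backend. As a generic illustration of how this class of bug arises, the sketch below computes element offsets for a contiguous tensor and emulates 32-bit signed wraparound. The shape and the helper names are hypothetical and chosen only to make the arithmetic visible.

```python
INT32_MAX = 2**31 - 1


def contiguous_strides(shape):
    """Row-major strides, in elements, for a contiguous tensor."""
    strides = []
    running = 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return list(reversed(strides))


def max_offset(shape):
    """Largest element offset reachable with in-bounds indices."""
    return sum((d - 1) * s for d, s in zip(shape, contiguous_strides(shape)))


def wrap_int32(value):
    """Emulate storing a Python int in a 32-bit signed register."""
    return (value + 2**31) % 2**32 - 2**31


# Hypothetical layout: (batch, seqlen, heads, head_dim)
shape = (64, 32768, 32, 128)
strides = contiguous_strides(shape)
largest = max_offset(shape)

print(strides)               # [134217728, 4096, 128, 1]
print(largest > INT32_MAX)   # True: the offset needs more than 32 bits
print(wrap_int32(largest))   # -1: a 32-bit offset would wrap around
```

Note that each individual stride here still fits in 32 bits; the overflow appears only once an index is multiplied by its stride, which is why such bugs tend to surface only at large batch sizes or sequence lengths.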
