v0.13.1 (2026-04-23)
This release is published under the Apache-2.0 License.
Bug Fixes
- Correct GQA/MQA head expansion logic for Q and K (62b73bd)
- Introduce RoutingResult and fix probability renormalization (a1dabe3)
- Remove GPU-CPU sync that was happening during bincount() call in MoE layer (534facc)
- Remove MoE statistics interface from qwen3 decoder layer and module (99e0379)
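
Note on probability renormalization: the snippet below is a minimal sketch of what renormalizing router probabilities usually means in a top-k MoE layer, assuming softmax routing over per-expert logits. It is not this project's code; `route_top_k` is a hypothetical helper, and it does not reflect the actual `RoutingResult` interface.

```python
import math


def route_top_k(logits, k):
    """Pick the k most probable experts and renormalize their weights.

    Hypothetical illustration only: assumes softmax routing over a flat
    list of per-expert logits for a single token.
    """
    # Numerically stable softmax over all experts.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices of the k highest-probability experts, best first.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

    # Renormalize so the kept probabilities sum to 1 again.
    kept = sum(probs[i] for i in top)
    weights = [probs[i] / kept for i in top]
    return top, weights


experts, weights = route_top_k([2.0, 1.0, 0.1, -1.0], k=2)
```

Without the final division, the weights of the selected experts would sum to less than 1, which scales down the combined expert output.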
Detailed Changes: v0.13.0...v0.13.1