Release v0.21.0 · dphnAI/sonar

What's Changed

build: support python 3.14 by @AlpinDale in #1636
fix: GLM-5.1 on ROCm by @AlpinDale in #1637
fix: replica selection bias in fusedmoe router by @AlpinDale in #1638
fix: respect TORCH_COMPILE_DISABLE env var for torch 2.12 by @AlpinDale in #1639
chore: remove dead code from worker by @AlpinDale in #1640
feat: warmup readonly mm processor during renderer startup by @AlpinDale in #1641
fix: GPU memory leaks in engine shutdown for rocm by @AlpinDale in #1642
chore: optimize deepstack buffer handling for MM Qwen3 models by @AlpinDale in #1643
feat: support kv offload storing with multiple KV groups by @AlpinDale in #1644
feat: add perf benchmark script by @AlpinDale in #1645
fix: only unpad routed output before shared expert add by @AlpinDale in #1646
fix: DSML token leakage in DeepSeek-V4 and 3.2 by @AlpinDale in #1647
fix: size the MNNVL workspace for flashinfer to EP group by @AlpinDale in #1648
fix: offload all KV blocks when doing prefill in P/D by @AlpinDale in #1649
fix: disable sequence parallelism for piecewise compilation by @AlpinDale in #1650
feat: implement DeepSeek-V4 model by @AlpinDale in #1651
perf: EXL3 performance tuning on GeForce Blackwell by @AlpinDale in #1652
fix: TRT-LLM MXFP4 MoE compile for DeepSeek-V4 by @AlpinDale in #1653
fix: normalize nested args in DeepSeek DSML by @AlpinDale in #1654
perf: exl3 decode kernel optimization experiments by @AlpinDale in #1655
perf: exl3 optims with guarded MoE down tuning by @AlpinDale in #1656
fix: auto-disable expandable_segments around cumem memory pool by @AlpinDale in #1657
fix: rejection sampling acceptance rate in MRv2 by @AlpinDale in #1658
fix: cap SWA/chunked-local runtime admission to startup pool-sizing bound by @AlpinDale in #1659
feat: FP8 ViT Attention w/ FlashInfer by @AlpinDale in #1660
chore: share dequant buffers in TurboQuant to save memory by @AlpinDale in #1661
fix: remove invalid deepstack boundary check for Qwen3-VL by @AlpinDale in #1664
feat: add silu clamp limit to shared expert for DeepSeek-V4 by @AlpinDale in #1665
chore: sync to upstream 985961345a13f3e3bb15d29c94b011ba9a6b858b by @AlpinDale in #1666

Full Changelog: v0.20.0...v0.21.0
