What's Changed
- build: support python 3.14 by @AlpinDale in #1636
- fix: GLM-5.1 on ROCm by @AlpinDale in #1637
- fix: replica selection bias in fusedmoe router by @AlpinDale in #1638
- fix: respect `TORCH_COMPILE_DISABLE` env var for torch 2.12 by @AlpinDale in #1639
- chore: remove dead code from worker by @AlpinDale in #1640
- feat: warmup readonly mm processor during renderer startup by @AlpinDale in #1641
- fix: GPU memory leaks in engine shutdown for rocm by @AlpinDale in #1642
- chore: optimize deepstack buffer handling for MM Qwen3 models by @AlpinDale in #1643
- feat: support kv offload storing with multiple KV groups by @AlpinDale in #1644
- feat: add perf benchmark script by @AlpinDale in #1645
- fix: only unpad routed output before shared expert add by @AlpinDale in #1646
- fix: DSML token leakage in DeepSeek-V4 and 3.2 by @AlpinDale in #1647
- fix: size the MNNVL workspace for flashinfer to EP group by @AlpinDale in #1648
- fix: offload all KV blocks when doing prefill in P/D by @AlpinDale in #1649
- fix: disable sequence parallelism for piecewise compilation by @AlpinDale in #1650
- feat: implement DeepSeek-V4 model by @AlpinDale in #1651
- perf: EXL3 performance tuning on GeForce Blackwell by @AlpinDale in #1652
- fix: TRT-LLM MXFP4 MoE compile for DeepSeek-V4 by @AlpinDale in #1653
- fix: normalize nested args in DeepSeek DSML by @AlpinDale in #1654
- perf: exl3 decode kernel optimization experiments by @AlpinDale in #1655
- perf: exl3 optims with guarded MoE down tuning by @AlpinDale in #1656
- fix: auto-disable `expandable_segments` around cumem memory pool by @AlpinDale in #1657
- fix: rejection sampling acceptance rate in MRv2 by @AlpinDale in #1658
- fix: cap SWA/chunked-local runtime admission to startup pool-sizing bound by @AlpinDale in #1659
- feat: FP8 ViT Attention w/ FlashInfer by @AlpinDale in #1660
- chore: share dequant buffers in TurboQuant to save memory by @AlpinDale in #1661
- fix: remove invalid deepstack boundary check for Qwen3-VL by @AlpinDale in #1664
- feat: add silu clamp limit to shared expert for DeepSeek-V4 by @AlpinDale in #1665
- chore: sync to upstream 985961345a13f3e3bb15d29c94b011ba9a6b858b by @AlpinDale in #1666
Full Changelog: v0.20.0...v0.21.0