What's Changed
- feat(cuda): support CUDA 13.3 by @EricLBuehler in #2275
- chore: bump candle dep by @EricLBuehler in #2276
- feat(cli): improve --quant and --isq docs in cli by @EricLBuehler in #2277
- feat(gemma4): do not load projections for shared kv layers by @EricLBuehler in #2281
- feat(quant): add isq executor and planning by @EricLBuehler in #2283
- feat: add Hunyuan v1 dense and MoE support by @ASheng1019 in #2268
- feat(install): fix handling when there are preexisting installs by @EricLBuehler in #2284
- feat(gdn): add isq support by @EricLBuehler in #2285
- Fix reversed FCFS priority in PagedAttentionScheduler preemption by @pjdurden in #2250
- Validate GGUF special token ids against vocab to prevent OOB panic by @pjdurden in #2282
New Contributors
- @ASheng1019 made their first contribution in #2268
- @pjdurden made their first contribution in #2250
Full Changelog: v0.8.22...v0.8.23