v0.8.4
Highlights
- OpenAI-compatible local agent platform. Full support for OpenAI skills (/v1/skills), the complete Files API (/v1/files + file inputs and agent-produced outputs), and a shell tool (#2230, #2229)
- Prebuilt binaries + one-line install. pip install mistralrs and the install scripts now download prebuilt binaries (e.g., CUDA across all supported compute caps, Metal, CPU, x86 and aarch64), including multi-arch Docker images and per-SM Python wheels. No more compiling from source! (#2218, #2220, #2221)
- Anthropic API support. (#2182)
- Online calibration for K-quants. (#2203)
- New & improved models. Gemma 4 (incl. 12B), DiffusionGemma / block-diffusion models, and Qwen 3.5 perf + tool-calling improvements. (#2191, #2209, #2196)
- CUDA performance. Improved CUDA graphs, paged flash-attention kernels, BF16 CUTLASS 2.x MoE, broader tensor-parallel sizes (#2197, #2202)
- Prometheus /metrics endpoint (#2189)
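For anyone who wants to try the new /metrics endpoint, here is a minimal sketch of reading it from Python using only the standard library. It assumes the endpoint serves the standard Prometheus text exposition format without trailing timestamps. The base URL, the helper names, and the metric names in the sample are all hypothetical placeholders; these notes do not list which metrics the server actually exports.

```python
from urllib.request import urlopen


def parse_prometheus_text(text):
    """Parse Prometheus text exposition format into {sample_name: value}.

    Comment lines (# HELP / # TYPE) and blank lines are skipped. The key keeps
    its label set, e.g. 'example_requests_total{route="/v1/files"}'.
    Assumes samples carry no trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is whatever follows the last space on the line.
        name, _, value = line.rpartition(" ")
        samples[name.strip()] = float(value)
    return samples


def fetch_metrics(base_url="http://localhost:1234"):
    """Fetch and parse /metrics. base_url is a placeholder for your server."""
    with urlopen(f"{base_url}/metrics", timeout=5) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))


# Hypothetical payload for illustration; not real mistral.rs metric names.
SAMPLE = """\
# HELP example_requests_total Hypothetical request counter.
# TYPE example_requests_total counter
example_requests_total{route="/v1/chat/completions"} 42
example_queue_depth 3
"""

if __name__ == "__main__":
    for name, value in parse_prometheus_text(SAMPLE).items():
        print(name, value)
```

In practice a Prometheus server would scrape the endpoint directly; a parser like this is mainly useful for quick smoke tests after upgrading.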
What's Changed
- feat(cuda): improve cuda p2p backend and docs by @EricLBuehler in #2181
- Fix corner chunking case for paged attn SWA prefill by @EricLBuehler in #2183
- feat(server): support Anthropic API by @EricLBuehler in #2182
- feat(cuda): allow more general TP sizes by @EricLBuehler in #2184
- fix(metal): enable bf16 GDN kernel compilation by @EricLBuehler in #2185
- fix(engine): handle SendError when client disconnects during error reporting by @yussypu in #2170
- docs(bench): trim outdated mistralrs-bench README by @fiorelorenzo in #2157
- fix(windows): Fix windows builds and cuDNN discovery by @sobrinth in #2178
- feat(build): emit rerun-if-env-changed for build-time env vars by @EricLBuehler in #2186
- fix(docs): fix build features on docs.rs builds by @EricLBuehler in #2187
- fix(core): avoid raw reasoning fallback in streaming by @EricLBuehler in #2188
- perf(gguf): batch add_special_tokens to drop O(N^2) tokenizer load by @fiorelorenzo in #2177
- build(mistralrs-quant): harden Metal kernel build against silent failures by @fiorelorenzo in #2176
- feat: add Prometheus /metrics endpoint by @MicheleCampi in #2189
- fix(metal): register PR #2166 kernels in runtime-compile path by @ljchang in #2169
- feat(models): support Gemma 4 12B by @EricLBuehler in #2191
- feat(core): improve cuda graph implementation and fixes w/ multimodal edge cases by @EricLBuehler in #2195
- feat(qwen3.5): perf improvements on cuda, handle tool calling format by @EricLBuehler in #2196
- feat(uqff): new revision with improved ser/de functionality and storage by @EricLBuehler in #2199
- feat(cuda): improve cuda graph performance, add paged flash attn kernels by @EricLBuehler in #2197
- feat(quant): support isq/uqff/imatrix machinery for broader model classes by @EricLBuehler in #2200
- Add support for CUDA BF16 CUTLASS 2.x MoE kernels by @EricLBuehler in #2202
- feat(metal): fix sdpa offset handling, remove f32 upcast by @EricLBuehler in #2205
- feat(models): integrate DiffusionGemma and integrate block-diffusion models by @EricLBuehler in #2209
- feat(core): support online calibration for K-quants by @EricLBuehler in #2203
- feat(docs): improve quality and examples reachability by @EricLBuehler in #2216
- feat(release): prebuilt binaries with download-first install by @EricLBuehler in #2218
- feat(release): update minimum compute cap support by @EricLBuehler in #2219
- feat(ci): add python wheels, aarch64, to release artifact generation by @EricLBuehler in #2220
- feat(ci): Switch to Namespace CI for release matrix by @EricLBuehler in #2221
- feat(cli): ensure docs reflect toml kind key as not required by @EricLBuehler in #2222
- ci(docs): check generated docs artifacts by @EricLBuehler in #2223
- ci(release): add dry run by @EricLBuehler in #2224
- fix(core): propagate errors during engine creation by @EricLBuehler in #2226
- chore(deps): bump esbuild, @sveltejs/vite-plugin-svelte and vite in /mistralrs-cli/webui by @dependabot[bot] in #2228
- feat(diffusiongemma): minor fixes and improvements by @EricLBuehler in #2227
- feat(server): better compatibility with newer openai api updates by @EricLBuehler in #2229
- feat(server): support openai /v1/skills, /v1/files and shell tool by @EricLBuehler in #2230
New Contributors
- @yussypu made their first contribution in #2170
- @fiorelorenzo made their first contribution in #2157
- @sobrinth made their first contribution in #2178
- @MicheleCampi made their first contribution in #2189
Full Changelog: v0.8.3...v0.8.4