Release v0.8.4 · EricLBuehler/mistral.rs

Highlights

OpenAI-compatible local agent platform. Full support for OpenAI skills (/v1/skills), the complete Files API (/v1/files + file inputs and agent-produced outputs), and a shell tool. (#2230, #2229)
Prebuilt binaries + one-line install. pip install mistralrs and the install scripts now download prebuilt binaries (e.g., CUDA across all supported compute caps, Metal, CPU, x86 and aarch64), including multi-arch Docker images and per-SM Python wheels. No more compiling from source! (#2218, #2220, #2221)
Anthropic API support. (#2182)
Online calibration for K-quants. (#2203)
New & improved models. Gemma 4 (incl. 12B), DiffusionGemma / block-diffusion models, and Qwen 3.5 perf + tool-calling improvements. (#2191, #2209, #2196)
CUDA performance. Improved CUDA graphs, paged flash-attention kernels, BF16 CUTLASS 2.x MoE, broader tensor-parallel sizes. (#2197, #2202)
Prometheus /metrics endpoint. (#2189)
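
As a rough illustration of how the new /metrics endpoint might be consumed, the sketch below fetches and parses a Prometheus-style text payload using only the Python standard library. The release notes confirm only that a /metrics endpoint exists (#2189); the base URL, the port, and the metric names in SAMPLE are hypothetical, and the payload is assumed to follow the standard Prometheus text exposition format. The parser is deliberately simplified: it assumes label values contain no spaces.

```python
from urllib.request import urlopen


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition format into {sample: value}.

    Simplified: skips blank and comment lines (# HELP / # TYPE),
    ignores optional timestamps, and assumes no spaces in label values.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue  # malformed line: no value field
        samples[parts[0]] = float(parts[1])
    return samples


def fetch_metrics(base_url: str) -> dict[str, float]:
    # base_url is an assumption, e.g. "http://localhost:1234";
    # use whatever host and port your server is actually bound to.
    with urlopen(f"{base_url}/metrics", timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Hypothetical payload for illustration; real metric names will differ.
SAMPLE = """\
# HELP demo_requests_total Total requests (hypothetical metric).
# TYPE demo_requests_total counter
demo_requests_total{route="/v1/chat/completions"} 42
demo_queue_depth 3
"""

if __name__ == "__main__":
    for name, value in parse_metrics(SAMPLE).items():
        print(name, value)
```

For anything beyond a quick check, pointing a real Prometheus scraper at the endpoint is likely the better choice, since it handles the full exposition format.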

What's Changed

feat(cuda): improve cuda p2p backend and docs by @EricLBuehler in #2181
Fix corner chunking case for paged attn SWA prefill by @EricLBuehler in #2183
feat(server): support Anthropic API by @EricLBuehler in #2182
feat(cuda): allow more general TP sizes by @EricLBuehler in #2184
fix(metal): enable bf16 GDN kernel compilation by @EricLBuehler in #2185
fix(engine): handle SendError when client disconnects during error reporting by @yussypu in #2170
docs(bench): trim outdated mistralrs-bench README by @fiorelorenzo in #2157
fix(windows): Fix windows builds and cuDNN discovery by @sobrinth in #2178
feat(build): emit rerun-if-env-changed for build-time env vars by @EricLBuehler in #2186
fix(docs): fix build features on docs.rs builds by @EricLBuehler in #2187
fix(core): avoid raw reasoning fallback in streaming by @EricLBuehler in #2188
perf(gguf): batch add_special_tokens to drop O(N^2) tokenizer load by @fiorelorenzo in #2177
build(mistralrs-quant): harden Metal kernel build against silent failures by @fiorelorenzo in #2176
feat: add Prometheus /metrics endpoint by @MicheleCampi in #2189
fix(metal): register PR #2166 kernels in runtime-compile path by @ljchang in #2169
feat(models): support Gemma 4 12B by @EricLBuehler in #2191
feat(core): improve cuda graph implementation and fixes w/ multimodal edge cases by @EricLBuehler in #2195
feat(qwen3.5): perf improvements on cuda, handle tool calling format by @EricLBuehler in #2196
feat(uqff): new revision with improved ser/de functionality and storage by @EricLBuehler in #2199
feat(cuda): improve cuda graph performance, add paged flash attn kernels by @EricLBuehler in #2197
feat(quant): support isq/uqff/imatrix machinery for broader model classes by @EricLBuehler in #2200
Add support for CUDA BF16 CUTLASS 2.x MoE kernels by @EricLBuehler in #2202
feat(metal): fix sdpa offset handling, remove f32 upcast by @EricLBuehler in #2205
feat(models): integrate DiffusionGemma and integrate block-diffusion models by @EricLBuehler in #2209
feat(core): support online calibration for K-quants by @EricLBuehler in #2203
feat(docs): improve quality and examples reachability by @EricLBuehler in #2216
feat(release): prebuilt binaries with download-first install by @EricLBuehler in #2218
feat(release): update minimum compute cap support by @EricLBuehler in #2219
feat(ci): add python wheels, aarch64, to release artifact generation by @EricLBuehler in #2220
feat(ci): Switch to Namespace CI for release matrix by @EricLBuehler in #2221
feat(cli): ensure docs reflect toml kind key as not required by @EricLBuehler in #2222
ci(docs): check generated docs artifacts by @EricLBuehler in #2223
ci(release): add dry run by @EricLBuehler in #2224
fix(core): propagate errors during engine creation by @EricLBuehler in #2226
chore(deps): bump esbuild, @sveltejs/vite-plugin-svelte and vite in /mistralrs-cli/webui by @dependabot[bot] in #2228
feat(diffusiongemma): minor fixes and improvements by @EricLBuehler in #2227
feat(server): better compatibility with newer openai api updates by @EricLBuehler in #2229
feat(server): support openai /v1/skills, /v1/files and shell tool by @EricLBuehler in #2230

New Contributors

@yussypu made their first contribution in #2170
@fiorelorenzo made their first contribution in #2157
@sobrinth made their first contribution in #2178
@MicheleCampi made their first contribution in #2189

Full Changelog: v0.8.3...v0.8.4
