v0.8.4

@EricLBuehler EricLBuehler released this 16 Jun 03:29
· 42 commits to master since this release
6f753eb

Highlights

  • OpenAI-compatible local agent platform. Full support for OpenAI skills (/v1/skills), the complete Files API (/v1/files, plus file inputs and agent-produced outputs), and a shell tool. (#2230, #2229)
  • Prebuilt binaries + one-line install. pip install mistralrs and the install scripts now download prebuilt binaries (e.g., CUDA across all supported compute capabilities, Metal, and CPU, for x86 and aarch64). Multi-arch Docker images and per-SM Python wheels are also available. No more compiling from source! (#2218, #2220, #2221)
  • Anthropic API support. (#2182)
  • Online calibration for K-quants. (#2203)
  • New & improved models. Gemma 4 (incl. 12B), DiffusionGemma / block-diffusion models, and Qwen 3.5 perf + tool-calling improvements. (#2191, #2209, #2196)
  • CUDA performance. Improved CUDA graphs, paged flash-attention kernels, BF16 CUTLASS 2.x MoE, and broader tensor-parallel sizes. (#2197, #2202)
  • Prometheus /metrics endpoint (#2189)
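
As a rough illustration of how the new /metrics endpoint might be consumed, the sketch below reads Prometheus text exposition output into a Python dict using only the standard library. The base URL (http://localhost:1234) and the metric names in the usage note are assumptions made for illustration; they are not taken from this release, so check your own server for the actual address and metric names.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text exposition format into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is "<name>{labels} <value> [timestamp]".
        # When labels are present, split after the last closing brace
        # so that spaces inside label values do not break parsing.
        if "}" in line:
            idx = line.rindex("}") + 1
            series, rest = line[:idx], line[idx:].split()
        else:
            series, *rest = line.split()
        if rest:
            samples[series] = float(rest[0])
    return samples


def fetch_metrics(base_url="http://localhost:1234"):
    """Fetch and parse /metrics. The default base_url is an assumption."""
    with urllib.request.urlopen(base_url + "/metrics") as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling parse_metrics on a scraped body gives a flat mapping from each series (name plus labels) to its value, for example a hypothetical series named example_requests_total{route="/v1/files"}. For production monitoring, pointing a Prometheus server at the endpoint is the more usual route; this helper is only meant for quick local inspection.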

Full Changelog: v0.8.3...v0.8.4