Release v0.8.2 · EricLBuehler/mistral.rs

What's Changed

fix(gemma4): tool calling improvements and attention mask overhaul by @EricLBuehler in #2059
feat(tools): mid-stream grammar enforcement for tool calls by @EricLBuehler in #2060
feat(tools): support harmony tool call grammars by @EricLBuehler in #2061
feat(tools): add tool call strict mode by @EricLBuehler in #2062
feat(tools): add tool dispatch and agentic docs by @EricLBuehler in #2063
feat(docs): improved docs, guides, correctness by @EricLBuehler in #2065
fix(tools): fixes and cleanup for tool agentic by @EricLBuehler in #2069
fix(gemma4): tool call and masking fix by @EricLBuehler in #2073
feat(gemma4): 3.5-5.5x faster moe prefill for quantized cuda case by @EricLBuehler in #2077
feat(gemma4): ~10% faster moe decode through fused moe decode kernels by @EricLBuehler in #2080
feat(gemma4,cuda): optimized fused moe decode path by @EricLBuehler in #2090
fix(gemma4): no paged-attn cache cases by @EricLBuehler in #2091
Add fast CUDA MMVQ GGUF kernels by @EricLBuehler in #2104
Add fast CUDA MMQ GGUF kernels by @EricLBuehler in #2109
feat(core): code execution, file outputs and /v1/files, strict tool calling, new docs and ui by @EricLBuehler in #2130
Bump quinn-proto from 0.11.13 to 0.11.14 by @dependabot[bot] in #2012
chore(deps): bump tar from 0.4.44 to 0.4.45 by @dependabot[bot] in #2014
chore(deps): bump devalue from 5.7.1 to 5.8.1 in /docs by @dependabot[bot] in #2133
chore(deps): bump rustls-webpki from 0.103.9 to 0.103.13 by @dependabot[bot] in #2132
chore(deps-dev): bump svelte from 5.55.4 to 5.55.7 in /mistralrs-cli/webui by @dependabot[bot] in #2134
chore(deps): bump astro from 6.1.7 to 6.3.3 in /docs by @dependabot[bot] in #2136
chore(deps): bump rand from 0.9.2 to 0.9.3 by @dependabot[bot] in #2135
Bump candle to use new Metal input/output encoder tracking by @EricLBuehler in #2131
fix(cuda): support AFQ BF16 on sm_75 (#2092) by @atzenhofer in #2126
fix(gemma4): accept both expert_intermediate_size and moe_intermediate_size by @EricLBuehler in #2137
fix(quant): fail missing dummy layers outside uqff by @EricLBuehler in #2138
fix(install): run metal/xcode toolchain checks outside build_features by @EricLBuehler in #2139
feat(core, docs): remove automatic downsampling for videos and add install docs for ffmpeg by @EricLBuehler in #2140
feat(core): support HF_HUB_OFFLINE for loading pre-downloaded models fully offline by @EricLBuehler in #2141
fix(qwen3_embedding): attention mask handling for flash attn by @EricLBuehler in #2142
refactor(core): memory usage to handle discrete/unified systems better by @EricLBuehler in #2143
feat(agentic): remove latex autocorrect from python exec by @EricLBuehler in #2144
feat(agentic): add sandboxing for agentic code execution by @EricLBuehler in #2145
docs(sandbox): tweak for clarity in design page by @EricLBuehler in #2146
fix(core): use from_env for sandboxed apps by @setoelkahfi in #2064
feat(cli): add smart quantization and agentic presets by @EricLBuehler in #2152
feat(cli): add verbosity-controlled logging by @EricLBuehler in #2154
feat(agent): add app-driven tool approvals by @EricLBuehler in #2155
fix(agentic): add prefix to read and list file tools by @EricLBuehler in #2158
feat(gemma4): support MTP speculative decoding! by @EricLBuehler in #2159
feat(gemma4): optimize CUDA prompt and decode performance by @EricLBuehler in #2161
fix(cuda): support bf16 indexed moe input quantization by @EricLBuehler in #2162
feat(bench): improve benchmark sweeps by @EricLBuehler in #2163
feat(gemma4): further optimize CUDA MoE prefill and decode by @EricLBuehler in #2165
feat(metal): optimize Gemma 4 prefill and decode on Apple Silicon by @EricLBuehler in #2166
feat(gemma4): optimize metal MoE perf by @EricLBuehler in #2179
feat(cuda): implement cuda graphs and various optimizations by @EricLBuehler in #2180

New Contributors

@atzenhofer made their first contribution in #2126

Full Changelog: v0.8.0...v0.8.2
