v0.1.0 — Initial release
Pre-release
First public release of inferaived, an LLM inference engine written in Rust on top of wgpu.
Status: early / experimental. APIs will change. Expect rough edges.
Highlights
- GPU-resident inference on wgpu 29: runs on Vulkan, Metal, DX12, and browser WebGPU; no CUDA or external runtime required.
- Supported models: Qwen3.5 and MiniCPM5 (including MiniCPM5's parallel hybrid layer stack). Weights load directly from Hugging Face safetensors.
- Custom WGSL kernels: matmul, RoPE, RMSNorm, masked attention, mamba scan, delta rule, and samplers.
- GPU KV cache with a continuous decode loop and argmax sampler.
- Runnable examples: generate, chat_qwen35, chat_minicpm5, parallel_minicpm5, bench_decode.
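
For readers unfamiliar with two of the operations named above, here is a minimal CPU-side sketch in plain Rust of what RMSNorm and an argmax (greedy) sampler compute. It is an illustration only: the function names `rms_norm` and `argmax` are hypothetical, it does not use or reflect the inferaived API, and the crate's WGSL kernels may differ in details such as epsilon handling or weight conventions.

```rust
/// Argmax (greedy) sampling: pick the index of the largest logit.
/// Hypothetical helper, not part of inferaived.
/// Returns None for an empty slice; NaN values are skipped;
/// ties resolve to the lowest index.
fn argmax(logits: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &v) in logits.iter().enumerate() {
        if v.is_nan() {
            continue;
        }
        match best {
            // Keep the current best unless the new value is strictly larger.
            Some((_, b)) if v <= b => {}
            _ => best = Some((i, v)),
        }
    }
    best.map(|(i, _)| i)
}

/// RMSNorm reference: y_i = x_i / sqrt(mean(x^2) + eps) * w_i.
/// Hypothetical helper, not part of inferaived.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len(), "input and weight must match in length");
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}
```

In a decode loop, a function like `argmax` would be applied to the final logits of each step to choose the next token; a CPU reference of this kind is mainly useful for checking GPU kernel output against known values.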
Install
[dependencies]
inferaived = "0.1"

Or:

cargo add inferaived

Links
- Crate: https://crates.io/crates/inferaived
- Docs: https://docs.rs/inferaived/0.1.0
- Changelog: CHANGELOG.md
License
MIT