v0.4.0

@bwiemz bwiemz released this 18 Mar 18:16
· 1909 commits to main since this release

v0.4.0: Phase 5 — Inference Optimization & Compile-Time Moat Features

Milestones M41, M42, M44 complete (M36, M37 shipped in v0.3.0).

New in v0.4.0:

  • M41: Disaggregated inference (prefill/decode worker separation, KV transfer, router scheduling)
  • M42: KV-cache compression (INT8/INT4/FP8 quantization, sliding window, H2O eviction)
  • M44: Constrained decoding (compiled FSM, JSON Schema/BNF grammars, token-level DFA, logit masking)
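
As an illustration of the M44 idea (token-level DFA plus logit masking), here is a minimal sketch in plain Python. It does not use this project's API: the vocabulary, the transition table, and the names `mask_logits`, `constrained_greedy_decode`, and `biased_logits` are all hypothetical and chosen only for the example. At each step, tokens with no outgoing DFA transition from the current state have their logits set to negative infinity, so the decoder can only emit sequences the automaton accepts.

```python
import math

# Hypothetical toy vocabulary: token id -> token text (assumption for this sketch).
VOCAB = {0: "{", 1: "}", 2: '"k"', 3: ":", 4: "1", 5: "hello"}

# Hypothetical token-level DFA that accepts exactly: { "k" : 1 }
# TRANSITIONS[state][token_id] -> next state; a missing entry means "illegal here".
TRANSITIONS = {
    0: {0: 1},
    1: {2: 2},
    2: {3: 3},
    3: {4: 4},
    4: {1: 5},
    5: {},  # accepting state: nothing more may be emitted
}
ACCEPTING = {5}


def mask_logits(logits, state):
    """Set the logit of every token the DFA forbids in `state` to -inf."""
    allowed = TRANSITIONS[state]
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]


def constrained_greedy_decode(logits_fn, max_steps=16):
    """Greedy decoding where each step is restricted to DFA-legal tokens."""
    state, out = 0, []
    for _ in range(max_steps):
        if state in ACCEPTING:
            break
        masked = mask_logits(logits_fn(out), state)
        token = max(range(len(masked)), key=masked.__getitem__)
        out.append(token)
        state = TRANSITIONS[state][token]
    return out, state in ACCEPTING


def biased_logits(prefix):
    """Stand-in for a model: always prefers the off-grammar token "hello"."""
    return [0.1, 0.2, 0.3, 0.4, 0.5, 9.0]


tokens, ok = constrained_greedy_decode(biased_logits)
text = " ".join(VOCAB[t] for t in tokens)
print(text, ok)
```

Even though the stand-in model strongly prefers `hello`, the mask forces the output to follow the grammar. A compiled implementation would presumably precompute the per-state allowed-token sets ahead of time rather than consult a dictionary at each step; that is an assumption about the design, not something these notes spell out.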

Full Changelog: v0.3.0...v0.4.0