v0.4.0
v0.4.0: Phase 5 — Inference Optimization & Compile-Time Moat Features
Milestones M41, M42, and M44 are complete (M36 and M37 shipped in v0.3.0).
New in v0.4.0:
- M41: Disaggregated inference (prefill/decode worker separation, KV transfer, router scheduling)
- M42: KV-cache compression (INT8/INT4/FP8 quantization, sliding window, H2O eviction)
- M44: Constrained decoding (compiled FSM, JSON Schema/BNF grammars, token-level DFA, logit masking)
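To illustrate the M44 idea of a token-level DFA driving logit masking, here is a minimal sketch in plain Python. It is not this project's API: the toy vocabulary `VOCAB`, the transition table `DFA`, and the helpers `mask_logits` and `constrained_greedy_decode` are all hypothetical names chosen for the example, and the DFA is written by hand rather than compiled from a JSON Schema or BNF grammar.

```python
import math

# Hypothetical toy vocabulary: token id -> token text.
VOCAB = {0: "{", 1: "}", 2: '"a"', 3: ":", 4: "1", 5: ","}

# Hypothetical hand-written token-level DFA that accepts exactly {"a":1}.
# Layout: state -> {allowed token id -> next state}.
DFA = {
    0: {0: 1},  # expect "{"
    1: {2: 2},  # expect the key
    2: {3: 3},  # expect ":"
    3: {4: 4},  # expect the value
    4: {1: 5},  # expect "}"
    5: {},      # accepting state, nothing may follow
}
ACCEPTING = {5}


def mask_logits(logits, state):
    """Set the logit of every token the DFA disallows in `state` to -inf."""
    allowed = DFA[state]
    return [x if tok in allowed else -math.inf for tok, x in enumerate(logits)]


def constrained_greedy_decode(logits_fn, max_steps=16):
    """Greedy decoding in which each step may only pick a DFA-legal token.

    `logits_fn(prefix)` stands in for a model forward pass and returns one
    logit per vocabulary entry. Every non-accepting state in this toy DFA has
    at least one legal token, so the argmax is always a valid transition.
    """
    state, out = 0, []
    for _ in range(max_steps):
        if state in ACCEPTING:
            break
        masked = mask_logits(logits_fn(out), state)
        tok = max(range(len(masked)), key=masked.__getitem__)
        out.append(tok)
        state = DFA[state][tok]
    return "".join(VOCAB[t] for t in out), state in ACCEPTING


if __name__ == "__main__":
    # Stub "model" that always prefers "}" regardless of the prefix.
    stub = lambda prefix: [0.0, 5.0, 1.0, 1.0, 1.0, 1.0]
    print(constrained_greedy_decode(stub))
```

Even though the stub model always scores `}` highest, the mask leaves only grammar-legal tokens at each step, so the decoded text is forced to follow the DFA. A compiled FSM would presumably precompute the per-state allowed-token sets in the same spirit, so that the per-step cost is a table lookup plus a mask.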
Full Changelog: v0.3.0...v0.4.0