
v0.3.0


bevsxyz released this 14 May 13:18

🎯 WhisperForge 0.3.0: Quantization & Performance

🚀 Major Features

Phase C: INT8 Post-Training Quantization ✅

  • 4× model size reduction: Tiny shrinks from 150 MB to 37 MB, ideal for edge deployment
  • New --quantize int8 flag in the model converter
  • Transparent loading: no code changes required
  • Full-precision (FP32) and quantized (INT8) models are interoperable
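The size figure follows from storage width: FP32 uses 4 bytes per weight and INT8 uses 1, so 150 MB / 4 ≈ 37.5 MB, consistent with the 37 MB quoted. As a minimal sketch of what post-training INT8 quantization generally looks like, the Rust below uses symmetric per-tensor quantization with a single FP32 scale. The scheme, the QuantizedTensor type, and the function names are illustrative assumptions; these notes do not describe the scheme or file format WhisperForge actually uses.

```rust
/// Hypothetical quantized tensor: INT8 values plus one FP32 scale.
/// Assumption: symmetric, per-tensor scheme (not necessarily WhisperForge's).
#[derive(Debug)]
pub struct QuantizedTensor {
    pub values: Vec<i8>,
    pub scale: f32,
}

/// Symmetric per-tensor INT8 quantization:
/// scale = max|x| / 127, q = round(x / scale), clamped to [-127, 127].
pub fn quantize_int8(weights: &[f32]) -> QuantizedTensor {
    let max_abs = weights.iter().fold(0.0f32, |m, &w| m.max(w.abs()));
    // Avoid dividing by zero when every weight is zero.
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    let values = weights
        .iter()
        .map(|&w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    QuantizedTensor { values, scale }
}

/// Dequantize back to FP32: x ≈ q * scale.
pub fn dequantize_int8(q: &QuantizedTensor) -> Vec<f32> {
    q.values.iter().map(|&v| v as f32 * q.scale).collect()
}

fn main() {
    let weights = vec![0.5f32, -1.27, 0.0, 1.0, -0.25];
    let q = quantize_int8(&weights);
    let restored = dequantize_int8(&q);

    // 4 bytes per f32 versus 1 byte per i8 (ignoring the one scale value).
    let fp32_bytes = weights.len() * std::mem::size_of::<f32>();
    let int8_bytes = q.values.len() * std::mem::size_of::<i8>();
    println!("FP32: {fp32_bytes} B, INT8: {int8_bytes} B");

    // Round-to-nearest error is at most half a quantization step.
    for (w, r) in weights.iter().zip(&restored) {
        assert!((w - r).abs() <= q.scale / 2.0 + 1e-6);
    }
    println!("values = {:?}, scale = {}", q.values, q.scale);
}
```

Per-tensor scaling keeps the format to one extra float per tensor; real quantizers often use per-channel scales for better accuracy, at the cost of a slightly larger file.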

Phase B.5: GPU-Accelerated Mel Spectrogram ✅

  • CubeCL DFT kernel for GPU mel filterbank matmul
  • Building with --features cubecl-stft enables the GPU STFT pipeline
  • Faster audio preprocessing on the WGPU backend
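Writing the DFT and the mel filterbank as dense matrix products is what lets both run as GPU matmuls. The Rust below is a plain CPU sketch of that formulation for a single, already-windowed frame: real and imaginary DFT matrices produce a power spectrum, and a filterbank matrix (n_mels × n_freq) maps it to band energies. It is not the CubeCL kernel; the function names and the toy two-band filterbank (a stand-in for real triangular mel filters) are assumptions for illustration.

```rust
use std::f32::consts::PI;

/// Real and imaginary DFT matrices, each n_freq x n_fft, with n_freq = n_fft/2 + 1.
fn dft_matrices(n_fft: usize) -> (Vec<Vec<f32>>, Vec<Vec<f32>>) {
    let n_freq = n_fft / 2 + 1;
    let mut re = vec![vec![0.0f32; n_fft]; n_freq];
    let mut im = vec![vec![0.0f32; n_fft]; n_freq];
    for k in 0..n_freq {
        for n in 0..n_fft {
            let angle = 2.0 * PI * (k * n) as f32 / n_fft as f32;
            re[k][n] = angle.cos();
            im[k][n] = -angle.sin();
        }
    }
    (re, im)
}

/// Dense matrix-vector product: out[i] = sum_j m[i][j] * v[j].
fn matvec(m: &[Vec<f32>], v: &[f32]) -> Vec<f32> {
    m.iter()
        .map(|row| row.iter().zip(v).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

/// Power spectrum of one windowed frame: |X[k]|^2 = Re[k]^2 + Im[k]^2.
fn power_spectrum(frame: &[f32], re: &[Vec<f32>], im: &[Vec<f32>]) -> Vec<f32> {
    let r = matvec(re, frame);
    let i = matvec(im, frame);
    r.iter().zip(&i).map(|(a, b)| a * a + b * b).collect()
}

/// Band energies = filterbank (n_mels x n_freq) times power spectrum (n_freq).
fn mel_energies(filterbank: &[Vec<f32>], power: &[f32]) -> Vec<f32> {
    matvec(filterbank, power)
}

fn main() {
    let n_fft = 16;
    let (re, im) = dft_matrices(n_fft);

    // A pure cosine at bin 3 should concentrate its energy in bin 3.
    let frame: Vec<f32> = (0..n_fft)
        .map(|n| (2.0 * PI * 3.0 * n as f32 / n_fft as f32).cos())
        .collect();
    let power = power_spectrum(&frame, &re, &im);
    let peak = power
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(k, _)| k)
        .unwrap();
    println!("peak bin = {peak}");

    // Toy 2-band filterbank: band 0 sums bins 0..=4, band 1 sums bins 5..=8.
    let n_freq = n_fft / 2 + 1;
    let fb: Vec<Vec<f32>> = (0..2)
        .map(|b| {
            (0..n_freq)
                .map(|k| if (k <= 4) == (b == 0) { 1.0 } else { 0.0 })
                .collect()
        })
        .collect();
    println!("bands = {:?}", mel_energies(&fb, &power));
}
```

A production pipeline would use an FFT or a batched GPU matmul over all frames at once; the naive O(n²) matrices here only make the matmul structure explicit.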

Burn 0.21 & burn-flex Migration

  • Latest Burn with improved numerical stability
  • CPU fallback (burn-flex) seamlessly handles CPU inference
  • Better WGPU runtime integration

πŸ“Š What's Changed

  • Quantized models fully compatible with CLI and library API
  • Streaming audio pipeline (Phase B) now fully integrated
  • Fixed end-of-transcript (EOT) token suppression at decoding step 0, improving robustness
  • Improved error handling across all crates
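In Whisper-style greedy decoding, suppressing EOT at step 0 means masking the end-of-transcript logit before the first token is chosen, so the decoder cannot return an empty transcript immediately (the reference openai/whisper decoder applies a similar rule at the start of sampling). The Rust below is a minimal sketch of that idea only; the notes do not say how WhisperForge implements the fix, and the function names and the toy EOT id are hypothetical.

```rust
/// Mask the EOT logit at decoding step 0 so it can never win the first argmax.
/// `eot_token` is a hypothetical parameter; real ids depend on the tokenizer.
fn suppress_eot_at_start(logits: &mut [f32], step: usize, eot_token: usize) {
    if step == 0 {
        if let Some(l) = logits.get_mut(eot_token) {
            *l = f32::NEG_INFINITY;
        }
    }
}

/// Greedy selection: index of the largest logit.
fn argmax(logits: &[f32]) -> usize {
    logits
        .iter()
        .enumerate()
        .fold((0, f32::NEG_INFINITY), |best, (i, &v)| {
            if v > best.1 { (i, v) } else { best }
        })
        .0
}

fn main() {
    let eot = 3; // hypothetical EOT id in a toy 4-token vocabulary
    let raw = vec![0.1f32, 0.5, 0.2, 2.0];

    let mut step0 = raw.clone();
    suppress_eot_at_start(&mut step0, 0, eot);
    println!("step 0 -> token {}", argmax(&step0)); // EOT masked: prints 1

    let mut step1 = raw.clone();
    suppress_eot_at_start(&mut step1, 1, eot);
    println!("step 1 -> token {}", argmax(&step1)); // EOT allowed: prints 3
}
```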

πŸ“¦ All 5 Crates Published

  • whisperforge-core v0.3.0 β€” library
  • whisperforge-cli v0.3.0 β€” binary
  • whisperforge-convert v0.3.0 β€” model converter
  • whisperforge-align v0.3.0 β€” VAD + SRT
  • whisperforge-diarize v0.3.0 β€” speaker diarization

πŸ› οΈ Dependency Updates

Simplified workspace dependency management: inter-crate dependencies now inherit the workspace version automatically.

πŸ“– Next Phase

Phase D: WASM Targetβ€”browser-native speech-to-text with wasm-bindgen.

Full Changelog: v0.2.0...v0.3.0