v0.3.0

bevsxyz released this 14 May 13:18


e81d3ee

🎯 WhisperForge 0.3.0: Quantization & Performance

🚀 Major Features

Phase C: INT8 Post-Training Quantization ✅

4× model size reduction: Tiny (150 MB → 37 MB), ideal for edge deployment
--quantize int8 flag in model converter
Transparent loading—no code changes required
Full-precision (FP32) and quantized (INT8) models are interoperable
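
To illustrate where the roughly 4× size reduction comes from, here is a minimal sketch of symmetric per-tensor INT8 quantization. It is not the converter's actual code: the names `QuantizedTensor`, `quantize_int8`, and `dequantize_int8` are hypothetical, and the real scheme (per-channel scales, zero points, which layers are quantized) may differ. Each FP32 weight takes 4 bytes and each INT8 weight takes 1 byte, plus one shared scale per tensor.

```rust
/// Hypothetical container: INT8 values plus one shared FP32 scale
/// (symmetric, per-tensor quantization).
pub struct QuantizedTensor {
    pub values: Vec<i8>,
    pub scale: f32,
}

/// Quantize FP32 weights: scale = max|w| / 127, q = round(w / scale).
pub fn quantize_int8(weights: &[f32]) -> QuantizedTensor {
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    // Avoid dividing by zero when the tensor is all zeros.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let values = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    QuantizedTensor { values, scale }
}

/// Recover approximate FP32 weights; the error per weight is at most
/// about half a quantization step (scale / 2).
pub fn dequantize_int8(q: &QuantizedTensor) -> Vec<f32> {
    q.values.iter().map(|&v| v as f32 * q.scale).collect()
}
```

Calling `quantize_int8` on a weight slice and then `dequantize_int8` on the result returns values within about half a quantization step of the originals, which is the accuracy cost paid for the smaller file.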

Phase B.5: GPU-Accelerated Mel Spectrogram ✅

CubeCL DFT kernel for GPU mel filterbank matmul
--features cubecl-stft enables GPU STFT pipeline
Faster audio preprocessing on WGPU backend
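
As a CPU reference for what the GPU path computes, the sketch below shows the two stages named above: a DFT power spectrum for one frame, followed by a filterbank applied as a matrix-vector product. It is an assumption-laden illustration, not the CubeCL kernel: `dft_power` and `apply_filterbank` are hypothetical names, the DFT is the naive O(N²) form, and windowing, framing, and the log scaling Whisper applies afterwards are omitted.

```rust
use std::f32::consts::PI;

/// Power spectrum of one real-valued frame via a naive O(N^2) DFT.
/// Returns N/2 + 1 bins (the non-redundant half for real input).
pub fn dft_power(frame: &[f32]) -> Vec<f32> {
    let n = frame.len();
    (0..=n / 2)
        .map(|k| {
            let (mut re, mut im) = (0.0f32, 0.0f32);
            for (t, &x) in frame.iter().enumerate() {
                let angle = -2.0 * PI * (k * t) as f32 / n as f32;
                re += x * angle.cos();
                im += x * angle.sin();
            }
            re * re + im * im
        })
        .collect()
}

/// Apply a filterbank to a power spectrum. Each row of `filters` holds one
/// weight per frequency bin, so this is a plain matrix-vector product.
pub fn apply_filterbank(filters: &[Vec<f32>], power: &[f32]) -> Vec<f32> {
    filters
        .iter()
        .map(|row| row.iter().zip(power).map(|(w, p)| w * p).sum::<f32>())
        .collect()
}
```

Both stages are independent per output element (one frequency bin, one mel band), which is why they map naturally onto a GPU.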

Burn 0.21 & burn-flex Migration

Upgraded to Burn 0.21, with improved numerical stability
burn-flex provides the CPU fallback for inference
Better WGPU runtime integration

📊 What's Changed

Quantized models fully compatible with CLI and library API
Streaming audio pipeline (Phase B) now fully integrated
Fixed EOT suppression at step 0 for robustness
Improved error handling across all crates
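
For readers unfamiliar with the EOT fix, the sketch below shows the general idea of suppressing the end-of-text token on the first decoding step, so that greedy decoding cannot end before emitting any text. This is an illustration under assumptions, not the project's decoder: `suppress_eot_at_start` and `argmax` are hypothetical helpers, and the token id is passed in rather than taken from a real tokenizer.

```rust
/// Hypothetical helper: on the first decoding step, mask the EOT logit
/// so it can never be the selected token.
pub fn suppress_eot_at_start(logits: &mut [f32], step: usize, eot_id: usize) {
    if step == 0 {
        // `get_mut` ignores an out-of-range id instead of panicking.
        if let Some(logit) = logits.get_mut(eot_id) {
            *logit = f32::NEG_INFINITY;
        }
    }
}

/// Greedy token selection: index of the largest logit, or None if empty.
pub fn argmax(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}
```

With the mask applied at step 0, `argmax` falls through to the best non-EOT token; on later steps the logits are left untouched and EOT can end the sequence normally.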

📦 All 5 Crates Published

whisperforge-core v0.3.0 — library
whisperforge-cli v0.3.0 — binary
whisperforge-convert v0.3.0 — model converter
whisperforge-align v0.3.0 — VAD + SRT
whisperforge-diarize v0.3.0 — speaker diarization

🛠️ Dependency Updates

Simplified workspace dependency management: inter-crate dependencies now use the workspace version automatically.

📖 Next Phase

Phase D: WASM Target—browser-native speech-to-text with wasm-bindgen.

Full Changelog: v0.2.0...v0.3.0
