WhisperForge v0.4.0
0.4.0 — 2026-05-22
Breaking — Phase E: foundation refactor (crate merger + CLI UX + device selection).
Removed (breaking)
whisperforge-convertcrate removed — model conversion lives inwf convert.whisperforgebinary alias dropped — onlywfships.--cpuflag on the transcribe path — replaced by--device cpu.--taskflag on the transcribe path — parsed-then-runtime-errored before; less code, clearer help.--wgpuflag (vestigial; was already replaced by feature-driven dispatch).
Added
wfsubcommand surface:transcribe,convert,list-models. No more implicit-default subcommand — must specify one explicitly.wf list-models: tabulates.mpkmodels under the models directory with precision,n_audio_layer,n_mels, file size.--device <auto|cpu|wgpu|cuda>onwf transcribe.autoprefers CUDA → WGPU → CPU based on compiled-in features.- Native CUDA backend via optional
burn-cuda0.21.0 (opt-in--features cuda; requires CUDA toolkit at build time). --models-dir <PATH>andWF_MODELS_DIRenv var honored bytranscribeandlist-models.- Friendly missing-model error pointing at
wf list-modelsand the matchingwf convertinvocation.
Changed
whisperforge-clicrate renamed towhisperforge. Workspace shrinks 5 → 4 crates.- mise
release-checksmoke-tests--device cudaand the missing-model friendly error. - Project docs (CLAUDE.md + README + per-crate READMEs) updated for the new CLI surface; Roadmap marks Phase E ✅ COMPLETE and Phase D (WASM) ⏸ DEFERRED. Next: Phase F = streaming realtime.
Migration
| Before | After |
|---|---|
whisperforge-cli -a foo.wav -m tiny_en |
wf transcribe -a foo.wav -m tiny_en |
cargo run -p whisperforge-convert -- --output models/tiny_en |
wf convert --output models/tiny_en |
wf -a foo.wav --cpu |
wf transcribe -a foo.wav --device cpu |
wf -a foo.wav --wgpu |
wf transcribe -a foo.wav --device wgpu (or default auto) |
wf -a foo.wav --task translate |
(removed — never worked) |
Deferred (Phase E carve-outs, queued for later phases)
- VRAM-aware
--encoder-batch-sizeauto-tune. Users override with--encoder-batch-sizeuntil a workload justifies the heuristic cost. - Windows wgpu runtime fallback. Upstream
windows-crate conflict betweenwgpu-halandgpu-allocatoris compile-time; no runtime probe can rescue it. Windows binary still ships CPU-only.