Release WhisperForge v0.4.0 · bevsxyz/WhisperForge

0.4.0 — 2026-05-22

Breaking — Phase E: foundation refactor (crate merger + CLI UX + device selection).

whisperforge-convert crate removed — model conversion lives in wf convert.
whisperforge binary alias dropped — only wf ships.
--cpu flag on the transcribe path — replaced by --device cpu.
--task flag on the transcribe path — parsed-then-runtime-errored before; less code, clearer help.
--wgpu flag (vestigial; was already replaced by feature-driven dispatch).

wf subcommand surface: transcribe, convert, list-models. No more implicit-default subcommand — must specify one explicitly.
wf list-models: tabulates .mpk models under the models directory with precision, n_audio_layer, n_mels, file size.
--device <auto|cpu|wgpu|cuda> on wf transcribe. auto prefers CUDA → WGPU → CPU based on compiled-in features.
Native CUDA backend via optional burn-cuda 0.21.0 (opt-in --features cuda; requires CUDA toolkit at build time).
--models-dir <PATH> and WF_MODELS_DIR env var honored by transcribe and list-models.
Friendly missing-model error pointing at wf list-models and the matching wf convert invocation.

whisperforge-cli crate renamed to whisperforge. Workspace shrinks 5 → 4 crates.
mise release-check smoke-tests --device cuda and the missing-model friendly error.
Project docs (CLAUDE.md + README + per-crate READMEs) updated for the new CLI surface; Roadmap marks Phase E ✅ COMPLETE and Phase D (WASM) ⏸ DEFERRED. Next: Phase F = streaming realtime.

Before	After
`whisperforge-cli -a foo.wav -m tiny_en`	`wf transcribe -a foo.wav -m tiny_en`
`cargo run -p whisperforge-convert -- --output models/tiny_en`	`wf convert --output models/tiny_en`
`wf -a foo.wav --cpu`	`wf transcribe -a foo.wav --device cpu`
`wf -a foo.wav --wgpu`	`wf transcribe -a foo.wav --device wgpu` (or default `auto`)
`wf -a foo.wav --task translate`	(removed — never worked)

VRAM-aware --encoder-batch-size auto-tune. Users override with --encoder-batch-size until a workload justifies the heuristic cost.
Windows wgpu runtime fallback. Upstream windows-crate conflict between wgpu-hal and gpu-allocator is compile-time; no runtime probe can rescue it. Windows binary still ships CPU-only.