Skip to content

WhisperForge v0.4.0

Choose a tag to compare

@github-actions github-actions released this 22 May 11:28
· 108 commits to main since this release

0.4.0 — 2026-05-22

Breaking — Phase E: foundation refactor (crate merger + CLI UX + device selection).

Removed (breaking)

  • whisperforge-convert crate removed — model conversion lives in wf convert.
  • whisperforge binary alias dropped — only wf ships.
  • --cpu flag on the transcribe path — replaced by --device cpu.
  • --task flag on the transcribe path — parsed-then-runtime-errored before; less code, clearer help.
  • --wgpu flag (vestigial; was already replaced by feature-driven dispatch).

Added

  • wf subcommand surface: transcribe, convert, list-models. No more implicit-default subcommand — must specify one explicitly.
  • wf list-models: tabulates .mpk models under the models directory with precision, n_audio_layer, n_mels, file size.
  • --device <auto|cpu|wgpu|cuda> on wf transcribe. auto prefers CUDA → WGPU → CPU based on compiled-in features.
  • Native CUDA backend via optional burn-cuda 0.21.0 (opt-in --features cuda; requires CUDA toolkit at build time).
  • --models-dir <PATH> and WF_MODELS_DIR env var honored by transcribe and list-models.
  • Friendly missing-model error pointing at wf list-models and the matching wf convert invocation.

Changed

  • whisperforge-cli crate renamed to whisperforge. Workspace shrinks 5 → 4 crates.
  • mise release-check smoke-tests --device cuda and the missing-model friendly error.
  • Project docs (CLAUDE.md + README + per-crate READMEs) updated for the new CLI surface; Roadmap marks Phase E ✅ COMPLETE and Phase D (WASM) ⏸ DEFERRED. Next: Phase F = streaming realtime.

Migration

Before After
whisperforge-cli -a foo.wav -m tiny_en wf transcribe -a foo.wav -m tiny_en
cargo run -p whisperforge-convert -- --output models/tiny_en wf convert --output models/tiny_en
wf -a foo.wav --cpu wf transcribe -a foo.wav --device cpu
wf -a foo.wav --wgpu wf transcribe -a foo.wav --device wgpu (or default auto)
wf -a foo.wav --task translate (removed — never worked)

Deferred (Phase E carve-outs, queued for later phases)

  • VRAM-aware --encoder-batch-size auto-tune. Users override with --encoder-batch-size until a workload justifies the heuristic cost.
  • Windows wgpu runtime fallback. Upstream windows-crate conflict between wgpu-hal and gpu-allocator is compile-time; no runtime probe can rescue it. Windows binary still ships CPU-only.