08 Jun 16:28

e7c9b5e

0.5.5 - 2026-06-08 Latest

Latest

Release Notes

Install whisperforge 0.5.5

Install prebuilt binaries via shell script

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/bevsxyz/WhisperForge/releases/download/v0.5.5/whisperforge-installer.sh | sh

Install prebuilt binaries via powershell script

powershell -ExecutionPolicy Bypass -c "irm https://github.com/bevsxyz/WhisperForge/releases/download/v0.5.5/whisperforge-installer.ps1 | iex"

Install prebuilt binaries via Homebrew

brew install bevsxyz/tap/whisperforge

Install prebuilt binaries into your npm project

npm install whisperforge@0.5.5

Download whisperforge 0.5.5

File	Platform	Checksum
whisperforge-aarch64-apple-darwin.tar.xz	Apple Silicon macOS	checksum
whisperforge-x86_64-pc-windows-msvc.zip	x64 Windows	checksum
whisperforge-x86_64-pc-windows-msvc.msi	x64 Windows	checksum
whisperforge-aarch64-unknown-linux-gnu.tar.xz	ARM64 Linux	checksum
whisperforge-x86_64-unknown-linux-gnu.tar.xz	x64 Linux	checksum

Assets 20

dist-manifest.json

sha256:6280827908918ce622ca28cc54bc85c56b748eba34e2a69cc6db44cd3760f1a7

27.7 KB 2026-06-08T16:28:29Z
sha256.sum

sha256:ebf3c3140c34967d31e826b6a4d1fb98c90830c16395902ea4495f9dbcf5260f

721 Bytes 2026-06-08T16:28:29Z
source.tar.gz

sha256:f61a9fe7ffe69db2cf501b055661d0de5dbfdae657e61a33ded4d83fbe7902dc

1.83 MB 2026-06-08T16:28:29Z
source.tar.gz.sha256

sha256:fda48006bf95d0a3fe98049ec77acd4089b72c0254eca7bf10e47915a021f850

81 Bytes 2026-06-08T16:28:29Z
whisperforge-aarch64-apple-darwin.tar.xz

sha256:050d8231a29d8cfc2dd351c7fab3f8ee4962ccdef495e991c1354e68f2c3db50

11.4 MB 2026-06-08T16:28:29Z
whisperforge-aarch64-apple-darwin.tar.xz.sha256

sha256:64d9c6185029f1a6fdca9e114dfe04069a5b19e9259dc76e38f999908ee8b7a8

108 Bytes 2026-06-08T16:28:29Z
whisperforge-aarch64-unknown-linux-gnu.tar.xz

sha256:47cba08a4b0abec7c942512b42d71e656f63e9f0135cbfe30e8b939c43ab4681

13.1 MB 2026-06-08T16:28:30Z
whisperforge-aarch64-unknown-linux-gnu.tar.xz.sha256

sha256:6f5b46b36242708e413c52cc6d6b94a0b3eae13c9ce680583dd2bffcfccc0ac5

113 Bytes 2026-06-08T16:28:30Z
whisperforge-installer.ps1

sha256:04a95ce014076c83fecc48448b3b9414798722b8e80027339569f6e2a1c45dbb

21.8 KB 2026-06-08T16:28:30Z
whisperforge-installer.sh

sha256:9e00c7595e1b5d66dd1433c9ee7ceb0c8848cf15177cd7d1d00b3b1d3205e304

51.9 KB 2026-06-08T16:28:30Z
Source code (zip)

2026-06-08T16:04:40Z
Source code (tar.gz)

2026-06-08T16:04:40Z

22 May 11:28

github-actions

v0.4.0

8a09d44

WhisperForge v0.4.0

0.4.0 — 2026-05-22

Breaking — Phase E: foundation refactor (crate merger + CLI UX + device selection).

Removed (breaking)

whisperforge-convert crate removed — model conversion lives in wf convert.
whisperforge binary alias dropped — only wf ships.
--cpu flag on the transcribe path — replaced by --device cpu.
--task flag on the transcribe path — parsed-then-runtime-errored before; less code, clearer help.
--wgpu flag (vestigial; was already replaced by feature-driven dispatch).

Added

wf subcommand surface: transcribe, convert, list-models. No more implicit-default subcommand — must specify one explicitly.
wf list-models: tabulates .mpk models under the models directory with precision, n_audio_layer, n_mels, file size.
--device <auto|cpu|wgpu|cuda> on wf transcribe. auto prefers CUDA → WGPU → CPU based on compiled-in features.
Native CUDA backend via optional burn-cuda 0.21.0 (opt-in --features cuda; requires CUDA toolkit at build time).
--models-dir <PATH> and WF_MODELS_DIR env var honored by transcribe and list-models.
Friendly missing-model error pointing at wf list-models and the matching wf convert invocation.

Changed

whisperforge-cli crate renamed to whisperforge. Workspace shrinks 5 → 4 crates.
mise release-check smoke-tests --device cuda and the missing-model friendly error.
Project docs (CLAUDE.md + README + per-crate READMEs) updated for the new CLI surface; Roadmap marks Phase E ✅ COMPLETE and Phase D (WASM) ⏸ DEFERRED. Next: Phase F = streaming realtime.

Migration

Before	After
`whisperforge-cli -a foo.wav -m tiny_en`	`wf transcribe -a foo.wav -m tiny_en`
`cargo run -p whisperforge-convert -- --output models/tiny_en`	`wf convert --output models/tiny_en`
`wf -a foo.wav --cpu`	`wf transcribe -a foo.wav --device cpu`
`wf -a foo.wav --wgpu`	`wf transcribe -a foo.wav --device wgpu` (or default `auto`)
`wf -a foo.wav --task translate`	(removed — never worked)

Deferred (Phase E carve-outs, queued for later phases)

VRAM-aware --encoder-batch-size auto-tune. Users override with --encoder-batch-size until a workload justifies the heuristic cost.
Windows wgpu runtime fallback. Upstream windows-crate conflict between wgpu-hal and gpu-allocator is compile-time; no runtime probe can rescue it. Windows binary still ships CPU-only.

Assets 5

14 May 14:14

bevsxyz

v0.3.1

efa59d7

WhisperForge v0.3.1

Changelog

All notable changes to WhisperForge are documented in this file.

Format follows Keep a Changelog.
All workspace crates are versioned together.

0.2.0 — 2026-04-30

Added

Core transcription pipeline (whisperforge-core)

Whisper model architecture for all sizes (tiny.en through large-v2/v3), built on Burn 0.20.
Audio pipeline: WAV loading with format dispatch (f32/i16/i24/i32), rubato resampling to 16 kHz mono, Slaney-scale mel spectrogram matching Python Whisper (power spectrum, center-padding, 80-band filter bank).
HybridDecoder with quality-gated temperature fallback matching faster-whisper SOTA: compression ratio gate (2.4), log-probability gate (−1.0), no-speech threshold (0.6), temperature sequence [0.0, 0.2, 0.4, 0.6, 0.8, 1.0].
KvCache<B> + forward_decoder_cached: O(n) per step via static cross-attention K,V and growing self-attention K,V — ~2.6× speedup over naive O(n²) decoder.
batch_mel_spectrograms: all audio chunks mel-encoded in a single forward_encoder call before sequential decoding.
transcribe_with_timestamps: per-token timestamps via cross-attention peak, ~100 ms precision.
extract_speaker_embedding: mean-pool encoder output + L2-normalise → speaker fingerprint.
TranscriptionResult / TranscriptionSegment: fully serde-serializable; token_timestamps and speaker fields skipped when absent.
0.8% average WER on LJSpeech tiny.en benchmark.

VAD and alignment (whisperforge-align)

VoiceActivityDetector via earshot with configurable threshold.
AudioSegmenter: splits audio into voice spans ≤30 s, filters silence.
BatchedTranscriber<B>: real batch transcription pipeline — batch_mel_spectrograms → forward_encoder → per-segment KV-cached greedy decode → HybridDecoder.
SrtWriter / SrtEntry: generates well-formed SRT subtitle files.

Speaker diarization (whisperforge-diarize)

SpeakerDiarizer: agglomerative single-linkage clustering of speaker embeddings.
cluster_embeddings: cosine similarity with configurable threshold, SPEAKER_NN label assignment.

Model conversion (whisperforge-convert)

Converts OpenAI Whisper safetensors (via HuggingFace hf-hub) to Burn NamedMpk format.
Correct tensor name mapping for all encoder and decoder layers including cross-attention.
Config auto-detection for all model sizes from tensor shapes.

CLI binary (whisperforge / wf)

Two binary aliases: whisperforge and wf (same binary, shorter alias).
--audio-file / -a: WAV input.
--model / -m: model directory under models/ (default: tiny_en_converted).
--output-format text|srt|json: plain text, SRT subtitles, or JSON with segments and timestamps.
--decoding-preset fast|balanced|accurate: pre-configured beam size and temperature sequences.
--beam-size, --temperature, --length-penalty, --no-speech-threshold: per-run overrides.
--vad-enabled / --vad-threshold: voice activity detection; silence segments skipped.
--wgpu: WGPU GPU backend (Vulkan, DX12, or Metal — no CUDA required).
--diarize / --diarize-threshold: speaker labels ([SPEAKER_NN]:) in SRT and JSON.
--task transcribe|translate, --language: Whisper task and language selection.
Automatic 30 s chunking with 1 s overlap for long audio.

Notes

Model files (.mpk, .cfg, tokenizer.json) are not bundled — download from HuggingFace and convert with whisperforge-convert. See README for instructions.
Requires Rust 1.85+ (Rust 2024 edition).
--wgpu requires Vulkan/DX12/Metal drivers.
whisperforge-align has known pre-existing test failures; always excluded from the test suite.
Phase 7 Option B (ResNet293 speaker embeddings via burn-onnx) is deferred pending Burn 0.21 stable.

Assets 5

14 May 13:18

bevsxyz

v0.3.0

e81d3ee

v0.3.0

🎯 WhisperForge 0.3.0: Quantization & Performance

🚀 Major Features

Phase C: INT8 Post-Training Quantization ✅

4× model size reduction: Tiny (150 MB → 37 MB), ideal for edge deployment
--quantize int8 flag in model converter
Transparent loading—no code changes required
Full precision (FP32) and quantized (INT8) models interoperable

Phase B.5: GPU-Accelerated Mel Spectrogram ✅

CubeCL DFT kernel for GPU mel filterbank matmul
--features cubecl-stft enables GPU STFT pipeline
Faster audio preprocessing on WGPU backend

Burn 0.21 & burn-flex Migration

Latest Burn with improved numerical stability
CPU fallback (burn-flex) seamlessly handles CPU inference
Better WGPU runtime integration

📊 What's Changed

Quantized models fully compatible with CLI and library API
Streaming audio pipeline (Phase B) now fully integrated
Fixed EOT suppression at step 0 for robustness
Improved error handling across all crates

📦 All 5 Crates Published

whisperforge-core v0.3.0 — library
whisperforge-cli v0.3.0 — binary
whisperforge-convert v0.3.0 — model converter
whisperforge-align v0.3.0 — VAD + SRT
whisperforge-diarize v0.3.0 — speaker diarization

🛠️ Dependency Updates

Simplified workspace dependency management: inter-crate deps now use workspace version automatically.

📖 Next Phase

Phase D: WASM Target—browser-native speech-to-text with wasm-bindgen.

Full Changelog: v0.2.0...v0.3.0

Assets 2

30 Apr 12:51

github-actions

v0.2.0

2262c0e

WhisperForge v0.2.0

Changelog

All notable changes to WhisperForge are documented in this file.

Format follows Keep a Changelog.
All workspace crates are versioned together.

0.2.0 — 2026-04-30

Added

Core transcription pipeline (whisperforge-core)

Whisper model architecture for all sizes (tiny.en through large-v2/v3), built on Burn 0.20.
Audio pipeline: WAV loading with format dispatch (f32/i16/i24/i32), rubato resampling to 16 kHz mono, Slaney-scale mel spectrogram matching Python Whisper (power spectrum, center-padding, 80-band filter bank).
HybridDecoder with quality-gated temperature fallback matching faster-whisper SOTA: compression ratio gate (2.4), log-probability gate (−1.0), no-speech threshold (0.6), temperature sequence [0.0, 0.2, 0.4, 0.6, 0.8, 1.0].
KvCache<B> + forward_decoder_cached: O(n) per step via static cross-attention K,V and growing self-attention K,V — ~2.6× speedup over naive O(n²) decoder.
batch_mel_spectrograms: all audio chunks mel-encoded in a single forward_encoder call before sequential decoding.
transcribe_with_timestamps: per-token timestamps via cross-attention peak, ~100 ms precision.
extract_speaker_embedding: mean-pool encoder output + L2-normalise → speaker fingerprint.
TranscriptionResult / TranscriptionSegment: fully serde-serializable; token_timestamps and speaker fields skipped when absent.
0.8% average WER on LJSpeech tiny.en benchmark.

VAD and alignment (whisperforge-align)

VoiceActivityDetector via earshot with configurable threshold.
AudioSegmenter: splits audio into voice spans ≤30 s, filters silence.
BatchedTranscriber<B>: real batch transcription pipeline — batch_mel_spectrograms → forward_encoder → per-segment KV-cached greedy decode → HybridDecoder.
SrtWriter / SrtEntry: generates well-formed SRT subtitle files.

Speaker diarization (whisperforge-diarize)

SpeakerDiarizer: agglomerative single-linkage clustering of speaker embeddings.
cluster_embeddings: cosine similarity with configurable threshold, SPEAKER_NN label assignment.

Model conversion (whisperforge-convert)

Converts OpenAI Whisper safetensors (via HuggingFace hf-hub) to Burn NamedMpk format.
Correct tensor name mapping for all encoder and decoder layers including cross-attention.
Config auto-detection for all model sizes from tensor shapes.

CLI binary (whisperforge / wf)

Two binary aliases: whisperforge and wf (same binary, shorter alias).
--audio-file / -a: WAV input.
--model / -m: model directory under models/ (default: tiny_en_converted).
--output-format text|srt|json: plain text, SRT subtitles, or JSON with segments and timestamps.
--decoding-preset fast|balanced|accurate: pre-configured beam size and temperature sequences.
--beam-size, --temperature, --length-penalty, --no-speech-threshold: per-run overrides.
--vad-enabled / --vad-threshold: voice activity detection; silence segments skipped.
--wgpu: WGPU GPU backend (Vulkan, DX12, or Metal — no CUDA required).
--diarize / --diarize-threshold: speaker labels ([SPEAKER_NN]:) in SRT and JSON.
--task transcribe|translate, --language: Whisper task and language selection.
Automatic 30 s chunking with 1 s overlap for long audio.

Notes

Model files (.mpk, .cfg, tokenizer.json) are not bundled — download from HuggingFace and convert with whisperforge-convert. See README for instructions.
Requires Rust 1.85+ (Rust 2024 edition).
--wgpu requires Vulkan/DX12/Metal drivers.
whisperforge-align has known pre-existing test failures; always excluded from the test suite.
Phase 7 Option B (ResNet293 speaker embeddings via burn-onnx) is deferred pending Burn 0.21 stable.

Assets 5

Releases: bevsxyz/WhisperForge

0.5.5 - 2026-06-08

Release Notes

Install whisperforge 0.5.5

Install prebuilt binaries via shell script

Install prebuilt binaries via powershell script

Install prebuilt binaries via Homebrew

Install prebuilt binaries into your npm project

Download whisperforge 0.5.5

Uh oh!

WhisperForge v0.4.0

0.4.0 — 2026-05-22

Removed (breaking)

Added

Changed

Migration

Deferred (Phase E carve-outs, queued for later phases)

Uh oh!

WhisperForge v0.3.1

Changelog

0.2.0 — 2026-04-30

Added

Notes

Uh oh!

v0.3.0

🎯 WhisperForge 0.3.0: Quantization & Performance

🚀 Major Features

Phase C: INT8 Post-Training Quantization ✅

Phase B.5: GPU-Accelerated Mel Spectrogram ✅

Burn 0.21 & burn-flex Migration

📊 What's Changed

📦 All 5 Crates Published

🛠️ Dependency Updates

📖 Next Phase

Uh oh!

WhisperForge v0.2.0

Changelog

0.2.0 — 2026-04-30

Added

Notes

Uh oh!