Release v0.7.2 · CrispStrobe/CrispASR

New Backends

ASR:

LFM2-Audio 1.5B (LiquidAI LFM2.5-Audio) — Depthformer encoder with KV cache, conv state caching, gallocr graph allocation. ASR + TTS + speech-to-speech. GPU backend support.
Mini-Omni2 — Qwen2-0.5B LLM + Whisper encoder + adapter. ASR/TTS/S2S with BPE tokenizer. Q4_K auto-download.
Nemotron 3.5 ASR Streaming 0.6B — 39-language streaming ASR scaffold (NVIDIA Parakeet architecture + prompt kernel). Chunked encoder with graph reuse and causal DW conv padding.

Speech-to-speech (--s2s) — new CLI flag for end-to-end speech-to-speech with LFM2-Audio and Mini-Omni2; POST /v1/audio/speech-to-speech server endpoint; session-level S2S C API + Dart FFI bindings
WebSocket streaming ASR — real-time ASR via --ws-port with proper WebSocket handshake
M4A/AAC/Opus/WebM input — ffmpeg-based container support across CLI, WASM demo, and HF Space
WASM ASR session surface — backend-agnostic asrOpen/asrTranscribe/asrSet* for JavaScript/WASM consumers
Node.js addon — transcribeSession() via crispasr_session C ABI with Jest test coverage
Full C-ABI session-setter parity — all 7 language bindings (C, Go, Python, Ruby, Rust, JS/WASM, Dart) at full parity; crispasr_session_set_punc_model + hotwords setters
Server parity — truecase, per-request diarize/LID knobs, punc-model (PCS + CTC auto-enable), POST /v1/translate, all remaining transcription params exposed per-request
--dry-run-resolve — now honors sub-variant model keys
FunASR -l language flag — language routing wired into prompt template
Regression manifest — 65 total backends tracked (17 new); pinned SHA revisions for all model repos

LFM2-Audio — KV cache (3x decode speedup), conv state caching (10x), gallocr graph allocation (prefill buffer 2 GB → 256 MB), depthformer buffer reuse, streaming TTS API
FireRed — batch beam decoder (4.5x faster beam search), OpenMP + vectorizable dot (~4x faster AED decoder)
Server — VAD slicing now matches CLI for unbounded backends (#165); LID model kept resident across requests; Silero/FireRed/ECAPA LID contexts cached across calls
Mini-Omni2 — batch embed for TTS/S2S

#167 — SIGILL on AMD Ryzen 5700X (and all non-AVX512 CPUs): release binaries were compiled with -march=native on AVX-512 CI runners. All Linux x86_64 builds now use -DGGML_NATIVE=OFF -DGGML_AVX2=ON
#164 — VoxCPM2 TTS: 12+ fixes — VAE Vulkan work-group overflow (CPU fallback for long decodes), RALM NaN on Vulkan, CUDA SIGABRT in attention permute, FSQ NaN on CUDA, stop predictor crash, null-guard graph tensors
#165 — Silero-LID crash on GPU builds (CPU threads set on GPU backend)
#89 — Parakeet-JA: auto-enable VAD instead of shorter chunks for Japanese models
#81 — Nemotron: tensor name loading (conv.bn.*, prompt_kernel.linear*.*), exact-size graphs (no zero-padding), language mapping, causal DW conv padding
#52 — Qwen3-TTS O15 CUDA crash: dedicated scheduler for cached T=1 graph
#125 — Gemma4-E2B prefers_vad + MIMO-ASR sweep fixes
MOSS-Audio GPT-2 byte detokenization (remove Ġ artifacts)
Orpheus token_embd kept at F16 during quantization (SNAC codec is quant-sensitive)
Core attention: skip RoPE when rope_theta <= 0 (fixes RALM NaN)
Core attention: ggml_cont after permute before set_rows (fixes CUDA SIGABRT)
Server: --no-warmup opt-out, guarded warmup, surfaced 500s, robust scratch dir
CLI: --dry-run-resolve honors sub-variant model keys

Emscripten: fix libwhisper.worker.js copy failure with SINGLE_FILE=1 (modern Emscripten inlines pthread worker)
Windows MSYS2: fix kokoro phonemize_builtin_* linker errors (phonemizer extracted to shared OBJECT library)
Regression manifest: all model revisions pinned to SHAs; voice_preset added to 12 TTS entries
Pre-commit hook auto-syncs Go CGO LDFLAGS on CMake changes
Docker Smoke workflow fixed (correct build context)
Go CGO LDFLAGS regenerated for new backends (snac, mini-omni2, lfm2_audio)
clang-format v18 pass on 25 files with format drift
HF Space: pre-built binary workflow, model pre-download, Ubuntu 24.04 fixes
WASM build workflow added (all backends)
CI: windows-blas pkgconfiglite fix for windows-2025 runner