New Backends
ASR:
- LFM2-Audio 1.5B (LiquidAI LFM2.5-Audio) — Depthformer encoder with KV cache, conv state caching, gallocr graph allocation. ASR + TTS + speech-to-speech. GPU backend support.
- Mini-Omni2 — Qwen2-0.5B LLM + Whisper encoder + adapter. ASR/TTS/S2S with BPE tokenizer. Q4_K auto-download.
- Nemotron 3.5 ASR Streaming 0.6B — 39-language streaming ASR scaffold (NVIDIA Parakeet architecture + prompt kernel). Chunked encoder with graph reuse and causal DW conv padding.
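
For readers unfamiliar with the "causal DW conv padding" mentioned for the Nemotron chunked encoder, the idea can be sketched as follows. This is a minimal illustration, not the backend's actual code: the function name `causal_dw_conv1d`, the `[channels][time]` layout, and the stride of 1 are all assumptions made for the example. The point is that padding only on the left by `K-1` zeros keeps each output frame independent of future input, which is what allows an encoder to process audio chunk by chunk.

```cpp
#include <cstddef>
#include <vector>

// Causal depthwise 1-D convolution over a [channels][time] signal.
// Assumptions (hypothetical, for illustration): one kernel of length K per
// channel (depthwise), stride 1. The input is implicitly left-padded with
// K-1 zeros, so y[c][t] reads only x[c][t-K+1 .. t] and the output has the
// same length as the input.
std::vector<std::vector<float>> causal_dw_conv1d(
    const std::vector<std::vector<float>>& x,       // [C][T]
    const std::vector<std::vector<float>>& kernel)  // [C][K]
{
    std::vector<std::vector<float>> y(x.size());
    for (std::size_t c = 0; c < x.size(); ++c) {
        const std::size_t T = x[c].size();
        const std::size_t K = kernel[c].size();
        y[c].assign(T, 0.0f);
        for (std::size_t t = 0; t < T; ++t) {
            for (std::size_t k = 0; k < K; ++k) {
                // kernel[c][K-1] multiplies the current sample,
                // kernel[c][0] multiplies the oldest sample in the window.
                const std::size_t lag = K - 1 - k;
                if (lag > t) continue;  // tap falls into the zero left padding
                y[c][t] += kernel[c][k] * x[c][t - lag];
            }
        }
    }
    return y;
}
```

Because no tap ever reads a sample to the right of `t`, running the function on a prefix of the signal yields the same values as the corresponding prefix of the full output.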
Major Features
- Speech-to-speech (`--s2s`) — new CLI flag for end-to-end speech-to-speech with LFM2-Audio and Mini-Omni2; `POST /v1/audio/speech-to-speech` server endpoint; session-level S2S C API + Dart FFI bindings
- WebSocket streaming ASR — real-time ASR via `--ws-port` with proper WebSocket handshake
- M4A/AAC/Opus/WebM input — ffmpeg-based container support across CLI, WASM demo, and HF Space
- WASM ASR session surface — backend-agnostic `asrOpen/asrTranscribe/asrSet*` for JavaScript/WASM consumers
- Node.js addon — `transcribeSession()` via crispasr_session C ABI with Jest test coverage
- Full C-ABI session-setter parity — all 7 language bindings (C, Go, Python, Ruby, Rust, JS/WASM, Dart) at full parity; `crispasr_session_set_punc_model` + hotwords setters
- Server parity — truecase, per-request diarize/LID knobs, punc-model (PCS + CTC auto-enable), `POST /v1/translate`, all remaining transcription params exposed per-request
- `--dry-run-resolve` — now honors sub-variant model keys
- FunASR `-l` language flag — language routing wired into prompt template
- Regression manifest — 65 total backends tracked (17 new); pinned SHA revisions for all model repos
Performance
- LFM2-Audio — KV cache (3x decode speedup), conv state caching (10x), gallocr graph allocation (prefill buffer 2 GB → 256 MB), depthformer buffer reuse, streaming TTS API
- FireRed — batch beam decoder (4.5x faster beam search), OpenMP + vectorizable dot (~4x faster AED decoder)
- Server — VAD slicing now matches CLI for unbounded backends (#165); LID model kept resident across requests; Silero/FireRed/ECAPA LID contexts cached across calls
- Mini-Omni2 — batch embed for TTS/S2S
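
As background for the KV-cache item above, here is a minimal single-head sketch of why caching keys and values speeds up autoregressive decoding. It is not the LFM2-Audio implementation: the `KVCache` struct and `attend_step` function are hypothetical names, and real backends store the cache in preallocated tensors rather than nested vectors. Each decode step appends one key/value pair and attends over what is already stored, so earlier tokens are never re-projected.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<float>;

// Hypothetical single-head cache: one key row and one value row per past token.
struct KVCache {
    std::vector<Vec> keys;
    std::vector<Vec> values;
};

static float dot(const Vec& a, const Vec& b) {
    float s = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// One decode step: store this token's key/value, then attend over the cache.
// Cost per step is O(n * d) for n cached tokens; without the cache, each step
// would first have to recompute keys and values for all earlier tokens.
Vec attend_step(KVCache& cache, const Vec& q, const Vec& k, const Vec& v) {
    cache.keys.push_back(k);
    cache.values.push_back(v);

    const std::size_t n = cache.keys.size();
    const float scale = 1.0f / std::sqrt(static_cast<float>(q.size()));

    // Scaled dot-product scores, then a numerically stable softmax.
    Vec w(n);
    float max_score = -std::numeric_limits<float>::infinity();
    for (std::size_t i = 0; i < n; ++i) {
        w[i] = dot(q, cache.keys[i]) * scale;
        max_score = std::max(max_score, w[i]);
    }
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        w[i] = std::exp(w[i] - max_score);
        sum += w[i];
    }

    // Weighted sum of the cached values.
    Vec out(v.size(), 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < out.size(); ++j) {
            out[j] += (w[i] / sum) * cache.values[i][j];
        }
    }
    return out;
}
```

The trade-off is memory: the cache grows by one key row and one value row per generated token.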
Notable Bug Fixes
- #167 — SIGILL on AMD Ryzen 5700X (and all non-AVX512 CPUs): release binaries were compiled with `-march=native` on AVX-512 CI runners. All Linux x86_64 builds now use `-DGGML_NATIVE=OFF -DGGML_AVX2=ON`
- #164 — VoxCPM2 TTS: 12+ fixes — VAE Vulkan work-group overflow (CPU fallback for long decodes), RALM NaN on Vulkan, CUDA SIGABRT in attention permute, FSQ NaN on CUDA, stop predictor crash, null-guard graph tensors
- #165 — Silero-LID crash on GPU builds (CPU threads set on GPU backend)
- #89 — Parakeet-JA: auto-enable VAD instead of shorter chunks for Japanese models
- #81 — Nemotron: tensor name loading (`conv.bn.*`, `prompt_kernel.linear*.*`), exact-size graphs (no zero-padding), language mapping, causal DW conv padding
- #52 — Qwen3-TTS O15 CUDA crash: dedicated scheduler for cached T=1 graph
- #125 — Gemma4-E2B prefers_vad + MIMO-ASR sweep fixes
- MOSS-Audio GPT-2 byte detokenization (remove Ġ artifacts)
- Orpheus `token_embd` kept at F16 during quantization (SNAC codec is quant-sensitive)
- Core attention: skip RoPE when `rope_theta <= 0` (fixes RALM NaN)
- Core attention: `ggml_cont` after permute before `set_rows` (fixes CUDA SIGABRT)
- Server: `--no-warmup` opt-out, guarded warmup, surfaced 500s, robust scratch dir
- CLI: `--dry-run-resolve` honors sub-variant model keys
Build / CI
- Emscripten: fix `libwhisper.worker.js` copy failure with `SINGLE_FILE=1` (modern Emscripten inlines pthread worker)
- Windows MSYS2: fix kokoro `phonemize_builtin_*` linker errors (phonemizer extracted to shared OBJECT library)
- Regression manifest: all model revisions pinned to SHAs; voice_preset added to 12 TTS entries
- Pre-commit hook auto-syncs Go CGO LDFLAGS on CMake changes
- Docker Smoke workflow fixed (correct build context)
- Go CGO LDFLAGS regenerated for new backends (snac, mini-omni2, lfm2_audio)
- clang-format v18 pass on 25 files with format drift
- HF Space: pre-built binary workflow, model pre-download, Ubuntu 24.04 fixes
- WASM build workflow added (all backends)
- CI: windows-blas pkgconfiglite fix for windows-2025 runner
SNAC Refactor
- SNAC decoder extracted to `core/snac` (shared by Orpheus + Mini-Omni2)
- Unit + live tests for `core/snac.h`