What's Changed
- Implement an ICL cache for Qwen3-TTS. by @orbitalquark in #685
- feat(higgs_audio): overlap-add mid-generation streaming by @Kairos-a in #669
- Add Step-Audio 2 codec by @lucasnewman in #686
- Allow reference voice matching in Qwen3 TTS batch generation by @lucasnewman in #689
- Delete XCODE_BUILD_TROUBLESHOOTING.md by @Blaizzy in #692
- feat: Add VoxCPM2 TTS model (2B params, 48kHz, 30 languages) by @acul3 in #641
- Qwen3-TTS: honor caller-provided max_tokens in instruct and ICL paths. by @contrapuntal in #695
- Add MOSS TTS models (delay and local transformer) by @lucasnewman in #691
- Fix regression in Fish S2 Pro by @lucasnewman in #693
- Add Silero VAD model by @lucasnewman in #701
- fix(voxtral_realtime): empty downsample_and_project uses decoder dim by @bsmith925 in #700
- feat(stt): OpenAI-compatible response_format on /v1/audio/transcriptions by @mbailey in #704
- fix: quantize Chatterbox ve projection for 4-bit checkpoints by @masterbatcoderman10 in #707
- Add MOSS-TTSD dialogue model by @lucasnewman in #698
- Fix slow and flaky tests by @lucasnewman in #705
- STS voice pipeline updates by @lucasnewman in #708
- Cohere ASR: 1.7× faster long-form, multi-batch correctness, VAD pre-processing by @beshkenadze in #697
- fix(granite-speech): transpose pointwise conv weights from PyTorch layout by @nneubacher in #715
- Add style instruction support for Fish Speech S2 Pro by @lucasnewman in #713
- Fix mx.random.seed() no-op in Fish and Whisper samplers by @emmilco in #721
- test(irodori-tts): add v3 tests (duration predictor, Sway Sampling) by @yoshphys in #728
- fix: prevent AudioPlayer hang when short text falls below min_buffer_seconds threshold by @jeon30c in #727
- feat(vad): Add FSMN-VAD model support by @tian-sweetaylor in #729
- Pre-flight model load on streaming audio routes by @guygrigsby in #725
- Fix handling of temperature / min-p / top-p when sampling from Qwen3-TTS by @lucasnewman in #735
- Fix error handling for models with periods in the repo ID by @lucasnewman in #734
- Add Dramabox model by @lucasnewman in #722
- Add KittenTTS to supported model docs by @dewana-sl in #741
- Fix Kokoro usage from worker threads by @lucasnewman in #745
- Add Mega-ASR STT model (Qwen3-ASR-1.7B + audio-quality router + LoRA switching) by @beshkenadze in #740
- Add server-side VAD turn detection to /v1/realtime by @maltyxx in #748
- Add support for MOSS-TTS 1.5 by @lucasnewman in #749
- feat(stt): add granite-speech-4.1-2b-nar (non-autoregressive ASR) by @mouddane in #738
- Fix VoxCPM2 Chinese tokenization by @qiuhq-9527 in #751
- Update README.md by @Saucken1945 in #752
- Use a dynamic batch size for Cohere Transcribe by @lucasnewman in #711
- feat(stt): expose word_timestamps form field on /v1/audio/transcriptions by @ciekawy in #716
- Add OmniVoice to README by @Blaizzy in #754
- Add GET /v1/audio/voices endpoint for TTS voice discovery by @etherious1804 in #743
- Add support for Miso TTS by @lucasnewman in #764
- feat(irodori-tts): add Irodori-TTS v3 VoiceDesign dual conditioning (speaker + caption) by @yoshphys in #759
- Fix LFM2.5-Audio EOS handling by @lucasnewman in #768
- feat(stt): add NVIDIA Nemotron 3.5 ASR (streaming) support by @ARahim3 in #771
- Add Higgs Audio v3 TTS model by @lucasnewman in #770
- Fix Canary-1B-v2 loading for MLX-native checkpoints by @evanqhuang in #763
- nemotron_asr: cache-aware streaming (stream_generate) by @beshkenadze in #774
- refactor(stt): add shared nemo/ package; decouple nemotron_asr from parakeet by @beshkenadze in #775
- fix(utils): sharpen resampler anti-aliasing filter by @beshkenadze in #776
- Update version v0.4.4 by @Blaizzy in #778
- Add an STT eval harness by @lucasnewman in #777
- Fix Whisper best-of-N decoding by @lucasnewman in #767
- Add Fun-ASR-Nano model support by @lucasnewman in #760
New Contributors
- @acul3 made their first contribution in #641
- @bsmith925 made their first contribution in #700
- @mbailey made their first contribution in #704
- @masterbatcoderman10 made their first contribution in #707
- @nneubacher made their first contribution in #715
- @emmilco made their first contribution in #721
- @jeon30c made their first contribution in #727
- @tian-sweetaylor made their first contribution in #729
- @guygrigsby made their first contribution in #725
- @dewana-sl made their first contribution in #741
- @maltyxx made their first contribution in #748
- @mouddane made their first contribution in #738
- @qiuhq-9527 made their first contribution in #751
- @Saucken1945 made their first contribution in #752
- @ciekawy made their first contribution in #716
- @etherious1804 made their first contribution in #743
- @ARahim3 made their first contribution in #771
- @evanqhuang made their first contribution in #763
Full Changelog: v0.4.3...v0.4.4