Release v0.4.4 · Blaizzy/mlx-audio

What's Changed

Implement an ICL cache for Qwen3-TTS. by @orbitalquark in #685
feat(higgs_audio): overlap-add mid-generation streaming by @Kairos-a in #669
Add Step-Audio 2 codec by @lucasnewman in #686
Allow reference voice matching in Qwen3 TTS batch generation by @lucasnewman in #689
Delete XCODE_BUILD_TROUBLESHOOTING.md by @Blaizzy in #692
feat: Add VoxCPM2 TTS model (2B params, 48kHz, 30 languages) by @acul3 in #641
Qwen3-TTS: honor caller-provided max_tokens in instruct and ICL paths. by @contrapuntal in #695
Add MOSS TTS models (delay and local transformer) by @lucasnewman in #691
Fix regression in Fish S2 Pro by @lucasnewman in #693
Add Silero VAD model by @lucasnewman in #701
fix(voxtral_realtime): empty downsample_and_project uses decoder dim by @bsmith925 in #700
feat(stt): OpenAI-compatible response_format on /v1/audio/transcriptions by @mbailey in #704
fix: quantize Chatterbox ve projection for 4-bit checkpoints by @masterbatcoderman10 in #707
Add MOSS-TTSD dialogue model by @lucasnewman in #698
Fix slow and flaky tests by @lucasnewman in #705
STS voice pipeline updates by @lucasnewman in #708
Cohere ASR: 1.7× faster long-form, multi-batch correctness, VAD pre-processing by @beshkenadze in #697
fix(granite-speech): transpose pointwise conv weights from PyTorch layout by @nneubacher in #715
Add style instruction support for Fish Speech S2 Pro by @lucasnewman in #713
Fix mx.random.seed() no-op in Fish and Whisper samplers by @emmilco in #721
test(irodori-tts): add v3 tests (duration predictor, Sway Sampling) by @yoshphys in #728
fix: prevent AudioPlayer hang when short text falls below min_buffer_seconds threshold by @jeon30c in #727
feat(vad): Add FSMN-VAD model support by @tian-sweetaylor in #729
Pre-flight model load on streaming audio routes by @guygrigsby in #725
Fix handling of temperature / min-p / top-p when sampling from Qwen3-TTS by @lucasnewman in #735
Fix error handling for models with periods in the repo ID by @lucasnewman in #734
Add Dramabox model by @lucasnewman in #722
Add KittenTTS to supported model docs by @dewana-sl in #741
Fix Kokoro usage from worker threads by @lucasnewman in #745
Add Mega-ASR STT model (Qwen3-ASR-1.7B + audio-quality router + LoRA switching) by @beshkenadze in #740
Add server-side VAD turn detection to /v1/realtime by @maltyxx in #748
Add support for MOSS-TTS 1.5 by @lucasnewman in #749
feat(stt): add granite-speech-4.1-2b-nar (non-autoregressive ASR) by @mouddane in #738
Fix VoxCPM2 Chinese tokenization by @qiuhq-9527 in #751
Update README.md by @Saucken1945 in #752
Use a dynamic batch size for Cohere Transcribe by @lucasnewman in #711
feat(stt): expose word_timestamps form field on /v1/audio/transcriptions by @ciekawy in #716
Add OmniVoice to README by @Blaizzy in #754
Add GET /v1/audio/voices endpoint for TTS voice discovery by @etherious1804 in #743
Add support for Miso TTS by @lucasnewman in #764
feat(irodori-tts): add Irodori-TTS v3 VoiceDesign dual conditioning (speaker + caption) by @yoshphys in #759
Fix LFM2.5-Audio EOS handling by @lucasnewman in #768
feat(stt): add NVIDIA Nemotron 3.5 ASR (streaming) support by @ARahim3 in #771
Add Higgs Audio v3 TTS model by @lucasnewman in #770
Fix Canary-1B-v2 loading for MLX-native checkpoints by @evanqhuang in #763
nemotron_asr: cache-aware streaming (stream_generate) by @beshkenadze in #774
refactor(stt): add shared nemo/ package; decouple nemotron_asr from parakeet by @beshkenadze in #775
fix(utils): sharpen resampler anti-aliasing filter by @beshkenadze in #776
Update version v0.4.4 by @Blaizzy in #778
Add an STT eval harness by @lucasnewman in #777
Fix Whisper best-of-N decoding by @lucasnewman in #767
Add Fun-ASR-Nano model support by @lucasnewman in #760

New Contributors

@acul3 made their first contribution in #641
@bsmith925 made their first contribution in #700
@mbailey made their first contribution in #704
@masterbatcoderman10 made their first contribution in #707
@nneubacher made their first contribution in #715
@emmilco made their first contribution in #721
@jeon30c made their first contribution in #727
@tian-sweetaylor made their first contribution in #729
@guygrigsby made their first contribution in #725
@dewana-sl made their first contribution in #741
@maltyxx made their first contribution in #748
@mouddane made their first contribution in #738
@qiuhq-9527 made their first contribution in #751
@Saucken1945 made their first contribution in #752
@ciekawy made their first contribution in #716
@etherious1804 made their first contribution in #743
@ARahim3 made their first contribution in #771
@evanqhuang made their first contribution in #763

Full Changelog: v0.4.3...v0.4.4
