Higgs Audio v3 is a standalone release and does not depend on the code here. Just grab the weights or call the hosted API:
Conversational TTS across 100+ languages · zero-shot voice cloning · inline emotion / style / prosody control.
Free, rate-limited public preview. Get a key at boson.ai/workspace.
export BOSON_API_KEY=bai-xxxx
curl https://api.boson.ai/v1/audio/speech \
-H "Authorization: Bearer $BOSON_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "higgs-audio-v3-tts", "input": "Hello, this is a test."}' \
--output out.mp3
OpenAI-compatible; supports preset voices, zero-shot cloning, and streaming. Full reference: API docs.
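For callers who prefer Python over curl, the same request can be sketched with the standard library alone. This is a minimal illustration, not an official client: the function names `build_speech_request` and `synthesize` are hypothetical, the endpoint, headers, model name, and JSON fields are copied from the curl command above, and it assumes the response body is the raw audio (as curl's `--output` implies).

```python
import json
import os
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.boson.ai/v1/audio/speech"


def build_speech_request(text, api_key, model="higgs-audio-v3-tts"):
    """Build the POST request that the curl command sends (nothing is sent yet)."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, out_path="out.mp3", api_key=None):
    """Send the request and write the response body to out_path.

    Assumption: the response body is the audio file itself.
    """
    # Fall back to the BOSON_API_KEY environment variable, as in the curl example.
    api_key = api_key or os.environ["BOSON_API_KEY"]
    request = build_speech_request(text, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With `BOSON_API_KEY` exported, calling `synthesize("Hello, this is a test.")` should mirror the curl command. Voice selection, cloning, and streaming parameters are left out because their field names are not given here; see the API docs for those.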
Weights: bosonai/higgs-audio-v3-tts-4b. We recommend serving with SGLang-Omni:
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx
hf download bosonai/higgs-audio-v3-tts-4b
sgl-omni serve --model-path bosonai/higgs-audio-v3-tts-4b --port 8000
Serving, voice-cloning, and streaming recipes are in the model card and the SGLang-Omni cookbook.
Note: Higgs Audio v3 is released under the Boson Higgs Audio v3 Research and Non-Commercial License. Production, hosted, or revenue-generating use requires a separate commercial license.
The full v2 / v2.5 documentation — installation, examples, technical details, and benchmarks — has moved to README_V2.md. Those models remain available on Hugging Face: v2 (3B base) and the v2.5 blog.
For contribution and support guidelines, please see SUPPORT_GUIDELINES.md.
If you are passionate about multimodal AI, speech/audio models, or large-scale systems, check out our open positions at Boson AI Careers.
@misc{bosonai_higgs_audio_tts_v3_2026,
title = {Higgs Audio v3 TTS: Conversational Speech for Voice AI from Boson AI},
author = {Boson AI},
year = {2026},
howpublished = {https://huggingface.co/bosonai/higgs-audio-v3-tts-4b},
}
The boson_multimodal/audio_processing/ directory contains code derived from third-party repositories, primarily from xcodec. See the LICENSE in that directory for attribution and licensing.