Lightweight local TTS server based on the very fast Pocket TTS model from Kyutai. It provides a simple OpenAI-compatible speech API (/v1/audio/speech) for generating audio from text.
On an old Haswell CPU, it runs at around 1.5x real-time speed with the nova voice.
The server works great with the OpenAI TTS Custom Component for Home Assistant.
Inspired by kyutai-tts-openai-api.
docker build -t pocket_tts_api .
docker run --restart=always --name pocket_tts_api -d -p 8008:8000 pocket_tts_api
or with docker-compose:
docker-compose up -d
Currently the model and speed parameters are ignored.
curl http://localhost:8008/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "tts-1",
"input": "Hello! This is a test of the fully compatible local text to speech server.",
"voice": "nova",
"response_format":"wav",
"speed": 1.1
}' \
--output test_audio.wav
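
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, not part of the project itself. It assumes the server is reachable at http://localhost:8008 (the host port from the docker run example above); the helper names build_speech_request and synthesize_to_file are hypothetical and chosen for illustration.

```python
import json
import urllib.request

# Assumption: host port 8008 is mapped to the container, as in the docker run example.
BASE_URL = "http://localhost:8008"


def build_speech_request(text, voice="nova", response_format="wav", base_url=BASE_URL):
    """Build a POST request for /v1/audio/speech (hypothetical helper)."""
    payload = {
        "model": "tts-1",  # currently ignored by the server
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": 1.0,  # currently ignored by the server
    }
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, path, **kwargs):
    """Send the request and write the returned audio bytes to `path`."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)  # number of bytes written
```

With the container running, calling synthesize_to_file("Hello from Python!", "test_audio.wav") should produce the same kind of file as the curl command. Building the request in a separate function keeps the payload easy to inspect without contacting the server.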