Local Linux TTS daemon plus Speech Dispatcher bridge built on Qwen3-TTS.
- Persistent local daemon (`speake-rs-daemon`) over a Unix socket
- Optional HTTP server mode (`--features http`) for network-accessible TTS
- CLI client (`speake-rs`) for direct synthesis and Speech Dispatcher bridge mode
- `sd_generic` module integration so global `spd-say` can route through `speake-rs`
- Optional user-managed voice cloning profiles
Cloned voice profiles are not bundled; you have to clone your own voices. To create and test one:
```bash
# 1) Create a local profile from your own reference audio
speake-rs clone create --name sample_voice --ref-audio /path/to/reference.wav

# 2) Start daemon in base mode for profile synthesis
speake-rs-daemon --model base

# 3) Test profile directly
speake-rs speak "hello from my cloned profile" --profile sample_voice
```

For global `spd-say` profile mapping, see `docs/voice-cloning.md`.
ICL voice cloning is mostly untested in this project right now and should be treated as experimental.
Build:

```bash
cargo build --workspace
```

CUDA build:

```bash
cargo build --workspace --features cuda
```

Build with HTTP server support:

```bash
cargo build -p speake-rs-daemon --features http
```

Install:

```bash
cargo install --path crates/speake-rs-cli --force
cargo install --path crates/speake-rs-daemon --force
```

CUDA install:

```bash
cargo install --path crates/speake-rs-cli --force --features cuda
cargo install --path crates/speake-rs-daemon --force --features cuda
```

Install with HTTP + CUDA:

```bash
cargo install --path crates/speake-rs-daemon --force --features http,cuda
```

Start the daemon (preset-voice default path):

```bash
speake-rs-daemon --model custom-voice
```

Verify local health:

```bash
speake-rs doctor
speake-rs speak "hello from speake-rs" --voice ryan
```

Configure global `spd-say` routing via Speech Dispatcher: see `docs/setup-speech-dispatcher.md`.
When built with `--features http`, the daemon can serve TTS over HTTP instead of a Unix socket. This is useful for running in Docker containers or exposing TTS as a network service.
```bash
speake-rs-daemon --http 0.0.0.0:9000
```

Endpoints:

- `GET /health`: returns `{"status":"ok","model":"...","uptime_secs":N}`
- `GET /voices`: returns a list of available voice IDs
- `POST /tts`: synthesizes speech and returns raw audio bytes
Request body (JSON):

```json
{
  "text": "Hello world",
  "voice": "ryan",
  "language": "en",
  "speaking_rate": 1.0,
  "format": "ogg"
}
```

| Field | Default | Description |
|---|---|---|
| `text` | (required) | Text to synthesize |
| `voice` | `"ryan"` | Preset speaker name, or `"profile:<name>"` for a cloned voice |
| `language` | `"en"` | Language code (`en`, `zh`, `ja`, `ko`, `de`, `fr`, `ru`, `pt`, `es`, `it`) |
| `speaking_rate` | `1.0` | Playback speed (0.0–5.0, applied via ffmpeg `atempo`) |
| `format` | `"ogg"` | Output format: `"ogg"` or `"mp3"` |
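To make the request table concrete from the client side, here is a minimal, std-only sketch that models the body with the documented defaults and checks values against the documented ranges before sending. `TtsRequest`, `validate`, `to_json` and `json_string` are hypothetical names, and the checks are this sketch's reading of the table, not the daemon's own validation; in particular, treating a `speaking_rate` of exactly 0.0 as invalid is an assumption (a tempo has to be positive).

```rust
// Hypothetical client-side model of the POST /tts request body.
// Defaults and limits mirror the request table; this is not the daemon's code.

const LANGUAGES: [&str; 10] = ["en", "zh", "ja", "ko", "de", "fr", "ru", "pt", "es", "it"];

#[derive(Debug, Clone, PartialEq)]
pub struct TtsRequest {
    pub text: String,
    pub voice: String, // preset name, or "profile:<name>" for a cloned voice
    pub language: String,
    pub speaking_rate: f64,
    pub format: String,
}

impl TtsRequest {
    /// A request for `text` with every optional field set to its documented default.
    pub fn new(text: &str) -> Self {
        TtsRequest {
            text: text.to_string(),
            voice: "ryan".to_string(),
            language: "en".to_string(),
            speaking_rate: 1.0,
            format: "ogg".to_string(),
        }
    }

    /// Rejects values outside the documented ranges before anything is sent.
    pub fn validate(&self) -> Result<(), String> {
        if self.text.is_empty() {
            return Err("text is required".to_string());
        }
        if !LANGUAGES.contains(&self.language.as_str()) {
            return Err(format!("unsupported language: {}", self.language));
        }
        // Assumption: 0.0 itself is rejected because a tempo must be positive.
        // Written this way, the comparison also rejects NaN and infinity.
        if !(self.speaking_rate > 0.0 && self.speaking_rate <= 5.0) {
            return Err(format!("speaking_rate out of range: {}", self.speaking_rate));
        }
        if self.format != "ogg" && self.format != "mp3" {
            return Err(format!("unsupported format: {}", self.format));
        }
        Ok(())
    }

    /// Serializes the request as compact JSON, fields in table order.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"text\":{},\"voice\":{},\"language\":{},\"speaking_rate\":{},\"format\":{}}}",
            json_string(&self.text),
            json_string(&self.voice),
            json_string(&self.language),
            self.speaking_rate,
            json_string(&self.format)
        )
    }
}

/// Minimal JSON string escaping: quotes, backslashes and control characters.
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

fn main() {
    // Target a cloned profile, using the "profile:<name>" form from the table.
    let mut req = TtsRequest::new("hello from my cloned profile");
    req.voice = "profile:sample_voice".to_string();
    req.speaking_rate = 1.25;
    match req.validate() {
        Ok(()) => println!("{}", req.to_json()),
        Err(e) => eprintln!("invalid request: {e}"),
    }
}
```

Validating on the client side means a typo such as `"language": "jp"` surfaces as a local error instead of a failed round trip; the daemon remains the authority on what it actually accepts.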
Response: raw audio bytes with appropriate Content-Type header.
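Since `/tts` is plain HTTP with a JSON body, any HTTP client can call it. The following is a dependency-free Rust sketch of the request framing and response handling. `tts_request`, `split_response` and `send` are hypothetical helpers. The sketch assumes the daemon honors `Connection: close` and does not chunk-encode the audio body, and it only reports non-200 status codes rather than interpreting them.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Frames an already-serialized JSON body as an HTTP/1.1 POST to /tts.
/// `Connection: close` asks the server to close the socket after replying,
/// so the whole reply can be read until EOF.
fn tts_request(host: &str, json_body: &str) -> String {
    format!(
        "POST /tts HTTP/1.1\r\nHost: {host}\r\nContent-Type: application/json\r\n\
         Content-Length: {}\r\nConnection: close\r\n\r\n{json_body}",
        json_body.len() // byte length of the UTF-8 body
    )
}

/// Splits a raw reply into (status code, Content-Type, body bytes).
/// Assumes the body is not chunk-encoded; returns None on a malformed head.
fn split_response(raw: &[u8]) -> Option<(u16, Option<String>, &[u8])> {
    let end = raw.windows(4).position(|w| w == b"\r\n\r\n")?;
    let head = std::str::from_utf8(&raw[..end]).ok()?;
    let mut lines = head.split("\r\n");
    let status: u16 = lines.next()?.split_whitespace().nth(1)?.parse().ok()?;
    let content_type = lines
        .filter_map(|line| line.split_once(':'))
        .find(|(name, _)| name.trim().eq_ignore_ascii_case("content-type"))
        .map(|(_, value)| value.trim().to_string());
    Some((status, content_type, &raw[end + 4..]))
}

/// Sends one request to a daemon listening on `addr` and returns the raw reply.
fn send(addr: &str, json_body: &str) -> std::io::Result<Vec<u8>> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(tts_request(addr, json_body).as_bytes())?;
    let mut raw = Vec::new();
    stream.read_to_end(&mut raw)?;
    Ok(raw)
}

fn main() {
    let body = r#"{"text":"hello world","voice":"ryan","format":"ogg"}"#;
    match send("localhost:9000", body) {
        Ok(raw) => match split_response(&raw) {
            Some((200, _, audio)) => {
                std::fs::write("test.ogg", audio).expect("write test.ogg");
                println!("wrote {} bytes to test.ogg", audio.len());
            }
            Some((status, _, _)) => eprintln!("server returned HTTP {status}"),
            None => eprintln!("malformed HTTP response"),
        },
        Err(e) => eprintln!("could not reach the daemon: {e}"),
    }
}
```

Reading until EOF keeps the sketch short. A long-lived client that reuses connections would instead need to honor `Content-Length` (or chunked encoding), which is where an HTTP client crate becomes worth the dependency.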
Requires ffmpeg on the system PATH for audio format conversion.
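`speaking_rate` is applied with ffmpeg's `atempo` filter. A single `atempo` stage never goes below 0.5. Older ffmpeg releases also cap it at 2.0, and newer ones skip samples above 2.0. The ffmpeg documentation's suggestion is to daisy-chain several `atempo` instances whose product is the desired tempo. The sketch below shows that chaining technique. `atempo_chain` is a hypothetical helper, not necessarily how the daemon builds its filter, and it rejects a rate of 0 because no tempo can express it.

```rust
/// Hypothetical helper: an ffmpeg audio-filter string of chained `atempo`
/// stages whose product is `rate`, each stage kept within [0.5, 2.0].
fn atempo_chain(rate: f64) -> Option<String> {
    // Tempo must be positive and finite; 0.0 cannot be expressed as atempo.
    if !(rate > 0.0 && rate.is_finite()) {
        return None;
    }
    let mut stages: Vec<f64> = Vec::new();
    let mut remaining = rate;
    // Peel off factors of 2.0 (speed-up) or 0.5 (slow-down) until the
    // leftover factor fits in a single stage.
    while remaining > 2.0 {
        stages.push(2.0);
        remaining /= 2.0;
    }
    while remaining < 0.5 {
        stages.push(0.5);
        remaining /= 0.5;
    }
    stages.push(remaining);
    let parts: Vec<String> = stages.iter().map(|s| format!("atempo={s}")).collect();
    Some(parts.join(","))
}

fn main() {
    for rate in [0.25, 1.0, 2.5, 5.0] {
        match atempo_chain(rate) {
            Some(filter) => println!("{rate}: -filter:a \"{filter}\""),
            None => println!("{rate}: not a valid tempo"),
        }
    }
}
```

For a `speaking_rate` of 2.5 the helper returns `atempo=2,atempo=1.25`, which could be passed as `ffmpeg -i in.wav -filter:a "atempo=2,atempo=1.25" out.ogg`.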
Docker:

```bash
docker build -t speake-rs .
docker run --gpus all -p 9000:9000 speake-rs
```

Test from the host:

```bash
curl http://localhost:9000/health
curl -X POST http://localhost:9000/tts \
  -H 'Content-Type: application/json' \
  -d '{"text":"hello world","voice":"ryan","format":"ogg"}' \
  --output test.ogg
```

Further documentation:

- `docs/setup-speech-dispatcher.md` - global user setup for `spd-say`
- `docs/voice-cloning.md` - optional local profile cloning workflow
- `docs/gpu-cuda.md` - CUDA build/runtime notes
- `docs/troubleshooting.md` - common runtime and routing issues