A minimal, embeddable Text-to-Speech (TTS) library for Rust using the Kokoro 82M parameter model.
This is a reduced version of kokoro-tiny created by 8b-is.
- Minimal dependencies - Only essential crates for TTS synthesis
- Auto-downloading - Model files (310MB + 27MB) download automatically to ~/.cache/k/
- Multiple voices - Support for various voice styles with mixing capability
- Speed & gain control - Adjust speech speed and volume
- WAV export - Save synthesized audio to WAV files
- Long text support - Automatic chunking and crossfading for longer texts
- Silent by default - No output unless KOKORO_DEBUG=1 is set
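
The long text support listed above combines two ideas: splitting text into chunks, and crossfading the audio of adjacent chunks. The sketch below shows one plausible way to do both using only the standard library. The function names `chunk_text` and `crossfade`, the byte limit, and the linear fade are illustrative assumptions, not kokoro-micro's actual implementation.

```rust
/// Split text into chunks of at most `max_bytes` bytes, breaking only after
/// sentence-ending punctuation. A single sentence longer than `max_bytes`
/// is kept whole rather than cut mid-sentence.
/// (Hypothetical helper; not part of the kokoro-micro API.)
fn chunk_text(text: &str, max_bytes: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for sentence in text.split_inclusive(|c: char| c == '.' || c == '!' || c == '?') {
        let sentence = sentence.trim();
        if sentence.is_empty() {
            continue;
        }
        // Start a new chunk if appending this sentence would exceed the limit.
        if !current.is_empty() && current.len() + 1 + sentence.len() > max_bytes {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(sentence);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

/// Join two sample buffers, linearly crossfading the last `overlap` samples
/// of `a` into the first `overlap` samples of `b`.
/// (Hypothetical helper; the library's fade curve and overlap are unknown.)
fn crossfade(a: &[f32], b: &[f32], overlap: usize) -> Vec<f32> {
    // Never fade over more samples than either buffer holds.
    let n = overlap.min(a.len()).min(b.len());
    let mut out = Vec::with_capacity(a.len() + b.len() - n);
    out.extend_from_slice(&a[..a.len() - n]);
    for i in 0..n {
        // t rises from just above 0 to just below 1 across the overlap.
        let t = (i + 1) as f32 / (n + 1) as f32;
        out.push(a[a.len() - n + i] * (1.0 - t) + b[i] * t);
    }
    out.extend_from_slice(&b[n..]);
    out
}
```

In this sketch each chunk would be synthesized separately and the resulting buffers folded together with `crossfade`, so the joined audio is `overlap` samples shorter than the plain concatenation.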
Add to your Cargo.toml:
[dependencies]
kokoro-micro = "0.2.0"
tokio = { version = "1", features = ["rt", "rt-multi-thread", "macros"] }

use kokoro_micro::TtsEngine;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize TTS engine (downloads model on first run)
    let mut tts = TtsEngine::new().await?;

    // Synthesize speech
    // Parameters: text, voice (None for default), speed, gain, language
    let audio = tts.synthesize_with_options(
        "Hello world!",
        None,       // voice: None = default "af_sky"
        1.0,        // speed: 1.0 = normal
        1.0,        // gain: 1.0 = normal volume
        Some("en"), // language
    )?;

    // Save to WAV file
    tts.save_wav("output.wav", &audio)?;

    Ok(())
}

TtsEngine: Main struct for text-to-speech synthesis.
- new() -> Result<Self, String>
  Create a new TTS engine. Downloads model files to ~/.cache/k/ on first run.
- with_paths(model_path: &str, voices_path: &str) -> Result<Self, String>
  Create engine with custom model file paths.
- voices() -> Vec<String>
  List all available voice names.
- synthesize_with_options(text: &str, voice: Option<&str>, speed: f32, gain: f32, lang: Option<&str>) -> Result<Vec<f32>, String>
  Synthesize text to audio samples.
  - text - Text to synthesize
  - voice - Voice name (e.g., "af_sky", "af_nicole", "am_adam") or None for default
  - speed - Speech speed (0.5 = slower, 1.0 = normal, 2.0 = faster)
  - gain - Volume multiplier (0.5 = quieter, 1.0 = normal, 2.0 = louder)
  - lang - Language code (e.g., "en", "es", "fr") or None for default "en"
- save_wav(path: &str, audio: &[f32]) -> Result<(), String>
  Save audio samples to a WAV file.
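
To make the `gain` parameter and the `Vec<f32>` sample format more concrete, here is a minimal standard-library sketch of applying a volume multiplier and serializing float samples as a mono 16-bit PCM WAV stream. The names `apply_gain` and `write_wav` are hypothetical. The clamping, the 16-bit encoding, and any particular sample rate (for example 24000 Hz) are assumptions, since this README does not say how `save_wav` encodes its output.

```rust
use std::io::{self, Write};

/// Multiply every sample by `gain`, clamping to the valid [-1.0, 1.0] range.
/// (Hypothetical helper; whether the library clamps is not documented here.)
fn apply_gain(samples: &[f32], gain: f32) -> Vec<f32> {
    samples.iter().map(|s| (s * gain).clamp(-1.0, 1.0)).collect()
}

/// Write mono 16-bit PCM WAV data to any writer.
/// (Hypothetical helper; assumes the data fits in a 32-bit length field.)
fn write_wav<W: Write>(mut out: W, samples: &[f32], sample_rate: u32) -> io::Result<()> {
    let data_len = (samples.len() * 2) as u32; // 2 bytes per sample
    out.write_all(b"RIFF")?;
    out.write_all(&(36 + data_len).to_le_bytes())?; // size of the rest of the file
    out.write_all(b"WAVE")?;
    out.write_all(b"fmt ")?;
    out.write_all(&16u32.to_le_bytes())?; // fmt chunk size
    out.write_all(&1u16.to_le_bytes())?; // audio format: PCM
    out.write_all(&1u16.to_le_bytes())?; // channels: mono
    out.write_all(&sample_rate.to_le_bytes())?;
    out.write_all(&(sample_rate * 2).to_le_bytes())?; // byte rate
    out.write_all(&2u16.to_le_bytes())?; // block align
    out.write_all(&16u16.to_le_bytes())?; // bits per sample
    out.write_all(b"data")?;
    out.write_all(&data_len.to_le_bytes())?;
    for &s in samples {
        // Scale the clamped float sample to the signed 16-bit range.
        let v = (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16;
        out.write_all(&v.to_le_bytes())?;
    }
    Ok(())
}
```

Because `write_wav` accepts any `Write` implementation, the same function can target a `std::fs::File` or an in-memory `Vec<u8>`.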
You can mix multiple voices by using weighted combinations:
// Mix 40% af_sky + 50% af_nicole
let audio = tts.synthesize_with_options(
    "Hello!",
    Some("af_sky.4+af_nicole.5"),
    1.0,
    1.0,
    Some("en"),
)?;

Common voices include:
- af_sky (default) - Female, gentle
- af_nicole - Female
- af_bella - Female
- am_adam - Male
- am_michael - Male
Use tts.voices() to list all available voices.
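
For readers curious how a weighted voice string such as "af_sky.4+af_nicole.5" could be interpreted, the following sketch parses it into name and weight pairs and blends style vectors as a weighted sum. The helper names `parse_voice_mix` and `blend` are hypothetical. The sketch assumes that the digits after the last `.` form a decimal fraction (".4" means 0.4), that a voice without a suffix has weight 1.0, and that weights are not normalized; the library's actual parsing rules may differ.

```rust
/// Parse a mix string such as "af_sky.4+af_nicole.5" into (name, weight) pairs.
/// Assumption: digits after the last '.' are a decimal fraction (".4" = 0.4);
/// a voice without such a suffix gets weight 1.0.
fn parse_voice_mix(spec: &str) -> Vec<(String, f32)> {
    spec.split('+')
        .map(|part| part.trim())
        .filter(|part| !part.is_empty())
        .map(|part| match part.rsplit_once('.') {
            Some((name, digits))
                if !name.is_empty()
                    && !digits.is_empty()
                    && digits.chars().all(|c| c.is_ascii_digit()) =>
            {
                // The guard guarantees "0.<digits>" is a valid float literal.
                let weight: f32 = format!("0.{}", digits).parse().unwrap_or(0.0);
                (name.to_string(), weight)
            }
            // No numeric suffix: treat the whole part as a voice name.
            _ => (part.to_string(), 1.0),
        })
        .collect()
}

/// Blend style vectors as a weighted sum, element by element.
/// The output length follows the first vector; weights are used as given.
fn blend(vectors: &[(&[f32], f32)]) -> Vec<f32> {
    let len = vectors.first().map_or(0, |(v, _)| v.len());
    let mut out = vec![0.0f32; len];
    for (v, w) in vectors {
        for (o, x) in out.iter_mut().zip(v.iter()) {
            *o += x * w;
        }
    }
    out
}
```

Under these assumptions, `parse_voice_mix("af_sky.4+af_nicole.5")` yields `af_sky` with weight 0.4 and `af_nicole` with weight 0.5.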
By default, kokoro-micro runs silently with no console output. To enable debug logging (model download progress, synthesis details, etc.), set the KOKORO_DEBUG environment variable:
# Enable debug logging
KOKORO_DEBUG=1 cargo run --example simple
// Or in your code
std::env::set_var("KOKORO_DEBUG", "1");

Debug logging shows:
- Model download progress
- Long-form synthesis chunking information
- Phoneme conversion details
- Audio generation statistics
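
The silent-by-default behavior can be pictured as a small gate around every log call. The sketch below is an illustration only: the helper names and the `[kokoro]` prefix are made up, and it assumes that only the exact value `1` enables logging, which this README implies but does not state for other values.

```rust
/// Decide from the raw variable value whether debug output is enabled.
/// Assumption: only the exact value "1" turns logging on.
fn debug_enabled_from(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Read KOKORO_DEBUG from the process environment.
fn debug_enabled() -> bool {
    debug_enabled_from(std::env::var("KOKORO_DEBUG").ok().as_deref())
}

/// Print a message to stderr only when debug logging is enabled.
/// (Hypothetical helper; the "[kokoro]" prefix is illustrative.)
fn debug_log(message: &str) {
    if debug_enabled() {
        eprintln!("[kokoro] {}", message);
    }
}
```

Keeping the decision in a pure function (`debug_enabled_from`) makes the gate testable without mutating the process environment.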
See examples/simple.rs:
# Run without debug output
cargo run --example simple
# Run with debug output
KOKORO_DEBUG=1 cargo run --example simple

Optional Cargo features:
- cuda - Enable CUDA acceleration for ONNX Runtime
[dependencies]
kokoro-micro = { version = "0.2.0", features = ["cuda"] }

Model files are automatically downloaded on first use to $HOME/.cache/k/:
- $HOME/.cache/k/0.onnx (310MB) - Kokoro ONNX model
- $HOME/.cache/k/0.bin (27MB) - Voice embeddings
The same cache directory is used on all platforms (Linux, macOS, Windows):
- Linux/macOS: $HOME/.cache/k/ (e.g., /home/user/.cache/k/)
- Windows: %USERPROFILE%/.cache/k/ (e.g., C:\Users\Username\.cache\k\)
Files are cached and shared across all applications using kokoro-micro.
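
As a rough illustration of the cache layout described above, the sketch below resolves the cache directory from `HOME` (falling back to `USERPROFILE`) and checks whether both model files are already present. All function names are hypothetical, and the lookup order of the two environment variables is an assumption; only the directory and file names come from this README.

```rust
use std::path::{Path, PathBuf};

/// Build the cache directory from the given home values, preferring `home`
/// and falling back to `userprofile`. Empty values are ignored.
fn cache_dir_from(home: Option<&str>, userprofile: Option<&str>) -> Option<PathBuf> {
    let non_empty = |s: &&str| !s.is_empty();
    home.filter(non_empty)
        .or(userprofile.filter(non_empty))
        .map(|base| PathBuf::from(base).join(".cache").join("k"))
}

/// Resolve the cache directory from the process environment.
fn cache_dir() -> Option<PathBuf> {
    let home = std::env::var("HOME").ok();
    let profile = std::env::var("USERPROFILE").ok();
    cache_dir_from(home.as_deref(), profile.as_deref())
}

/// Paths of the ONNX model and the voice embeddings inside `dir`.
fn model_files(dir: &Path) -> (PathBuf, PathBuf) {
    (dir.join("0.onnx"), dir.join("0.bin"))
}

/// True when at least one of the two files is missing from `dir`.
fn needs_download(dir: &Path) -> bool {
    let (model, voices) = model_files(dir);
    !(model.is_file() && voices.is_file())
}
```

A caller could combine these as `cache_dir().map(|d| needs_download(&d))` to decide whether a first-run download is required.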
License: Apache-2.0
Built with the Kokoro 82M parameter TTS model. Reduced version from kokoro-tiny by 8b-is.