Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.
You can run this basic cell:
!pip install -q "kokoro>=0.9.2" soundfile
!apt-get -qq -y install espeak-ng > /dev/null 2>&1
!git clone https://github.com/.../kokoro_german kokoro_german
from kokoro_german.kokoro import KPipeline
from IPython.display import display, Audio
import soundfile as sf
import torch
pipeline = KPipeline(lang_code='d')
text = '''
Kokoro-German ist ein TTS-Modell mit 82 Millionen Parametern.
Das Modell verwendet denselben Inferenzcode wie Kokoro.
'''
generator = pipeline(text, voice='df_eva')
for i, (gs, ps, audio) in enumerate(generator):
    print(i, gs, ps)
    display(Audio(data=audio, rate=24000, autoplay=i==0))
    sf.write(f'{i}.wav', audio, 24000)

Under the hood, kokoro uses misaki, a G2P library at https://github.com/hexgrad/misaki
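The loop in the basic cell writes one WAV file per generated segment. If a single file is preferred, the segments can be joined before saving. The following is a minimal sketch using only the Python standard library; the helper names `to_pcm16` and `write_combined_wav` are hypothetical and not part of kokoro, and the sketch assumes each segment is an iterable of float samples in the range [-1.0, 1.0] at 24000 Hz.

```python
import array
import wave

SAMPLE_RATE = 24000  # same rate the cell above passes to sf.write


def to_pcm16(samples):
    """Convert float samples to 16-bit signed PCM, clipping values outside [-1.0, 1.0]."""
    pcm = array.array("h")
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))
        pcm.append(int(round(clipped * 32767)))
    return pcm


def write_combined_wav(path, segments, sample_rate=SAMPLE_RATE):
    """Write all segments back to back into one mono WAV file.

    Returns the total number of samples written.
    """
    total = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        for segment in segments:
            pcm = to_pcm16(segment)
            wav.writeframes(pcm.tobytes())
            total += len(pcm)
    return total
```

One way to use it is to append each `audio` value from the loop to a list and then call `write_combined_wav('combined.wav', chunks)`. The per-sample conversion is slow for long audio; if the segments are NumPy arrays or tensors, concatenating them and calling `sf.write` once is likely faster.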
You can run this advanced cell on Google Colab.
# 1️⃣ Install kokoro
!pip install -q "kokoro>=0.9.4" soundfile
# 2️⃣ Install espeak, used for English OOD fallback and some non-English languages
!apt-get -qq -y install espeak-ng > /dev/null 2>&1
# 3️⃣ Initialize a pipeline
from kokoro import KPipeline
from IPython.display import display, Audio
import soundfile as sf
import torch
# 🇺🇸 'a' => American English, 🇬🇧 'b' => British English
# 🇩🇪 'd' => German de
# 🇪🇸 'e' => Spanish es
# 🇫🇷 'f' => French fr-fr
# 🇮🇳 'h' => Hindi hi
# 🇮🇹 'i' => Italian it
# 🇯🇵 'j' => Japanese: pip install misaki[ja]
# 🇧🇷 'p' => Brazilian Portuguese pt-br
# 🇨🇳 'z' => Mandarin Chinese: pip install misaki[zh]
pipeline = KPipeline(lang_code='d') # <= make sure lang_code matches voice, reference above.
# This text is for demonstration purposes only, unseen during training
text = '''
Kokoro-German ist ein TTS-Modell mit 82 Millionen Parametern.
Das Modell verwendet denselben Inferenzcode wie Kokoro.
'''
# 4️⃣ Generate, display, and save audio files in a loop.
generator = pipeline(
    text, voice='df_eva', # <= change voice here
    speed=1, split_pattern=r'\n+'
)
# Alternatively, load voice tensor directly:
# voice_tensor = torch.load('path/to/voice.pt', weights_only=True)
# generator = pipeline(
#     text, voice=voice_tensor,
#     speed=1, split_pattern=r'\n+'
# )
for i, (gs, ps, audio) in enumerate(generator):
    print(i)  # i => index
    print(gs) # gs => graphemes/text
    print(ps) # ps => phonemes
    display(Audio(data=audio, rate=24000, autoplay=i==0))
    sf.write(f'{i}.wav', audio, 24000) # save each audio file

To install espeak-ng on Windows:
- Go to espeak-ng releases
- Click on Latest release
- Download the appropriate *.msi file (e.g. espeak-ng-20191129-b702b03-x64.msi)
- Run the downloaded installer

For advanced configuration and usage on Windows, see the official espeak-ng Windows guide.
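After installation, a quick sanity check is to see whether the `espeak-ng` executable can be found and run from Python. The sketch below uses only the standard library; the function names are hypothetical. Treat a negative result as a hint rather than proof of a broken install: the Windows installer may not add its directory to `PATH`, and kokoro's dependencies may locate the espeak-ng shared library by other means.

```python
import shutil
import subprocess


def find_espeak_ng():
    """Return the full path of the espeak-ng executable on PATH, or None if absent."""
    return shutil.which("espeak-ng")


def espeak_ng_version(executable):
    """Return the first line printed by `<executable> --version`, or None on failure."""
    try:
        result = subprocess.run(
            [executable, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        # Executable missing, not runnable, timed out, or exited with an error.
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    path = find_espeak_ng()
    if path is None:
        print("espeak-ng was not found on PATH")
    else:
        print(path, espeak_ng_version(path))
```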
On Mac M1/M2/M3/M4 devices, you can explicitly specify the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to enable GPU acceleration.
PYTORCH_ENABLE_MPS_FALLBACK=1 python run-your-kokoro-script.py

Use the following conda environment.yml if you're facing any dependency issues.
name: kokoro
channels:
  - defaults
dependencies:
  - python==3.9
  - libstdcxx~=12.4.0 # Needed to load espeak correctly. Try removing this if you're facing issues with Espeak fallback.
  - pip:
    - kokoro>=0.3.1
    - soundfile
    - misaki[en]

- 🛠️ @yl4579 for architecting StyleTTS 2.
- 🏆 @Pendrokar for adding Kokoro as a contender in the TTS Spaces Arena.
- 📊 Thank you to everyone who contributed synthetic training data.
- ❤️ Special thanks to all compute sponsors.
- 👾 Discord server: https://discord.gg/QuGxSWBfQy
- 🪽 Kokoro is a Japanese word that translates to "heart" or "spirit". Kokoro is also a character in the Terminator franchise along with Misaki.
