🗣️ Open TTS Tracker

A one-stop shop for tracking open-access and open-source TTS models as they come out. Feel free to open a PR for any that aren't listed here.

This is intended as a resource to raise awareness of these models and to make it easier for researchers, developers, and enthusiasts to stay informed about the latest advancements in the field.

Note

This repo only tracks TTS models with an open-source or open-access codebase. More motivation for everyone to open-source! 🤗

Name	GitHub 💻	Weights ⚖	License 🧾	Fine-tune 👤	Languages	Paper 📄	Demo 🗣️	Issues 📚	Processor ⚡	Word pronunciation adjustment 👄	Emotional control 🎭	Audio control 🎚	S2S support 🦜
XTTS	Repo	🤗 Hub	CPML	Yes	Multilingual	Technical notes	🤗 Space
TorToiSe TTS	Repo	🤗 Hub	Apache 2.0	Yes	English	Technical report	🤗 Space
VITS/ MMS-TTS	Repo	🤗 Hub / MMS	Apache 2.0	Yes	English	Paper	🤗 Space
Pheme	Repo	🤗 Hub	CC-BY	Yes	English	Paper	🤗 Space
OpenVoice	Repo	🤗 Hub	CC-BY-NC 4.0	No	ZH + EN	Paper	🤗 Space
IMS-Toucan	Repo	GH release	Apache 2.0	Yes	Multilingual	Paper	🤗 Space
Matcha-TTS	Repo	GDrive	MIT	Yes	English	Paper	🤗 Space	GPL-licensed phonemizer
pflowTTS	Unofficial Repo	GDrive	MIT	Yes	English	Paper	Not Available	GPL-licensed phonemizer
StyleTTS 2	Repo	🤗 Hub	MIT	Yes	English	Paper	🤗 Space	GPL-licensed phonemizer
VALL-E	Unofficial Repo	Not Available	MIT	Yes	NA	Paper	Not Available
HierSpeech++	Repo	GDrive	CC-BY-NC-SA 4.0	No	KR + EN	Paper	🤗 Space
Bark	Repo	🤗 Hub	MIT	No	Multilingual	Paper	🤗 Space
EmotiVoice	Repo	GDrive	Apache 2.0	Yes	ZH + EN	Not Available	Not Available	Separate GUI agreement
Amphion	Repo	🤗 Hub	MIT	No	Multilingual	Paper	🤗 Space
xVASynth	Repo	GH commit	GPL-3.0	Yes	Multilingual	Paper	Not Available	Copyrighted materials used for training	CPU / CUDA	ARPAbet	4-type 😡😃 😭😯 per-phoneme	speed / pitch / energy 🎚 per-phoneme	🦜
OverFlow TTS	Repo	GitHub	MIT	Yes	English	Paper	GH Pages
Neural-HMM TTS	Repo	GitHub	MIT	Yes	English	Paper	GH Pages
Tacotron 2	Unofficial Repo	GDrive	BSD-3	Yes	English	Paper	Webpage
Glow-TTS	Repo	GDrive	MIT	Yes	English	Paper	GH Pages
Silero	Repo	GH links	CC BY-NC-SA	No	EN + DE + ES + EA	Not Available	Not Available	Non Commercial
MahaTTS	Repo	🤗 Hub	Apache 2.0	No	English, Hindi, Indian English, Bengali, Tamil, Telugu, Punjabi, Marathi, Gujarati, Assamese	Not Available	Recordings, Colab

Capability specifics

Name	Processor ⚡	Phonetic alphabet 👄	Emotional control 🎭	Speech control 🎚	S2S support 🦜
XTTS
TorToiSe TTS
VITS/ MMS-TTS
Pheme
OpenVoice
IMS-Toucan
Matcha-TTS
pflowTTS
StyleTTS 2
VALL-E
HierSpeech++
Bark
EmotiVoice
Amphion
xVASynth	CPU / CUDA	ARPAbet	4-type 🎭 😡😃😭😯 per‑phoneme	speed / pitch / energy / 🎭 🎚 per‑phoneme	🦜
OverFlow TTS
Neural-HMM TTS
Tacotron 2
Glow-TTS
Silero
MahaTTS

Processor - CPU/CUDA/ROCm (single/multi)
Phonetic alphabet - None/IPA/ARPAbet/ (Phonetic transcription that lets you control how specific words are pronounced)
Insta-clone - Yes/No (Quick voice cloning from a few audio samples)
Emotional control - Yes/Strict/No (Strict means the model cannot produce in-between emotional states)
Prompting - Yes/No (A side effect of narrator-based datasets and a way to affect the emotional state; see the ElevenLabs docs)
Streaming support - Yes/No (Whether audio can be played back while it is still being generated)
Speech control - speed/pitch/ (Ability to change the pitch, duration, energy and/or emotion of generated speech)
Speech-To-Speech support - Yes/No (Streaming support implies real-time S2S)
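
If you want to check a capability entry before opening a PR, the legend above can be written down as a small record type. This is only a sketch: the `Capabilities` class, its field names, and the `realtime_s2s` helper are hypothetical and not part of this repo, and the allowed values are simply copied from the legend. Fields left as `None` mean "not yet filled in", which matches most rows of the table.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Allowed values copied from the legend; everything else here is an
# illustrative assumption, not part of the tracker itself.
PROCESSORS = {"CPU", "CUDA", "ROCm"}
EMOTIONAL_CONTROL = {"Yes", "Strict", "No"}


@dataclass(frozen=True)
class Capabilities:
    name: str
    processors: Tuple[str, ...] = ()
    phonetic_alphabet: Optional[str] = None   # e.g. "IPA" or "ARPAbet"
    insta_clone: Optional[bool] = None
    emotional_control: Optional[str] = None   # "Yes", "Strict" or "No"
    prompting: Optional[bool] = None
    streaming: Optional[bool] = None
    speech_control: Tuple[str, ...] = ()      # e.g. ("speed", "pitch")
    s2s: Optional[bool] = None

    def __post_init__(self):
        # Reject processor names that the legend does not list.
        unknown = set(self.processors) - PROCESSORS
        if unknown:
            raise ValueError(f"unknown processor(s): {sorted(unknown)}")
        # Reject emotional-control values outside the legend's three options.
        if (self.emotional_control is not None
                and self.emotional_control not in EMOTIONAL_CONTROL):
            raise ValueError(
                f"unknown emotional control: {self.emotional_control!r}")

    def realtime_s2s(self) -> Optional[bool]:
        """One reading of 'streaming support implies real-time S2S':
        real-time S2S needs both S2S and streaming. Returns None when
        either field has not been filled in."""
        if self.s2s is None or self.streaming is None:
            return None
        return self.s2s and self.streaming


# The only row currently filled in. Its emotional-control cell
# ("4-type ... per-phoneme") does not map onto Yes/Strict/No, so that
# field is left unset here.
xvasynth = Capabilities(
    name="xVASynth",
    processors=("CPU", "CUDA"),
    phonetic_alphabet="ARPAbet",
    speech_control=("speed", "pitch", "energy"),
    s2s=True,
)
```

Because the xVASynth row does not state streaming support, `xvasynth.realtime_s2s()` returns `None` rather than guessing.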

How can you help?

Help make this list more complete. Create demos on the Hugging Face Hub and link them here :) Got any questions? Drop me a DM on Twitter @reach_vb.
