🗣️ Open TTS Tracker

A one-stop shop for tracking all open-access and open-source TTS models as they come out. Feel free to open a PR for any that aren't linked here.

This list is meant to raise awareness of these models and to make it easier for researchers, developers, and enthusiasts to keep up with the latest advancements in the field.

Note

This repo only tracks TTS models with open-source or open-access codebases. More motivation for everyone to open-source! 🤗

| Name | GitHub 💻 | Weights ⚖ | License 🧾 | Fine-tune 👤 | Languages | Paper 📄 | Demo 🗣️ | Issues 📚 | Processor ⚡ | Word pronunciation adjustment 👄 | Insta-clone 👥 | Emotional control 🎭 | Prompting 📖 | Streaming support 🌊 | Audio control 🎚 | S2S support 🦜 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| XTTS | Repo | 🤗 Hub | CPML | Yes | Multilingual | Technical notes | 🤗 Space | | | | | | | | | |
| TorToiSe TTS | Repo | 🤗 Hub | Apache 2.0 | Yes | English | Technical report | 🤗 Space | | | | | | | | | |
| VITS/ MMS-TTS | Repo | 🤗 Hub / MMS | Apache 2.0 | Yes | English | Paper | 🤗 Space | | | | | | | | | |
| Pheme | Repo | 🤗 Hub | CC-BY | Yes | English | Paper | 🤗 Space | | | | | | | | | |
| OpenVoice | Repo | 🤗 Hub | CC-BY-NC 4.0 | No | ZH + EN | Paper | 🤗 Space | | | | | | | | | |
| IMS-Toucan | Repo | GH release | Apache 2.0 | Yes | Multilingual | Paper | 🤗 Space | | | | | | | | | |
| Matcha-TTS | Repo | GDrive | MIT | Yes | English | Paper | 🤗 Space | GPL-licensed phonemizer | | | | | | | | |
| pflowTTS | Unofficial Repo | GDrive | MIT | Yes | English | Paper | Not Available | GPL-licensed phonemizer | | | | | | | | |
| StyleTTS 2 | Repo | 🤗 Hub | MIT | Yes | English | Paper | 🤗 Space | GPL-licensed phonemizer | | | | | | | | |
| VALL-E | Unofficial Repo | Not Available | MIT | Yes | NA | Paper | Not Available | | | | | | | | | |
| HierSpeech++ | Repo | GDrive | CC-BY-NC-SA 4.0 | No | KR + EN | Paper | 🤗 Space | | | | | | | | | |
| Bark | Repo | 🤗 Hub | MIT | No | Multilingual | Paper | 🤗 Space | | | | | | | | | |
| EmotiVoice | Repo | GDrive | Apache 2.0 | Yes | ZH + EN | Not Available | Not Available | Separate GUI agreement | | | | | | | | |
| Amphion | Repo | 🤗 Hub | MIT | No | Multilingual | Paper | 🤗 Space | | | | | | | | | |
| xVASynth | Repo | GH commit | GPL-3.0 | Yes | Multilingual | Paper | Not Available | Copyright materials used for training. | CPU / CUDA | ARPAbet | | 4-type 😡😃😭😯 per-phoneme | | | speed / pitch / energy 🎚 per-phoneme | 🦜 |
| OverFlow TTS | Repo | GitHub | MIT | Yes | English | Paper | GH Pages | | | | | | | | | |
| Neural-HMM TTS | Repo | GitHub | MIT | Yes | English | Paper | GH Pages | | | | | | | | | |
| Tacotron 2 | Unofficial Repo | GDrive | BSD-3 | Yes | English | Paper | Webpage | | | | | | | | | |
| Glow-TTS | Repo | GDrive | MIT | Yes | English | Paper | GH Pages | | | | | | | | | |
| Silero | Repo | GH links | CC BY-NC-SA | No | EM + DE + ES + EA | Not Available | Not Available | Non Commercial | | | | | | | | |
| MahaTTS | Repo | 🤗 Hub | Apache 2.0 | No | English, Hindi, Indian English, Bengali, Tamil, Telugu, Punjabi, Marathi, Gujarati, Assamese | Not Available | Recordings, Colab | | | | | | | | | |

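Capability specifics

| Name | Processor | Phonetic alphabet 👄 | Insta-clone 👥 | Emotional control 🎭 | Prompting 📖 | Streaming support 🌊 | Speech control 🎚 | S2S support 🦜 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| XTTS | | | | | | | | |
| TorToiSe TTS | | | | | | | | |
| VITS/ MMS-TTS | | | | | | | | |
| Pheme | | | | | | | | |
| OpenVoice | | | | | | | | |
| IMS-Toucan | | | | | | | | |
| Matcha-TTS | | | | | | | | |
| pflowTTS | | | | | | | | |
| StyleTTS 2 | | | | | | | | |
| VALL-E | | | | | | | | |
| HierSpeech++ | | | | | | | | |
| Bark | | | | | | | | |
| EmotiVoice | | | | | | | | |
| Amphion | | | | | | | | |
| xVASynth | CPU / CUDA | ARPAbet | | 4-type 🎭 😡😃😭😯 per-phoneme | | | speed / pitch / energy / 🎭 🎚 per-phoneme | 🦜 |
| OverFlow TTS | | | | | | | | |
| Neural-HMM TTS | | | | | | | | |
| Tacotron 2 | | | | | | | | |
| Glow-TTS | | | | | | | | |
| Silero | | | | | | | | |
| MahaTTS | | | | | | | | |
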
  • Processor - CPU/CUDA/ROCm (single/multi)
  • Phonetic alphabet - None/IPA/ARPAbet/ (Phonetic transcription that lets you control the pronunciation of specific words)
  • Insta-clone - Yes/No (Quick voice clone using a few audio samples)
  • Emotional control - Yes/Strict/No (Strict means the model cannot move between emotional states)
  • Prompting - Yes/No (A side effect of narrator-based datasets and a way to affect the emotional state; see the ElevenLabs docs)
  • Streaming support - Yes/No (Whether audio can be played back while it is still being generated)
  • Speech control - speed/pitch/ (Ability to change the pitch, duration, energy and/or emotion of generated speech)
  • Speech-To-Speech support - Yes/No (Streaming support implies real-time S2S)
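
If you want to work with the capability table programmatically, for example to see which cells still need a PR, the legend above could be modelled roughly as follows. This is only a sketch: the `Capabilities` class and its field names are hypothetical and not part of this repo, `None` stands for a cell that has not been filled in yet, and reading the 🦜 marker as "supported" is an assumption.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Capabilities:
    """One row of the capability table (hypothetical schema).

    A value of None means the cell is empty in the table, not "No".
    """
    name: str
    processor: Optional[Tuple[str, ...]] = None       # e.g. ("CPU", "CUDA")
    phonetic_alphabet: Optional[str] = None           # "None", "IPA", "ARPAbet"
    insta_clone: Optional[bool] = None
    emotional_control: Optional[str] = None           # "Yes", "Strict", "No", or a note
    prompting: Optional[bool] = None
    streaming: Optional[bool] = None
    speech_control: Optional[Tuple[str, ...]] = None  # e.g. ("speed", "pitch")
    s2s: Optional[bool] = None

    def missing(self) -> List[str]:
        """Return the names of the cells that are still empty."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Values copied from the xVASynth row of the table above.
xvasynth = Capabilities(
    name="xVASynth",
    processor=("CPU", "CUDA"),
    phonetic_alphabet="ARPAbet",
    emotional_control="4-type",
    speech_control=("speed", "pitch", "energy"),
    s2s=True,  # assumption: the 🦜 marker means "supported"
)

print(xvasynth.missing())
```

Keeping empty cells as `None` rather than `False` separates "nobody has checked yet" from "not supported", which is the distinction that matters when deciding where a PR would help.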

How can you help?

Help make this list more complete. Create demos on the Hugging Face Hub and link them here :) Got any questions? Drop me a DM on Twitter @reach_vb.