1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥
A generative speech model for daily dialogue.
🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real-time
Instant voice cloning by MIT and MyShell. Audio foundation model.
A TTS model capable of generating ultra-realistic dialogue in one pass.
🧠 Leon is your open-source personal assistant.
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Translate a video from one language to another and add dubbing. Also supports speech recognition transcription, speech synthesis, and subtitle translation.
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
End-to-End Speech Processing Toolkit
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
A fast, local neural text to speech system
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, support 11 programming languages
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.