<a href="https://colab.research.google.com/github/Troyanovsky/awesome-TTS-Colab/blob/main/xTTS.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🗣️ xTTS Google Colab

## 📄 Description  
This Colab notebook uses XTTS v2 to generate speech from text. It supports natural-sounding text-to-speech with predefined voices, and voice cloning from a short reference audio clip.  

**Languages supported**: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu), Korean (ko), Hindi (hi)  
**Capabilities**: Text-to-speech, Predefined Voices, Multi-lingual, Voice Cloning
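
The codes in parentheses are the values passed as the `language` argument in the cells below. As a minimal sketch, a small lookup can reject an unsupported code before any synthesis is attempted. `SUPPORTED_LANGUAGES` and `check_language` are illustrative names introduced here, not part of the coqui-tts API.

```python
# Language codes copied from the list above.
# SUPPORTED_LANGUAGES and check_language are hypothetical helpers, not library API.
SUPPORTED_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "it": "Italian", "pt": "Portuguese", "pl": "Polish", "tr": "Turkish",
    "ru": "Russian", "nl": "Dutch", "cs": "Czech", "ar": "Arabic",
    "zh-cn": "Chinese", "ja": "Japanese", "hu": "Hungarian",
    "ko": "Korean", "hi": "Hindi",
}


def check_language(code: str) -> str:
    """Return the normalized code, or raise ValueError if it is not in the list."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        known = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {known}")
    return normalized


print(check_language("EN"))  # prints "en"
```

The returned value can then be used wherever the notebook passes `language="en"`.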

---

## 🔗 Resources

- **GitHub Repository:** https://github.com/coqui-ai/TTS (original; no longer maintained, as Coqui shut down in 2023), https://github.com/idiap/coqui-ai-TTS (maintained fork, installed below as `coqui-tts`)  
- **Model Availability:** https://huggingface.co/coqui/XTTS-v2

---

## 🎙️ Explore More TTS Models  
Want to try out additional TTS models? Check out the curated collection here:  
👉 [awesome-TTS-Colab](https://github.com/Troyanovsky/awesome-TTS-Colab)


## Text-to-speech

In [1]:
!pip install coqui-tts==0.26.1
# Installs the maintained fork; the original TTS package is no longer maintained, as Coqui shut down in 2023

Collecting coqui-tts==0.26.1
  Downloading coqui_tts-0.26.1-py3-none-any.whl.metadata (19 kB)
Collecting anyascii>=0.3.0 (from coqui-tts==0.26.1)
  Downloading anyascii-0.3.2-py3-none-any.whl.metadata (1.5 kB)
Collecting coqpit-config<0.3.0,>=0.2.0 (from coqui-tts==0.26.1)
  Downloading coqpit_config-0.2.0-py3-none-any.whl.metadata (11 kB)
Collecting coqui-tts-trainer<0.3.0,>=0.2.0 (from coqui-tts==0.26.1)
  Downloading coqui_tts_trainer-0.2.3-py3-none-any.whl.metadata (8.1 kB)
Collecting encodec>=0.1.1 (from coqui-tts==0.26.1)
  Downloading encodec-0.1.1.tar.gz (3.7 MB)
  Preparing metadata (setup.py) ... done
Collecting gruut>=2.4.0 (from gruut[de,es,fr]>=2.4.0->coqui-tts==0.26.1)
  Downloading gruut-2.4.0.tar.gz (85 kB)
  Pr

In [2]:
TEXT = "This is text to speech generated by XTTS" # Change to the text you want to generate

In [4]:
import torch
from TTS.api import TTS
from IPython.display import Audio

device = "cuda" if torch.cuda.is_available() else "cpu"

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

print(tts.speakers)

 > You must confirm the following:
 | > "I have purchased a commercial license from Coqui: licensing@coqui.ai"
 | > "Otherwise, I agree to the terms of the non-commercial CPML: https://coqui.ai/cpml" - [y/n]
 | | > y


100%|█████████▉| 1.87G/1.87G [00:29<00:00, 75.2MiB/s]
100%|██████████| 1.87G/1.87G [00:30<00:00, 61.1MiB/s]
100%|██████████| 4.37k/4.37k [00:00<00:00, 20.4kiB/s]
100%|██████████| 361k/361k [00:00<00:00, 462kiB/s] 
100%|██████████| 32.0/32.0 [00:00<00:00, 100iB/s]
100%|██████████| 7.75M/7.75M [00:18<00:00, 18.9MiB/s]

['Claribel Dervla', 'Daisy Studious', 'Gracie Wise', 'Tammie Ema', 'Alison Dietlinde', 'Ana Florence', 'Annmarie Nele', 'Asya Anara', 'Brenda Stern', 'Gitta Nikolina', 'Henriette Usha', 'Sofia Hellen', 'Tammy Grit', 'Tanja Adelina', 'Vjollca Johnnie', 'Andrew Chipper', 'Badr Odhiambo', 'Dionisio Schuyler', 'Royston Min', 'Viktor Eka', 'Abrahan Mack', 'Adde Michal', 'Baldur Sanjin', 'Craig Gutsy', 'Damien Black', 'Gilberto Mathias', 'Ilkin Urbano', 'Kazuhiko Atallah', 'Ludvig Milivoj', 'Suad Qasim', 'Torcull Diarmuid', 'Viktor Menelaos', 'Zacharie Aimilios', 'Nova Hogarth', 'Maja Ruoho', 'Uta Obando', 'Lidiya Szekeres', 'Chandra MacFarland', 'Szofi Granger', 'Camilla Holmström', 'Lilya Stainthorpe', 'Zofija Kendrick', 'Narelle Moon', 'Barbora MacLean', 'Alexandra Hisakawa', 'Alma María', 'Rosemary Okafor', 'Ige Behringer', 'Filip Traverse', 'Damjan Chapman', 'Wulf Carlevaro', 'Aaron Dreschner', 'Kumar Dahl', 'Eugenio Mataracı', 'Ferran Simen', 'Xavier Hayasaka', 'Luis Moray', 'Marcos Ru

In [5]:
# Text to speech to a file
tts.tts_to_file(text=TEXT, language="en", file_path="output.wav", speaker="Claribel Dervla")

Audio("output.wav")
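
To compare several of the predefined voices printed by `tts.speakers`, the same `tts_to_file` call can be repeated in a loop. The following is a minimal sketch under that assumption; `synthesize_for_speakers`, the `outputs` folder, and the file-naming scheme are hypothetical choices made for illustration, and only the `tts_to_file` keyword arguments already used above are relied on.

```python
from pathlib import Path


def synthesize_for_speakers(tts, text, speakers, language="en", out_dir="outputs"):
    """Write one WAV file per speaker name and return the list of file paths.

    `tts` is expected to be the object loaded earlier in this notebook;
    this helper itself is illustrative and not part of coqui-tts.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in speakers:
        # Build a file name from the speaker name: lowercase, spaces to underscores.
        file_path = out / (name.lower().replace(" ", "_") + ".wav")
        tts.tts_to_file(text=text, language=language,
                        file_path=str(file_path), speaker=name)
        paths.append(file_path)
    return paths
```

With the model loaded, a call such as `synthesize_for_speakers(tts, TEXT, ["Claribel Dervla", "Ana Florence"])` would write one file per voice, and each can be played with `Audio(str(path))`.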

## Voice Cloning with Reference Audio

In [None]:
TEXT = "This is text to speech generated by XTTS" # Change to the text you want to generate

In [6]:
# Run every cell in the Text-to-speech section except the last one (In [5]),
# then run this cell to clone a voice from an uploaded reference clip.

from google.colab import files
import os

print("Please upload your reference audio file.")
uploaded = files.upload()

# Get the uploaded file name
reference_audio = list(uploaded.keys())[0]

# Rename the uploaded file to reference_audio.wav
os.rename(reference_audio, "reference_audio.wav")
print("File uploaded and renamed to reference_audio.wav")

# Text to speech to a file, using a reference input audio
tts.tts_to_file(text=TEXT,
                language="en",
                file_path="output.wav",
                speaker_wav="reference_audio.wav")

Audio("output.wav")

Please upload your reference audio file.


Saving trump_promptvn.wav to trump_promptvn.wav
File uploaded and renamed to reference_audio.wav
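
The cell above renames whatever file is uploaded to `reference_audio.wav` without checking its contents. As a hedged sketch, Python's standard `wave` module can confirm that the file is a readable WAV and report its length before it is used as `speaker_wav`. `describe_wav` is an illustrative helper; this notebook does not state an exact required duration, so the function only reports values and enforces nothing.

```python
import wave


def describe_wav(path):
    """Return (duration_seconds, sample_rate, channels) for a WAV file.

    wave.open raises wave.Error if the file is not a WAV the standard
    library can read, for example an MP3 that was only renamed to .wav.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        channels = wav.getnchannels()
    return frames / rate, rate, channels
```

For example, `duration, rate, channels = describe_wav("reference_audio.wav")` followed by a `print` gives a quick sanity check on the uploaded clip.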
