<a href="https://colab.research.google.com/github/Troyanovsky/awesome-TTS-Colab/blob/main/Auralis_xTTS.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🗣️ Auralis xTTS Google Colab

## 📄 Description  
This Colab notebook uses Auralis, a text-to-speech (TTS) engine based on xTTS V2, to generate speech from text. Auralis supports efficient long-text generation and streaming output.

**Languages supported**: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu), Korean (ko), Hindi (hi)  
**Capabilities**: Text-to-speech, Predefined Voices, Multi-lingual, Voice Cloning

---

## How to use

- Set `text_to_generate` and `language` in the code cell, following the comments there
- Run all cells in the section you need
- When prompted, upload a reference audio file for voice cloning
- The generated audio is saved to `output.wav`

---

## 🔗 Resources

- **GitHub Repository:** https://github.com/astramind-ai/Auralis/tree/main  
- **Model Availability:** https://huggingface.co/AstraMindAI/xttsv2

---

## 🎙️ Explore More TTS Models  
Want to try out additional TTS models? Check out the curated collection here:  
👉 [awesome-TTS-Colab](https://github.com/Troyanovsky/awesome-TTS-Colab)


## Text-to-speech with voice cloning

In [7]:
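# Install Auralis itself
!pip install auralis
# System packages: ffmpeg for audio conversion, PortAudio for sounddevice
!apt-get update
!apt-get install -y ffmpeg portaudio19-dev
# Python bindings used by the cells below
!pip install ffmpeg-python sounddevice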


In [1]:
# Upload a file to use as the reference audio for voice cloning

from google.colab import files
import ffmpeg

# Prompt the user to upload a file
uploaded = files.upload()

# Convert each uploaded file to WAV; if several are uploaded, the last one wins
output_filename = "reference.wav"
for filename in uploaded.keys():
    print(f'User uploaded file "{filename}"')

    # Use ffmpeg-python to convert the file to 16 kHz, 16-bit PCM WAV
    try:
        (
            ffmpeg
            .input(filename)
            .output(output_filename, acodec='pcm_s16le', ar='16000')
            # Capture stderr so e.stderr is populated if ffmpeg fails
            .run(overwrite_output=True, capture_stdout=True, capture_stderr=True)
        )
        print(f'Converted "{filename}" to "{output_filename}"')
    except ffmpeg.Error as e:
        print("Error during conversion:", e.stderr.decode())

Saving trump_promptvn.wav to trump_promptvn (4).wav
User uploaded file "trump_promptvn (4).wav"
Converted "trump_promptvn (4).wav" to "reference.wav"
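
Before moving on, it can help to confirm that the converted file really is a 16 kHz, 16-bit WAV. The sketch below uses only Python's standard `wave` module; the helper name `describe_wav` is hypothetical and not part of Auralis or this notebook.

```python
import os
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav_file.getsampwidth(),
            "duration_seconds": frames / rate,
        }


# Only inspect the file if the upload cell has already produced it
if os.path.exists("reference.wav"):
    print(describe_wav("reference.wav"))
```

With the conversion settings used above, `sample_rate` should read 16000 and `sample_width_bytes` should read 2.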


In [2]:
# Language codes supported by the model
languages = ["en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru", "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi"]
text_to_generate = "Hello Earth! This is Auralis speaking." # Replace with the text you want to generate
language = "en" # Must be one of the codes in `languages`
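
A mistyped language code would otherwise only surface once the model is loaded and a request is made. As a minimal sketch, the guard below checks the code up front against the list from the cell above. The names `SUPPORTED_LANGUAGES` and `validate_language` are hypothetical and not part of Auralis.

```python
# Same codes as the `languages` list in the notebook
SUPPORTED_LANGUAGES = [
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
]


def validate_language(code, supported=SUPPORTED_LANGUAGES):
    """Return the normalized language code, or raise ValueError if unsupported."""
    normalized = code.strip().lower()
    if normalized not in supported:
        raise ValueError(
            f"Unsupported language code {code!r}; choose one of: {', '.join(supported)}"
        )
    return normalized


print(validate_language("EN"))  # prints "en"
```

In the notebook, this could be applied as `language = validate_language(language)` before building the request.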

In [None]:
import nest_asyncio
from auralis import TTS, AudioPreprocessingConfig
from auralis.common.definitions.requests import TTSRequest
from IPython.display import Audio, display

# Allow nested event loops, since Colab already runs one
nest_asyncio.apply()

# Initialize the model
tts = TTS()
tts.from_pretrained("AstraMindAI/xttsv2", gpt_model='AstraMindAI/xtts2-gpt')

# Async function to generate speech and save it to output.wav
async def generate_speech(text_to_generate, reference_audio, language):
    request = TTSRequest(
        text=text_to_generate,
        speaker_files=[reference_audio],
        # Preprocessing applied to the reference audio
        audio_config=AudioPreprocessingConfig(
            normalize=True,
            trim_silence=True,
            enhance_speech=True,
            enhance_amount=1.5
        ),
        language=language
    )

    output = await tts.generate_speech_async(request)
    output.save("output.wav")

# Run the coroutine (top-level await works in Colab/IPython)
await generate_speech(text_to_generate, "reference.wav", language)

# Play the result
display(Audio("output.wav"))
