<a href="https://colab.research.google.com/github/Les-El/TorToiSe-Colab-Drive/blob/main/tortoise_tts_with_long_text_colab_drive.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Welcome to Tortoise! 🐢🐢🐢🐢**
## **"Colab + Drive UI Edition"**

### About this notebook

TorToiSe was developed by James Betker (https://github.com/neonbjb). It is a text-to-speech program built with two priorities: strong multi-voice capabilities, and highly realistic prosody and intonation. The repo contains all the code needed to run Tortoise TTS in inference mode, plus extras such as examples and documentation.

A YouTube video by Martin Thissen presented a Tortoise notebook which could handle much longer text files: https://youtu.be/FN3yxL0Rr0c. Please check it out for more details, and for an interface that isn't Google Drive dependent.

This notebook, the "Colab + Drive UI Edition", is built on the notebook from that video and is meant to be novice-friendly. It is designed to be run from top to bottom once all settings have been chosen.

If you have any questions or requests about the notebook, find me at https://discordapp.com/users/Lester#0973. This is a side/learning project for me, so I can't promise to address every issue. But I enjoy developing this notebook, so don't be shy!

One more note from James Betker:
> There's a reason this is called "Tortoise" - this model takes up to a minute to perform inference for a single sentence on a GPU. Expect waits on the order of hours on a CPU.

# **0. Instructions**

1.   *Before you begin, I **strongly** recommend you turn on a GPU runtime. (If you have Colab Pro, this is automatic.)*
2.   Upload a UTF-8 encoded .txt file into the Google Drive folder AI/Tortoise_TTS/Text.
    *    You can make this folder yourself, or let the "Check GPU & Mount Google Drive" cell do it for you.
3.    In Notebook Settings, supply the name of your .txt file.
4.    Choose 1 or 2 preset voices to read your text. 
    *    (If you choose 2 voices, Tortoise will attempt to blend them into a new voice. But sometimes it will just randomly switch back and forth between the voices in entertaining ways.)
5.    Or, select "create custom voice" to create a new voice from your own audio file.
6.    Choose your processing speed: "high_quality" is the slowest, and "fast" is pretty darn good!
7.    Choose a whole number for a "seed number," or just leave it random.
8.    If you are creating a new voice:
    *    Upload a 1-2 minute voice sample as a .wav file (floating-point format, 22,050 Hz sample rate) to your Google Drive in AI/Tortoise_TTS/Voices/Samples
    *    (**DEV TO-DO**: MAKE UPLOADING VOICE SAMPLE EASIER FOR USERS)
    *    In the "Optional - Upload Audio" cell, input the name of the .wav file you are using, and the name you want to give your new voice.
9.    **Run all!**
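Step 2 asks for a UTF-8 encoded .txt file, and section 5 opens it with `encoding='utf-8'`, so a file saved in another encoding (for example Windows-1252) stops that cell with a `UnicodeDecodeError`. A minimal sketch for checking a file before uploading it; `first_invalid_utf8_offset` is a hypothetical helper, not part of Tortoise:

```python
from pathlib import Path
from typing import Optional


def first_invalid_utf8_offset(path: str) -> Optional[int]:
    """Return None if the file decodes as UTF-8, else the byte offset of the first bad byte."""
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded
        return err.start
```

A result of `None` means the file is safe to upload; any number points at the first byte you would need to fix (or re-save the file as UTF-8 in your text editor).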

# **1. Check GPU & Mount Google Drive**
This process will create the needed folders in your Google Drive if they are not already present.
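The `!mkdir -p` lines in this step are shell escapes that only work inside Colab/IPython. As a sketch, the same folder tree can be created in plain Python with `pathlib`, which also makes it easy to point at a different root. `create_tortoise_folders` is a hypothetical helper; the subfolder list is copied from the cell:

```python
from pathlib import Path
from typing import List

# Subfolders created by the "Check GPU & Mount Google Drive" cell
SUBFOLDERS = [
    "Models",
    "Text",
    "Samples",
    "Voices",
    "Voices/Output",
    "Voices/Samples",
]


def create_tortoise_folders(root: str = "/content/drive/MyDrive/AI/Tortoise_TTS") -> List[Path]:
    """Create every subfolder under root (like mkdir -p) and return their paths."""
    created = []
    for sub in SUBFOLDERS:
        folder = Path(root) / sub
        # parents=True creates missing parents; exist_ok=True makes re-runs harmless
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Because `exist_ok=True` is set, running it again on an existing Drive layout does nothing, which matches the behavior of `mkdir -p`.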

In [None]:
import torch

# Check if GPU is available
if torch.cuda.is_available():
    device = torch.device("cuda")
    
    # Get GPU name
    gpu_name = torch.cuda.get_device_name(0)
    
    # Get GPU memory usage
    total_memory = torch.cuda.get_device_properties(0).total_memory / (1024 ** 3)
    allocated_memory = torch.cuda.memory_allocated(0) / (1024 ** 3)
    reserved_memory = torch.cuda.memory_reserved(0) / (1024 ** 3)
    
    print(f"Device: {device}")
    print(f"GPU: {gpu_name}")
    print(f"Total memory (GB): {total_memory:.2f}")
    print(f"Allocated memory (GB): {allocated_memory:.2f}")
    print(f"Reserved memory (GB): {reserved_memory:.2f}")
else:
    device = torch.device("cpu")
    print("No GPU available. Tortoise will run on the CPU, where a long text can take hours.")


# Mount Google Drive
from google.colab import drive

drive.mount('/content/drive')

# Create file structure in Google Drive
!mkdir -p '/content/drive/MyDrive/AI/Tortoise_TTS/Models'
!mkdir -p '/content/drive/MyDrive/AI/Tortoise_TTS/Text'
!mkdir -p '/content/drive/MyDrive/AI/Tortoise_TTS/Samples'
!mkdir -p '/content/drive/MyDrive/AI/Tortoise_TTS/Voices'
!mkdir -p '/content/drive/MyDrive/AI/Tortoise_TTS/Voices/Output'
!mkdir -p '/content/drive/MyDrive/AI/Tortoise_TTS/Voices/Samples'


# **2. Notebook Settings**
Upload a UTF-8 encoded .txt file into the folder:
MyDrive/AI/Tortoise_TTS/Text
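The settings cell in this step accepts free-form input for the seed, so a typo there only surfaces much later, in section 5. A hedged sketch of an up-front check, assuming the four names in the speed dropdown are the only valid presets and that the seed must be a whole number (as the cell's own note says); `validate_settings` is a hypothetical helper:

```python
from typing import List

# Preset names offered by the speed_setting dropdown
VALID_PRESETS = ["ultra_fast", "fast", "standard", "high_quality"]


def validate_settings(speed_setting: str, seed_setting: object) -> List[str]:
    """Return a list of problems; an empty list means the settings look usable."""
    problems = []
    if speed_setting not in VALID_PRESETS:
        problems.append(f"Unknown speed '{speed_setting}'; choose one of {VALID_PRESETS}.")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(seed_setting, bool) or not isinstance(seed_setting, int):
        problems.append(f"Seed must be a whole number, got {seed_setting!r}.")
    return problems
```

Calling `print(validate_settings(speed_setting, seed_setting))` at the end of the settings cell would print `[]` when both values are usable.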

In [None]:
#@markdown #####**Choose your text file:**
text_file = "test10.txt" #@param {type:"string"}
#@markdown ---
#@markdown #####**Choose your voice:**
voice_name = "CUSTOM_VOICE_NAME" #@param ['CUSTOM_VOICE_NAME','weaver', 'train_empire', 'applejack', 'william', 'freeman', 'pat2', 'myself', 'tim_reynolds', 'train_mouse', 'halle', 'deniro', 'geralt', 'mol', 'pat', 'train_lescault', 'daniel', 'train_daws', 'train_atkins', 'jlaw', 'tom', 'emma', 'train_dotrice', 'train_dreams', 'train_grace', 'angie', 'train_kennard', 'rainbow', 'snakes', 'lj'] {allow-input: true}
#@markdown #####**Choose your second voice (if you want to combine voices):**
second_voice_name = "None" #@param ['None', 'weaver', 'train_empire', 'applejack', 'william', 'freeman', 'pat2', 'myself', 'tim_reynolds', 'train_mouse', 'halle', 'deniro', 'geralt', 'mol', 'pat', 'train_lescault', 'daniel', 'train_daws', 'train_atkins', 'jlaw', 'tom', 'emma', 'train_dotrice', 'train_dreams', 'train_grace', 'angie', 'train_kennard', 'rainbow', 'snakes', 'lj'] {allow-input: true}
#@markdown ---
#@markdown #####**Check below if you want to upload .wav files to make a custom voice instead:**
#@markdown Check for more options in the **"Upload Audio"** cells
create_custom_voice = True #@param{type:"boolean"}
#@markdown ---
#@markdown #####**Choose your processing speed; default is 'fast':**
speed_setting = "fast" #@param ["ultra_fast", "fast", "standard", "high_quality"]
#@markdown ---
#@markdown ##### **Choose your seed:**
#@markdown ##### 'random_seed' is based on the system clock
#@markdown ##### Any other entry must be an integer
from time import time
random_seed = int(time())
seed_setting = random_seed #@param ["random_seed"] {type:"raw", allow-input: true}

# #@markdown #####**Check below if you want the models downloaded from HuggingFace to be stored on your Google Drive:**
# local_model_save = True #@param{type:"boolean"}

# If create_custom_voice is selected, force change other variables to avoid errors
if create_custom_voice:
    voice_name = "CUSTOM_VOICE_NAME"
    second_voice_name = "None"
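
The override above means that, when a custom voice is being created, the preset choices are ignored; otherwise one preset voice is used, or two if a second voice was chosen. A sketch of that selection as a plain function, assuming the custom voice should take priority; `resolve_voices` is a hypothetical helper that only returns names and does not call Tortoise:

```python
from typing import List


def resolve_voices(voice_name: str, second_voice_name: str,
                   create_custom_voice: bool, custom_voice_name: str) -> List[str]:
    """Return the voice names to pass to Tortoise: one name, or two to blend."""
    if create_custom_voice:
        # Mirrors the settings cell: a custom voice disables the preset choices
        return [custom_voice_name]
    voices = [voice_name]
    if second_voice_name != "None":
        voices.append(second_voice_name)
    return voices
```

A one-element result corresponds to `load_voice(names[0])`, and a two-element result to `load_voices(names)`.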

# **3. Downloading and installing the Tortoise-TTS model**

## **3.1** Install SciPy and Tortoise-TTS
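The install cell in this step pins `transformers`, `einops`, `rotary_embedding_torch`, and `unidecode` to specific versions. If a later cell fails with an import or attribute error, it can help to confirm, after the install has run, that the pins actually took effect. A minimal sketch using the standard-library `importlib.metadata`; `check_pins` is a hypothetical helper, the versions are copied from the cell, and the hyphenated distribution name for rotary_embedding_torch is an assumption:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Versions pinned by the install cell
PINS = {
    "transformers": "4.19.0",
    "einops": "0.5.0",
    "rotary-embedding-torch": "0.1.5",
    "unidecode": "1.3.5",
}


def check_pins(pins: Dict[str, str],
               get_version: Callable[[str], str] = version) -> Dict[str, Optional[str]]:
    """Return {package: installed_version_or_None} for every pin that is not satisfied."""
    mismatches = {}
    for package, wanted in pins.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = installed
    return mismatches
```

In a Colab cell, `print(check_pins(PINS))` prints `{}` when every pinned version is installed.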


In [None]:
# The SciPy version packaged with Colab is not tolerant of misformatted WAV files,
# so install the latest version.

# Flag so that re-running this cell doesn't clone and install everything twice
try:
    tts_cloned
except NameError:
    tts_cloned = False

if not tts_cloned:
    !pip3 install -U scipy
    !git clone https://github.com/jnordberg/tortoise-tts.git
    %cd tortoise-tts
    !pip3 install -r requirements.txt
    !pip3 install transformers==4.19.0 einops==0.5.0 rotary_embedding_torch==0.1.5 unidecode==1.3.5
    !python3 setup.py install
    tts_cloned = True

## **3.2** Initialize TextToSpeech Instance
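The code cell in this step and the install cell in 3.1 each guard expensive work with a flag (`tts_initialized`, `tts_cloned`), so running the notebook top to bottom a second time does not reinstall packages or reload the models. The same idea can be written as a small reusable helper. A sketch, where `get_or_create` is a hypothetical name and the dictionary stands in for the notebook's `globals()`:

```python
from typing import Any, Callable, Dict


def get_or_create(namespace: Dict[str, Any], name: str, factory: Callable[[], Any]) -> Any:
    """Return namespace[name], calling factory() only the first time it is needed."""
    if name not in namespace:
        namespace[name] = factory()
    return namespace[name]


# In a notebook cell this might be used as (assumes TextToSpeech is already imported):
# tts = get_or_create(globals(), "tts", TextToSpeech)
```

Checking for the object itself, rather than a separate boolean, means the flag can never claim the model is loaded when it is not.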

In [None]:
# Imports used throughout the rest of the notebook.

# Flag so that re-running this cell doesn't load the models a second time
try:
    tts_initialized
except NameError:
    tts_initialized = False

if not tts_initialized:
    import os
    import torchaudio
    import torch.nn as nn
    import torch.nn.functional as F
    import IPython

    from tortoise.api import TextToSpeech
    from tortoise.utils.audio import load_audio, load_voice, load_voices

    # Loads the Tortoise models (downloading them from HuggingFace on first use)
    tts = TextToSpeech()

tts_initialized = True

# **4. Optional - Upload an audio sample to create a custom voice**

## **4.1** Slice .wav file into samples
Upload a 1-2 minute voice sample as a .wav file in floating-point format with a 22,050 Hz sample rate.

Place the file in your Google Drive in AI/Tortoise_TTS/Voices/Samples.
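This step expects a floating-point WAV at 22,050 Hz. Python's built-in `wave` module does not read floating-point WAV files, so one way to check a sample's format is to read the `fmt ` chunk of the RIFF header directly. A minimal sketch assuming a standard RIFF/WAVE file; `read_wav_format` is a hypothetical helper. In the header, format code 1 means integer PCM and 3 means IEEE floating point; files with a WAVE_FORMAT_EXTENSIBLE header report 0xFFFE and keep the real format in a sub-field that this sketch does not decode.

```python
import struct
from typing import Dict

FORMAT_NAMES = {1: "PCM integer", 3: "IEEE float", 0xFFFE: "extensible"}


def read_wav_format(path: str) -> Dict[str, object]:
    """Parse the fmt chunk of a RIFF/WAVE file and return its basic fields."""
    with open(path, "rb") as f:
        riff, _size, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError(f"{path} is not a RIFF/WAVE file")
        # Walk the chunks until the 'fmt ' chunk is found
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError(f"{path} has no fmt chunk")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                fmt = f.read(chunk_size)
                audio_format, channels, sample_rate, _byte_rate, _align, bits = struct.unpack(
                    "<HHIIHH", fmt[:16])
                return {
                    "format": FORMAT_NAMES.get(audio_format, f"code {audio_format}"),
                    "channels": channels,
                    "sample_rate": sample_rate,
                    "bits_per_sample": bits,
                }
            # Skip this chunk; chunks are padded to an even number of bytes
            f.seek(chunk_size + (chunk_size & 1), 1)
```

For the kind of sample this notebook asks for, the result should show `'IEEE float'` and a `sample_rate` of `22050`.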

In [None]:
# Set the path to your Google Drive folder
samples_folder = "/content/drive/MyDrive/AI/Tortoise_TTS/Voices/Samples"

#@markdown ##### **Enter the name of your .wav file:** 
#@markdown (place file in your Google Drive in AI/Tortoise_TTS/Voices/Samples)
voice_sample = "sample.wav" #@param ["sample.wav"] {allow-input: true}
#@markdown ---
#@markdown ##### **Give your new voice a name:** 
CUSTOM_VOICE_NAME = "John_Doe" #@param ["John_Doe"] {allow-input: true}

# Only build the custom voice when "create custom voice" is checked in Notebook Settings
if create_custom_voice:
    !pip install pydub
    from pydub import AudioSegment
    import os
    import random
    import shutil
    from tortoise.utils.audio import get_voices

    sound = AudioSegment.from_file(f"{samples_folder}/{voice_sample}", format="wav")

    output_folder = "audio_segments"
    os.makedirs(output_folder, exist_ok=True)

    # Clip lengths in milliseconds (pydub's unit): each clip is 6-10 seconds long
    min_duration = 6000
    max_duration = 10000

    # Cut 5 random clips from the sample
    for i in range(5):
        start_time = random.randint(0, len(sound) - max_duration)
        end_time = start_time + random.randint(min_duration, max_duration)

        segment = sound[start_time:end_time]
        output_filepath = os.path.join(output_folder, f"output_{i+1}.wav")
        segment.export(output_filepath, format="wav")
        print(f"Segment {i+1} saved to: {output_filepath}")

    # Copy the clips into Tortoise's voices folder, one subfolder per voice
    custom_voice_folder = f"tortoise/voices/{CUSTOM_VOICE_NAME}"
    os.makedirs(custom_voice_folder, exist_ok=True)

    input_files = sorted(f for f in os.listdir(output_folder) if f.endswith('.wav'))
    for i, input_file in enumerate(input_files):
        input_filepath = os.path.join(output_folder, input_file)
        output_filepath = os.path.join(custom_voice_folder, f'{i}.wav')
        shutil.copy(input_filepath, output_filepath)
        print(f"Segment {i+1} copied to: {output_filepath}")

    # get_voices() rescans the voices folders, so the new voice should now be listed
    voices = get_voices()
    print(f"{CUSTOM_VOICE_NAME}: {len(voices.get(CUSTOM_VOICE_NAME, []))} clip(s) found by Tortoise")
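
The cell above cuts five random clips of 6 to 10 seconds from the uploaded sample (pydub measures lengths in milliseconds). Its `random.randint(0, len(sound) - max_duration)` call raises a `ValueError` when the sample is shorter than 10 seconds. A sketch of the window selection on its own, with that case reported explicitly; `pick_windows` is a hypothetical helper that returns `(start, end)` pairs in milliseconds and does not touch any audio:

```python
import random
from typing import List, Optional, Tuple


def pick_windows(total_ms: int, count: int = 5, min_ms: int = 6000, max_ms: int = 10000,
                 rng: Optional[random.Random] = None) -> List[Tuple[int, int]]:
    """Choose `count` random (start, end) windows that fit inside a clip of total_ms."""
    if total_ms < max_ms:
        raise ValueError(f"Sample is {total_ms} ms long; need at least {max_ms} ms.")
    rng = rng or random.Random()
    windows = []
    for _ in range(count):
        # Same rule as the notebook: start early enough that even a max-length window fits
        start = rng.randint(0, total_ms - max_ms)
        end = start + rng.randint(min_ms, max_ms)
        windows.append((start, end))
    return windows
```

With pydub, the windows could then be applied as `for i, (start, end) in enumerate(pick_windows(len(sound))): sound[start:end].export(...)`. Note that, as in the cell, the clips may overlap.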


# **5. Generate longer speech - TTS Processing**
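The code in this step splits the input in one of two ways: on `|` characters if the text contains any, otherwise with Tortoise's `split_and_recombine_text`, which packs sentences into chunks short enough for the model. The sketch below illustrates the second idea with a deliberately simplified stand-in (split on sentence-ending punctuation, then greedily pack whole sentences up to a character limit). It is not Tortoise's implementation; `simple_split` is a hypothetical helper and the `max_chars=200` default is an assumption for illustration.

```python
import re
from typing import List


def simple_split(text: str, max_chars: int = 200) -> List[str]:
    """Split on '|' if present; otherwise pack whole sentences into chunks of <= max_chars."""
    if "|" in text:
        return [part.strip() for part in text.split("|") if part.strip()]
    # Break after ., ! or ? followed by whitespace; punctuation stays with its sentence
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Unlike the notebook's `text.split('|')`, this sketch strips whitespace and drops empty pieces, and a single sentence longer than `max_chars` is kept whole as its own chunk rather than cut further.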

In [None]:
from tortoise.utils.text import split_and_recombine_text
from datetime import datetime

outpath = "results/longform/"
text_folder_path = '/content/drive/MyDrive/AI/Tortoise_TTS/Text'

textfile_path = os.path.join(text_folder_path, text_file)

# Process text
with open(textfile_path, 'r', encoding='utf-8') as f:
    text = ' '.join(f.readlines())
    if '|' in text:
        print("Found the '|' character in your text, which I will use as a cue for where to split it up. If this was not "
              "your intent, please remove all '|' characters from the input.")
        texts = text.split('|')
    else:
        texts = split_and_recombine_text(text)

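seed = seed_setting

# Pick the voice to load: the custom voice if one was created in section 4,
# otherwise the preset voice chosen in Notebook Settings
if create_custom_voice:
    voice_name = CUSTOM_VOICE_NAME

voice_outpath = os.path.join(outpath, voice_name)
os.makedirs(voice_outpath, exist_ok=True)

if second_voice_name == "None":
    voice_samples, conditioning_latents = load_voice(voice_name)
else:
    # Two voices: Tortoise combines the clips of both voices into one conditioning set
    voice_samples, conditioning_latents = load_voices([voice_name, second_voice_name])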

all_parts = []
for j, text in enumerate(texts):
    gen = tts.tts_with_preset(text, voice_samples=voice_samples, conditioning_latents=conditioning_latents,
                              preset=speed_setting, k=1, use_deterministic_seed=seed)
    gen = gen.squeeze(0).cpu()
    torchaudio.save(os.path.join(voice_outpath, f'{j}.wav'), gen, 24000)
    all_parts.append(gen)

full_audio = torch.cat(all_parts, dim=-1)
torchaudio.save(os.path.join(voice_outpath, 'combined.wav'), full_audio, 24000)

# Save to Google Drive
# Define the folder structure in Google Drive
gdrive_folder = "/content/drive/MyDrive/AI/Tortoise_TTS/Output"

# Create the folder structure if it doesn't exist
if not os.path.exists(gdrive_folder):
    os.makedirs(gdrive_folder)

# Save the audio file to Google Drive
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
gdrive_outpath = f"{gdrive_folder}/{voice_name}_{timestamp}.wav"
torchaudio.save(gdrive_outpath, full_audio, 24000)
print(f"File saved to Google Drive: {gdrive_outpath}")
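
Tortoise produces audio at 24,000 samples per second, which is why every `torchaudio.save` call in this cell passes `24000`. Because the parts are concatenated along the last (time) dimension, the length of `combined.wav` is simply the sum of the part lengths. A small sketch of that arithmetic and of the timestamped Drive filename, using hypothetical helper names and plain integers in place of tensors:

```python
from datetime import datetime
from typing import List

SAMPLE_RATE = 24000  # sample rate passed to torchaudio.save in this notebook


def combined_duration_seconds(part_lengths: List[int], sample_rate: int = SAMPLE_RATE) -> float:
    """Length of the concatenated audio, given the number of samples in each part."""
    return sum(part_lengths) / sample_rate


def drive_filename(voice_name: str, when: datetime) -> str:
    """Build the same '<voice>_<YYYYmmdd_HHMMSS>.wav' name the cell uses."""
    return f"{voice_name}_{when.strftime('%Y%m%d_%H%M%S')}.wav"
```

With the real tensors, the part lengths would be `[p.shape[-1] for p in all_parts]`.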


# **6. Audio Output**

In [None]:
IPython.display.Audio(os.path.join(voice_outpath, 'combined.wav'))

# **7. Troubleshooting**
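The checks in this step rely on Tortoise's `get_voices()`, which lists the folders under `tortoise/voices`, one subfolder per voice holding that voice's clips. If that call itself fails, a standard-library scan of the same folder can show what is actually on disk. A sketch, assuming clips are `.wav`, `.mp3`, or `.pth` files (an assumption about which types Tortoise accepts); `scan_voice_folders` is a hypothetical helper, not Tortoise's API:

```python
from pathlib import Path
from typing import Dict, List

# Assumption: the file types Tortoise treats as voice clips or precomputed latents
VOICE_EXTENSIONS = {".wav", ".mp3", ".pth"}


def scan_voice_folders(voices_root: str = "tortoise/voices") -> Dict[str, List[str]]:
    """Map each voice subfolder name to the sorted clip files found inside it."""
    root = Path(voices_root)
    if not root.is_dir():
        return {}
    voices = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        clips = sorted(str(f) for f in folder.iterdir() if f.suffix.lower() in VOICE_EXTENSIONS)
        voices[folder.name] = clips
    return voices
```

For example, `for name, clips in scan_voice_folders().items(): print(name, len(clips))`. An empty list next to your custom voice name suggests the copy step in section 4 did not run.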

In [None]:
#@markdown ##### **Check below if you want to enable troubleshooting** 
troubleshoot_cells = False #@param{type:"boolean"}

if troubleshoot_cells:
    # Print list of available voices
    from tortoise.utils.audio import get_voices

    available_voices = get_voices()
    print("Available voices:", list(available_voices.keys()))


    # Check whether the combined output file exists, and print its location
    if os.path.exists(os.path.join(voice_outpath, 'combined.wav')):
        print(f"The 'combined.wav' file is available at: {os.path.join(voice_outpath, 'combined.wav')}")
    else:
        print("The 'combined.wav' file is not found.")


    # Download the alignment debug dump, if one was written
    from google.colab import files

    alignment_debug_file_path = os.path.join(os.getcwd(), "alignment_debug.pth")
    if os.path.exists(alignment_debug_file_path):
        files.download(alignment_debug_file_path)


    # Check if the custom voice files are saved in the correct folder
    import glob

    custom_voice_files = glob.glob(f"{custom_voice_folder}/*.wav")
    print(f"Custom voice folder: {custom_voice_folder}")
    print(f"Number of custom voice files: {len(custom_voice_files)}")
    for file in custom_voice_files:
        print(file)
    

    # Check the available voices after adding the custom voice
    # (loop variable is `name` so the global voice_name setting is not overwritten)
    voices = get_voices()
    print("Available voices:")
    for name in voices:
        print(name)


    # Check if the custom voice is in the list of available voices
    if CUSTOM_VOICE_NAME in voices:
        print(f"{CUSTOM_VOICE_NAME} is in the list of available voices.")

        # Print the custom voice's files from the list of available voices
        print(f"{CUSTOM_VOICE_NAME} files in the available voices list:")
        for file in voices[CUSTOM_VOICE_NAME]:
            print(file)
    else:
        print(f"{CUSTOM_VOICE_NAME} is NOT in the list of available voices.")
