<a href="https://colab.research.google.com/github/MK316/workshops/blob/main/Voice_clone_with_tortoise_tts.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🔐 Voice cloning using {Tortoise-TTS}

- Tested on Jan. 25, 2023

- Original code source: [neonbjb/tortoise-tts](https://github.com/neonbjb/tortoise-tts). Step 1 clones the jnordberg/tortoise-tts fork.
- Use this tool responsibly! Cloned voices are the speech counterpart of deepfakes.
- Steps 1 and 2 may take a couple of minutes. Wait until each cell finishes running and shows a green check mark.

In [1]:
#@markdown 🚩**Step 1.** Install {scipy}, {tortoise-tts}, {transformers}  

#@markdown The installation may take a minute.
# The SciPy version packaged with Colab does not tolerate malformed WAV files,
# so install the latest version.

%%capture
!pip3 install -U scipy

!git clone https://github.com/jnordberg/tortoise-tts.git

# Changing current working directory to 'tortoise-tts' (from git clone)
%cd tortoise-tts
!pip3 install transformers==4.19.0
!pip3 install -r requirements.txt
!python3 setup.py install

In [2]:
#@markdown 🚩 Step 2. Import packages and load the models used by Tortoise from the Hugging Face hub.
# Imports used through the rest of the notebook.

%%capture
import torch
import torchaudio
import torch.nn as nn
import torch.nn.functional as F

import IPython

from tortoise.api import TextToSpeech
from tortoise.utils.audio import load_audio, load_voice, load_voices

# This will download all the models used by Tortoise from the HuggingFace hub.
tts = TextToSpeech()

In [4]:
#@markdown 🚩 Step 3. Upload your file (voice file):

#@markdown **Note:** Works best with at least 2 audio files. Each must be a WAV file, 6 to 10 seconds long.
# Upload clips of your own voice by running this cell. At least 2 audio clips are
# recommended. Each must be a WAV file, 6-10 seconds long.
CUSTOM_VOICE_NAME = "martin"

import os
from google.colab import files

custom_voice_folder = f"tortoise/voices/{CUSTOM_VOICE_NAME}"
# exist_ok=True lets this cell be re-run without failing on an existing folder.
os.makedirs(custom_voice_folder, exist_ok=True)
for i, file_data in enumerate(files.upload().values()):
  with open(os.path.join(custom_voice_folder, f'{i}.wav'), 'wb') as f:
    f.write(file_data)

Saving mynumber.wav to mynumber.wav
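
The note above asks for clips that are 6 to 10 seconds long. A minimal sketch of how that could be checked before generating speech, using only Python's standard `wave` module. The helper names `wav_duration_seconds` and `is_good_clip` are hypothetical and not part of Tortoise, and `wave` reads only uncompressed PCM WAV files, so other encodings would need a different reader.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        # Duration = number of audio frames / frames per second.
        return wav_file.getnframes() / wav_file.getframerate()


def is_good_clip(path, min_seconds=6.0, max_seconds=10.0):
    """Check whether a clip falls inside the recommended 6-10 second range."""
    duration = wav_duration_seconds(path)
    return min_seconds <= duration <= max_seconds
```

In this notebook, the check could be applied to each file saved under `custom_voice_folder`, for example `is_good_clip(os.path.join(custom_voice_folder, "0.wav"))`.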


In [8]:
#@markdown 🚩 Step 4. Type the text to synthesize with the cloned voice. (Try one or two sentences, as processing takes some time.)
# This is the text that will be spoken.

text = input("Type text to create with your voice: ")

# Pick a preset to trade speed for quality. Options: "ultra_fast", "fast" (default), "standard", "high_quality". See the docs in api.py.

preset_mode = "fast" #@param ["ultra_fast", "fast", "standard", "high_quality"]

preset = preset_mode



Type text to create with your voice: This presentation was very boring. Don't you agree with me?
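
Because processing time grows with the length of the text, one option for longer passages is to split them into sentences first and synthesize one sentence at a time. The sketch below shows only the splitting step. `split_sentences` is a hypothetical helper, not part of Tortoise, and its simple punctuation rule will mis-split abbreviations such as "Dr.".

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings, which appear when the input is blank.
    return [part for part in parts if part]


sentences = split_sentences(
    "This presentation was very boring. Don't you agree with me?"
)
print(sentences)
# ['This presentation was very boring.', "Don't you agree with me?"]
```

Each item could then be passed as `text` to the generation cell, at the cost of one generation run per sentence.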


In [9]:
#@markdown 💎 Generate speech: voice clone
# Generate speech with the custom voice.
voice_samples, conditioning_latents = load_voice(CUSTOM_VOICE_NAME)
gen = tts.tts_with_preset(text, voice_samples=voice_samples, conditioning_latents=conditioning_latents, 
                          preset=preset)
torchaudio.save(f'generated-{CUSTOM_VOICE_NAME}.wav', gen.squeeze(0).cpu(), 24000)
IPython.display.Audio(f'generated-{CUSTOM_VOICE_NAME}.wav')

Generating autoregressive samples..


100%|██████████| 6/6 [00:16<00:00,  2.80s/it]


Computing best candidates using CLVP and CVVP


100%|██████████| 6/6 [00:06<00:00,  1.02s/it]


Transforming autoregressive outputs into audio..


100%|██████████| 80/80 [00:10<00:00,  7.90it/s]


## 🎬 Video tutorial

How to clone any voice with AI [Channel by Martin Thissen](https://youtu.be/Kfr_FZof_hs)

In [None]:

#@markdown This is one of the tutorials on YouTube.
from IPython.display import YouTubeVideo, display
video = YouTubeVideo("Kfr_FZof_hs", width=500)
display(video)