We will import the default settings from `brainbox` and set up tortoise-tts according to them.


In [1]:
from kaia.brainbox import BrainBox

settings = BrainBox().settings.tortoise_tts

First, we will check out the repository. Set `ConsoleExecutor.wait` to `True` if you run into errors or need to inspect the output of the console command.

In [4]:
from kaia.infra.demos import ConsoleExecutor
from kaia.infra import Loc

ConsoleExecutor.wait = False

if not settings.tortoise_tts_path.is_dir():
    ConsoleExecutor.execute(f'git clone https://github.com/neonbjb/tortoise-tts.git {settings.tortoise_tts_path}')
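
For readers without `ConsoleExecutor`, the same "clone only if the folder is missing" step can be sketched with the standard library. This is a minimal sketch, not the way `kaia` does it: `clone_command`, `clone_if_missing` and the injectable `run` parameter are hypothetical names introduced for illustration, and `git` is assumed to be on the `PATH`.

```python
import subprocess
from pathlib import Path
from typing import List

REPO_URL = "https://github.com/neonbjb/tortoise-tts.git"


def clone_command(target: Path) -> List[str]:
    """Build the git command as an argument list (no shell quoting needed)."""
    return ["git", "clone", REPO_URL, str(target)]


def clone_if_missing(target: Path, run=subprocess.run) -> bool:
    """Clone the repository unless `target` already exists.

    Returns True if a clone was started, False if the folder was already there.
    `run` is injectable so the function can be exercised without network access.
    """
    if target.is_dir():
        return False
    run(clone_command(target), check=True)
    return True
```

With the notebook's settings, this would be called as `clone_if_missing(settings.tortoise_tts_path)`. Passing `check=True` makes a failed clone raise an exception instead of passing silently.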

Then, we will create an environment for tortoise-tts and install:
- `torch`, `torchvision` and `torchaudio`, as required by tortoise-tts
- `flask` for `brainbox` to work
- `notebook` for debugging, if necessary
- `tortoise-tts` itself

In [5]:
import os


if not os.path.isfile(settings.python_path):
    cmd = f'''
    {Loc.call_conda} remove --name {settings.environment} --all -y
    {Loc.call_conda} create --name {settings.environment} python=3.8 -y
    {Loc.call_conda} activate {settings.environment}
    pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117
    pip install notebook
    pip install flask
    pip install -e {settings.tortoise_tts_path}
    '''

    ConsoleExecutor.execute(cmd)
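
Once the installation has finished, it can be useful to confirm that the new environment actually sees the installed packages. The following is a minimal sketch using only the standard library; `can_import` is a hypothetical helper, not part of `kaia`, and it only checks that an import succeeds, not that CUDA is usable.

```python
import subprocess
import sys
from pathlib import Path
from typing import Union


def can_import(python_path: Union[str, Path], module: str) -> bool:
    """Return True if `<python_path> -c "import <module>"` exits with code 0."""
    result = subprocess.run(
        [str(python_path), "-c", f"import {module}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Demonstration against the interpreter running this code;
# `json` ships with Python, so this import always succeeds.
print(can_import(sys.executable, "json"))
```

For the environment created above, the check would be `can_import(settings.python_path, 'torch')` and `can_import(settings.python_path, 'flask')`.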

We will use the Lina voice, available at https://dota2.fandom.com/wiki/Lina/Responses, for demonstration. The files are located in the `files/voice` folder.

Voices from Dota are not the best choice: the samples are too short, and for many voices the applied effects also seem to produce artifacts in the synthesized audio.

In [6]:
from yo_fluq_ds import *

Query.folder('files/voice').to_list()

[WindowsPath('files/voice/Vo_lina_lina_battlebegins_01.mp3.mpeg'),
 WindowsPath('files/voice/Vo_lina_lina_cm_02.mp3.mpeg'),
 WindowsPath('files/voice/Vo_lina_lina_kill_08.mp3.mpeg'),
 WindowsPath('files/voice/Vo_lina_lina_purch_03.mp3.mpeg'),
 WindowsPath('files/voice/Vo_lina_lina_rare_01.mp3.mpeg'),
 WindowsPath('files/voice/Vo_lina_lina_rare_02.mp3.mpeg'),
 WindowsPath('files/voice/Vo_lina_lina_rare_03.mp3.mpeg'),
 WindowsPath('files/voice/Vo_lina_lina_rare_04.mp3.mpeg'),
 WindowsPath('files/voice/Vo_lina_lina_respawn_08.mp3.mpeg'),
 WindowsPath('files/voice/Vo_lina_lina_respawn_11.mp3.mpeg'),
 WindowsPath('files/voice/Vo_lina_lina_rival_01.mp3.mpeg'),
 WindowsPath('files/voice/Vo_lina_lina_rival_04.mp3.mpeg'),
 WindowsPath('files/voice/Vo_lina_lina_rival_15.mp3.mpeg')]

We need to convert the files to WAV.

In [7]:
import os

# Path to the ffmpeg binary, expected next to the repository root
ffmpeg_path = Loc.root_folder.parent/"ffmpeg/bin/ffmpeg"
voice_folder = settings.get_voice_path(settings.test_voice)


if not os.path.isdir(voice_folder):
    os.makedirs(voice_folder, exist_ok=True)

    # One ffmpeg call per sample: resample to 22050 Hz and write <index>.wav
    cmd = ''
    for index, file in enumerate(Query.folder('files/voice')):
        cmd += f'{ffmpeg_path} -i "{file}" -ar 22050 {voice_folder/f"{index}.wav"} -y\n'

    ConsoleExecutor.execute(cmd)
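
Since short samples were named above as a weakness of this voice, it may help to look at the clip lengths after conversion. The sketch below uses the standard `wave` module and assumes the converted files are plain PCM WAV, which is what ffmpeg writes by default for `.wav` output; `wav_durations` is a hypothetical helper introduced here for illustration.

```python
import wave
from pathlib import Path
from typing import Dict, Union


def wav_durations(folder: Union[str, Path]) -> Dict[str, float]:
    """Map each *.wav file name in `folder` to its duration in seconds."""
    durations = {}
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # duration = number of frames / frames per second
            durations[path.name] = wav.getnframes() / wav.getframerate()
    return durations
```

In this notebook it would be called as `wav_durations(voice_folder)`, once the console command has finished writing the files.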

Now we can run tortoise-tts. The first run takes a long time, because `tortoise-tts` needs to download its models.

In [9]:
decider = BrainBox().create_deciders_dict()['TortoiseTTS']
with decider.debug(None):
    results = decider(text='Hello, my name is Lina, nice to meet you', voice='lina')

In [17]:
from ipywidgets import Audio, VBox

Query.en(results).select(lambda z: Audio.from_file(decider.file_cache/z, autoplay=False)).feed(list, VBox)

VBox(children=(Audio(value=b'RIFFH\xc0\x04\x00WAVEfmt \x10\x00\x00\x00\x03\x00\x01\x00\xc0]\x00\x00\x00w\x01\x…