### Hello there! 👋

If you're interested in using Roboshaul to generate Hebrew text-to-speech, you've come to the right place! I'll guide you through the steps so that you can start using it in no time, even if you're new to machine learning.

Here are the steps we'll follow in this tutorial:

1. Import necessary Python libraries
2. Download the trained version of the Roboshaul TTS model
3. Download the trained version of the spectrogram-to-wav model, trained on Shaul Amsterdamski's voice
4. Connect all the components and test the system by generating Hebrew text and hearing Roboshaul speak it out loud

Let's get started! By the end, you'll be able to use our trained model and get results similar to the ones on this demo page:
https://anonymous19283746.github.io/saspeech/

The infrastructure we will be using is Coqui TTS; you can learn more about it here:
https://github.com/coqui-ai/TTS

In [None]:
!git clone https://github.com/shenberg/TTS
!pip install Cython # necessary for successful install of Coqui TTS
!pip install -e TTS

#### Import necessary Python libraries

In [None]:
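import os
import signal  # used to terminate the tts process group on timeout
import subprocess
import sys  # used to print error messages to stderr
from pathlib import Path
from IPython.display import Audio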

#### Download the trained version of the Roboshaul TTS model
Trained on 4 hours of Shaul Amsterdamski's voice + transcripts

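There are 2 files there. Download both of them into a folder called "`tts_model`" inside your working directory (I called mine `roboshaul`). The code below expects them at `tts_model/saspeech_nikud_7350.pth` and `tts_model/config_overflow.json`.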

https://drive.google.com/drive/folders/1C7xfx8p8iTaF73bvfvIdkGDPv01wvjmx

#### Download the trained version of the Mel-to-wav model
Trained on 30 hours of Shaul Amsterdamski's voice

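There are 2 files there. Download both of them into a folder called "`hifigan_model`" inside your working directory (I called mine `roboshaul`). The code below expects them at `hifigan_model/checkpoint_500000.pth` and `hifigan_model/config_hifigan.json`.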

https://drive.google.com/drive/folders/1SC6IQtdXH1SjHSgLGY1iZtl9nwDGQ072

#### Adding diacritics (Nikud) to Hebrew text
The input text has to include Nikud for the model to turn Hebrew text into good-sounding audio.

There are 2 places where you can add Nikud easily online:
- https://nakdan.dicta.org.il/
- https://www.nakdan.com/

When we trained our TTS model, we used this repository to automate the process: https://github.com/elazarg/nakdimon (give it a ⭐️ on GitHub). By the way, if you are comfortable with coding and would like to help this repository, integrating the Nikud step into this notebook would be a meaningful contribution.
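
Before pasting text into the notebook, it can help to check that it actually carries Nikud. The sketch below is not part of Roboshaul or Coqui TTS; `is_nikud` and `has_nikud` are hypothetical helper names, and the check assumes that any combining mark in the Unicode range U+05B0 to U+05C7 counts as Nikud.

```python
import unicodedata

# Assumption: Hebrew points (Nikud) are the combining marks in U+05B0..U+05C7.
NIKUD_START, NIKUD_END = 0x05B0, 0x05C7

def is_nikud(ch):
    """True if ch is a combining mark inside the Hebrew points range."""
    in_range = NIKUD_START <= ord(ch) <= NIKUD_END
    # Category "Mn" (non-spacing mark) filters out punctuation such as maqaf.
    return in_range and unicodedata.category(ch) == "Mn"

def has_nikud(text):
    """True if at least one character in text is a Nikud mark."""
    return any(is_nikud(ch) for ch in text)

print(has_nikud("שָׁלוֹם"))  # text with Nikud
print(has_nikud("שלום"))  # same word without Nikud
```

This only tells you whether any Nikud is present, not whether every word is fully pointed, so treat it as a quick sanity check.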

#### Connect all the components and test the system by generating Hebrew text and hearing Roboshaul speak it out loud
- Define input text
- Load models

In [None]:
# This is the text that will be turned into audio, feel free to change it ♡
input_text =  "אַתֶּם הֶאֱזַנְתֶּם לְחַיוֹת כִּיס, הַפּוֹדְקָאסְט הַכַּלְכָּלִי שֶׁל כָּאן." 

In [None]:
# tts model:
model_pth_path = Path('tts_model/saspeech_nikud_7350.pth')
model_config_path = model_pth_path.with_name('config_overflow.json')

In [None]:
# Mel-to-wav:
vocoder_pth_path = Path('hifigan_model/checkpoint_500000.pth')
vocoder_config_path = Path('hifigan_model/config_hifigan.json')
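
A missing or misplaced download is an easy mistake at this point, so here is a minimal sketch that checks for the four files before running anything. The file paths are the ones used in the cells above; `find_missing` is a hypothetical helper name, not something provided by Coqui TTS.

```python
from pathlib import Path

def find_missing(paths):
    """Return the entries of `paths` that do not exist as regular files."""
    return [Path(p) for p in paths if not Path(p).is_file()]

# The four files this notebook expects, relative to the working directory.
expected_files = [
    Path("tts_model/saspeech_nikud_7350.pth"),
    Path("tts_model/config_overflow.json"),
    Path("hifigan_model/checkpoint_500000.pth"),
    Path("hifigan_model/config_hifigan.json"),
]

missing = find_missing(expected_files)
if missing:
    print("Missing files:", ", ".join(str(p) for p in missing))
else:
    print("All model files found.")
```

If anything is reported missing, re-check the folder names (`tts_model` and `hifigan_model`) and the directory the notebook is running from.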

In [None]:
# Where will the outputs be saved?
output_folder = "outputs"

if not os.path.exists(output_folder):
    os.makedirs(output_folder)
    print(f"Folder named {output_folder} created.")
else:
    print(f"Folder named {output_folder} already exists.")

In [None]:
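def escape_dquote(s):
    # escape double quotes so the text can sit inside a double-quoted shell argument
    return s.replace('"', r'\"')

global_p = None

def run_model(text, output_wav_path):
    global global_p
    # build the shell command for the Coqui `tts` command-line tool (GPU 0 only)
    call_tts_string = f"""CUDA_VISIBLE_DEVICES=0 tts --text "{escape_dquote(text)}" \
        --model_path {model_pth_path} \
        --config_path {model_config_path} \
        --vocoder_path {vocoder_pth_path} \
        --vocoder_config_path {vocoder_config_path} \
        --out_path "{output_wav_path}" """
    try:
        print(call_tts_string)
        # start_new_session=True gives the process its own process group,
        # so a timeout can terminate it together with its children
        p = subprocess.Popen(['bash','-c',call_tts_string], start_new_session=True)
        global_p = p
        # wait up to 60 seconds for the synthesis to finish
        p.communicate(timeout=60)
        # communicate() does not raise on failure, so check the exit code ourselves
        if p.returncode != 0:
            raise subprocess.CalledProcessError(p.returncode, call_tts_string)
    except subprocess.TimeoutExpired:
        print(f'Timeout for {call_tts_string} (60s) expired', file=sys.stderr)
        print('Terminating the whole process group...', file=sys.stderr)
        os.killpg(os.getpgid(p.pid), signal.SIGTERM)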
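
As an alternative to building a shell string, the same command can be passed to `subprocess.run` as an argument list, which avoids shell quoting entirely. This is only a sketch under assumptions: `build_tts_command` and `run_tts_command` are hypothetical names, the flags are the ones used in `run_model` above, and, unlike `run_model`, a timeout here stops only the direct child process rather than its whole process group.

```python
import os
import subprocess

def build_tts_command(text, out_path, model_path, config_path,
                      vocoder_path, vocoder_config_path):
    """Build the `tts` call as an argument list, so no shell escaping is needed."""
    return [
        "tts",
        "--text", text,
        "--model_path", str(model_path),
        "--config_path", str(config_path),
        "--vocoder_path", str(vocoder_path),
        "--vocoder_config_path", str(vocoder_config_path),
        "--out_path", str(out_path),
    ]

def run_tts_command(cmd, timeout=60):
    """Run cmd on GPU 0; raise on a nonzero exit code or when the timeout expires."""
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": "0"}
    # check=True raises CalledProcessError on failure; timeout raises
    # TimeoutExpired after the child process has been killed.
    return subprocess.run(cmd, check=True, timeout=timeout, env=env)
```

With the variables defined earlier, a call could look like `run_tts_command(build_tts_command(input_text, "outputs/output.wav", model_pth_path, model_config_path, vocoder_pth_path, vocoder_config_path))`.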

In [None]:
run_model(input_text, output_folder + "/output.wav")

### Listen to the result 👾

In [None]:
Audio(filename=output_folder + '/output.wav')
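
If the player stays silent or the clip sounds cut off, a quick look at the file's sample rate and duration can help narrow things down. The sketch below uses Python's standard `wave` module and assumes the output is an uncompressed PCM WAV file; `wav_info` is a hypothetical helper name.

```python
import wave

def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        # Duration is the number of audio frames divided by frames per second.
        duration = wav_file.getnframes() / sample_rate
    return sample_rate, duration
```

For example, `wav_info("outputs/output.wav")` returns the sample rate and the length of the generated clip in seconds.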