
# **Tacotron 2 (Multi-speaker) + TorchMoji Model Training (Kaggle Notebook)**
---
<a href="https://github.com/uberduck-ai/uberduck-ml-dev"> Uberduck Tacotron 2 (Multispeaker) + GSTs repo</a> & <a href="https://github.com/NVIDIA/tacotron2"> original tacotron 2 repo </a> | **Created by <a href="https://github.com/ColdFir5"> Michael </a>, Credits to <a href="https://www.kaggle.com/johnpaulk"> johnpaulbin </a> for helping put together step 8 for making tensorboard work on kaggle properly.**

This notebook will require: **A dataset**

The dataset should look like this: 

```
Kaggle Training Dataset/
          ├──wavs/
          │    ├──1.wav
          │    ├──2.wav
          │    ├──3.wav
          │    └──etc
          └──transcription.txt
               ├──wavs/1.wav|This is a test number 1!
               ├──wavs/2.wav|This is a test number 2!
               └──etc
```
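
A minimal sketch for checking a dataset against the layout above before uploading it, assuming the pipe-separated `wavs/N.wav|text` lines shown. The function name `check_transcription` and its return shape are illustrative only; they are not part of this notebook or of the uberduck-ml-dev repo.

```python
import os

def check_transcription(dataset_dir, transcript_name="transcription.txt"):
    """Return a list of (line_number, problem) tuples; an empty list means nothing was flagged."""
    problems = []
    transcript_path = os.path.join(dataset_dir, transcript_name)
    with open(transcript_path, encoding="utf-8") as f:
        for number, raw in enumerate(f, start=1):
            line = raw.rstrip("\r\n")
            if not line.strip():
                problems.append((number, "blank line"))
                continue
            # Each line is expected to hold exactly two fields: wav path and text
            parts = line.split("|")
            if len(parts) != 2:
                problems.append((number, "expected exactly one '|' separator"))
                continue
            wav_path, text = parts
            if not text.strip():
                problems.append((number, "empty text"))
            # The wav path is relative to the dataset folder
            if not os.path.isfile(os.path.join(dataset_dir, wav_path)):
                problems.append((number, f"missing audio file: {wav_path}"))
    return problems
```

Calling `check_transcription("Kaggle Training Dataset")` on a local copy of the dataset would list any lines worth fixing before the upload.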

**MAKE SURE 'GPU' HAS BEEN SELECTED AS THE ACCELERATOR IN THE NOTEBOOK SETTINGS**

*UPDATED 14/02/22, VERSION 6: Completely revamped and updated the notebook*

*UPDATED 14/02/22, VERSION 7: Fixed a bug*

*UPDATED 14/02/22, VERSION 8: Added a fix for the ```GLIBCXX_3.4.26 not found``` issue*

#### ***UPDATED 15/02/22, VERSION 9: Added TorchMoji! You can now train your models and it will predict emotion while training and give you the chance to set emotion while synthesising!***

*UPDATED 15/02/22, VERSION 10: Added the config file to the zipping process on step 10*

*UPDATED 18/02/22, VERSION 11: Minor fix*

---
# TRAINING INSTRUCTIONS
* Make sure to make your own version of this notebook for each new model
* **Import your dataset (22050hz, Mono, 16bit PCM audio) in the top right corner of the screen**
* The transcription file should be in this format (LJS formatting) for each wav (WITH PUNCTUATION):```wavs/1.wav|[Text here].```
* **RUN ALL STEPS INDIVIDUALLY AND MAKE SURE TO READ EACH STEP CAREFULLY**
* Fill in required inputs
* ***(VERY IMPORTANT)*** Once your model has been trained **DO NOT FORGET TO SAVE VERSION AND GO TO 'ADVANCED' AND CHECK 'ALWAYS SAVE OUTPUT'** so you do NOT LOSE progress
* ***(VERY IMPORTANT)*** Notebooks run for **12 hours at a time**, so be sure to **save or download your models** at least **15 minutes before** the end
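
The audio requirement in the list above (22050 Hz, mono, 16-bit PCM) can be checked locally before uploading. The following is a rough sketch using Python's standard `wave` module; the helper name `wav_format_problems` is hypothetical. Note that `wave` only reads PCM WAV files and raises `wave.Error` for other encodings, which is itself a sign that the file needs converting.

```python
import wave

def wav_format_problems(path):
    """Return a list of format problems for one WAV file (empty if it matches)."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 22050:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 22050 Hz")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
        # Sample width is reported in bytes, so 16-bit audio has a width of 2
        if wav.getsampwidth() != 2:
            problems.append(f"{wav.getsampwidth() * 8}-bit samples, expected 16-bit")
    return problems
```

Running this over every file in `wavs/` and printing the non-empty results gives a quick list of clips to re-export.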
---
# CONTINUING TO TRAIN?

* **Be sure to change the ```warm_start_name``` to your latest trained model name (with the directory if required)**
* *Working on making this potentially be automatic, no promises ;)*

The dataset to use when continuing to train should look like this:

```
Kaggle Model Dataset/
          ├──checkpoints/
          │    └──tacotron2_3250.pt
          │
          └──runs/
               ├──events.out.tfevents.5239647356.gh52845952r9
               └──etc
```
P.S. The files you download after training will already be in this format.

---

# **1) User inputs**

Enter the name of your dataset, the name of your transcription file (with file extension), and whether you are continuing training or starting fresh

In [None]:
# Variables
dataset_name = "none"
transcript_file_name = "none"
training = "none"
models_dataset_name = "none"

# Inputs
while dataset_name == "none":
    dataset_name = input("What is the name of your training dataset?: ")

while transcript_file_name == "none":
    transcript_file_name = input("What is the name of your training transcription file (add file extension: e.g. .txt)?: ")
    
while training not in ("Continue","New","C","N"):
    print("Please type either 'Continue' or 'New'")
    training = input("Do you wish to continue training your model or start a new? [Continue/New]: ").title()
    
if training in ("C","Continue"):
    while models_dataset_name == "none":
        models_dataset_name = input("What is the name of your dataset within Kaggle to continue with training?: ")

# User Completion
print("---------------------------------------\nStep 1 completed")

---
# **2) Install the Git Repo + the requirements**

In [None]:
# Download the repo requirements 
!pip install -q git+https://github.com/johnpaulbin/uberduck-ml-dev.git --upgrade

# Make a new folder called "project"
!mkdir project/

# Open the newly created directory
%cd project/

# User Completion
print("---------------------------------------\nStep 2 completed")

---
# **2.1) Download pre-trained model & Torchmoji model with vocab file**

The pre-trained model is used to warm start the training of a new model with TorchMoji embeddings. These embeddings let the model predict emotion during training and let you set the emotion when synthesising audio.

In [None]:
# Download a pre-trained model to warm start with
!wget "https://github.com/johnpaulbin/uberduck-ml-dev/releases/download/v1/tacotron2_statedict.pt" -O tacotron2_statedict.pt

# Download torchmoji trained model
!wget "https://github.com/johnpaulbin/torchMoji/releases/download/files/pytorch_model.bin" -O torchmoji_model.bin

# Download the vocab file for torchmoji
!wget "https://raw.githubusercontent.com/johnpaulbin/torchMoji/master/model/vocabulary.json" -O vocabulary.json

# User Completion
print("---------------------------------------\nStep 2.1 completed")

---
# **3) Transfer dataset over to working env**

When your transcription file is copied over, it is automatically renamed to ```train_filelist.txt```

In [None]:
# Import Libraries
import os

# Move all files to the working environment
os.system(f'cp -a ../../input/{dataset_name}/wavs /kaggle/working/project/')
os.system(f'cp -a ../../input/{dataset_name}/{transcript_file_name} /kaggle/working/project/train_filelist.txt')

# User Completion
print("---------------------------------------\nStep 3 completed")

---
# **4) Continuing to train? Transfers model and logs into the working directory**

In [None]:
# Check if they are continuing with training
if training in ("C","Continue"):
    os.system(f'cp -a ../../input/{models_dataset_name}/checkpoints /kaggle/working/project/')
    os.system(f'cp -a ../../input/{models_dataset_name}/runs /kaggle/working/project/')
else:
    print("You are training a new model meaning there will be no need for further file transfers.")
    
# User Completion
print("---------------------------------------\nStep 4 completed")

---
# **5) Edit transcription file to multi-speaker format**

* Adds the full file directory hierarchy
* Adds ```|0``` (single speaker) to the end of the line for multi-speaker formatting 
* e.g. ```/kaggle/working/project/wavs/1.wav|This is a test!|0```

#### If you want to train more than one speaker, I would recommend that you skip this step and you manually edit your transcription file to have the appropriate ```|0``` ```|1``` ```|2``` ```|3``` suffixes.
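
For the multi-speaker case, the manual edit described above could be scripted along these lines. This is only a sketch: `to_multispeaker_lines` and the `speaker_ids` mapping (relative wav path to integer speaker ID) are hypothetical names, and you would need to build that mapping yourself from however your dataset identifies its speakers.

```python
def to_multispeaker_lines(lines, speaker_ids, root="/kaggle/working/project"):
    """Prefix each 'wavs/x.wav|text' line with root and append its speaker ID."""
    converted = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        wav_path = line.split("|", 1)[0]
        # speaker_ids maps each relative wav path to an integer ID (hypothetical input)
        converted.append(f"{root}/{line}|{speaker_ids[wav_path]}")
    return converted

example = to_multispeaker_lines(
    ["wavs/1.wav|This is a test number 1!\n", "wavs/2.wav|This is a test number 2!\n"],
    {"wavs/1.wav": 0, "wavs/2.wav": 1},
)
print("\n".join(example))
```

Writing `"\n".join(...)` of the result to ```train_filelist.txt``` would produce the same shape of file as this step, with per-line speaker IDs instead of a fixed ```|0```.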

In [None]:
# Create a copy of the training transcription list
os.system("cp -a train_filelist.txt train_filelist_copy.txt")

# Read the copy, stripping line endings and skipping blank lines
# (rstrip avoids cutting the last character when the final line has no newline)
with open("train_filelist_copy.txt") as f:
    lines = [line.rstrip("\r\n") for line in f if line.strip()]

# Overwrite the transcription file in the multi-speaker format
# (no trailing newline, so the file does not end with an empty line)
with open("train_filelist.txt", "w") as f1:
    f1.write("\n".join(f"/kaggle/working/project/{line}|0" for line in lines))
            
# Remove the copy of the transcription list as it will no longer be needed
!rm train_filelist_copy.txt

# User Completion
print("---------------------------------------\nStep 5 completed")

---
# **6) Assign settings to configuration file**

### Notes:
- Change ```n_speakers``` if running multi-speaker to the number of speakers you will have.
- Do NOT change ```training_audiopaths_and_text``` and ```val_audiopaths_and_text``` as these have been set already.

### If you want to train a multispeaker model:
- Add your ```sample_inference_speaker_ids``` in a list, e.g: ```[0, 1, 2, 3]```

### If you want to continue training a model:
- Change ```warm_start_name``` to your latest trained model name (with the directory, e.g. ```checkpoints/tacotron2_3250.pt```)
- Change ```ignore_layers``` to ```["null"]```
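
If you would rather not edit the JSON by hand, the two changes for continuing training could be applied with a small helper once the configuration file has been written. This is a sketch under stated assumptions: the function name `set_continue_training` is hypothetical, and the key names `warm_start_name` and `ignore_layers` are taken from the configuration file in this step.

```python
import json

def set_continue_training(config_path, checkpoint):
    """Point warm_start_name at an existing checkpoint and set ignore_layers to ["null"]."""
    with open(config_path) as f:
        config = json.load(f)
    config["warm_start_name"] = checkpoint
    config["ignore_layers"] = ["null"]
    # Write the updated settings back to the same file
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config
```

For example, `set_continue_training("tacotron2_config.json", "checkpoints/tacotron2_3250.pt")` would make both changes in one call, leaving every other setting as it was.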



In [None]:
%%writefile tacotron2_config.json
{
    "batch_size": 18,
    "checkpoint_name": "uberduck_model",
    "checkpoint_path": "checkpoints",
    "cudnn_enabled": true,
    "dataset_path": ".",
    "debug": false,
    "distributed_run": false,
    "epochs": 5001,
    "epochs_per_checkpoint": 50,
    "fp16_run": false,
    "include_f0": false,
    "learning_rate": 5e-4,
    "log_dir": "runs",
    "n_speakers": 1,
    "p_arpabet": 1.0,
    "has_speaker_embedding": true,
    "sample_inference_speaker_ids": [0],
    "sample_rate": 22050,
    "steps_per_sample": 50,
    "text_cleaners": ["english_cleaners"],
 
 
    "training_audiopaths_and_text": "train_filelist.txt",
    "val_audiopaths_and_text": "train_filelist.txt",
 

    "warm_start_name": "tacotron2_statedict.pt",
    "ignore_layers": ["speaker_embedding.weight"],
    "seed": 123,
    "gst_dim": 2304,
    "gst_type": "torchmoji",
    "torchmoji_vocabulary_file":"vocabulary.json",
    "torchmoji_model_file": "torchmoji_model.bin"
}

---
# **8) Connect ngrok in order to use tensorboard (OPTIONAL)**

- Obtain your auth token from ngrok through: https://dashboard.ngrok.com/get-started/setup

- Once obtained, paste it in the section below where it says ```YOURTOKEN```

- Then run the block of code

**DO NOT CHANGE ANYTHING ELSE**
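
If no link is printed, ngrok may simply not have finished starting when the tunnel list was requested. One possible workaround, shown here only as a sketch, is to poll ngrok's local API (the same `http://localhost:4040/api/tunnels` address used by this step) until a tunnel appears. The helper names `fetch_tunnels` and `wait_for_public_url` are hypothetical.

```python
import json
import time
import urllib.request

def fetch_tunnels(api_url="http://localhost:4040/api/tunnels"):
    """Read the tunnel list from ngrok's local API."""
    with urllib.request.urlopen(api_url, timeout=5) as response:
        return json.load(response)

def wait_for_public_url(fetch=fetch_tunnels, attempts=10, delay=1.0):
    """Poll until a tunnel is listed, then return its public URL, or None if none appears."""
    for _ in range(attempts):
        try:
            tunnels = fetch().get("tunnels", [])
        except OSError:
            tunnels = []  # ngrok is not accepting connections yet
        if tunnels:
            return tunnels[0]["public_url"]
        time.sleep(delay)
    return None
```

Running `print(wait_for_public_url())` in a separate cell after this step would print the link once it is available, or `None` if ngrok never reports a tunnel.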

In [None]:
# Import Libraries
import os
import multiprocessing

# Open project directory
%cd /kaggle/working/project

# Variable to hold your auth token
ngrok_auth_token = "YOURTOKEN"

# If the zip doesn't exist, then download it
if not os.path.exists("/kaggle/working/project/ngrok-stable-linux-amd64.zip"):
    # Download the zip and unzip it
    !wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
    !unzip ngrok-stable-linux-amd64.zip

# Login to ngrok using your auth token
!./ngrok authtoken "$ngrok_auth_token"

# Run tensorboard on ngrok to monitor your progress
pool = multiprocessing.Pool(processes = 10)
results_of_processes = [pool.apply_async(os.system, args=(cmd, ), callback = None )
                        for cmd in [
                        f"tensorboard --logdir ./runs/ --host 0.0.0.0 --port 6006 &",
                        f"./ngrok http 6006 &"
                        ]]

# Display link for tensorboard
print("Connect to your tensorboard through this link:")
!curl -s http://localhost:4040/api/tunnels | python3 -c \
    "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"
        
# User Completion
print("---------------------------------------\nStep 8 completed")

---
# **9) Train Tacotron 2 Multi-Speaker + TorchMoji model**

Trains your model using the newly created configuration file

In [None]:
# Opens project directory
%cd /kaggle/working/project

# Trains model using the configuration file
!python -m uberduck_ml_dev.exec.train_tacotron2 --config "tacotron2_config.json"

# User Completion
print("---------------------------------------\nStep 9 completed")

---
# **9.1) Run the block of code below ONLY if you receive an error saying ```GLIBCXX_3.4.26 not found``` while attempting to train**

* This downloading and installing process will take a couple of minutes
* Please be patient and wait for it to finish
* **Once finished run step 9 again**


In [None]:
# Install all required fixes to allow the training process to run smoothly
!apt-get install sudo > /dev/null
!sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test > /dev/null
!sudo apt-get -y dist-upgrade > /dev/null
!strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX

# User Completion
print("---------------------------------------\nStep 9.1 completed")

---
# **10) Download model + logs**

* Once step 10 has finished executing, click the download link below to download a zip containing the **latest version of your model**, all your logs and your config file

* After uploading your model to your Google Drive, you can synthesize your model <a href="https://colab.research.google.com/drive/1g9W1stWS6RdeLT9PT5vIgXk_C19fnSx9?usp=sharing#scrollTo=Ye6XioU1TNvf"> here.</a>

### <a href="../tacotron2_files.zip/"> Download</a>

In [None]:
# Import libraries
import glob
import re
import os

# Open project directory
%cd /kaggle/working/project

# Function to find the latest trained model
def latest_model_iteration():
  # List every checkpoint saved during training
  model_list = glob.glob("./checkpoints/tacotron2_*.pt")

  # Create new list for all iteration numbers of models
  model_iterations = []

  # Loop through list of models and obtain the iteration number
  for model in model_list:
    # Capture only the digits after "tacotron2_" so the "2" in the name is ignored
    match = re.search(r"tacotron2_(\d+)\.pt$", model)

    # Store the iteration number as an integer so it compares numerically
    if match:
      model_iterations.append(int(match.group(1)))

  # Return the highest model iteration number
  # (comparing strings would rank "950" above "3250")
  return max(model_iterations)

# Assign variable to none
download_ready = "none"

# Check to see if they want to zip up their files ready to download
while download_ready not in ("Y","N","Yes","No"):
    download_ready = input("Are you ready to begin zipping your files to download? [Yes/No]: ").title()

# If they are ready to zip and download
if download_ready in ("Yes","Y"):
    # Assigns variable the latest model iteration number
    model_iteration = latest_model_iteration()
    
    # Delete any existing zip first (it is created one level above the project directory)
    if os.path.exists("../tacotron2_files.zip"):
        !rm ../tacotron2_files.zip
    
    # Zip files with maximum level of compression
    os.system(f"zip -9 -r ../tacotron2_files.zip checkpoints/tacotron2_{model_iteration}.pt runs tacotron2_config.json")

# User Completion 
print("---------------------------------------\nStep 10 completed")