
# **Uberduck Mellotron Training (Kaggle Notebook)**
---
<a href="https://github.com/uberduck-ai/uberduck-ml-dev">Uberduck Mellotron</a> | Original repo: <a href="https://github.com/NVIDIA/mellotron">Mellotron</a> | **Created by <a href="https://github.com/ColdFir5">Michael</a>**. **Massive thanks to <a href="https://github.com/johnpaulbin">johnpaulbin</a> for helping put together the components for this notebook, and to the rest of the Uberduck development team.**

This notebook will require: **A dataset**

The dataset should look like this: 

```
Kaggle Dataset/
          ├──wavs/
          │    ├──1.wav
          │    ├──2.wav
          │    ├──3.wav
          │    └──etc
          └──transcription.txt
               ├──wavs/1.wav|This is a test number 1!
               ├──wavs/2.wav|This is a test number 2!
               └──etc
```
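
As a quick sanity check before training, the layout above can be verified with a few lines of Python. This is only a minimal sketch and not part of the notebook itself: `check_transcript` is a hypothetical helper, and it assumes every transcript line has the form `wavs/<name>.wav|<text>` as shown in the tree.

```python
import os

def check_transcript(dataset_dir, transcript_name="transcription.txt"):
    """Return a list of problems found in the transcript (empty list = OK)."""
    problems = []
    transcript_path = os.path.join(dataset_dir, transcript_name)
    with open(transcript_path, encoding="utf-8") as f:
        for number, raw in enumerate(f, start=1):
            line = raw.strip()
            if not line:
                continue  # ignore blank lines
            parts = line.split("|")
            # Assumed format: exactly one "|" separating wav path and text
            if len(parts) != 2 or not parts[1].strip():
                problems.append(f"line {number}: expected 'wavs/name.wav|text'")
                continue
            wav_path = os.path.join(dataset_dir, parts[0])
            if not os.path.isfile(wav_path):
                problems.append(f"line {number}: missing file {parts[0]}")
    return problems
```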

**MAKE SURE 'GPU' HAS BEEN SELECTED AS THE ACCELERATOR IN THE NOTEBOOK SETTINGS**

*Updated 24/10/21 VERSION 3-5: Bugs fixed*

**The project is still under development and the notebook will be tweaked and updated over time, so be sure to check for updates!**

---
# TRAINING INSTRUCTIONS
* **Make sure to make your own version of this notebook for each new model**
* **Import your dataset (22050 Hz, mono, 16-bit PCM audio) in the top right corner of the screen**
* **The transcription file should be in this format for each wav (WITH PUNCTUATION):**```wavs/1.wav|[Text here].```
* **RUN ALL**
* **Fill in required inputs**
* **Follow through all steps**
* ***(VERY IMPORTANT)*** Once your model has been trained **DO NOT FORGET TO SAVE VERSION AND GO TO 'ADVANCED' AND CHECK 'ALWAYS SAVE OUTPUT'** so you do NOT LOSE progress
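
The audio requirement in the list above (22050 Hz, mono, 16-bit PCM) can be checked locally before uploading. The following is a minimal sketch using Python's standard `wave` module; `check_wav_format` is a hypothetical helper and is not something the notebook or Uberduck provides.

```python
import wave

def check_wav_format(path, rate=22050, channels=1, sample_width_bytes=2):
    """Return True if the wav file is 22050 Hz, mono, 16-bit PCM."""
    # wave.open raises wave.Error for wav files it cannot parse,
    # such as compressed (non-PCM) formats
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == rate
            and wav.getnchannels() == channels
            and wav.getsampwidth() == sample_width_bytes  # 2 bytes = 16 bit
        )
```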
---
# CONTINUING TO TRAIN?

*Coming soon potentially*

---

# **1) User inputs**

Enter the name of your dataset and transcription file (with file extension)

In [None]:
# Variables
dataset_name = "none"
transcript_file_name = "none"

# Inputs
while dataset_name == "none":
    dataset_name = input("What is the name of your dataset?: ")

while transcript_file_name == "none":
    transcript_file_name = input("What is the name of your training transcription file (add file extension, e.g. .txt)?: ")

# User Completion
print("Step 1 Completed.")

---
# **2) Clone Git Repo and install requirements & download LJSpeech model**

The pretrained LJSpeech model is used to warm-start training.

In [None]:
# Clone repository 
!git clone -q https://github.com/uberduck-ai/uberduck-ml-dev

# Go into the main directory for mellotron
%cd uberduck-ml-dev

# Install requirements
!pip install -e .

# Download LJspeech model
!pip install gdown
!gdown --id 1UwDARlUl8JvB2xSuyMFHFsIWELVpgQD4

# User Completion
print("Step 2 Completed.")

---
# **3) Transfer dataset over to working env**

In [None]:
# Import libraries
import os

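# Copy the dataset into the working environment and stop if the copy fails
dataset_source = f"../../input/{dataset_name}"
if os.system(f'cp -a "{dataset_source}" /kaggle/working/uberduck-ml-dev') != 0:
    raise RuntimeError(f"Could not copy the dataset from {dataset_source}")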

# User Completion
print("Step 3 Completed.")

---
# **4) Edit transcript to Mellotron's transcription file format**

* Prepends the full directory path to each wav file
* Appends ```|0``` (the speaker ID) to the end of each line for multispeaker formatting
* e.g. ```/kaggle/working/uberduck-ml-dev/dataset/wavs/1.wav|This is a test!|0```
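
The per-line transformation described above can be sketched in isolation as follows. `to_mellotron_line` is a hypothetical helper written for illustration; the dataset directory in the example is taken from the sample path above.

```python
def to_mellotron_line(line, dataset_dir, speaker_id=0):
    """Convert 'wavs/1.wav|text' to '<dataset_dir>/wavs/1.wav|text|<speaker_id>'."""
    line = line.rstrip("\r\n")  # drop the line ending, keep the text intact
    return f"{dataset_dir.rstrip('/')}/{line}|{speaker_id}"

print(to_mellotron_line("wavs/1.wav|This is a test!\n",
                        "/kaggle/working/uberduck-ml-dev/dataset"))
# /kaggle/working/uberduck-ml-dev/dataset/wavs/1.wav|This is a test!|0
```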

In [None]:
# Read the original transcription file
source_transcript = f"../../input/{dataset_name}/{transcript_file_name}"
working_transcript = f"{dataset_name}/{transcript_file_name}"

with open(source_transcript) as f:
    # Strip line endings ("\n" or "\r\n") and skip blank lines,
    # so a missing final newline no longer cuts off the last character
    lines = [line.rstrip("\r\n") for line in f if line.strip()]

# Prepend the full dataset path and append the speaker ID (0) to each line
transcript_file_lines = [
    f"/kaggle/working/uberduck-ml-dev/{dataset_name}/{line}|0" for line in lines
]

# Overwrite the working copy; joining with "\n" leaves no empty last line
with open(working_transcript, "w") as f1:
    f1.write("\n".join(transcript_file_lines))

# User Completion
print("Step 4 Completed.")

---
# **5) Convert text file transcription to Arpa**
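
For readers unfamiliar with ARPAbet, the idea is a dictionary lookup that replaces each known word with its phoneme sequence. The sketch below is illustrative only: the two-entry dictionary is a stand-in for `merged.dict.txt`, the curly-brace wrapping follows the convention used by Tacotron 2 style text front ends, and the actual output of `convert_arpabet.py` is an assumption that may differ.

```python
import re

# Tiny hypothetical dictionary; the real merged.dict.txt is far larger
ARPA_DICT = {
    "HELLO": "HH AH0 L OW1",
    "WORLD": "W ER1 L D",
}

def to_arpabet(text, dictionary=ARPA_DICT):
    """Replace known words with {ARPAbet}; keep unknown words and punctuation."""
    def replace(match):
        word = match.group(0)
        phonemes = dictionary.get(word.upper())
        return "{" + phonemes + "}" if phonemes else word
    return re.sub(r"[A-Za-z']+", replace, text)

print(to_arpabet("Hello, world!"))
# {HH AH0 L OW1}, {W ER1 L D}!
```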

In [None]:
# Import libraries and install resources
import os
!pip install inflect

# Download the Arpa dictionary
if not os.path.exists("/kaggle/working/uberduck-ml-dev/merged.dict.txt"):
    !wget "https://github.com/johnpaulbin/tacotron2/releases/download/Main/merged.dict.txt"

# Copy the conversion script from the attached 'arpabet' input dataset into the working directory
os.system('cp -a ../../input/arpabet/ /kaggle/working/')

# Converts text into Arpa and saves the new file
os.system(f"python /kaggle/working/arpabet/convert_arpabet.py --file='/kaggle/working/uberduck-ml-dev/{dataset_name}/{transcript_file_name}'")

# User Completion
print("Step 5 Completed.")

---
# **6) Assign settings to configuration file**

In [None]:
# Import libraries
import json

# Assign variables
batch_size_num = 0
character_name = ""
epoch_num = 0
checkpoint_num = 0

# User inputs (with validation check)
def ask_positive_int(prompt):
    """Keep asking until the user enters a whole number greater than zero."""
    while True:
        try:
            value = int(input(prompt))
        except ValueError:
            # int() fails on non-numeric input; ask again instead of crashing
            print("Please enter a whole number.")
            continue
        if value > 0:
            return value

batch_size_num = ask_positive_int("What batch size would you like to use (recommended: 24): ")

while character_name == "":
    character_name = input("What is the name of your character?: ")

epoch_num = ask_positive_int("How many epochs would you like to train your model for? (recommended: 5000): ")

checkpoint_num = ask_positive_int("How often would you like to save a version of your model (recommended: 500): Epoch - ")

# Add "ARPA" to the end of the filename for config file
listtranscript_file_name = transcript_file_name.split(".")
listtranscript_file_name[0] = f"{listtranscript_file_name[0]}ARPA"

arpatranscript_file_name = ".".join(listtranscript_file_name)

# Open the config file and edit values
with open('tacotron2_config.json') as f:
    json_config = json.load(f)
    json_config["batch_size"] = batch_size_num
    json_config["checkpoint_name"] = character_name
    json_config["checkpoint_path"] = "/kaggle/working/checkpoint"
    json_config["dataset_path"] = f"/kaggle/working/uberduck-ml-dev/{dataset_name}"
    json_config["warm_start_name"] = "mellotron_ljs.pt"
    json_config["epochs"] = epoch_num
    json_config["include_f0"] = True
    json_config["n_speakers"] = 1
    json_config["fp16_run"] = False
    json_config["n_frames_per_step_initial"] = 1
    json_config["epochs_per_checkpoint"] = checkpoint_num
    json_config["training_audiopaths_and_text"] = f"/kaggle/working/uberduck-ml-dev/{dataset_name}/{arpatranscript_file_name}"
    json_config["val_audiopaths_and_text"] = f"/kaggle/working/uberduck-ml-dev/{dataset_name}/{arpatranscript_file_name}"
    json_config["ignore_layers"] = None
                         
# Save the JSON files with new updated values
with open('tacotron2_config.json', 'w') as JSON_FILE:
    json.dump(json_config, JSON_FILE)
                         
# User Completion
print("Step 6 Completed.")

---
# **7) Train Mellotron model**

Trains your Mellotron model using the newly edited configuration file
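
Because a long training run fails late if a path in the configuration is wrong, it may help to check the file first. This is a minimal sketch under the assumption that the keys set in step 6 (`dataset_path`, `training_audiopaths_and_text`, `val_audiopaths_and_text`) hold filesystem paths; `missing_config_paths` is a hypothetical helper, not part of uberduck-ml-dev.

```python
import json
import os

def missing_config_paths(config_path, keys=("dataset_path",
                                            "training_audiopaths_and_text",
                                            "val_audiopaths_and_text")):
    """Return the config keys whose paths do not exist on disk."""
    with open(config_path) as f:
        config = json.load(f)
    # A missing key is treated like a missing path
    return [key for key in keys if not os.path.exists(str(config.get(key, "")))]
```

An empty list from `missing_config_paths("tacotron2_config.json")` suggests the paths are in place; any key it returns points at a file or folder that should be fixed before training.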

In [None]:
# Train model using config file
!bash train.sh "tacotron2_config.json"

# User Completion
print("Step 7 Completed.")