<a href="https://colab.research.google.com/github/dramaticrobotic/Tacotron_2_Legacy/blob/main/Train_a_legacy_Tacotron_2_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **A legacy Tacotron 2 model trainer notebook**

### Organized and simplified by mega b#6696; edited by Gosmokeless28 | [Tacotron 2](https://github.com/NVIDIA/tacotron2)

### Code originally by CookieGalaxy

---

### (Here is [an output generator notebook](https://colab.research.google.com/drive/1NVA3ndxhYWsKn-zwh3NnzMMgoVdJ5xUx) with which you can test your TT2 model)

### **Warning:**  If you're using a non-chromium web browser, you may encounter difficulties if you upload files that are larger than one megabyte in filesize.

# Optional cells. Unhide by clicking the arrow on the left.

In [1]:
#@markdown # Check which GPU you've been allocated
#@markdown ### Disconnect from & delete the runtime if you haven't been allocated a desired GPU.

#@markdown ---

#@markdown ## It is recommended not to use a K80 GPU

!nvidia-smi -L
#@markdown All GPUs work, but each of them vary in speed. K80 GPUs are usable, but not recommended.

#@markdown ---

GPU 0: Tesla T4 (UUID: GPU-61041405-2574-1715-742b-986a38b3530d)


In [2]:
#@markdown # Anti-disconnection
#@markdown ## Run this cell to prevent the session from being terminated involuntarily; it will be terminated automatically after 23–24 hours, though.

import IPython
js_code = '''
function ClickConnect(){
console.log("Working");
document.querySelector("colab-toolbar-button#connect").click()
}
setInterval(ClickConnect,60000)
'''
display(IPython.display.Javascript(js_code))

<IPython.core.display.Javascript object>

In [8]:
#@markdown ## Import files
#@markdown ### (Step #2 needs to be executed before this cell is ran)
#@markdown Instead of using this cell, you can upload the .wav files manually to `tacotron2/wavs` and the `transcript.txt` file to `tacotron2/filelists` using the sidebar on the left.
#
#@markdown ---
#
#@markdown #### Where you want to import your dataset from
#@markdown This can be a direct download URL, a Google Drive file (URL or ID), a MEGA file (only URL for now), or a local file path (sidebar → right-click file → `Copy path`). **Folders are not supported**, so for this, you need to have your files be in a .zip file.
dataset_source = "C:\\Users\\drama\\Downloads\\Angel Dust (Pilot)\\Angel Dust (Pilot).zip" #@param {type:"string"}
#@markdown #### What type of dataset it is
#@markdown The available choices are:
#@markdown - **LJ Speech dataset format** - `transcript.txt` and a `wavs` folder.
####@markdown - **/mlp/ dataset format** - one folder per .wav, each with `audio.wav` and `label.json`. This format is used by this pile of every My Little Pony voice in existence: https://drive.google.com/drive/folders/10smbrQMZrDoLs5QMgCULVMs9geNdr4W4
#@markdown - **Just unzip** - plain .wav files. Works like "Unzip file to unpack wavs" from the previous version of this notebook. For this, you need to import the `transcript.txt` manually.
loader = "LJ Speech dataset format" #@param ["LJ Speech dataset format", "Just unzip"]
#@markdown #### Whether to remove already imported .wav files or not
#@markdown Enabled by default. If unchecked, this will combine the current dataset with the new one. Watch out for identical filenames across datasets!
remove_current_files = True #@param {type:"boolean"}
####
####@markdown ---
####
####@markdown ### Data cleanup
####@markdown Note: these are pretty slow, can't be stopped without restarting the runtime, and you probably don't need them if the dataset is good.
####@markdown #### Maximum volume counted as silence for removal (percentage)
####@markdown `-1` (disabled) by default. The loader can remove silence from the start and end of the audio, to prevent the AI from having to spend precious neurons on memorizing the bounds of each clip. **Recommended** for the /mlp/ datasets, with a value of `0.1` (as used in [one of their notebooks](https://colab.research.google.com/drive/1Tv6yaMQ0rxX9Zru3_D16Yzp5gQNsgn9h#scrollTo=05SOBkCj7Lrm)). If using with other datasets, zoom in on one of the audio files with Audacity to check what level the silence is.
###silence_max =  -1#@param {type:"number"}
####@markdown #### Whether to normalize the audio
####@markdown Disabled by default. This sets the audio volume for each clip as high as it can be without going too far, which might be bad if some of your clips are quiet for a reason (e.g. whispering). You probably don't need this anyway, though.
###normalize = False #@param {type:"boolean"}
####
####@markdown ---
####
####@markdown ### /mlp/ importer only
####@markdown #### Maximum noisiness level
####@markdown Each file is marked by how clean it is. For characters with less voicelines, you might want to loosen your restriction. You'll see the amounts of each type of clip when you run the cell. Your download is cached, so just re-run the cell with different settings if you want to change this.
###mlp_max_noise = "Noisy" #@param ["Clean", "Noisy", "Very Noisy"]
####@markdown #### Allowed emotions
####@markdown Empty (allowing all) by default. Each file also has one or more emotion tags. Write down the ones you want, separated by commas. Note that only one of the clip's emotion tags needs to match, so a clip with `Happy, Shouting` will be allowed even if you only specified `Happy`.
####
####@markdown Common emotion tags (taken from [here](https://docs.google.com/document/d/1xe1Clvdg6EFFDtIkkFwT-NPLRDPvkV4G675SUKjxVRU/edit#heading=h.dt6j5eyzo9id)): `Neutral, Happy, Amused, Sad, Annoyed, Angry, Disgust, Sarcastic, Smug, Fear, Anxious, Confused, Surprised, Tired, Whispering, Shouting, Whining, Crazy`
###mlp_allowed_emotions = "" #@param {type:"string"}
####@markdown #### Disallowed emotions
####@markdown `Whispering` by default. Same deal as above, just the other way around.
###mlp_disallowed_emotions = "Whispering" #@param {type:"string"}

import gdown
import json
import os
import re
import shutil
from tqdm.notebook import tqdm
###!apt install sox

os.chdir("/content/tacotron2")
 
if not os.path.isdir("/content/tacotron2/wavs"):
  raise Exception("You must run step 1 before running this.")
 
wavs_path = "/content/tacotron2/wavs"
filelist_path = "/content/tacotron2/filelists/list.txt"
 
def add_to_list_txt(to_add):
  with open(filelist_path, "a+") as f:
    if not f.seek(0,2) == "\n":
      f.write("\n")
    f.write(to_add)
 
if remove_current_files:
  !rm -rf {wavs_path}
  !mkdir {wavs_path}
  !rm -f {filelist_path}

!rm -rf /content/tempwavs
!rm -rf /content/tempwavs2

try:
  latest_downloaded
except NameError:
  latest_downloaded = None

if latest_downloaded == dataset_source:
  print("%s is already downloaded" % dataset_source)
else:
  latest_downloaded = dataset_source
  !rm -f /content/dataset
 
  m = re.search("^https?://drive\.google\.com/file/d/([a-zA-Z0-9\-_]+)/", dataset_source)
  if m:
    dataset_source = m.group(1)
 
  if dataset_source.startswith("http"):
    if dataset_source.startswith("https://mega.nz/"):
      !/content/tacotron2/megadown.sh {dataset_source} -o /content/dataset
    else:
      !curl -s -L $dataset_source -o /content/dataset
  elif dataset_source.startswith("/"):
    !cp $dataset_source /content/dataset
  else:
    gdown.download("https://drive.google.com/uc?id=" + dataset_source, "/content/dataset", quiet = False)
 
!7z x /content/dataset -o/content/datasetunzip

if loader == "LJSpeech-style dataset":
  #!unzip -q /content/dataset -d /content/datasetunzip
  !mv /content/datasetunzip/wavs /content/tempwavs
  if remove_current_files:
    !mv /content/datasetunzip/list.txt {filelist_path}
  else:
    with open("/content/datasetunzip/list.txt") as f:
      add_to_list_txt(f.read())
  #!rm -r /content/datasetunzip
 
###elif loader == "/mlp/ dataset":
###  #!mkdir /content/datasetuntar
###  #!tar -xf /content/dataset -C /content/datasetuntar
###  !mkdir /content/tempwavs
###  list_txt = []
###  noise_chart = {"": 0, "Clean": 0, "Noisy": 1, "Very Noisy": 2}
###  noise_stats = {}
###  emotion_stats = {}
###  mlp_allowed_emotions = [i.strip() for i in mlp_allowed_emotions.split(",")] if mlp_allowed_emotions else []
###  mlp_disallowed_emotions = [i.strip() for i in mlp_disallowed_emotions.split(",")] if mlp_disallowed_emotions else []
###  counter = 0
###  for dirpath, dirnames, filenames in os.walk("/content/datasetunzip"):
###    for i in dirnames:
###      with open(os.path.join(dirpath, i, "label.json")) as f:
###        data = json.load(f)
###        noise_stats[data["noise"]] = noise_stats.get(data["noise"], 0) + 1
###        for j in data["tags"]:
###          emotion_stats[j] = emotion_stats.get(j, 0) + 1
###        if noise_chart[data["noise"]] <= noise_chart[mlp_max_noise]:
###          if len(mlp_allowed_emotions) == 0 or any([i in mlp_allowed_emotions for i in data["tags"]]):
###            if not any([i in mlp_disallowed_emotions for i in data["tags"]]):
###              list_txt.append("wavs/" + i + ".wav|" + data["utterance"]["content"])
###              os.rename(os.path.join(dirpath, i, "audio.wav"), os.path.join("/content/tempwavs", i + ".wav"))
###              counter += 1
###  if remove_current_files:
###    with open(filelist_path, "w") as f:
###      f.write("\n".join(list_txt))
###  else:
###    add_to_list_txt("\n".join(list_txt))
###  #!rm -r /content/datasetuntar
###  print(noise_stats)
###  print("\n".join([" ".join([str(j) for j in i]) for i in sorted(emotion_stats.items(), key = lambda x:x[1], reverse = True)]))
###  print(f"{counter} loaded")
 
elif loader == "Just unzip":
  #!unzip -q /content/dataset -d /content/tempwavs
  !mv /content/datasetunzip /content/tempwavs
 
!rm -r /content/datasetunzip
 
###if silence_max != -1:
###  print("Trimming silence...")
###  !mkdir /content/tempwavs2
###  for i in tqdm(list(os.scandir("/content/tempwavs"))):
###    !sox {i.path} {os.path.join("/content/tempwavs2", i.name)} silence 1 0.05 {silence_max}% reverse silence 1 0.05 {silence_max}% reverse
###  !rm -rf /content/tempwavs
###  !mv /content/tempwavs2 /content/tempwavs
 
###if normalize:
###  print("Normalizing...")
###  !mkdir /content/tempwavs2
###  for i in tqdm(list(os.scandir("/content/tempwavs"))):
###    !sox {i.path} {os.path.join("/content/tempwavs2", i.name)} gain -h
###  !rm -rf /content/tempwavs
###  !mv /content/tempwavs2 /content/tempwavs
 
# https://unix.stackexchange.com/a/626625
!cp --force --archive --update --link /content/tempwavs/. {wavs_path}
!rm -rf /content/tempwavs
 
print("tacotron2/wavs now contains %s files" % len(os.listdir(wavs_path)))

C:\Users\drama\Downloads\Angel Dust (Pilot)\Angel Dust (Pilot).zip is already downloaded

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,2 CPUs Intel(R) Xeon(R) CPU @ 2.00GHz (50653),ASM,AES-NI)

Scanning the drive for archives:
  0M Scan /content/                   
ERROR: No more files
/content/dataset



System ERROR:
Unknown error -2147024872
rm: cannot remove '/content/datasetunzip': No such file or directory
cp: cannot stat '/content/tempwavs/.': No such file or directory
tacotron2/wavs now contains 0 files


# Training

In [3]:
#@markdown ## **1** Mount Google Drive

#Google Drive Authentication Token
from google.colab import drive
drive.mount('drive')

Mounted at drive


In [4]:
#@markdown ## **2** After running this cell, insert the wavs—required to be in 22,050 Hz, mono, and 16 bit—into the /content/tacotron2/wavs folder on the left.

#@markdown ---

#@markdown ## If the dataset contains a lot of .wav files, you should use the optional cell, "Import files", after running this.

#@markdown #### Execute this step to install Tacotron 2 and dependencies
!pip install git+https://github.com/justinjohn0306/gdown.git
import os
!git clone -q https://github.com/NVIDIA/tacotron2
os.chdir('tacotron2')
!git submodule init
!git submodule update
!pip install -q unidecode tensorboardX
!apt-get install pv
!apt-get install jq
!wget https://raw.githubusercontent.com/tonikelope/megadown/master/megadown -O megadown.sh
!chmod 755 megadown.sh

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/justinjohn0306/gdown.git
  Cloning https://github.com/justinjohn0306/gdown.git to /tmp/pip-req-build-oc082uxx
  Running command git clone -q https://github.com/justinjohn0306/gdown.git /tmp/pip-req-build-oc082uxx
  Running command git submodule update --init --recursive -q
  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
    Preparing wheel metadata ... [?25l[?25hdone
Building wheels for collected packages: gdown
  Building wheel for gdown (PEP 517) ... [?25l[?25hdone
  Created wheel for gdown: filename=gdown-4.5.3-py3-none-any.whl size=14857 sha256=a4d39cf062b0d34d14eae1063448558389cb483ceae26c5a1933ddf643cd33cd
  Stored in directory: /tmp/pip-ephem-wheel-cache-g16lejhk/wheels/b8/fd/2e/577ecbc3fc8775dd17e609dffdcad1ebd61ebe70bea9dd0c2b
Successfully built gdown
Installing collected package

In [6]:
#@title monkey patch for tensorflow 2
%%writefile hparams.py
import tensorflow as tf
from text import symbols

class HParamsAlternative(dict):
    def __init__(self, *args, **kwargs):
        super(HParamsAlternative, self).__init__(*args, **kwargs)
        self.__dict__ = self

def create_hparams(hparams_string=None, verbose=False):
    """Create model hyperparameters. Parse nondefault from given string."""

    hparams = HParamsAlternative(
        ################################
        # Experiment Parameters        #
        ################################
        epochs=500,
        iters_per_checkpoint=1000,
        seed=1234,
        dynamic_loss_scaling=True,
        fp16_run=False,
        distributed_run=False,
        dist_backend="nccl",
        dist_url="tcp://localhost:54321",
        cudnn_enabled=True,
        cudnn_benchmark=False,
        ignore_layers=['embedding.weight'],

        ################################
        # Data Parameters             #
        ################################
        load_mel_from_disk=False,
        training_files='filelists/ljs_audio_text_train_filelist.txt',
        validation_files='filelists/ljs_audio_text_val_filelist.txt',
        text_cleaners=['basic_cleaners'],

        ################################
        # Audio Parameters             #
        ################################
        max_wav_value=32768.0,
        sampling_rate=22050,
        filter_length=1024,
        hop_length=256,
        win_length=1024,
        n_mel_channels=80,
        mel_fmin=0.0,
        mel_fmax=8000.0,

        ################################
        # Model Parameters             #
        ################################
        n_symbols=len(symbols),
        symbols_embedding_dim=512,

        # Encoder parameters
        encoder_kernel_size=5,
        encoder_n_convolutions=3,
        encoder_embedding_dim=512,

        # Decoder parameters
        n_frames_per_step=1,  # currently only 1 is supported
        decoder_rnn_dim=1024,
        prenet_dim=256,
        max_decoder_steps=1000,
        gate_threshold=0.5,
        p_attention_dropout=0.1,
        p_decoder_dropout=0.1,

        # Attention parameters
        attention_rnn_dim=1024,
        attention_dim=128,

        # Location Layer parameters
        attention_location_n_filters=32,
        attention_location_kernel_size=31,

        # Mel-post processing network parameters
        postnet_embedding_dim=512,
        postnet_kernel_size=5,
        postnet_n_convolutions=5,

        ################################
        # Optimization Hyperparameters #
        ################################
        use_saved_learning_rate=False,
        learning_rate=1e-3,
        weight_decay=1e-6,
        grad_clip_thresh=1.0,
        batch_size=64,
        mask_padding=True  # set model's padded outputs to padded values
    )

    if hparams_string:
        tf.logging.info('Parsing command line hparams: %s', hparams_string)
        hparams.parse(hparams_string)

    if verbose:
        tf.logging.info('Final parsed hparams: %s', hparams.values())

    return hparams

Overwriting hparams.py


In [7]:
#@markdown Download the base model
%matplotlib inline
import os
if os.getcwd() != '/content/tacotron2':
    os.chdir('tacotron2')
import time
import argparse
import math
from numpy import finfo

import torch
from distributed import apply_gradient_allreduce
import torch.distributed as dist
from torch.utils.data.distributed import DistributedSampler
from torch.utils.data import DataLoader

from model import Tacotron2
from data_utils import TextMelLoader, TextMelCollate
from loss_function import Tacotron2Loss
from logger import Tacotron2Logger
from hparams import create_hparams
 
import random
import numpy as np

import layers
from utils import load_wav_to_torch, load_filepaths_and_text
from text import text_to_sequence
from math import e
#from tqdm import tqdm # Terminal
#from tqdm import tqdm_notebook as tqdm # Legacy Notebook TQDM
from tqdm.notebook import tqdm # Modern Notebook TQDM
from distutils.dir_util import copy_tree
import matplotlib.pylab as plt

def download_from_google_drive(file_id, file_name):
  # download a file from the Google Drive link
  !rm -f ./cookie
  !curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id={file_id}" > /dev/null
  confirm_text = !awk '/download/ {print $NF}' ./cookie
  confirm_text = confirm_text[0]
  !curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm={confirm_text}&id={file_id}" -o {file_name}

def create_mels():
    print("Generating Mels")
    stft = layers.TacotronSTFT(
                hparams.filter_length, hparams.hop_length, hparams.win_length,
                hparams.n_mel_channels, hparams.sampling_rate, hparams.mel_fmin,
                hparams.mel_fmax)
    def save_mel(filename):
        audio, sampling_rate = load_wav_to_torch(filename)
        if sampling_rate != stft.sampling_rate:
            raise ValueError("{} {} SR doesn't match target {} SR".format(filename, 
                sampling_rate, stft.sampling_rate))
        audio_norm = audio / hparams.max_wav_value
        audio_norm = audio_norm.unsqueeze(0)
        audio_norm = torch.autograd.Variable(audio_norm, requires_grad=False)
        melspec = stft.mel_spectrogram(audio_norm)
        melspec = torch.squeeze(melspec, 0).cpu().numpy()
        np.save(filename.replace('.wav', ''), melspec)

    import glob
    wavs = glob.glob('wavs/*.wav')
    for i in tqdm(wavs):
        save_mel(i)


def reduce_tensor(tensor, n_gpus):
    rt = tensor.clone()
    dist.all_reduce(rt, op=dist.reduce_op.SUM)
    rt /= n_gpus
    return rt


def init_distributed(hparams, n_gpus, rank, group_name):
    assert torch.cuda.is_available(), "Distributed mode requires CUDA."
    print("Initializing Distributed")

    # Set cuda device so everything is done on the right GPU.
    torch.cuda.set_device(rank % torch.cuda.device_count())

    # Initialize distributed communication
    dist.init_process_group(
        backend=hparams.dist_backend, init_method=hparams.dist_url,
        world_size=n_gpus, rank=rank, group_name=group_name)

    print("Done initializing distributed")


def prepare_dataloaders(hparams):
    # Get data, data loaders and collate function ready
    trainset = TextMelLoader(hparams.training_files, hparams)
    valset = TextMelLoader(hparams.validation_files, hparams)
    collate_fn = TextMelCollate(hparams.n_frames_per_step)

    if hparams.distributed_run:
        train_sampler = DistributedSampler(trainset)
        shuffle = False
    else:
        train_sampler = None
        shuffle = True

    train_loader = DataLoader(trainset, num_workers=1, shuffle=shuffle,
                              sampler=train_sampler,
                              batch_size=hparams.batch_size, pin_memory=False,
                              drop_last=True, collate_fn=collate_fn)
    return train_loader, valset, collate_fn


def prepare_directories_and_logger(output_directory, log_directory, rank):
    if rank == 0:
        if not os.path.isdir(output_directory):
            os.makedirs(output_directory)
            os.chmod(output_directory, 0o775)
        logger = Tacotron2Logger(os.path.join(output_directory, log_directory))
    else:
        logger = None
    return logger


def load_model(hparams):
    model = Tacotron2(hparams).cuda()
    if hparams.fp16_run:
        model.decoder.attention_layer.score_mask_value = finfo('float16').min

    if hparams.distributed_run:
        model = apply_gradient_allreduce(model)

    return model


def warm_start_model(checkpoint_path, model, ignore_layers):
    assert os.path.isfile(checkpoint_path)
    print("Warm starting model from checkpoint '{}'".format(checkpoint_path))
    checkpoint_dict = torch.load(checkpoint_path, map_location='cpu')
    model_dict = checkpoint_dict['state_dict']
    if len(ignore_layers) > 0:
        model_dict = {k: v for k, v in model_dict.items()
                      if k not in ignore_layers}
        dummy_dict = model.state_dict()
        dummy_dict.update(model_dict)
        model_dict = dummy_dict
    model.load_state_dict(model_dict)
    return model


def load_checkpoint(checkpoint_path, model, optimizer):
    assert os.path.isfile(checkpoint_path)
    print("Loading checkpoint '{}'".format(checkpoint_path))
    checkpoint_dict = torch.load(checkpoint_path, map_location='cpu')
    model.load_state_dict(checkpoint_dict['state_dict'])
    optimizer.load_state_dict(checkpoint_dict['optimizer'])
    learning_rate = checkpoint_dict['learning_rate']
    iteration = checkpoint_dict['iteration']
    print("Loaded checkpoint '{}' from iteration {}" .format(
        checkpoint_path, iteration))
    return model, optimizer, learning_rate, iteration


def save_checkpoint(model, optimizer, learning_rate, iteration, filepath):
    import random
    if random.random() > 0.85:
        print("Saving model and optimizer state at iteration {} to {}".format(
            iteration, filepath))
        try:
            torch.save({'iteration': iteration,
                    'state_dict': model.state_dict(),
                    'optimizer': optimizer.state_dict(),
                    'learning_rate': learning_rate}, filepath)
        except KeyboardInterrupt:
            print("interrupt received while saving, waiting for save to complete.")
            torch.save({'iteration': iteration,'state_dict': model.state_dict(),'optimizer': optimizer.state_dict(),'learning_rate': learning_rate}, filepath)
        print("Model Saved")

def plot_alignment(alignment, info=None):
    %matplotlib inline
    fig, ax = plt.subplots(figsize=(int(alignment_graph_width/100), int(alignment_graph_height/100)))
    im = ax.imshow(alignment, cmap='inferno', aspect='auto', origin='lower',
                   interpolation='none')
    ax.autoscale(enable=True, axis="y", tight=True)
    fig.colorbar(im, ax=ax)
    xlabel = 'Decoder timestep'
    if info is not None:
        xlabel += '\n\n' + info
    plt.xlabel(xlabel)
    plt.ylabel('Encoder timestep')
    plt.tight_layout()
    fig.canvas.draw()
    plt.show()

def validate(model, criterion, valset, iteration, batch_size, n_gpus,
             collate_fn, logger, distributed_run, rank, epoch, start_eposh, learning_rate):
    """Handles all the validation scoring and printing"""
    model.eval()
    with torch.no_grad():
        val_sampler = DistributedSampler(valset) if distributed_run else None
        val_loader = DataLoader(valset, sampler=val_sampler, num_workers=1,
                                shuffle=False, batch_size=batch_size,
                                pin_memory=False, collate_fn=collate_fn)

        val_loss = 0.0
        for i, batch in enumerate(val_loader):
            x, y = model.parse_batch(batch)
            y_pred = model(x)
            loss = criterion(y_pred, y)
            if distributed_run:
                reduced_val_loss = reduce_tensor(loss.data, n_gpus).item()
            else:
                reduced_val_loss = loss.item()
            val_loss += reduced_val_loss
        val_loss = val_loss / (i + 1)

    model.train()
    if rank == 0:
        print("Epoch: {} Validation loss {}: {:9f}  Time: {:.1f}m LR: {:.6f}".format(epoch, iteration, val_loss,(time.perf_counter()-start_eposh)/60, learning_rate))
        logger.log_validation(val_loss, model, y, y_pred, iteration)
        if hparams.show_alignments:
            %matplotlib inline
            _, mel_outputs, gate_outputs, alignments = y_pred
            idx = random.randint(0, alignments.size(0) - 1)
            plot_alignment(alignments[idx].data.cpu().numpy().T)

def train(output_directory, log_directory, checkpoint_path, warm_start, n_gpus,
          rank, group_name, hparams, log_directory2):
    """Training and validation logging results to tensorboard and stdout

    Params
    ------
    output_directory (string): directory to save checkpoints
    log_directory (string) directory to save tensorboard logs
    checkpoint_path(string): checkpoint path
    n_gpus (int): number of gpus
    rank (int): rank of current gpu
    hparams (object): comma separated list of "name=value" pairs.
    """
    if hparams.distributed_run:
        init_distributed(hparams, n_gpus, rank, group_name)

    torch.manual_seed(hparams.seed)
    torch.cuda.manual_seed(hparams.seed)

    model = load_model(hparams)
    learning_rate = hparams.learning_rate
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate,
                                 weight_decay=hparams.weight_decay)

    if hparams.fp16_run:
        from apex import amp
        model, optimizer = amp.initialize(
            model, optimizer, opt_level='O2')

    if hparams.distributed_run:
        model = apply_gradient_allreduce(model)

    criterion = Tacotron2Loss()

    logger = prepare_directories_and_logger(
        output_directory, log_directory, rank)

    train_loader, valset, collate_fn = prepare_dataloaders(hparams)

    # Load checkpoint if one exists
    iteration = 0
    epoch_offset = 0
    if checkpoint_path is not None and os.path.isfile(checkpoint_path):
        if warm_start:
            model = warm_start_model(
                checkpoint_path, model, hparams.ignore_layers)
        else:
            model, optimizer, _learning_rate, iteration = load_checkpoint(
                checkpoint_path, model, optimizer)
            if hparams.use_saved_learning_rate:
                learning_rate = _learning_rate
            iteration += 1  # next iteration is iteration + 1
            epoch_offset = max(0, int(iteration / len(train_loader)))
    else:
      os.path.isfile("/content/tacotron2/pretrained_model")
      %cd /dev/null
      !/content/tacotron2/megadown.sh https://mega.nz/#!WXY3RILA!KyoGHtfB_sdhmLFoykG2lKWhh0GFdwMkk7OwAjpQHRo --o pretrained_model
      %cd /content/tacotron2
      model = warm_start_model("/content/tacotron2/pretrained_model", model, hparams.ignore_layers)
      # download LJSpeech pretrained model if no checkpoint already exists
    
    start_eposh = time.perf_counter()
    learning_rate = 0.0
    model.train()
    is_overflow = False
    # ================ MAIN TRAINNIG LOOP! ===================
    for epoch in tqdm(range(epoch_offset, hparams.epochs)):
        print("\nStarting Epoch: {} Iteration: {}".format(epoch, iteration))
        start_eposh = time.perf_counter() # eposh is russian, not a typo
        for i, batch in tqdm(enumerate(train_loader), total=len(train_loader)):
            start = time.perf_counter()
            if iteration < hparams.decay_start: learning_rate = hparams.A_
            else: iteration_adjusted = iteration - hparams.decay_start; learning_rate = (hparams.A_*(e**(-iteration_adjusted/hparams.B_))) + hparams.C_
            learning_rate = max(hparams.min_learning_rate, learning_rate) # output the largest number
            for param_group in optimizer.param_groups:
                param_group['lr'] = learning_rate

            model.zero_grad()
            x, y = model.parse_batch(batch)
            y_pred = model(x)

            loss = criterion(y_pred, y)
            if hparams.distributed_run:
                reduced_loss = reduce_tensor(loss.data, n_gpus).item()
            else:
                reduced_loss = loss.item()
            if hparams.fp16_run:
                with amp.scale_loss(loss, optimizer) as scaled_loss:
                    scaled_loss.backward()
            else:
                loss.backward()

            if hparams.fp16_run:
                grad_norm = torch.nn.utils.clip_grad_norm_(
                    amp.master_params(optimizer), hparams.grad_clip_thresh)
                is_overflow = math.isnan(grad_norm)
            else:
                grad_norm = torch.nn.utils.clip_grad_norm_(
                    model.parameters(), hparams.grad_clip_thresh)

            optimizer.step()

            if not is_overflow and rank == 0:
                duration = time.perf_counter() - start
                logger.log_training(
                    reduced_loss, grad_norm, learning_rate, duration, iteration)
                #print("Batch {} loss {:.6f} Grad Norm {:.6f} Time {:.6f}".format(iteration, reduced_loss, grad_norm, duration), end='\r', flush=True)

            iteration += 1
        validate(model, criterion, valset, iteration,
                 hparams.batch_size, n_gpus, collate_fn, logger,
                 hparams.distributed_run, rank, epoch, start_eposh, learning_rate)
        save_checkpoint(model, optimizer, learning_rate, iteration, checkpoint_path)
        if log_directory2 != None:
            copy_tree(log_directory, log_directory2)
def check_dataset(hparams):
    from utils import load_wav_to_torch, load_filepaths_and_text
    import os
    import numpy as np
    def check_arr(filelist_arr):
        for i, file in enumerate(filelist_arr):
            if len(file) > 2:
                print("|".join(file), "\nhas multiple '|', this may not be an error.")
            if hparams.load_mel_from_disk and '.wav' in file[0]:
                print("[WARNING]", file[0], " in filelist while expecting .npy .")
            else:
                if not hparams.load_mel_from_disk and '.npy' in file[0]:
                    print("[WARNING]", file[0], " in filelist while expecting .wav .")
            if (not os.path.exists(file[0])):
                print("|".join(file), "\n[WARNING] does not exist.")
            if len(file[1]) < 3:
                print("|".join(file), "\n[info] has no/very little text.")
            if not ((file[1].strip())[-1] in r"!?,.;:"):
                print("|".join(file), "\n[info] has no ending punctuation.")
            mel_length = 1
            if hparams.load_mel_from_disk and '.npy' in file[0]:
                melspec = torch.from_numpy(np.load(file[0], allow_pickle=True))
                mel_length = melspec.shape[1]
            if mel_length == 0:
                print("|".join(file), "\n[WARNING] has 0 duration.")
    print("Checking Training Files")
    audiopaths_and_text = load_filepaths_and_text(hparams.training_files) # get split lines from training_files text file.
    check_arr(audiopaths_and_text)
    print("Checking Validation Files")
    audiopaths_and_text = load_filepaths_and_text(hparams.validation_files) # get split lines from validation_files text file.
    check_arr(audiopaths_and_text)
    print("Finished Checking")

warm_start=False#sorry bout that
n_gpus=1
rank=0
group_name=None

# ---- DEFAULT PARAMETERS DEFINED HERE ----
hparams = create_hparams()
model_filename = 'current_model'
hparams.training_files = "filelists/clipper_train_filelist.txt"
hparams.validation_files = "filelists/clipper_val_filelist.txt"
#hparams.use_mmi=True,          # not used in this notebook
#hparams.use_gaf=True,          # not used in this notebook
#hparams.max_gaf=0.5,           # not used in this notebook
#hparams.drop_frame_rate = 0.2  # not used in this notebook
hparams.p_attention_dropout=0.1
hparams.p_decoder_dropout=0.1
hparams.decay_start = 15000
hparams.A_ = 5e-4
hparams.B_ = 8000
hparams.C_ = 0
hparams.min_learning_rate = 1e-5
generate_mels = True
hparams.show_alignments = True
alignment_graph_height = 600
alignment_graph_width = 1000
hparams.batch_size = 32
hparams.load_mel_from_disk = True
hparams.ignore_layers = []
hparams.epochs = 10000
torch.backends.cudnn.enabled = hparams.cudnn_enabled
torch.backends.cudnn.benchmark = hparams.cudnn_benchmark
output_directory = '/content/drive/My Drive/colab/outdir' # Location to save Checkpoints
log_directory = '/content/tacotron2/logs' # Location to save Log files locally
log_directory2 = '/content/drive/My Drive/colab/logs' # Location to copy log files (done at the end of each epoch to cut down on I/O)
checkpoint_path = output_directory+(r'/')+model_filename

# ---- Replace .wav with .npy in filelists ----
!sed -i -- 's,.wav|,.npy|,g' filelists/*.txt
!sed -i -- 's,.wav|,.npy|,g' {hparams.training_files}
!sed -i -- 's,.wav|,.npy|,g' {hparams.validation_files}
# ---- Replace .wav with .npy in filelists ----

%cd /content/tacotron2

data_path = 'wavs'
!mkdir {data_path}

  return s in _symbol_to_id and s is not '_' and s is not '~'
  return s in _symbol_to_id and s is not '_' and s is not '~'


sed: can't read filelists/clipper_train_filelist.txt: No such file or directory
sed: can't read filelists/clipper_val_filelist.txt: No such file or directory
/content
mkdir: cannot create directory ‘wavs’: File exists


In [11]:
#@markdown ## **3** Set the model parameters
#@markdown ---
#@markdown #### The name of the TT2 model
model_filename = 'Tacotron_2_model' #@param {type: "string"}

#@markdown #### Import the transcript .txt file into the /content/tacotron2/filelists folder on the left.
Training_file = "filelists/transcript.txt" #@param {type: "string"}
hparams.training_files = Training_file
hparams.validation_files = Training_file

# hparams to Tune
#hparams.use_mmi=True,          # not used in this notebook
#hparams.use_gaf=True,          # not used in this notebook
#hparams.max_gaf=0.5,           # not used in this notebook
#hparams.drop_frame_rate = 0.2  # not used in this notebook
hparams.p_attention_dropout=0.1
hparams.p_decoder_dropout=0.1

# Learning Rate             # https://www.desmos.com/calculator/ptgcz4vzsw / https://cdn.discordapp.com/attachments/841461499946860576/993611452499902504/scrnli_7_4_2022_1-16-48_PM.png
hparams.decay_start = 15000         # wait till decay_start to start decaying learning rate
hparams.B_ = 8000                   # Decay Rate
hparams.C_ = 0                      # Shift learning rate equation by this value

# Quality of Life
generate_mels = True
hparams.show_alignments = True
alignment_graph_height = 600
alignment_graph_width = 1000

#@markdown #### The amount of epochs to train the model for.
#@markdown ###### You probably should not change this parameter, one of the reasons being that training a legacy TT2 model for an unnecessarily larger epoch amount does **not** result in it being better ultimately (Read the batch size paragraph).
hparams.epochs = 200 #@param {type: "integer"}

#@markdown #### The batch size. Lower if you don't have enough RAM.
#@markdown ###### If the GPU you've been allocated is a K80, use a batch size that is no larger than `16`, and if the GPU you've been allocated is a P100, use a batch size that is no larger than `32`.
#@markdown ###### Furthermore, training the model with a low batch size will give it more time to learn in the course of it being trained.
#@markdown ###### In particular, `4` is the ideal batch size to use when it comes to training a non-ARPAbet Tacotron 2 model, and the batch size of `8` should be used for training an ARPAbet TT2 model.
#@markdown ###### However, even though training a model with a low batch size gives it more time to learn, training a model with a batch size that is **too** low causes it to learn **too much** in the course of it being trained, which consequently overfits it.
#@markdown ###### That outcome is also what occurs as a result of a model being trained for too many epochs. If you're not sure about which one of the aforementioned batch sizes to choose to train the model with, use the preset batch size, which is `8` by default.
hparams.batch_size = 8 #@param {type: "integer"}
hparams.load_mel_from_disk = True
hparams.ignore_layers = [] # Layers to reset (None by default, other than foreign languages this param can be ignored)

#@markdown #### The learning rate and the minimum learning rate
hparams.learning_rate = 5e-4 #@param
hparams.min_learning_rate = 1e-5 #@param

torch.backends.cudnn.enabled = hparams.cudnn_enabled
torch.backends.cudnn.benchmark = hparams.cudnn_benchmark

#@markdown #### Where to save the model when training
output_directory = '/content/drive/MyDrive/Tacotron 2/models' #@param {type: "string"}
log_directory = '/content/tacotron2/logs' # Location to save Log files locally
log_directory2 = '/content/drive/MyDrive/Tacotron 2/logs' # Location to copy log files (done at the end of each epoch to cut down on I/O)
checkpoint_path = output_directory+(r'/')+model_filename

#@markdown ---

In [13]:
#@markdown ## **4** Convert the wavs into mel spectrograms
#@markdown ### This cell also checks for missing files
print("Generating mels")
if generate_mels:
    create_mels()

print("Checking for missing files")
# ---- Replace .wav with .npy in filelists ----
!sed -i -- 's,.wav|,.npy|,g' "{hparams.training_files}"; sed -i -- 's,.wav|,.npy|,g' "{hparams.validation_files}"

check_dataset(hparams)

Generating mels
Generating Mels


  0%|          | 0/100 [00:00<?, ?it/s]

Checking for missing files
Checking Training Files
 


IndexError: ignored

In [None]:
#@markdown # **5** Train the model

print('FP16 Run:', hparams.fp16_run)
print('Dynamic Loss Scaling:', hparams.dynamic_loss_scaling)
print('Distributed Run:', hparams.distributed_run)
print('cuDNN Enabled:', hparams.cudnn_enabled)
print('cuDNN Benchmark:', hparams.cudnn_benchmark)
train(output_directory, log_directory, checkpoint_path,
      warm_start, n_gpus, rank, group_name, hparams, log_directory2)

# **Now, the eggheads think this is what good training looks like:**

![img.png](https://media.discordapp.net/attachments/835971020569051216/851469553355587614/download_2.png)

# **But I think... it looks more like this:**

![img.png](https://cdn.discordapp.com/attachments/625091103887982602/992537047879393450/Screenshotter--YouTube-KingOfTheHillAdsStricklandMessageadultswimhistory-008.jpg)

JS to prevent idle timeout:

Press F12 OR CTRL + SHIFT + I OR right click on this website -> inspect;
then click on the console tab and paste in the following code:

```javascript
function ClickConnect(){
console.log("Working");
document.querySelector("colab-toolbar-button#connect").click()
}
setInterval(ClickConnect,60000)
```