# TalkNet Training
_Last updated: 2021-06-23_

To train a 22 kHz TalkNet, run the cells below and follow the instructions.

This will take a while, and you might have to do it in multiple Colab sessions. The notebook will automatically resume training any models from the last saved checkpoint. If you're resuming from a new session, always re-run steps 1 through 5 first.
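
The training cells later in this notebook look inside the newest run folder for a file whose name contains `last.ckpt`. A minimal sketch of that search, assuming run folders are named by timestamp so that sorting by name puts the newest last; `find_last_checkpoint` is a hypothetical helper and is not part of the notebook:

```python
from pathlib import Path

def find_last_checkpoint(model_dir):
    """Return the newest run's '*last.ckpt*' file, or None if nothing is found."""
    root = Path(model_dir)
    if not root.is_dir():
        return None
    # Timestamped folder names sort chronologically, so walk them newest first.
    for run in sorted(root.iterdir(), reverse=True):
        ckpt_dir = run / "checkpoints"
        if not ckpt_dir.is_dir():
            continue
        matches = sorted(p for p in ckpt_dir.iterdir() if "last.ckpt" in p.name)
        if matches:
            return matches[0]
    return None
```

If this returns `None`, there is nothing to resume from and training would start from the pre-trained model instead.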

## **IMPORTANT:**
Your Trash folder on Drive will fill up with old checkpoints as you train the various models. Keep an eye on your Drive storage, and empty the Trash if it starts to fill up.
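
To see how much space the current checkpoints take, you can total the `.ckpt` files under your output folder. This is a rough sketch; `checkpoint_usage` is a hypothetical helper, and it only counts files still in the folder, not anything already moved to the Trash:

```python
import os

def checkpoint_usage(root, suffix=".ckpt"):
    """Return (file_count, total_bytes) for files under root ending in suffix."""
    count, total = 0, 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                count += 1
                total += os.path.getsize(os.path.join(dirpath, name))
    return count, total
```

Once the output path is configured, a call such as `checkpoint_usage(output_dir)` gives a quick size check between training sessions.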

In [None]:
#@markdown **Step 1:** Check which GPU you've been allocated.

#@markdown You want a P100, V100 or T4. 
#@markdown If you get a P4 or K80, factory reset the runtime and try again.
!nvidia-smi -L
!nvidia-smi

GPU 0: Tesla P100-PCIE-16GB (UUID: GPU-4d3b4a6e-7b22-a6a9-d9c9-c29b0296cf5b)
Thu Jul 22 19:00:27 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.42.01    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   48C    P0    32W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+----------
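
If you would rather check the GPU in code than read the table, the `nvidia-smi -L` line can be parsed. A small sketch, assuming the line format shown in the output above; `gpu_name` and `is_preferred` are hypothetical helpers:

```python
import re

PREFERRED = ("P100", "V100", "T4")  # GPU families recommended in Step 1

def gpu_name(listing_line):
    """Extract the GPU name from one line of `nvidia-smi -L` output."""
    match = re.match(r"GPU \d+: (.+?) \(UUID:", listing_line)
    return match.group(1) if match else None

def is_preferred(listing_line):
    """True if the listed GPU name mentions one of the preferred families."""
    name = gpu_name(listing_line)
    return name is not None and any(tag in name for tag in PREFERRED)

line = "GPU 0: Tesla P100-PCIE-16GB (UUID: GPU-4d3b4a6e-7b22-a6a9-d9c9-c29b0296cf5b)"
print(gpu_name(line), is_preferred(line))
```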

In [None]:
#@markdown **Step 2:** Mount Google Drive.
from google.colab import drive
drive.mount('drive')

Drive already mounted at drive; to attempt to forcibly remount, call drive.mount("drive", force_remount=True).


In [None]:
#@markdown **Step 3:** Configure training data paths. Upload the following to your Drive and change the paths below:
#@markdown * A dataset of .wav files, packaged as a .zip or .tar file
#@markdown * Training and validation filelists, in LJSpeech format with relative paths (note: ARPABET transcripts don't work yet)
#@markdown * An output path for checkpoints

import os

dataset = "/content/drive/My Drive/colab/queenwavs.zip" #@param {type:"string"}
train_filelist = "/content/drive/My Drive/colab/queenlist.txt" #@param {type:"string"}
val_filelist = "/content/drive/My Drive/colab/queenlist.txt" #@param {type:"string"}
output_dir = "/content/drive/My Drive/talknet/QueenTalk" #@param {type:"string"}
assert os.path.exists(dataset), "Cannot find dataset"
assert os.path.exists(train_filelist), "Cannot find training filelist"
assert os.path.exists(val_filelist), "Cannot find validation filelist"
os.makedirs(output_dir, exist_ok=True)
print("OK")


OK
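
The filelists use the LJSpeech layout, one `relative/path.wav|transcript` pair per line. Before uploading, it can help to confirm that every line splits cleanly. A minimal sketch; `parse_filelist` is a hypothetical helper and the sample text is made up:

```python
def parse_filelist(text):
    """Split LJSpeech-style lines into (path, transcript) pairs.

    Returns the parsed entries and the 1-based numbers of malformed lines.
    """
    entries, problems = [], []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split("|")
        if len(parts) < 2 or not parts[0].strip() or not parts[1].strip():
            problems.append(number)
            continue
        entries.append((parts[0].strip(), parts[1].strip()))
    return entries, problems

sample = "wavs/0001.wav|Hello there.\n\nbroken line without a separator\n0002.wav|Second line.\n"
entries, problems = parse_filelist(sample)
print(entries)
print(problems)
```

Any line number reported in `problems` is worth fixing before running the dataset processing steps.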


In [None]:
#@markdown **Step 4:** Download NVIDIA NeMo.
%tensorflow_version 2.x
import os
import time

os.chdir('/content')
!apt-get install sox libsndfile1 ffmpeg
!pip install wget unidecode tensorboardX pysptk frozendict torch_stft torchtext==0.9.1 torchaudio==0.8.0 kaldiio pydub pyannote.audio g2p_en pesq pystoi crepe ffmpeg-python
!python -m pip install git+https://github.com/SortAnon/NeMo.git
!git clone -q https://github.com/SortAnon/hifi-gan.git

!mkdir -p conf && cd conf \
&& wget https://raw.githubusercontent.com/SortAnon/NeMo/main/examples/tts/conf/talknet-durs.yaml \
&& wget https://raw.githubusercontent.com/SortAnon/NeMo/main/examples/tts/conf/talknet-pitch.yaml \
&& wget https://raw.githubusercontent.com/SortAnon/NeMo/main/examples/tts/conf/talknet-spect.yaml \
&& cd ..

# Download pre-trained models (retry up to 10 times until the zip exists and is larger than 100 bytes)
zip_path = "tts_en_talknet_1.0.0rc1.zip"
for i in range(10):
    if not os.path.exists(zip_path) or os.stat(zip_path).st_size < 100:
        time.sleep(1)
        !wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_talknet/versions/1.0.0rc1/zip -O {zip_path}
!unzip -qo {zip_path}

Reading package lists... Done
Building dependency tree       
Reading state information... Done
libsndfile1 is already the newest version (1.0.28-4ubuntu0.18.04.1).
ffmpeg is already the newest version (7:3.4.8-0ubuntu0.2).
sox is already the newest version (14.4.2-3ubuntu0.18.04.1).
0 upgraded, 0 newly installed, 0 to remove and 40 not upgraded.
Collecting tensorboardX
  Using cached tensorboardX-2.4-py2.py3-none-any.whl (124 kB)
Collecting torchtext==0.9.1
  Using cached torchtext-0.9.1-cp37-cp37m-manylinux1_x86_64.whl (7.1 MB)
Collecting torchaudio==0.8.0
  Using cached torchaudio-0.8.0-cp37-cp37m-manylinux1_x86_64.whl (1.9 MB)
Collecting ffmpeg-python
  Using cached ffmpeg_python-0.2.0-py3-none-any.whl (25 kB)
Collecting torch==1.8.1
  Using cached torch-1.8.1-cp37-cp37m-manylinux1_x86_64.whl (804.1 MB)
INFO: pip is looking at multiple versions of <Python from Requires-Python> to determine which version is compatible with other requirements. This could take a while.
INFO: pip is lo

In [None]:
#@markdown **Step 5:** Dataset processing, part 1.

#@markdown If this step fails, try the following:
#@markdown * Make sure your filelists are correct. They should have relative 
#@markdown paths that match the contents of the archive.

import os
import shutil
import sys
import json
import nemo
import torch
!pip install torchaudio
!pip install pysptk
!pip install ffmpeg
import torchaudio
import numpy as np
from pysptk import sptk
from pathlib import Path
from tqdm.notebook import tqdm
import ffmpeg

def generate_json(inpath, outpath):
    output = ""
    sample_rate = 22050
    with open(inpath, "r", encoding="utf8") as f:
        for l in f.readlines():
            if not l.strip():
                continue  # skip blank lines instead of failing on the split below
            lpath = l.split("|")[0].strip()
            if lpath[:5] != "wavs/":
                lpath = "wavs/" + lpath
            size = os.stat(
                os.path.join(os.path.dirname(inpath), lpath)
            ).st_size
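            x = {
                "audio_filepath": lpath,
                # Estimate that assumes 16-bit mono PCM (2 bytes per sample);
                # the WAV header is counted too, so it runs slightly long.
                "duration": size / (sample_rate * 2),
                "text": l.split("|")[1].strip(),
            }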
            output += json.dumps(x) + "\n"
        with open(outpath, "w", encoding="utf8") as w:
            w.write(output)
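import subprocess
def convert_to_22k(inpath):
    # Convert non-WAV audio to a 22050 Hz WAV next to the source file.
    # Files that already end in .wav are left untouched (not resampled).
    root, ext = os.path.splitext(inpath.strip())
    if ext.lower() != ".wav":
        # Assumption: 16-bit PCM output, matching the 2-bytes-per-sample
        # duration estimate in generate_json.
        subprocess.run(
            ["ffmpeg", "-y", "-i", inpath, "-acodec", "pcm_s16le", "-ar", "22050", root + ".wav"],
            check=True,
        )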

# Extract dataset
os.chdir('/content')
if os.path.exists("/content/wavs"):
    shutil.rmtree("/content/wavs")
os.mkdir("wavs")
os.chdir("wavs")
if dataset[-4:] == ".zip":
    !unzip -q "{dataset}"
elif dataset[-4:] == ".tar":
    !tar -xf "{dataset}"
else:
    raise Exception("Unknown extension for dataset")
if os.path.exists("/content/wavs/wavs"):
    shutil.move("/content/wavs/wavs", "/content/tempwavs")
    shutil.rmtree("/content/wavs")
    shutil.move("/content/tempwavs", "/content/wavs")

# Filelist for preprocessing
os.chdir('/content')
shutil.copy(train_filelist, "trainfiles.txt")
shutil.copy(val_filelist, "valfiles.txt")
seen_files = []
with open("trainfiles.txt") as f:
    t = f.read().split("\n")
with open("valfiles.txt") as f:
    v = f.read().split("\n")
    all_filelist = t[:] + v[:]
with open("/content/allfiles.txt", "w") as f:
    for x in all_filelist:
        if x.strip() == "":
            continue
        if x.split("|")[0] not in seen_files:
            seen_files.append(x.split("|")[0])
            f.write(x.strip() + "\n")

# Ensure audio is 22k
print("Converting audio...")
for r, _, f in os.walk("/content/wavs"):
    for name in tqdm(f):
        convert_to_22k(os.path.join(r, name))

# Convert to JSON
generate_json("trainfiles.txt", "trainfiles.json")
generate_json("valfiles.txt", "valfiles.json")
generate_json("allfiles.txt", "allfiles.json")

print("OK")

Converting audio...



OK
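
Step 5 estimates each clip's duration as file size divided by `sample_rate * 2`, which assumes 16-bit mono PCM and counts the header bytes as audio. The sketch below compares that estimate with the duration read from the WAV header using the standard-library `wave` module. The function names are hypothetical, and the test file is one second of generated silence:

```python
import os
import tempfile
import wave

def estimated_duration(path, sample_rate=22050):
    """Same estimate as generate_json: bytes / (rate * 2 bytes per sample)."""
    return os.stat(path).st_size / (sample_rate * 2)

def header_duration(path):
    """Duration from the WAV header: frame count / frame rate."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

# Write one second of 16-bit mono silence to compare the two.
path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(22050)
    w.writeframes(b"\x00\x00" * 22050)

print(header_duration(path), estimated_duration(path))
```

The two values agree closely for 16-bit mono files. For stereo or 8-bit audio the size-based estimate would be off by a constant factor.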


In [None]:
#@markdown **Step 6:** Dataset processing, part 2. This takes a while, but
#@markdown you only have to run this once per dataset (results are saved to Drive).

#@markdown If this step fails, try the following:
#@markdown * Make sure your dataset only contains WAV files.

# Extract phoneme duration
!pip install crepe
import json
from nemo.collections.asr.models import EncDecCTCModel
asr_model = EncDecCTCModel.from_pretrained(model_name="asr_talknet_aligner").cpu().eval()

def forward_extractor(tokens, log_probs, blank):
    """Computes states f and p."""
    n, m = len(tokens), log_probs.shape[0]
    # `f[s, t]` -- max sum of log probs for `s` first codes
    # with `t` first timesteps with ending in `tokens[s]`.
    f = np.empty((n + 1, m + 1), dtype=float)
    f.fill(-(10 ** 9))
    p = np.empty((n + 1, m + 1), dtype=int)
    f[0, 0] = 0.0  # Start
    for s in range(1, n + 1):
        c = tokens[s - 1]
        for t in range((s + 1) // 2, m + 1):
            f[s, t] = log_probs[t - 1, c]
            # Option #1: prev char is equal to current one.
            if s == 1 or c == blank or c == tokens[s - 3]:
                options = f[s : (s - 2 if s > 1 else None) : -1, t - 1]
            else:  # Is not equal to current one.
                options = f[s : (s - 3 if s > 2 else None) : -1, t - 1]
            f[s, t] += np.max(options)
            p[s, t] = np.argmax(options)
    return f, p


def backward_extractor(f, p):
    """Computes durs from f and p."""
    n, m = f.shape
    n -= 1
    m -= 1
    durs = np.zeros(n, dtype=int)
    if f[-1, -1] >= f[-2, -1]:
        s, t = n, m
    else:
        s, t = n - 1, m
    while s > 0:
        durs[s - 1] += 1
        s -= p[s, t]
        t -= 1
    assert durs.shape[0] == n
    assert np.sum(durs) == m
    assert np.all(durs[1::2] > 0)
    return durs

def preprocess_tokens(tokens, blank):
    new_tokens = [blank]
    for c in tokens:
        new_tokens.extend([c, blank])
    tokens = new_tokens
    return tokens

data_config = {
    'manifest_filepath': "allfiles.json",
    'sample_rate': 22050,
    'labels': asr_model.decoder.vocabulary,
    'batch_size': 1,
}

parser = nemo.collections.asr.data.audio_to_text.AudioToCharWithDursF0Dataset.make_vocab(
    notation='phonemes', punct=True, spaces=True, stresses=False, add_blank_at="last"
)

# Named asr_dataset so the dataset archive path set in Step 3 is not overwritten.
asr_dataset = nemo.collections.asr.data.audio_to_text._AudioTextDataset(
    manifest_filepath=data_config['manifest_filepath'], sample_rate=data_config['sample_rate'], parser=parser,
)

dl = torch.utils.data.DataLoader(
    dataset=asr_dataset, batch_size=data_config['batch_size'], collate_fn=asr_dataset.collate_fn, shuffle=False,
)

blank_id = asr_model.decoder.num_classes_with_blank - 1

if os.path.exists(os.path.join(output_dir, "durations.pt")):
    print("durations.pt already exists; skipping")
else:
    dur_data = {}
    for sample_idx, test_sample in tqdm(enumerate(dl), total=len(dl)):
        log_probs, _, greedy_predictions = asr_model(
            input_signal=test_sample[0], input_signal_length=test_sample[1]
        )

        log_probs = log_probs[0].cpu().detach().numpy()
        seq_ids = test_sample[2][0].cpu().detach().numpy()

        target_tokens = preprocess_tokens(seq_ids, blank_id)

        f, p = forward_extractor(target_tokens, log_probs, blank_id)
        durs = backward_extractor(f, p)

        dur_key = Path(dl.dataset.collection[sample_idx].audio_file).stem
        dur_data[dur_key] = {
            'blanks': torch.tensor(durs[::2], dtype=torch.long).cpu().detach(), 
            'tokens': torch.tensor(durs[1::2], dtype=torch.long).cpu().detach()
        }

        del test_sample

    torch.save(dur_data, os.path.join(output_dir, "durations.pt"))

# Extract F0 (pitch)
import crepe
from scipy.io import wavfile

def crepe_f0(audio_file, hop_length=256):
    sr, audio = wavfile.read(audio_file)
    audio_x = np.arange(0, len(audio)) / 22050.0
    time, frequency, confidence, activation = crepe.predict(audio, sr, viterbi=True)

    x = np.arange(0, len(audio), hop_length) / 22050.0
    freq_interp = np.interp(x, time, frequency)
    conf_interp = np.interp(x, time, confidence)
    audio_interp = np.interp(x, audio_x, np.absolute(audio)) / 32768.0
    weights = [0.5, 0.25, 0.25]
    audio_smooth = np.convolve(audio_interp, np.array(weights)[::-1], "same")

    conf_threshold = 0.25
    audio_threshold = 0.0005
    for i in range(len(freq_interp)):
        if conf_interp[i] < conf_threshold:
            freq_interp[i] = 0.0
        if audio_smooth[i] < audio_threshold:
            freq_interp[i] = 0.0

    # Hack to make f0 and mel lengths equal
    if len(audio) % hop_length == 0:
        freq_interp = np.pad(freq_interp, pad_width=[0, 1])
    return torch.from_numpy(freq_interp.astype(np.float32))

if os.path.exists(os.path.join(output_dir, "f0s.pt")):
    print("f0s.pt already exists; skipping")
else:
    f0_data = {}
    with open("allfiles.json") as f:
        for i, l in enumerate(f.readlines()):
            print(str(i))
            audio_path = json.loads(l)["audio_filepath"]
            f0_data[Path(audio_path).stem] = crepe_f0(audio_path)

    # calculate f0 stats (mean & std) only for train set
    with open("trainfiles.json") as f:
        train_ids = {Path(json.loads(l)["audio_filepath"]).stem for l in f}
    all_f0 = torch.cat([f0[f0 >= 1e-5] for f0_id, f0 in f0_data.items() if f0_id in train_ids])

    F0_MEAN, F0_STD = all_f0.mean().item(), all_f0.std().item()        
    print("F0_MEAN: " + str(F0_MEAN) + ", F0_STD: " + str(F0_STD))
    torch.save(f0_data, os.path.join(output_dir, "f0s.pt"))
    with open(os.path.join(output_dir, "f0_info.json"), "w") as f:
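        # Note: the mean is stored under "FO_MEAN" (letter O, not zero); Step 8 reads it back with the same key.
        f.write(json.dumps({"FO_MEAN": F0_MEAN, "F0_STD": F0_STD}))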



[NeMo W 2021-07-22 19:02:06 optimizers:47] Apex was not found. Using the lamb optimizer will error out.


[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package cmudict to /root/nltk_data...
[nltk_data]   Package cmudict is already up-to-date!


[NeMo W 2021-07-22 19:02:12 experimental:28] Module <class 'nemo.collections.asr.data.audio_to_text_dali.AudioToCharDALIDataset'> is experimental, not ready for production and is not fully supported. Use at your own risk.


[NeMo I 2021-07-22 19:02:12 cloud:56] Found existing object /root/.cache/torch/NeMo/NeMo_1.0.2/qn5x5_libri_tts_phonemes/656c7439dd3a0d614978529371be498b/qn5x5_libri_tts_phonemes.nemo.
[NeMo I 2021-07-22 19:02:12 cloud:62] Re-using file from: /root/.cache/torch/NeMo/NeMo_1.0.2/qn5x5_libri_tts_phonemes/656c7439dd3a0d614978529371be498b/qn5x5_libri_tts_phonemes.nemo
[NeMo I 2021-07-22 19:02:12 common:676] Instantiating model from pre-trained checkpoint


[NeMo W 2021-07-22 19:02:13 features:230] Using torch_stft is deprecated and will be removed in 1.1.0. Please set stft_conv and stft_exact_pad to False for FilterbankFeatures and AudioToMelSpectrogramPreprocessor. Please set exact_pad to True as needed.


[NeMo I 2021-07-22 19:02:13 features:252] PADDING: 1
[NeMo I 2021-07-22 19:02:13 features:262] STFT using conv
[NeMo I 2021-07-22 19:02:17 modelPT:439] Model EncDecCTCModel was successfully restored from /root/.cache/torch/NeMo/NeMo_1.0.2/qn5x5_libri_tts_phonemes/656c7439dd3a0d614978529371be498b/qn5x5_libri_tts_phonemes.nemo.
[NeMo I 2021-07-22 19:02:17 collections:173] Dataset loaded with 75 files totalling 0.05 hours
[NeMo I 2021-07-22 19:02:17 collections:174] 0 files were filtered totalling 0.00 hours
durations.pt already exists; skipping
f0s.pt already exists; skipping
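
The duration extraction in Step 6 places a blank before, between, and after the phoneme tokens, then stores the per-position frame counts as separate `blanks` and `tokens` lists. A toy sketch of that layout follows. The token ids, blank id, and frame counts are made up for illustration, and `interleave_blanks` and `split_durations` are hypothetical names that mirror `preprocess_tokens` and the `durs[::2]` / `durs[1::2]` slicing:

```python
def interleave_blanks(tokens, blank):
    """Mirror of preprocess_tokens: blank, t0, blank, t1, ..., blank."""
    out = [blank]
    for t in tokens:
        out.extend([t, blank])
    return out

def split_durations(durs):
    """Even positions belong to blanks, odd positions to real tokens."""
    return durs[::2], durs[1::2]

tokens = [7, 3, 7]            # hypothetical phoneme ids
blank = 99                    # hypothetical blank id
expanded = interleave_blanks(tokens, blank)

durs = [2, 4, 0, 3, 1, 5, 2]  # made-up frame counts, one per expanded position
blanks, toks = split_durations(durs)
print(expanded)
print(blanks, toks, sum(durs))
```

For `n` tokens there are `2n + 1` positions, so there is always one more blank duration than token durations. The frame counts sum to the number of frames in the clip.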


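Step 6 computes `F0_MEAN` and `F0_STD` from voiced frames only, that is, frames whose F0 is at least `1e-5`, and only from the training files. The sketch below shows the same calculation in NumPy rather than torch, on made-up pitch tracks. `voiced_stats` is a hypothetical name, and `ddof=1` is used because `torch.Tensor.std` defaults to the unbiased estimate:

```python
import numpy as np

def voiced_stats(f0_tracks, floor=1e-5):
    """Mean and sample standard deviation over frames with F0 >= floor."""
    voiced = np.concatenate([t[t >= floor] for t in f0_tracks])
    return float(voiced.mean()), float(voiced.std(ddof=1))

# Two made-up tracks; zeros mark unvoiced frames and are ignored.
tracks = [np.array([0.0, 100.0, 200.0, 0.0]), np.array([300.0, 0.0])]
mean, std = voiced_stats(tracks)
print(mean, std)
```

Leaving out the zeroed frames matters: including them would pull the mean toward zero and inflate the standard deviation.
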
In [None]:
#@markdown **Step 7:** Train duration predictor.

#@markdown If CUDA runs out of memory, try the following:
#@markdown * Click on Runtime -> Restart runtime, re-run step 3, and try again.
#@markdown * If that doesn't help, reduce the batch size (default 64).
!pip install pystoi
batch_size = 64 #@param {type:"integer"}

epochs = 20
learning_rate = 1e-3
min_learning_rate = 3e-6
load_checkpoints = True

import os
from hydra.experimental import compose, initialize
from hydra.core.global_hydra import GlobalHydra
from omegaconf import OmegaConf
import pytorch_lightning as pl
from nemo.collections.common.callbacks import LogEpochTimeCallback
from nemo.collections.tts.models import TalkNetDursModel
from nemo.core.config import hydra_runner
from nemo.utils.exp_manager import exp_manager

def train(cfg):
    cfg.sample_rate = 22050
    cfg.train_dataset = "trainfiles.json"
    cfg.validation_datasets = "valfiles.json"
    cfg.durs_file = os.path.join(output_dir, "durations.pt")
    cfg.f0_file = os.path.join(output_dir, "f0s.pt")
    cfg.trainer.accelerator = "dp"
    cfg.trainer.max_epochs = epochs
    cfg.trainer.check_val_every_n_epoch = 5
    cfg.model.train_ds.dataloader_params.batch_size = batch_size
    cfg.model.validation_ds.dataloader_params.batch_size = batch_size
    cfg.model.optim.lr = learning_rate
    cfg.model.optim.sched.min_lr = min_learning_rate
    cfg.exp_manager.exp_dir = output_dir

    # Find checkpoints
    ckpt_path = ""
    if load_checkpoints:
      path0 = os.path.join(output_dir, "TalkNetDurs")
      if os.path.exists(path0):
          path1 = sorted(os.listdir(path0))
          for i in range(len(path1)):
              path2 = os.path.join(path0, path1[-(1+i)], "checkpoints")
              if os.path.exists(path2):
                  match = [x for x in os.listdir(path2) if "last.ckpt" in x]
                  if len(match) > 0:
                      ckpt_path = os.path.join(path2, match[0])
                      print("Resuming training from " + match[0])
                      break
    
    if ckpt_path != "":
        trainer = pl.Trainer(**cfg.trainer, resume_from_checkpoint = ckpt_path)
        model = TalkNetDursModel(cfg=cfg.model, trainer=trainer)
    else:
        warmstart_path = "/content/talknet_durs.nemo"
        trainer = pl.Trainer(**cfg.trainer)
        model = TalkNetDursModel.restore_from(warmstart_path, override_config_path=cfg)
        model.set_trainer(trainer)
        model.setup_training_data(cfg.model.train_ds)
        model.setup_validation_data(cfg.model.validation_ds)
        model.setup_optimization(cfg.model.optim)
        print("Warm-starting from " + warmstart_path)
    exp_manager(trainer, cfg.get('exp_manager', None))
    trainer.callbacks.extend([pl.callbacks.LearningRateMonitor(), LogEpochTimeCallback()])  # noqa
    trainer.fit(model)

GlobalHydra().clear()
initialize(config_path="conf")
cfg = compose(config_name="talknet-durs")
train(cfg)




      message="hydra.experimental.initialize() is no longer experimental."
    
      message="hydra.experimental.compose() is no longer experimental."
    
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
[NeMo W 2021-07-22 19:02:22 modelPT:139] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    dataset:
      _target_: nemo.collections.asr.data.audio_to_text.AudioToCharWithDursF0Dataset
      manifest_filepath: trainfiles.json
      max_duration: null
      min_duration: 0.1
      int_values: false
      load_audio: false
      normalize: false
      sample_rate: 22050
      trim: false
      durs_file: /content/drive/My Drive/talknet/QueenTalk/durations.pt
      f0_file: /content/drive/My Drive/talknet/QueenTalk/f0s.pt
      blanking: true
      vocab:
        notation: phonemes
        punct: true
        spaces: true


[NeMo I 2021-07-22 19:02:22 modelPT:439] Model TalkNetDursModel was successfully restored from /content/talknet_durs.nemo.
[NeMo I 2021-07-22 19:02:22 collections:173] Dataset loaded with 75 files totalling 0.05 hours
[NeMo I 2021-07-22 19:02:22 collections:174] 0 files were filtered totalling 0.00 hours




[NeMo I 2021-07-22 19:02:22 collections:173] Dataset loaded with 75 files totalling 0.05 hours
[NeMo I 2021-07-22 19:02:22 collections:174] 0 files were filtered totalling 0.00 hours


[NeMo W 2021-07-22 19:02:23 modelPT:661] The lightning trainer received accelerator: dp. We recommend to use 'ddp' instead.


[NeMo I 2021-07-22 19:02:23 modelPT:751] Optimizer config = Adam (
    Parameter Group 0
        amsgrad: False
        betas: (0.9, 0.999)
        eps: 1e-08
        lr: 0.001
        weight_decay: 1e-06
    )
[NeMo I 2021-07-22 19:02:23 lr_scheduler:625] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x7fa95a0a4790>" 
    will be used during training (effective maximum steps = 40) - 
    Parameters : 
    (min_lr: 3.0e-06
    warmup_ratio: 0.02
    max_steps: 40
    )
Warm-starting from /content/talknet_durs.nemo
[NeMo I 2021-07-22 19:02:23 exp_manager:216] Experiments will be logged at /content/drive/My Drive/talknet/QueenTalk/TalkNetDurs/2021-07-22_19-02-23
[NeMo I 2021-07-22 19:02:23 exp_manager:563] TensorboardLogger has been set up


      'Argument `period` in `ModelCheckpoint` is deprecated in v1.3 and will be removed in v1.5.'
    
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[NeMo W 2021-07-22 19:02:23 modelPT:661] The lightning trainer received accelerator: dp. We recommend to use 'ddp' instead.


[NeMo I 2021-07-22 19:02:23 modelPT:751] Optimizer config = Adam (
    Parameter Group 0
        amsgrad: False
        betas: (0.9, 0.999)
        eps: 1e-08
        lr: 0.001
        weight_decay: 1e-06
    )
[NeMo I 2021-07-22 19:02:23 lr_scheduler:625] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x7fa95a0927d0>" 
    will be used during training (effective maximum steps = 40) - 
    Parameters : 
    (min_lr: 3.0e-06
    warmup_ratio: 0.02
    max_steps: 40
    )



  | Name  | Type           | Params
-----------------------------------------
0 | embed | Embedding      | 7.6 K 
1 | model | ConvASREncoder | 2.5 M 
2 | proj  | Conv1d         | 513   
-----------------------------------------
2.5 M     Trainable params
0         Non-trainable params
2.5 M     Total params
9.841     Total estimated model params size (MB)


      "The signature of `Callback.on_train_epoch_end` has changed in v1.3."
    


Epoch 4, global step 9: val_acc reached 80.74003 (best 80.74003), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetDurs/2021-07-22_19-02-23/checkpoints/TalkNetDurs--val_acc=80.74-epoch=4.ckpt" as top 3


Epoch 9, global step 19: val_acc reached 81.45404 (best 81.45404), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetDurs/2021-07-22_19-02-23/checkpoints/TalkNetDurs--val_acc=81.45-epoch=9.ckpt" as top 3


Epoch 14, global step 29: val_acc reached 81.72182 (best 81.72182), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetDurs/2021-07-22_19-02-23/checkpoints/TalkNetDurs--val_acc=81.72-epoch=14.ckpt" as top 3


Epoch 19, global step 39: val_acc reached 81.97624 (best 81.97624), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetDurs/2021-07-22_19-02-23/checkpoints/TalkNetDurs--val_acc=81.98-epoch=19.ckpt" as top 3
Saving latest checkpoint...
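
The scheduler log above reports "effective maximum steps = 40". That figure is consistent with 75 files, a batch size of 64, and 20 epochs, assuming one optimizer step per batch with the final partial batch kept. How NeMo derives the number internally is not shown here, so treat this as a sanity check rather than the library's formula. `scheduler_steps` is a hypothetical helper:

```python
import math

def scheduler_steps(num_files, batch_size, epochs):
    """Batches per epoch (rounded up) times the number of epochs."""
    return math.ceil(num_files / batch_size) * epochs

print(scheduler_steps(75, 64, 20))  # duration predictor settings used above
print(scheduler_steps(75, 64, 50))  # same data with 50 epochs
```

With this dataset each epoch is only two steps, which also matches the log lines above where epoch 4 ends at global step 9.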





In [None]:
#@markdown **Step 8:** Train pitch predictor.

#@markdown If CUDA runs out of memory, try the following:
#@markdown * Click on Runtime -> Restart runtime, re-run step 3, and try again.
#@markdown * If that doesn't help, reduce the batch size (default 64).
batch_size = 64 #@param {type:"integer"}
epochs = 50

import json

with open(os.path.join(output_dir, "f0_info.json"), "r") as f:
    f0_info = json.load(f)
    f0_mean = f0_info["FO_MEAN"]  # key is spelled with a letter O, matching what Step 6 writes
    f0_std = f0_info["F0_STD"]

learning_rate = 1e-3
min_learning_rate = 3e-6
load_checkpoints = True

import os
from hydra.experimental import compose, initialize
from hydra.core.global_hydra import GlobalHydra
from omegaconf import OmegaConf
import pytorch_lightning as pl
from nemo.collections.common.callbacks import LogEpochTimeCallback
from nemo.collections.tts.models import TalkNetPitchModel
from nemo.core.config import hydra_runner
from nemo.utils.exp_manager import exp_manager

def train(cfg):
    cfg.sample_rate = 22050
    cfg.train_dataset = "trainfiles.json"
    cfg.validation_datasets = "valfiles.json"
    cfg.durs_file = os.path.join(output_dir, "durations.pt")
    cfg.f0_file = os.path.join(output_dir, "f0s.pt")
    cfg.trainer.accelerator = "dp"
    cfg.trainer.max_epochs = epochs
    cfg.trainer.check_val_every_n_epoch = 5
    cfg.model.f0_mean=f0_mean
    cfg.model.f0_std=f0_std
    cfg.model.train_ds.dataloader_params.batch_size = batch_size
    cfg.model.validation_ds.dataloader_params.batch_size = batch_size
    cfg.model.optim.lr = learning_rate
    cfg.model.optim.sched.min_lr = min_learning_rate
    cfg.exp_manager.exp_dir = output_dir

    # Find checkpoints
    ckpt_path = ""
    if load_checkpoints:
      path0 = os.path.join(output_dir, "TalkNetPitch")
      if os.path.exists(path0):
          path1 = sorted(os.listdir(path0))
          for i in range(len(path1)):
              path2 = os.path.join(path0, path1[-(1+i)], "checkpoints")
              if os.path.exists(path2):
                  match = [x for x in os.listdir(path2) if "last.ckpt" in x]
                  if len(match) > 0:
                      ckpt_path = os.path.join(path2, match[0])
                      print("Resuming training from " + match[0])
                      break
    
    if ckpt_path != "":
        trainer = pl.Trainer(**cfg.trainer, resume_from_checkpoint = ckpt_path)
        model = TalkNetPitchModel(cfg=cfg.model, trainer=trainer)
    else:
        warmstart_path = "/content/talknet_pitch.nemo"
        trainer = pl.Trainer(**cfg.trainer)
        model = TalkNetPitchModel.restore_from(warmstart_path, override_config_path=cfg)
        model.set_trainer(trainer)
        model.setup_training_data(cfg.model.train_ds)
        model.setup_validation_data(cfg.model.validation_ds)
        model.setup_optimization(cfg.model.optim)
        print("Warm-starting from " + warmstart_path)
    exp_manager(trainer, cfg.get('exp_manager', None))
    trainer.callbacks.extend([pl.callbacks.LearningRateMonitor(), LogEpochTimeCallback()])  # noqa
    trainer.fit(model)

GlobalHydra().clear()
initialize(config_path="conf")
cfg = compose(config_name="talknet-pitch")
train(cfg)

      message="hydra.experimental.initialize() is no longer experimental."
    
      message="hydra.experimental.compose() is no longer experimental."
    
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
[NeMo W 2021-07-22 19:02:39 modelPT:139] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    dataset:
      _target_: nemo.collections.asr.data.audio_to_text.AudioToCharWithDursF0Dataset
      manifest_filepath: trainfiles.json
      max_duration: null
      min_duration: 0.1
      int_values: false
      load_audio: false
      normalize: false
      sample_rate: 22050
      trim: false
      durs_file: /content/drive/My Drive/talknet/QueenTalk/durations.pt
      f0_file: /content/drive/My Drive/talknet/QueenTalk/f0s.pt
      blanking: true
      vocab:
        notation: phonemes
        punct: true
        spaces: true


[NeMo I 2021-07-22 19:02:40 modelPT:439] Model TalkNetPitchModel was successfully restored from /content/talknet_pitch.nemo.
[NeMo I 2021-07-22 19:02:40 collections:173] Dataset loaded with 75 files totalling 0.05 hours
[NeMo I 2021-07-22 19:02:40 collections:174] 0 files were filtered totalling 0.00 hours




[NeMo I 2021-07-22 19:02:40 collections:173] Dataset loaded with 75 files totalling 0.05 hours
[NeMo I 2021-07-22 19:02:40 collections:174] 0 files were filtered totalling 0.00 hours


[NeMo W 2021-07-22 19:02:40 modelPT:661] The lightning trainer received accelerator: dp. We recommend to use 'ddp' instead.


[NeMo I 2021-07-22 19:02:40 modelPT:751] Optimizer config = Adam (
    Parameter Group 0
        amsgrad: False
        betas: (0.9, 0.999)
        eps: 1e-08
        lr: 0.001
        weight_decay: 1e-06
    )
[NeMo I 2021-07-22 19:02:40 lr_scheduler:625] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x7fa959671a10>" 
    will be used during training (effective maximum steps = 100) - 
    Parameters : 
    (min_lr: 3.0e-06
    warmup_ratio: 0.02
    max_steps: 100
    )
Warm-starting from /content/talknet_pitch.nemo
[NeMo I 2021-07-22 19:02:40 exp_manager:216] Experiments will be logged at /content/drive/My Drive/talknet/QueenTalk/TalkNetPitch/2021-07-22_19-02-23
[NeMo I 2021-07-22 19:02:40 exp_manager:563] TensorboardLogger has been set up


      'Argument `period` in `ModelCheckpoint` is deprecated in v1.3 and will be removed in v1.5.'
    
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[NeMo W 2021-07-22 19:02:40 modelPT:661] The lightning trainer received accelerator: dp. We recommend to use 'ddp' instead.


[NeMo I 2021-07-22 19:02:40 modelPT:751] Optimizer config = Adam (
    Parameter Group 0
        amsgrad: False
        betas: (0.9, 0.999)
        eps: 1e-08
        lr: 0.001
        weight_decay: 1e-06
    )
[NeMo I 2021-07-22 19:02:40 lr_scheduler:625] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x7fa958725350>" 
    will be used during training (effective maximum steps = 100) - 
    Parameters : 
    (min_lr: 3.0e-06
    warmup_ratio: 0.02
    max_steps: 100
    )



  | Name      | Type              | Params
------------------------------------------------
0 | embed     | GaussianEmbedding | 7.6 K 
1 | model     | ConvASREncoder    | 2.5 M 
2 | sil_proj  | Conv1d            | 513   
3 | body_proj | Conv1d            | 513   
------------------------------------------------
2.5 M     Trainable params
0         Non-trainable params
2.5 M     Total params
9.843     Total estimated model params size (MB)


Epoch 4, global step 9: val_body_mae reached 29.43637 (best 29.43637), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetPitch/2021-07-22_19-02-23/checkpoints/TalkNetPitch--val_body_mae=29.44-epoch=4.ckpt" as top 3


Epoch 9, global step 19: val_body_mae reached 16.89599 (best 16.89599), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetPitch/2021-07-22_19-02-23/checkpoints/TalkNetPitch--val_body_mae=16.90-epoch=9.ckpt" as top 3


Epoch 14, global step 29: val_body_mae reached 13.00530 (best 13.00530), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetPitch/2021-07-22_19-02-23/checkpoints/TalkNetPitch--val_body_mae=13.01-epoch=14.ckpt" as top 3


Epoch 19, global step 39: val_body_mae reached 10.61074 (best 10.61074), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetPitch/2021-07-22_19-02-23/checkpoints/TalkNetPitch--val_body_mae=10.61-epoch=19.ckpt" as top 3


Epoch 24, global step 49: val_body_mae reached 9.76613 (best 9.76613), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetPitch/2021-07-22_19-02-23/checkpoints/TalkNetPitch--val_body_mae=9.77-epoch=24.ckpt" as top 3


Epoch 29, global step 59: val_body_mae reached 8.89739 (best 8.89739), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetPitch/2021-07-22_19-02-23/checkpoints/TalkNetPitch--val_body_mae=8.90-epoch=29.ckpt" as top 3


Epoch 34, global step 69: val_body_mae reached 8.25129 (best 8.25129), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetPitch/2021-07-22_19-02-23/checkpoints/TalkNetPitch--val_body_mae=8.25-epoch=34.ckpt" as top 3


Epoch 39, global step 79: val_body_mae reached 7.98620 (best 7.98620), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetPitch/2021-07-22_19-02-23/checkpoints/TalkNetPitch--val_body_mae=7.99-epoch=39.ckpt" as top 3


Epoch 44, global step 89: val_body_mae reached 7.81853 (best 7.81853), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetPitch/2021-07-22_19-02-23/checkpoints/TalkNetPitch--val_body_mae=7.82-epoch=44.ckpt" as top 3


Epoch 49, global step 99: val_body_mae reached 7.84060 (best 7.81853), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetPitch/2021-07-22_19-02-23/checkpoints/TalkNetPitch--val_body_mae=7.84-epoch=49.ckpt" as top 3
Saving latest checkpoint...
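
The "latest checkpoint" saved above is what lets a later Colab session pick up where this one stopped. The Step 9 training cell, for example, walks the timestamped run folders from newest to oldest and takes the first file whose name contains `last.ckpt`. The sketch below restates that search as a standalone function so it can be tried outside the notebook. The function name `find_latest_checkpoint` and the demo folder and file names are illustrative assumptions, not part of the notebook.

```python
import os
import tempfile

def find_latest_checkpoint(model_root, marker="last.ckpt"):
    """Return the newest checkpoint path whose file name contains `marker`.

    Run folders are assumed to be named by timestamp
    (e.g. 2021-07-22_19-02-23), so sorting the names puts the newest last.
    Returns "" when nothing is found.
    """
    if not os.path.isdir(model_root):
        return ""
    for run in sorted(os.listdir(model_root), reverse=True):
        ckpt_dir = os.path.join(model_root, run, "checkpoints")
        if not os.path.isdir(ckpt_dir):
            continue
        matches = sorted(f for f in os.listdir(ckpt_dir) if marker in f)
        if matches:
            return os.path.join(ckpt_dir, matches[0])
    return ""

# Demonstration on a throwaway folder tree (hypothetical file names).
with tempfile.TemporaryDirectory() as root:
    old = os.path.join(root, "2021-07-21_10-00-00", "checkpoints")
    new = os.path.join(root, "2021-07-22_19-02-23", "checkpoints")
    os.makedirs(old)
    os.makedirs(new)
    open(os.path.join(old, "TalkNetPitch--last.ckpt"), "w").close()
    open(os.path.join(new, "TalkNetPitch--last.ckpt"), "w").close()
    found = find_latest_checkpoint(root)
    demo_result = os.path.relpath(found, root)
    print(demo_result)
```

If no run folder contains a matching file, the function returns an empty string, which is the case where the notebook falls back to warm-starting from a pretrained model.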





In [None]:
#@markdown **Step 9:** Train spectrogram generator. 200+ epochs are recommended. 

#@markdown This is the slowest of the three models to train, and the hardest to
#@markdown get good results from. If your character sounds noisy or robotic,
#@markdown try improving the dataset, or adjusting the epochs and learning rate.

epochs = 200 #@param {type:"integer"}

#@markdown If CUDA runs out of memory, try the following:
#@markdown * Click on Runtime -> Restart runtime, re-run step 3, and try again.
#@markdown * If that doesn't help, reduce the batch size (default 32).
batch_size = 32 #@param {type:"integer"}

#@markdown Advanced settings. You can probably leave these at their defaults (1e-3, 3e-6, empty, checked).
learning_rate = 1e-3 #@param {type:"number"}
min_learning_rate = 3e-6 #@param {type:"number"}
pretrained_path = "" #@param {type:"string"}
load_checkpoints = True #@param {type:"boolean"}

import os
from hydra.experimental import compose, initialize
from hydra.core.global_hydra import GlobalHydra
from omegaconf import OmegaConf
import pytorch_lightning as pl
from nemo.collections.common.callbacks import LogEpochTimeCallback
from nemo.collections.tts.models import TalkNetSpectModel
from nemo.core.config import hydra_runner
from nemo.utils.exp_manager import exp_manager

def train(cfg):
    cfg.sample_rate = 22050
    cfg.train_dataset = "trainfiles.json"
    cfg.validation_datasets = "valfiles.json"
    cfg.durs_file = os.path.join(output_dir, "durations.pt")
    cfg.f0_file = os.path.join(output_dir, "f0s.pt")
    cfg.trainer.accelerator = "dp"
    cfg.trainer.max_epochs = epochs
    cfg.trainer.check_val_every_n_epoch = 5
    cfg.model.train_ds.dataloader_params.batch_size = batch_size
    cfg.model.validation_ds.dataloader_params.batch_size = batch_size
    cfg.model.optim.lr = learning_rate
    cfg.model.optim.sched.min_lr = min_learning_rate
    cfg.exp_manager.exp_dir = output_dir

    # Find the newest run folder that contains a "last.ckpt" checkpoint
    ckpt_path = ""
    if load_checkpoints:
        path0 = os.path.join(output_dir, "TalkNetSpect")
        if os.path.exists(path0):
            path1 = sorted(os.listdir(path0))
            for i in range(len(path1)):
                path2 = os.path.join(path0, path1[-(1+i)], "checkpoints")
                if os.path.exists(path2):
                    match = [x for x in os.listdir(path2) if "last.ckpt" in x]
                    if len(match) > 0:
                        ckpt_path = os.path.join(path2, match[0])
                        print("Resuming training from " + match[0])
                        break
    
    if ckpt_path != "":
        trainer = pl.Trainer(**cfg.trainer, resume_from_checkpoint = ckpt_path)
        model = TalkNetSpectModel(cfg=cfg.model, trainer=trainer)
    else:
        if pretrained_path != "":
            warmstart_path = pretrained_path
        else:
            warmstart_path = "/content/talknet_spect.nemo"
        trainer = pl.Trainer(**cfg.trainer)
        model = TalkNetSpectModel.restore_from(warmstart_path, override_config_path=cfg)
        model.set_trainer(trainer)
        model.setup_training_data(cfg.model.train_ds)
        model.setup_validation_data(cfg.model.validation_ds)
        model.setup_optimization(cfg.model.optim)
        print("Warm-starting from " + warmstart_path)
    exp_manager(trainer, cfg.get('exp_manager', None))
    trainer.callbacks.extend([pl.callbacks.LearningRateMonitor(), LogEpochTimeCallback()])  # noqa
    trainer.fit(model)

GlobalHydra().clear()
initialize(config_path="conf")
cfg = compose(config_name="talknet-spect")
train(cfg)

      message="hydra.experimental.initialize() is no longer experimental."
    
      message="hydra.experimental.compose() is no longer experimental."
    
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
[NeMo W 2021-07-22 19:03:30 modelPT:139] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    dataset:
      _target_: nemo.collections.asr.data.audio_to_text.AudioToCharWithDursF0Dataset
      manifest_filepath: trainfiles.json
      max_duration: null
      min_duration: 0.1
      int_values: false
      load_audio: true
      normalize: false
      sample_rate: 22050
      trim: false
      durs_file: /content/drive/My Drive/talknet/QueenTalk/durations.pt
      f0_file: /content/drive/My Drive/talknet/QueenTalk/f0s.pt
      blanking: true
      vocab:
        notation: phonemes
        punct: true
        spaces: true
 

[NeMo I 2021-07-22 19:03:30 features:252] PADDING: 1
[NeMo I 2021-07-22 19:03:30 features:269] STFT using torch
[NeMo I 2021-07-22 19:03:30 modelPT:439] Model TalkNetSpectModel was successfully restored from /content/talknet_spect.nemo.
[NeMo I 2021-07-22 19:03:31 collections:173] Dataset loaded with 75 files totalling 0.05 hours
[NeMo I 2021-07-22 19:03:31 collections:174] 0 files were filtered totalling 0.00 hours


      cpuset_checked))
    


[NeMo I 2021-07-22 19:03:31 collections:173] Dataset loaded with 75 files totalling 0.05 hours
[NeMo I 2021-07-22 19:03:31 collections:174] 0 files were filtered totalling 0.00 hours


[NeMo W 2021-07-22 19:03:31 modelPT:661] The lightning trainer received accelerator: dp. We recommend to use 'ddp' instead.


[NeMo I 2021-07-22 19:03:31 modelPT:751] Optimizer config = Adam (
    Parameter Group 0
        amsgrad: False
        betas: (0.9, 0.999)
        eps: 1e-08
        lr: 0.001
        weight_decay: 1e-06
    )
[NeMo I 2021-07-22 19:03:31 lr_scheduler:625] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x7fa9590657d0>" 
    will be used during training (effective maximum steps = 600) - 
    Parameters : 
    (min_lr: 3.0e-06
    warmup_ratio: 0.02
    max_steps: 600
    )
Warm-starting from /content/talknet_spect.nemo
[NeMo I 2021-07-22 19:03:31 exp_manager:216] Experiments will be logged at /content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23
[NeMo I 2021-07-22 19:03:31 exp_manager:563] TensorboardLogger has been set up


      'Argument `period` in `ModelCheckpoint` is deprecated in v1.3 and will be removed in v1.5.'
    
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[NeMo W 2021-07-22 19:03:31 modelPT:661] The lightning trainer received accelerator: dp. We recommend to use 'ddp' instead.


[NeMo I 2021-07-22 19:03:31 modelPT:751] Optimizer config = Adam (
    Parameter Group 0
        amsgrad: False
        betas: (0.9, 0.999)
        eps: 1e-08
        lr: 0.001
        weight_decay: 1e-06
    )
[NeMo I 2021-07-22 19:03:31 lr_scheduler:625] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x7fa95909c390>" 
    will be used during training (effective maximum steps = 600) - 
    Parameters : 
    (min_lr: 3.0e-06
    warmup_ratio: 0.02
    max_steps: 600
    )



  | Name         | Type                              | Params
-------------------------------------------------------------------
0 | preprocessor | AudioToMelSpectrogramPreprocessor | 0     
1 | embed        | GaussianEmbedding                 | 7.6 K 
2 | norm_f0      | MaskedInstanceNorm1d              | 0     
3 | res_f0       | StyleResidual                     | 512   
4 | model        | ConvASREncoder                    | 8.7 M 
5 | proj         | Conv1d                            | 82.0 K
-------------------------------------------------------------------
8.7 M     Trainable params
0         Non-trainable params
8.7 M     Total params
34.986    Total estimated model params size (MB)


[NeMo W 2021-07-22 19:03:31 patch_utils:50] torch.stft() signature has been updated for PyTorch 1.7+
    Please update PyTorch to remain compatible with later versions of NeMo.
Epoch 4, global step 14: val_loss reached 0.93634 (best 0.93634), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.94-epoch=4.ckpt" as top 3


Epoch 9, global step 29: val_loss reached 0.65983 (best 0.65983), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.66-epoch=9.ckpt" as top 3


Epoch 14, global step 44: val_loss reached 0.50001 (best 0.50001), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.50-epoch=14.ckpt" as top 3


Epoch 19, global step 59: val_loss reached 0.41748 (best 0.41748), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.42-epoch=19.ckpt" as top 3


Epoch 24, global step 74: val_loss reached 0.35107 (best 0.35107), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.35-epoch=24.ckpt" as top 3


Epoch 29, global step 89: val_loss reached 0.30696 (best 0.30696), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.31-epoch=29.ckpt" as top 3


Epoch 34, global step 104: val_loss reached 0.33488 (best 0.30696), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.33-epoch=34.ckpt" as top 3


Epoch 39, global step 119: val_loss reached 0.27840 (best 0.27840), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.28-epoch=39.ckpt" as top 3


Epoch 44, global step 134: val_loss reached 0.29412 (best 0.27840), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.29-epoch=44.ckpt" as top 3


Epoch 49, global step 149: val_loss reached 0.24645 (best 0.24645), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.25-epoch=49.ckpt" as top 3


Epoch 54, global step 164: val_loss reached 0.24240 (best 0.24240), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.24-epoch=54.ckpt" as top 3


Epoch 59, global step 179: val_loss reached 0.23844 (best 0.23844), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.24-epoch=59.ckpt" as top 3


Epoch 64, global step 194: val_loss reached 0.22155 (best 0.22155), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.22-epoch=64.ckpt" as top 3


Epoch 69, global step 209: val_loss reached 0.21172 (best 0.21172), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.21-epoch=69.ckpt" as top 3


Epoch 74, global step 224: val_loss reached 0.21961 (best 0.21172), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.22-epoch=74.ckpt" as top 3


Epoch 79, global step 239: val_loss reached 0.20116 (best 0.20116), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.20-epoch=79.ckpt" as top 3


Epoch 84, global step 254: val_loss reached 0.19025 (best 0.19025), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.19-epoch=84.ckpt" as top 3


Epoch 89, global step 269: val_loss reached 0.18740 (best 0.18740), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.19-epoch=89.ckpt" as top 3


Epoch 94, global step 284: val_loss reached 0.20004 (best 0.18740), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.20-epoch=94.ckpt" as top 3


Epoch 99, global step 299: val_loss reached 0.18409 (best 0.18409), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.18-epoch=99.ckpt" as top 3


Epoch 104, global step 314: val_loss was not in top 3


Epoch 109, global step 329: val_loss reached 0.17179 (best 0.17179), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.17-epoch=109.ckpt" as top 3


Epoch 114, global step 344: val_loss reached 0.18105 (best 0.17179), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.18-epoch=114.ckpt" as top 3


Epoch 119, global step 359: val_loss reached 0.16717 (best 0.16717), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.17-epoch=119.ckpt" as top 3


Epoch 124, global step 374: val_loss reached 0.17733 (best 0.16717), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.18-epoch=124.ckpt" as top 3


Epoch 129, global step 389: val_loss reached 0.16042 (best 0.16042), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.16-epoch=129.ckpt" as top 3


Epoch 134, global step 404: val_loss reached 0.16079 (best 0.16042), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.16-epoch=134.ckpt" as top 3


Epoch 139, global step 419: val_loss reached 0.15884 (best 0.15884), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.16-epoch=139.ckpt" as top 3


Epoch 144, global step 434: val_loss was not in top 3


Epoch 149, global step 449: val_loss reached 0.16017 (best 0.15884), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.16-epoch=149.ckpt" as top 3


Epoch 154, global step 464: val_loss reached 0.16003 (best 0.15884), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.16-epoch=154.ckpt" as top 3


Epoch 159, global step 479: val_loss reached 0.15567 (best 0.15567), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.16-epoch=159.ckpt" as top 3


Epoch 164, global step 494: val_loss reached 0.15409 (best 0.15409), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.15-epoch=164.ckpt" as top 3


Epoch 169, global step 509: val_loss reached 0.15595 (best 0.15409), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.16-epoch=169.ckpt" as top 3


Epoch 174, global step 524: val_loss reached 0.15372 (best 0.15372), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.15-epoch=174.ckpt" as top 3


Epoch 179, global step 539: val_loss reached 0.15384 (best 0.15372), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.15-epoch=179.ckpt" as top 3


Epoch 184, global step 554: val_loss reached 0.15238 (best 0.15238), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.15-epoch=184.ckpt" as top 3


Epoch 189, global step 569: val_loss reached 0.15325 (best 0.15238), saving model to "/content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect--val_loss=0.15-epoch=189.ckpt" as top 3


Epoch 194, global step 584: val_loss was not in top 3


Epoch 199, global step 599: val_loss was not in top 3
Saving latest checkpoint...
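
The scheduler log above reports `max_steps: 600`, which follows from the dataset size: 75 files at a batch size of 32 give 3 optimizer steps per epoch, and 200 epochs then give 600 steps. This also matches the validation lines, where the end of epoch 4 (the fifth epoch) lands on global step 14, i.e. 5 × 3 − 1. The sketch below reproduces that arithmetic and gives a rough picture of how the learning rate moves between `lr` and `min_lr`. The schedule formula is an assumption (linear warmup followed by a standard cosine decay); NeMo's `CosineAnnealing` may differ in its details, and the helper names are illustrative.

```python
import math

def steps_per_epoch(num_files, batch_size):
    # Assumes the final, partial batch is kept rather than dropped.
    return math.ceil(num_files / batch_size)

def approx_lr(step, max_steps, base_lr=1e-3, min_lr=3e-6, warmup_ratio=0.02):
    # Assumed shape: linear warmup, then cosine decay from base_lr to min_lr.
    warmup_steps = int(round(warmup_ratio * max_steps))
    if step < warmup_steps:
        return base_lr * (step + 1) / (warmup_steps + 1)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

spe = steps_per_epoch(num_files=75, batch_size=32)
max_steps = spe * 200
print("steps per epoch:", spe, "max steps:", max_steps)

# Print the approximate learning rate at a few points in training.
for step in (0, 12, 300, 599):
    print(step, f"{approx_lr(step, max_steps):.2e}")
```

One practical consequence: changing `epochs` or `batch_size` changes `max_steps`, and with it how quickly the learning rate decays.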





In [None]:
#@markdown **Step 10:** Generate ground-truth-aligned (GTA) spectrograms. This will help HiFi-GAN learn what your TalkNet model sounds like.

#@markdown If this step fails, make sure you've finished training the spectrogram generator.

import sys
import os
import torch
import numpy as np
from tqdm import tqdm
from nemo.collections.tts.models import TalkNetSpectModel
import shutil

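def fix_paths(inpath):
    # Prefix every filelist entry with "wavs/" unless it already starts with it
    output = ""
    with open(inpath, "r", encoding="utf8") as f:
        for line in f:
            # Skip blank lines; they would otherwise turn into a bare "wavs/" entry
            if not line.strip():
                continue
            if line[:5].lower() != "wavs/":
                output += "wavs/" + line
            else:
                output += line
    with open(inpath, "w", encoding="utf8") as w:
        w.write(output)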

shutil.copyfile(train_filelist, "/content/hifi-gan/training.txt")
shutil.copyfile(val_filelist, "/content/hifi-gan/validation.txt")
fix_paths("/content/hifi-gan/training.txt")
fix_paths("/content/hifi-gan/validation.txt")
fix_paths("/content/allfiles.txt")

os.chdir('/content')
indir = "wavs"
outdir = "hifi-gan/wavs"
if not os.path.exists(outdir):
    os.mkdir(outdir)

model_path = ""
path0 = os.path.join(output_dir, "TalkNetSpect")
if os.path.exists(path0):
    path1 = sorted(os.listdir(path0))
    for i in range(len(path1)):
        path2 = os.path.join(path0, path1[-(1+i)], "checkpoints")
        if os.path.exists(path2):
            match = [x for x in os.listdir(path2) if "TalkNetSpect.nemo" in x]
            if len(match) > 0:
                model_path = os.path.join(path2, match[0])
                break
assert model_path != "", "TalkNetSpect.nemo not found"

dur_path = os.path.join(output_dir, "durations.pt")
f0_path = os.path.join(output_dir, "f0s.pt")

model = TalkNetSpectModel.restore_from(model_path)
model.eval()
with open("allfiles.txt", "r", encoding="utf-8") as f:
    dataset = f.readlines()
durs = torch.load(dur_path)
f0s = torch.load(f0_path)

for x in tqdm(dataset):
    x_name = os.path.splitext(os.path.basename(x.split("|")[0].strip()))[0]
    x_tokens = model.parse(text=x.split("|")[1].strip())
    x_durs = (
        torch.stack(
            (
                durs[x_name]["blanks"],
                torch.cat((durs[x_name]["tokens"], torch.zeros(1).int())),
            ),
            dim=1,
        )
        .view(-1)[:-1]
        .view(1, -1)
        .to("cuda:0")
    )
    x_f0s = f0s[x_name].view(1, -1).to("cuda:0")
    x_spect = model.force_spectrogram(tokens=x_tokens, durs=x_durs, f0=x_f0s)
    rel_path = os.path.splitext(x.split("|")[0].strip())[0][5:]
    abs_dir = os.path.join(outdir, os.path.dirname(rel_path))
    if abs_dir != "" and not os.path.exists(abs_dir):
        os.makedirs(abs_dir, exist_ok=True)
    np.save(os.path.join(outdir, rel_path + ".npy"), x_spect.detach().cpu().numpy())


[NeMo W 2021-07-22 19:10:40 modelPT:139] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    dataset:
      _target_: nemo.collections.asr.data.audio_to_text.AudioToCharWithDursF0Dataset
      manifest_filepath: trainfiles.json
      max_duration: null
      min_duration: 0.1
      int_values: false
      load_audio: true
      normalize: false
      sample_rate: 22050
      trim: false
      durs_file: /content/drive/My Drive/talknet/QueenTalk/durations.pt
      f0_file: /content/drive/My Drive/talknet/QueenTalk/f0s.pt
      blanking: true
      vocab:
        notation: phonemes
        punct: true
        spaces: true
        stresses: false
        add_blank_at: last
    dataloader_params:
      drop_last: false
      shuffle: true
      batch_size: 32
      num_workers: 4
    
[NeMo W 2021-07-22 19:10:40 modelPT:146] If you intend to do valida

[NeMo I 2021-07-22 19:10:40 features:252] PADDING: 1
[NeMo I 2021-07-22 19:10:40 features:269] STFT using torch
[NeMo I 2021-07-22 19:10:41 modelPT:439] Model TalkNetSpectModel was successfully restored from /content/drive/My Drive/talknet/QueenTalk/TalkNetSpect/2021-07-22_19-02-23/checkpoints/TalkNetSpect.nemo.


100%|██████████| 75/75 [00:03<00:00, 24.62it/s]
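
The `x_durs` expression in the cell above is compact, so it may help to see what it produces. It appears to interleave the blank durations with the token durations, so the result alternates blank, token, blank, and so on, ending on a blank. The sketch below mimics the same pad / stack / flatten / drop-last steps with plain Python lists instead of tensors. The function name and the example numbers are made up, and it assumes there is exactly one more blank duration than there are token durations.

```python
def interleave_durations(blanks, tokens):
    """Alternate blank and token durations: b0, t0, b1, t1, ..., bN.

    Mirrors the notebook's torch.stack(..., dim=1).view(-1)[:-1] pattern
    using plain lists. Assumes len(blanks) == len(tokens) + 1.
    """
    if len(blanks) != len(tokens) + 1:
        raise ValueError("expected one more blank than tokens")
    padded_tokens = list(tokens) + [0]          # like torch.cat((tokens, zeros(1)))
    pairs = list(zip(blanks, padded_tokens))    # like torch.stack(..., dim=1)
    flat = [d for pair in pairs for d in pair]  # like .view(-1)
    return flat[:-1]                            # drop the padding zero again

# Hypothetical durations (in frames) for a two-token utterance.
print(interleave_durations(blanks=[3, 1, 4], tokens=[7, 9]))  # [3, 7, 1, 9, 4]
```

The padding zero exists only so both sequences have the same length for stacking; it is removed again before the durations are passed to the model.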


In [None]:
#@markdown **Step 11:** Train HiFi-GAN. 2,000+ steps are recommended.
#@markdown Stop this cell to finish training the model.

#@markdown If CUDA runs out of memory, click on Runtime -> Restart runtime, re-run step 3, and try again.
#@markdown If this step still fails to start, make sure step 10 finished successfully.

#@markdown Note: If the training process starts at step 2500000, delete the HiFiGAN folder and try again.

import os
import gdown
d = 'https://drive.google.com/uc?id='

os.chdir('/content/hifi-gan')
assert os.path.exists("wavs"), "Spectrogram folder not found"

if not os.path.exists(os.path.join(output_dir, "HiFiGAN")):
    os.makedirs(os.path.join(output_dir, "HiFiGAN"))
if not os.path.exists(os.path.join(output_dir, "HiFiGAN", "do_00000000")):
    print("Downloading universal model...")
    gdown.download(d+"1qpgI41wNXFcH-iKq1Y42JlBC9j0je8PW", os.path.join(output_dir, "HiFiGAN", "g_00000000"), quiet=False)
    gdown.download(d+"1O63eHZR9t1haCdRHQcEgMfMNxiOciSru", os.path.join(output_dir, "HiFiGAN", "do_00000000"), quiet=False)
    start_from_universal = "--warm_start True "
else:
    start_from_universal = ""

!python train.py --fine_tuning True --config config_v1b.json \
{start_from_universal} \
--checkpoint_interval 250 --checkpoint_path "{os.path.join(output_dir, 'HiFiGAN')}" \
--input_training_file "/content/hifi-gan/training.txt" \
--input_validation_file "/content/hifi-gan/validation.txt" \
--input_wavs_dir ".." --input_mels_dir "wavs"


Downloading universal model...


Downloading...
From: https://drive.google.com/uc?id=1qpgI41wNXFcH-iKq1Y42JlBC9j0je8PW
To: /content/drive/My Drive/talknet/QueenTalk/HiFiGAN/g_00000000
55.8MB [00:00, 140MB/s]
Downloading...
From: https://drive.google.com/uc?id=1O63eHZR9t1haCdRHQcEgMfMNxiOciSru
To: /content/drive/My Drive/talknet/QueenTalk/HiFiGAN/do_00000000
960MB [00:12, 75.4MB/s]


Streaming output truncated to the last 5000 lines.
  sampling_rate, data = read(full_path)
  sampling_rate, data = read(full_path)
Steps : 9805, Gen Loss Total : 27.884, Mel-Spec. Error : 0.386, s/b : 1.156
Time taken for epoch 2452 is 5 sec

Current learning rate: 1.0670515716693414e-37
Epoch: 2453
  sampling_rate, data = read(full_path)
  sampling_rate, data = read(full_path)
  sampling_rate, data = read(full_path)
  sampling_rate, data = read(full_path)
Steps : 9810, Gen Loss Total : 27.972, Mel-Spec. Error : 0.388, s/b : 1.144
Time taken for epoch 2453 is 5 sec

Current learning rate: 1.0350400245192612e-37
Epoch: 2454
  sampling_rate, data = read(full_path)
  sampling_rate, data = read(full_path)
  sampling_rate, data = read(full_path)
  sampling_rate, data = read(full_path)
Steps : 9815, Gen Loss Total : 27.688, Mel-Spec. Error : 0.385, s/b : 1.147
Time taken for epoch 2454 is 5 sec

Current learning rate: 1.0039888237836832e-37
Epoch: 2455
  sampling_rate, data = r
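
The training output above follows a regular pattern, so the numbers can be pulled out and compared instead of read off the scrolling log. The sketch below parses the `Steps : ...` and `Current learning rate: ...` lines with regular expressions. The function name `parse_hifigan_log` and the dictionary keys are illustrative assumptions (in particular, reading `s/b` as seconds per batch is a guess); the sample text is copied from the output above.

```python
import re

STEP_RE = re.compile(
    r"Steps : (\d+), Gen Loss Total : ([\d.]+), "
    r"Mel-Spec\. Error : ([\d.]+), s/b : ([\d.]+)"
)
LR_RE = re.compile(r"Current learning rate: ([\d.eE+-]+)")

def parse_hifigan_log(text):
    """Return (step_records, learning_rates) found in the log text."""
    steps, rates = [], []
    for line in text.splitlines():
        m = STEP_RE.search(line)
        if m:
            steps.append({
                "step": int(m.group(1)),
                "gen_loss": float(m.group(2)),
                "mel_error": float(m.group(3)),
                "s_per_b": float(m.group(4)),  # presumably seconds per batch
            })
            continue
        m = LR_RE.search(line)
        if m:
            rates.append(float(m.group(1)))
    return steps, rates

sample = """\
Steps : 9805, Gen Loss Total : 27.884, Mel-Spec. Error : 0.386, s/b : 1.156
Current learning rate: 1.0670515716693414e-37
Steps : 9810, Gen Loss Total : 27.972, Mel-Spec. Error : 0.388, s/b : 1.144
Current learning rate: 1.0350400245192612e-37
"""

steps, rates = parse_hifigan_log(sample)
print([s["mel_error"] for s in steps])
print("LR ratio between epochs:", rates[1] / rates[0])
```

In this run the ratio between consecutive logged learning rates is about 0.97, and the rate itself is on the order of 1e-37, which suggests that further steps at this point change the model very little.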

In [None]:
#@markdown **Step 12:** Package the models. They'll be saved to the output directory as [character_name]_TalkNet.zip.

character_name = "Queen" #@param {type:"string"}

#@markdown When done, generate a Drive share link, with permissions set to "Anyone with the link". 
#@markdown You can then use it with the [Controllable TalkNet notebook](https://colab.research.google.com/drive/1aj6Jk8cpRw7SsN3JSYCv57CrR6s0gYPB) 
#@markdown by selecting "Custom model" as your character.

#@markdown This cell will also move the training checkpoints and logs to the trash.
#@markdown That should free up roughly 2 GB of space on your Drive (remember to empty your trash).
#@markdown If you wish to keep them, uncheck this box.

delete_checkpoints = True #@param {type:"boolean"}

import os
import shutil
from zipfile import ZipFile

def find_talknet(model_dir):
    ckpt_path = ""
    path0 = os.path.join(output_dir, model_dir)
    if os.path.exists(path0):
        path1 = sorted(os.listdir(path0))
        for i in range(len(path1)):
            path2 = os.path.join(path0, path1[-(1+i)], "checkpoints")
            if os.path.exists(path2):
                match = [x for x in os.listdir(path2) if ".nemo" in x]
                if len(match) > 0:
                    ckpt_path = os.path.join(path2, match[0])
                    break
    assert ckpt_path != "", "Couldn't find " + model_dir
    return ckpt_path

durs_path = find_talknet("TalkNetDurs")
pitch_path = find_talknet("TalkNetPitch")
spect_path = find_talknet("TalkNetSpect")
assert os.path.exists(os.path.join(output_dir, "HiFiGAN", "g_00000000")), "Couldn't find HiFi-GAN"

zip_path = os.path.join(output_dir, character_name + "_TalkNet.zip")
# Use a context manager so the archive is closed even if a file is missing,
# and avoid shadowing the built-in zip()
with ZipFile(zip_path, 'w') as archive:
    archive.write(durs_path, "TalkNetDurs.nemo")
    archive.write(pitch_path, "TalkNetPitch.nemo")
    archive.write(spect_path, "TalkNetSpect.nemo")
    archive.write(os.path.join(output_dir, "HiFiGAN", "g_00000000"), "hifiganmodel")
    archive.write(os.path.join(output_dir, "HiFiGAN", "config.json"), "config.json")
    archive.write(os.path.join(output_dir, "f0_info.json"), "f0_info.json")
print("Archived model to " + zip_path)

if delete_checkpoints:
    # Remove the training folders now that the models are archived
    for folder in ("TalkNetDurs", "TalkNetPitch", "TalkNetSpect", "HiFiGAN"):
        shutil.rmtree(os.path.join(output_dir, folder))
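
Before sharing the archive, it can be worth confirming that it contains everything the cell above writes into it. The sketch below lists any expected member that is missing from a zip file. The helper name `missing_members` and the demo archive are illustrative assumptions; the member names are the ones used in the packaging code above.

```python
import os
import tempfile
from zipfile import ZipFile

# Archive member names written by the packaging cell.
EXPECTED = [
    "TalkNetDurs.nemo",
    "TalkNetPitch.nemo",
    "TalkNetSpect.nemo",
    "hifiganmodel",
    "config.json",
    "f0_info.json",
]

def missing_members(zip_path, expected=EXPECTED):
    """Return the expected member names that are not present in the archive."""
    with ZipFile(zip_path) as archive:
        present = set(archive.namelist())
    return [name for name in expected if name not in present]

# Demonstration on a throwaway archive that deliberately lacks two files.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "Demo_TalkNet.zip")
    with ZipFile(demo_zip, "w") as archive:
        for name in EXPECTED[:4]:
            archive.writestr(name, b"placeholder")
    demo_missing = missing_members(demo_zip)
    print(demo_missing)  # ['config.json', 'f0_info.json']
```

In the notebook, the same check could be pointed at `os.path.join(output_dir, character_name + "_TalkNet.zip")`; an empty list means all six files are present.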
