# Connect to your Google Drive
To synthesize sentences, that is, to run inference, we need the Tacotron2 model and the vocoder model. To get them, add the files from the links below to your own Google Drive.

Once the files are in your Drive folders, you can grant this notebook permission to access them. That way, the files become visible to the code.

The links are:
* [Tacotron2 model](https://drive.google.com/open?id=1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA), trained on LJ Speech
* [WaveGlow model](https://drive.google.com/open?id=1WsibBTsuRg_SF2Z6L6NFRTT-NjEy1oTx), trained on LJ Speech
* [CMU Arctic data](https://drive.google.com/open?id=1-DWmBkD99R09wEMb9r2MSD_l9qkgbNOf); KSP voice only, preprocessed for Tacotron2
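
Once the Drive is mounted, a quick check that the model files are where the later cells expect them can save a confusing `FileNotFoundError` further down. A minimal sketch: `missing_files` is a hypothetical helper, and the folder and file names are assumptions based on the `tacotron_models` listing printed by the next cell.

```python
import os
from typing import Iterable, List


def missing_files(folder: str, names: Iterable[str]) -> List[str]:
    """Return the entries of `names` that do not exist inside `folder`."""
    return [name for name in names if not os.path.exists(os.path.join(folder, name))]


# Assumed folder and file names, taken from the directory listing in this notebook.
MODELS_DIR = "/content/drive/My Drive/tacotron_models"
REQUIRED = ["tacotron2_statedict.pt", "waveglow_256channels.pt"]

missing = missing_files(MODELS_DIR, REQUIRED)
print("Missing:", missing if missing else "none")
```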

In [1]:
from google.colab import drive
drive.mount('/content/drive')
!ls "/content/drive/My Drive/tacotron_models"

Mounted at /content/drive
arctic_ksp_checkpoint_2500	  checkpoint_64b_1e-4l_2500
checkpoint_0			  checkpoint_64b_4e-4l_04dout_1500
checkpoint_250			  checkpoint_64b_4e-4l_1750
checkpoint_3500			  logs
checkpoint_42500		  tacotron2_statedict.pt
checkpoint_500			  test_wav
checkpoint_64b_1e-4l_04dout_2750  waveglow_256channels.pt


# Import the code
The Colab notebook lets us run Linux terminal commands: a line starting with `!` is executed by a shell, and lines starting with `%` are IPython magic commands (such as `%cd`). In addition, Colab servers come with some software preinstalled, such as CUDA and git.
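
Outside a notebook, the `!` prefix has no special meaning. A rough plain-Python equivalent is to hand the same command line to a shell with `subprocess.run`. The helper `run_shell` below is hypothetical and only for illustration; unlike `!`, it raises an exception when the command fails instead of just printing the error.

```python
import subprocess


def run_shell(command: str) -> str:
    """Run `command` through the system shell and return its standard output.

    Roughly what a notebook line such as `!git submodule init; git submodule update`
    does, except that a non-zero exit status raises CalledProcessError here.
    """
    result = subprocess.run(
        command, shell=True, check=True, capture_output=True, text=True
    )
    return result.stdout


print(run_shell("echo hello; echo world"))
```

Note that `%cd` is an IPython magic that changes the kernel's own working directory, whereas `!cd` would only affect a throwaway subshell. That is why the notebook uses `%cd tacotron2` rather than `!cd tacotron2`.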

To get the code, we clone NVIDIA's Tacotron2 repository from GitHub, together with its WaveGlow submodule.

In [2]:
import sys
!git clone https://github.com/NVIDIA/tacotron2
%cd tacotron2
!git submodule init; git submodule update
sys.path.append('/content/tacotron2/waveglow')

Cloning into 'tacotron2'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 360 (delta 0), reused 2 (delta 0), pack-reused 357
Receiving objects: 100% (360/360), 2.68 MiB | 4.79 MiB/s, done.
Resolving deltas: 100% (179/179), done.
/content/tacotron2
Submodule 'waveglow' (https://github.com/NVIDIA/waveglow) registered for path 'waveglow'
Cloning into '/content/tacotron2/waveglow'...
Submodule path 'waveglow': checked out '4b1001fa3336a1184b8293745bb89b177457f09b'
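
The last line of the cell above matters later: the WaveGlow checkpoint is loaded with `torch.load(...)['model']`, which unpickles a whole model object, so the modules defining its classes (inside the `waveglow` submodule) must be importable. Appending a folder to `sys.path` is what makes modules in that folder importable. The sketch below demonstrates the mechanism with a throwaway module; the module name `toy_glow` is hypothetical.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway folder containing a tiny module (hypothetical name: toy_glow).
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "toy_glow.py"), "w") as f:
    f.write("NAME = 'toy WaveGlow module'\n")

# Before the folder is on sys.path, the module cannot be found.
importlib.invalidate_caches()
try:
    importlib.import_module("toy_glow")
    found_before = True
except ModuleNotFoundError:
    found_before = False

# Same idea as sys.path.append('/content/tacotron2/waveglow') in the notebook.
sys.path.append(folder)
importlib.invalidate_caches()
toy_glow = importlib.import_module("toy_glow")

print(found_before, toy_glow.NAME)
```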



# Install the libraries and import them

In [5]:
%%bash
pip install numpy scipy librosa unidecode inflect tensorboardX

Collecting unidecode
  Downloading https://files.pythonhosted.org/packages/d0/42/d9edfed04228bacea2d824904cae367ee9efd05e6cce7ceaaedd0b0ad964/Unidecode-1.1.1-py2.py3-none-any.whl (238kB)
Collecting tensorboardX
  Downloading https://files.pythonhosted.org/packages/a6/5c/e918d9f190baab8d55bad52840d8091dd5114cc99f03eaa6d72d404503cc/tensorboardX-1.9-py2.py3-none-any.whl (190kB)
Installing collected packages: unidecode, tensorboardX
Successfully installed tensorboardX-1.9 unidecode-1.1.1
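
To confirm that the libraries are importable before running the next cell, a small check with `importlib.util.find_spec` is enough. `missing_modules` is a hypothetical helper. The names passed must be import names, which do not always match pip package names (for the packages installed above they happen to coincide).

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the import names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names for the packages installed in the cell above.
needed = ["numpy", "scipy", "librosa", "unidecode", "inflect", "tensorboardX"]
print("Missing:", missing_modules(needed))
```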


In [6]:
# Generic libraries
%matplotlib inline
import matplotlib.pyplot as plt

import IPython.display as ipd
from scipy.io.wavfile import write

import numpy as np
import torch

# tacotron2 modules (importable because the working directory is /content/tacotron2)
from hparams import create_hparams
from model import Tacotron2
from layers import TacotronSTFT, STFT
from audio_processing import griffin_lim

import distributed
from train import load_model
from text import text_to_sequence
#from denoiser import Denoiser  # lives in the waveglow submodule

# Load the models
To generate speech, Tacotron2 needs two steps: first, it generates mel spectrograms from the text; second, a waveform is generated from those spectrograms. For this reason we need two models: one for Tacotron2 and another for the vocoder, in this case a WaveGlow model.

This step loads both models into GPU memory.
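
The link between the two stages is the hop length: each mel-spectrogram frame produced by Tacotron2 covers `hop_length` audio samples, and the vocoder upsamples the frames back to roughly that many samples each. A small sketch of the arithmetic, assuming Tacotron2's default values (22050 Hz, hop length 256); the helper names are hypothetical.

```python
SAMPLING_RATE = 22050  # Hz, Tacotron2 default (also set explicitly in this notebook)
HOP_LENGTH = 256       # samples between consecutive mel frames (assumed default)


def frame_duration_ms(sampling_rate: int = SAMPLING_RATE, hop_length: int = HOP_LENGTH) -> float:
    """Time covered by one mel frame, in milliseconds."""
    return 1000.0 * hop_length / sampling_rate


def approx_num_samples(num_frames: int, hop_length: int = HOP_LENGTH) -> int:
    """Approximate waveform length the vocoder produces from `num_frames` frames."""
    return num_frames * hop_length


def approx_seconds(num_frames: int, sampling_rate: int = SAMPLING_RATE,
                   hop_length: int = HOP_LENGTH) -> float:
    """Approximate audio duration, in seconds, for `num_frames` mel frames."""
    return approx_num_samples(num_frames, hop_length) / sampling_rate


print(round(frame_duration_ms(), 2))  # 11.61
print(approx_num_samples(200))        # 51200
print(round(approx_seconds(200), 2))  # 2.32
```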

In [7]:
hparams = create_hparams()
rate = 22050
hparams.sampling_rate = rate

checkpoint_path = "../drive/My Drive/tacotron_models/tacotron2_statedict.pt"
model = load_model(hparams)
model.load_state_dict(torch.load(checkpoint_path)['state_dict'])
_ = model.cuda().eval().half()

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.



In [8]:
waveglow_path = '../drive/My Drive/tacotron_models/waveglow_256channels.pt'
waveglow = torch.load(waveglow_path)['model']
waveglow.cuda().eval().half()
for k in waveglow.convinv:
    k.float()
#denoiser = Denoiser(waveglow)

# fix for the "AttributeError: 'ConvTranspose1d' object has no attribute 'padding_mode'"
for m in waveglow.modules():
    if 'Conv' in str(type(m)):
        setattr(m, 'padding_mode', 'zeros')



# Synthesize speech

Here we enter the text we want to turn into speech.

Note the calls to the two models: `model.inference` produces the mel spectrograms and `waveglow.infer` turns them into audio. We can listen to the result inside the notebook using the `IPython.display` module.

In [16]:
# introduce the text
text = "It's always darkest before it becomes totally black."

# preprocessing
sequence = np.array(text_to_sequence(text, ['english_cleaners']))[None, :]
sequence = torch.from_numpy(sequence).to(device='cuda', dtype=torch.int64)

# run the models (no gradients are needed for inference)
with torch.no_grad():
    mel_outputs, mel_outputs_postnet, _, alignments = model.inference(sequence)
    audio = waveglow.infer(mel_outputs_postnet)
audio_numpy = audio[0].data.cpu().numpy()

# make audio listenable
ipd.Audio(audio_numpy, rate=rate)
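
In the preprocessing step above, `text_to_sequence` with `english_cleaners` normalizes the text and maps each symbol to an integer ID for the model's embedding layer; `[None, :]` then adds a batch dimension of 1. The sketch below imitates the idea with a deliberately reduced cleaner and a hypothetical symbol table. It is not Tacotron2's real symbol set (the real cleaners also transliterate to ASCII and expand numbers and abbreviations), so its IDs do not match the model's.

```python
import re
from typing import List

# Hypothetical, reduced symbol table: padding, a few punctuation marks, space, letters.
SYMBOLS = ["_"] + list("!'(),.:;? ") + list("abcdefghijklmnopqrstuvwxyz")
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}


def simple_cleaner(text: str) -> str:
    """Lowercase and collapse runs of whitespace (a small subset of english_cleaners)."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def simple_text_to_sequence(text: str) -> List[int]:
    """Map cleaned text to integer IDs, silently skipping unknown symbols."""
    return [SYMBOL_TO_ID[c] for c in simple_cleaner(text) if c in SYMBOL_TO_ID]


seq = simple_text_to_sequence("It's  always darkest")
print(len(seq), seq[:4])
```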




In [0]:
write('/content/drive/My Drive/test_wavs/nvidia01.wav', rate, audio_numpy.astype(np.float32))
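
Because WaveGlow runs in half precision here, `audio_numpy` is a float16 array, and floating-point WAV files are not supported by every reader. A common alternative is to save 16-bit PCM. This is a minimal sketch: `to_pcm16` is a hypothetical helper, and it assumes the signal is roughly in [-1, 1], as WaveGlow's output is (the WaveGlow repository's own inference script similarly scales by 32768 before casting to int16). Griffin-Lim output has no fixed scale, so the optional peak normalization is meant for that case.

```python
import numpy as np


def to_pcm16(audio: np.ndarray, normalize: bool = False) -> np.ndarray:
    """Convert a float waveform to 16-bit PCM samples.

    If `normalize` is True, the signal is first scaled so that its peak is 1.0
    (useful for Griffin-Lim output, whose amplitude is arbitrary).
    Values outside [-1, 1] are clipped.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if normalize:
        peak = np.max(np.abs(audio))
        if peak > 0:
            audio = audio / peak
    clipped = np.clip(audio, -1.0, 1.0)
    return np.round(clipped * 32767).astype(np.int16)


print(to_pcm16(np.array([0.0, 0.5, -1.0, 2.0])))
```

With scipy, this would be used as `write(path, rate, to_pcm16(audio_numpy))`, or `to_pcm16(audio_griffin, normalize=True)` for the Griffin-Lim output.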

# Using the Griffin-Lim algorithm

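The Griffin-Lim algorithm lets us synthesize speech without a neural-network vocoder: it reconstructs a waveform from a magnitude spectrogram by iteratively estimating the missing phase. When we are experimenting with new data and do not yet have a trained vocoder, it is useful for a quick quality check, even though the audio quality is generally lower than with a neural vocoder.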
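
The core loop of Griffin-Lim: start from the target magnitude with random phases, then repeatedly invert to a waveform, recompute its STFT, keep only the new phase and reimpose the target magnitude. The sketch below implements that loop with `scipy.signal.stft`/`istft` on a linear-magnitude spectrogram. It is an independent illustration of the algorithm, not the `audio_processing.griffin_lim` function the notebook uses (that one works on PyTorch tensors through Tacotron2's STFT module); the window and hop sizes (1024 and 256 samples) are assumed to mirror Tacotron2's defaults, and `griffin_lim_np` and `spectral_convergence` are hypothetical names.

```python
import numpy as np
from scipy.signal import istft, stft

N_FFT = 1024  # window length in samples (assumed, mirrors Tacotron2's filter_length)
HOP = 256     # hop length in samples (assumed, mirrors Tacotron2's hop_length)


def griffin_lim_np(magnitude, n_iter=30, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`.

    `magnitude` has shape (N_FFT // 2 + 1, n_frames), as returned by scipy's stft.
    """
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random initial phase
    for _ in range(n_iter):
        _, signal = istft(magnitude * phase, nperseg=N_FFT, noverlap=N_FFT - HOP)
        _, _, rebuilt = stft(signal, nperseg=N_FFT, noverlap=N_FFT - HOP)
        phase = np.exp(1j * np.angle(rebuilt))  # keep the phase, discard the magnitude
    _, signal = istft(magnitude * phase, nperseg=N_FFT, noverlap=N_FFT - HOP)
    return signal


def spectral_convergence(signal, magnitude):
    """Relative error between the STFT magnitude of `signal` and the target."""
    _, _, spec = stft(signal, nperseg=N_FFT, noverlap=N_FFT - HOP)
    return np.linalg.norm(np.abs(spec) - magnitude) / np.linalg.norm(magnitude)


# Usage: recover a 440 Hz tone from its magnitude spectrogram alone.
sr = 22050
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
_, _, target = stft(tone, nperseg=N_FFT, noverlap=N_FFT - HOP)
target_mag = np.abs(target)

for iters in (0, 30):
    estimate = griffin_lim_np(target_mag, n_iter=iters)
    print(iters, round(float(spectral_convergence(estimate, target_mag)), 3))
```

The notebook's version must first turn a mel spectrogram back into a linear one (the `torch.mm(..., taco_stft.mel_basis)` step), which is only an approximate inverse of the mel filterbank, so its output is rougher than this tone example suggests.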

In [0]:
def infer(checkpoint_path, griffin_iters, text):
    """Synthesize `text` with a Tacotron2 checkpoint, using Griffin-Lim instead of a vocoder."""
    hparams = create_hparams()
    hparams.sampling_rate = 22050

    # load the Tacotron2 checkpoint (kept in fp32 here, unlike the fp16 model above)
    model = load_model(hparams)
    model.load_state_dict(torch.load(checkpoint_path)['state_dict'])
    _ = model.cuda().eval()

    # text -> integer IDs, with a batch dimension of 1
    sequence = np.array(text_to_sequence(text, ['english_cleaners']))[None, :]
    sequence = torch.from_numpy(sequence).cuda().long()

    with torch.no_grad():
        mel_outputs, mel_outputs_postnet, _, alignments = model.inference(sequence)

    taco_stft = TacotronSTFT(hparams.filter_length, hparams.hop_length, hparams.win_length,
                             n_mel_channels=hparams.n_mel_channels,
                             sampling_rate=hparams.sampling_rate,
                             mel_fmin=hparams.mel_fmin, mel_fmax=hparams.mel_fmax)

    # undo the log compression, then approximately map mel bins back to linear-frequency bins
    mel_decompress = taco_stft.spectral_de_normalize(mel_outputs_postnet)
    mel_decompress = mel_decompress.transpose(1, 2).data.cpu()
    spec_from_mel_scaling = 1000
    spec_from_mel = torch.mm(mel_decompress[0], taco_stft.mel_basis)
    spec_from_mel = spec_from_mel.transpose(0, 1).unsqueeze(0)
    spec_from_mel = spec_from_mel * spec_from_mel_scaling

    # estimate the phase iteratively and return the waveform as a numpy array
    audio = griffin_lim(spec_from_mel[:, :, :-1], taco_stft.stft_fn, griffin_iters)
    audio = audio.squeeze()
    audio = audio.cpu().numpy()
    return audio


In [12]:
audio_griffin = infer(checkpoint_path, 60, text)
ipd.Audio(audio_griffin, rate=rate)

In [0]:
write('/content/drive/My Drive/test_wavs/nvidia01_griffin.wav', hparams.sampling_rate, audio_griffin)

# Generate plots

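To take a look at the alignment and the spectrograms, we use matplotlib. The three panels show the mel spectrogram before the postnet, the mel spectrogram after the postnet, and the attention alignment between encoder and decoder steps.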

In [0]:
def plot_data(data, figsize=(16, 16)):
    fig, axes = plt.subplots(len(data), 1, figsize=figsize)
    for i in range(len(data)):
        axes[i].imshow(data[i], aspect='auto', origin='lower',
                       interpolation='none')
    plt.savefig('/content/result.png')

In [0]:
plot_data((mel_outputs.float().data.cpu().numpy()[0],
           mel_outputs_postnet.float().data.cpu().numpy()[0],
           alignments.float().data.cpu().numpy()[0].T))
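
In a good synthesis, the alignment panel shows a roughly diagonal line: as the decoder advances, the attended encoder position moves forward through the text. Besides eyeballing the plot, a crude numeric check is to take the most-attended encoder step for each decoder step and count how often it jumps backwards. This is a hypothetical heuristic, not part of Tacotron2; it expects the same `(encoder_steps, decoder_steps)` orientation that is plotted, i.e. `alignments.float().data.cpu().numpy()[0].T`.

```python
import numpy as np


def backward_jump_ratio(alignment: np.ndarray, tolerance: int = 1) -> float:
    """Fraction of decoder steps where the attention peak moves backwards.

    `alignment` has shape (encoder_steps, decoder_steps). A jump back of up to
    `tolerance` encoder steps is ignored, since attention often hovers on a symbol.
    """
    peaks = np.argmax(alignment, axis=0)  # most-attended encoder step per decoder step
    steps = np.diff(peaks)
    if steps.size == 0:
        return 0.0
    return float(np.mean(steps < -tolerance))


# Toy example: a clean diagonal alignment versus one that jumps back to the start.
diagonal = np.eye(5)
jumpy = np.eye(5)[:, [0, 1, 2, 0, 1]]
print(backward_jump_ratio(diagonal), backward_jump_ratio(jumpy))
```

There is no standard threshold for this ratio; a value near zero together with a visibly diagonal plot is only a hint that the model has learned to align.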