# WaveNet Implementation with PyTorch

## Preparation of the Environment

In [0]:
#@title To mount your Google Drive, uncomment the following code

#from google.colab import drive
#drive.mount('/content/drive')


In [0]:
#@title Clone the pytorch-wavenet repository from GitHub
!git clone https://github.com/vincentherrmann/pytorch-wavenet.git

In [0]:
#@title Change into the repository that was just cloned
%cd /content/pytorch-wavenet/

In [0]:
#@title PyTorch 0.3.0 is required, so it is downloaded and installed here.
!pip3 install http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp36-cp36m-linux_x86_64.whl 

In [0]:
#@title TensorFlow 1.12.0 is required. This version is downloaded and installed here.


!pip install tensorflow==1.12.0
import tensorflow as tf
print(tf.__version__)

## The next four code blocks start TensorBoard, which you can open through the generated link.


In [0]:
!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip

In [0]:
LOG_DIR = 'logs/chaconne_model'
get_ipython().system_raw(
    'tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'
    .format(LOG_DIR)
)

In [0]:
get_ipython().system_raw('./ngrok http 6006 &')

In [0]:
!curl -s http://localhost:4040/api/tunnels | python3 -c \
    "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

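The shell pipeline above reads ngrok's local tunnel listing and prints the first `public_url`. The same extraction can be sketched in plain Python, which makes the case of a tunnel that is not ready yet explicit. The `first_public_url` helper and the sample payload are illustrative assumptions; only the `tunnels` and `public_url` keys are taken from the command above.

```python
import json


def first_public_url(api_response_text):
    """Return the first tunnel's public URL, or None if no tunnel is listed."""
    payload = json.loads(api_response_text)
    tunnels = payload.get("tunnels", [])
    if not tunnels:
        # ngrok may still be starting; indexing [0] would raise IndexError here.
        return None
    return tunnels[0]["public_url"]


# Made-up payload with the same shape the one-liner above expects.
sample = '{"tunnels": [{"public_url": "https://example.invalid", "proto": "https"}]}'
print(first_public_url(sample))
print(first_public_url('{"tunnels": []}'))
```

If the helper returns `None`, waiting a few seconds and querying the tunnel listing again is usually enough.
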
## Code for Training the WaveNet Model and Generating Audio

In [0]:
#@title First, all necessary packages are imported
import torch
from wavenet_model import *
from audio_data import WavenetDataset
from wavenet_training import *
from model_logging import *

In [0]:
#@title Then the data type and the label type are initialized, using CUDA if a GPU is available.
# initialize cuda option
dtype = torch.FloatTensor # data type
ltype = torch.LongTensor # label type

use_cuda = torch.cuda.is_available()
if use_cuda:
    print('use gpu')
    dtype = torch.cuda.FloatTensor
    ltype = torch.cuda.LongTensor
    torch.set_default_tensor_type('torch.cuda.FloatTensor')
else: 
    print('gpu not available')

In [0]:
#@title Here the actual neural network is created.
model = WaveNetModel(layers=10,
                     blocks=3,
                     dilation_channels=32,
                     residual_channels=32,
                     skip_channels=1024,
                     end_channels=512, 
                     output_length=16,
                     dtype=dtype, 
                     bias=True)
#model = load_latest_model_from('snapshots', use_cuda=use_cuda)

#model = model.cuda()

print('model: ', model)
print('receptive field: ', model.receptive_field)
print('parameter count: ', model.parameter_count())
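
The receptive field printed above can be cross-checked by hand. The sketch below assumes a kernel size of 2 and dilations that double in every layer and reset at the start of each block, as described in the WaveNet paper. The `receptive_field` function is a hypothetical helper, not part of `wavenet_model`, and the value reported by `model.receptive_field` is authoritative if the two disagree.

```python
def receptive_field(layers, blocks, kernel_size=2):
    """Receptive field of stacked dilated causal convolutions.

    Assumes dilation 2**i in layer i of every block (1, 2, 4, ...).
    """
    field = 1  # a single output sample always sees itself
    for _ in range(blocks):
        for layer in range(layers):
            dilation = 2 ** layer
            field += (kernel_size - 1) * dilation
    return field


print(receptive_field(layers=10, blocks=3))  # 3070 under the assumptions above
```

At the 16000 Hz sample rate used in this notebook, 3070 samples correspond to roughly 0.19 seconds of audio context per prediction.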

### Training

In [0]:
#@title Initialize the training data
# The training data must be located in train_samples/bach_chaconne.
# The repository already contains a dataset.npz file with audio data of a violin.
# To use other data, remove dataset.npz and put the audio files you want to train on
# into this directory. The code then converts these files into a new dataset.npz,
# which is the file the model ultimately reads.
data = WavenetDataset(dataset_file='train_samples/bach_chaconne/dataset.npz',
                      item_length=model.receptive_field + model.output_length - 1,
                      target_length=model.output_length,
                      file_location='train_samples/bach_chaconne',
                      test_stride=500)
print('the dataset has ' + str(len(data)) + ' items')
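
The `item_length` expression above follows from how many input samples are needed to produce `output_length` predictions at once: the first prediction needs a full receptive field, and each further prediction needs one more sample. The sketch below illustrates this windowing on a toy sequence. `make_item` is a hypothetical helper, and the exact slicing inside `WavenetDataset` is an assumption here.

```python
def make_item(samples, start, receptive_field, output_length):
    """Cut one training item: an input window and its next-sample targets."""
    item_length = receptive_field + output_length - 1
    # Take one extra sample so the last prediction also has a target.
    window = samples[start:start + item_length + 1]
    if len(window) < item_length + 1:
        raise ValueError("not enough samples after start")
    inputs = window[:item_length]
    targets = window[-output_length:]  # the sample that follows each prediction position
    return inputs, targets


toy = list(range(20))
inputs, targets = make_item(toy, start=0, receptive_field=4, output_length=3)
print(inputs)   # [0, 1, 2, 3, 4, 5]
print(targets)  # [4, 5, 6]
```

In the toy case, the first prediction sees samples 0 to 3 and should output sample 4, the second sees 1 to 4 and should output 5, and the third sees 2 to 5 and should output 6.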

In [0]:
#@title When the TensorboardLogger is used, this function serves as its generate function. It produces audio clips from the latest snapshot as intermediate results.
def generate_and_log_samples(step):
    sample_length = 32000

    # load the most recent snapshot onto the CPU
    gen_model = load_latest_model_from('snapshots', use_cuda=False)

    print("start generating...")
    samples = generate_audio(gen_model,
                             length=sample_length,
                             temperatures=[0.5])
    tf_samples = tf.convert_to_tensor(samples, dtype=tf.float32)
    logger.audio_summary('temperature_0.5', tf_samples, step, sr=16000)

    samples = generate_audio(gen_model,
                             length=sample_length,
                             temperatures=[1.])
    tf_samples = tf.convert_to_tensor(samples, dtype=tf.float32)
    logger.audio_summary('temperature_1.0', tf_samples, step, sr=16000)
    print("audio clips generated")
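
The two `temperatures` values above control how random the generated audio is. A common way to apply a temperature, assumed here and not taken from the `generate_audio` source, is to divide the network's output logits by the temperature before the softmax and then sample from the resulting distribution. The helper names below are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature, rng=random):
    """Draw one class index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]
print(softmax_with_temperature(logits, 1.0))
print(softmax_with_temperature(logits, 0.5))  # more mass on the largest logit
print(sample_index(logits, 0.5, random.Random(0)))
```

Under this assumption, a temperature of 0.5 yields more conservative output, while 1.0 samples from the unmodified distribution.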

In [0]:
#@title Use either the TensorboardLogger, which lets you follow the training in TensorBoard, or the simple Logger, which prints the logging data directly to the console.
logger = TensorboardLogger(log_interval=200,
                           validation_interval=400,
                           generate_interval=1000,
                           generate_function=generate_and_log_samples,
                           log_dir="logs/chaconne_model")

#logger = Logger(log_interval=200,
#                validation_interval=400,
#                generate_interval=1000)

In [0]:
#@title In the following code block, the WaveNet model is trained. A trainer is created for this purpose; its train method takes the batch size and the number of epochs as parameters.
trainer = WavenetTrainer(model=model.cuda() if use_cuda else model,
                         dataset=data,
                         lr=0.001,
                         snapshot_path='snapshots',
                         snapshot_name='chaconne_model',
                         snapshot_interval=1000,
                         logger=logger,
                         dtype=dtype,
                         ltype=ltype)


print('start training...')
trainer.train(batch_size=6,
              epochs=10)
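
To get a feel for how often the intervals configured above fire, the number of training steps can be estimated from the dataset size. The sketch assumes one step per batch and that a final partial batch still counts as a step. `dataset_items=100000` is a made-up figure, so substitute the value printed by `len(data)`.

```python
import math


def training_steps(dataset_items, batch_size, epochs):
    """Estimate the total number of training steps (one step per batch)."""
    steps_per_epoch = math.ceil(dataset_items / batch_size)
    return steps_per_epoch * epochs


total = training_steps(dataset_items=100000, batch_size=6, epochs=10)
print(total)          # 166670
print(total // 1000)  # events at an interval of 1000 steps (snapshots, generated clips)
print(total // 200)   # events at an interval of 200 steps (training log entries)
```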

### Generating

In [0]:
#@title In this code block, audio is generated from the trained model. The length of the output is set with num_samples; 16000 samples are one second of output. The rate determines the playback speed of the output.
start_data = data[250000][0] # use start data from the data set
start_data = torch.max(start_data, 0)[1] # convert one hot vectors to integers

def prog_callback(step, total_steps):
    print(str(100 * step // total_steps) + "% generated")

model.cpu()
generated = model.generate_fast(num_samples=160000,
                                 first_samples=start_data,
                                 progress_callback=prog_callback,
                                 progress_interval=1000,
                                 temperature=1.0,
                                 regularize=0.)


import IPython.display as ipd

ipd.Audio(generated, rate=16000)
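
The relationship between `num_samples`, the rate, and the playback time described above is plain division. The helper below is a hypothetical illustration of that arithmetic.

```python
def duration_seconds(num_samples, rate):
    """Playback time of num_samples audio samples at the given sample rate."""
    return num_samples / rate


print(duration_seconds(160000, 16000))  # 10.0
print(duration_seconds(160000, 32000))  # 5.0, doubling the rate halves the duration
```

Playing the clip at a rate other than the 16000 Hz used for training changes both its speed and its pitch.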

## Sources

[DeepMind Blog Article about WaveNet](https://deepmind.com/blog/article/wavenet-generative-model-raw-audio)

[WaveNet Paper](https://arxiv.org/pdf/1609.03499.pdf)

[GitHub TensorFlow WaveNet Project](https://github.com/ibab/tensorflow-wavenet)

[Google Assistant's Voice Synthesizer, an Overview of WaveNet](https://towardsdatascience.com/wavenet-google-assistants-voice-synthesizer-a168e9af13b1)

There is another Jupyter notebook in the repository, which was implemented with TensorFlow. With that notebook you can train WaveNet on voice data.