## In this notebook, we will fine-tune the Denoising Autoencoder (DAE) on our dataset of fake and real voices.

Again, see this repo for the autoencoder architecture: https://github.com/vbelz/Speech-enhancement

In [1]:
# TensorFlow's GPU support is buggy on Windows.
# This cell works around TensorFlow not detecting any GPUs;
# the fix comes from https://github.com/tensorflow/tensorflow/issues/48868
import os
os.add_dll_directory("C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/bin") # tf can now see the CUDA directory

import tensorflow as tf
tf.config.list_physical_devices() # GPU should now appear

[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
 PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

In [2]:
import denoising_AE as DAE # contains useful functions from repo

In [3]:
# load the pretrained model that we will fine-tune
dae = DAE.load_pretrained_model(display_summary=True)

Model: "model_2"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_3 (InputLayer)           [(None, 128, 128, 1  0           []                               
                                )]                                                                
                                                                                                  
 conv2d_48 (Conv2D)             (None, 128, 128, 16  160         ['input_3[0][0]']                
                                )                                                                 
                                                                                                  
 leaky_re_lu_46 (LeakyReLU)     (None, 128, 128, 16  0           ['conv2d_48[0][0]']              
                                )                                                           

### To fine-tune this model, we want to denoise our real voices.

So we load the real voices from disk and then add noise to them, so that the autoencoder can learn to recover the clean audio from its noisy version.

In [None]:
AUDIO_LENGTH = 64000 # number of samples per clip: 4 seconds at a sample rate of 16000 samples per second
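
As a quick sanity check on the constant above, the clip duration follows from dividing the number of samples by the sample rate. This is only a small sketch; `SAMPLE_RATE` and `clip_seconds` are names introduced here for illustration and do not appear elsewhere in the notebook.

```python
SAMPLE_RATE = 16000   # samples per second, as stated in the comment above
AUDIO_LENGTH = 64000  # samples per clip, as set in the notebook

# Duration in seconds = number of samples / samples per second.
clip_seconds = AUDIO_LENGTH / SAMPLE_RATE
print(f"Each clip covers {clip_seconds:.1f} seconds of audio")
```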

In [None]:
# get the training data (only use real voices)
# need to look at reconstruction errors
train_ds = tf.keras.utils.audio_dataset_from_directory(
    directory="for-norm/training/real",
    batch_size=64,
    validation_split=0,
    seed=0,
    labels=None,
    output_sequence_length=AUDIO_LENGTH # every clip is truncated or padded to this many samples (16000 would mean 1 second); None => each batch is padded to the length of its longest clip
)
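
The "add noise" step mentioned earlier can be illustrated on a single waveform. The sketch below is an assumption, not necessarily the method used in this notebook or in the referenced repo: it adds Gaussian white noise scaled to a chosen signal-to-noise ratio. The helper `add_white_noise`, the `snr_db` value, and the synthetic 440 Hz tone standing in for a real voice clip are all hypothetical.

```python
import numpy as np

def add_white_noise(clean, snr_db, rng):
    """Return `clean` plus Gaussian white noise at roughly `snr_db` decibels SNR."""
    clean = np.asarray(clean, dtype=np.float32)
    signal_power = np.mean(clean ** 2)
    # SNR_dB = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=clean.shape)
    return (clean + noise).astype(np.float32)

# Hypothetical stand-in for one clip: 64000 samples (4 seconds at 16000 samples per second).
rng = np.random.default_rng(0)
t = np.arange(64000) / 16000.0
clean = (0.5 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

noisy = add_white_noise(clean, snr_db=10.0, rng=rng)
```

With pairs like `(noisy, clean)`, the noisy waveform would serve as the model input and the clean one as the target. A lower `snr_db` gives a harder denoising task.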