<a href="https://colab.research.google.com/github/allanpichardo/vae_synth/blob/main/Audio_Generation_with_Autoencoders.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Audio Synthesis with Machine Learning

<a target="_blank" href="https://github.com/allanpichardo/vae_synth"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" /><br>View source on GitHub</a>

<p>In this workshop, we cover the basic intuition behind the variational autoencoder (VAE), a simple generative model. We train on a variety of synth samples, then use the trained decoder to generate random WAV files of unique sounds that can be loaded into a sampler or DAW.</p>

# Overview

<ul>
  <li>Variational Autoencoders: Basic Intuition</li>
</ul>
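
As a rough illustration of that intuition, here is a minimal sketch of the two ingredients most VAE write-ups rely on: the reparameterization trick for sampling a latent vector, and the KL term that pulls the encoder's distribution toward a standard normal. It is written in plain Python so it runs without TensorFlow. The function names `sample_latent` and `kl_to_standard_normal` are hypothetical and are not taken from the `vae_synth` package, whose actual loss may differ.

```python
import math
import random


def sample_latent(mean, log_var, rng):
    """Reparameterization trick: z = mean + sigma * eps, with eps ~ N(0, 1).

    Hypothetical helper for illustration; not part of vae_synth.
    """
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, log_var)
    ]


def kl_to_standard_normal(mean, log_var):
    """KL(N(mean, sigma^2) || N(0, 1)), summed over the latent dimensions."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mean, log_var)
    )


rng = random.Random(0)
latent_dim = 8  # same size as the latent space used later in this notebook

# An encoder would normally predict these two vectors from a spectrogram.
mean = [0.0] * latent_dim
log_var = [0.0] * latent_dim

z = sample_latent(mean, log_var, rng)
kl_zero = kl_to_standard_normal(mean, log_var)
kl_shifted = kl_to_standard_normal([1.0] * latent_dim, log_var)
print(z, kl_zero, kl_shifted)
```

When the predicted distribution already matches the standard normal, the KL term is zero; shifting every mean away from zero makes it grow. Because training encourages the latent space to look like a standard normal, sampling from that distribution after training tends to land on decodable points.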

In [None]:
!pip install kapre==0.3.4

In [None]:
!gdown --id 1dHQqjYClND3fLHjM--cOWQh_ucS480QI
!mkdir -p /content/samples
!unzip /content/samples.zip -d /content

In [None]:
!git clone https://github.com/allanpichardo/vae_synth.git

In [None]:
import os
from datetime import datetime
import tensorflow as tf
import librosa

In [6]:
import sys
sys.path.append('/content/vae_synth')

from vae_synth.models import get_model, get_synth_model
from vae_synth.callbacks import SpectrogramCallback
from vae_synth.generators import SoundSequence

In [None]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

In [None]:
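path = '/content/samples'        # directory containing the unzipped training samples
sr = 44100                       # sample rate in Hz
duration = 1.0                   # clip length in seconds
batch_size = 4                   # clips per training batch
spectrogram_shape = (80, 1025)   # not referenced by the cells below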

In [None]:
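# Build the autoencoder with an 8-dimensional latent space
autoencoder = get_model(latent_dim=8, sr=sr, duration=duration)

# Inspect the three sub-models: STFT front end, encoder, and decoder
autoencoder.stft.summary()
autoencoder.encoder.summary()
autoencoder.decoder.summary()

# No loss is passed here, so the model is presumably computing it internally
autoencoder.compile(optimizer=tf.keras.optimizers.Adam())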

Model: "stft"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_6 (InputLayer)         [(None, 132300, 1)]       0         
_________________________________________________________________
stft_mag_phase (Functional)  (None, 513, 513, 2)       0         
_________________________________________________________________
tf.image.resize_with_crop_or (None, 513, 513, 2)       0         
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
Model: "encoder"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_8 (InputLayer)            [(None, 513, 513, 2) 0                                            
____________________________________________________________________________________
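
The shapes in the summary above can be sanity-checked with a little arithmetic. The sketch below assumes a standard one-sided STFT, where an FFT of size `n_fft` yields `n_fft // 2 + 1` frequency bins; the actual STFT settings used inside `vae_synth` are not shown in this notebook, so treat the derived values as assumptions. The helper names are hypothetical.

```python
def onesided_bins(n_fft):
    """Number of frequency bins in a one-sided STFT with FFT size n_fft."""
    return n_fft // 2 + 1


def seconds(num_samples, sr):
    """Clip length in seconds for a given sample count and sample rate."""
    return num_samples / sr


sr = 44100
input_samples = 132300  # from the "stft" InputLayer shape in the summary
bins = 513              # from the stft_mag_phase output shape

# Assumption: one of the 513-sized axes is the one-sided frequency axis.
n_fft = 2 * (bins - 1)
clip_seconds = seconds(input_samples, sr)

print(n_fft, clip_seconds, onesided_bins(2048))
```

Under these assumptions, 513 bins corresponds to an FFT size of 1024, and the 132300-sample input corresponds to 3.0 seconds at 44100 Hz. That differs from the `duration = 1.0` set in the configuration cell, so the captured summary may come from a run with different settings. The 1025 in `spectrogram_shape` would likewise match an FFT size of 2048.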

In [None]:
sequence = SoundSequence(path, sr=sr, duration=duration, batch_size=batch_size)

In [None]:
epochs = 20
autoencoder.fit(sequence, epochs=epochs)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x7f4165db5fd0>

In [None]:
synth = get_synth_model(autoencoder.decoder)

In [None]:
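import librosa
from IPython.display import Audio

# Sample one latent vector from a standard normal; 8 matches latent_dim above
z = tf.random.normal([1, 8], 0, 1)
print(z)

# Decode the latent vector to a waveform and peak-normalize it
wav = synth.predict_on_batch(z)
wav = librosa.util.normalize(wav[0])

# Write the result as a WAV file at the training sample rate, then play it
tf.io.write_file('output.wav', tf.audio.encode_wav(wav, sr))

Audio(filename='output.wav', rate=sr)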

tf.Tensor(
[[-0.22707355  1.5003572  -0.15913974  0.71210206  0.49416935  0.93992066
   1.4020661   0.3085858 ]], shape=(1, 8), dtype=float32)
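
To produce a whole range of samples rather than a single file, the same steps can be wrapped in a loop. The sketch below uses only the Python standard library so it can run anywhere: `fake_decoder` is a hypothetical stand-in for the trained synth model (it just maps a latent vector to a short sine tone), and `peak_normalize`, `write_wav`, and `generate_batch` are illustrative helper names, not part of `vae_synth`.

```python
import math
import os
import random
import struct
import tempfile
import wave


def fake_decoder(z, sr=44100, duration=0.05):
    """HYPOTHETICAL stand-in for the trained synth model: latent -> sine tone."""
    freq = 220.0 + 40.0 * abs(z[0])
    n = round(sr * duration)
    return [0.3 * math.sin(2 * math.pi * freq * i / sr) for i in range(n)]


def peak_normalize(samples):
    """Scale samples so the largest absolute value is 1.0."""
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples] if peak > 0 else list(samples)


def write_wav(path, samples, sr):
    """Write mono float samples in [-1, 1] as 16-bit PCM."""
    frames = b"".join(
        struct.pack("<h", int(round(max(-1.0, min(1.0, s)) * 32767)))
        for s in samples
    )
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sr)
        f.writeframes(frames)


def generate_batch(decoder, out_dir, count, latent_dim=8, sr=44100, seed=0):
    """Sample `count` latent vectors from N(0, 1) and write one WAV per vector."""
    rng = random.Random(seed)
    paths = []
    for i in range(count):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        samples = peak_normalize(decoder(z))
        path = os.path.join(out_dir, f"sample_{i:03d}.wav")
        write_wav(path, samples, sr)
        paths.append(path)
    return paths


out_dir = tempfile.mkdtemp()
paths = generate_batch(fake_decoder, out_dir, count=3)
print(paths)
```

In the notebook itself, the stand-in decoder would be replaced by a small wrapper around `synth.predict_on_batch`, and fixing the seed makes a batch of sounds reproducible.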
