# Generating Audio with LSTMs

This notebook takes various audio samples and models them with an **LSTM** to generate new raw audio.

The first part of this project uses a YouTube downloader to get an audio sample from Björk's [Tabula Rasa](https://www.youtube.com/watch?v=mYbZw04ba78) music video.

For the second part, I wanted to see how the model would generate audio from a more upbeat musical input, so I used Björk's [Earth Intruders](https://www.youtube.com/watch?v=j1Q9ppPPHjU) music video as the audio input.

Finally, for the third part, audio snippets from these three YouTube videos were combined into a single audio file for the model:

1) [Singing Bowl + Water](https://www.youtube.com/watch?v=PIZFlCE3-eg)

2) [Björk Talking About Her TV](https://www.youtube.com/watch?v=75WFTHpOw8Y&t=32s)

3) [A Mind-Blowing Sitar Player](https://www.youtube.com/watch?v=tTbY_EeC9Wg)

A description of the aim and the creative process for this project can be found in the attached PDF file in the repository, along with the customised audio input file and final audio outputs.

## Anti-Disconnect for Google Colab

In [None]:
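# Adapted from: https://colab.research.google.com/github/justinjohn0306/VQGAN-CLIP/blob/main/VQGAN%2BCLIP(Updated).ipynb#scrollTo=g7EDme5RYCrt
# Clicks Colab's "Connect" button every 60 s so an idle session is less likely to be disconnected.
# The CSS selector depends on Colab's page layout and may stop matching if the UI changes.

from IPython.display import Javascript, display

js_code = '''
function ClickConnect(){
  console.log("Working");
  document.querySelector("colab-toolbar-button#connect").click();
}
setInterval(ClickConnect, 60000);
'''
display(Javascript(js_code))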


## Install prerequisites and get code

In [None]:
!pip install -U numpy==1.19.0  # pinned to 1.19.0 instead of 1.19.5 to avoid: "ValueError: Unexpected result of `train_function` (Empty logs). Please use `Model.compile(..., run_eagerly=True)`, or `tf.config.run_functions_eagerly(True)` for more information of where went wrong, or file a issue/bug to `tf.keras`."

**Restart the runtime if prompted!**

In [None]:
%tensorflow_version 1.x
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # hide TensorFlow's C++ INFO, WARNING and ERROR log messages
import tensorflow as tf

In [None]:
!git clone https://github.com/ual-cci/music_gen_interaction_RTML.git

In [None]:
# python libraries
!pip install Pillow opencv-python PyWavelets tqdm slugify
!pip install -U Flask
!pip install lws==1.2.6
!pip install tflearn
!pip install librosa==0.7.2
!pip install numba==0.48
!pip install mock

In [None]:
%cd /content/music_gen_interaction_RTML

## Set Up Training

In [None]:
%cd /content/music_gen_interaction_RTML/

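from unittest.mock import MagicMock

# MagicMock stands in for the command-line arguments object passed to settings.Settings;
# attributes not set below resolve to mocks instead of raising AttributeError.
args = MagicMock(name='method')
sample_rate = 44100

args.lstm_layers = 3
args.lstm_units = 128
args.sample_rate = sample_rate
args.sequence_length = 40
args.async_loading = True
args.amount_epochs = -1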
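Why a `MagicMock` rather than a plain `argparse.Namespace`? A minimal illustration, assuming `settings.Settings` may read attributes this cell never sets: the mock answers any attribute access with a child mock, whereas a `Namespace` raises `AttributeError`. `undefined_option` is a hypothetical attribute name used only for the demonstration.

```python
import argparse
from unittest.mock import MagicMock

# Stand-in arguments object, configured like the cell above
mock_args = MagicMock(name='method')
mock_args.lstm_layers = 3

# A plain Namespace holding the same explicit value
namespace_args = argparse.Namespace(lstm_layers=3)

# Explicitly set attributes behave identically
print(mock_args.lstm_layers, namespace_args.lstm_layers)  # 3 3

# An attribute nobody set (hypothetical name): the mock returns a child mock ...
print(isinstance(mock_args.undefined_option, MagicMock))  # True

# ... while the Namespace raises AttributeError
try:
    namespace_args.undefined_option
except AttributeError:
    print("Namespace has no 'undefined_option'")
```

The trade-off is that a misspelt attribute (say, `args.lstm_unit`) also resolves silently, so values that matter are best set explicitly, as the cell does.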

## Getting a YouTube Downloader (First and Second Part Only)


In [None]:
!sudo curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
!sudo chmod a+rx /usr/local/bin/youtube-dl

## Picking a Song from YouTube (First and Second Part Only)

In [None]:
youtube_url = "https://www.youtube.com/watch?v=4cIWu5m8UmA"  # video whose audio is used as the training input

!youtube-dl -ci -f "bestaudio[ext=m4a]" $youtube_url -o 'youtube_audio.m4a'
!ffmpeg -i 'youtube_audio.m4a' -ac 2 -f wav full.wav
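`args.sample_rate` is set to 44100 Hz, but the ffmpeg command above only forces two channels (`-ac 2`) and otherwise keeps the source's sample rate. Whether the training code resamples is not shown in this notebook, so a quick header check can be worthwhile. A sketch using the standard-library `wave` module, assuming `full.wav` is uncompressed PCM (ffmpeg's default codec for `-f wav`); `describe_wav` is a hypothetical helper:

```python
import wave


def describe_wav(path, expected_rate=44100, expected_channels=2):
    """Return (channels, sample_rate, duration_s) and print a warning on mismatches."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    if channels != expected_channels:
        print(f"Warning: {path} has {channels} channel(s), expected {expected_channels}")
    if rate != expected_rate:
        # ffmpeg's -ar option resamples, e.g. adding "-ar 44100" to the conversion command
        print(f"Warning: {path} is {rate} Hz, expected {expected_rate} Hz")
    return channels, rate, duration
```

Calling `describe_wav("full.wav")` after the conversion cell returns the channel count, sample rate and length in seconds, and prints a warning only if the first two differ from the expected values.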

## Adding the Custom Audio File (Third Part Only)



In [None]:
!ffmpeg -i '/content/audio2.wav' -ac 2 -f wav full.wav  # convert the uploaded custom file to a stereo WAV named full.wav
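The notebook does not show how `audio2.wav` was assembled from the three videos (the PDF describes the creative process). As one possible approach, and only a sketch, snippets that already share the same channel count, sample width and sample rate can be joined end to end with the standard-library `wave` module; `concatenate_wavs` is a hypothetical helper:

```python
import wave


def concatenate_wavs(input_paths, output_path):
    """Join PCM WAV files end to end; all inputs must share one format."""
    params = None
    frames = []
    for path in input_paths:
        with wave.open(path, "rb") as wav:
            current = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
            if params is None:
                params = current
            elif current != params:
                raise ValueError(f"{path} has format {current}, expected {params}")
            frames.append(wav.readframes(wav.getnframes()))
    if params is None:
        raise ValueError("no input files given")
    with wave.open(output_path, "wb") as out:
        out.setnchannels(params[0])
        out.setsampwidth(params[1])
        out.setframerate(params[2])
        out.writeframes(b"".join(frames))  # the header's frame count is updated on close
```

For example, `concatenate_wavs(["bowl.wav", "bjork_tv.wav", "sitar.wav"], "/content/audio2.wav")`, where the three input names are hypothetical. Snippets with different formats would first need converting to a common one (for instance with ffmpeg's `-ac` and `-ar` options).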

## Trimming the Audio File

In [None]:
# Audio file preparation: cut a 60-second sample starting 60 seconds into full.wav
!ffmpeg -y -ss 60 -i full.wav -t 60 -c copy sample.wav
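The ffmpeg call seeks to 60 s (`-ss 60`) and keeps 60 s (`-t 60`) without re-encoding (`-c copy`). For PCM WAV the same cut is a frame-offset calculation: at 44,100 Hz the clip starts at frame 60 × 44,100 = 2,646,000 and is 2,646,000 frames long. A standard-library sketch of that calculation, assuming uncompressed PCM input; `trim_wav` is a hypothetical helper, not part of the project code:

```python
import wave


def trim_wav(input_path, output_path, start_s=60.0, duration_s=60.0):
    """Copy duration_s seconds of PCM audio starting at start_s; return the kept length in seconds."""
    with wave.open(input_path, "rb") as src:
        params = src.getparams()
        rate = params.framerate
        start = int(start_s * rate)
        if start >= params.nframes:
            raise ValueError("start position is beyond the end of the file")
        # Like ffmpeg, keep whatever remains if the file ends before start + duration
        count = min(int(duration_s * rate), params.nframes - start)
        src.setpos(start)
        frames = src.readframes(count)
    with wave.open(output_path, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(rate)
        dst.writeframes(frames)
    return count / rate
```

One consequence worth noting: a source shorter than two minutes yields a training clip shorter than one minute.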

## Epochs for Training

In [None]:
number_of_epochs = 300  # takes about 4 minutes

song_name = "sample.wav"  # WAV file the model is trained on
model_name = "my_trained_model"  # name under which the trained model is saved

## Train Model

In [None]:
from IPython.display import clear_output
clear_output()

In [None]:
!python training_handler.py -target_file $song_name -model_name $model_name -amount_epochs $number_of_epochs -batch_size 512 \
                            -lstm_layers $args.lstm_layers -lstm_units $args.lstm_units -sample_rate $args.sample_rate -sequence_length $args.sequence_length


## Generate Audio


In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
load_generative_seeds_from = song_name
load_model_from = model_name+".tfl"

In [None]:
from server_handler import ServerHandler
import settings

my_settings = settings.Settings(args)
print("Important Settings: settings.lstm_layers=", my_settings.lstm_layers, ", settings.lstm_units=", my_settings.lstm_units,
              ", settings.sample_rate=", my_settings.sample_rate)

generation_handler = ServerHandler(my_settings, manual_loading = True)
generation_handler.manual_init_song_model(load_generative_seeds_from, load_model_from)

# The start positions and segment lengths below were varied for each of the three inputs; the settings used are detailed in the PDF report.

generation_handler.change_impulse(0.2)  # set the position to 20% of the song "sharply", i.e. without interpolation

sequence = [
  # Start 10% of the way through the song and generate 200 frames
  [0.1, 200],
  # Move to 30% of the way through the song and generate 350 frames
  [0.3, 350],
  # Move to 60% of the way through the song and generate 400 frames
  [0.6, 400],
  # Move to 90% of the way through the song and generate 150 frames
  [0.9, 150],
]

output_audio = []

for position_in_the_song, requested_length in sequence:
  generation_handler.change_impulse_smoothly_start(position_in_the_song)  # allowing interpolation
  audio, predict, reconstruct = generation_handler.generate_audio_sample(requested_length, interactive_i=position_in_the_song)
  output_audio.append(audio)

clear_output()
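Reading the positions in `sequence` as fractions of the seed file (`sample.wav`, the one-minute clip cut earlier), which is an assumption based on the cell's comments, each step starts at the offset computed below. `describe_sequence` is a hypothetical helper; frame counts are in the model's own units, whose duration this notebook does not state.

```python
def describe_sequence(sequence, clip_seconds=60.0):
    """Return (offset_in_seconds, frames) for each [fraction, frames] step."""
    return [(fraction * clip_seconds, frames) for fraction, frames in sequence]


sequence = [[0.1, 200], [0.3, 350], [0.6, 400], [0.9, 150]]

for offset, frames in describe_sequence(sequence):
    print(f"seed at {offset:.0f} s, generate {frames} frames")

print("total frames requested:", sum(frames for _, frames in sequence))  # 1100
```

Under that reading the seeds fall at roughly 6, 18, 36 and 54 seconds into the clip.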

## Play Audio



In [None]:
import librosa
import numpy as np

In [None]:
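from IPython.display import Audio

output_audio = np.concatenate(output_audio)  # join the generated segments into one array

out_name = 'generated_output_exp_concat.wav'
# librosa.output.write_wav is available in the pinned librosa 0.7.2 (the module was removed in 0.8.0)
librosa.output.write_wav(out_name, output_audio, sr=sample_rate)
Audio(out_name)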
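`librosa.output.write_wav` works here only because librosa is pinned to 0.7.2; the `librosa.output` module was removed in librosa 0.8.0. If the pin is ever lifted, one dependency-light replacement is to write the concatenated array as 16-bit PCM with the standard-library `wave` module. A sketch, assuming `output_audio` is a 1-D (mono) float array with values in [-1, 1]; `write_wav_int16` is a hypothetical helper, and it quantises the samples to 16 bits:

```python
import wave

import numpy as np


def write_wav_int16(path, audio, sample_rate=44100):
    """Write mono float audio in [-1, 1] as a 16-bit PCM WAV file."""
    audio = np.asarray(audio, dtype=np.float64)
    if audio.ndim != 1:
        raise ValueError("expected a 1-D (mono) array")
    clipped = np.clip(audio, -1.0, 1.0)  # avoid integer wrap-around on out-of-range samples
    pcm = (clipped * 32767).astype("<i2")  # little-endian signed 16-bit, as WAV expects
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
```

Usage would be `write_wav_int16(out_name, output_audio, sample_rate)` in place of the `librosa.output.write_wav` line. If the `soundfile` package is installed, `soundfile.write(out_name, output_audio, sample_rate)` is another option.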