## Data Augmentation

In [80]:
#load data
import convenience

df_train_val, sample_rates = convenience.load_train()
df_train_val['augmented_data'] = df_train_val.audio
df_train_val.head()

In [None]:
#sampling rate (sr) of the loaded audio
sr = list(sample_rates)[0]
sr

In [None]:
import torch

In [None]:
waveform = df_train_val.audio[0]
print(waveform)
waveform.shape

#### Librosa

In this part I apply several augmentation techniques. They are chained in a pipeline: the output of each step is the input to the next one. 
If you want to apply an augmentation technique to a 'clean slate' instead, apply it to df_train_val['augmented_data'], which starts out as the unmodified audio.

Code is adapted from: https://www.kaggle.com/code/huseinzol05/sound-augmentation-librosa#apply-hpss 

(Note: it would be a good idea to check which augmentations give the model the best performance while still generalizing well!)
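
The chaining idea can be sketched with plain functions. This is only an illustration: `add_offset`, `scale` and `run_pipeline` are hypothetical stand-ins, not the Librosa steps used in the cells below, and the short array stands in for a real audio clip.

```python
import numpy as np

def add_offset(audio, rng):
    # Hypothetical stand-in step: add a small random constant.
    return audio + rng.uniform(-0.01, 0.01)

def scale(audio, rng):
    # Hypothetical stand-in step: scale the amplitude by a random factor.
    return audio * rng.uniform(0.9, 1.1)

def run_pipeline(audio, steps, rng):
    # Work on a copy so the 'clean slate' input is never modified.
    out = np.array(audio, copy=True)
    for step in steps:
        out = step(out, rng)  # output of one step is the input of the next
    return out

rng = np.random.default_rng(0)          # seeded for reproducibility
clean = np.linspace(-1.0, 1.0, 8)       # toy signal instead of real audio
augmented = run_pipeline(clean, [add_offset, scale], rng)
```

Passing a different list of steps makes it easy to try each augmentation on its own against the clean data.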

In [None]:
import numpy as np
import librosa 

import seaborn as sns 
sns.set() #iirc mostly for visuals 
import tensorflow as tf
from IPython.display import Audio

In [None]:
#pitch-shifted audio
#This is the beginning of the pipeline, so the data comes straight from df_train_val. 
pitch_shift_audio = [] #I always create a separate list to avoid overwriting the original variable
for file in df_train_val['augmented_data']:
    audio_pitch = file.numpy() #data has to be a numpy array for Librosa to work
    bins_per_octave = 12 
    pitch_pm = 2 
    pitch_change = pitch_pm * 2 * np.random.uniform() #random upward shift in [0, 4) semitones per audio file (can be tweaked)
    
    pitch_shift_audio.append(librosa.effects.pitch_shift(audio_pitch, sr=sr, n_steps=pitch_change, bins_per_octave=bins_per_octave)) 

print(pitch_shift_audio)
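
If a shift in both directions (plus or minus `pitch_pm` semitones) is wanted, the random factor can be centred around zero first. A minimal sketch, assuming that symmetric behaviour is the goal; `sample_pitch_steps` is a hypothetical helper name:

```python
import numpy as np

def sample_pitch_steps(pitch_pm, rng):
    # rng.uniform() lies in [0, 1); subtracting 0.5 centres it on zero,
    # so the result lies in [-pitch_pm, pitch_pm).
    return pitch_pm * 2 * (rng.uniform() - 0.5)

rng = np.random.default_rng(0)
steps = [sample_pitch_steps(2, rng) for _ in range(1000)]
```

The sampled value would then be passed as `n_steps` to `librosa.effects.pitch_shift`.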

In [None]:
#change speed
#notice how pitch_shift_audio is the input for this part (so no .numpy() needed anymore)
speed_shift_audio = []
for file in pitch_shift_audio:
    speed_change = np.random.uniform(low=0.9, high=1.1) #strength of the effect (can be tweaked)
    tmp = librosa.effects.time_stretch(file, rate=speed_change) 
    minlen = min(file.shape[0], tmp.shape[0])
    audio_speed = np.zeros_like(file) #fresh zero-filled array, so the entries of pitch_shift_audio are not overwritten in place
    audio_speed[0:minlen] = tmp[0:minlen] #keep the original length: trim if longer, zero-pad if shorter
    
    speed_shift_audio.append(audio_speed)
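
Time stretching changes the number of samples, so the result has to be fitted back to the original length. A minimal sketch of that trim-or-pad step in plain NumPy; `fit_length` is a hypothetical helper name, not part of the notebook or of Librosa:

```python
import numpy as np

def fit_length(audio, target_len):
    # Start from zeros, then copy as many samples as fit.
    # Longer input is trimmed, shorter input is zero-padded at the end.
    out = np.zeros(target_len, dtype=audio.dtype)
    n = min(len(audio), target_len)
    out[:n] = audio[:n]
    return out

trimmed = fit_length(np.arange(5.0), 3)
padded = fit_length(np.arange(3.0), 5)
```

Because the helper always returns a new array, the input clip is left untouched.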

In [None]:
#distribution noise
noise_dist_audio = []

for file in speed_shift_audio:
    audio_noise = file
    noise_amp = 0.005*np.random.uniform()*np.amax(audio_noise) #random noise amplitude, up to 0.5% of the signal's maximum value; the normal distribution below can be swapped for any distribution from https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.random.html
    audio_noise = audio_noise + noise_amp * np.random.normal(size=audio_noise.shape[0]) #add Gaussian noise scaled by noise_amp
    
    noise_dist_audio.append(audio_noise) 
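
One detail worth knowing: the normal samples are float64, so a float32 clip comes out of the addition as float64. A minimal sketch that casts back to the input dtype and uses a seeded generator for reproducibility; `add_noise` and `max_fraction` are hypothetical names, and whether the dtype matters depends on what the model expects:

```python
import numpy as np

def add_noise(audio, rng, max_fraction=0.005):
    # Noise amplitude is a random fraction (up to max_fraction) of the peak value.
    noise_amp = max_fraction * rng.uniform() * np.amax(audio)
    noise = rng.normal(size=audio.shape[0])  # float64 samples
    # Cast back so the output dtype matches the input dtype.
    return (audio + noise_amp * noise).astype(audio.dtype)

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 2 * np.pi, 16000)).astype(np.float32)  # toy signal
noisy = add_noise(clean, rng)
```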

In [None]:
#random shift
rand_shift_audio = []

for file in noise_dist_audio:
    audio_shift = file
    timeshift_fac = 0.2 *2*(np.random.uniform()-0.5)  # shift by up to 20% of the length in either direction (can be tweaked)
    start = int(audio_shift.shape[0] * timeshift_fac)
    if (start > 0): 
        #shift right: pad the front with zeros and cut off the end
        audio_shift = np.pad(audio_shift,(start,0),mode='constant')[0:audio_shift.shape[0]]
    else:
        #shift left: pad the end with zeros and cut off the front
        audio_shift = np.pad(audio_shift,(0,-start),mode='constant')[-start:]
    
    rand_shift_audio.append(audio_shift)
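
The two shift directions are easy to mix up, so here is a small sketch on a toy array. For a leftward shift the slice has to drop the first samples; slicing from index 0 would just cut the padding off again and return the clip unchanged. `time_shift` is a hypothetical helper name:

```python
import numpy as np

def time_shift(audio, start):
    n = audio.shape[0]
    if start > 0:
        # Shift right: zeros in front, drop the samples pushed past the end.
        return np.pad(audio, (start, 0), mode="constant")[:n]
    # Shift left (or no shift when start == 0): zeros at the end, drop the first samples.
    return np.pad(audio, (0, -start), mode="constant")[-start:]

x = np.arange(1, 6)
right = time_shift(x, 2)   # [0 0 1 2 3]
left = time_shift(x, -2)   # [3 4 5 0 0]
```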

In [None]:
#stretching
stretch_shift_audio = []

for file in rand_shift_audio:
    input_length = len(file)
    stretching = librosa.effects.time_stretch(file, rate=1.1) #fixed 10% speed-up; as in the speed step, the result is then fitted back to the input length
    if len(stretching) > input_length:
        stretching = stretching[:input_length]
    else:
        stretching = np.pad(stretching, (0, max(0, input_length - len(stretching))), "constant")

    stretch_shift_audio.append(stretching)

In [None]:
#This cell is supposed to write the augmented data back into the df in tensor form, but I was unable to make it work: it kept giving dimension errors.
#There could be a hidden issue in one of the functions used that changes the dimensions; I suspect speed or stretch.
#I would try to run all parts of the pipeline separately and see where the issue arises.

#augmented_data = torch.tensor(stretch_shift_audio)

#df_train_val['augmented_data'] = augmented_data
#df_train_val.head()
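
One possible cause of the dimension errors (an assumption, not verified on this data) is that the clips differ in length, in which case a list of arrays cannot be turned into a single rectangular tensor. A minimal sketch that zero-pads every clip to the longest one before stacking; `pad_and_stack` is a hypothetical helper name:

```python
import numpy as np

def pad_and_stack(clips, dtype=np.float32):
    # Rectangular batch: one row per clip, zero-padded to the longest clip.
    max_len = max(len(c) for c in clips)
    batch = np.zeros((len(clips), max_len), dtype=dtype)
    for i, clip in enumerate(clips):
        batch[i, :len(clip)] = clip
    return batch

clips = [np.ones(3), np.ones(5), np.ones(4)]  # toy clips of unequal length
batch = pad_and_stack(clips)
```

With the real data, `torch.from_numpy(pad_and_stack(stretch_shift_audio))` should then give one 2-D tensor. Printing `{len(a) for a in stretch_shift_audio}` first would show whether unequal lengths are actually the problem.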