## <center>Augmentation methods for audio</center>
A shortage of data is a common issue in real-world data science problems. Data augmentation generates synthetic samples from an existing data set so that the model's ability to generalise can be improved.
  <p> <b>Data augmentation definition:</b></p>
<ul>
  <li>Data augmentation is the process of creating new synthetic training samples by adding small perturbations to the initial training set.</li>
  <li>The objective is to make the model invariant to those perturbations and enhance its ability to generalize.</li>
  <li>For images, data augmentation can be performed by shifting, zooming, or rotating the image.</li>
  <li>In our case, we will add noise, stretch and roll (time-shift) the signal, and shift the pitch.</li>
</ul>


In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import librosa
import matplotlib.pyplot as plt
import os
import cv2
import IPython.display as ipd
from IPython.display import Audio, IFrame, display
import plotly.graph_objects as go
import librosa.display
plt.style.use("ggplot")


In [None]:
train = pd.read_csv("../input/birdsong-recognition/train.csv")
# Count the number of recordings per species
species = train.species.value_counts()
fig = go.Figure(data=[
    go.Bar(y=species.values, x=species.index, marker_color='deeppink')
])

fig.update_layout(title='Distribution of Bird Species')
fig.show()

In [None]:
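file_path = '../input/birdsong-resampled-train-audio-04/wooscj2/XC67042.wav'
x, sr = librosa.load(file_path)  # librosa resamples to 22050 Hz by default
# Note: librosa 0.10 removed waveplot; use librosa.display.waveshow there
librosa.display.waveplot(x, sr=sr)
Audio(x, rate=sr)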

## 1. Noise Injection
It simply adds random values to the data.

In [None]:
def noise(data, noise_factor):
    # Draw standard Gaussian noise, one value per audio sample
    random_noise = np.random.randn(len(data))
    augmented_data = data + noise_factor * random_noise
    # Cast back to the same data type as the input
    augmented_data = augmented_data.astype(data.dtype)
    return augmented_data

In [None]:
n=noise(x,0.01)
librosa.display.waveplot(n, sr=sr)
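
A fixed `noise_factor` affects quiet and loud recordings very differently. One possible alternative, sketched below, scales the noise to a target signal-to-noise ratio (SNR) in decibels. The helper name `add_noise_at_snr`, the `snr_db` parameter, and the synthetic 440 Hz tone are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np

def add_noise_at_snr(data, snr_db, rng=None):
    """Add white Gaussian noise so the result has roughly the requested SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(data.astype(np.float64) ** 2)
    # SNR_dB = 10 * log10(signal_power / noise_power), solved for noise_power
    noise_power = signal_power / (10 ** (snr_db / 10))
    random_noise = rng.normal(0.0, np.sqrt(noise_power), size=data.shape)
    # Cast back to the input dtype, as the noise() function above does
    return (data + random_noise).astype(data.dtype)

# Synthetic 440 Hz tone standing in for a real recording (illustration only)
sr_demo = 22050
t = np.arange(sr_demo) / sr_demo
tone = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

noisy = add_noise_at_snr(tone, snr_db=20, rng=np.random.default_rng(0))

# Measure the SNR that was actually achieved
added = noisy.astype(np.float64) - tone.astype(np.float64)
measured_snr = 10 * np.log10(np.mean(tone.astype(np.float64) ** 2) / np.mean(added ** 2))
```

With this approach, a lower `snr_db` gives a noisier sample regardless of how loud the original clip is.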

## 2. Time shifting
Slightly shift the starting point of the audio, then pad it with silence back to the original length.

In [None]:
def shifting_time(data, sampling_rate, shift_max, shift_direction):
    # Pick a random shift of up to shift_max seconds, expressed in samples
    shift = np.random.randint(int(sampling_rate * shift_max))
    if shift_direction == 'right':
        shift = -shift
    elif shift_direction == 'both':
        # Flip the sign of the shift with 50% probability
        direction = np.random.randint(0, 2)
        if direction == 1:
            shift = -shift
    augmented_data = np.roll(data, shift)
    # Replace the samples that wrapped around with silence
    if shift > 0:
        augmented_data[:shift] = 0
    elif shift < 0:
        augmented_data[shift:] = 0
    return augmented_data

In [None]:
s=shifting_time(x,sr,1,'right')
librosa.display.waveplot(s, sr=sr)
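
Because the shift amount is random, the effect of the roll-and-pad step can be hard to see. The sketch below uses a fixed shift on a tiny array to show what happens for each sign of the shift. The helper `shift_with_silence` and the five-sample array are hypothetical and only serve the illustration.

```python
import numpy as np

def shift_with_silence(data, shift):
    """Shift by a fixed number of samples and zero the part that wrapped around."""
    shifted = np.roll(data, shift)  # np.roll returns a new array
    if shift > 0:
        shifted[:shift] = 0   # content moved later; the leading gap becomes silence
    elif shift < 0:
        shifted[shift:] = 0   # content moved earlier; the trailing gap becomes silence
    return shifted

demo = np.array([1, 2, 3, 4, 5], dtype=np.float32)

later = shift_with_silence(demo, 2)      # [0, 0, 1, 2, 3]
earlier = shift_with_silence(demo, -2)   # [3, 4, 5, 0, 0]
unchanged = shift_with_silence(demo, 0)  # [1, 2, 3, 4, 5]
```

A positive shift delays the audio and a negative shift advances it, so with `np.roll` the sign of `shift` decides which end is silenced.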

## 3. Speed tuning
Slightly change the speed of the audio. The output length changes with the speed, so pad or slice it if a fixed length is required.

In [None]:
def speed(data, speed_factor):
    # rate > 1 speeds the audio up; rate < 1 slows it down
    return librosa.effects.time_stretch(data, rate=speed_factor)

In [None]:
v=speed(x,2)
librosa.display.waveplot(v, sr=sr)
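
The description above mentions padding or slicing, while `speed` returns a clip whose length changes with the rate. A minimal sketch of that missing step is shown below, using plain NumPy. The helper name `pad_or_slice` and the dummy arrays are assumptions for illustration; `librosa.util.fix_length` offers similar behaviour.

```python
import numpy as np

def pad_or_slice(data, target_length):
    """Cut the clip, or zero-pad it at the end, to exactly target_length samples."""
    if len(data) >= target_length:
        return data[:target_length]
    return np.pad(data, (0, target_length - len(data)), mode="constant")

original_length = 8
faster = np.ones(4, dtype=np.float32)   # stand-in for a clip stretched with rate=2
slower = np.ones(16, dtype=np.float32)  # stand-in for a clip stretched with rate=0.5

faster_fixed = pad_or_slice(faster, original_length)  # padded with trailing zeros
slower_fixed = pad_or_slice(slower, original_length)  # truncated to 8 samples
```

Keeping every augmented clip at the same length is mainly useful when the model expects fixed-size inputs.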

## 4. Changing Pitch

In [None]:
def pitch(data, sampling_rate, pitch_factor):
    # pitch_factor is the number of semitones to shift (negative values shift down)
    return librosa.effects.pitch_shift(data, sr=sampling_rate, n_steps=pitch_factor)

In [None]:
p=pitch(x,sr,2)
librosa.display.waveplot(p, sr=sr)
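
The third argument of `pitch_shift` counts steps, which are semitones under librosa's default of 12 bins per octave. The small sketch below converts a step count into the corresponding frequency ratio, to give a feel for how large a shift of 2 really is. The helper `semitones_to_ratio` and the 440 Hz reference are hypothetical and used only for this arithmetic.

```python
def semitones_to_ratio(n_steps, bins_per_octave=12):
    """Frequency ratio produced by shifting n_steps (one octave doubles the frequency)."""
    return 2.0 ** (n_steps / bins_per_octave)

ratio_up_2 = semitones_to_ratio(2)     # about 1.12, i.e. roughly 12% higher
ratio_octave = semitones_to_ratio(12)  # exactly one octave up
shifted_hz = 440.0 * ratio_up_2        # a 440 Hz tone lands near 494 Hz
```

For bird calls, where pitch can itself be a cue for the species, small shifts are probably the safer choice.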

## Take Away
<ul>
  <li>Data augmentation cannot replace real training data. It only helps generate synthetic data to improve the model.</li>
  <li>Do not blindly generate synthetic data. You have to understand the patterns in your data and select an appropriate way to increase the training data volume.</li>
</ul>

