# Data Processing - Explained
This notebook explains the data processing steps used to generate spectrogram images from the audio files. See the `AudioUtil` class for the implementation details. The notebook was created with the help of [this article](https://towardsdatascience.com/audio-deep-learning-made-simple-sound-classification-step-by-step-cebc936bbe5).

In [None]:
import IPython.display as ipd
import matplotlib.pyplot as plt
from torchaudio import transforms
import torch

from src.util.AudioUtil import AudioUtil as au

In [None]:
# Path to the example recording (a Common Cuckoo call)
path = "../input/scrape/Common_Cuckoo/XC180920-kukulka.mp3"

In [None]:
# Load the file; `audio` is indexed below as (signal, sample_rate)
audio = au.open(path)

audio[0].shape, audio[1]

In [None]:
# Convert to a single (mono) channel
audio = au.rechannel(audio, 1)

audio[0].shape, audio[1]
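
The implementation of `au.rechannel` is not shown in this notebook. As a rough illustration of what such a step typically does, here is a minimal sketch written with NumPy so it runs without the audio file. The function name `rechannel_sketch` is hypothetical, and averaging the channels is an assumption; keeping only the first channel is another common choice.

```python
import numpy as np

def rechannel_sketch(signal, new_channels):
    """Hypothetical stand-in for a rechannel step on a (channels, samples) array."""
    num_channels = signal.shape[0]
    if num_channels == new_channels:
        return signal
    if new_channels == 1:
        # Multi-channel -> mono: average the channels (an assumption)
        return signal.mean(axis=0, keepdims=True)
    # Otherwise: copy the first channel into every requested channel
    return np.repeat(signal[:1], new_channels, axis=0)

stereo = np.array([[1.0, 3.0], [3.0, 5.0]])
mono = rechannel_sketch(stereo, 1)
back_to_stereo = rechannel_sketch(mono, 2)
print(mono.shape, back_to_stereo.shape)
```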

In [None]:
# Resample to 44.1 kHz so that every clip shares one sample rate
audio = au.resample(audio, 44100)

ipd.display(ipd.Audio(data=audio[0], rate=audio[1]))
audio[0].shape, audio[1]

In [None]:
# Pad or truncate to a fixed length (10000 is taken to be milliseconds, as in the linked article)
audio = au.pad_trunc(audio, 10000)
ipd.display(ipd.Audio(data=audio[0], rate=audio[1]))
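
To make the pad-or-truncate step concrete, the following is a minimal sketch, assuming the duration is given in milliseconds and that short clips are zero-padded by a random amount on each side (as the linked article describes). It uses NumPy rather than torch, and `pad_trunc_sketch` is a hypothetical name, not the `AudioUtil` method itself.

```python
import numpy as np

def pad_trunc_sketch(signal, sample_rate, max_ms, rng=None):
    """Hypothetical sketch: force a (channels, samples) array to max_ms milliseconds."""
    if rng is None:
        rng = np.random.default_rng()
    sig_len = signal.shape[1]
    max_len = sample_rate * max_ms // 1000  # target length in samples
    if sig_len > max_len:
        # Too long: keep only the first max_len samples
        return signal[:, :max_len]
    # Too short (or exact): split the missing samples randomly between both ends
    pad_begin = int(rng.integers(0, max_len - sig_len + 1))
    pad_end = max_len - sig_len - pad_begin
    return np.pad(signal, ((0, 0), (pad_begin, pad_end)))  # pads with zeros

short = np.ones((1, 4))
long = np.ones((1, 25))
padded = pad_trunc_sketch(short, sample_rate=1000, max_ms=10, rng=np.random.default_rng(0))
cut = pad_trunc_sketch(long, sample_rate=1000, max_ms=10)
print(padded.shape, cut.shape)
```

A fixed length matters because every spectrogram image then has the same width, which is what a batch-based model expects.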

In [None]:
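# Mel spectrogram with 64 mel bands; hop_length=None uses torchaudio's default (half the window length)
spec_transform = transforms.MelSpectrogram(audio[1], n_fft=1024, hop_length=None, n_mels=64)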
spec = spec_transform(audio[0])

plt.figure(figsize=(15, 10))
plt.imshow(spec[0], cmap='viridis')
plt.show()

Can't see much. That's because the raw power values span several orders of magnitude, while we humans perceive loudness on a roughly logarithmic scale. So we need to convert the spectrogram to a log (decibel) scale.

In [None]:
# Convert power to decibels, keeping at most 80 dB of range below the peak
spectrogram = transforms.AmplitudeToDB(top_db=80)(spec)

plt.figure(figsize=(15, 10))
plt.imshow(spectrogram[0], cmap='viridis')
plt.show()
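
The decibel conversion used above can be sketched in a few lines. This is a simplified NumPy illustration of the idea (power to `10 * log10(power)`, then clipping everything more than `top_db` below the peak), not torchaudio's actual code; the helper name `power_to_db` and the floor `amin` are assumptions chosen for the example.

```python
import numpy as np

def power_to_db(power, top_db=80.0, amin=1e-10):
    """Simplified sketch of a power-to-decibel conversion with a top_db floor."""
    # Clamp tiny values first so log10 never sees zero
    db = 10.0 * np.log10(np.maximum(power, amin))
    # Discard detail more than top_db below the loudest bin
    return np.maximum(db, db.max() - top_db)

power = np.array([[1.0, 1e-3], [1e-6, 1e-12]])
db = power_to_db(power)
print(db)
```

Values that differ by a factor of a million in power end up only 60 dB apart, which is why the structure of the call becomes visible in the plot.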

In [None]:
# AudioUtil helper that builds the spectrogram from the (signal, sample_rate) pair
spectrogram = au.spectrogram(audio)

spectrogram.shape

Looks good, but we can do better. We can normalize the spectrogram by applying log(1 + spec) before the decibel conversion.

In [None]:
spec_transform = transforms.MelSpectrogram(audio[1], n_fft=1024, hop_length=None, n_mels=64)
spec = spec_transform(audio[0])

# Compress the dynamic range: log(1 + spec)
spec = torch.log1p(spec)

spec = transforms.AmplitudeToDB(top_db=80)(spec)

plt.figure(figsize=(15, 10))
plt.imshow(spec[0], cmap='viridis')
plt.show()

In [None]:
# Plot the spectrogram returned by au.spectrogram for comparison
plt.figure(figsize=(15, 10))
plt.imshow(spectrogram[0], cmap='viridis')
plt.show()

In [None]:
# Augment: mask one frequency band and one time span, each limited by max_mask_pct
aug_spec = au.spectro_augment(spectrogram, max_mask_pct=0.1, n_freq_masks=1, n_time_masks=1)

plt.figure(figsize=(15, 10))
plt.imshow(aug_spec[0], cmap='viridis')
plt.show()
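
The augmentation above hides random bands of the spectrogram so that a model cannot rely on any single frequency range or moment in time. Below is a minimal NumPy sketch of that idea, assuming masked regions are filled with the spectrogram's mean value as in the linked article. The name `spectro_augment_sketch` is hypothetical and the details may differ from `AudioUtil.spectro_augment`.

```python
import numpy as np

def spectro_augment_sketch(spec, max_mask_pct=0.1, n_freq_masks=1, n_time_masks=1, rng=None):
    """Hypothetical sketch of frequency/time masking on a (channels, n_mels, n_steps) array."""
    if rng is None:
        rng = np.random.default_rng()
    _, n_mels, n_steps = spec.shape
    mask_value = spec.mean()  # assumption: fill masked cells with the mean
    aug = spec.copy()
    for _ in range(n_freq_masks):
        # Horizontal band: random height up to max_mask_pct of the mel axis
        width = int(rng.integers(0, int(max_mask_pct * n_mels) + 1))
        start = int(rng.integers(0, n_mels - width + 1))
        aug[:, start:start + width, :] = mask_value
    for _ in range(n_time_masks):
        # Vertical band: random width up to max_mask_pct of the time axis
        width = int(rng.integers(0, int(max_mask_pct * n_steps) + 1))
        start = int(rng.integers(0, n_steps - width + 1))
        aug[:, :, start:start + width] = mask_value
    return aug

demo_spec = np.random.default_rng(0).normal(size=(1, 64, 100))
demo_aug = spectro_augment_sketch(demo_spec, rng=np.random.default_rng(1))
print(demo_aug.shape)
```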