# Data Preprocessing

I begin by cloning the GitHub repository where this dataset is shared.

In [None]:
!mkdir -p data
!cd data && git clone https://github.com/karolpiczak/ESC-50.git

Let's read the annotations.

In [8]:
import pandas as pd

anno = pd.read_csv('data/ESC-50/meta/esc50.csv')

In [15]:
anno.head()

| | filename | fold | target | category | esc10 | src_file | take |
|---|---|---|---|---|---|---|---|
| 0 | 1-100032-A-0.wav | 1 | 0 | dog | True | 100032 | A |
| 1 | 1-100038-A-14.wav | 1 | 14 | chirping_birds | False | 100038 | A |
| 2 | 1-100210-A-36.wav | 1 | 36 | vacuum_cleaner | False | 100210 | A |
| 3 | 1-100210-B-36.wav | 1 | 36 | vacuum_cleaner | False | 100210 | B |
| 4 | 1-101296-A-19.wav | 1 | 19 | thunderstorm | False | 101296 | A |


The annotations file contains useful metadata. In particular, the data is split into 5 folds, with care taken to avoid leakage between folds: all audio segments extracted from the same source recording fall into a single fold.
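To illustrate how the `fold` column might be used, here is a minimal sketch of a leakage-free train/validation split that holds out one fold at a time. The toy table and the helper name `split_by_fold` are hypothetical stand-ins for illustration; only the `fold` column name comes from the annotations file.

```python
import pandas as pd

# Toy stand-in for the annotations table (hypothetical rows, real column names).
toy = pd.DataFrame({
    "filename": ["a.wav", "b.wav", "c.wav", "d.wav", "e.wav", "f.wav"],
    "fold": [1, 1, 2, 3, 4, 5],
    "category": ["dog", "dog", "rain", "rain", "dog", "rain"],
})

def split_by_fold(df, valid_fold):
    """Return (train, valid), where valid contains exactly the rows of one fold."""
    is_valid = df["fold"] == valid_fold
    return df[~is_valid], df[is_valid]

train, valid = split_by_fold(toy, valid_fold=1)

# No fold id appears on both sides, so segments cut from the same
# source recording never straddle the split.
print(sorted(train["fold"].unique()), sorted(valid["fold"].unique()))
```

Looping `valid_fold` over 1 to 5 would give a 5-fold cross-validation that respects the dataset's predefined grouping.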

This dataset is relatively small, though. Instead of exposing a set of audio files alongside a CSV of annotations, let's load everything, including the audio, into a single pandas `DataFrame` and pickle it.

In [1]:
import librosa
import pandas as pd
import numpy as np
from IPython.lib.display import Audio
from matplotlib import pyplot as plt
import multiprocessing
import scipy.signal
from scipy import signal

In [18]:
%%time

audio, srs = [], []
for idx, row in anno.iterrows():
    x, sr = librosa.load(f'data/ESC-50/audio/{row.filename}', sr=None, mono=False)
    audio.append(x)
    srs.append(sr)

CPU times: user 1.84 s, sys: 3.2 s, total: 5.04 s
Wall time: 17.1 s


In [22]:
for rec in audio:
    assert rec.ndim == 1

In [23]:
set(srs)

{44100}

In [25]:
{rec.shape[0] for rec in audio}

{220500}

In [28]:
audio[0].shape[0] / srs[0]

5.0

The recordings follow a uniform format: all were recorded at a sample rate of 44.1 kHz, all are single channel, and all are five seconds long (220,500 samples at 44,100 samples per second).
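The individual checks above could also be gathered into one summary. The sketch below is only an illustration: the helper `describe_recordings` is a hypothetical name, and it is run on synthetic silent arrays rather than on the real audio.

```python
import numpy as np

def describe_recordings(recordings, sample_rates):
    """Collect the distinct dimensionalities, sample rates and durations (seconds)."""
    return {
        "ndims": {rec.ndim for rec in recordings},
        "sample_rates": set(sample_rates),
        "durations_s": {
            rec.shape[-1] / sr for rec, sr in zip(recordings, sample_rates)
        },
    }

# Synthetic stand-ins: three silent mono clips shaped like the ESC-50 recordings.
fake_audio = [np.zeros(220500, dtype=np.float32) for _ in range(3)]
fake_srs = [44100] * 3

summary = describe_recordings(fake_audio, fake_srs)
print(summary)
```

If every set in the summary has exactly one element, the recordings share a single format.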

In [32]:
anno.drop(columns=['filename', 'target', 'src_file', 'take'], inplace=True)

In [33]:
anno['audio'] = audio

In [6]:
anno.head()

| | fold | category | esc10 | audio |
|---|---|---|---|---|
| 0 | 1 | dog | True | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... |
| 1 | 1 | chirping_birds | False | [-0.01184082, -0.10336304, -0.14141846, -0.120... |
| 2 | 1 | vacuum_cleaner | False | [-0.006958008, -0.012512207, -0.011260986, -0.... |
| 3 | 1 | vacuum_cleaner | False | [0.53897095, 0.39627075, 0.26739502, 0.1376648... |
| 4 | 1 | thunderstorm | False | [-0.00036621094, -0.0007019043, -0.00079345703... |


Looking good! Let's pickle the `DataFrame` now to prepare it for uploading.

In [7]:
anno.to_pickle('data/anno.pkl')

In [10]:
!cd data && zip -q anno.zip anno.pkl
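For completeness, here is a small sketch of how someone might read such a pickle back and confirm that the audio arrays survive the round trip. It uses a tiny synthetic `DataFrame` and a temporary directory rather than the real `data/anno.pkl`. Note that `pd.read_pickle` should only be used on files from a trusted source, since unpickling can execute arbitrary code.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Tiny synthetic table with the same columns as the pickled annotations.
df = pd.DataFrame({
    "fold": [1, 2],
    "category": ["dog", "rain"],
    "esc10": [True, True],
    "audio": [np.zeros(4, dtype=np.float32), np.ones(4, dtype=np.float32)],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "anno.pkl"
    df.to_pickle(path)                # same call as in the notebook
    restored = pd.read_pickle(path)   # what a consumer of the file would run

# Each cell of the audio column is still a NumPy array with its original dtype.
print(type(restored.loc[0, "audio"]), restored.loc[0, "audio"].dtype)
```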