# Classifying Urban Sounds Using Deep Learning

## 2 Data Preprocessing and Data Splitting

### Audio properties that will require normalising 

In the previous notebook, we identified the following audio properties that need preprocessing to ensure consistency across the whole dataset:

- Audio Channels 
- Sample rate 
- Bit-depth

We will continue to use Librosa for both preprocessing and feature extraction.
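
To make the three properties concrete, here is a minimal NumPy-only sketch of the normalisation a loader has to perform on raw PCM data: rescaling integer samples to floats in [-1, 1] (bit-depth) and averaging the channels (mono). The helper name `to_mono_float` and the toy samples are hypothetical, and the sketch assumes signed PCM laid out as `(n_samples, n_channels)`; it is not Librosa's actual implementation.

```python
import numpy as np

def to_mono_float(samples):
    """Convert signed integer PCM samples to mono float32 in [-1, 1]."""
    samples = np.asarray(samples)
    if np.issubdtype(samples.dtype, np.integer):
        # Assumption: signed PCM. Divide by the magnitude of the most
        # negative value representable at this bit depth (32768 for int16).
        scale = float(np.iinfo(samples.dtype).max) + 1.0
        audio = samples.astype(np.float32) / scale
    else:
        audio = samples.astype(np.float32)
    if audio.ndim == 2:
        # Assumed layout: (n_samples, n_channels); average channels to get mono
        audio = audio.mean(axis=1)
    return audio

# Three hypothetical stereo frames of 16-bit audio
stereo_int16 = np.array([[32767, -32768], [16384, 16384], [0, 0]], dtype=np.int16)
mono = to_mono_float(stereo_int16)
print(mono.shape, mono.dtype)
```

Sample rate is the third property; it needs resampling rather than simple rescaling and is covered in the next section.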

### Preprocessing stage 

For much of the preprocessing we can use [Librosa's `load()` function](https://librosa.github.io/librosa/generated/librosa.core.load.html).

We will compare the outputs from Librosa against the default outputs of [scipy's wavfile library](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.io.wavfile.read.html) using a chosen file from the dataset. 

#### Sample rate conversion 

By default, Librosa’s load function resamples audio to 22.05 kHz, which we can use as our comparison level.
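
The comparison can be illustrated without any file from the dataset. The sketch below writes a synthetic 44.1 kHz tone to a temporary WAV file, reads it back with `scipy.io.wavfile.read` (which keeps the original rate and integer samples), and then halves the rate with `scipy.signal.resample_poly`. The tone, the file name, and the use of `resample_poly` are assumptions made for illustration; `resample_poly` stands in for Librosa's own resampler and will not produce identical sample values.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile
from scipy.signal import resample_poly

ORIG_SR = 44100    # assumed rate of the source recording
TARGET_SR = 22050  # Librosa's default target rate

# One second of a 440 Hz tone stored as 16-bit PCM
t = np.arange(ORIG_SR) / ORIG_SR
tone = (0.5 * np.sin(2 * np.pi * 440.0 * t) * 32767).astype(np.int16)

with tempfile.TemporaryDirectory() as tmp:
    wav_path = os.path.join(tmp, "tone.wav")
    wavfile.write(wav_path, ORIG_SR, tone)
    # scipy returns the file's native rate and dtype, with no resampling
    scipy_sr, scipy_audio = wavfile.read(wav_path)

# 44100 -> 22050 Hz is a rational ratio of 1/2
resampled = resample_poly(scipy_audio.astype(np.float32), up=1, down=2)

print("scipy:", scipy_sr, "Hz,", len(scipy_audio), "samples,", scipy_audio.dtype)
print("resampled:", TARGET_SR, "Hz,", len(resampled), "samples")
```

Calling `librosa.load(wav_path)` on the same file would similarly report a rate of 22050 and return half as many samples as the scipy read.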

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import torch
from torch.utils.data import Dataset, DataLoader
import csv
import pandas as pd
import numpy as np
import librosa
import os,sys
import shutil
from tqdm import tqdm

In [None]:
ROOT_PATH='./drive/MyDrive/ASR_Project_Shared/'

# Create soft links to the shared Drive folders so they can be addressed
# with short relative paths from the notebook's working directory
for folder in ['final_data', 'final_metadata', 'final_pkl']:
    nb_path = './' + folder
    if not os.path.islink(nb_path):  # avoid FileExistsError when the cell is re-run
        os.symlink(ROOT_PATH + folder, nb_path)
    sys.path.insert(0, nb_path)

In [None]:
metadata_file = './final_metadata/denoised/test_metadata_speech{all_clap}_noise{all_clap+noiseclips+spam} - test_metadata_speech{all_clap}_noise{all_clap+noiseclips+spam}.csv'

In [None]:
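seq_len = 200  # number of MFCC frames kept per clip

def extract_features(file_name):
    """Return a (seq_len, 40) MFCC matrix for one audio file, or None on failure."""
    try:
        # librosa.load resamples to 22.05 kHz and mixes down to mono by default
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
        # mfccsscaled = np.mean(mfccs.T,axis=0)

        # Truncate to seq_len frames, then zero-pad shorter clips on the right
        to_pad = mfccs[:, :seq_len]
        v = max(0, seq_len - to_pad.shape[-1])
        mfccsscaled = np.pad(to_pad, ((0, 0), (0, v))).T

    except Exception as e:
        # Report the offending file and the underlying error
        print("Error encountered while parsing file: ", file_name, e)
        return None

    return mfccsscaled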
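
The truncate-then-pad step is what guarantees every clip ends up with the same shape. The sketch below isolates that step on random arrays standing in for MFCC matrices, so it runs without any audio files. The helper name `pad_or_truncate` and the frame counts 50 and 300 are hypothetical.

```python
import numpy as np

def pad_or_truncate(mfccs, seq_len=200):
    """Force an (n_mfcc, n_frames) matrix to exactly seq_len frames, then transpose."""
    clipped = mfccs[:, :seq_len]                    # drop frames beyond seq_len
    missing = max(0, seq_len - clipped.shape[-1])   # frames still needed
    # Zero-pad only the time axis, on the right
    return np.pad(clipped, ((0, 0), (0, missing))).T

rng = np.random.default_rng(0)
short_clip = rng.normal(size=(40, 50))    # fewer frames than seq_len
long_clip = rng.normal(size=(40, 300))    # more frames than seq_len

short_out = pad_or_truncate(short_clip)
long_out = pad_or_truncate(long_clip)
print(short_out.shape, long_out.shape)
```

Both outputs have shape `(200, 40)`: the short clip keeps its 50 real frames followed by 150 rows of zeros, and the long clip keeps only its first 200 frames.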

In [None]:
# del metadata

In [None]:
# Load the metadata and extract MFCC features for every listed file
metadata = pd.read_csv(metadata_file)

features = []
unique_dim = set()  # distinct feature shapes, used as a sanity check

# Iterate through each sound file and extract the features
for i, (index, row) in enumerate(metadata.iterrows()):
    print(i, "/", len(metadata))
    file_name = row['file_path']
    class_label = row["label"]

    data = extract_features(file_name)
    if data is None:
        # extract_features returns None when a file cannot be parsed; skip it
        continue

    features.append([data, class_label])
    unique_dim.add(data.shape)

# Convert into a pandas DataFrame
featuresdf = pd.DataFrame(features, columns=['feature', 'class_label'])

print('Finished feature extraction from ', len(featuresdf), ' files')

0 / 1492
1 / 1492
2 / 1492
...
100 / 1492

In [None]:
unique_dim

{(200, 40)}

In [None]:
import pickle

# Save the extracted features to the shared Drive folder
pkl_path = os.path.join(ROOT_PATH, 'final_pkl/RNN/denoised/test_metadata_speech{all_clap}_noise{all_clap+noiseclips+spam} - test_metadata_speech{all_clap}_noise{all_clap+noiseclips+spam}.pkl')
with open(pkl_path, "wb") as pickle_out:
    pickle.dump(featuresdf, pickle_out)
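
Whatever consumes this file has to unpickle the DataFrame and turn the column of per-clip matrices into a single array. The round trip below is a sketch on a toy DataFrame with the same two columns. The toy data, the temporary file name, and the variable names `X` and `y` are assumptions for illustration, not part of the project.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd

# Two hypothetical clips, each with a (200, 40) feature matrix and a label
toy_df = pd.DataFrame(
    [[np.zeros((200, 40), dtype=np.float32), 0],
     [np.ones((200, 40), dtype=np.float32), 1]],
    columns=['feature', 'class_label'],
)

with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, 'toy_features.pkl')
    with open(toy_path, 'wb') as f:
        pickle.dump(toy_df, f)
    with open(toy_path, 'rb') as f:
        loaded = pickle.load(f)

# Stack the per-clip matrices into one (n_clips, seq_len, n_mfcc) array
X = np.stack(loaded['feature'].to_list())
y = loaded['class_label'].to_numpy()
print(X.shape, y.shape)
```

Because every clip was padded or truncated to the same shape, `np.stack` succeeds without any further reshaping.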