# Urban Sound Classification - Feature Extraction & KNN
## Context
The automatic classification of environmental sound is a growing research field with multiple applications to large-scale, content-based multimedia indexing and retrieval. In particular, the sonic analysis of urban environments is the subject of increased interest, partly enabled by multimedia sensor networks, as well as by large quantities of online multimedia content depicting urban scenes.

## Content
The dataset is called UrbanSound and contains 8732 labeled sound excerpts (up to 4 s long) of urban sounds from 10 classes: Air Conditioner, Car Horn, Children Playing, Dog Bark, Drilling, Engine Idling, Gun Shot, Jackhammer, Siren, and Street Music. The data has the following attributes:

- ID: unique ID of the sound excerpt
- Class: type of sound

## Goals
In this notebook we will build a model that classifies each sound into one of these categories. To do so, we will use the *k-nearest neighbors* algorithm as our model and the [MFCC](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum) (mel-frequency cepstral coefficients) as the feature extracted from each audio clip.

In [1]:
!pip install soundfile

In [None]:
import os

import numpy as np
import pandas as pd

import librosa
import librosa.display
import soundfile as sf # librosa fails when reading files on Kaggle.

import matplotlib.pyplot as plt
import IPython.display as ipd

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix

# Audio Processing
First, as good data scientists, let's inspect our data by listening to some of the audio files.

In [None]:
audio_path = '../input/train/Train/2.wav'
ipd.Audio(audio_path)

The audio processing library we are using, *librosa*, can also display the waveform of our sample.

In [None]:
# Extract the audio data (x) and the sample rate (sr).
x, sr = librosa.load(audio_path)

# Plot the sample.
plt.figure(figsize=(12, 5))
librosa.display.waveshow(x, sr=sr)  # waveplot was removed in librosa 0.10; waveshow replaces it.
plt.show()

As [this article](https://towardsdatascience.com/extract-features-of-music-75a3f9bc265d) explains very well, we can extract several different features from our audio. I recommend reading it for a more in-depth explanation of where these features come from.

### Zero-Crossing Rate
How often the waveform crosses the zero line, i.e. changes sign.

In [None]:
plt.figure(figsize=(12, 5))
plt.plot(x[1000:1100]) # Zoom-in for seeing the example.
plt.grid()

# zero_crossings returns a boolean array that is True wherever a crossing occurs.
crossings = librosa.zero_crossings(x[1000:1100], pad=False)
print(f'Number of zero crossings: {crossings.sum()}')
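The count above can be reproduced without librosa. Here is a minimal NumPy sketch, assuming a crossing means a sign change between consecutive samples (`librosa.zero_crossings` has its own options, such as `zero_pos`, for how exact zeros are treated). The helper names and the synthetic 100 Hz test tone are hypothetical:

```python
import numpy as np

def count_zero_crossings(signal):
    """Count sign changes between consecutive samples (a sketch, not librosa's exact rule)."""
    negative = np.signbit(signal)  # True where the sample is negative
    return int(np.count_nonzero(negative[1:] != negative[:-1]))

def zero_crossing_rate(signal):
    """Fraction of consecutive-sample pairs whose sign differs."""
    return count_zero_crossings(signal) / (len(signal) - 1)

# A 100 Hz sine sampled at 8 kHz for 0.1 s completes 10 periods.
demo_sr = 8000
demo_t = np.arange(800) / demo_sr
# The small phase offset keeps every sample away from exactly zero.
demo_wave = np.sin(2 * np.pi * 100 * demo_t + 0.1)
print(count_zero_crossings(demo_wave))  # 20
print(zero_crossing_rate(demo_wave))
```

A sine crosses zero twice per period, so 10 periods give 20 crossings; dividing by the number of sample pairs turns the count into a rate that is comparable across clips of different length.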

### Spectral Centroids
The center of mass of the spectrum: the average frequency in each frame, weighted by the magnitude of each frequency.

In [None]:
centroids = librosa.feature.spectral_centroid(y=x, sr=sr)[0]

print(f'Centroids Shape: {centroids.shape}')
print(f'First 3 centroids: {centroids[:3]}')
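To see what "weighted mean" means in practice, here is a single-frame NumPy sketch in which each FFT bin's frequency is weighted by its magnitude. librosa computes a value like this for every STFT frame, which is why `centroids` above is a time series. The helper name and the test tones are hypothetical:

```python
import numpy as np

def spectral_centroid_frame(frame, sr):
    """Magnitude-weighted mean frequency of one frame (a simplified sketch)."""
    magnitudes = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1 / sr)
    return float(np.sum(freqs * magnitudes) / np.sum(magnitudes))

demo_sr = 8000
n_samples = 1024
demo_t = np.arange(n_samples) / demo_sr
# 1000 Hz lands exactly on FFT bin 128 (1000 * 1024 / 8000), so there is no spectral leakage.
tone = np.sin(2 * np.pi * 1000 * demo_t)
print(round(spectral_centroid_frame(tone, demo_sr)))  # 1000
```

Adding an equally loud 2000 Hz tone (bin 256) moves the centroid halfway, to 1500 Hz, which is why brighter sounds have higher centroids.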

### MFCC
This is the feature we will use for our analysis. It is one of the most popular audio features because it compactly describes the overall shape of the sound's spectrum.

In [None]:
mfccs = librosa.feature.mfcc(y=x, sr=sr)
print(f'MFCCs shape: {mfccs.shape}')
print(f'First MFCCs: {mfccs[0, :5]}')

# We can also display the MFCCs over time, much like a spectrogram.
librosa.display.specshow(mfccs, sr=sr, x_axis='time')
plt.show()
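`librosa.feature.mfcc` hides the whole pipeline behind one call. As a rough illustration of what happens inside, here is a simplified single-frame sketch in NumPy: window the frame, take its power spectrum, pool it with triangular mel filters, take the log, and decorrelate with a DCT-II. This is not librosa's implementation: it assumes the HTK mel formula, a natural log instead of decibels, and arbitrary `n_mels`/`n_mfcc` values (librosa's defaults differ, e.g. it uses the Slaney mel formula and a decibel scale). All helper names are hypothetical:

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel formula (an assumption; librosa defaults to the Slaney variant).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bank = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def dct_ii(v):
    """Orthonormal DCT-II of a 1-D vector."""
    size = len(v)
    n = np.arange(size)
    basis = np.cos(np.pi * n[:, None] * (2 * n + 1) / (2 * size))  # rows: k, columns: n
    scale = np.full(size, np.sqrt(2.0 / size))
    scale[0] = np.sqrt(1.0 / size)
    return scale * (basis @ v)

def mfcc_frame(frame, sr, n_mels=40, n_mfcc=13):
    """Power spectrum -> mel energies -> log -> DCT, keeping the first n_mfcc coefficients."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    mel_energies = mel_filterbank(n_mels, len(frame), sr) @ power
    return dct_ii(np.log(mel_energies + 1e-10))[:n_mfcc]

demo_sr = 22050
rng = np.random.default_rng(0)
frame = rng.standard_normal(2048)  # stand-in for one 2048-sample frame of audio
coeffs = mfcc_frame(frame, demo_sr)
print(coeffs.shape)  # (13,)
```

Keeping only the first few DCT coefficients is what makes MFCCs describe the smooth overall shape of the spectrum rather than its fine detail.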

# Model Creation
Now that we've learned how to extract features from our audio, let's create our model.

To start things off, we will average each MFCC coefficient over time. `librosa.feature.mfcc` returns a matrix of shape `(n_mfcc, n_frames)`, and the number of frames depends on the clip's duration. Since our model expects every sample to have the same shape, we reduce each clip to one mean per coefficient (`n_mfcc` defaults to 20 in librosa). We will then parse each audio file, compute its mean MFCCs, and feed the resulting fixed-length vectors to our model.
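The pooling step can be illustrated on random stand-in matrices. This sketch assumes the `(n_mfcc, n_frames)` layout that `librosa.feature.mfcc` returns; the frame counts and the helper name are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical MFCC matrices for two clips of different duration: (n_mfcc, n_frames).
short_clip = rng.standard_normal((20, 40))
long_clip = rng.standard_normal((20, 173))

def pool_over_time(mfcc_matrix):
    """Average each coefficient across frames, leaving one value per coefficient."""
    return mfcc_matrix.mean(axis=1)

features = np.stack([pool_over_time(short_clip), pool_over_time(long_clip)])
print(features.shape)  # (2, 20)
```

Both clips end up as 20-dimensional vectors regardless of length, at the cost of discarding how the coefficients change over time.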

In [None]:
def mean_mfccs(x, sr):
    # Average each MFCC coefficient over time, giving one fixed-length vector per clip.
    return np.mean(librosa.feature.mfcc(y=x, sr=sr), axis=1)

def parse_audio(x):
    # sf.read(..., always_2d=True) returns (n_samples, n_channels); keep the first channel.
    return x[:, 0]

def get_audios(ids):
    train_path = '../input/train/Train/'

    samples = []
    for audio_id in ids:
        # Files are named after their ID, e.g. 2.wav.
        x, sr = sf.read(os.path.join(train_path, f'{audio_id}.wav'), always_2d=True)
        x = parse_audio(x)
        # Pass the file's own sample rate so the mel scale is computed correctly.
        samples.append(mean_mfccs(x, sr))

    return np.array(samples)

def get_samples():
    # Read the features in the same order as train.csv so each row matches its label.
    df = pd.read_csv('../input/train.csv')
    return get_audios(df['ID'].values), df['Class'].values

In [None]:
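X, Y = get_samples()

# The dataset does not provide labels for the test audio,
# so we split the labeled training data into our own train and test sets.
# stratify keeps class proportions similar in both splits; random_state makes the split reproducible.
x_train, x_test, y_train, y_test = train_test_split(X, Y, stratify=Y, random_state=42)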

Let's see what our data looks like.

In [None]:
print(f'Shape: {x_train.shape}')
print(f'Observation: \n{x_train[0]}')
print(f'Labels: {y_train[:5]}')

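Before building our model, let's check whether we would benefit from some noise reduction with PCA. We standardize the features first, fitting the scaler on the training set only, since both PCA and KNN are sensitive to feature scale.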

In [None]:
scaler = StandardScaler()
scaler.fit(x_train)
x_train_scaled = scaler.transform(x_train)
x_test_scaled = scaler.transform(x_test)

pca = PCA().fit(x_train_scaled)

plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('Number of Components')
plt.ylabel('Cumulative explained variance ratio')
plt.show()

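As the plot shows, we need nearly all of the MFCC components to explain most of the variance, so PCA would not let us drop many dimensions without losing information. This is expected, since each coefficient describes a different aspect of the spectral shape. We therefore skip PCA and fit our model on the scaled features.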
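To read a concrete number off a cumulative-variance curve, one can count the leading components needed to reach a chosen share of the variance. A minimal sketch with made-up ratios and a hypothetical helper name; in the notebook you would pass `pca.explained_variance_ratio_`:

```python
import numpy as np

def n_components_for(explained_variance_ratio, threshold):
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    cumulative = np.cumsum(explained_variance_ratio)
    # With the default side='left', this is the first position where cumulative >= threshold.
    index = int(np.searchsorted(cumulative, threshold))
    return min(index + 1, len(explained_variance_ratio))

ratios = np.array([0.5, 0.3, 0.15, 0.05])  # made-up ratios for illustration
print(n_components_for(ratios, 0.9))  # 3
```

scikit-learn can also make this selection directly with `PCA(n_components=0.95, svd_solver='full')`, which keeps just enough components to explain more than 95% of the variance.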

In [None]:
grid_params = {
    'n_neighbors': [3, 5, 7, 9, 11, 15],
    'weights': ['uniform', 'distance'],
    'metric': ['euclidean', 'manhattan']
}

model = GridSearchCV(KNeighborsClassifier(), grid_params, cv=5, n_jobs=-1)
model.fit(x_train_scaled, y_train)
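To make the three searched hyperparameters (`n_neighbors`, `weights`, `metric`) concrete, here is a brute-force sketch of how a k-nearest-neighbors vote works for a single query. It is an illustration, not scikit-learn's implementation: `KNeighborsClassifier` may use tree-based indexes and handles ties and zero distances in its own way. The helper name and toy data are hypothetical:

```python
import numpy as np
from collections import defaultdict

def knn_predict(x_train, y_train, query, n_neighbors=5, metric='euclidean', weights='uniform'):
    """Classify one query vector by voting among its nearest training points."""
    diff = x_train - query
    if metric == 'euclidean':
        distances = np.sqrt(np.sum(diff ** 2, axis=1))
    elif metric == 'manhattan':
        distances = np.sum(np.abs(diff), axis=1)
    else:
        raise ValueError(f'Unsupported metric: {metric}')

    nearest = np.argsort(distances)[:n_neighbors]
    votes = defaultdict(float)
    for i in nearest:
        if weights == 'distance':
            # Closer neighbors count more; the floor avoids division by zero.
            votes[y_train[i]] += 1.0 / max(distances[i], 1e-12)
        else:
            votes[y_train[i]] += 1.0
    return max(votes, key=votes.get)

toy_x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0]])
toy_y = np.array(['dog_bark', 'dog_bark', 'dog_bark', 'siren', 'siren'])
print(knn_predict(toy_x, toy_y, np.array([0.2, 0.2]), n_neighbors=3))  # dog_bark
```

With `weights='distance'`, a single very close neighbor can outvote several distant ones, which is why the grid search tries both options.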

In [None]:
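print(f'Best parameters: {model.best_params_}')
print(f'Test accuracy: {model.score(x_test_scaled, y_test)}')

y_predict = model.predict(x_test_scaled)
# confusion_matrix expects (y_true, y_pred): rows are true classes, columns are predictions.
print(f'Confusion Matrix: \n{confusion_matrix(y_test, y_predict)}')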
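scikit-learn's `confusion_matrix` takes `(y_true, y_pred)` in that order, with one row per true class and one column per predicted class; swapping the arguments transposes the matrix. A small hand-rolled sketch of that layout, with hypothetical labels and a hypothetical helper name, also shows how per-class recall falls out of the rows:

```python
import numpy as np

def confusion_counts(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = np.zeros((len(labels), len(labels)), dtype=int)
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label], index[pred_label]] += 1
    return matrix

labels = ['car_horn', 'siren']
y_true = ['siren', 'siren', 'car_horn', 'siren']
y_pred = ['siren', 'car_horn', 'car_horn', 'siren']
matrix = confusion_counts(y_true, y_pred, labels)
print(matrix)
# Per-class recall: the diagonal divided by the number of true samples in each row.
print(matrix.diagonal() / matrix.sum(axis=1))
```

Reading the rows tells you which true classes the model confuses; here one of the three sirens was predicted as a car horn.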

It scores quite well considering how simple our approach is. In other circumstances, though, we would put more effort into improving the model, perhaps by choosing a better algorithm or by preprocessing our data in a smarter way.
 
Since that is not the goal of this notebook, I'd like to keep things simple. But I invite you to fork this notebook and see how far you can push it.
 
I hope you have learnt something new. See you in the next one!