## Convolutional Neural Network Algorithm Extraction

#### A convolutional neural network (CNN) will be compared against the MLP model to determine which achieves better classification performance. Because the inputs are Mel-frequency cepstral coefficients (MFCCs), which form a two-dimensional coefficient-by-frame array for each clip, a CNN is a natural choice: it can exploit the same local structure that makes CNNs successful in image processing.

In [None]:
# Import libraries
import librosa
import numpy as np
import os

In [None]:
# Test MFCC values of Longer Vs. Shorter Samples
# Creating a function that extracts the MFCC features of an audio file
def extract_features(file_name, max_pad_len):
    
    try:
        
        # Load the audio samples and sampling rate; 'kaiser_fast' trades resampling quality for speed
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        # Compute 40 MFCCs per frame; the result has shape (n_mfcc, n_frames)
        mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
        # Zero-pad along the frame axis up to max_pad_len.
        # np.pad raises ValueError if the clip has more frames than max_pad_len (negative pad_width)
        pad_width = max_pad_len - mfccs.shape[1]
        mfccs = np.pad(mfccs, pad_width=((0, 0), (0, pad_width)), mode='constant')

    except Exception as e:
        # Report the failure reason and skip this file
        print("Error encountered while parsing file ", file_name, ":", e)
        return None
    
    return mfccs
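
#### The padding step can be tried in isolation on synthetic arrays, without loading any audio. The sketch below is illustrative only: the helper name pad_or_truncate is hypothetical, and the truncation branch is an added assumption that the function above does not have (it relies on max_pad_len being at least as large as every clip).

```python
import numpy as np

def pad_or_truncate(mfccs, max_pad_len):
    """Return an array with exactly max_pad_len columns (frames).

    Hypothetical helper: zero-pads short inputs on the right, and
    (as an added assumption) truncates inputs that are too long.
    """
    n_frames = mfccs.shape[1]
    if n_frames >= max_pad_len:
        # Keep only the first max_pad_len frames
        return mfccs[:, :max_pad_len]
    pad_width = max_pad_len - n_frames
    # Pad only the frame axis (axis 1), and only at the end
    return np.pad(mfccs, pad_width=((0, 0), (0, pad_width)), mode='constant')

# Synthetic stand-ins for MFCC arrays with 40 coefficients
short_clip = np.ones((40, 50))
long_clip = np.ones((40, 200))

padded = pad_or_truncate(short_clip, 173)
trimmed = pad_or_truncate(long_clip, 173)
print(padded.shape, trimmed.shape)
```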

#### To prepare the model, features must be extracted from every audio signal of interest. These features are the input on which the CNN is trained.
#### Before extracting them, we need to know the maximum number of frames that Librosa produces when computing MFCCs, since every feature array must be padded to the same width.
#### Below, we append each file's frame count to the lenVars list and take the maximum over the portion of the dataset we are interested in. We assign it to the variable max_pad_length, which is passed directly to the function above.
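
#### As a rough sanity check on the expected frame count, the number of MFCC frames can be estimated from the clip duration. This sketch assumes Librosa's defaults (audio resampled to 22050 Hz, hop_length of 512 samples, centered frames) and a hypothetical clip length of 4 seconds, which is not stated in this notebook; the function name expected_frames is also hypothetical.

```python
def expected_frames(duration_s, sample_rate=22050, hop_length=512):
    """Estimate the number of frames for a clip, assuming centered framing.

    With centered frames, the count is 1 + floor(n_samples / hop_length).
    """
    n_samples = int(duration_s * sample_rate)
    return 1 + n_samples // hop_length

# A 4-second clip at 22050 Hz has 88200 samples
print(expected_frames(4.0))
```

#### Under these assumptions a 4-second clip yields 173 frames, so shorter clips need zero-padding to reach the same width.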

In [None]:
# Load various imports 
import pandas as pd
import os
import librosa
from pathlib import Path

# Set the path to the full UrbanSound dataset
root_path = Path(os.getcwd()).parent.parent # Software Folder
fulldatasetpath = root_path / "Training_Dataset" / "audio"
metadata = pd.read_csv(root_path / "Training_Dataset" / "metadata" / "MasterDataSet.csv")

In [None]:
categories = ['car_horn', 'gun_shot', 'siren']

lenVars = []

# Iterate through each sound file and extract the number of frames 
for index, row in metadata.iterrows():
    
    # Extract filename and category
    #print(row["class_name"])
    category_str = row["class_name"]
    
    # Loop through metadata comparing the categories
    if category_str in categories:
        # Extract MFCCs 
        file_name = os.path.join(os.path.abspath(fulldatasetpath),'fold'+str(row["fold"])+'/',str(row["slice_file_name"]))
        file_name = Path(file_name) #Convert to pathlib object for OS compatibility
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        mfccs_pre = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
        numFrames = mfccs_pre.shape[1]
        lenVars.append(numFrames)
            
    else:
        continue

# Extract max number of frames
max_pad_length = max(lenVars)

features = []

# Iterate through each sound file and extract the features 
for index, row in metadata.iterrows():
    
    # Extract filename and category
    category_str = row["class_name"]
    
    # Loop through metadata comparing the categories
    if category_str in categories:
    
        file_name = os.path.join(os.path.abspath(fulldatasetpath),'fold'+str(row["fold"])+'/',str(row["slice_file_name"]))
        class_label = row["class_name"]
        data = extract_features(file_name, max_pad_length)
        # Skip files that failed to parse (extract_features returns None on error)
        if data is not None:
            features.append([data, class_label])
        
    else:
        continue
        
# Convert into a pandas DataFrame
featuresdf = pd.DataFrame(features, columns=['feature','class_label'])

print('Finished feature extraction from ', len(featuresdf), ' files')

#### With the maximum padding length found to be 173 frames, this value is passed to the extract_features function as we iterate through the metadata, locate the sound files of interest, and extract their features.
#### The results are appended to a features list along with their respective categories. The list is then converted to a dataframe to take advantage of the Pandas library.

#### At this point, features have been extracted from all 1307 files of interest and stored in the dataframe generated above.

In [None]:
# Balance the dataframe so that every class_label has the same number of files
# This prevents any one category from dominating when the model is trained

print(featuresdf.class_label.count()) # 1307

# Create dictionary of dataframes
frames = {}
categories = ['car_horn', 'gun_shot', 'siren']

arr_Size = []

for label in categories:
    frames[label] = featuresdf[featuresdf['class_label'] == label]
    # Extract shape and get number of rows
    rNc = frames[label].shape
    # Gets number of rows
    arr_Size.append(rNc[0])
    print(label, rNc[0])

# Take the minimum size from size array
minSize = min(arr_Size)

# Randomly sample minSize rows from each class so that all classes end up the same size
for label in frames:
    frames[label] = frames[label].sample(minSize)
    print(frames[label].shape[0])
    
# Concatenate all dataframes in the dictionary and keep only the feature and label columns
result = pd.concat(frames)
features_temp = result[["feature", "class_label"]]

# Reindex features_temp
features_temp = features_temp.reset_index(drop=True)

#### At this point, the class counts in the extracted features dataframe have been inspected and balanced.
#### The number of files for the 3 categories of interest varies across the dataset: as seen above, car horn has 434 files, gun shot has 449 files and siren has 424 files.
#### To balance the classes, we find the category with the fewest samples and randomly downsample every category to that size.
#### The reduced dataframe contains the same number of samples in every category (424 files each) and is used for the actual model training.
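
#### The same downsampling idea can be sketched more compactly on a toy dataframe. This is an alternative formulation, not the code used in this notebook: it relies on pandas' groupby(...).sample (available in pandas 1.1 and later), and the toy data and the fixed random_state are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for featuresdf with deliberately unequal class sizes
toy = pd.DataFrame({
    'feature': list(range(12)),
    'class_label': ['car_horn'] * 5 + ['gun_shot'] * 4 + ['siren'] * 3,
})

# Size of the smallest class
min_size = int(toy['class_label'].value_counts().min())

# Draw min_size random rows from every class, then rebuild a clean index
balanced = (
    toy.groupby('class_label')
       .sample(n=min_size, random_state=42)
       .reset_index(drop=True)
)
print(balanced['class_label'].value_counts())
```

#### Fixing random_state makes the selected subset repeatable between runs, which the sample(minSize) call above does not guarantee.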

In [None]:
# Send temp features to features df
featuresdf = features_temp
display(featuresdf)

#### In the section below, the features dataframe is used to extract the X and Y variables, where X holds the features and Y the corresponding category.
#### To convert the categories into numerical values, LabelEncoder from sklearn.preprocessing is applied, followed by a conversion to a binary (one-hot) matrix.
#### The resulting data (features and encoded labels) is split into training and testing sets of 75% and 25%, respectively.
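
#### To make the encoding concrete, the two steps (string label to integer, integer to one-hot row) can be reproduced with NumPy alone. This is only an illustrative equivalent on made-up labels, not how LabelEncoder or to_categorical are implemented internally.

```python
import numpy as np

# Made-up labels for illustration
labels = np.array(['siren', 'car_horn', 'gun_shot', 'siren'])

# Step 1: map each label to an integer index into the sorted unique classes
classes, integer_codes = np.unique(labels, return_inverse=True)

# Step 2: turn each integer into a one-hot row by indexing an identity matrix
one_hot = np.eye(len(classes))[integer_codes]

# Decoding reverses both steps: argmax recovers the integer, classes the label
decoded = classes[one_hot.argmax(axis=1)]

print(classes)
print(integer_codes)
print(one_hot)
```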

In [None]:
# Use sklearn.preprocessing.LabelEncoder to encode the categorical text data into model-understandable numerical data
from sklearn.preprocessing import LabelEncoder
from keras.utils import to_categorical

# Convert features and corresponding classification labels into numpy arrays
X = np.array(featuresdf.feature.tolist())
y = np.array(featuresdf.class_label.tolist())

# This part will convert the categories into their respective numerical value
le = LabelEncoder()
# Fit transform receives categories and assigns numerical value to them. to_categorical converts to binary matrix
yy = to_categorical(le.fit_transform(y))

# Split the dataset - 25% test, 75% train
from sklearn.model_selection import train_test_split 

# X holds the features, yy the one-hot encoded labels
# random_state=42 seeds the shuffle so the train/test split is reproducible across runs
x_train, x_test, y_train, y_test = train_test_split(X, yy, test_size=0.25, random_state = 42)

In [None]:
# Store variables so they can be restored in the next notebook
%store x_train
%store x_test
%store y_test
%store y_train
%store yy
%store le
%store max_pad_length