# Audio Classification Using a CNN
In this notebook I use all of the extracted audio features to train a CNN model that classifies audio clips.

## Get features
See the following code, which is the same as in the `get_audio_features` notebook.

In [None]:
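import numpy as np
import librosa

# 216 is the number of MFCC frames librosa produces for a 5-second clip
# at its default settings; kept for reference (not used below)
max_pad_len = 216

def extract_features(file_name):
    """Return the MFCC matrix (120 coefficients per frame) for one file, or None on failure."""
    try:
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=120)
    except Exception as e:
        # Report which file failed and why, then signal the failure to the caller
        print("Error encountered while parsing file: ", file_name, e)
        return None
    return mfccs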
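
The value `max_pad_len = 216` matches the number of MFCC frames per clip. A minimal sketch of that arithmetic, assuming 5-second clips (as in ESC-50), librosa's default sample rate of 22,050 Hz, and its default hop length of 512 with centered frames. The helper name `expected_frames` is hypothetical:

```python
def expected_frames(duration_s, sample_rate=22050, hop_length=512):
    """Number of frames for a centered short-time transform.

    With centered framing there is one frame per full hop, plus one:
    frames = 1 + n_samples // hop_length
    """
    n_samples = int(duration_s * sample_rate)
    return 1 + n_samples // hop_length


frames = expected_frames(5.0)
print(frames)  # 216
```

Under these assumptions each clip yields a 120 x 216 MFCC matrix (`n_mfcc=120` rows), so the per-file features can be stacked into one regular array.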


## Read the dataset and extract features from every audio file
1. Read the CSV file.
2. Loop over its rows and, for each row:
   - extract and save the features of the row's audio file;
   - save the row's label.

See the following code:

In [35]:
# Load various imports 
import pandas as pd
import os
import librosa

# Set the path to the ESC-50 audio files
fulldatasetpath = './ESC-50/audio/'

metadata = pd.read_csv('./ESC-50/meta/esc50.csv')

features = []

# Iterate through each sound file and extract the features 
for index, row in metadata.iterrows():
    
    file_name = os.path.join(os.path.abspath(fulldatasetpath),str(row["filename"]))
    class_label = row["category"]
    data = extract_features(file_name)
    
    features.append([data, class_label])

# Convert into a pandas DataFrame
featuresdf = pd.DataFrame(features, columns=['feature','class_label'])

print('Finished feature extraction from ', len(featuresdf), ' files')

Finished feature extraction from  2000  files
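
`extract_features` returns `None` when a file cannot be parsed, and the loop above stores that `None` next to the label. A minimal sketch of skipping such rows instead, using a hypothetical stand-in extractor (`fake_extract`) and made-up file names so that it runs without the dataset:

```python
def fake_extract(file_name):
    """Hypothetical stand-in for extract_features: fails on 'bad' files."""
    if "bad" in file_name:
        return None
    return [[0.0, 0.0], [0.0, 0.0]]  # placeholder feature matrix


# Made-up (filename, label) rows standing in for the metadata CSV
rows = [("a.wav", "dog"), ("bad.wav", "rain"), ("c.wav", "dog")]

features = []
skipped = []
for file_name, class_label in rows:
    data = fake_extract(file_name)
    if data is None:
        # Record the failure instead of storing None as a feature
        skipped.append(file_name)
        continue
    features.append([data, class_label])

print(len(features), "kept;", len(skipped), "skipped")
```

Dropping failed rows this way keeps every entry in `features` the same shape, which the later conversion to a NumPy array relies on.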


## Convert the data and labels, and split the dataset
I use `sklearn.preprocessing.LabelEncoder` to encode the categorical text labels as integers, then `keras.utils.to_categorical` to one-hot encode them for the model.
I then use `sklearn.model_selection.train_test_split` to split the dataset into training and testing sets. The testing set is 20% of the data, and the random state is fixed at 42 so the split is reproducible.

In [36]:
from sklearn.preprocessing import LabelEncoder
from keras.utils import to_categorical

# Convert features and corresponding classification labels into numpy arrays
X = np.array(featuresdf.feature.tolist())
y = np.array(featuresdf.class_label.tolist())

# Encode the classification labels
le = LabelEncoder()
yy = to_categorical(le.fit_transform(y)) 

# split the dataset 
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, yy, test_size=0.2, random_state = 42)
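
To make the encoding step concrete, here is a small pure-Python sketch of what label encoding followed by one-hot encoding produces. It mimics the behavior of `LabelEncoder` (classes sorted, then mapped to integers) and `to_categorical` on made-up labels. The helper names are hypothetical, and this is not the libraries' implementation:

```python
def encode_labels(labels):
    """Map each distinct label to an integer, in sorted order."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return classes, [index[name] for name in labels]


def one_hot(indices, num_classes):
    """Turn integer class indices into one-hot rows."""
    return [[1.0 if i == k else 0.0 for k in range(num_classes)] for i in indices]


labels = ["rain", "dog", "rain", "siren"]  # made-up example labels
classes, encoded = encode_labels(labels)
yy_example = one_hot(encoded, len(classes))

print(classes)     # ['dog', 'rain', 'siren']
print(encoded)     # [1, 0, 1, 2]
print(yy_example)  # [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

With `test_size=0.2`, the 2000 clips are split into 1600 training examples and 400 testing examples.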


## Build CNN model
See the following code:

In [37]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D

# X has shape (num_samples, num_rows, num_columns)
_, num_rows, num_columns = X.shape
num_channels = 1

# Add a single channel axis, as Conv2D expects (rows, columns, channels) inputs
x_train = x_train.reshape(x_train.shape[0], num_rows, num_columns, num_channels)
x_test = x_test.reshape(x_test.shape[0], num_rows, num_columns, num_channels)

num_labels = yy.shape[1]

# Construct model 
model = Sequential()
model.add(Conv2D(filters=16, kernel_size=2, input_shape=(num_rows, num_columns, num_channels), activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))

model.add(Conv2D(filters=32, kernel_size=2, activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))

model.add(Conv2D(filters=64, kernel_size=2, activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))

model.add(Conv2D(filters=128, kernel_size=2, activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))

model.add(GlobalAveragePooling2D())

model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))

model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))

model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))

model.add(Dense(num_labels, activation='softmax'))

In [38]:
# Compile the model
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')

In [39]:
# Display model architecture summary 
model.summary()

# Calculate pre-training accuracy 
score = model.evaluate(x_test, y_test, verbose=1)
accuracy = 100 * score[1]

print("Pre-training accuracy: %.4f%%" % accuracy)

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_17 (Conv2D)           (None, 119, 215, 16)      80        
_________________________________________________________________
max_pooling2d_17 (MaxPooling (None, 59, 107, 16)       0         
_________________________________________________________________
dropout_29 (Dropout)         (None, 59, 107, 16)       0         
_________________________________________________________________
conv2d_18 (Conv2D)           (None, 58, 106, 32)       2080      
_________________________________________________________________
max_pooling2d_18 (MaxPooling (None, 29, 53, 32)        0         
_________________________________________________________________
dropout_30 (Dropout)         (None, 29, 53, 32)        0         
_________________________________________________________________
conv2d_19 (Conv2D)           (None, 28, 52, 64)       
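
The output shapes and parameter counts in the summary follow from simple arithmetic. A minimal sketch, assuming "valid" padding and stride 1 for the convolutions (the Keras defaults) and non-overlapping 2x2 pooling. The helper names are hypothetical:

```python
def conv_out(size, kernel=2):
    """Output length of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output length of non-overlapping max pooling (floor division)."""
    return size // pool


def conv_params(in_channels, filters, kernel=2):
    """Weights (kernel x kernel x in_channels per filter) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


rows, cols, channels = 120, 216, 1  # MFCC input: n_mfcc x frames x 1
for filters in (16, 32, 64, 128):
    params = conv_params(channels, filters)
    rows, cols = conv_out(rows), conv_out(cols)
    print(f"conv -> ({rows}, {cols}, {filters}), params={params}")
    rows, cols = pool_out(rows), pool_out(cols)
    print(f"pool -> ({rows}, {cols}, {filters})")
    channels = filters
```

The first printed lines reproduce the shapes and the parameter counts (80 and 2080) shown in the summary above.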

## Train and save model

In [40]:
from keras.callbacks import ModelCheckpoint 
from datetime import datetime 

num_epochs = 1000
num_batch_size = 256

checkpointer = ModelCheckpoint(filepath='save_models/weights.best.basic_cnn10.hdf5', 
                               verbose=1, save_best_only=True)
start = datetime.now()

model.fit(x_train, y_train, batch_size=num_batch_size, epochs=num_epochs, validation_data=(x_test, y_test), callbacks=[checkpointer], verbose=1)


duration = datetime.now() - start
print("Training completed in time: ", duration)

- val_accuracy: 0.6650

Epoch 00902: val_loss did not improve from 1.68152
Epoch 903/1000

Epoch 00903: val_loss did not improve from 1.68152
Epoch 904/1000

Epoch 00904: val_loss did not improve from 1.68152
Epoch 905/1000

Epoch 00905: val_loss did not improve from 1.68152
Epoch 906/1000

Epoch 00906: val_loss did not improve from 1.68152
Epoch 907/1000

Epoch 00907: val_loss did not improve from 1.68152
Epoch 908/1000

Epoch 00908: val_loss did not improve from 1.68152
Epoch 909/1000

Epoch 00909: val_loss did not improve from 1.68152
Epoch 910/1000

Epoch 00910: val_loss did not improve from 1.68152
Epoch 911/1000

Epoch 00911: val_loss did not improve from 1.68152
Epoch 912/1000

Epoch 00912: val_loss did not improve from 1.68152
Epoch 913/1000

Epoch 00913: val_loss did not improve from 1.68152
Epoch 914/1000

Epoch 00914: val_loss did not improve from 1.68152
Epoch 915/1000

Epoch 00915: val_loss did not improve from 1.68152
Epoch 916/1000

Epoch 00916: val_loss did not improve 
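
The "did not improve" messages come from `ModelCheckpoint` with `save_best_only=True`: the weights file is overwritten only when the monitored quantity (here `val_loss`) beats the best value seen so far. A pure-Python sketch of that rule, with made-up loss values and a hypothetical helper name. It illustrates the logic and is not the Keras implementation:

```python
def epochs_that_save(val_losses):
    """Return the 1-based epochs at which a 'best only' checkpoint would be written."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # strictly better than the best so far
            best = loss
            saved.append(epoch)
    return saved, best


# Made-up validation losses, for illustration only
losses = [2.9, 2.1, 1.8, 1.9, 1.7, 1.75, 1.72]
saved, best = epochs_that_save(losses)
print(saved)  # [1, 2, 3, 5]
print(best)   # 1.7
```

In the log above the best `val_loss` stays at 1.68152 throughout the epochs shown, so no new checkpoint is written in that stretch.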

## Evaluate on the training and testing data

In [41]:
# Evaluating the model on the training and testing set
score = model.evaluate(x_train, y_train, verbose=0)
print("Training Accuracy: ", score[1])

score = model.evaluate(x_test, y_test, verbose=0)
print("Testing Accuracy: ", score[1])

Training Accuracy:  0.9900000095367432
Testing Accuracy:  0.6700000166893005
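
The accuracy reported by `model.evaluate` is the fraction of examples whose highest-probability class matches the one-hot label. A minimal sketch with made-up predictions; the helper names are hypothetical:

```python
def argmax(values):
    """Index of the largest value in a sequence."""
    return max(range(len(values)), key=lambda i: values[i])


def categorical_accuracy(y_true, y_pred):
    """Fraction of rows where the predicted class equals the labeled class."""
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Made-up one-hot labels and softmax-like outputs for three classes
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.5, 0.3, 0.2], [0.2, 0.6, 0.2]]

print(categorical_accuracy(y_true, y_pred))  # 0.75
```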


## Observations
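Testing accuracy improved from 0.48 (DNN version) to 0.67 (CNN version). The training accuracy of 0.99 is much higher than the testing accuracy, which suggests the model is overfitting.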