# Inference

In this notebook, the saved model is loaded and used to predict the class of sound clips downloaded from the Freesound website. I run inference with the ANN model because its accuracy was good. The random forest model's accuracy was poor, so it is not a good choice for this type of data. Let's start by importing the libraries required for inference.

In [54]:
import librosa
import numpy as np
import pandas as pd
from tensorflow.keras.models import load_model
import joblib
import IPython.display as ipd
import warnings
warnings.filterwarnings('ignore')

The ANN model that I saved in the training notebook is loaded here.

In [3]:
model = load_model('ann_detector.h5')

2022-04-18 23:48:59.830672: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-04-18 23:48:59.876239: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-04-18 23:48:59.876831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.755GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2022-04-18 23:48:59.876981: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-04-18 23:48:59.878080: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2022-04-18 23:48:59.879143: I tensorflow/stream_executor/pl
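
The INFO lines above come from TensorFlow's C++ backend rather than from Python's `warnings` module, so `warnings.filterwarnings('ignore')` does not hide them. One common way to quiet them, shown here as a minimal sketch, is to set the `TF_CPP_MIN_LOG_LEVEL` environment variable before TensorFlow is imported for the first time. The level `'2'` is my choice for illustration, not something the original notebook used.

```python
import os

# Must be set before TensorFlow is imported for the first time in the process.
# '0' shows everything, '1' hides INFO, '2' hides INFO and WARNING, '3' hides ERROR too.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# Only import TensorFlow after the variable is set, for example:
# from tensorflow.keras.models import load_model
```

In a notebook this means the variable has to be set in a cell that runs before the import cell, and the kernel must be restarted if TensorFlow was already imported.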

Below are the functions that take the path of an audio file, pre-process the sound clip, and use it for prediction. The mel spectrogram and MFCC features of the sound are extracted, averaged over time, and combined. The combined features are converted to a NumPy array, which is used for prediction. The predicted index is then mapped to a class name using the class name and class ID columns of the UrbanSound8K CSV file, read with pandas. Together, the functions display the class name and the sound clip in the Jupyter notebook.
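
To see where the input length of 148 comes from, here is a small sketch of the averaging and concatenation step. It uses random arrays as stand-ins for the librosa output, so only the shapes are meaningful. It assumes librosa's defaults of 128 mel bands and 20 MFCC coefficients, and the number of time frames is an arbitrary value picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames = 173  # arbitrary number of time frames, chosen for illustration

# Stand-ins for librosa output: 128 mel bands and 20 MFCCs, one column per frame.
mel = rng.random((128, n_frames))
mfcc = rng.random((20, n_frames))

# Averaging over the time axis leaves one value per mel band or coefficient.
mel_mean = mel.mean(axis=1)    # shape (128,)
mfcc_mean = mfcc.mean(axis=1)  # shape (20,)

# Concatenate and add a batch dimension so the model sees a single sample.
features = np.concatenate((mel_mean, mfcc_mean)).reshape(1, -1)
print(features.shape)
```

Because the time axis is averaged away, clips of any duration produce a feature vector of the same length, which is what lets a fixed-size dense network accept them.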

In [65]:
def audio_extractor(path):
    # load the sound clip using librosa 
    data, sampling_rate = librosa.load(path)
    # extract the audio to be played in jupyter notebook and save it in a variable
    sound = ipd.Audio(data, rate=sampling_rate)
    
    # extract the mel spectrogram (recent librosa versions require the keyword y=)
    spec = librosa.feature.melspectrogram(y=data, sr=sampling_rate)
    # average each mel band over time (128 values with librosa's default n_mels)
    spec = np.mean(spec, axis=1)
    # extract mfcc
    mfcc = librosa.feature.mfcc(y=data, sr=sampling_rate)
    # average each mfcc coefficient over time (20 values with librosa's default n_mfcc)
    mfcc = np.mean(mfcc, axis=1)
    # concatenate the features of spectrogram and mfcc
    new_features = np.concatenate((spec, mfcc), axis=0)
    
    # np.concatenate already returns a numpy array;
    # reshape it to (1, 148), a batch of one sample, to be accepted by the model
    new_arr = new_features.reshape(1, 148)
    # use model's predict method on the array
    pred = model.predict(new_arr)[0]
    # take the index of the highest score as the predicted class
    prediction = np.argmax(pred)
    
    # read the urbansound csv and map each class ID to its class name
    # (assumes the model's output index matches classID)
    df = pd.read_csv('./UrbanSound8K.csv')
    id_to_name = df.drop_duplicates('classID').set_index('classID')['class']
    cls_name = id_to_name.loc[prediction]
    # return class name and ipd sound variable
    return cls_name, sound

# function to detect, print sound class, and display the audio in jupyter notebook 
def audio_detector(path):
    class_name, sound = audio_extractor(path)
    print(class_name)
    print()
    ipd.display(sound)
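
The extractor reads the CSV file on every call just to translate the predicted index into a name. As a rough sketch of an alternative, the lookup can be built once as a plain dictionary. The helper `build_id_to_name` and the sample rows below are hypothetical and made up for illustration; only the column names follow the UrbanSound8K.csv layout. The sketch also assumes the model's output index equals `classID`.

```python
import csv
import io

# A few illustrative rows in the UrbanSound8K.csv column layout.
# File names and values here are made up, not copied from the dataset.
SAMPLE_CSV = """slice_file_name,fsID,start,end,salience,fold,classID,class
a.wav,1,0.0,4.0,1,1,3,dog_bark
b.wav,2,0.0,4.0,1,2,4,drilling
c.wav,3,0.0,4.0,2,3,3,dog_bark
"""

def build_id_to_name(csv_file):
    """Map each integer classID to its class name (hypothetical helper)."""
    return {int(row['classID']): row['class'] for row in csv.DictReader(csv_file)}

id_to_name = build_id_to_name(io.StringIO(SAMPLE_CSV))
print(id_to_name[3])
```

With the real file, the same helper could be called once with `open('./UrbanSound8K.csv', newline='')` and the resulting dictionary reused for every prediction.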

This audio file was downloaded from the Freesound website. It is the sound of a dog barking.

In [66]:
# enter the path of audio file
path = '/home/msc1/Downloads/sounds/155312__jace__dog-barking.wav'

# the detector function will output the class name and the sound
audio_detector(path)

dog_bark



This is a sound of drilling.

In [73]:
path = '/home/msc1/Downloads/sounds/139000__kargul-x__wiertarka.wav'

audio_detector(path)

drilling
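
The detector prints only the single best class. To get a feel for how close the runner-up classes were, the model's output vector can be ranked instead of reduced with `argmax`. The sketch below assumes the final layer produces one score per class in `classID` order. The helper `top_k` is hypothetical, and the scores are invented for illustration rather than taken from the model.

```python
# UrbanSound8K class names in classID order (0 to 9).
CLASS_NAMES = [
    'air_conditioner', 'car_horn', 'children_playing', 'dog_bark', 'drilling',
    'engine_idling', 'gun_shot', 'jackhammer', 'siren', 'street_music',
]

def top_k(scores, k=3):
    """Return the k highest-scoring (class_name, score) pairs, best first."""
    ranked = sorted(zip(CLASS_NAMES, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical scores for one clip, made up for illustration only.
example_scores = [0.01, 0.02, 0.05, 0.80, 0.03, 0.02, 0.01, 0.02, 0.02, 0.02]
for name, score in top_k(example_scores):
    print(f'{name}: {score:.2f}')
```

In the notebook, `pred` from `model.predict` would take the place of `example_scores`.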



## Conclusion

This was a sound classification problem: detecting sound classes in an urban environment. A detector like this could be used to monitor sounds in public places and help prevent crime there. The librosa library was used to extract mel spectrogram and MFCC features, which were combined into a single feature array. An artificial neural network and a random forest were trained on the extracted features. The ANN performed better, with an accuracy of 93% compared to 66% for the random forest. Neural networks can learn additional feature representations on their own, which helps them perform better than traditional machine learning algorithms. Finally, I used two sound clips downloaded from the internet to check the ANN model's performance on raw, unseen data, and the model predicted the correct class for both.