<a href="https://colab.research.google.com/github/anantha99/Sound_Classification/blob/main/audio_classifier.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Uploading the data

1. We used the opendatasets library from Jovian to download the dataset from its source.
2. We extracted the contents of the downloaded tar archive.
3. We removed the '.DS_Store' file from each dataset folder. These are macOS Finder metadata files, not audio, and they would cause the later loading code to fail.



In [None]:
#installing the opendatasets library used to download the dataset
!pip install opendatasets --upgrade

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting opendatasets
  Downloading opendatasets-0.1.22-py3-none-any.whl (15 kB)
Installing collected packages: opendatasets
Successfully installed opendatasets-0.1.22


In [None]:
import opendatasets as od
dataset_url = 'https://goo.gl/8hY5ER'
od.download(dataset_url)

Downloading https://goo.gl/8hY5ER to ./UrbanSound8K.tar.gz


6023749632it [1:10:06, 1432064.56it/s]                                


In [None]:

# importing the "tarfile" module
import tarfile

# open the archive and extract its contents;
# the context manager closes the file afterwards
with tarfile.open('/content/UrbanSound8K.tar.gz') as archive:
    archive.extractall('./content/')

In [None]:
#remove all the .DS_Store files in the dataset
import os

dataset_path = '/content/content/UrbanSound8K/audio'
labels = []
for dirpath, dirnames, filenames in os.walk(dataset_path):
  #ensure we are not at the root level (compare by value, not identity)
  if dirpath != dataset_path:
    for f in filenames:
      if f == '.DS_Store':
        path_of_the_file = os.path.join(dirpath,f)
        os.remove(path_of_the_file)
      else:
        #the class label is the second component of the file name
        filename_components = f.split("-")
        label_component = filename_components[1]
        labels.append(label_component)

# Data Preprocessing 

**Data Preprocessing**
1. Preprocessing converts raw data into numerical features that can be processed while preserving the original information.
2. Since audio is captured in analog form, it must be converted to digital form before its features can be analyzed.

1. We saw in the EDA part that the recordings differ in
  1. Bit depth
  2. Sample rate
  3. Number of channels (mono vs. stereo)
2. The librosa library handles all of these points. For much of the preprocessing we can use Librosa's load() function, which by default resamples to 22.05 kHz, normalises the samples so that values range between -1 and 1, and mixes the audio channels down to mono.
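
To make the normalisation and mono steps concrete, here is a minimal NumPy sketch. It is a conceptual illustration only, not librosa's actual implementation, and the sample values are made up. Resampling is left out.

```python
import numpy as np

# Hypothetical 16-bit stereo clip: 4 frames x 2 channels (made-up values).
stereo_int16 = np.array(
    [[0, 0], [16384, -16384], [32767, 32767], [-32768, -32768]],
    dtype=np.int16,
)

# Scale the integer samples into floating point within [-1, 1].
stereo_float = stereo_int16.astype(np.float32) / 32768.0

# Mix down to mono by averaging the two channels of each frame.
mono = stereo_float.mean(axis=1)

print(mono.shape)              # one value per frame
print(mono.min(), mono.max())  # both stay within [-1, 1]
```

After these two steps every clip has the same value range and a single channel, whatever its original bit depth or channel count.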

**Feature Extraction**
3. Next we extract the features. We convert each signal into a representation that lets us identify features for classification.
Two methods are popular for this:
  1. MFCC - Mel-Frequency Cepstral Coefficients
  2. Spectrograms

Spectrograms visualise the spectrum of frequencies in a sound and how that spectrum varies over time.

However, a standard spectrogram uses a linear frequency axis, which does not reflect how listeners perceive sound. MFCCs instead use a quasi-logarithmically spaced frequency scale (the mel scale), which is closer to how the human auditory system processes sound, so we use MFCCs here.
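
The quasi-logarithmic spacing can be illustrated with a common textbook mel formula. This sketch uses the HTK-style definition as an assumption. librosa's default mel scale is a different variant, so the exact numbers may differ from what the library computes internally.

```python
import math

def hz_to_mel(freq_hz: float) -> float:
    """HTK-style mel formula: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

for f in (100, 200, 1000, 2000, 8000, 9000):
    print(f"{f:>5} Hz -> {hz_to_mel(f):7.1f} mel")

# A 100 Hz step at low frequencies versus a 1000 Hz step at high frequencies.
low_step = hz_to_mel(200) - hz_to_mel(100)
high_step = hz_to_mel(9000) - hz_to_mel(8000)
print(low_step > high_step)
```

The 100 Hz step near the bottom of the range covers more mels than the 1000 Hz step near the top, which is the compression of high frequencies that the mel scale is designed to provide.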

For each audio file in the dataset, we extract its MFCCs (a 2-D, image-like matrix of coefficients over time) and store them in a pandas DataFrame along with the file's classification label. For this we use Librosa's mfcc() function, which generates MFCCs from time-series audio data. The code below averages the 40 coefficients over time, so each clip is represented by a single 40-value vector.
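
Clips of different lengths produce MFCC matrices with different numbers of frames, so some summary is needed before they can share one feature table. The sketch below uses random matrices as stand-ins for the output of librosa's mfcc() (the shapes and values are assumptions) to show how averaging over time yields a fixed-length vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for MFCC matrices of shape (n_mfcc, n_frames); values are random.
short_clip = rng.normal(size=(40, 44))
long_clip = rng.normal(size=(40, 173))

def summarize(mfcc_matrix):
    # Transpose to (n_frames, n_mfcc), then average over the frames.
    return np.mean(mfcc_matrix.T, axis=0)

print(summarize(short_clip).shape)
print(summarize(long_clip).shape)
```

Both clips end up as 40-value vectors. The trade-off is that the averaging discards how the coefficients change over time.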

In [None]:
import os
import librosa
import math
import json
import numpy as np

DATASET_PATH = '/content/content/UrbanSound8K/audio'



def save_mfcc(dataset_path,num_mfcc=40):
    # list of [mean MFCC vector, class label, fold name] entries
    extracted_features = []
    for dirpath, dirnames, filenames in os.walk(dataset_path):
        #ensure we are not at the root level (compare by value, not identity)
        if dirpath != dataset_path:
            #save the fold name
            dirpath_components = dirpath.split(os.sep)
            semantic_label = dirpath_components[-1]
            print("\nProcessing: {}".format(semantic_label))

            #process the files of a specific fold
            for f in filenames:
                #the class label is the second component of the file name
                filename_components = f.split("-")
                label_component = filename_components[1]
                #build the path of the audio file
                file_path = os.path.join(dirpath, f)
                #load the file using librosa
                signal , sr = librosa.load(file_path)
                #extract the mfcc features and average them over time
                mfcc = np.mean(librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=num_mfcc).T,axis=0)

                #store the averaged mfcc with its class label and fold
                extracted_features.append([mfcc,label_component,semantic_label])
    return extracted_features
                    
extracted_values = save_mfcc(DATASET_PATH)



Processing: fold10

Processing: fold4

Processing: fold3

Processing: fold1





Processing: fold2

Processing: fold9

Processing: fold7

Processing: fold6

Processing: fold8

Processing: fold5


## Save as csv

In [None]:
#copy each row so the conversion below does not modify extracted_values
values = [list(row) for row in extracted_values]

In [None]:
#converting the arrays to lists before saving them as a csv
for j in values:
  j[0] = j[0].tolist()

In [None]:
import pandas as pd

save_csv=pd.DataFrame(values,columns=['feature','class','fold_component'])

In [None]:
save_csv.head()

Unnamed: 0,feature,class,fold_component
0,"[-197.47579956054688, 173.50418090820312, -26....",9,fold10
1,"[-322.86224365234375, 139.29617309570312, -10....",3,fold10
2,"[52.46525192260742, 114.9016342163086, -16.365...",5,fold10
3,"[-223.27256774902344, 70.83647155761719, -44.8...",3,fold10
4,"[-112.69658660888672, 128.1402587890625, -44.6...",4,fold10


In [None]:
#saving the file to csv 
save_csv.to_csv('extracted_data.csv')

In [None]:
#copying the file to google drive
!cp extracted_data.csv /content/drive/MyDrive/audio_data

# Load the Data

In [None]:
import pandas as pd
from ast import literal_eval

#extracted_features_df=pd.DataFrame(extracted_values,columns=['feature','class','fold_component'])
#pd.eval can also be used in place of literal_eval, but literal_eval was about 15x faster
extracted_features_df = pd.read_csv('/content/drive/MyDrive/audio_data/extracted_data.csv',converters={'feature': literal_eval})
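
The converter is needed because a CSV file stores each feature list as plain text. The sketch below reproduces that round trip with only the standard library. The row is a made-up, shortened example and is not taken from the real file.

```python
import ast
import csv
import io

# One hypothetical row with a shortened feature list.
rows = [{"feature": [-197.5, 173.5, -26.6], "class": 9}]

# Write the row to an in-memory CSV "file".
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["feature", "class"])
writer.writeheader()
writer.writerows(rows)

# Read it back: the list comes back as a string.
buffer.seek(0)
loaded = next(csv.DictReader(buffer))
raw = loaded["feature"]

# literal_eval safely parses the text back into a Python list.
parsed = ast.literal_eval(raw)

print(type(raw).__name__, type(parsed).__name__)
```

Without the parsing step, the later conversion of the feature column to a NumPy array would produce an array of strings rather than a numeric matrix.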

In [None]:
extracted_features_df.columns

Index(['Unnamed: 0', 'feature', 'class', 'fold_component'], dtype='object')

In [None]:
extracted_features_df.drop(['Unnamed: 0'],axis=1,inplace=True)

In [None]:
extracted_features_df.shape

(8732, 3)

In [None]:
#replacing class numbers with their class names
extracted_features_df['class'] = extracted_features_df['class'].map({0:'air_conditioner',
                               1:'car_horn',
                               2:'children_playing',
                               3:'dog_bark',
                               4:'drilling',
                               5:'engine_idling',
                               6:'gun_shot',
                               7:'jackhammer',
                               8:'siren',
                               9:'street_music'})

In [None]:
extracted_features_df.head()

Unnamed: 0,feature,class,fold_component
0,"[-197.47579956054688, 173.50418090820312, -26....",street_music,fold10
1,"[-322.86224365234375, 139.29617309570312, -10....",dog_bark,fold10
2,"[52.46525192260742, 114.9016342163086, -16.365...",engine_idling,fold10
3,"[-223.27256774902344, 70.83647155761719, -44.8...",dog_bark,fold10
4,"[-112.69658660888672, 128.1402587890625, -44.6...",drilling,fold10


In [None]:
### Split the dataset into features (X) and labels (y)
import numpy as np
X=np.array(extracted_features_df['feature'].tolist())
y=np.array(extracted_features_df['class'].tolist())

In [None]:
X

array([[-1.97475800e+02,  1.73504181e+02, -2.65719948e+01, ...,
         8.78134146e-02, -3.71990299e+00, -2.50093222e+00],
       [-3.22862244e+02,  1.39296173e+02, -1.09721375e+01, ...,
         1.35880268e+00,  1.79998529e+00,  3.27832818e+00],
       [ 5.24652519e+01,  1.14901634e+02, -1.63653412e+01, ...,
         2.37140274e+00,  1.70572448e+00,  1.62329721e+00],
       ...,
       [-3.04419220e+02,  1.25434494e+02, -9.40262508e+00, ...,
        -4.17565250e+00, -4.94980812e+00, -5.25423670e+00],
       [-1.83269470e+02,  4.05924149e+01,  3.99460258e+01, ...,
         3.55715251e+00, -9.44424248e+00, -9.55445766e-02],
       [-1.69769165e+02,  1.01184067e+02, -1.14349079e+01, ...,
         3.00564480e+00, -6.44474649e+00,  2.53110313e+00]])

In [None]:
#y=np.array(pd.get_dummies(y))
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical

In [None]:
labelencoder = LabelEncoder()
y = to_categorical(labelencoder.fit_transform(y))


In [None]:
y.shape

(8732, 10)
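
The two-step encoding above (class name to integer, integer to one-hot row) can be sketched with NumPy alone. This illustrates the idea on a made-up list of four labels and is not the scikit-learn or Keras implementation.

```python
import numpy as np

labels = np.array(["dog_bark", "siren", "dog_bark", "car_horn"])

# np.unique returns the sorted class names and, for each label, its index.
classes, encoded = np.unique(labels, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(len(classes))[encoded]

# Decoding reverses the process: argmax gives the index, the index gives the name.
decoded = classes[np.argmax(one_hot, axis=1)]

print(classes)
print(encoded)
print(one_hot.shape)
```

With the ten classes of the full dataset, the same steps give the (8732, 10) label matrix shown above.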

In [None]:
X.shape

(8732, 40)

# Model Creation 

In [None]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=0)

In [None]:
X_train.shape

(6985, 40)
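
The training-set size above follows from simple arithmetic, assuming the test size is rounded up to a whole number of samples and the rest go to training:

```python
import math

n_samples = 8732
test_size = 0.2

# Round the test share up to a whole number of samples.
n_test = math.ceil(n_samples * test_size)

# Everything else is used for training.
n_train = n_samples - n_test

print(n_train, n_test)
```

This gives 6985 training rows, which matches the shape of X_train, and leaves 1747 rows for testing.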

In [None]:
X_train[0]

array([-4.85112976e+02,  1.18378418e+02, -2.37864742e+01,  4.47219391e+01,
        8.44127083e+00,  2.11784573e+01,  1.26236734e+01,  1.11395464e+01,
        9.40762615e+00,  1.20282135e+01, -2.23583817e+00,  1.01117783e+01,
       -2.37149167e+00,  3.93047810e+00, -3.79374695e+00,  7.69953918e+00,
       -8.74187499e-02,  9.65779591e+00, -5.78604126e+00,  3.88852501e+00,
       -6.16443348e+00,  6.70675325e+00, -6.48489141e+00,  5.31670380e+00,
       -4.36563635e+00,  1.34347749e+00, -4.65915489e+00,  2.13763380e+00,
       -2.72958136e+00,  2.31922460e+00, -2.75978947e+00,  2.43736053e+00,
       -5.51829910e+00,  2.57631898e+00, -3.82228613e+00,  1.42204297e+00,
       -2.07691550e+00,  2.26053429e+00, -3.69663835e+00,  2.26473510e-01])

In [None]:
import tensorflow as tf
print(tf.__version__)

2.8.2


In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Dropout,Activation,Flatten
from tensorflow.keras.optimizers import Adam
from sklearn import metrics

In [None]:
y.shape

(8732, 10)

In [None]:
### No of classes
num_labels=y.shape[1]

In [None]:
model=Sequential()
###first layer
model.add(Dense(100,input_shape=(40,)))
model.add(Activation('relu'))
model.add(Dropout(0.1))
###second layer
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dropout(0.1))
###final layer
model.add(Dense(num_labels))
model.add(Activation('softmax'))

In [None]:
model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_9 (Dense)             (None, 100)               4100      
                                                                 
 activation_9 (Activation)   (None, 100)               0         
                                                                 
 dropout_6 (Dropout)         (None, 100)               0         
                                                                 
 dense_10 (Dense)            (None, 100)               10100     
                                                                 
 activation_10 (Activation)  (None, 100)               0         
                                                                 
 dropout_7 (Dropout)         (None, 100)               0         
                                                                 
 dense_11 (Dense)            (None, 10)               
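
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. A small sketch follows, where the helper name dense_params is made up for illustration.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    # One weight for every input-unit pair, plus one bias per unit.
    return n_inputs * n_units + n_units

# Input size followed by the size of each Dense layer in the model.
layer_sizes = [40, 100, 100, 10]

counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)
print(sum(counts))
```

The first two values match the 4100 and 10100 shown in the summary. The third is what the same formula gives for the final 10-unit layer.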

In [None]:
model.compile(optimizer='adam',metrics=['accuracy'],loss='categorical_crossentropy')
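
As a rough illustration of what the softmax output and the categorical cross-entropy loss compute, here is a plain-Python sketch. The scores are made up, and Keras applies additional numerical safeguards that are not reproduced here.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    shifted = [s - max(scores) for s in scores]
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_target, probabilities):
    # Only the probability assigned to the true class contributes.
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probabilities))

probs = softmax([2.0, 1.0, 0.1])  # made-up scores for three classes
loss_right = categorical_crossentropy([1, 0, 0], probs)
loss_wrong = categorical_crossentropy([0, 0, 1], probs)

print(probs)
print(loss_right, loss_wrong)
```

The loss is small when the model puts most of its probability on the true class and grows when it does not, which is what training minimises.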

In [None]:
## Training my model
from tensorflow.keras.callbacks import ModelCheckpoint
from datetime import datetime 

num_epochs = 20
num_batch_size = 32

checkpointer = ModelCheckpoint(filepath='saved_models/audio_classification_1.hdf5', 
                               verbose=1, save_best_only=True)
start = datetime.now()

model.fit(X_train, y_train, batch_size=num_batch_size, epochs=num_epochs, validation_data=(X_test, y_test), callbacks=[checkpointer], verbose=1)


duration = datetime.now() - start
print("Training completed in time: ", duration)

Epoch 1/20
Epoch 1: val_loss improved from inf to 0.36380, saving model to saved_models/audio_classification_1.hdf5
Epoch 2/20
Epoch 2: val_loss did not improve from 0.36380
Epoch 3/20
Epoch 3: val_loss did not improve from 0.36380
Epoch 4/20
Epoch 4: val_loss did not improve from 0.36380
Epoch 5/20
Epoch 5: val_loss did not improve from 0.36380
Epoch 6/20
Epoch 6: val_loss did not improve from 0.36380
Epoch 7/20
Epoch 7: val_loss did not improve from 0.36380
Epoch 8/20
Epoch 8: val_loss did not improve from 0.36380
Epoch 9/20
Epoch 9: val_loss improved from 0.36380 to 0.35989, saving model to saved_models/audio_classification_1.hdf5
Epoch 10/20
Epoch 10: val_loss did not improve from 0.35989
Epoch 11/20
Epoch 11: val_loss did not improve from 0.35989
Epoch 12/20
Epoch 12: val_loss did not improve from 0.35989
Epoch 13/20
Epoch 13: val_loss did not improve from 0.35989
Epoch 14/20
Epoch 14: val_loss did not improve from 0.35989
Epoch 15/20
Epoch 15: val_loss did not improve from 0.3598
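
The "improved" and "did not improve" messages come from the checkpoint callback keeping only the model with the lowest validation loss so far. The logic can be sketched as follows, using made-up loss values rather than the ones from this run.

```python
# Hypothetical validation losses, one per epoch.
val_losses = [0.41, 0.38, 0.40, 0.37, 0.39]

best = float("inf")
saved_epochs = []

for epoch, val_loss in enumerate(val_losses, start=1):
    if val_loss < best:
        # A new best: this is where the model file would be overwritten.
        best = val_loss
        saved_epochs.append(epoch)

print(saved_epochs, best)
```

Only epochs that beat the previous best trigger a save, so the file on disk always holds the best model seen so far rather than the last one.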

In [None]:
training_accuracy = model.evaluate(X_train,y_train,verbose=0)
print("Training Accuracy:",training_accuracy[1])

test_accuracy=model.evaluate(X_test,y_test,verbose=0)
print("Testing_accuracy:",test_accuracy[1])

Training Accuracy: 0.972655713558197
Testing_accuracy: 0.9066972136497498


# Prediction

In [None]:
#extracting mfcc values from the given waves
import librosa
import numpy as np


def features_extractor(file_name):
    audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast') 
    mfccs_features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
    mfccs_scaled_features = np.mean(mfccs_features.T,axis=0)
    
    return mfccs_scaled_features

In [None]:
#a function that predicts the class of an audio file using the given model
def predict_class(file_name, model_name):
  mfccs_scaled_features = features_extractor(file_name)
  #reshape to (1, 40): a batch containing a single sample
  mfccs_scaled_features = mfccs_scaled_features.reshape(1,-1)
  #run the model once and reuse the class probabilities
  probabilities = model_name.predict(mfccs_scaled_features,verbose=0)
  print(probabilities)
  predicted_label = np.argmax(probabilities, axis=-1)
  #map the index back to a class name with the labelencoder fitted earlier
  prediction_class = labelencoder.inverse_transform(predicted_label)
  return prediction_class[0]


In [None]:
print(predict_class('/content/drive/MyDrive/audio_data/drilling_sound.wav',model))

[[6.7306638e-12 1.2583318e-07 1.1102114e-05 2.1251966e-05 2.7124744e-10
  9.9994159e-01 5.2374498e-12 2.5314475e-12 1.1015914e-09 2.5993919e-05]]
engine_idling
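
The step from the printed probabilities to the class name can be reproduced by hand. This sketch assumes the label encoder orders the class names alphabetically, and it reuses the probabilities printed above.

```python
# The ten class names in alphabetical order (the assumed encoder order).
classes = sorted([
    "air_conditioner", "car_horn", "children_playing", "dog_bark", "drilling",
    "engine_idling", "gun_shot", "jackhammer", "siren", "street_music",
])

# Probabilities copied from the prediction output above.
probabilities = [
    6.7306638e-12, 1.2583318e-07, 1.1102114e-05, 2.1251966e-05, 2.7124744e-10,
    9.9994159e-01, 5.2374498e-12, 2.5314475e-12, 1.1015914e-09, 2.5993919e-05,
]

# The predicted class is the one with the highest probability.
best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
predicted = classes[best_index]

print(best_index, predicted)
```

The largest value sits at index 5, which corresponds to engine_idling, matching the returned label.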


# Saving the Model and Reusing the weights.

In [None]:
from tensorflow.keras.models import load_model
model_load = load_model('/content/drive/MyDrive/audio_data/audio_classification_1.hdf5')

In [None]:
model_load.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_9 (Dense)             (None, 100)               4100      
                                                                 
 activation_9 (Activation)   (None, 100)               0         
                                                                 
 dropout_6 (Dropout)         (None, 100)               0         
                                                                 
 dense_10 (Dense)            (None, 100)               10100     
                                                                 
 activation_10 (Activation)  (None, 100)               0         
                                                                 
 dropout_7 (Dropout)         (None, 100)               0         
                                                                 
 dense_11 (Dense)            (None, 10)               

In [None]:
print(predict_class('/content/dog bark.wav',model_load))

[[1.0703705e-22 4.9438430e-18 9.8636672e-12 1.0000000e+00 1.9900571e-16
  6.3682492e-14 9.1150010e-10 1.7017466e-21 8.4906711e-14 9.4310974e-12]]
dog_bark


In [None]:
!cp saved_models/audio_classification_1.hdf5 /content/drive/MyDrive/audio_data/