<a href="https://colab.research.google.com/github/Dharmin-Shah/Voice_Recognition/blob/main/Voice_Recognition.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Voice Recognition Project**
---

### **Introduction**

In this project, we identify which speaker is talking in a given audio file.

> For this project, I have used the [pyAudioAnalysis](https://github.com/tyiannak/pyAudioAnalysis) repo. It explains in detail how feature extraction is performed and provides many functions that we can use to achieve our goal.




### **Preprocessing**


> First, we import the libraries we need, install the pyAudioAnalysis repo, and then download and extract the dataset.



> The imports include standard Python libraries, NumPy, joblib, SQLite, the Colab file-upload helper and scikit-learn for machine learning.



In [1]:
import os
import random
import shutil

In [2]:
import joblib
import numpy as np

In [3]:
import sqlite3
import datetime
from google.colab import files

In [4]:
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix, classification_report



> Installing the repo pyAudioAnalysis



In [6]:
def repo_fetch_clean_install():
  '''
  In this function we fetch the GitHub repo of pyAudioAnalysis, move the package
  modules up one level, remove the version pins from its requirements and
  install them.
  '''
  # Remove any previous installation of pyAudioAnalysis
  os.system("rm -rf pyAudioAnalysis/")

  # Fetch the pyAudioAnalysis repo
  os.system("git clone https://github.com/tyiannak/pyAudioAnalysis.git")

  # Move the modules of the inner package folder into the top-level folder
  os.system("mv pyAudioAnalysis/pyAudioAnalysis/* pyAudioAnalysis/")
  os.system("rmdir pyAudioAnalysis/pyAudioAnalysis/")

  # Updating requirements.txt to install the latest libraries:
  # keep only the package names and drop the pinned versions ("==x.y.z")
  with open('pyAudioAnalysis/requirements.txt','r') as f:
    dep = [lib.split('==')[0] for lib in f.readlines()]
    print(dep)

  # Writing the new requirements (the with block closes the file)
  with open('pyAudioAnalysis/requirements.txt','w') as f:
    f.write('\n'.join(dep))

  os.system("pip install -r pyAudioAnalysis/requirements.txt")

In [7]:
repo_fetch_clean_install()

['matplotlib', 'simplejson', 'scipy', 'numpy', 'hmmlearn', 'eyeD3', 'pydub', 'scikit_learn', 'tqdm', 'plotly', '\n']


> Once the repo is installed, import the required modules from it

In [8]:
from pyAudioAnalysis import MidTermFeatures as aT
from pyAudioAnalysis import audioBasicIO



> Downloading and extracting the dataset



> Before calling this function, follow the [Kaggle API](https://github.com/Kaggle/kaggle-api#api-credentials) instructions to create and download your `kaggle.json` API token. The function asks you to upload this file before it downloads the dataset.
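The function below only checks that a file named `kaggle.json` was uploaded. As a hedged sketch, a stricter check could also parse the file and confirm that it holds the `username` and `key` fields of a Kaggle API token. `check_kaggle_token` is a hypothetical helper, not part of the Kaggle API, and the token values are made up.

```python
import json

def check_kaggle_token(raw_bytes):
    """Return (username, key) from the contents of a kaggle.json token.

    Hypothetical helper. Assumption: the token is a JSON object with
    "username" and "key" fields, as in the file downloaded from Kaggle.
    """
    token = json.loads(raw_bytes)  # json.loads accepts bytes as well as str
    missing = [field for field in ("username", "key") if not token.get(field)]
    if missing:
        raise ValueError(f"kaggle.json is missing: {', '.join(missing)}")
    return token["username"], token["key"]

# files.upload() returns a dict of file name -> file contents (bytes), so the
# check could be run as check_kaggle_token(uploaded["kaggle.json"])
print(check_kaggle_token(b'{"username": "alice", "key": "0123abcd"}'))
# ('alice', '0123abcd')
```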

In [61]:
import glob

def kaggle_dataset_fetch(kgdata):
  '''
  This function downloads and unzips the required dataset.
  The first steps install the Kaggle API and use the uploaded kaggle.json file
  to authenticate the download.

  ARGS: Kaggle dataset name that needs to be downloaded. This can be obtained by
         using the 'Copy API Command' from the dataset's page. The last argument in the
         command is the dataset name.
  '''

  # Installing the Kaggle API
  os.system("pip install -q kaggle")

  # Cleanup: start from a fresh ~/.kaggle directory
  os.system("rm -rf ~/.kaggle/")
  os.system("mkdir ~/.kaggle/")

  # Uploading the kaggle.json file
  uploaded = files.upload()

  if uploaded.get("kaggle.json") is None:
    raise Exception("kaggle.json was not found. Please upload again")

  # Copying the token and restricting its permissions
  os.system("cp kaggle.json ~/.kaggle/")
  os.system("chmod 600 ~/.kaggle/kaggle.json")

  # Downloading the voice dataset
  os.system(f"kaggle datasets download -d {kgdata}")

  # Unzipping each downloaded archive and removing it afterwards
  for z in glob.glob("*.zip"):
    os.system(f"unzip '{z}'")
    os.system(f"rm -f '{z}'")

In [63]:
kaggle_dataset_fetch("vjcalling/speaker-recognition-audio-dataset")

Saving kaggle.json to kaggle.json


### Data Preparation



> To keep the experiment small, we use only 7 of the 50 speakers available in the dataset.
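The speakers in the next cell are picked by hand. If a random but reproducible subset were preferred, it could be drawn with a seeded generator, as in this sketch. `pick_speakers` is a hypothetical helper, and the folder names are synthetic stand-ins (the real dataset mixes `Speaker_0005` and `Speaker0035` styles).

```python
import random

def pick_speakers(all_speakers, k, seed=0):
    """Pick k speaker folders reproducibly (hypothetical helper).

    Sorting first makes the choice independent of the order in which
    os.listdir() happens to return the folders.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(sorted(all_speakers), k))

# Synthetic stand-in for os.listdir("50_speakers_audio_data")
folders = [f"Speaker_{i:04d}" for i in range(50)]
subset = pick_speakers(folders, 7)
print(len(subset), subset == pick_speakers(folders, 7))  # 7 True
```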



In [11]:
# Selecting the speakers to use
x=['Speaker_0005','Speaker_0014','Speaker_0021','Speaker0035','Speaker0042',
   'Speaker0029','Speaker0037']
# Main directory where all data is present
ma_dir = "50_speakers_audio_data"
# Target directory to get the train, test split for the data
tr_dir = "speaker_data"



> Creating the train and test folders and selecting the files for the test data.
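The per-speaker split can be sketched in plain Python on file names alone. The helper `split_files` is hypothetical: it shuffles with a seeded generator so the split is reproducible, and it takes the test files from the front of the shuffled list, so a speaker with very few files (where `int()` truncates the test count to 0) never has all of its files moved to the test set.

```python
import random

def split_files(files, test=0.3, seed=0):
    """Split one speaker's file list into (train, test) lists (hypothetical helper).

    int() truncates, so n_test can be 0 for a small list; files[:0] is then
    empty, whereas files[-0:] would select every file.
    """
    files = sorted(files)
    random.Random(seed).shuffle(files)
    n_test = int(len(files) * test)
    return files[n_test:], files[:n_test]

train_files, test_files = split_files([f"{i:05d}.wav" for i in range(1, 11)])
print(len(train_files), len(test_files))  # 7 3
```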



In [12]:
def prepare_train_test(data,ma_dir,tr_dir,test=0.3):

  '''
  This function prepares the train and test folders and copies the data required
  for our experiment.

  ARGS:
    - data:   The speaker folders for which the train and test data need to be prepared
    - ma_dir: The unzipped dataset directory that has all the data
    - tr_dir: The target directory under which the train and test folders are
              created
    - test:   Fraction of each speaker's files that is moved to the test folder
  '''

  # Creating the directories for the train and test data and copying the speaker folders to the train data
  for i in data:
    # Copy the wav files together with the folder structure
    shutil.copytree(f"{ma_dir}/{i}",f"{tr_dir}/train_data/wavfiles/{i}")
    # Copy only the folder structure for the extracted features
    shutil.copytree(f"{ma_dir}/{i}",f"{tr_dir}/train_data/features/{i}",ignore=shutil.ignore_patterns('*.*'))
    # Copy only the folder structure for the test data
    shutil.copytree(f"{ma_dir}/{i}",f"{tr_dir}/test_data/{i}",ignore=shutil.ignore_patterns('*.*'))

  # Randomly choosing files from the train data to move them to the test data
  for speaker in os.listdir(f"{tr_dir}/train_data/wavfiles"):
    pop = os.listdir(f"{tr_dir}/train_data/wavfiles/{speaker}")
    random.shuffle(pop)
    # int() truncates; slicing from the front keeps n_test == 0 from
    # selecting every file (pop[-0:] would return the whole list)
    n_test = int(len(pop) * test)
    test_data = pop[:n_test]

    # Moving the test data to the test folder
    for f in test_data:
      shutil.move(f"{tr_dir}/train_data/wavfiles/{speaker}/{f}",f"{tr_dir}/test_data/{speaker}")

In [13]:
prepare_train_test(x,ma_dir,tr_dir)



> In this section, we use pyAudioAnalysis' `directory_feature_extraction` function to extract features from every audio file of each speaker.





> The process extracts 136 features per file. For further explanation of how these are selected, please [read here](https://hackernoon.com/intro-to-audio-analysis-recognizing-sounds-using-machine-learning-qy2r3ufl).
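To our understanding of pyAudioAnalysis' defaults, the 136 values come from 34 short-term features plus their 34 deltas (68 values per short-term frame), summarised by their mean and standard deviation over each mid-term window (2 x 68 = 136). `directory_feature_extraction` then averages the mid-term vectors over the whole file. The NumPy sketch below illustrates that arithmetic on random numbers. It is a simplified illustration with non-overlapping windows (as with `mid_window == mid_step` here), not the library's implementation, and `mid_term_stats` is a hypothetical name.

```python
import numpy as np

def mid_term_stats(short_feats, frames_per_window):
    """Simplified sketch of mid-term feature statistics (hypothetical helper).

    short_feats has shape (n_short_features, n_frames). For each block of
    frames_per_window frames, take the mean and the standard deviation of
    every row, giving 2 * n_short_features values per mid-term window.
    """
    n_frames = short_feats.shape[1]
    windows = []
    for start in range(0, n_frames, frames_per_window):
        block = short_feats[:, start:start + frames_per_window]
        windows.append(np.concatenate([block.mean(axis=1), block.std(axis=1)]))
    return np.array(windows).T  # shape: (2 * n_short_features, n_windows)

# 1 s mid-term window / 0.05 s short-term step = 20 frames per window;
# 100 frames of random numbers stand in for 5 s of audio
rng = np.random.default_rng(0)
short = rng.normal(size=(68, 100))
mid = mid_term_stats(short, frames_per_window=20)
print(mid.shape)               # (136, 5)
print(mid.mean(axis=1).shape)  # (136,): one vector per file
```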


> Once extracted, the features of each file are stored, together with the speaker label, in an individual pickle file for future use.
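Each pickle file holds a `(features, label)` tuple. This self-contained sketch writes a few such tuples with `joblib` into a temporary folder, using the same one-folder-per-speaker layout, and then stacks them into `X` and `y` the way `get_X_y` does. The speaker names and feature values are synthetic.

```python
import os
import tempfile

import joblib
import numpy as np

rng = np.random.default_rng(0)
with tempfile.TemporaryDirectory() as root:
    # Two synthetic speakers with two files each; random vectors stand in for features
    for speaker in ["Speaker_A", "Speaker_B"]:
        os.makedirs(os.path.join(root, speaker))
        for n in range(2):
            # One (features, label) tuple per file, as in analyze_store_wav
            joblib.dump((rng.normal(size=136), speaker),
                        os.path.join(root, speaker, f"{speaker}_{n}.pkl"))

    # Load every tuple back and stack features and labels, as in get_X_y
    pairs = [joblib.load(os.path.join(root, spk, f))
             for spk in sorted(os.listdir(root))
             for f in sorted(os.listdir(os.path.join(root, spk)))]
    X = np.array([features for features, _ in pairs])
    y = np.array([label for _, label in pairs])

print(X.shape, y.tolist())
# (4, 136) ['Speaker_A', 'Speaker_A', 'Speaker_B', 'Speaker_B']
```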

In [14]:
def analyze_store_wav(tr_dir,mid_window=1,mid_step=1,
                      short_window=0.05,short_step=0.05):
    '''
    This function performs feature extraction using the pyAudioAnalysis repo and
    stores the features of each wav file in an individual pickle file

    ARGS:
      tr_dir:       Target directory under which the train and test folders are created
      mid_window:   Mid-term window (seconds)
      mid_step:     Mid-term step (seconds)
      short_window: Short-term window (seconds)
      short_step:   Short-term step (seconds)

    '''
    for lbl in os.listdir(f"{tr_dir}/train_data/wavfiles"):
      wav_path = f"{tr_dir}/train_data/wavfiles/{lbl}"
      feat_path = f"{tr_dir}/train_data/features/{lbl}"
      print(lbl)
      # Getting the features for each file present in the directory:
      # fn holds one feature vector per file, f the processed file paths
      # and mn the feature names
      fn,f,mn = aT.directory_feature_extraction(wav_path,mid_window,mid_step,short_window,short_step,compute_beat=False)

      # Writing each file's features and its speaker label to a pickle file
      for features, wav_file in zip(fn, f):
        name = os.path.splitext(os.path.basename(wav_file))[0]
        joblib.dump((features,lbl),f"{feat_path}/{name}.pkl")
        

In [15]:
analyze_store_wav(tr_dir,mid_window=1,mid_step=1,short_window=0.05,short_step=0.05)

Speaker_0005
Analyzing file 1 of 51: speaker_data/train_data/wavfiles/Speaker_0005/Speaker_0005_00001.wav
Analyzing file 2 of 51: speaker_data/train_data/wavfiles/Speaker_0005/Speaker_0005_00002.wav
Analyzing file 3 of 51: speaker_data/train_data/wavfiles/Speaker_0005/Speaker_0005_00003.wav
Analyzing file 4 of 51: speaker_data/train_data/wavfiles/Speaker_0005/Speaker_0005_00004.wav
Analyzing file 5 of 51: speaker_data/train_data/wavfiles/Speaker_0005/Speaker_0005_00006.wav
Analyzing file 6 of 51: speaker_data/train_data/wavfiles/Speaker_0005/Speaker_0005_00009.wav
Analyzing file 7 of 51: speaker_data/train_data/wavfiles/Speaker_0005/Speaker_0005_00010.wav
Analyzing file 8 of 51: speaker_data/train_data/wavfiles/Speaker_0005/Speaker_0005_00011.wav
Analyzing file 9 of 51: speaker_data/train_data/wavfiles/Speaker_0005/Speaker_0005_00013.wav
Analyzing file 10 of 51: speaker_data/train_data/wavfiles/Speaker_0005/Speaker_0005_00014.wav
Analyzing file 11 of 51: speaker_data/train_data/wavfile

> Once we have all the pickle files with the extracted features, we can build the feature matrix `X` and the label vector `y` for our machine learning algorithms.

In [16]:
def get_X_y(tr_dir):

  '''
  This function fetches the features and label from the pickle files created previously

  ARGS: 
    - tr_dir: Target directory under which train, test folders are present

  RETURNS:
    - X: A numpy array of all the features
    - y: A numpy array of all corresponding labels

  '''
  # Fetch each file, store the features in array X and label in y
  X,y = [],[]
  for lbl in os.listdir(f"{tr_dir}/train_data/features"):
    for f in os.listdir(f"{tr_dir}/train_data/features/{lbl}"):
      fn, lb = joblib.load(f"{tr_dir}/train_data/features/{lbl}/{f}")
      X.append(fn)
      y.append(lb)
  
  # Converting the arrays to numpy arrays for better computational efficiency in machine learning
  X = np.array(X)
  y = np.array(y)

  return X,y

In [17]:
X, y = get_X_y(tr_dir)

### Training the model



> For this project, we train three classifiers: a support vector machine (SVM), k-nearest neighbours (KNN) and a random forest.



> We use scikit-learn's `GridSearchCV` to find the best hyperparameters for each model.
> It relies on k-fold cross-validation: the training data is split `cv` times into training and validation parts, and each parameter combination is scored by its mean validation accuracy.
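To make the grid search concrete, the sketch below runs the same SVM grid on synthetic data from `make_classification` (a stand-in with 7 classes and 136 features, not the speaker dataset). With 5 values of `C` and 4 kernels, `GridSearchCV` evaluates 20 candidates, each fitted on 5 folds; `best_score_` is the mean validation accuracy of the best candidate, and by default that candidate is then refitted on all the data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the speaker features: 7 classes, 136 features
X_demo, y_demo = make_classification(n_samples=210, n_features=136,
                                     n_informative=20, n_classes=7,
                                     random_state=0)

params = {'C': [0.001, 0.01, 0.1, 10, 100],
          'kernel': ['rbf', 'linear', 'poly', 'sigmoid']}
grid = GridSearchCV(SVC(), params, cv=5)
grid.fit(X_demo, y_demo)

# 5 values of C x 4 kernels = 20 candidates, each scored on 5 validation folds
print(len(grid.cv_results_['params']))  # 20
print(grid.best_params_, round(grid.best_score_, 3))
```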

In [18]:
def train_model(X,y,model='svm',cv=5):
  '''
  This function performs training on the dataset and returns the fitted
  GridSearchCV object. Currently the parameter grids for each classifier are
  hard-coded but can be modified.

  ARGS:
    - X,y:   The features and labels
    - model: The classifier to be used ('svm', 'knn' or 'rf' for random forest)
    - cv:    Number of cross-validation folds

  '''
  params = {}

  ###### SVM MODEL #########
  if model == 'svm':
    
    svc = SVC()
    
    params = {
    'C' : [0.001,0.01,0.1,10,100],
    'kernel' : ['rbf', 'linear', 'poly', 'sigmoid']
    }

    svc_grid = GridSearchCV(svc,params,cv=cv)

    svc_grid.fit(X,y)

    print(f"SVC Best Accuracy: {svc_grid.best_score_}")
    print(f"SVC Best Estimator: {svc_grid.best_estimator_}")

    return svc_grid
  
  ###### KNN MODEL #########
  if model == 'knn':

    knn = KNeighborsClassifier()

    params = {
        'n_neighbors': [3, 6, 12, 15]
    }

    knn_grid = GridSearchCV(knn,params,cv=cv)

    knn_grid.fit(X,y)

    print(f"KNN Best Accuracy: {knn_grid.best_score_}")
    print(f"KNN Best Estimator: {knn_grid.best_estimator_}")

    return knn_grid
    
  ###### RANDOM FOREST MODEL #########  
  if model == 'rf':

    rf = RandomForestClassifier()

    params = {
        'n_estimators': [100, 200, 400, 600]
    }

    rf_grid = GridSearchCV(rf,params,cv=cv)

    rf_grid.fit(X,y)

    print(f"Random Forest Best Accuracy: {rf_grid.best_score_}")
    print(f"Random Best Estimator: {rf_grid.best_estimator_}")
    
    return rf_grid

  raise ValueError(f"Unknown model '{model}'. Use 'svm', 'knn' or 'rf'.")

In [19]:
svm = train_model(X,y)
knn = train_model(X,y,model='knn')
rf = train_model(X,y,model='rf')

SVC Best Accuracy: 0.9916666666666666
SVC Best Estimator: SVC(C=10, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
KNN Best Accuracy: 0.9833333333333332
KNN Best Estimator: KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=3, p=2,
                     weights='uniform')
Random Forest Best Accuracy: 0.9958333333333333
Random Best Estimator: RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                      

### Saving the model using SQLite

> For this project, we use Python's built-in `sqlite3` module to save our models in an SQLite database.
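The store-and-retrieve pattern can be tried out on an in-memory database. This sketch assumes a table shaped like `trained_models` but without `UNIQUE` constraints, so that several versions of the same model can coexist; the `ORDER BY date DESC LIMIT 1` query used by `readBlobData` then returns the newest one. It pickles a plain dict instead of a trained model, and the dates and accuracies are made-up illustrative values.

```python
import pickle
import sqlite3

# In-memory database with a schema similar to trained_models
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE trained_models
                (id INTEGER PRIMARY KEY AUTOINCREMENT,
                 model_name TEXT NOT NULL,
                 model BLOB NOT NULL,
                 accuracy REAL NOT NULL,
                 date TEXT NOT NULL)""")

# pickle.dumps returns bytes, which sqlite3 stores in a BLOB column
rows = [("svm", pickle.dumps({"version": 1}), 0.95, "2021-01-01 10:00:00"),
        ("svm", pickle.dumps({"version": 2}), 0.99, "2021-01-02 10:00:00")]
conn.executemany("INSERT INTO trained_models (model_name, model, accuracy, date) "
                 "VALUES (?, ?, ?, ?)", rows)
conn.commit()

# ISO-formatted date strings sort chronologically, so DESC puts the newest first
blob, = conn.execute("SELECT model FROM trained_models WHERE model_name = ? "
                     "ORDER BY date DESC LIMIT 1", ("svm",)).fetchone()
conn.close()
print(pickle.loads(blob))  # {'version': 2}
```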

In [20]:
def createTrainTable():

  '''
  This function creates the database (if needed) and the table in which the
  models are persisted.

  DATABASE FILE = Voice_Models.db
  TABLE NAME = trained_models

  '''
  sqliteConnection = None
  try:
    sqliteConnection = sqlite3.connect('Voice_Models.db')
    cursor = sqliteConnection.cursor()
    print("Connected to SQLite")
    # No UNIQUE constraints: two models may share an accuracy, and several
    # versions of a model may be stored (readBlobData returns the newest one)
    create_query = """CREATE TABLE IF NOT EXISTS trained_models
                      (id INTEGER PRIMARY KEY AUTOINCREMENT,
                       model_name TEXT NOT NULL,
                       model BLOB NOT NULL,
                       accuracy REAL NOT NULL,
                       date TEXT NOT NULL)"""
    cursor.execute(create_query)
    sqliteConnection.commit()
    print("SQLite table created")

    cursor.close()

  except sqlite3.Error as error:
    print("Error while creating a sqlite table", error)
  finally:
    # sqliteConnection stays None if connect() itself failed
    if sqliteConnection:
      sqliteConnection.close()
      print("sqlite connection is closed")


In [21]:
createTrainTable()

Connected to SQLite
SQLite table created
sqlite connection is closed


> A small function to save the model to a file and then convert it to a blob that can be stored in the database
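`convertBinary` goes through a temporary file on disk. Since `joblib.dump` and `joblib.load` also accept file objects, an in-memory `io.BytesIO` buffer could be used instead. `model_to_blob` and `blob_to_model` are hypothetical helper names for this sketch.

```python
import io

import joblib
from sklearn.svm import SVC

def model_to_blob(model):
    """Serialise a model to bytes in memory (hypothetical helper)."""
    buffer = io.BytesIO()
    joblib.dump(model, buffer)  # joblib.dump accepts an open file object
    return buffer.getvalue()

def blob_to_model(blob):
    """Rebuild the model from bytes read back from the database (hypothetical helper)."""
    return joblib.load(io.BytesIO(blob))

blob = model_to_blob(SVC(C=10, kernel='linear'))
restored = blob_to_model(blob)
print(type(blob).__name__, restored.C, restored.kernel)  # bytes 10 linear
```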

In [22]:
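def convertBinary(model,model_name):
  '''
  This function saves the model to a temporary .pkl file with joblib, reads the
  file back as bytes (the blob stored in the database) and deletes the file.
  '''
  joblib.dump(model,f"{model_name}.pkl")

  with open(f"{model_name}.pkl", 'rb') as file:
    blobData = file.read()

  os.remove(f"{model_name}.pkl")

  return blobData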

> Inserting the model into the database

In [23]:
def insertModel(model_name,model):

  '''
  This function inserts the specified model into the trained_models table.
  The following details are saved:
   - model_name = The name of the model.
   - model = The fitted GridSearchCV object to insert
   - accuracy = The best_score_ attribute of the model (mean cross-validated accuracy)
   - date = Current date and time

  '''
  sqliteConnection = None
  try:
      sqliteConnection = sqlite3.connect('Voice_Models.db')
      cursor = sqliteConnection.cursor()
      print("Connected to SQLite")

      insert_query = """ INSERT INTO trained_models
                                (model_name, model, accuracy, date) VALUES (?, ?, ?, ?)"""

      accuracy = model.best_score_
      # Calling the function to save the model in a file and return the converted blob
      model_blob = convertBinary(model,model_name)

      # Store the timestamp explicitly as an ISO 8601 string
      data_tuple = (model_name, model_blob, accuracy, datetime.datetime.now().isoformat(" "))
      cursor.execute(insert_query, data_tuple)
      sqliteConnection.commit()
      print("Successfully Inserted")
      cursor.close()

  except sqlite3.Error as error:
      print("Failed to insert blob data into sqlite table", error)
  finally:
      # sqliteConnection stays None if connect() itself failed
      if sqliteConnection:
          sqliteConnection.close()
          print("the sqlite connection is closed")


In [24]:
insertModel('svm',svm)
insertModel('knn',knn)
insertModel('rf',rf)

Connected to SQLite
Successfully Inserted
the sqlite connection is closed
Connected to SQLite
Successfully Inserted
the sqlite connection is closed
Connected to SQLite
Successfully Inserted
the sqlite connection is closed


> A function to write the blob back to a file from which the model can be loaded

In [25]:
def writeTofile(data, filename):
  '''
  This function takes the blob data of a model and a filename, writes the blob
  into "<filename>.pkl" and returns that file name.
  '''
  # Write the binary data to disk so that joblib can load it
  with open(f"{filename}.pkl", 'wb') as file:
    file.write(data)
  print("Stored blob data into: ", filename, "\n")
  return f"{filename}.pkl"

In [26]:
def readBlobData(model_name):
  '''
  This function fetches the most recently saved model with the specified name
  and writes it to a .pkl file.

  ARG: model_name - the name of the model to be fetched
  RETURNS: Filename of the file created with the model, or None if no model
           with that name was found
  '''
  sqliteConnection = None
  filename = None
  try:
      sqliteConnection = sqlite3.connect('Voice_Models.db')
      cursor = sqliteConnection.cursor()
      print("Connected to SQLite")

      # Only the newest entry for this model name is needed
      model_query = """SELECT model_name, model FROM trained_models
                       WHERE model_name = ? ORDER BY date DESC LIMIT 1"""
      cursor.execute(model_query, (model_name,))
      record = cursor.fetchone()
      if record is not None:
          name, model_blob = record
          filename = writeTofile(model_blob, name)

      cursor.close()

  except sqlite3.Error as error:
      print("Failed to read blob data from sqlite table", error)
  finally:
      if sqliteConnection:
          sqliteConnection.close()
          print("sqlite connection is closed")

  return filename

In [27]:
# Getting the model
loaded_svm = joblib.load(readBlobData('svm'))

Connected to SQLite
Stored blob data into:  svm 

sqlite connection is closed


In [28]:
print(svm)

GridSearchCV(cv=5, error_score=nan,
             estimator=SVC(C=1.0, break_ties=False, cache_size=200,
                           class_weight=None, coef0=0.0,
                           decision_function_shape='ovr', degree=3,
                           gamma='scale', kernel='rbf', max_iter=-1,
                           probability=False, random_state=None, shrinking=True,
                           tol=0.001, verbose=False),
             iid='deprecated', n_jobs=None,
             param_grid={'C': [0.001, 0.01, 0.1, 10, 100],
                         'kernel': ['rbf', 'linear', 'poly', 'sigmoid']},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=0)


In [29]:
svm.best_estimator_

SVC(C=10, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

### Evaluating the model

> We can now see how our model performs on the held-out test files.

In [30]:
def predict_speaker(test_file,model,mid_window=1,mid_step=1,
                    short_window=0.05,short_step=0.05):
  '''
  This function takes an audio file in wav format and extracts its features
  with the same window settings (in seconds) as the training data.
  The model then predicts the speaker from these features and the predicted
  label is returned.
  '''

  fs, s = audioBasicIO.read_audio_file(test_file)
  # Convert stereo recordings to mono, as done for the training files
  s = audioBasicIO.stereo_to_mono(s)

  # Windows and steps are given in seconds; the function expects samples
  mid_features,short_features,_ = aT.mid_feature_extraction(s, fs,
                                          round(mid_window * fs),
                                          round(mid_step * fs),
                                          round(short_window * fs),
                                          round(short_step * fs))

  # The mid-term features are returned per window (features x windows); we
  # average over the windows and reshape to (1, n_features) for predict()
  mid_features = mid_features.mean(axis=1).reshape(1,-1)

  pred = model.predict(mid_features)

  return pred[0]

> Testing our model

In [31]:
# Note: y is reused here for the true test labels (the training labels are overwritten)
y, y_pred = [],[]
for speaker in os.listdir(f"{tr_dir}/test_data/"):
  for f in os.listdir(f"{tr_dir}/test_data/{speaker}"):
    pred = predict_speaker(f"{tr_dir}/test_data/{speaker}/{f}",svm)
    y.append(speaker)
    y_pred.append(pred)


> Viewing the results in a classification report and a confusion matrix
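The figures in the classification report follow directly from the confusion matrix printed at the end of this section (rows are true speakers, columns are predicted speakers, both in sorted label order). The sketch below recomputes per-speaker recall and precision and the overall accuracy from that matrix with NumPy.

```python
import numpy as np

# Confusion matrix from the output below; label order:
# Speaker0029, Speaker0035, Speaker0037, Speaker0042,
# Speaker_0005, Speaker_0014, Speaker_0021
cm = np.array([[ 8, 0,  1,  0,  0,  0,  0],
               [ 0, 9,  0,  0,  0,  0,  0],
               [ 0, 0, 16,  0,  0,  0,  0],
               [ 0, 0,  0, 12,  0,  0,  0],
               [ 0, 0,  0,  0, 21,  0,  0],
               [ 0, 0,  0,  0,  0, 15,  0],
               [ 0, 0,  1,  0,  0,  0, 16]])

recall = np.diag(cm) / cm.sum(axis=1)     # correct / files of that true speaker
precision = np.diag(cm) / cm.sum(axis=0)  # correct / files predicted as that speaker
accuracy = np.trace(cm) / cm.sum()        # 97 correct out of 99 test files

print(np.round(recall, 2))
print(np.round(precision, 2))
print(round(float(accuracy), 2))  # 0.98
```

For example, Speaker0029 has 8 of its 9 files recognised (recall 0.89), and Speaker0037 receives 18 predictions of which 16 are correct (precision 0.89), matching the report.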

In [32]:
print(classification_report(y,y_pred))

              precision    recall  f1-score   support

 Speaker0029       1.00      0.89      0.94         9
 Speaker0035       1.00      1.00      1.00         9
 Speaker0037       0.89      1.00      0.94        16
 Speaker0042       1.00      1.00      1.00        12
Speaker_0005       1.00      1.00      1.00        21
Speaker_0014       1.00      1.00      1.00        15
Speaker_0021       1.00      0.94      0.97        17

    accuracy                           0.98        99
   macro avg       0.98      0.98      0.98        99
weighted avg       0.98      0.98      0.98        99



In [33]:
print(confusion_matrix(y,y_pred))

[[ 8  0  1  0  0  0  0]
 [ 0  9  0  0  0  0  0]
 [ 0  0 16  0  0  0  0]
 [ 0  0  0 12  0  0  0]
 [ 0  0  0  0 21  0  0]
 [ 0  0  0  0  0 15  0]
 [ 0  0  1  0  0  0 16]]
