<a href="https://colab.research.google.com/github/Urvashi2311/ML_Projects/blob/main/2_spkr_verification_GMM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import os
import numpy as np
import librosa
import sklearn
from sklearn.mixture import GaussianMixture

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


We first build a GMM for the target speaker from the given training samples. The features are 39-dimensional MFCCs: 13 static coefficients plus their first-order (delta) and second-order (delta-delta) derivatives.
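The feature settings used below can be checked with a little arithmetic. At `sr=8000`, `n_fft=240` and `hop_length=60` give a 30 ms analysis window moved in 7.5 ms steps. With librosa's default `center=True`, a clip of N samples yields 1 + N // 60 frames. The sketch below works this out and then shows how 13 static coefficients and their derivatives stack into a (frames, 39) matrix.

It is a NumPy-only illustration under stated assumptions:

- The random `static` array stands in for real MFCCs.
- `frame_geometry` and `regression_delta` are hypothetical helpers. `regression_delta` implements the common regression formula over ±4 frames with edge replication, and the second-order term is taken as the delta of the delta.
- `librosa.feature.delta` instead uses a Savitzky-Golay filter (width 9 by default), so its values are not guaranteed to match, especially near the edges and for the second-order term.

```python
import numpy as np

SR, N_FFT, HOP, N_MFCC = 8000, 240, 60, 13  # Settings used in this notebook


def frame_geometry(n_samples, sr=SR, n_fft=N_FFT, hop=HOP):
    """Hypothetical helper: window length (ms), hop (ms) and frame count."""
    win_ms = 1000.0 * n_fft / sr
    hop_ms = 1000.0 * hop / sr
    # With center=True the signal is padded by n_fft // 2 on each side,
    # which for an even n_fft leaves 1 + n_samples // hop frames.
    n_frames = 1 + n_samples // hop
    return win_ms, hop_ms, n_frames


def regression_delta(feats, N=4):
    """Hypothetical helper: regression deltas along time for a (coeffs, frames) array."""
    T = feats.shape[1]
    padded = np.pad(feats, ((0, 0), (N, N)), mode='edge')  # Replicate edge frames
    denom = 2 * sum(n * n for n in range(1, N + 1))
    delta = np.zeros_like(feats, dtype=float)
    for n in range(1, N + 1):
        # padded column N + t holds original frame t
        delta += n * (padded[:, N + n:N + n + T] - padded[:, N - n:N - n + T])
    return delta / denom


win_ms, hop_ms, n_frames = frame_geometry(n_samples=8000)  # One second of audio
print(win_ms, hop_ms, n_frames)  # 30.0 7.5 134

rng = np.random.default_rng(0)
static = rng.standard_normal((N_MFCC, n_frames))  # Stand-in for a (13, frames) MFCC matrix
d1 = regression_delta(static)                     # First-order deltas
d2 = regression_delta(d1)                         # Delta of delta (one common approximation)
features = np.concatenate((static, d1, d2), axis=0).T  # (frames, 39): one row per frame
print(features.shape)  # (134, 39)
```

Transposing at the end matters because scikit-learn's `GaussianMixture` treats each row as one sample. Here that means one row per 7.5 ms frame.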

In [None]:
read_path = "/content/drive/MyDrive/Speech_Lab/WAV/spk_train/"  # Folder with the target speaker's training wav files

spkrGMM = GaussianMixture(n_components=1, covariance_type='full')  # Single full-covariance Gaussian

MFCCspkr = np.empty([0, 39])  # Empty array to accumulate 39-dimensional MFCC frames

for root, dirs, filenames in os.walk(read_path):
    for f in sorted(filenames):
        if not f.lower().endswith('.wav'):
            continue  # Skip non-audio files
        y, sr = librosa.load(os.path.join(root, f), sr=8000, mono=True)  # Read the wav file at 8 kHz
        mfccs = librosa.feature.mfcc(y=y, sr=sr, n_fft=240, hop_length=60, n_mfcc=13)  # 13 MFCCs, 30 ms window, 7.5 ms hop
        mfcc_delta = librosa.feature.delta(mfccs, order=1)  # Delta
        mfcc_delta2 = librosa.feature.delta(mfccs, order=2)  # Delta-delta
        MFCCs = np.concatenate((mfccs, mfcc_delta, mfcc_delta2), axis=0)  # Stack features: 13+13+13 = 39 dimensions
        MFCCs = np.transpose(MFCCs)  # Shape (frames, 39): scikit-learn expects one row per sample

        MFCCspkr = np.concatenate((MFCCspkr, MFCCs), axis=0)
        print(MFCCspkr.shape)  # Shape after appending the current file

spkrGMM.fit(MFCCspkr)

(3864, 39)
(7728, 39)
(11592, 39)
(14537, 39)
(17482, 39)
(20427, 39)
(23925, 39)
(27423, 39)
(30921, 39)
(34132, 39)
(37343, 39)
(40554, 39)
(43331, 39)
(46108, 39)
(48885, 39)
(50802, 39)
(52719, 39)
(54636, 39)
(57548, 39)
(60460, 39)
(63372, 39)
(66733, 39)


GaussianMixture()

Next, build a GMM for the imposter class, trained on samples from 4 other speakers.
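Both models in this notebook use `n_components=1` with `covariance_type='full'`, so each "GMM" is really a single multivariate Gaussian. Fitting it by maximum likelihood reduces to computing two quantities over the pooled frames:

- the sample mean;
- the biased (1/N) sample covariance.

`score_samples` then returns the per-frame log-density log N(x; μ, Σ). The NumPy sketch below computes these directly to make that explicit.

`fit_gaussian` and `gaussian_log_likelihood` are hypothetical helper names. The small `reg` term mirrors the role of scikit-learn's `reg_covar` (default 1e-6), without claiming to reproduce its results bit for bit.

```python
import numpy as np


def fit_gaussian(X, reg=1e-6):
    """Hypothetical helper: ML estimate of one full-covariance Gaussian from the rows of X."""
    mu = X.mean(axis=0)
    diff = X - mu
    cov = diff.T @ diff / X.shape[0]   # Biased (1/N) covariance, as in ML estimation
    cov += reg * np.eye(X.shape[1])    # Small diagonal term for numerical stability
    return mu, cov


def gaussian_log_likelihood(X, mu, cov):
    """Hypothetical helper: per-row log N(x; mu, cov) via a Cholesky factorisation."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mu).T)  # Whitened residuals, shape (d, n)
    maha = np.sum(z ** 2, axis=0)       # Squared Mahalanobis distances
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)


rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 39))  # Stand-in for pooled 39-dim MFCC frames
mu, cov = fit_gaussian(X)
frame_ll = gaussian_log_likelihood(X, mu, cov)
print(frame_ll.shape, frame_ll.mean())  # Average per-frame log-likelihood
```

If scikit-learn is available, `GaussianMixture(n_components=1, covariance_type='full').fit(X).score_samples(X)` should agree closely with `frame_ll`. With more components, EM would be needed instead of this closed form.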

In [None]:
read_path = "/content/drive/MyDrive/Speech_Lab/WAV/Neg_train/"  # Folder with the imposter training wav files

ImposterGMM = GaussianMixture(n_components=1, covariance_type='full')  # Single full-covariance Gaussian

MFCCimposter = np.empty([0, 39])  # Empty array to accumulate 39-dimensional MFCC frames

for root, dirs, filenames in os.walk(read_path):
    for f in sorted(filenames):
        if not f.lower().endswith('.wav'):
            continue  # Skip non-audio files
        y, sr = librosa.load(os.path.join(root, f), sr=8000, mono=True)  # Read the wav file at 8 kHz
        mfccs = librosa.feature.mfcc(y=y, sr=sr, n_fft=240, hop_length=60, n_mfcc=13)  # 13 MFCCs, 30 ms window, 7.5 ms hop
        mfcc_delta = librosa.feature.delta(mfccs, order=1)  # Delta
        mfcc_delta2 = librosa.feature.delta(mfccs, order=2)  # Delta-delta
        MFCCs = np.concatenate((mfccs, mfcc_delta, mfcc_delta2), axis=0)  # Stack features: 13+13+13 = 39 dimensions
        MFCCs = np.transpose(MFCCs)  # Shape (frames, 39): scikit-learn expects one row per sample

        MFCCimposter = np.concatenate((MFCCimposter, MFCCs), axis=0)
        print(MFCCimposter.shape)  # Shape after appending the current file

ImposterGMM.fit(MFCCimposter)

(3063, 39)
(6126, 39)
(9189, 39)
(12241, 39)
(15293, 39)
(17810, 39)
(20327, 39)
(22844, 39)
(24512, 39)
(26180, 39)
(29181, 39)
(32182, 39)
(35183, 39)
(38039, 39)
(40895, 39)
(44031, 39)
(47167, 39)
(50303, 39)
(52949, 39)
(55595, 39)


GaussianMixture()

Now let's test the speaker verification system. We pass positive test samples (from the target speaker) and check that they are accepted, then pass negative test samples (from other speakers) and check that they are rejected.
The decision compares the average per-frame log-likelihood of the sample under the speaker GMM (`ll1`) with that under the imposter GMM (`ll2`): the sample is accepted as the speaker if `ll1 >= ll2`.
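The decision described above is an average log-likelihood ratio test. The score is the mean over frames of log p(x_t | speaker) minus log p(x_t | imposter). The sample is accepted when that score is at least a threshold θ. The cells below use θ = 0, which is the same as the `ll1 >= ll2` comparison.

In this sketch, `verify` is a hypothetical helper, and the `threshold` parameter is an assumption added for illustration. The example call reuses the averages printed for 1kc.wav.

```python
import numpy as np


def verify(spk_frame_ll, imp_frame_ll, threshold=0.0):
    """Hypothetical helper: accept if the mean per-frame log-likelihood ratio >= threshold."""
    spk_frame_ll = np.asarray(spk_frame_ll, dtype=float)
    imp_frame_ll = np.asarray(imp_frame_ll, dtype=float)
    score = spk_frame_ll.mean() - imp_frame_ll.mean()  # Equals mean(spk - imp) for equal-length inputs
    return bool(score >= threshold), float(score)


# Averages printed for 1kc.wav, passed as single-element arrays
accepted, score = verify([-115.10947079177323], [-121.11675298761767])
print(accepted, round(score, 3))  # True 6.007
```

Raising θ above 0 trades more false rejections for fewer false acceptances. In practice θ is usually tuned on held-out data rather than fixed at 0.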

In [None]:
test_path = "/content/drive/MyDrive/Speech_Lab/WAV/spk_test1/"  # Folder with the target speaker's test wav files
print('Testing for +ve samples: ')

for root, dirs, filenames in os.walk(test_path):
    for f in sorted(filenames):
        if not f.lower().endswith('.wav'):
            continue  # Skip non-audio files
        y, sr = librosa.load(os.path.join(root, f), sr=8000, mono=True)  # Read the wav file at 8 kHz
        mfccs = librosa.feature.mfcc(y=y, sr=sr, n_fft=240, hop_length=60, n_mfcc=13)  # 13 MFCCs, 30 ms window, 7.5 ms hop
        mfcc_delta = librosa.feature.delta(mfccs, order=1)  # Delta
        mfcc_delta2 = librosa.feature.delta(mfccs, order=2)  # Delta-delta
        MFCCs = np.concatenate((mfccs, mfcc_delta, mfcc_delta2), axis=0)  # Stack features: 13+13+13 = 39 dimensions
        MFCCs = np.transpose(MFCCs)  # Shape (frames, 39): scikit-learn expects one row per sample

        ll1 = np.mean(spkrGMM.score_samples(MFCCs))      # Average per-frame log-likelihood under the speaker model
        ll2 = np.mean(ImposterGMM.score_samples(MFCCs))  # Average per-frame log-likelihood under the imposter model
        print('Filename,', f, '   Likelyhood for speaker=', ll1, ',    Likelyhood for Imposter=', ll2)
        if ll1 >= ll2:
            print('Recognized as Speaker')
        else:
            print('Recognized as Imposter !!!!!')

Testing for +ve samples: 
Filename, 1kc.wav    Likelyhood for speaker= -115.10947079177323 ,    Likelyhood for Imposter= -121.11675298761767
Recognized as Speaker
Filename, 1la.wav    Likelyhood for speaker= -117.95503330480868 ,    Likelyhood for Imposter= -123.63402332253115
Recognized as Speaker
Filename, 1lb.wav    Likelyhood for speaker= -116.158285651061 ,    Likelyhood for Imposter= -121.1074958831578
Recognized as Speaker
Filename, 1lc.wav    Likelyhood for speaker= -118.51661384018593 ,    Likelyhood for Imposter= -124.07577877431314
Recognized as Speaker
Filename, 1ma.wav    Likelyhood for speaker= -116.9140817235253 ,    Likelyhood for Imposter= -121.64326563868909
Recognized as Speaker
Filename, 1mb.wav    Likelyhood for speaker= -115.99672844929121 ,    Likelyhood for Imposter= -120.91134037707889
Recognized as Speaker
Filename, 1mc.wav    Likelyhood for speaker= -117.45752592872563 ,    Likelyhood for Imposter= -122.24062265941261
Recognized as Speaker
Filename, 1na.wav  

Observe that for every positive test file listed above, the log-likelihood under the speaker GMM is higher than under the imposter GMM. Each of these samples is therefore correctly recognized as the speaker.
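To see how comfortable that margin is, the per-file difference `ll1 - ll2` can be computed from the values printed above. The sketch below hard-codes the seven fully printed rows. 1na.wav is truncated in the output and is left out. The dictionary name `printed` is illustrative only.

```python
# (speaker log-likelihood, imposter log-likelihood) as printed for the +ve test files
printed = {
    '1kc.wav': (-115.10947079177323, -121.11675298761767),
    '1la.wav': (-117.95503330480868, -123.63402332253115),
    '1lb.wav': (-116.158285651061, -121.1074958831578),
    '1lc.wav': (-118.51661384018593, -124.07577877431314),
    '1ma.wav': (-116.9140817235253, -121.64326563868909),
    '1mb.wav': (-115.99672844929121, -120.91134037707889),
    '1mc.wav': (-117.45752592872563, -122.24062265941261),
}

# Positive margin means the speaker model scored higher (accepted at threshold 0)
margins = {name: spk - imp for name, (spk, imp) in printed.items()}
for name, m in margins.items():
    print(f'{name}: margin = {m:.3f}')

closest = min(margins, key=margins.get)
print('All accepted at threshold 0:', all(m >= 0 for m in margins.values()))
print('Smallest margin:', closest, round(margins[closest], 3))
```

For these rows the margins range from about 4.7 (1ma.wav) to about 6.0 (1kc.wav). So no listed positive sample is close to the decision boundary.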

In [None]:
test_path = "/content/drive/MyDrive/Speech_Lab/WAV/Neg_test1/"  # Folder with the imposter test wav files
print('Testing for Imposter samples: ')

for root, dirs, filenames in os.walk(test_path):
    for f in sorted(filenames):
        if not f.lower().endswith('.wav'):
            continue  # Skip non-audio files
        y, sr = librosa.load(os.path.join(root, f), sr=8000, mono=True)  # Read the wav file at 8 kHz
        mfccs = librosa.feature.mfcc(y=y, sr=sr, n_fft=240, hop_length=60, n_mfcc=13)  # 13 MFCCs, 30 ms window, 7.5 ms hop
        mfcc_delta = librosa.feature.delta(mfccs, order=1)  # Delta
        mfcc_delta2 = librosa.feature.delta(mfccs, order=2)  # Delta-delta
        MFCCs = np.concatenate((mfccs, mfcc_delta, mfcc_delta2), axis=0)  # Stack features: 13+13+13 = 39 dimensions
        MFCCs = np.transpose(MFCCs)  # Shape (frames, 39): scikit-learn expects one row per sample

        ll1 = np.mean(spkrGMM.score_samples(MFCCs))      # Average per-frame log-likelihood under the speaker model
        ll2 = np.mean(ImposterGMM.score_samples(MFCCs))  # Average per-frame log-likelihood under the imposter model
        print('Filename,', f, '   Likelyhood for speaker=', ll1, ',    Likelyhood for Imposter=', ll2)
        if ll1 >= ll2:
            print('Recognized as Speaker')
        else:
            print('Recognized as Imposter !!!!!')

Testing for Imposter samples: 
Filename, 2bc.wav    Likelyhood for speaker= -127.99355321192513 ,    Likelyhood for Imposter= -116.16618017709612
Recognized as Imposter !!!!!
Filename, 2ca.wav    Likelyhood for speaker= -129.8810130496317 ,    Likelyhood for Imposter= -117.06760390505507
Recognized as Imposter !!!!!
Filename, 2cb.wav    Likelyhood for speaker= -132.09783452400708 ,    Likelyhood for Imposter= -117.73639290334121
Recognized as Imposter !!!!!
Filename, 2cc.wav    Likelyhood for speaker= -129.30893718787289 ,    Likelyhood for Imposter= -116.8599996677105
Recognized as Imposter !!!!!
Filename, 3bc.wav    Likelyhood for speaker= -125.1645842352801 ,    Likelyhood for Imposter= -121.63888023922718
Recognized as Imposter !!!!!
Filename, 3ca.wav    Likelyhood for speaker= -127.89832297041318 ,    Likelyhood for Imposter= -123.67718713685281
Recognized as Imposter !!!!!
Filename, 3cb.wav    Likelyhood for speaker= -125.77110105332306 ,    Likelyhood for Imposter= -121.35945218