## Speech recognition using MFCC and DTW

In this experiment, MFCC vectors are extracted from the audio signals and compared using dynamic time warping (DTW) with a Euclidean point distance. The speech recognition system is assumed to store a single reference recording for each word in its database. In this demonstration, one word, "go" (go1.wav), is stored, and two test recordings (go2.wav and right1.wav) are compared against it. Note that the cells below apply DTW directly to the normalized waveforms; the MFCC extraction step itself is not shown in them.

A proof of concept has been demonstrated in "./dtw_mfcc.ipynb".
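Since the feature-extraction step is not spelled out here, the following is a minimal NumPy-only sketch of how MFCC vectors could be computed. The function names (`mfcc_sketch`, `mel_filterbank`), the 25 ms / 10 ms framing, the 26 mel filters, and the 13 coefficients are illustrative assumptions (commonly used defaults), not values taken from this experiment. It also assumes a mono (1-D) signal.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters evenly spaced on the mel scale; shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc_sketch(signal, fs, frame_len=0.025, frame_step=0.010,
                n_filters=26, n_coeffs=13):
    """Return an array of shape (n_frames, n_coeffs) for a mono signal (assumed defaults)."""
    signal = np.asarray(signal, dtype=np.float64)
    flen = int(round(frame_len * fs))
    fstep = int(round(frame_step * fs))
    if len(signal) < flen:                 # pad very short signals to one full frame
        signal = np.pad(signal, (0, flen - len(signal)))
    n_fft = 1 << (flen - 1).bit_length()   # next power of two >= frame length
    n_frames = 1 + (len(signal) - flen) // fstep
    # Slice the signal into overlapping frames and apply a Hamming window
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)
    # Power spectrum -> mel filterbank energies -> log
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    log_energy = np.log(power @ mel_filterbank(n_filters, n_fft, fs).T + 1e-10)
    # Unnormalized DCT-II over the filter axis, keeping the first n_coeffs coefficients
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (2 * n + 1) / (2 * n_filters))
    return log_energy @ basis.T
```

The resulting 2-D arrays (one row per frame) could be passed to `dtw(feats_a, feats_b, dist=euclidean)` in place of the raw waveforms, so that the Euclidean distance is taken between MFCC vectors rather than between individual samples.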

In [None]:
## Importing necessary libraries

from fastdtw import fastdtw as dtw
from scipy.spatial.distance import euclidean
import numpy as np
from scipy.io import wavfile as wav
import IPython.display as ipd

In [None]:
## Load the first audio file: the stored reference recording of the word "go"

fs,go1 = wav.read("go1.wav")

In [None]:
## Load the second audio file: a test recording (go2.wav)

fs,go2 = wav.read("go2.wav")

In [None]:
## Load the third audio file: a test recording (right1.wav)

fs,right1 = wav.read("right1.wav")

### Play the audio files 

In [None]:
ipd.Audio("go1.wav")

In [None]:
ipd.Audio("go2.wav")

In [None]:
ipd.Audio("right1.wav")

In [None]:
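## Normalize the audio sequences to the range [0, 1]
## Cast to float first so that integer PCM samples cannot overflow in the subtraction

go1, go2, right1 = (x.astype(np.float64) for x in (go1, go2, right1))

go1 = ((go1 - go1.min()) / (go1.max() - go1.min()))
go2 = ((go2 - go2.min()) / (go2.max() - go2.min()))
right1 = ((right1 - right1.min()) / (right1.max() - right1.min()))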
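The min-max scaling above can be wrapped in a small helper. This is a sketch only: the name `min_max_normalize` is hypothetical, and the guard for a constant (for example, silent) signal is an added assumption about how such input should be handled.

```python
import numpy as np

def min_max_normalize(samples):
    """Scale samples linearly to [0, 1]; a constant signal maps to all zeros."""
    # Work in float64 so int16 PCM data cannot wrap around during subtraction
    x = np.asarray(samples, dtype=np.float64)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return (x - x.min()) / span
```

With this helper, each recording could be normalized with a single call such as `go1 = min_max_normalize(go1)`.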

In [None]:
## Calculate the DTW distance (Euclidean point distance) between go1 and go2

dist1,_ = dtw(go1,go2,dist = euclidean)
print(dist1)

In [None]:
## Calculate the DTW distance (Euclidean point distance) between go1 and right1

dist2,_ = dtw(go1,right1,dist = euclidean)
print(dist2)
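`fastdtw` computes an approximation of the DTW distance. To make clear what is being approximated, here is a minimal pure-Python sketch of the exact dynamic-programming recurrence. The function name `dtw_distance` and the default absolute-difference point distance are illustrative assumptions; this version needs O(n·m) time and memory, so it is only practical for short sequences.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Exact DTW cost between sequences a and b via dynamic programming."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats a match with b[j-1]
                                 cost[i][j - 1],      # b[j-1] repeats a match with a[i-1]
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]
```

Because the warping path may stretch either sequence, two utterances of the same word spoken at different speeds can still align with a low cost, which is the property the threshold comparison relies on.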

In [None]:
## set a threshold for comparison

thresh = 1000

### Compare the distances with the threshold

**Logic**:

    if DTW_distance < thresh:
        word_spoken = "go"
    else:
        word_spoken != "go"

In [None]:
## Decision for go2.wav compared against the stored "go" recording
if dist1 < thresh:
    word_spoken_go = True
    print("word spoken is go")
else:
    word_spoken_go = False
    print("word spoken is not go")

In [None]:
## Decision for right1.wav compared against the stored "go" recording
if dist2 < thresh:
    word_spoken_go = True
    print("word spoken is go")
else:
    word_spoken_go = False
    print("word spoken is not go")
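The same threshold test extends to a database holding one reference recording per word, as assumed at the start of this experiment. The sketch below is hypothetical: the function `recognize`, its parameters, and the dictionary layout are illustrative names, and `distance_fn` stands in for whatever DTW routine is used (for example, a wrapper returning the first element of the `dtw(...)` result).

```python
def recognize(query, templates, distance_fn, thresh):
    """Return (word, distance) for the closest template if it is under thresh,
    otherwise (None, distance of the closest template)."""
    best_word, best_dist = None, float("inf")
    for word, template in templates.items():
        d = distance_fn(query, template)
        if d < best_dist:
            best_word, best_dist = word, d
    if best_dist < thresh:
        return best_word, best_dist
    return None, best_dist
```

A call might look like `recognize(go2, {"go": go1}, lambda a, b: dtw(a, b, dist=euclidean)[0], thresh)`. Picking the nearest template before applying the threshold avoids reporting several words at once when more than one distance falls under the threshold.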