# Speech recognition

A simple speech recognition system can be implemented using dynamic time warping (DTW).

This notebook is inspired by the [Rouanet DTW library example](http://nbviewer.jupyter.org/github/pierre-rouanet/dtw/blob/master/speech-recognition.ipynb).

We will use a simple [database](https://www.dropbox.com/s/c12fmsctfwwov5d/sounds.zip) composed of 12 French words, each pronounced about 25 times by different speakers.
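To make the idea behind DTW concrete, here is a minimal sketch of the accumulated-cost recursion on plain 1-D sequences. It is not the implementation used by the libraries in this notebook; the function name `dtw_distance` and the absolute-difference local cost are assumptions chosen for illustration.

```python
def dtw_distance(a, b):
    """Accumulated DTW cost between two 1-D sequences.

    Illustrative sketch: the local cost is |a_i - b_j| and the allowed
    steps are (i-1, j), (i, j-1) and (i-1, j-1).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] holds the minimal cost of aligning a[:i] with b[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            acc[i][j] = local + min(acc[i - 1][j],      # a advances alone
                                    acc[i][j - 1],      # b advances alone
                                    acc[i - 1][j - 1])  # both advance
    return acc[n][m]


# The same shape spoken "slowly" and "quickly" aligns at zero cost,
# while a different shape does not.
slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
fast = [0, 1, 2, 1, 0]
other = [2, 2, 0, 0, 2]
print(dtw_distance(slow, fast))
print(dtw_distance(slow, other))
```

Because the warping absorbs differences in speaking rate, two utterances of the same word can be compared even when their durations differ.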

In [140]:
import librosa
from dtw import dtw
import matplotlib.pyplot as plt
import numpy as np
import glob

%matplotlib inline

### Loading Data

In [154]:
%%time

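# Read one label per line; line i is the label of sounds/i.wav.
with open('sounds/wavToTag.txt') as f:
    labels = [line.rstrip('\n') for line in f]

# Load every recording (librosa resamples to 22050 Hz by default).
X = []
for i in range(len(labels)):
    x, sample_rate = librosa.load("sounds/{}.wav".format(i))
    X.append(x)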

CPU times: user 7.05 s, sys: 20 ms, total: 7.07 s
Wall time: 7.07 s


### Processing

In [155]:
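# Window length in samples: 2 ms at the loaded sample rate.
n_window_samples = int(sample_rate * 2 * 10**(-3))

def reshape_sound(x):
    # Split the signal into consecutive windows of n_window_samples samples
    # (2 ms each); the result holds one window per column.

    # Drop trailing samples that do not fill a whole window.
    new_len = np.floor_divide(x.shape[0], n_window_samples) * n_window_samples
    x = x[0:new_len]
    x = x.reshape((n_window_samples, -1), order='F')
    return x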
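The effect of the column-major reshape is easier to see on a toy signal. The sketch below uses a hypothetical window of 3 samples and a 10-sample array instead of real audio; the variable names are illustrative only.

```python
import numpy as np

signal = np.arange(10)  # toy "audio": samples 0..9
window = 3              # hypothetical window length in samples

# Keep only the samples that fill whole windows (9 of the 10 here).
usable = (signal.shape[0] // window) * window

# order='F' fills columns first, so each column is one consecutive window.
frames = signal[:usable].reshape((window, -1), order='F')

print(frames.shape)   # (3, 3): 3 samples per window, 3 windows
print(frames[:, 0])   # first window: [0 1 2]
print(frames[:, 1])   # second window: [3 4 5]
```

The resulting layout, with features along the rows and time steps along the columns, is the one `librosa`'s DTW routine expects for its inputs.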

In [156]:
X = [reshape_sound(x) for x in X]

### Define ground-truth data

In [157]:
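# Use the first recording of each word as its reference template and
# remove it (and its label) from the test set.
gt = dict()

unique_labels = set(labels)
for l in unique_labels:
    idx = labels.index(l)
    labels.pop(idx)
    x = X.pop(idx)
    gt[l] = x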
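The same template selection can be written without mutating the input lists. The following is a sketch on toy data; `split_templates` is a hypothetical helper, not part of the original notebook.

```python
def split_templates(samples, labels):
    """Take the first sample of each label as its template.

    Returns (templates, remaining_samples, remaining_labels); the inputs
    are left untouched. Hypothetical helper for illustration.
    """
    templates = {}
    remaining_samples, remaining_labels = [], []
    for sample, label in zip(samples, labels):
        if label not in templates:
            templates[label] = sample          # first occurrence: template
        else:
            remaining_samples.append(sample)   # later occurrences: test set
            remaining_labels.append(label)
    return templates, remaining_samples, remaining_labels


toy_samples = ["a0", "b0", "a1", "b1", "a2"]
toy_labels = ["a", "b", "a", "b", "a"]
templates, rest, rest_labels = split_templates(toy_samples, toy_labels)
print(templates)
print(rest, rest_labels)
```

Returning new lists keeps the cell safe to re-run, whereas popping from `X` and `labels` in place removes another recording per word each time the cell is executed.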

### Testing it!

In [189]:
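import operator
from pprint import pprint

idx = 95
x = X[idx]

print("Original label: [{}]".format(labels[idx]))
results = {}
for label, ground in gt.items():
    # librosa returns the accumulated cost matrix and the optimal warping path.
    # Older releases expose this as librosa.dtw; newer ones as librosa.sequence.dtw.
    cost, path = librosa.sequence.dtw(x, ground)
    # Score each template by summing the accumulated cost along the path.
    min_cost = 0
    for p in path:
        min_cost += cost[p[0], p[1]]
    results[label] = int(min_cost)

# Lowest score first: the top entry is the predicted word.
pprint(sorted(results.items(), key=operator.itemgetter(1)))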

Original label: [gants]
[('manette', 36195),
 ('gants', 37033),
 ('biere', 38784),
 ('chaussure', 44640),
 ('sofoot', 47688),
 ('zidane', 49347),
 ('jeuvideo', 51936),
 ('stade', 53277),
 ('girondins', 55749),
 ('beckham', 56872),
 ('cocacola', 58623),
 ('ballon', 97592)]
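In the single test above, the `gants` recording ranks `manette` first and `gants` second, so one sample says little about overall quality. A rough way to measure it is to classify every remaining recording and count the hits. The sketch below shows that loop on toy data; `classify`, `accuracy` and the stand-in distance `l1` are hypothetical names, and in the notebook the distance would be the DTW score computed above.

```python
def classify(sample, templates, distance):
    """Return (best_label, costs): the label whose template is closest."""
    costs = {label: distance(sample, template)
             for label, template in templates.items()}
    best_label = min(costs, key=costs.get)
    return best_label, costs


def accuracy(samples, labels, templates, distance):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(classify(s, templates, distance)[0] == l
               for s, l in zip(samples, labels))
    return hits / len(samples)


def l1(a, b):
    """Stand-in distance for the demo only (assumes equal-length inputs)."""
    return sum(abs(x - y) for x, y in zip(a, b))


toy_templates = {"low": [0, 0, 0], "high": [5, 5, 5]}
toy_samples = [[0, 1, 0], [4, 5, 6], [1, 1, 1]]
toy_labels = ["low", "high", "high"]  # the last sample is deliberately mislabeled

print(classify(toy_samples[0], toy_templates, l1))
print(accuracy(toy_samples, toy_labels, toy_templates, l1))
```

Passing the distance as an argument keeps the evaluation loop independent of how the DTW score is computed.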
