# Keras RNN Speech Recognizer

We will build a deep neural network that functions as part of an end-to-end automatic speech recognition (ASR) pipeline. Our completed pipeline will accept raw audio as input and return a predicted transcription of the spoken language. The full pipeline is summarized in the figure below.

<img src="images/pipeline.png">

- **STEP 1:** a pre-processing step that converts raw audio to one of two feature representations that are commonly used for ASR. 
- **STEP 2:** an acoustic model which accepts audio features as input and returns a probability distribution over all potential transcriptions.
- **STEP 3:** a final step that takes the output from the acoustic model and returns a predicted transcription.
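
The data flow through the three steps above can be sketched with toy stand-ins. This is a minimal illustration, not the notebook's implementation. The function names, the `ALPHABET` character set, the frame sizes, the untrained random "model", and the per-frame argmax decoding are all assumptions chosen to keep the example self-contained. In particular, the sketch has the model emit one distribution over characters per frame, which simplifies the "distribution over all potential transcriptions" described above.

```python
import numpy as np

# Hypothetical character set for illustration only; the notebook's real
# output alphabet is not shown in this section.
ALPHABET = list("abcdefghijklmnopqrstuvwxyz' ")


def extract_features(audio, frame_len=320, hop=160):
    """STEP 1 (sketch): slice raw audio into overlapping frames and take the
    log-magnitude spectrum of each frame, i.e. a simple spectrogram."""
    n_frames = 1 + (len(audio) - frame_len) // hop
    frames = np.stack(
        [audio[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    window = np.hanning(frame_len)
    spectrum = np.abs(np.fft.rfft(frames * window, axis=1))
    return np.log(spectrum + 1e-10)  # shape: (n_frames, frame_len // 2 + 1)


def acoustic_model(features, rng):
    """STEP 2 (placeholder): an untrained random linear layer followed by a
    softmax, standing in for the neural networks built later."""
    weights = rng.normal(scale=0.01, size=(features.shape[1], len(ALPHABET)))
    logits = features @ weights
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def decode(probs):
    """STEP 3 (sketch): pick the most likely character in each frame."""
    return "".join(ALPHABET[i] for i in probs.argmax(axis=1))


rng = np.random.default_rng(0)
audio = rng.normal(size=16000)  # one second of synthetic noise at 16 kHz
features = extract_features(audio)
probs = acoustic_model(features, rng)
transcript = decode(probs)
print(features.shape, probs.shape, len(transcript))
```

Because the placeholder model is untrained, the decoded string is meaningless. The point is only the shape of the data at each stage: samples become a frames-by-features matrix, then a frames-by-characters matrix of probabilities, then text.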

Navigate the notebook:
- [LibriSpeech](#librispeech)
- [**STEP 1**](#step1): Acoustic Features for Speech Recognition
- [**STEP 2**](#step2): Deep Neural Networks for Acoustic Modeling
    - [Model 0](#model0): RNN
    - [Model 1](#model1): RNN + TimeDistributed Dense
    - [Model 2](#model2): CNN + RNN + TimeDistributed Dense
    - [Model 3](#model3): Deeper RNN + TimeDistributed Dense
    - [Model 4](#model4): Bidirectional RNN + TimeDistributed Dense
    - [Models 5+](#model5)
    - [Compare the Models](#compare)
    - [Final Model](#final)
- [**STEP 3**](#step3): Obtain Predictions

## LibriSpeech<a id='librispeech'></a>

We begin by investigating the dataset that will be used to train and evaluate the pipeline. [LibriSpeech](http://www.danielpovey.com/files/2015_icassp_librispeech.pdf) is a large corpus of read English speech, designed for training and evaluating ASR models. The dataset contains 1000 hours of speech derived from audiobooks. We will work with a small subset in this project, since training on the full dataset would take much longer. More data is available [online](http://www.openslr.org/12/).

In [2]:
! pip install python_speech_features

Collecting python_speech_features
  Downloading https://files.pythonhosted.org/packages/ff/d1/94c59e20a2631985fbd2124c45177abaa9e0a4eee8ba8a305aa26fc02a8e/python_speech_features-0.6.tar.gz
Building wheels for collected packages: python-speech-features
  Running setup.py bdist_wheel for python-speech-features ... done
  Stored in directory: /home/yungshun/.cache/pip/wheels/3c/42/7c/f60e9d1b40015cd69b213ad90f7c18a9264cd745b9888134be
Successfully built python-speech-features
Installing collected packages: python-speech-features
Successfully installed python-speech-features-0.6
You are using pip version 18.0, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [4]:
! pip install librosa

Collecting librosa
  Downloading https://files.pythonhosted.org/packages/09/b4/5b411f19de48f8fc1a0ff615555aa9124952e4156e94d4803377e50cfa4c/librosa-0.6.2.tar.gz (1.6MB)
Collecting audioread>=2.0.0 (from librosa)
  Downloading https://files.pythonhosted.org/packages/f0/41/8cd160c6b2046b997d571a744a7f398f39e954a62dd747b2aae1ad7f07d4/audioread-2.1.6.tar.gz
Collecting resampy>=0.2.0 (from librosa)
  Downloading https://files.pythonhosted.org/packages/14/b6/66a06d85474190b50aee1a6c09cdc95bb405ac47338b27e9b21409da1760/resampy-0.2.1.tar.gz (322kB)
Collecting numba>=0.38.0 (from librosa)
  Downloading https://files.pythonhosted.org/packages/3c/3f/a63776ed98617c3af6187c4955779a013ad9e2f36280415d23503366c1ba/numba-0.40.1-cp35-cp35m-manylinux1_x86_64.whl (3.2MB)

In [6]:
! pip install soundfile

Collecting soundfile
  Downloading https://files.pythonhosted.org/packages/68/64/1191352221e2ec90db7492b4bf0c04fd9d2508de67b3f39cbf093cd6bd86/SoundFile-0.10.2-py2.py3-none-any.whl
Collecting cffi>=1.0 (from soundfile)
  Downloading https://files.pythonhosted.org/packages/59/cc/0e1635b4951021ef35f5c92b32c865ae605fac2a19d724fb6ff99d745c81/cffi-1.11.5-cp35-cp35m-manylinux1_x86_64.whl (420kB)
Collecting pycparser (from cffi>=1.0->soundfile)
  Downloading https://files.pythonhosted.org/packages/68/9e/49196946aee219aead1290e00d1e7fdeab8567783e83e1b9ab5585e6206a/pycparser-2.19.tar.gz (158kB)
Building wheels for collected packages: pycparser
  Running setup.py bdist_wheel for pycparser ... done
  Stored in directory: /home/yungshun/.cache/pip/wheels/f2/9a/90/de94f8556265ddc9d9c8b271b0f63e57b26fb1d67a45564511
Success

In [9]:
from data_generator import vis_train_features