In this project we built a deep neural network that functions as part of an end-to-end automatic speech recognition (ASR) pipeline. The full pipeline is summarized in the figure below.
The pipeline accepts raw audio as input and applies a pre-processing step that converts it to one of two feature representations commonly used for ASR (spectrograms or MFCCs); in this project we also used a convolutional layer to extract features. These features are fed into an acoustic model, which returns a probability distribution over all potential transcriptions. Finally, the pipeline takes the output of the acoustic model and returns a predicted transcription.
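As a rough illustration of the pre-processing step, a spectrogram can be computed from raw audio with `scipy.signal.spectrogram`. This is a minimal sketch on a synthetic one-second tone; the window sizes and the feature extraction actually used in the pipeline may differ:

```python
import numpy as np
from scipy.signal import spectrogram

# Synthetic one-second audio clip: a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
audio = np.sin(2 * np.pi * 440 * t)

# Short-time spectrogram: 20 ms windows (320 samples) with 10 ms hop.
freqs, times, spec = spectrogram(audio, fs=sample_rate,
                                 nperseg=320, noverlap=160)

# Each column of `spec` is the feature vector for one time step.
print(spec.shape)  # (frequency bins, time steps)
```

The acoustic model then consumes one such feature vector per time step.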
We should be able to get better performance on both the training and validation sets by trying the following:
- Try a larger dataset.
- Try adding a language model after the acoustic model.
- Try training for more epochs (>20).
- Try a deeper neural network or a pre-trained network.
- Try another type of RNN, such as LSTM or GRU.
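As a toy illustration of the language-model idea above, candidate transcriptions from the acoustic model can be rescored by combining their acoustic scores with word-bigram log-probabilities. All sentences and probabilities here are made up for the sketch:

```python
import math

# Hypothetical n-best list from the acoustic model:
# (transcription, acoustic log-probability).
candidates = [
    ("their going home", -4.0),
    ("they're going home", -4.2),
]

# Toy bigram language model: log P(word | previous word).
bigram_logprob = {
    ("<s>", "their"): math.log(0.10),
    ("<s>", "they're"): math.log(0.05),
    ("their", "going"): math.log(0.01),
    ("they're", "going"): math.log(0.30),
    ("going", "home"): math.log(0.20),
}

def lm_score(sentence, unk_logprob=math.log(1e-4)):
    """Sum bigram log-probabilities over the sentence."""
    words = ["<s>"] + sentence.split()
    return sum(bigram_logprob.get(pair, unk_logprob)
               for pair in zip(words, words[1:]))

def rescore(candidates, lm_weight=1.0):
    """Pick the candidate with the best combined acoustic + LM score."""
    return max(candidates,
               key=lambda c: c[1] + lm_weight * lm_score(c[0]))

best, _ = rescore(candidates)
print(best)  # "they're going home"
```

Here the language model overrules the acoustic model's slight preference for the homophone "their", which is exactly the kind of error an LM stage helps with.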
This project uses the Keras framework. Follow one of the commands below to install it:

```
pip install keras
```

or, with conda:

```
conda install -c conda-forge keras
```
We used a 1D convolutional layer to extract features, with a BatchNormalization layer after each layer to speed up training and dropout layers to prevent overfitting, followed by a combination of Bidirectional + SimpleRNN layers. We chose SimpleRNN because it trains much faster than GRU or LSTM. The output of the acoustic model is passed through a softmax function to produce a probability distribution over characters at each time step, from which transcriptions are predicted.
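To illustrate how the per-time-step softmax outputs become a transcription, here is a minimal numpy sketch of greedy CTC decoding (take the best symbol at each time step, collapse repeats, drop blanks). The alphabet and the logits are made up for the example:

```python
import numpy as np

# Hypothetical alphabet; index 0 is the CTC blank symbol.
alphabet = ['-', 'c', 'a', 't']

# Made-up acoustic-model logits: one row per time step.
logits = np.array([
    [0.1, 2.0, 0.1, 0.1],   # 'c'
    [0.1, 2.0, 0.1, 0.1],   # 'c' again (repeat, collapsed)
    [2.0, 0.1, 0.1, 0.1],   # blank
    [0.1, 0.1, 2.0, 0.1],   # 'a'
    [0.1, 0.1, 0.1, 2.0],   # 't'
])

# Softmax over the alphabet at each time step.
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Greedy decoding: best symbol per step, collapse repeats, remove blanks.
best_path = probs.argmax(axis=1)
decoded = []
prev = None
for idx in best_path:
    if idx != prev and idx != 0:
        decoded.append(alphabet[idx])
    prev = idx
print(''.join(decoded))  # 'cat'
```

A beam-search decoder (optionally combined with a language model) would generally give better transcriptions than this greedy pass, at higher cost.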
Feel free to take a look at the final model in `sample_models.py`.
We trained the acoustic model with the CTC loss and the SGD optimizer with a learning rate of 0.02.
```python
from keras.layers import Input, Lambda
from keras.models import Model

def add_ctc_loss(input_to_softmax):
    the_labels = Input(name='the_labels', shape=(None,), dtype='float32')
    input_lengths = Input(name='input_length', shape=(1,), dtype='int64')
    label_lengths = Input(name='label_length', shape=(1,), dtype='int64')
    output_lengths = Lambda(input_to_softmax.output_length)(input_lengths)
    # CTC loss is implemented in a lambda layer
    loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')(
        [input_to_softmax.output, the_labels, output_lengths, label_lengths])
    model = Model(
        inputs=[input_to_softmax.input, the_labels,
                input_lengths, label_lengths],
        outputs=loss_out)
    return model
```
- Ahmed Abd-Elbakey Ghonem - Github
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.