Speech Recognition Using Deep Learning

This was my project for the Machine Learning course during my Master's degree. It uses deep learning for speech recognition, specifically to recognize which word is spoken in an audio track.

I ran the experiment with two common audio features: spectrograms and MFCCs (Mel-Frequency Cepstral Coefficients). To run the implementation, first download the dataset (see the instructions in the dataset folder) and run the prepare_dataset script for the feature you want to use, either prepare_dataset_Spectrogram.py or prepare_dataset_MFCC.py. The script creates a file called data.json containing the features used to train the model. Then run the corresponding train script (train_Spectrogram.py or train_MFCC.py). When training finishes, the model is saved; two trained models are already provided, model_spectograms.h5 and model_mfccs.h5. Finally, to make predictions, put the tracks you want to classify in the test folder, set the path to the track in the main function of the corresponding predictions script (predictions_Spectrogram.py or predictions_MFCC.py), and run it.
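To make the dataset preparation step more concrete, here is a minimal sketch of how MFCC features could be collected into data.json. It is not the code from prepare_dataset_MFCC.py: the folder layout (one subfolder per word, containing .wav files), the JSON keys, the sample rate, and the number of coefficients are all assumptions made for illustration, and the function names are hypothetical.

```python
import json
from pathlib import Path

SAMPLE_RATE = 22050  # assumption: librosa's default sample rate
N_MFCC = 13          # assumption: a commonly used number of coefficients


def extract_mfcc(path):
    """Return the MFCCs of one audio file as a list of frames (time-major)."""
    import librosa  # imported lazily so the rest of the sketch runs without it

    signal, sr = librosa.load(path, sr=SAMPLE_RATE)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC)
    return mfcc.T.tolist()  # shape (frames, N_MFCC), JSON-serializable


def build_dataset(dataset_dir, extract=extract_mfcc):
    """Walk dataset_dir/<word>/*.wav and collect features with integer labels."""
    # Assumed structure of data.json; the real script may use other keys.
    data = {"mapping": [], "labels": [], "features": [], "files": []}
    word_dirs = sorted(p for p in Path(dataset_dir).iterdir() if p.is_dir())
    for label, word_dir in enumerate(word_dirs):
        data["mapping"].append(word_dir.name)  # label index -> word
        for wav in sorted(word_dir.glob("*.wav")):
            data["features"].append(extract(str(wav)))
            data["labels"].append(label)
            data["files"].append(str(wav))
    return data


def save_dataset(data, json_path="data.json"):
    """Write the collected features to a JSON file."""
    with open(json_path, "w") as fp:
        json.dump(data, fp, indent=2)
```

Under these assumptions, `save_dataset(build_dataset("dataset"))` would write data.json in the current directory. Passing the extractor as a parameter means a spectrogram function could be swapped in without changing the directory walk.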

The loss and accuracy curves I obtained during the project are shown below.

Curves using spectrograms

Curves using MFCCs
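Curves like these can be drawn from the history that Keras records during training. The sketch below is an illustration rather than the plotting code used in this project: it assumes the model was compiled with `metrics=["accuracy"]`, so that the history dictionary has the keys `loss`, `val_loss`, `accuracy` and `val_accuracy`. The function names and the output file name are hypothetical.

```python
def best_epoch(history):
    """Return the 1-based epoch with the lowest validation loss."""
    val_loss = history["val_loss"]
    return min(range(len(val_loss)), key=val_loss.__getitem__) + 1


def plot_history(history, out_path="curves.png"):
    """Plot accuracy and loss per epoch from a Keras-style history dict."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    epochs = range(1, len(history["loss"]) + 1)
    fig, (ax_acc, ax_loss) = plt.subplots(2, 1, sharex=True, figsize=(6, 6))

    # Top panel: training and validation accuracy.
    ax_acc.plot(epochs, history["accuracy"], label="train")
    ax_acc.plot(epochs, history["val_accuracy"], label="validation")
    ax_acc.set_ylabel("Accuracy")
    ax_acc.legend()

    # Bottom panel: training and validation loss.
    ax_loss.plot(epochs, history["loss"], label="train")
    ax_loss.plot(epochs, history["val_loss"], label="validation")
    ax_loss.set_xlabel("Epoch")
    ax_loss.set_ylabel("Loss")
    ax_loss.legend()

    fig.savefig(out_path)
    plt.close(fig)
    return out_path


# Typical use after training (model.fit returns a History object):
#     history = model.fit(...).history
#     plot_history(history)
```

A validation loss that starts rising while the training loss keeps falling is the usual sign of overfitting, which is why `best_epoch` looks at `val_loss` rather than `loss`.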


FandosA/Speech_Recognition_Keras_TF
