Tensorflow Speech Recognition

Speech recognition using Google's TensorFlow deep learning framework and sequence-to-sequence neural networks.

Replaces caffe-speech-recognition, see there for some background.

Ultimate goal

Create a decent standalone speech recognizer for Linux etc. Some people say we have the models but not enough training data. We disagree: there is plenty of training data (100GB here and 21GB here on openslr.org, synthetic text-to-speech snippets, movies with transcripts, Gutenberg, YouTube with captions etc.). We just need a simple yet powerful model. It's only a question of time...

Sample spectrogram: Karen uttering 'zero' at 160 words per minute.

Getting started

Toy examples: ./number_classifier_tflearn.py ./speaker_classifier_tflearn.py

Some less trivial architectures: ./densenet_layer.py

Later: ./train.sh ./record.py

Partners + collaborators wanted

We are tackling this project seriously. If you want to join the party, just drop us an email at info@pannous.com.

Update: Nervana demonstrated that it is possible for 'independents' to build speech recognizers that are state of the art.
Update: Mozilla is working on DeepSpeech and just achieved 0% error rate ... on the training set ;) Free Speech is in good hands.

Fun tasks for newcomers

Watch the video: https://www.youtube.com/watch?v=u9FPqkuoEJ8
Understand and correct the corresponding code: lstm-tflearn.py
Data augmentation: create on-the-fly modulation of the data, e.g. increase the speech frequency, add background noise, alter the pitch etc.
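
As a starting point for the data augmentation task, here is a minimal sketch of on-the-fly modulation. It is not code from this repository: the function names (`add_noise`, `change_speed`, `augment`), the noise level and the 0.9 to 1.1 speed range are all assumptions chosen for illustration. It uses plain Python lists to stay self-contained; a real pipeline would more likely operate on NumPy arrays or tensors.

```python
import math
import random


def add_noise(samples, noise_level=0.005, rng=random):
    """Return a copy of `samples` with Gaussian background noise added."""
    return [s + rng.gauss(0.0, noise_level) for s in samples]


def change_speed(samples, factor):
    """Resample `samples` by linear interpolation.

    factor > 1 shortens the clip. Played back at the original sample rate
    it sounds faster AND higher pitched: this simple resampling changes
    speed and pitch together.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    new_length = max(1, int(len(samples) / factor))
    out = []
    for i in range(new_length):
        pos = i * factor                      # position in the original clip
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append((1.0 - frac) * samples[left] + frac * samples[right])
    return out


def augment(samples, rng=random):
    """Apply a random speed change, then add random background noise."""
    factor = rng.uniform(0.9, 1.1)            # assumed range, tune as needed
    return add_noise(change_speed(samples, factor), rng=rng)


# Example: one second of a 440 Hz tone at 16 kHz, augmented on the fly.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
augmented = augment(tone, random.Random(0))
```

Calling `augment` on each clip as it is loaded yields a slightly different variant every epoch. Changing the pitch without changing the duration needs a different technique (for example a phase vocoder), which this sketch does not attempt.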

Extensions

Extensions to current TensorFlow which are probably needed:

WarpCTC on the GPU (see issue)
Incremental collaborative snapshots ('P2P learning')!
Modular graphs/models + persistence

Even though this project is far from finished, we hope it gives you some starting points.

Looking for a tensorflow collaboration / consultant / deep learning contractor? Reach out to info@pannous.com

alirezadir/tensorflow-speech-recognition
