
Speech Recognition

ToDo

  • Are two recordings of the same/different word equal in length?
  • @ivceh - SVD
  • @Qkvad & @sandrolovnicki - TensorFlow
  • Android App (optional)
  • Hidden Markov Models

data

The dataset (Google's Speech Commands dataset, v0.02) is available at:

http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz
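If you want to fetch and unpack it manually (the train script below also does this automatically), here is a minimal Python sketch, assuming the data/ layout the commands in this README expect:

# Manual download/extract of the dataset into data/; optional, since
# tf/src/train.py downloads it automatically if missing.
import os
import tarfile
import urllib.request

URL = "http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz"
DATA_DIR = "data"

os.makedirs(DATA_DIR, exist_ok=True)
archive = os.path.join(DATA_DIR, "speech_commands_v0.02.tar.gz")

if not os.path.exists(archive):
    urllib.request.urlretrieve(URL, archive)  # large download (a few GB)

with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(DATA_DIR)  # creates per-word folders like data/house/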


tf

In this directory we implement a TensorFlow approach to the problem with multiple different models, each open to further tweaks. The source code lives in the src/ directory and is the most important and richest part. Additionally, the logs/ folder stores training logs and checkpoints during each training run, making it possible to resume training from an arbitrary checkpoint step. The models/ directory contains models that we created, which can be used to predict certain speech commands.

example usage

preprocess

To visualize what the data goes through during preprocessing, before it enters the convolutional neural network, run, for example:

python tf/src/preprocess.py --input_wav=data/house/0a2b400e_nohash_0.wav
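As a quick standalone illustration of the same idea, here is a minimal sketch using SciPy and Matplotlib; this is not the code the network actually uses (the real pipeline lives in tf/src/preprocess.py):

# Standalone spectrogram visualization (illustration only).
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, samples = wavfile.read("data/house/0a2b400e_nohash_0.wav")  # 16 kHz, 1 s
freqs, times, sxx = spectrogram(samples.astype(np.float32), fs=rate)

# Log scale makes the quieter frequency bands visible.
plt.pcolormesh(times, freqs, 10 * np.log10(sxx + 1e-10), shading="auto")
plt.xlabel("Time [s]")
plt.ylabel("Frequency [Hz]")
plt.title("house / 0a2b400e_nohash_0.wav")
plt.show()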

cnn

First off, we need to run the training script, which will download the dataset into the data/ folder if it is not already there and start training the model. We advise first inspecting the tf/src/train.py file, especially the possible parser arguments, which are easy to understand and tweak as desired.

To train the simplest model (a neural network with one hidden layer), which is currently the default --model_architecture, just run the train script from your Anaconda environment:

python tf/src/train.py
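Training length and learning rate can also be overridden on the command line. The flag names below are assumptions carried over from the upstream TensorFlow speech_commands example that this layout follows; check the parser in tf/src/train.py for the authoritative list:

python tf/src/train.py \
--model_architecture=single_fc \
--how_many_training_steps=3000 \
--learning_rate=0.001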

Note: the best performing model is the convolutional neural network, but its training can take an entire day. To train this model, just call the script like this:

python tf/src/train.py --model_architecture=conv

Next, once we are satisfied with the performance of our model (regardless of whether the model finished its training or you interrupted it; just be sure you have labels.txt), we need to "freeze" our TensorFlow graph so we can reuse it for prediction.

python tf/src/freeze.py \
--model_architecture=single_fc \
--start_checkpoint=tf/logs/single_fc/train_progress/single_fc.ckpt-2400 \
--output_file=tf/models/single_fc_032.pb
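The checkpoint suffix here (-2400) is just an example; point --start_checkpoint at whichever checkpoint step actually exists in tf/logs/single_fc/train_progress/.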

And finally, let's predict something with our graph:

python tf/src/label_wav.py \
--graph=tf/models/single_fc_032.pb \
--wav=data/left/a5d485dc_nohash_0.wav \
--labels=tf/logs/single_fc/train_progress/single_fc_labels.txt

which gives us lousy (as expected) scores:

off (score = 0.44374)
up (score = 0.22972)
_silence_ (score = 0.16772)
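You can also query the frozen graph from Python directly instead of going through label_wav.py. Here is a minimal sketch against the TensorFlow 1.x API; the tensor names wav_data:0 and labels_softmax:0 are assumptions taken from the upstream speech_commands example this layout follows:

# Minimal sketch of querying the frozen graph from Python.
import tensorflow as tf

with tf.gfile.GFile("tf/models/single_fc_032.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with open("data/left/a5d485dc_nohash_0.wav", "rb") as f:
    wav_data = f.read()

with open("tf/logs/single_fc/train_progress/single_fc_labels.txt") as f:
    labels = [line.strip() for line in f]

with tf.Session() as sess:
    tf.import_graph_def(graph_def, name="")
    # Feed raw WAV bytes in, get softmax scores over the labels out.
    scores, = sess.run("labels_softmax:0", feed_dict={"wav_data:0": wav_data})

for label, score in sorted(zip(labels, scores), key=lambda p: -p[1])[:3]:
    print("%s (score = %.5f)" % (label, score))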

Info: In models/ there is also a "frozen" convolutional network model, conv0875.pb, that we trained; you can test it to confirm that it really does perform better than the rest, without having to train it for days yourself.

Additionally, you can record your own sounds and test them; just make sure they are 1 s long and in the right format (16 kHz, 16-bit mono WAV).
Hint: Try with

arecord test.wav -r 16000 -d 1 -f S16_LE
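Then run your recording through the pre-trained convolutional model, for example (the conv labels path is a guess following the same layout as the single_fc example above; point --labels at whatever labels file your setup actually has):

python tf/src/label_wav.py \
--graph=tf/models/conv0875.pb \
--wav=test.wav \
--labels=tf/logs/conv/train_progress/conv_labels.txt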

octave

In this directory we implement a simple Octave function to get a spectrogram from a .wav file.

About

ML project for simple speech commands
