ThomasDelteil and indhub Updates to several examples (#13068)
* Minor updates to several examples

* fix typo

* update following review
Latest commit 012288f Nov 8, 2018
Type Name Latest commit message Commit time
resources ADD speech_recognition example (#5954) Apr 24, 2017
README.md Updates to several examples (#13068) Nov 8, 2018
arch_deepspeech.py Add license header (#7379) Aug 8, 2017
config_util.py Add license header (#7379) Aug 8, 2017
deepspeech.cfg Fix speech recognition example (#12291) Aug 30, 2018
default.cfg Fix speech recognition example (#12291) Aug 30, 2018
flac_to_wav.sh Add license header (#7379) Aug 8, 2017
label_util.py Updates to several examples (#13068) Nov 8, 2018
log_util.py Updates to several examples (#13068) Nov 8, 2018
main.py Updates to several examples (#13068) Nov 8, 2018
singleton.py Updates to several examples (#13068) Nov 8, 2018
stt_bi_graphemes_util.py Add license header (#7379) Aug 8, 2017
stt_bucketing_module.py Add license header (#7379) Aug 8, 2017
stt_datagenerator.py Updates to several examples (#13068) Nov 8, 2018
stt_io_bucketingiter.py Add license header (#7379) Aug 8, 2017
stt_io_iter.py Updates to several examples (#13068) Nov 8, 2018
stt_layer_batchnorm.py Add license header (#7379) Aug 8, 2017
stt_layer_conv.py Add license header (#7379) Aug 8, 2017
stt_layer_fc.py Add license header (#7379) Aug 8, 2017
stt_layer_gru.py Add license header (#7379) Aug 8, 2017
stt_layer_lstm.py Add license header (#7379) Aug 8, 2017
stt_layer_slice.py Add license header (#7379) Aug 8, 2017
stt_layer_warpctc.py Add license header (#7379) Aug 8, 2017
stt_metric.py Updates to several examples (#13068) Nov 8, 2018
stt_utils.py Updates to several examples (#13068) Nov 8, 2018
train.py Updates to several examples (#13068) Nov 8, 2018

README.md

deepSpeech.mxnet: Rich Speech Example

This example, based on Baidu's DeepSpeech2, helps you build Speech-To-Text (STT) models at scale using:

  • CNNs, fully connected networks, (Bi-)RNNs, (Bi-)LSTMs, and (Bi-)GRUs for network layers,
  • batch normalization and dropout for training efficiency,
  • and Warp CTC for loss calculation.

Moreover, to build your own STT models, you only need to edit a configuration file, not the code itself.


Motivation

This example is intended to guide people who want to build practical STT models with MXNet. With the features and convenience described above, you can build your own speech recognition models more easily than with earlier examples.


Environments

  • MXNet version: 0.9.5+
  • GPU memory size: 2.4GB+
  • Install mxboard (for logging) and soundfile:
pip install mxboard
pip install soundfile
  • Warp CTC: Follow these instructions to compile Baidu's Warp CTC. (Note: if you are using a V100 GPU, make sure to apply this fix.)
  • You need to compile MXNet with WarpCTC; follow the instructions here.
  • You might need to set LD_LIBRARY_PATH to the right path if MXNet fails to find your libwarpctc.so.
  • We strongly recommend that you first test a model with a small network.

How it works

Preparing data

Input data are described in a JSON file, Libri_sample.json, with one JSON object per line, as follows.

{"duration": 2.9450625, "text": "and sharing her house which was near by", "key": "./Libri_sample/3830-12531-0030.wav"}
{"duration": 3.94, "text": "we were able to impart the information that we wanted", "key": "./Libri_sample/3830-12529-0005.wav"}

You can download the two WAV files above from this link. Put them under /path/to/yourproject/Libri_sample/.
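As a rough illustration of the description format above, the following sketch builds one record per WAV file and writes the records one JSON object per line. It uses only the Python standard library; the function names (describe_wav, write_description, read_description) are hypothetical and are not part of this example's code, and the sketch assumes uncompressed PCM WAV input.

```python
import json
import wave


def describe_wav(path, text):
    """Return one description record for a WAV file (keys as in the sample above)."""
    with wave.open(path, "rb") as wav:
        # Duration in seconds = number of frames / frames per second.
        duration = wav.getnframes() / float(wav.getframerate())
    return {"duration": duration, "text": text, "key": path}


def write_description(records, json_path):
    """Write one JSON object per line, matching the layout shown above."""
    with open(json_path, "w") as out:
        for record in records:
            out.write(json.dumps(record) + "\n")


def read_description(json_path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(json_path) as src:
        return [json.loads(line) for line in src if line.strip()]
```

Reading the file back with read_description() is a cheap way to confirm that every line parses and carries the three keys before starting a long training run.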

Setting the configuration file

[Notice] The included configuration file "default.cfg" describes DeepSpeech2 with slight changes. You can test the original DeepSpeech2 ("deepspeech.cfg") by changing a few lines of the cfg file:


[common]
...
learning_rate = 0.0003
# constant learning rate annealing by factor
learning_rate_annealing = 1.1
optimizer = sgd
...
is_bi_graphemes = True
...
[arch]
...
num_rnn_layer = 7
num_hidden_rnn_list = [1760, 1760, 1760, 1760, 1760, 1760, 1760]
num_hidden_proj = 0
num_rear_fc_layers = 1
num_hidden_rear_fc_list = [1760]
act_type_rear_fc_list = ["relu"]
...
[train]
...
learning_rate = 0.0003
# constant learning rate annealing by factor
learning_rate_annealing = 1.1
optimizer = sgd
...
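The settings above can be read and sanity-checked with Python's standard configparser. This is only a sketch: the example's own config_util.py may parse the file differently, the function names are hypothetical, and the rule that each list length should match its layer count is an assumption inferred from the values shown (7 RNN layers with 7 sizes, 1 rear FC layer with 1 size). The "..." placeholder lines are omitted because they are not valid configuration syntax.

```python
import configparser
import json

# A trimmed copy of the settings shown above, without the "..." placeholders.
SAMPLE_CFG = """
[common]
is_bi_graphemes = True

[arch]
num_rnn_layer = 7
num_hidden_rnn_list = [1760, 1760, 1760, 1760, 1760, 1760, 1760]
num_hidden_proj = 0
num_rear_fc_layers = 1
num_hidden_rear_fc_list = [1760]
act_type_rear_fc_list = ["relu"]

[train]
learning_rate = 0.0003
# constant learning rate annealing by factor
learning_rate_annealing = 1.1
optimizer = sgd
"""


def load_settings(cfg_text):
    """Parse the cfg text and convert values to Python types."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    arch = parser["arch"]
    return {
        "is_bi_graphemes": parser.getboolean("common", "is_bi_graphemes"),
        "num_rnn_layer": arch.getint("num_rnn_layer"),
        # List values are written in JSON-compatible syntax, so json.loads works.
        "num_hidden_rnn_list": json.loads(arch["num_hidden_rnn_list"]),
        "num_rear_fc_layers": arch.getint("num_rear_fc_layers"),
        "num_hidden_rear_fc_list": json.loads(arch["num_hidden_rear_fc_list"]),
        "act_type_rear_fc_list": json.loads(arch["act_type_rear_fc_list"]),
        "learning_rate": parser.getfloat("train", "learning_rate"),
        "optimizer": parser.get("train", "optimizer"),
    }


def check_consistency(settings):
    """Return a list of problems (assumed rule: one list entry per layer)."""
    problems = []
    if len(settings["num_hidden_rnn_list"]) != settings["num_rnn_layer"]:
        problems.append("num_hidden_rnn_list length != num_rnn_layer")
    if len(settings["num_hidden_rear_fc_list"]) != settings["num_rear_fc_layers"]:
        problems.append("num_hidden_rear_fc_list length != num_rear_fc_layers")
    if len(settings["act_type_rear_fc_list"]) != settings["num_rear_fc_layers"]:
        problems.append("act_type_rear_fc_list length != num_rear_fc_layers")
    return problems
```

Calling check_consistency(load_settings(SAMPLE_CFG)) returns an empty list for the values above; a mismatch, such as changing num_rnn_layer without updating the size list, shows up as a readable message instead of a failure deep inside network construction.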

Run the example

Train

cd /path/to/your/project/
mkdir checkpoints
mkdir log
python main.py --configfile default.cfg

Checkpoints of the model will be saved at every n-th epoch.

Load

You can (re)train saved models by loading checkpoints (starting from 0). To do so, you only need to modify two lines of the file "default.cfg".

...
[common]
# mode can be one of the following: train, predict, load
mode = load
...
model_file = 'file name of your model saved'
...

Predict

You can run prediction (or testing) on audio files by specifying the mode, model, and test data in the file "default.cfg".

...
[common]
# mode can be one of the following: train, predict, load
mode = predict
...
model_file = 'file name of your model to be tested'
...
[data]
...
test_json = 'a JSON file describing the test audio files'
...

Run the following command after making all the modifications explained above.
python main.py --configfile default.cfg
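If you switch between train, load, and predict often, the edits described above can be scripted instead of made by hand. The sketch below uses Python's standard configparser; the function name with_mode is hypothetical, and it assumes that mode and model_file live in [common] and test_json in [data], as the snippets above suggest. Note that configparser drops comments when it writes a file, so it is safer to write the result to a new file than to overwrite default.cfg. Whether values need the surrounding quotes shown above depends on how the example parses them, so pass values in the same form your cfg file already uses.

```python
import configparser
import io

VALID_MODES = ("train", "predict", "load")


def with_mode(cfg_text, mode, model_file=None, test_json=None):
    """Return new cfg text with the mode (and optional file settings) replaced."""
    if mode not in VALID_MODES:
        raise ValueError("mode must be one of %s" % ", ".join(VALID_MODES))
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # Assumption: these keys sit in [common], as in the snippets above.
    parser["common"]["mode"] = mode
    if model_file is not None:
        parser["common"]["model_file"] = model_file
    if test_json is not None:
        if not parser.has_section("data"):
            parser.add_section("data")
        parser["data"]["test_json"] = test_json
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()
```

The returned text can be saved under a new name and passed to main.py through --configfile in the same way as default.cfg.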

Train and test your own models

Train and test your own models by preparing two files.

  1. A new configuration file, e.g., custom.cfg, corresponding to the file 'default.cfg'. The new file should specify the items under the '[arch]' section of the original file.
  2. A new implementation file, e.g., arch_custom.py, corresponding to the file 'arch_deepspeech.py'. The new file should implement two functions, prepare_data() and arch(), for building the networks described in the new configuration file.

Run the following command after preparing the files.

python main.py --configfile custom.cfg --archfile arch_custom
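Before launching a run with a custom architecture file, it can help to confirm that the module exposes the two required functions. The sketch below is a hypothetical pre-flight check, not part of this example's code. It checks only that prepare_data and arch exist and are callable; it makes no assumption about their arguments, which should follow arch_deepspeech.py.

```python
import importlib

# The two functions the custom architecture file is expected to implement.
REQUIRED_FUNCTIONS = ("prepare_data", "arch")


def missing_functions(module):
    """Return the names of required functions the module does not define."""
    return [name for name in REQUIRED_FUNCTIONS
            if not callable(getattr(module, name, None))]


def check_arch_file(module_name):
    """Import the module by name (e.g. "arch_custom") and verify its interface."""
    module = importlib.import_module(module_name)
    missing = missing_functions(module)
    if missing:
        raise AttributeError(
            "%s is missing: %s" % (module_name, ", ".join(missing)))
    return module
```

For example, check_arch_file("arch_custom") run from the project directory raises a clear error if either function is absent.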

Furthermore

You can prepare the full LibriSpeech dataset by following the instructions at https://github.com/baidu-research/ba-dls-deepspeech
To avoid a bug, replace Baidu's flac_to_wav.sh script with the flac_to_wav.sh in this repository.

git clone https://github.com/baidu-research/ba-dls-deepspeech
cd ba-dls-deepspeech
./download.sh
cp -f /path/to/example/flac_to_wav.sh ./
./flac_to_wav.sh
python create_desc_json.py /path/to/ba-dls-deepspeech/LibriSpeech/train-clean-100 train_corpus.json
python create_desc_json.py /path/to/ba-dls-deepspeech/LibriSpeech/dev-clean validation_corpus.json
python create_desc_json.py /path/to/ba-dls-deepspeech/LibriSpeech/test-clean test_corpus.json