Performance of deep speech is bad #6418

xinq2016 · 2017-05-24T07:55:57Z

Environment info

Operating System: ubuntu 14.04
GPU: GTX 1080

Package used (Python/R/Scala/Julia): python

MXNet version: 0.9.5

Or if installed from source: git clone https://github.com/dmlc/mxnet.git ~/mxnet --recursive

Error Message:

I followed the default configuration script in the speech recognition example, using the LibriSpeech corpus (clean-360 subset).

After two epochs, the CER is still poor, at over 70%.
Training is also very slow: two epochs on the clean-360 subset took a whole week.

Minimum reproducible example

Use default.cfg to train the Deep Speech model.

Steps to reproduce

python main.py --configfile default.cfg

What have you tried to solve it?

Are there any results or performance figures you have measured on LibriSpeech?

piiswrong · 2017-05-27T04:37:10Z

@Soonhwan-Kwon

Soonhwan-Kwon · 2017-05-27T05:13:04Z

Training only on clean-360 can give poor results. I'm training on the whole dataset (train + train_360 + train_500) and got a CER of 0.362836, 0.207333, 0.173531, 0.156486 at epochs 0, 1, 2, 3 on test-clean, and 0.5152, 0.2408, 0.1883, 0.1622 on the 960 hours of training data (sampled every 100 batches). It is slow for now because it doesn't use the fused RNN cell or a variable-length approach, but we are working on improving it.
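For readers comparing the numbers above: the thread does not show how the example computes CER, so the following is only a minimal sketch of the usual definition (character-level edit distance divided by the reference length). The function names `edit_distance` and `cer` are illustrative and are not taken from the example's code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert/delete/substitute = 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance normalised by reference length."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    print(cer("hello", "hallo"))
```

With this definition a CER of 0.156486 means roughly 16 character edits per 100 reference characters.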

Soonhwan-Kwon · 2017-05-27T05:26:47Z

Soonhwan-Kwon · 2017-05-27T05:29:01Z

There is also deepspeech.cfg for real use; default.cfg is a simplified version meant only for a limited sample.

xinq2016 · 2017-05-31T03:05:11Z

@Soonhwan-Kwon, @piiswrong
Many thanks. I will try it again.
I still have one small question.
The latest deepspeech.cfg sets the parameter is_bi_graphemes = True.
In that case main.py tries to load unicodemap_en_baidu_bi_graphemes.csv, but there is no such file:
    if language == "en":
        if is_bi_graphemes:
            try:
                labelUtil.load_unicode_set("resources/unicodemap_en_baidu_bi_graphemes.csv")
            except:
                raise Exception("There is no resources/unicodemap_en_baidu_bi_graphemes.csv. Please set overwrite_meta_files at train section True")
        else:
            labelUtil.load_unicode_set("resources/unicodemap_en_baidu.csv")
    else:
        raise Exception("Error: Language Type: %s" % language)
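The exception message in the snippet above suggests setting overwrite_meta_files to True in the train section. The following is a small sketch of how one might check that setting before launching training. Only the [train] section name and the two option names come from the thread; placing is_bi_graphemes under [common] and the helper name `meta_files_will_be_regenerated` are assumptions made for illustration.

```python
import configparser

# Hypothetical excerpt of a config file. The [common] section for
# is_bi_graphemes is an assumption; [train] comes from the error message.
SAMPLE_CFG = """
[common]
is_bi_graphemes = True

[train]
overwrite_meta_files = True
"""


def meta_files_will_be_regenerated(cfg_text):
    """Return True if the [train] section asks for meta files to be overwritten."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.getboolean("train", "overwrite_meta_files", fallback=False)


if __name__ == "__main__":
    print(meta_files_will_be_regenerated(SAMPLE_CFG))
```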

xinq2016 · 2017-06-27T05:53:05Z

@Soonhwan-Kwon, after the sorted epoch, the random epochs bring little improvement in my training, as shown in the figure (4 epochs, including the first SortaGrad epoch). Can you give me some advice, please?

Soonhwan-Kwon · 2017-07-08T02:12:18Z

I found that Baidu's flac-to-wav script has a bug with short audio: it generates a sound file in which the content is repeated 6 times or more. For example, if the sound file was "Hello is there", it generates "Hello is there Hello is there Hello is there Hello is there Hello is there Hello is there". This makes the label wrong and can affect the model's performance (both speed and accuracy). So I changed the flac-to-wav converter from avconv to sox, and found that it generates identical outputs but fixes the bug. I put the fix to flac_to_wav.sh in ai-adv-lab/deepspeech.mxnet@ee99375 for now. I am still in the middle of testing, because my team member and I only found this bug yesterday; it may take time to confirm that it has no unintended effect on the model.
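One way to look for affected files is to compare each converted wav's duration with the duration of its source flac. The sketch below is not the fix from the commit mentioned above; the helper names and the 5% tolerance are assumptions. It builds a plain `sox input output` conversion command and reads the wav duration with the standard `wave` module. The expected duration would have to come from the original flac file, for example from `soxi -D`.

```python
import wave


def sox_command(flac_path, wav_path):
    """Argument list for a basic sox conversion (run it with subprocess.run)."""
    return ["sox", flac_path, wav_path]


def wav_duration_seconds(wav_path):
    """Duration of a PCM wav file in seconds."""
    with wave.open(wav_path, "rb") as w:
        return w.getnframes() / float(w.getframerate())


def looks_repeated(wav_seconds, expected_seconds, tolerance=0.05):
    """Flag a wav that is noticeably longer than its source audio."""
    return wav_seconds > expected_seconds * (1.0 + tolerance)
```

A file flagged by `looks_repeated` would be a candidate for re-conversion and for re-checking its label.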

Soonhwan-Kwon · 2017-07-11T06:44:55Z

@xinq2016 I opened a pull request, but it has not been merged yet. Please see https://github.com/samsungsds-rnd/deepspeech.mxnet for now. There is also a bi-graphemes option that handles labels as bi-graphemes, and the latest 15% result was obtained with that option.
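To illustrate what bi-grapheme labels look like, here is a minimal sketch of the non-overlapping segmentation described in the Deep Speech 2 paper: each word is split into pairs of characters, an odd trailing character stays a single unit, and the space between words is its own unit. This is an assumption about the general scheme, not a copy of the repository's label code, and the function name `to_bi_graphemes` is illustrative.

```python
def to_bi_graphemes(sentence):
    """Split a sentence into non-overlapping bi-grapheme units."""
    units = []
    for k, word in enumerate(sentence.split()):
        if k > 0:
            units.append(" ")  # the word separator is a unit of its own
        for i in range(0, len(word), 2):
            units.append(word[i:i + 2])  # a pair, or a single trailing character
    return units


if __name__ == "__main__":
    print(to_bi_graphemes("the cat sat"))
```

Because each output unit covers up to two characters, the label sequence is shorter than with single graphemes, at the cost of a larger output vocabulary.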

szha · 2017-10-29T00:26:29Z

This issue is closed due to lack of activity in the last 90 days. Feel free to ping me to reopen if this is still an active issue. Thanks!
Also, do please check out our forum (and Chinese version) for general "how-to" questions.

megaSpoon · 2018-06-07T22:59:37Z

@Soonhwan-Kwon Could you give me some guidance on how to draw a performance curve like the one you posted?

ThomasDelteil · 2019-02-16T00:42:38Z

@megaSpoon have a look at mxboard https://github.com/awslabs/mxboard
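As a rough sketch of what that could look like, the snippet below logs one CER value per epoch with mxboard's `SummaryWriter.add_scalar`, so that the curve can be viewed in TensorBoard. The values are the test-clean CERs reported earlier in this thread; the helper name, tag, and log directory are illustrative, and the function simply returns False when mxboard is not installed.

```python
# CER per epoch on test-clean, as reported earlier in this thread.
test_clean_cer = [0.362836, 0.207333, 0.173531, 0.156486]


def log_cer_curve(values, logdir="./logs", tag="test_clean_cer"):
    """Write one scalar per epoch; return False if mxboard is unavailable."""
    try:
        from mxboard import SummaryWriter  # pip install mxboard
    except ImportError:
        return False
    with SummaryWriter(logdir=logdir) as sw:
        for epoch, value in enumerate(values):
            sw.add_scalar(tag=tag, value=value, global_step=epoch)
    return True


if __name__ == "__main__":
    log_cer_curve(test_clean_cer)
```

After running it, pointing TensorBoard at the same directory (`tensorboard --logdir=./logs`) should show the curve under the chosen tag.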

xinq2016 closed this as completed Jun 6, 2017

xinq2016 reopened this Jun 27, 2017

Soonhwan-Kwon mentioned this issue Jul 4, 2017

[example]add bucketing and batchnorm scheme for speech_recognition example #6923

Closed

Soonhwan-Kwon mentioned this issue Jul 10, 2017

[example]add bucketing/batchnorm and improved performance for speech_recognition example #6971

Merged

Some-random mentioned this issue Jul 22, 2017

How many epochs does DeepSpeech2 need to converge on LibriSpeech ai-adv-lab/deepspeech.mxnet#6

Closed

szha closed this as completed Oct 29, 2017
