This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Performance of deep speech is bad #6418

Closed
xinq2016 opened this issue May 24, 2017 · 11 comments

Comments

@xinq2016

xinq2016 commented May 24, 2017

Environment info

Operating System: ubuntu 14.04
GPU: GTX 1080

Package used (Python/R/Scala/Julia): python

MXNet version: 0.9.5

Or if installed from source: git clone https://github.com/dmlc/mxnet.git ~/mxnet --recursive

Error Message:

I followed the default configuration in the speech recognition example, using the LibriSpeech corpus (clean-360 subset).

  1. After two epochs, the CER is still poor, at 70%+.
    (four images attached; not preserved here)
  2. Training is very slow: two epochs of the clean-360 subset took a whole week.

Minimum reproducible example

Use default.cfg to train the Deep Speech model.

Steps to reproduce

  1. python main.py --configfile default.cfg

What have you tried to solve it?

  1. Are there any reference results that you have obtained on LibriSpeech?
@piiswrong
Contributor

@Soonhwan-Kwon

@Soonhwan-Kwon
Contributor

Soonhwan-Kwon commented May 27, 2017

Training only on clean-360 can give poor results. I'm training on the whole dataset (train+train_360+train_500) and got CERs of 0.362836, 0.207333, 0.173531, 0.156486 at epochs 0, 1, 2, 3 on test-clean, and 0.5152, 0.2408, 0.1883, 0.1622 on the 960 hours of training data (sampled every 100 batches). It is slow for now because it doesn't use a fused RNN cell or a variable-length approach, but we are working on improving that.
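For readers comparing against the CER figures above: the thread does not show how the example computes CER, so the following is only a minimal sketch of one common definition (total character-level edit distance divided by total reference characters). The function names `edit_distance` and `cer` are illustrative and are not taken from the example's code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using a rolling DP row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total character edits / total reference characters."""
    total_edits = sum(edit_distance(r, h)
                      for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars if total_chars else 0.0
```

Under this definition a CER of 0.156 would mean roughly 16 character edits per 100 reference characters; whether the example normalizes per utterance or over the whole corpus is not stated in the thread.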

@Soonhwan-Kwon
Contributor

(image attached; not preserved here)

@Soonhwan-Kwon
Contributor

Soonhwan-Kwon commented May 27, 2017

Also, deepspeech.cfg is the configuration for real use; default.cfg is a simplified version intended only for a limited sample.

@xinq2016
Author

xinq2016 commented May 31, 2017

@Soonhwan-Kwon, @piiswrong
Many thanks. I will try it again.
I still have one small question.
In the latest deepspeech.cfg, the parameter is_bi_graphemes = True is set,
so main.py tries to load unicodemap_en_baidu_bi_graphemes.csv, but there is no such file:
    if language == "en":
        if is_bi_graphemes:
            try:
                labelUtil.load_unicode_set("resources/unicodemap_en_baidu_bi_graphemes.csv")
            except:
                raise Exception("There is no resources/unicodemap_en_baidu_bi_graphemes.csv. Please set overwrite_meta_files at train section True")
        else:
            labelUtil.load_unicode_set("resources/unicodemap_en_baidu.csv")
    else:
        raise Exception("Error: Language Type: %s" % language)
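The exception message in the snippet above suggests setting overwrite_meta_files to True in the train section so that the missing file can be generated. As a rough pre-flight check, here is a minimal sketch using Python's standard `configparser`. The helper names are hypothetical, and the `[common]` section in the sample text is an assumption (only the train section is named in the message), so the lookup searches every section.

```python
import configparser


def find_option(cfg, key):
    """Return (section, value) for the first section defining key, else (None, None)."""
    for section in cfg.sections():
        if cfg.has_option(section, key):
            return section, cfg.get(section, key)
    return None, None


def is_true(value):
    """Interpret a config string such as 'True' as a boolean."""
    return str(value).strip().lower() == "true"


def needs_meta_regeneration(cfg):
    """True when bi-graphemes are enabled but overwrite_meta_files is not."""
    _, bi_graphemes = find_option(cfg, "is_bi_graphemes")
    _, overwrite = find_option(cfg, "overwrite_meta_files")
    return is_true(bi_graphemes) and not is_true(overwrite)


# Hypothetical config text for illustration; not copied from deepspeech.cfg.
sample = """
[common]
is_bi_graphemes = True

[train]
overwrite_meta_files = False
"""
cfg = configparser.ConfigParser()
cfg.read_string(sample)
print(needs_meta_regeneration(cfg))
```

To run the same check on a real file, `cfg.read("deepspeech.cfg")` could replace `read_string`, assuming the file is in the INI format that `configparser` understands.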

@xinq2016 xinq2016 closed this as completed Jun 6, 2017
@xinq2016 xinq2016 reopened this Jun 27, 2017
@xinq2016
Author

xinq2016 commented Jun 27, 2017

@Soonhwan-Kwon, after the sorted epoch, the random epochs show little improvement in my training, as shown in the figure (4 epochs, including the first SortaGrad epoch). Could you give me some advice, please?
(figure "cer" attached; not preserved here)
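For readers unfamiliar with the "sorted epoch" and "random epochs" mentioned here: SortaGrad is usually described as presenting utterances in order of increasing length during the first epoch and in random order afterwards. The following is a minimal sketch of that ordering only; `epoch_order` and its parameters are illustrative names, not the example's actual data iterator.

```python
import random


def epoch_order(durations, epoch, seed=0):
    """Return utterance indices for one epoch.

    Epoch 0 is sorted by duration (shortest first); later epochs are shuffled.
    """
    indices = list(range(len(durations)))
    if epoch == 0:
        return sorted(indices, key=lambda i: durations[i])
    rng = random.Random(seed + epoch)  # reproducible shuffle per epoch
    rng.shuffle(indices)
    return indices
```

Because only the first epoch is sorted, a flat curve in the later epochs may have causes unrelated to the ordering itself, such as the mislabeled audio discussed later in this thread.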

@Soonhwan-Kwon
Contributor

Soonhwan-Kwon commented Jul 8, 2017

I found that Baidu's FLAC-to-WAV script has a bug with short-duration files: it generates a sound file in which the audio is repeated 6 times or more. For example, if the sound file was "Hello is there", it generates "Hello is there Hello is there Hello is there Hello is there Hello is there Hello is there". This makes the label wrong and can affect the model's performance (both speed and accuracy). So I changed the FLAC-to-WAV converter from avconv to sox, and found that it generates otherwise identical output but fixes the bug. I have put the fix to flac_to_wav.sh in ai-adv-lab/deepspeech.mxnet@ee99375 for now. I am still in the middle of testing, though, because my team member and I only found this bug yesterday; it may take time to confirm that the change has no unintended effect on the model.
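One way to look for files affected by a repetition bug like the one described above is to flag WAV files whose duration is implausibly long for their transcript. This is only a heuristic sketch using the standard `wave` module: the function names are hypothetical, and the `max_sec_per_char` threshold is an arbitrary assumption that would need tuning on real data.

```python
import wave


def wav_duration_seconds(path_or_file):
    """Duration of a PCM WAV file in seconds, computed from its header."""
    with wave.open(path_or_file, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def flag_suspicious(durations, transcripts, max_sec_per_char=0.25):
    """Return indices whose audio is implausibly long for the transcript length."""
    flagged = []
    for i, (seconds, text) in enumerate(zip(durations, transcripts)):
        n_chars = max(len(text), 1)  # avoid division by zero on empty labels
        if seconds / n_chars > max_sec_per_char:
            flagged.append(i)
    return flagged
```

For instance, a 12-second file labeled only "hello is there" would be flagged at the default threshold, while a 2-second file with the same label would not.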

@Soonhwan-Kwon
Contributor

@xinq2016 I opened a pull request, but it has not been merged yet. Please see https://github.com/samsungsds-rnd/deepspeech.mxnet for now. There is also a bi-graphemes option for handling labels as bi-graphemes, and the latest result of about 15% CER was obtained using that option.

@szha
Member

szha commented Oct 29, 2017

This issue is closed due to lack of activity in the last 90 days. Feel free to ping me to reopen if this is still an active issue. Thanks!
Also, do please check out our forum (and Chinese version) for general "how-to" questions.

@szha szha closed this as completed Oct 29, 2017
@megaSpoon

@Soonhwan-Kwon Could you do me a favor and give me some guidance on drawing a performance curve like the one you posted?

@ThomasDelteil
Contributor

@megaSpoon Have a look at MXBoard: https://github.com/awslabs/mxboard
