Performance of deep speech is bad #6418
Comments
Training only on clean-360 can give poor results. I'm training on the whole dataset (train + train_360 + train_500) and got CERs of 0.362836, 0.207333, 0.173531, 0.156486 at epochs 0, 1, 2, 3 on test-clean, and 0.5152, 0.2408, 0.1883, 0.1622 on the 960 hours of training data (sampled every 100 batches). It is slow for now because it doesn't use the fused RNN cell or a variable-length approach, but we are working on improvements.
Also, deepspeech.cfg is meant for real use; default.cfg is a simplified version intended only for the limited sample.
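For readers comparing against the CER figures above, here is a minimal sketch of how a character error rate is commonly computed: total Levenshtein distance divided by total reference characters. This is an assumption about the metric; the example's own scoring code may normalise differently (for instance per utterance), and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total edit distance / total reference characters."""
    total_dist = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_dist / total_chars
```

Under this definition, a CER of 0.156 means roughly one character edit for every six or seven reference characters.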
@Soonhwan-Kwon, @piiswrong
@Soonhwan-Kwon, after the sorted epoch, the random epochs show little improvement in my training, as shown in the figure (4 epochs, including the first SortaGrad epoch). Can you give me some advice, please?
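For context on the "sorted epoch" mentioned here, this is a minimal sketch of the SortaGrad ordering: utterances are visited shortest-first in the first epoch and in random order afterwards. The function name, the `durations` list and the seeding scheme are hypothetical; the example's data iterator may batch and order differently.

```python
import random


def epoch_order(durations, epoch, seed=0):
    """Return utterance indices in the order they would be visited in `epoch`."""
    indices = list(range(len(durations)))
    if epoch == 0:
        # SortaGrad epoch: shortest utterances first.
        return sorted(indices, key=lambda i: durations[i])
    # Later epochs: random order, seeded per epoch so runs are repeatable.
    rng = random.Random(seed + epoch)
    rng.shuffle(indices)
    return indices
```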
I found that Baidu's FLAC-to-WAV script has a bug with short recordings: it generates a sound file in which the audio is repeated six or more times. For example, if the recording was "Hello is there", the output is "Hello is there Hello is there Hello is there Hello is there Hello is there Hello is there". This makes the label wrong and can affect the model's performance (both speed and accuracy). So I changed the FLAC-to-WAV converter from avconv to sox, and found that it generates otherwise identical outputs while fixing the bug. I have put the fix to flac_to_wav.sh in ai-adv-lab/deepspeech.mxnet@ee99375 for now. I am still in the middle of testing, because my team member and I only found this bug yesterday; it may take time to confirm that the change has no unintended effect on the model.
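A minimal sketch of the sox-based conversion together with a duration sanity check for the repeated-audio symptom described above. The helper names and the 0.5 s tolerance are hypothetical, and `expected_seconds` is assumed to be obtained separately (for example from the source FLAC's metadata); this is not the content of the linked commit.

```python
import subprocess
import wave


def sox_command(flac_path, wav_path):
    """Build the basic sox invocation: `sox input.flac output.wav`."""
    return ["sox", flac_path, wav_path]


def convert(flac_path, wav_path):
    """Run sox; raises CalledProcessError if the conversion fails."""
    subprocess.run(sox_command(flac_path, wav_path), check=True)


def wav_duration_seconds(wav_path):
    """Duration of a WAV file, from its frame count and sample rate."""
    with wave.open(wav_path, "rb") as w:
        return w.getnframes() / w.getframerate()


def looks_repeated(wav_path, expected_seconds, tolerance=0.5):
    """True if the WAV is noticeably longer than the expected duration."""
    return wav_duration_seconds(wav_path) > expected_seconds + tolerance
```

Running `looks_repeated` over the converted files is one way to spot utterances whose audio was duplicated before they reach training.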
@xinq2016 I opened a pull request, but it has not been merged yet. Please see https://github.com/samsungsds-rnd/deepspeech.mxnet for now. There is also a bi-graphemes option that treats labels as bi-graphemes, and the latest result of 15% (CER) was obtained with that option.
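To illustrate what a bi-grapheme label might look like, here is a minimal sketch that splits each word into two-character units and keeps a space unit between words. This is an assumed scheme for illustration only; the function name is hypothetical and the repository's option may handle odd-length words or spaces differently.

```python
def to_bigraphemes(text):
    """Split a transcript into two-character units, word by word."""
    units = []
    words = text.split()
    for k, word in enumerate(words):
        # Take characters two at a time; an odd-length word ends with a single character.
        units.extend(word[i:i + 2] for i in range(0, len(word), 2))
        if k < len(words) - 1:
            units.append(" ")  # keep the word boundary as its own unit
    return units
```

Under this scheme the label sequence is roughly half as long as a character-level one, at the cost of a larger output vocabulary.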
This issue is closed due to lack of activity in the last 90 days. Feel free to ping me to reopen if this is still an active issue. Thanks!
@Soonhwan-Kwon Could you give me some guidance on plotting a performance curve like the one you posted?
@megaSpoon Have a look at mxboard: https://github.com/awslabs/mxboard
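A minimal sketch of logging a per-epoch metric so that it can be viewed as a curve. It assumes a writer object exposing `add_scalar(tag, value, global_step)`, which is the interface of mxboard's `SummaryWriter`; the helper name and tag are hypothetical, and this is not necessarily how the curve earlier in the thread was drawn.

```python
def log_curve(writer, tag, values):
    """Write one scalar per epoch to a SummaryWriter-like object."""
    for epoch, value in enumerate(values):
        writer.add_scalar(tag=tag, value=value, global_step=epoch)


# Usage with mxboard (assumes mxboard is installed, plus TensorBoard for viewing):
#   from mxboard import SummaryWriter
#   sw = SummaryWriter(logdir="./logs")
#   log_curve(sw, "test_clean_cer", [0.362836, 0.207333, 0.173531, 0.156486])
#   sw.close()
```

Pointing TensorBoard at the same log directory then shows the values as a line chart over epochs.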
Environment info
Operating System: Ubuntu 14.04
GPU: GTX 1080
Package used (Python/R/Scala/Julia): Python
MXNet version: 0.9.5
Or if installed from source: git clone https://github.com/dmlc/mxnet.git ~/mxnet --recursive
Error Message:
I just followed the default configuration in the speech recognition example, using the LibriSpeech corpus (clean-360 subset).
Minimum reproducible example
Use default.cfg to train the Deep Speech model.
Steps to reproduce
What have you tried to solve it?