AN4 fail to reproduce the reported result #193
Comments
It was trained with 5 layers of 800-node GRUs, probably with the augment flag set, and ran for 97 epochs.
Thank you for the reply!
Hi, this repository is fantastic! It worked really well out of the box :) Currently, I'm working on reproducing the pre-trained models' results as well. For the an4 dataset we were able to get down to a WER of 12.35 using an LR of 0.0005 and an anneal rate of 1.0001. All other hyperparameters (noise, epochs = 70, 5-layer GRU, 800 hidden units, etc.) were kept the same. For the ted dataset we got a WER somewhere in the 50s using the default hyperparameters (trained for 70 epochs), which is quite far off from the pre-trained model (can provide more details soon). I'm just getting set up with the librispeech dataset as well, so hopefully I can try running some training experiments for that too. Would it be possible to publish the hyperparameters for the pre-trained models, or something that gets close to them, somewhere for everyone to see? It would be a huge help to everyone! Thank you!
Sorry for being out of the loop here. I'll sync up with @ryanleary and try to solve this issue, but it may take some time. Worst case, I'll retrain the models so that we have the exact hyperparameters used!
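For reference on the anneal rate being discussed: in deepspeech.pytorch the `--learning_anneal` factor divides the learning rate once per epoch. A minimal illustrative sketch (the function name `annealed_lr` is mine, not from the repo):

```python
# Illustrative sketch of the --learning_anneal schedule: the optimizer's
# learning rate is divided by the anneal factor after each epoch.
# The function name is hypothetical, not part of deepspeech.pytorch.
def annealed_lr(initial_lr: float, anneal: float, epoch: int) -> float:
    """Learning rate in effect after `epoch` annealing steps."""
    return initial_lr / (anneal ** epoch)
```

With an anneal rate of 1.0001 the rate decays very slowly, which matches the intuition that a small dataset like an4 benefits from keeping the learning rate nearly constant across 70 epochs.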
Hi, thanks for helping out with this! Not sure if this helps, but here are some results I was able to get:
The numbers for the LibriSpeech other dataset are different from the ones reported for the released models, but I'm unclear on why (perhaps we downloaded only a subset of the other dataset?). We basically just ran
Also, would it be helpful in terms of reproducibility to fix a default random seed in PyTorch? I just thought that if we are re-training some models, it might be worth fixing that as well (it might affect the smaller datasets more than the larger ones). Let me know if there's anything I can do to help! Udit
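Pinning a default seed could look something like the following sketch. The PyTorch/NumPy calls are standard APIs, but the `set_seed` helper itself is hypothetical, not something the repo currently provides:

```python
# Sketch: a single helper to pin all RNG seeds before training.
# random is stdlib; numpy and torch are seeded only if installed.
import random

try:
    import numpy as np
except ImportError:
    np = None
try:
    import torch
except ImportError:
    torch = None

def set_seed(seed: int = 123456) -> None:
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)               # CPU RNG
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)  # all GPU RNGs
```

Note that even with fixed seeds, some CUDA ops are non-deterministic, so runs on GPU may still differ slightly.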
@alugupta thanks for your help! The way WER/CER is calculated has changed to match up more with academic standards (but it is correct for the release branch and the commit it points to, I think). There does seem to be a slight discrepancy between the WER/CER at training time and at testing time, but I'm investigating further. I'll definitely need to update the librispeech script to create separate test scripts for the different test sets LibriSpeech offers, so any contribution there would be awesome :) I agree with the default random seed; that would definitely help! Will create a ticket to track this. Thanks for your help!
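On the WER calculation: the academic-standard definition is the word-level Levenshtein (edit) distance between reference and hypothesis, divided by the reference length. A minimal sketch of that convention (this is illustrative, not the repository's actual implementation):

```python
# Word error rate via word-level Levenshtein distance.
# Illustrative only; not the code used in deepspeech.pytorch.
def levenshtein(a, b):
    """Edit distance between two sequences (single-row DP)."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (a[i - 1] != b[j - 1]))      # substitution
            prev = cur
    return dp[n]

def wer(reference: str, hypothesis: str) -> float:
    """WER in percent: edit distance over reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    return 100.0 * levenshtein(ref, hyp) / max(1, len(ref))
```

CER is the same computation applied to characters instead of words, which is why a change in tokenization alone can shift the reported numbers.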
@SeanNaren As in my issue #200, does the current LibriSpeech dataset contain both clean and other? I'm now training the network on LibriSpeech under the same conditions as @alugupta.
I just set up a new machine and pulled down this repository and the an4 dataset from scratch. GPU was 1x Titan V. Trained with the following command:
python train.py --train_manifest an4/an4_train_manifest.csv \
    --val_manifest an4/an4_val_manifest.csv \
    --num_workers 4 \
    --cuda \
    --learning_anneal 1.01 \
    --augment \
    --epochs 100
Result:
which beats the previously released model. Be sure to include the `--augment` flag. I'll leave this open for another day or so, but will probably close it since the above reproduces the result.
Also @alugupta, I think what you're calling "LibriSpeech other" is actually the combined clean + other set.
@ryanleary Oh right! The numbers I reported earlier for LibriSpeech other are actually LibriSpeech clean + other. That makes sense then, so the model from earlier should be more or less similar to the pre-trained model. Thanks for also rerunning the an4 dataset!

@Minju-Jung I guess that also answers your question: by default the dataset is the combined clean and other. If you specified one or the other when pre-processing, it could be different.

@SeanNaren For the separate test scripts for other and clean, are you imagining that the pre-processing step saves the other and clean subsets separately by default, and then produces three sets of manifests: clean, other, and combined? I could perhaps contribute this if that helps (it might be a while, as I'll be away for the coming week).
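If the clean and other manifests are saved separately, producing the combined manifest is just concatenation, since each manifest is a plain CSV of `wav_path,transcript_path` lines. A hypothetical helper (`combine_manifests` is not part of the repo):

```python
# Hypothetical helper: deepspeech.pytorch manifests are plain CSV lines
# (wav_path,transcript_path), so a combined manifest is the concatenation
# of the per-subset manifests.
def combine_manifests(manifest_paths, out_path):
    """Concatenate several manifest CSVs into one combined manifest."""
    lines = []
    for path in manifest_paths:
        with open(path) as f:
            lines.extend(line.strip() for line in f if line.strip())
    with open(out_path, "w") as f:
        f.write("\n".join(lines) + "\n")
```

With that, the pre-processing script could emit `libri_test_clean_manifest.csv` and `libri_test_other_manifest.csv` (names are illustrative) and derive the combined one from them.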
Based on issue #85, I tried to reproduce the reported result on AN4 with the following command.
python train.py --cuda --visdom --learning_anneal 1.01 --train_manifest data/an4_train_manifest.csv --val_manifest data/an4_val_manifest.csv
And I got the following result.
![result](https://user-images.githubusercontent.com/20314416/33696369-4e9c135c-db45-11e7-9649-225e36165ee3.png)
The best result is:

Dataset | WER | CER
--- | --- | ---
AN4 test | 10.52 | 4.78

But it is still worse than the reported result.
Could you kindly let me know how I can improve the result?