
Pre-trained models tracker #85

Closed · 3 tasks done
SeanNaren opened this issue Jun 12, 2017 · 24 comments

@SeanNaren
Owner

SeanNaren commented Jun 12, 2017

We need to train a DeepSpeech model on each of the datasets provided. The overall architecture is captured by this command:

python train.py  --rnn_type gru --hidden_size 800 --hidden_layers 5 --checkpoint --visdom --train_manifest /path/to/train_manifest.csv --val_manifest /path/to/val_manifest.csv --epochs 100 --num_workers $(nproc) --cuda

In the above command, replace the manifest paths with the correct paths for the dataset. A few notes:

  • No noise injection or other augmentations for the pre-trained models
  • Train until convergence (hopefully you should get a nice smooth training curve!)
  • For smaller datasets, you may need to reduce the learning rate annealing by adding the --learning_anneal flag and setting it to a smaller value, like 1.01 (see the example command below). For larger datasets, the default is fine (up to around 4.5k hours from internal testing on the deepspeech.torch version)
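
For example, a smaller-dataset run (e.g. AN4) might pair the command above with the annealing flag appended; the flag name is assumed to follow the underscore convention of the other options, and the manifest paths are placeholders:

python train.py --rnn_type gru --hidden_size 800 --hidden_layers 5 --learning_anneal 1.01 --checkpoint --train_manifest /path/to/train_manifest.csv --val_manifest /path/to/val_manifest.csv --epochs 100 --num_workers $(nproc) --cuda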

A release of the DeepSpeech package will be cut containing the models, and a reference to the latest release will be added to the README so the latest models are easy to find!

Progress tracker for datasets:

  • AN4
  • TEDLium
  • LibriSpeech

Let me know if you plan on working on running any of these, and I'll update the ticket with details!

@ryanleary
Collaborator

I was planning on adding SortaGrad back in before the training, if that seems reasonable. It definitely seems to help with convergence.

I'll take on an4 and LibriSpeech.
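
(For context, SortaGrad, from the Deep Speech 2 paper, simply presents utterances in order of increasing duration for the first epoch and shuffles afterwards. A minimal sketch of the idea, not this repo's sampler code, assuming each sample carries a duration field:)

```python
import random

# Minimal sketch of the SortaGrad ordering, not the repo's sampler:
# train on utterances sorted by duration for the first epoch, then shuffle.
# `samples` is assumed to be a list of (audio_path, transcript_path, duration).
def epoch_order(samples, epoch):
    if epoch == 0:
        return sorted(samples, key=lambda s: s[2])  # shortest utterances first
    shuffled = list(samples)
    random.shuffle(shuffled)
    return shuffled
```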

@SeanNaren
Owner Author

SeanNaren commented Jun 12, 2017

@ryanleary definitely, does #83 work well for you? Not sure if you had time to test it; it seems like a better solution regarding memory usage! I've got some time now to test, so I'll report back.

EDIT: pulled the branch in now; it does a fair job of keeping memory usage low by bucketing similarly sized utterances and sampling from those buckets instead! Will update the master branch as soon as the changes are addressed.

@ryanleary
Collaborator

This model is kind of large for an4. Having difficulty getting it to converge. Were you able to get it to converge in the past?

@SeanNaren
Owner Author

@ryanleary I'll check once I'm back home, but I have gotten the full architecture to converge (albeit not the best score possible).

@ryanleary
Collaborator

ryanleary commented Jun 12, 2017

That was, presumably, with the torch version that had batch norm though, right?


@SeanNaren
Owner Author

That's true... I'll try this as soon as I can!

Just FYI, the easiest place to contact me directly will probably be the PyTorch Slack channel... send me a direct message there if you need me ASAP! If you need an invite, feel free to email me at my GitHub email address.

@ryanleary
Collaborator

Kicked off a 1000 hr libri training. Will know later tonight if convergence looks promising. Will probably take at least a few days to converge since I only have 2x Titan Xs for it.

@SeanNaren
Owner Author

An update on progress: I'm currently blocking this on an updated architecture that is better suited to production environments and the size of the datasets we're dealing with; the current architecture is slightly too large!

@SeanNaren
Owner Author

SeanNaren commented Jun 12, 2017

Currently sitting at around 40M parameters with these settings:

python benchmark.py --rnn_type gru --hidden_size 800 --hidden_layers 5

@SeanNaren
Owner Author

I've updated the params after speaking to @ryanleary! Will try to get training going on the TEDLium corpus.

@SiddGururani
Contributor

If it's possible, could the people training the models also plot the loss on the validation set? I'm curious to see if it's just me that's getting this negative correlation between the WER and the validation loss (issue #78).

@ryanleary
Collaborator

ryanleary commented Jun 17, 2017

The AN4 model is complete. LibriSpeech is still in progress. Below are the current evaluations:

| Corpus  | Test Set   | Network  | WER    | CER    |
|---------|------------|----------|--------|--------|
| an4     | an4-test   | 5x800gru | 10.521 | 4.772  |
| libri1k | libri-val  | 5x800gru | 20.758 | 7.787  |
| libri1k | libri-test | 5x800gru | 22.088 | 8.194  |
| libri1k | test-clean | 5x800gru | 11.546 | 3.538  |
| libri1k | test-other | 5x800gru | 31.813 | 12.483 |

@SiddGururani
Contributor

@ryanleary Any updates on the librispeech training?

@ryanleary
Collaborator

ryanleary commented Jun 24, 2017

I stopped the training after 44 epochs due to diminishing returns. I think the training may have slowed due to #100. Will probably retrain at some point in the future, but the model is good enough for now.

| Corpus  | Test Set   | Network  | WER    | CER    |
|---------|------------|----------|--------|--------|
| libri1k | libri-val  | 5x800gru | 20.512 | 7.687  |
| libri1k | libri-test | 5x800gru | 21.686 | 8.064  |
| libri1k | test-clean | 5x800gru | 11.203 | 3.362  |
| libri1k | test-other | 5x800gru | 31.312 | 12.286 |

@SeanNaren
Owner Author

@ryanleary thanks! What are libri-val/libri-test? I'm not sure which test sets these are.

@ryanleary
Collaborator

ryanleary commented Jun 24, 2017

libri-val is dev-clean.tar.gz and dev-other.tar.gz combined.
libri-test is test-clean.tar.gz and test-other.tar.gz combined.
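
(Assuming the manifests are the usual CSV lists with one utterance per line, the combined validation manifest could be built with something like the line below; the filenames are just placeholders:)

cat dev-clean_manifest.csv dev-other_manifest.csv > libri_val_manifest.csv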

@SeanNaren SeanNaren removed the Blocked label Jun 24, 2017
@SeanNaren SeanNaren self-assigned this Jun 24, 2017
@slbinilkumar

| Corpus  | Test Set   | Network  | WER    | CER    |
|---------|------------|----------|--------|--------|
| an4     | an4-test   | 5x800gru | 10.521 | 4.772  |
| libri1k | libri-val  | 5x800gru | 20.758 | 7.787  |
| libri1k | libri-test | 5x800gru | 22.088 | 8.194  |
| libri1k | test-clean | 5x800gru | 11.546 | 3.538  |
| libri1k | test-other | 5x800gru | 31.813 | 12.483 |

For libri1k test-clean, what is your validation set, and what training parameters did you use to get this result (libri1k | test-clean | 5x800gru | 11.546 | 3.538)? How long did it take to converge, and how many epochs did you train for to achieve this?

@ryanleary
Collaborator

They're 5 layers of 800-dim bidirectional GRU RNNs. Everything else is more or less default. The libri1k model trained for 44 epochs, which took several days on 2x Titan X GPUs.

The combined dev-clean and dev-other was used for validation, and the results of evaluation on that dev set are listed as libri-val. test-clean is the 'test set, "clean" speech' from http://www.openslr.org/12/.
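
(For illustration only, not the repo's actual model code: the recurrent stack described above is roughly the following. The input feature size of 161 is an assumption, and the real model also has a convolutional front end, batch norm, and a fully connected output layer over the character vocabulary.)

```python
import torch
import torch.nn as nn

# Rough sketch of "5 layers of 800-dim bidirectional GRU"; not the repo's
# model definition. Input feature dimension (161) is assumed.
rnn = nn.GRU(input_size=161, hidden_size=800, num_layers=5, bidirectional=True)

x = torch.randn(200, 8, 161)  # dummy (time, batch, features) input
out, h = rnn(x)
print(out.shape)              # torch.Size([200, 8, 1600]) -- 2 * 800 from bidirectionality
```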

@SeanNaren
Owner Author

Coming to the end of training a model on TEDLium; then I'll start on VoxForge!

@SeanNaren
Owner Author

I've removed VoxForge from the pre-trained nets, since it doesn't have its own validation dataset; I propose we do a combined training run over all the open-sourced datasets instead.

@ryanleary any chance you could send me your trained models on Slack so I can verify that they work on the master branch? Then I'll create a release to put the pre-trained networks up.

@ybzhou

ybzhou commented Aug 23, 2017

@ryanleary were the results obtained with the greedy decoder?

@daksunt

daksunt commented Aug 24, 2017

Hi, when will the pre-trained networks be released?

@ryanleary
Collaborator

@ybzhou yes.
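
(For context, greedy decoding just takes the argmax label at every frame, collapses repeats, and drops CTC blanks. A minimal sketch with an assumed label set, not the decoder code used for the numbers above:)

```python
import torch

# Minimal sketch of greedy (best-path) CTC decoding; the label set is assumed.
def greedy_decode(log_probs, labels, blank=0):
    """log_probs: (time, num_classes) tensor of per-frame log probabilities."""
    best_path = torch.argmax(log_probs, dim=1).tolist()
    decoded, prev = [], blank
    for idx in best_path:
        # Collapse repeated symbols, then drop CTC blanks.
        if idx != prev and idx != blank:
            decoded.append(labels[idx])
        prev = idx
    return "".join(decoded)

labels = "_'ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # '_' (index 0) acts as the blank here
log_probs = torch.randn(50, len(labels)).log_softmax(dim=1)
print(greedy_decode(log_probs, labels))
```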

@SeanNaren
Owner Author

Models are now provided under releases here. Hopefully a production model based on the formatted data sources in this library will be training shortly.

Huge thanks to @ryanleary for training most of these models :)
