Pre-trained models tracker #85
I was planning on adding SortaGrad back in before the training, if that seems reasonable. It definitely seems to help with convergence. I'll take on AN4 and LibriSpeech.
@ryanleary definitely, does #83 work well for you? Not sure if you had time to test it; it seems like a better solution regarding memory usage! Got some time now to test, so will report back.

EDIT: pulling the branch in now; it does a fair job of keeping memory usage low by bucketing similarly sized utterances and sampling from those buckets instead! Will update the master branch as soon as the changes are addressed.
This model is kind of large for AN4. Having difficulty getting it to converge. Were you able to get it to converge in the past?
@ryanleary I'll check once I'm back home, but I have gotten the full architecture to converge (albeit not the best score possible).
That's true... I'll try this as soon as I can! Just FYI, the easiest place to contact me directly will probably be the PyTorch Slack channel; send me a direct message there if you need me ASAP! If you need an invite, feel free to email me at my GitHub address.
Kicked off a 1000-hour LibriSpeech training run. Will know later tonight if convergence looks promising. It will probably take at least a few days to converge since I only have 2x Titan Xs for it.
Just an update on progress: this is currently blocked on an updated architecture that is better suited to production environments and the size of the datasets we are dealing with; the current architecture is slightly too large!
Currently experimenting at around 40M parameters with these settings:
I've updated the params after speaking to @ryanleary! Will try to get training going for the TEDLIUM corpus.
If it's possible, could the people training the models also plot the loss on the validation set? I'm curious to see if it's just me that's getting this negative correlation between the WER and the validation loss (issue #78).
The AN4 model is complete. LibriSpeech is still in progress. Below are the current evaluations:
@ryanleary Any updates on the LibriSpeech training?
I stopped the training after 44 epochs due to diminishing returns. I think the training may have slowed due to #100. Will probably retrain at some point in the future, but the model is good enough for now.
@ryanleary thanks! What is libri-val/libri-test? Not sure which test sets these are.
For libri1k test-clean, what's your validation set? What are your training parameters for getting this result: libri1k | test-clean | 5x800gru | 11.546 | 3.538? How much time did it take to converge? How many epochs did you train for to achieve this?
They're 5 layers of 800-dim bidirectional GRU RNNs. Everything else is more or less default. The libri1k model trained for 44 epochs, which was several days on 2x Titan X GPUs. The combined
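For reference, a minimal sketch of how that setup maps onto train.py flags, assuming the flag names exposed by this repository's train.py; the manifest paths are placeholders and every other option is left at its default:

```bash
# Sketch: 5 x 800 bidirectional GRU layers described above, other options at defaults.
# Flag names assumed from this repo's train.py; manifest paths are placeholders.
python train.py --train-manifest libri_train_manifest.csv --val-manifest libri_val_manifest.csv \
    --rnn-type gru --hidden-size 800 --hidden-layers 5 --cuda
```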
Coming to the end of training a model on TEDLIUM, then will start VoxForge!
I've removed VoxForge from the pre-trained nets, and propose we train on a combination of all the open-sourced datasets instead, since VoxForge doesn't have its own validation set. @ryanleary any chance you could send me your trained models on Slack so I can verify that they work on the master branch, then create a release to put the pre-trained networks up?
@ryanleary is the result obtained from the greedy decoder?
Hi, when will the pre-trained networks be released?
@ybzhou yes.
Models are now provided under releases here. Hopefully a production model based on the formatted data sources in this library will be training shortly. Huge thanks to @ryanleary for training most of these models :)
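For anyone picking up the released models, the sketch below shows how one could be evaluated against a test manifest with greedy decoding; the script name, flag names, and file paths are assumptions based on this repository's test.py and may differ between releases, so check the README shipped with the release you download:

```bash
# Sketch: evaluate a downloaded pre-trained model on a test manifest (greedy decoding).
# Script name, flag names, and paths are assumptions; verify against the release's README.
python test.py \
  --model-path models/librispeech_pretrained.pth \
  --test-manifest data/libri_test_clean_manifest.csv \
  --cuda
```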
On each of the datasets provided, we must train a Deepspeech model. The overall architecture is encompassed in this command:
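A sketch of what such an invocation looks like, assuming the flag names exposed by this repository's train.py; the manifest paths and epoch count below are illustrative rather than the confirmed values used for the released models:

```bash
# Sketch of the training invocation (flag names assumed from this repo's train.py).
# Replace the manifest paths with the ones generated for the dataset being trained on,
# and set the architecture via --rnn-type / --hidden-size / --hidden-layers as desired.
python train.py \
  --train-manifest data/train_manifest.csv \
  --val-manifest data/val_manifest.csv \
  --epochs 70 --cuda --checkpoint

# For the smaller datasets, a gentler annealing schedule can help (see the note below):
#   python train.py ... --learning-anneal 1.01
```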
In the above command you must replace the manifest paths with the correct paths to the dataset. A few notes:

- For the smaller datasets, consider tuning the learning-rate annealing by adding `--learning-anneal` and setting it to a smaller value, like `1.01`. For larger datasets, the default is fine (up to around 4.5k hours, from internal testing on the deepspeech.torch version).

A release will be cut from the DeepSpeech package that will have the models, and a reference to the latest release will be added to the README to find the latest models!
Progress tracker for datasets:
Let me know if you plan on working on any of these, and I'll update the ticket with details!