
Pre-trained models #59

Closed
danielhauagge opened this issue May 12, 2017 · 13 comments

@danielhauagge

Any pre-trained models available?

@SeanNaren changed the title from "Pre trained model" to "Pre-trained models" on May 15, 2017
@SeanNaren reopened this on May 15, 2017
@SeanNaren
Owner

Currently not available, will get to this as soon as I can :)

@ryanleary
Collaborator

I have a decent-ish Libri model I can upload somewhere if you'd like.

@SeanNaren
Owner

@ryanleary that would be awesome! Has it been trained with the latest checkpoint system? That would make integration easier.

@ryanleary
Collaborator

ryanleary commented May 18, 2017

Yes. It's definitely a preliminary model, but somewhat functional. Trained for 11 epochs on the 1,000-hour LibriSpeech set with augmentation.

Model name:          deepspeech_11.pth.tar
DeepSpeech version:  0.0.1

Recurrent Neural Network Properties
  RNN Type:          lstm
  RNN Layers:        4
  RNN Size:          400
  Classes:           29

Model Features
  Labels:            _'ABCDEFGHIJKLMNOPQRSTUVWXYZ
  Sample Rate:       16000
  Window Type:       hamming
  Window Size:       0.02
  Window Stride:     0.01

Training Information
  Epochs:            11
  Min Loss:          15.670
  Min CER:           8.914
  Min WER:           23.752
Test Set    WER      CER
clean       14.295    4.391
noisy       35.354   14.285
combined    25.302    9.562
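
As an aside for anyone reproducing this: the feature parameters above (16 kHz audio, 20 ms Hamming windows, 10 ms stride) fully specify the spectrogram front end. A minimal sketch of that preprocessing in PyTorch, as an illustration of the listed settings rather than the repo's exact loader:

import torch

def spectrogram(waveform, sample_rate=16000, window_size=0.02, window_stride=0.01):
    # 0.02 s * 16000 Hz = 320-sample window; 0.01 s stride = 160-sample hop.
    n_fft = int(sample_rate * window_size)
    hop = int(sample_rate * window_stride)
    window = torch.hamming_window(n_fft)
    # Magnitude STFT; result shape is (freq_bins, time_steps) = (161, T).
    stft = torch.stft(waveform, n_fft=n_fft, hop_length=hop,
                      win_length=n_fft, window=window, return_complex=True)
    return stft.abs()

The actual data loader may additionally apply log compression and per-utterance normalisation; treat this as the shape of the computation, not its exact details.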

Shall we start up a wiki for this kind of thing as well as other documentation?

@SeanNaren
Owner

SeanNaren commented May 18, 2017

Really good idea, will get to it ASAP and open a PR to get this together!

EDIT: @ryanleary, to keep things simple, do you think a new file in the repo named PRETRAINED.md would suffice?

@ryanleary
Collaborator

Oops, missed the edit. That's probably alright. My only thought for a wiki was that it wouldn't require a PR every time there's a new model. I don't really have a strong preference, though. Do you have a preference for where I upload the model above?
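
(Wherever it ends up hosted, the .pth.tar above should be readable as a standard PyTorch serialisation, so the metadata in the summary can be read back with something like the sketch below. The key names here are guesses at a typical checkpoint layout, not a documented schema.)

import torch

# map_location='cpu' lets you inspect the checkpoint without a GPU.
package = torch.load("deepspeech_11.pth.tar", map_location="cpu")

# Hypothetical keys, assuming the checkpoint is a plain dict of metadata
# plus weights; adjust to whatever the checkpoint system actually stores.
print(package.get("version"))     # e.g. 0.0.1
print(package.get("epoch"))       # e.g. 11
weights = package.get("state_dict")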

@SeanNaren
Owner

SeanNaren commented May 30, 2017

That's a good idea, @ryanleary! I'm going to push to get the skip_rnn branch merged into PyTorch, because I want all the models, at least on initial release, to use the pure DS2 architecture (which requires skip_rnn to be implemented).

Then I'll open a new issue to keep track of models trained!

@ryanleary
Collaborator

Sure thing. Definitely looking forward to getting full batch norm support. Will retrain once we have a build of pytorch that supports it.

@ryanleary
Collaborator

Since the skip_input work appears to be stalled, did you want to do this now or continue to wait?

@SeanNaren
Owner

SeanNaren commented Jun 11, 2017

@ryanleary, I'll create a new issue with a plan for what needs to be done for the networks; my initial thought is that skip input isn't viable long-term without cuDNN support. My reasoning is that the DS architecture already takes a long time to train, and not utilising cuDNN slows things down drastically.

It will be even worse when NVIDIA's Volta GPUs come out and we can't utilise the new hardware. As a result I think the 'vanilla' architecture will have to stray a bit and use batch norm on top of the cuDNN RNN (architectures etc. will be outlined in the issue!).
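
To make the "batch norm on top of the cuDNN RNN" idea concrete, here's a sketch of what such a layer could look like (illustrative, not the final design; it assumes (time, batch, feature) inputs and flattens time and batch so BatchNorm1d normalises over all timesteps):

import torch.nn as nn

class BatchRNN(nn.Module):
    # Batch norm applied to the input of a cuDNN-backed RNN layer.
    def __init__(self, input_size, hidden_size, rnn_type=nn.LSTM):
        super().__init__()
        self.batch_norm = nn.BatchNorm1d(input_size)
        # nn.LSTM / nn.GRU dispatch to cuDNN on the GPU, unlike hand-rolled
        # cells, which is the whole point of this compromise.
        self.rnn = rnn_type(input_size, hidden_size, bidirectional=True)

    def forward(self, x):
        # x: (time, batch, features)
        t, n = x.size(0), x.size(1)
        x = self.batch_norm(x.view(t * n, -1)).view(t, n, -1)
        x, _ = self.rnn(x)
        return x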

@SeanNaren
Owner

@ryanleary, and whoever else has input into this: does it make sense to train all models, regardless of dataset, on the full DS2 architecture (or as close to it as possible)?

@ryanleary
Collaborator

I think that's certainly ideal, but we can update models in the future. Having some pretrained models that match what's currently implemented will at least let people experiment with a model that's better than a toy.

As an aside, I'm personally looking forward more to getting BatchNorm and lookahead convolutions implemented and moving toward the "Production" DeepSpeech implementation. It should be easier to train, and it looks like it only costs about a 5% relative performance hit [Spectrogram -> 2D conv -> 2D conv -> GRU -> GRU -> GRU (forward-only) -> 1D row conv -> FC].
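
A rough sketch of that forward-only pipeline, with placeholder dimensions (the kernel sizes and hidden size are illustrative guesses, and the lookahead is approximated as a depthwise 1D conv over future timesteps):

import torch.nn as nn
import torch.nn.functional as F

class Lookahead(nn.Module):
    # Row convolution: each channel mixes a few *future* timesteps so a
    # forward-only GRU stack still sees limited right context.
    def __init__(self, features, context=20):
        super().__init__()
        self.context = context
        self.conv = nn.Conv1d(features, features, kernel_size=context,
                              groups=features, bias=False)

    def forward(self, x):
        # x: (batch, time, features) -> (batch, features, time) for Conv1d.
        x = x.transpose(1, 2)
        x = F.pad(x, (0, self.context - 1))  # pad on the future side only
        return self.conv(x).transpose(1, 2)

class ProductionDS(nn.Module):
    # Spectrogram -> 2D conv -> 2D conv -> 3x forward-only GRU -> row conv -> FC
    def __init__(self, freq_bins=161, hidden=800, classes=29):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(41, 11), stride=(2, 2)), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=(21, 11), stride=(2, 1)), nn.ReLU())
        f = (freq_bins - 41) // 2 + 1          # freq bins after first conv
        f = (f - 21) // 2 + 1                  # ...and after the second
        self.rnn = nn.GRU(32 * f, hidden, num_layers=3)  # unidirectional
        self.lookahead = Lookahead(hidden)
        self.fc = nn.Linear(hidden, classes)

    def forward(self, spect):
        # spect: (batch, 1, freq_bins, time)
        x = self.conv(spect)
        b, c, f, t = x.size()
        x = x.view(b, c * f, t).permute(2, 0, 1)   # (time, batch, features)
        x, _ = self.rnn(x)
        x = self.lookahead(x.permute(1, 0, 2))      # (batch, time, hidden)
        return self.fc(x)                           # per-frame class logits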

@SeanNaren
Owner

@ryanleary agreed. In my head, getting the beam search language model integrated (taking it from the TF fork in the other issue) is the main step towards production DS, and probably the biggest at this stage!
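
For context on what the beam search + LM would replace: the baseline is greedy (best-path) CTC decoding, which collapses repeats and drops blanks. A tiny sketch over the label set from the model summary above (assuming '_' is the CTC blank at index 0, and that the 29th class is a space that doesn't render in the summary):

# Label set from the model summary; the trailing space is an assumption,
# since the summary lists 29 classes but only 28 characters render.
LABELS = "_'ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def greedy_decode(frame_argmax, blank=0):
    # Best-path CTC: collapse consecutive repeats, then drop blanks.
    out, prev = [], blank
    for idx in frame_argmax:        # per-frame argmax over the class logits
        if idx != prev and idx != blank:
            out.append(LABELS[idx])
        prev = idx
    return "".join(out)

# e.g. greedy_decode([0, 9, 9, 0, 10, 0]) == "HI"

A beam search with a language model keeps multiple hypotheses and rescores them with LM probabilities instead of committing to the per-frame argmax, which is where most of the WER gain comes from.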

I've opened a new ticket at #85 to track progress on pre-trained models, so I'll close this one.
