Support Batch RNNs #4

Closed · SeanNaren opened this issue Feb 22, 2017 · 15 comments

Comments

@SeanNaren (Owner)

Typical Deepspeech architecture uses batch normalized BRNNs. Implement this to stay true to the architecture.

@ryanleary (Collaborator)

Hey Sean-

Thanks for all the great work on this project. Is there a WIP branch for this work?

I was training AN4 as a test and got performance a fair bit worse than the Torch version, presumably due to the lack of batch norm?

@SeanNaren (Owner, Author)

Yeah, I'd definitely say it's the lack of the Batch RNNs. Sadly, the CPU version of the RNN modules with skip_input connections is proving very tricky to implement. I'll hopefully get time tomorrow to work on this. I may, however, opt for a temporary solution for people wanting to train the same model (a cuDNN-only version of the branch until it's fixed).

@ryanleary (Collaborator)

Linking to pytorch/pytorch#894.

@EgorLakomkin (Contributor)

Are skip_input connections related to downsampling of the RNN output?

@SeanNaren (Owner, Author)

Honestly, BatchRNNs are not critical to training the Deepspeech architecture, but they are the correct way to do sequence-wise RNN batch normalization. What skip_input allows us to do is batch normalize the input weight-matrix calculation outside of cuDNN and pass the result into the RNN (hopefully that makes sense).

But sticking a batch norm after the RNN also works and is cleaner :) For those who want the pure Deepspeech model, though, this is for them!
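
To make the "batch norm after the RNN" option concrete, here is a minimal PyTorch sketch of a sequence-wise batch-normalized bidirectional layer. The class name `BatchRNN`, the choice of `nn.GRU`, and the shapes are illustrative assumptions, not the repo's exact implementation:

```python
import torch.nn as nn

class BatchRNN(nn.Module):
    """Bidirectional RNN followed by sequence-wise batch norm on its output (sketch)."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.rnn = nn.GRU(input_size, hidden_size, bidirectional=True)
        self.batch_norm = nn.BatchNorm1d(2 * hidden_size)

    def forward(self, x):
        # x: (seq_len, batch, input_size)
        out, _ = self.rnn(x)
        t, n, f = out.size()
        # Collapse time and batch so BatchNorm1d normalizes across all timesteps ("sequence-wise").
        out = self.batch_norm(out.reshape(t * n, f)).reshape(t, n, f)
        return out
```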

@SeanNaren (Owner, Author)

Currently blocked by a cuDNN bug that results in unexpected behaviour.

@SeanNaren (Owner, Author)

An update on this issue: due to discrepancies, and arguably bugs, in skip input (all gate-wise multiplications use the same input matrix; refer here for more information), I'll be re-evaluating whether it's worth implementing this in a pure RNNCell fashion.

My primary concerns are the additional overhead in training time and memory usage. If there is a large increase in either, my inclination will be to close this and keep the current architecture, which can utilise cuDNN, rather than the pure DS2 architecture.
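
For reference, a "pure RNNCell fashion" implementation means stepping the recurrence manually in Python, which is where the extra time and memory overhead comes from relative to cuDNN's fused kernels. A minimal sketch follows; the class name and the choice of `nn.GRUCell` are assumptions for illustration:

```python
import torch
import torch.nn as nn

class UnrolledRNN(nn.Module):
    """Manual timestep-by-timestep unroll with an RNN cell (illustrative sketch)."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        self.hidden_size = hidden_size

    def forward(self, x):
        # x: (seq_len, batch, input_size)
        t, n, _ = x.size()
        h = x.new_zeros(n, self.hidden_size)
        outputs = []
        for step in range(t):
            # Each step is a separate Python-level op, so per-step tweaks (e.g. normalizing
            # the input projection) become possible, at the cost of cuDNN's fused speed.
            h = self.cell(x[step], h)
            outputs.append(h)
        return torch.stack(outputs)  # (seq_len, batch, hidden_size)
```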

@dlmacedo commented Jun 6, 2017

Do you have any idea when this will be working with the --cuda option?

@SeanNaren (Owner, Author) commented Jun 6, 2017

@dlmacedo I haven't started looking into the implementation yet. I plan to get time to do this soon.

Just some notes:

Using autograd RNNs (PyTorch standard) vs cuDNN RNNs on AN4:

| Backend | Time per epoch | GPU memory |
| --- | --- | --- |
| cuDNN | 22 s | 9059 MB |
| no cuDNN (autograd) | 63 s | 4426 MB |

The autograd version is much more memory efficient, but there is a noticeable slowdown.
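
For anyone reproducing this comparison, one way to force the autograd path (an assumption on my part, not necessarily how these numbers were produced) is to disable cuDNN globally:

```python
import torch

# With cuDNN disabled, nn.GRU/nn.LSTM fall back to PyTorch's own (autograd) RNN implementation.
torch.backends.cudnn.enabled = False
# torch.backends.cudnn.enabled = True  # default: use cuDNN's fused RNN kernels on GPU
```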

@dlmacedo commented Jun 6, 2017

But batch-normalized BRNNs (the actual Deepspeech architecture) are still planned to come with LSTMs sooner or later, right?

@SeanNaren (Owner, Author)

@dlmacedo it depends on my availability really; hopefully I'll get time over the weekend to implement this!

I do want people to note, however, that there will be a slowdown in the RNN due to not using cuDNN, but I might be able to do some things more true to the DS2 architecture, like weight sharing in the BRNNs :)
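
One hedged reading of "weight sharing in the BRNNs" is reusing a single set of RNN weights for both directions, e.g. running the same unidirectional RNN over the sequence and over its time-reversal and summing the results. The sketch below illustrates that idea; the class name and use of `nn.GRU` are assumptions, not the repo's code:

```python
import torch
import torch.nn as nn

class SharedWeightBRNN(nn.Module):
    """Bidirectional RNN whose forward and backward passes share one set of weights (sketch)."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.rnn = nn.GRU(input_size, hidden_size)  # a single unidirectional RNN

    def forward(self, x):
        # x: (seq_len, batch, input_size)
        fwd, _ = self.rnn(x)
        bwd, _ = self.rnn(torch.flip(x, dims=[0]))  # reverse time, reuse the same weights
        bwd = torch.flip(bwd, dims=[0])             # re-align backward outputs with time
        return fwd + bwd
```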

@ryanleary (Collaborator)

Can you expound on how the weight sharing might be implemented?

@dlmacedo

Any news about this?

@SeanNaren (Owner, Author)

Due to speed and optimization concerns, I will not be including this in the repo. We will default to GRUs!

@dlmacedo commented Aug 1, 2017

Could you please explain a bit more about this decision?

Will we use GRUs instead of LSTMs? What relation does this have to batch normalization?

shuieryin added a commit to shuieryin/deepspeech.pytorch that referenced this issue Jul 23, 2018
Commit message (truncated): "…d type torch.cuda.FloatTensor for argument SeanNaren#4 'other'" for File "train.py", line 270, in <module> optimizer.step()