Support Batch RNNs #4
Hey Sean, thanks for all the great work on this project. Is there a WIP branch for this work? I was training AN4 as a test and got performance a fair bit worse than the torch version, presumably due to the lack of batch norm?
Yeah, I'd definitely say it's the lack of the Batch RNNs. Sadly, the CPU implementation of the RNN modules with the skip_input connections is proving very tricky. I'll hopefully get time tomorrow to work on this. I may, however, opt for a temporary solution for people wanting to train the same model (a cuDNN-only version of the branch until it's fixed).
Linking to pytorch/pytorch#894.
Are skip_input connections related to downsampling of the output of the RNN?
Honestly, BatchRNNs are not critical to training the Deepspeech architecture, but they are the correct way to do sequence-wise RNN batch normalization. What skip_input allows us to do is batch normalize the input weight matrix calculation outside of cuDNN and pass the result into the RNN (hopefully that makes sense). Sticking a batch norm after the RNN also works and is cleaner :) but for those who want the pure Deepspeech model, this is for them!
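A minimal sketch of the idea above (my own illustration, not the repo's exact code): collapse the time and batch dimensions so `BatchNorm1d` normalizes each feature sequence-wise, and apply it to the RNN's input before the cuDNN call.

```python
import torch
import torch.nn as nn

class SequenceWiseBatchNorm(nn.Module):
    """Applies BatchNorm1d over a (T, N, H) sequence by folding time into the batch."""
    def __init__(self, features):
        super().__init__()
        self.bn = nn.BatchNorm1d(features)

    def forward(self, x):                     # x: (T, N, H)
        t, n, h = x.size()
        x = self.bn(x.reshape(t * n, h))      # normalize each feature over all timesteps
        return x.reshape(t, n, h)

# Normalize the input sequence, then hand it to the RNN -- this approximates
# batch-norming the input-to-hidden matmul outside of cuDNN.
rnn = nn.LSTM(input_size=64, hidden_size=64)
bn = SequenceWiseBatchNorm(64)
x = torch.randn(10, 4, 64)                    # (T, N, H)
out, _ = rnn(bn(x))
```

The "batch norm after the RNN" variant mentioned above would simply apply the same module to `out` instead of `x`.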
Currently blocked by a cuDNN bug which results in unexpected behaviour.
An update on this issue: due to discrepancies and arguably bugs in skip input (all gate-wise multiplications will use the same input matrix; refer here for more information), I'll be re-evaluating the worth of implementing this in a pure RNNCell fashion. My primary concerns are the additional overhead in time taken and memory usage. If there is a large increase in either, my inclination will be to close this and keep the current architecture, assuming a pure DS2 architecture that utilises cuDNN.
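For context, the "pure RNNCell fashion" being weighed would look roughly like this sketch (names are illustrative, not from the repo): stepping an `nn.LSTMCell` through time in a Python loop instead of using the fused cuDNN kernel, which is what introduces the time overhead mentioned above.

```python
import torch
import torch.nn as nn

class StepwiseLSTM(nn.Module):
    """Unrolls an LSTMCell over time; flexible (per-step ops possible) but slower than cuDNN."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.hidden_size = hidden_size

    def forward(self, x):                     # x: (T, N, input_size)
        t, n, _ = x.size()
        h = x.new_zeros(n, self.hidden_size)
        c = x.new_zeros(n, self.hidden_size)
        outputs = []
        for step in range(t):                 # Python-level loop: the source of the slowdown
            h, c = self.cell(x[step], (h, c))
            outputs.append(h)
        return torch.stack(outputs)           # (T, N, hidden_size)

out = StepwiseLSTM(32, 48)(torch.randn(5, 3, 32))
```

The appeal is that arbitrary operations (e.g. batch norm on the gate inputs) can be inserted at each timestep, at the cost of losing the fused kernel.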
Do you have any idea when this will be working using --cuda option? |
@dlmacedo I haven't started looking into the implementation as of yet. I plan to get time to do this soon. Just some notes: using autograd RNNs (PyTorch standard) vs cuDNN RNNs on AN4 is much more memory efficient, but comes with a noticeable slowdown.
But batch-normalized BRNNs (the actual Deepspeech architecture) are still planned to come with LSTM sooner or later, right?
@dlmacedo It depends on my availability, really; hopefully over the weekend I get time to implement this! I do want people to note, however, that there will be a slowdown in the RNN due to not using cuDNN, but I might be able to do some things more true to the DS2 architecture, like weight sharing in the BRNNs :)
Can you expound on how the weight sharing might be implemented? |
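The thread doesn't spell this out, but one common reading of "weight sharing in the BRNNs" (an assumption on my part, not the author's stated design) is to run a single RNN with one set of weights over both the sequence and its time reversal, then sum the two directions:

```python
import torch
import torch.nn as nn

class SharedWeightBiRNN(nn.Module):
    """Bidirectional RNN where both directions share one set of weights (illustrative)."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.rnn = nn.GRU(input_size, hidden_size)   # single weight set for both passes

    def forward(self, x):                            # x: (T, N, input_size)
        fwd, _ = self.rnn(x)                         # forward pass over time
        bwd, _ = self.rnn(torch.flip(x, dims=[0]))   # same weights over reversed time
        return fwd + torch.flip(bwd, dims=[0])       # sum directions, DS2-style

out = SharedWeightBiRNN(16, 24)(torch.randn(7, 2, 16))
```

Compared with `bidirectional=True` (which allocates separate weights per direction), this halves the RNN's parameter count, which is the point of sharing.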
Any news about this? |
Due to speed and optimizations I will not be including this in the repo. We will default to GRUs! |
Could you please explain a bit more about this decision? Instead of LSTM we will use GRU? What relation does this have with batch normalization? |
Typical Deepspeech architecture uses batch normalized BRNNs. Implement this to stay true to the architecture.