Support Batch RNNs #4

Closed · SeanNaren opened this issue Feb 22, 2017 · 15 comments

Comments

@SeanNaren (Owner)

Typical Deepspeech architecture uses batch normalized BRNNs. Implement this to stay true to the architecture.

@ryanleary (Collaborator)

Hey Sean-

Thanks for all the great work on this project. Is there a WIP branch for this work?

I was training AN4 as a test and got performance a fair bit worse than the Torch version, presumably due to the lack of batch norm?

@SeanNaren (Owner, Author)

Yeah, I'd definitely say it's the lack of the Batch RNNs. Sadly, the CPU version of the RNN modules with skip_input connections is proving very tricky to implement. I'll hopefully get time tomorrow to work on this. I may, however, opt for a temporary solution for people wanting to train the same model (a cuDNN-only version of the branch until it's fixed).

@ryanleary (Collaborator)

Linking to pytorch/pytorch#894.

@EgorLakomkin (Contributor)

Are skip_input connections related to downsampling of the RNN output?

@SeanNaren (Owner, Author)

Honestly, BatchRNNs are not critical to training the Deepspeech architecture, but they are the correct way to do sequence-wise RNN batch normalization. What skip_input allows us to do is batch normalize the input weight-matrix calculation outside of cuDNN and pass the result into the RNN (hopefully that makes sense).

But sticking a batch norm after the RNN also works and is cleaner :) For those who want the pure Deepspeech model, though, this is for them!
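
To make the "batch norm after the RNN" option concrete, here is a minimal PyTorch sketch of a sequence-wise batch-normalized bidirectional layer. The class name `BatchRNN`, the choice of `nn.GRU`, and the shapes are illustrative assumptions, not the repo's exact implementation:

```python
import torch.nn as nn

class BatchRNN(nn.Module):
    """Bidirectional RNN followed by sequence-wise batch norm on its output (sketch)."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.rnn = nn.GRU(input_size, hidden_size, bidirectional=True)
        self.batch_norm = nn.BatchNorm1d(2 * hidden_size)

    def forward(self, x):
        # x: (seq_len, batch, input_size)
        out, _ = self.rnn(x)
        t, n, f = out.size()
        # Collapse time and batch so BatchNorm1d normalizes across all timesteps ("sequence-wise").
        out = self.batch_norm(out.reshape(t * n, f)).reshape(t, n, f)
        return out
```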

@SeanNaren (Owner, Author)

Currently blocked by a cuDNN bug that results in unexpected behaviour.

@SeanNaren (Owner, Author)

An update on this issue: due to discrepancies, and arguably bugs, in skip input (all gate-wise multiplications use the same input matrix; refer here for more information), I'll be re-evaluating whether it's worth implementing this in a pure RNNCell fashion.

My primary concerns are the additional overhead in training time and memory usage. If there is a large increase in either, my inclination will be to close this and keep the current architecture, which can utilise cuDNN, rather than the pure DS2 architecture.
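
For reference, a "pure RNNCell fashion" implementation means stepping the recurrence manually in Python, which is where the extra time and memory overhead comes from relative to cuDNN's fused kernels. A minimal sketch follows; the class name and the choice of `nn.GRUCell` are assumptions for illustration:

```python
import torch
import torch.nn as nn

class UnrolledRNN(nn.Module):
    """Manual timestep-by-timestep unroll with an RNN cell (illustrative sketch)."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        self.hidden_size = hidden_size

    def forward(self, x):
        # x: (seq_len, batch, input_size)
        t, n, _ = x.size()
        h = x.new_zeros(n, self.hidden_size)
        outputs = []
        for step in range(t):
            # Each step is a separate Python-level op, so per-step tweaks (e.g. normalizing
            # the input projection) become possible, at the cost of cuDNN's fused speed.
            h = self.cell(x[step], h)
            outputs.append(h)
        return torch.stack(outputs)  # (seq_len, batch, hidden_size)
```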

@dlmacedo commented Jun 6, 2017

Do you have any idea when this will be working with the --cuda option?

@SeanNaren (Owner, Author) commented Jun 6, 2017

@dlmacedo I haven't started looking into the implementation yet. I plan to get time to do this soon.

Just some notes:

Using autograd RNNs (PyTorch standard) vs cuDNN RNNs on AN4:

| Backend | Time per epoch | GPU memory |
| --- | --- | --- |
| cuDNN | 22 s | 9059 MB |
| no cuDNN (autograd) | 63 s | 4426 MB |

The autograd version is much more memory efficient, but there is a noticeable slowdown.
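
For anyone reproducing this comparison, one way to force the autograd path (an assumption on my part, not necessarily how these numbers were produced) is to disable cuDNN globally:

```python
import torch

# With cuDNN disabled, nn.GRU/nn.LSTM fall back to PyTorch's own (autograd) RNN implementation.
torch.backends.cudnn.enabled = False
# torch.backends.cudnn.enabled = True  # default: use cuDNN's fused RNN kernels on GPU
```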

@dlmacedo commented Jun 6, 2017

But batch-normalized BRNNs (the actual Deepspeech architecture) are still planned to come with LSTMs sooner or later, right?

@SeanNaren (Owner, Author)

@dlmacedo it depends on my availability really; hopefully I'll get time over the weekend to implement this!

I do want people to note, however, that there will be a slowdown in the RNN due to not using cuDNN, but I might be able to do some things more true to the DS2 architecture, like weight sharing in the BRNNs :)
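
One hedged reading of "weight sharing in the BRNNs" is reusing a single set of RNN weights for both directions, e.g. running the same unidirectional RNN over the sequence and over its time-reversal and summing the results. The sketch below illustrates that idea; the class name and use of `nn.GRU` are assumptions, not the repo's code:

```python
import torch
import torch.nn as nn

class SharedWeightBRNN(nn.Module):
    """Bidirectional RNN whose forward and backward passes share one set of weights (sketch)."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.rnn = nn.GRU(input_size, hidden_size)  # a single unidirectional RNN

    def forward(self, x):
        # x: (seq_len, batch, input_size)
        fwd, _ = self.rnn(x)
        bwd, _ = self.rnn(torch.flip(x, dims=[0]))  # reverse time, reuse the same weights
        bwd = torch.flip(bwd, dims=[0])             # re-align backward outputs with time
        return fwd + bwd
```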

@ryanleary (Collaborator)

Can you expound on how the weight sharing might be implemented?

@dlmacedo

Any news about this?

@SeanNaren (Owner, Author)

Due to speed and optimization concerns, I will not be including this in the repo. We will default to GRUs!

@dlmacedo commented Aug 1, 2017

Could you please explain a bit more about this decision?

Will we use GRUs instead of LSTMs? What relation does this have to batch normalization?

shuieryin added a commit to shuieryin/deepspeech.pytorch that referenced this issue Jul 23, 2018
Commit message (truncated): "…d type torch.cuda.FloatTensor for argument SeanNaren#4 'other'" for File "train.py", line 270, in <module> optimizer.step()