
Added BRNN support #3

Closed
wants to merge 49 commits into from

Conversation

SeanNaren

Hey, I've added BLSTM support; hopefully it looks good! I had to expose some methods in RNN so they could be inherited. Hopefully it's useful; let me know if you have any feedback.

@borisfom
Owner

Looks good. Can you add some tests, too?

@SeanNaren
Author

Good idea, but I'm not sure how to test the RNN script. Is there a way we could get the biases and weights of the gates? How would you suggest testing the two modules?

@ngimel
Collaborator

ngimel commented Apr 12, 2016

cuDNN provides the cudnnGetRNNLinLayerMatrixParams and cudnnGetRNNLinLayerBiasParams functions for getting pointers to the weights and biases.

@SeanNaren
Author

I'm trying to replicate the sample to act as the test, but I can't figure out how to create the double pointer required for the last argument of cudnnGetRNNLinLayerMatrixParams:

float *linLayerMat;

cudnnErrCheck(cudnnGetRNNLinLayerMatrixParams(cudnnHandle,
                                              rnnDesc,
                                              layer,
                                              xDesc,
                                              wDesc,
                                              w,
                                              linLayerID,
                                              linLayerMatDesc,
                                              (void**)&linLayerMat));

How would I create the double pointer? My attempt so far, which may also help:

local ptr = ffi.new("float*")

local outputDesc = rnn:createFilterDescriptors(1)
errcheck('cudnnGetRNNLinLayerMatrixParams',
    cudnn.getHandle(),
    rnn.rnnDesc[0],
    0,
    rnn.xDescs,
    rnn.wDesc[0],
    rnn.weight:data(),
    0,
    outputDesc[0],
    ptr) -- How to cast float* to void**?
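
The usual LuaJIT FFI idiom for a void** out-parameter, for reference, is to allocate a one-element array of pointers and cast it, which is the approach the later comments in this thread adopt:

local matrixPointer = ffi.new("float*[1]")   -- an array of one float*
-- pass ffi.cast("void**", matrixPointer) as the last argument;
-- afterwards matrixPointer[0] holds the float* written by cuDNN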

@ngimel
Collaborator

ngimel commented Apr 12, 2016

@SeanNaren
Author

Thanks! I managed to cast it using ffi to get the pointer, as shown below, but I'm not sure how to access the actual data from the pointer. In the C RNN example, all the values in the matrix/bias are initialised to a set value; would I be able to replicate that somehow in Lua?

local linLayerMatDesc = rnn:createFilterDescriptors(1)
local matrixPointer = ffi.new("float*[1]")
errcheck('cudnnGetRNNLinLayerMatrixParams',
    cudnn.getHandle(),
    rnn.rnnDesc[0],
    layer,
    rnn.xDescs,
    rnn.wDesc[0],
    rnn.weight:data(),
    layerId,
    linLayerMatDesc[0],
    ffi.cast("void**", matrixPointer)) -- How to access data stored at this pointer?

@ngimel
Collaborator

ngimel commented Apr 12, 2016

You'll have to get the weight size from linLayerMatDesc[0], calculate the offset as
offset = matrixPointer[0] - rnn.weight:data(), and then do something like

wTensor = torch.Tensor(rnn.weight:storage(), offset, size)
wTensor:fill(1)
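
Putting those pieces together, here is a sketch of a helper that returns a given weight (or bias) block as a Torch tensor so it can be filled like in the C sample. Assumptions: errcheck is the cudnn.torch error-checking helper used in the snippets above, the cuDNN FFI declarations are loaded (so cudnnDataType_t/cudnnTensorFormat_t are known to ffi), rnn.weight is the flat CudaTensor holding all parameters, pointer subtraction on float* yields an element offset, and Torch storage offsets are 1-based (hence the +1).

local ffi = require 'ffi'

local function getRNNParam(rnn, layer, layerId, fname)
    -- fname is 'cudnnGetRNNLinLayerMatrixParams' or 'cudnnGetRNNLinLayerBiasParams'
    local desc = rnn:createFilterDescriptors(1)
    local paramPointer = ffi.new("float*[1]")
    errcheck(fname,
        cudnn.getHandle(),
        rnn.rnnDesc[0],
        layer,
        rnn.xDescs,
        rnn.wDesc[0],
        rnn.weight:data(),
        layerId,
        desc[0],
        ffi.cast("void**", paramPointer))

    -- Recover the size of this parameter block from its filter descriptor.
    local dataType = ffi.new("cudnnDataType_t[1]")
    local format = ffi.new("cudnnTensorFormat_t[1]")
    local nbDims = torch.IntTensor(1)
    local minDim = 3
    local filterDimA = torch.IntTensor(minDim):fill(1)
    errcheck('cudnnGetFilterNdDescriptor',
        desc[0],
        minDim,
        dataType,
        format,
        nbDims:data(),
        filterDimA:data())

    -- Pointer difference is an element offset into the flat weight buffer.
    local offset = tonumber(paramPointer[0] - rnn.weight:data())
    return torch.CudaTensor(rnn.weight:storage(), offset + 1, filterDimA:prod())
end

-- e.g. initialise the first weight matrix of layer 0 to 1, as the C sample does:
getRNNParam(rnn, 0, 0, 'cudnnGetRNNLinLayerMatrixParams'):fill(1)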

@SeanNaren
Author

Awesome! And for the bias values, would the same approach work? The call is nearly identical:

local linLayerBiasDesc = rnn:createFilterDescriptors(1)
local biasPointer = ffi.new("float*[1]")
errcheck('cudnnGetRNNLinLayerBiasParams',
    cudnn.getHandle(),
    rnn.rnnDesc[0],
    layer,
    rnn.xDescs,
    rnn.wDesc[0],
    rnn.weight:data(),
    layerId,
    linLayerBiasDesc[0],
    ffi.cast("void**", biasPointer))

@ngimel
Collaborator

ngimel commented Apr 13, 2016

Yes, this should work for biases too.
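
With the helper sketched above (same assumptions), the bias block can be fetched the same way; the bias pointer also points into the flat rnn.weight buffer, so the same offset arithmetic applies:

getRNNParam(rnn, 0, 0, 'cudnnGetRNNLinLayerBiasParams'):fill(1)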

@SeanNaren
Author

Added a test for RNN that uses the checksums taken from the cuDNN RNN C sample. The checksums for a few of the backward passes are not correct; any help would be great! I'll keep trying to fix them as well.

@SeanNaren
Author

The test now works for ReLU/Tanh/LSTM/GRU. One test is still failing (the weight checksum for ReLU is slightly off; I'm not entirely sure why).

@ngimel
Collaborator

ngimel commented Apr 13, 2016

Great, thanks!

@SeanNaren
Author

Just as a side note, would it make more sense to add the LSTM/GRU RNNs as separate modules, like BLSTM?

local dataType = 'CUDNN_DATA_FLOAT'
local format = 'CUDNN_TENSOR_NCHW'
local nbDims = torch.IntTensor(1)
local filterDimA = torch.IntTensor({ 1, 1, 1 })
Collaborator

You can remove the hard-coded 3 dimensions with

local minDim = 3
local filterDimA = torch.ones(minDim)

then replace the '3' argument to the GetFilterNdDescriptor call with minDim, and replace filterDimA[1] * filterDimA[2] * filterDimA[3] with filterDimA:prod().

@SeanNaren
Author

SeanNaren commented Apr 15, 2016

Added batchFirst to the RNN, let me know what you guys think!

EDIT: should we support RNN ReLU/Tanh as separate modules? The only thing I'm concerned about is that after setting self.mode to either, you have to reset the RNN descriptor (a call to reset()). Does this warrant a separate module?
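
For what it's worth, such a module could be a very thin wrapper in the same spirit as BLSTM: set self.mode and rebuild the descriptor in __init. A minimal sketch (the class name is illustrative, and the constructor arguments mirror the cudnn.RNN signature assumed elsewhere in this PR):

local RNNTanh, parent = torch.class('cudnn.RNNTanh', 'cudnn.RNN')

function RNNTanh:__init(inputSize, hiddenSize, numLayers, batchFirst)
    parent.__init(self, inputSize, hiddenSize, numLayers, batchFirst)
    self.mode = 'CUDNN_RNN_TANH'
    self:reset()   -- rebuild the RNN descriptor with the new mode
end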

@ngimel
Collaborator

ngimel commented Apr 16, 2016

@SeanNaren, Tanh, ReLU and Sigmoid activations all have their own separate modules, so I don't see why Tanh or ReLU RNNs shouldn't. Great work and thanks for your help!
@elbamos, @SeanNaren, do you guys think we should handle saving hidden outputs / using them as hidden inputs more consistently than is done now in the base RNN class? Say, have functions for copying hy to hx and for forgetting hx? Have an option to not output hy (cuDNN allows this)? That would provide some (small) memory savings if you don't have to unroll beyond a single call to cuDNN.
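
For illustration only, helpers along those lines might look something like this (a sketch, assuming the module stores its state in hiddenInput/hiddenOutput and cellInput/cellOutput fields; names and placement are not final):

local function forgetState(rnn)   -- drop hx so the next call starts fresh
    rnn.hiddenInput, rnn.cellInput = nil, nil
end

local function carryState(rnn)    -- copy hy -> hx for the next forward call
    rnn.hiddenInput = rnn.hiddenOutput and rnn.hiddenOutput:clone() or nil
    rnn.cellInput = rnn.cellOutput and rnn.cellOutput:clone() or nil
end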

@elbamos

elbamos commented Apr 16, 2016

@ngimel Maybe other people would, but I do not need the ability to drill down into hx and hy. I would, though, benefit if we had a choice between keeping all hidden outputs versus only the last output of the sequence, but not in a B-RNN. Is that what you meant?

@elbamos

elbamos commented Apr 16, 2016

I'm seeing this with the current version when I try to create a new BLSTM layer. The BLSTM layer I created last night continues to function and train.

th> cnn:add(cudnn.BLSTM(7040, 7040 * 16, 2, true))
cuda runtime error (77) : an illegal memory access was encountered at /tmp/luarocks_cutorch-scm-1-3384/cutorch/lib/THC/generic/THCStorage.c:147
stack traceback:
        [C]: at 0x7f5cfb1f3210
        [C]: at 0x7f5cfb1f8170
        [C]: in function 'CudaTensor'
        /usr/local/share/lua/5.1/cudnn/RNN.lua:114: in function 'resetDropoutDescriptor'
        /usr/local/share/lua/5.1/cudnn/RNN.lua:47: in function 'reset'
        /usr/local/share/lua/5.1/cudnn/RNN.lua:33: in function '__init'
        /usr/local/share/lua/5.1/cudnn/BLSTM.lua:4: in function '__init'
        /usr/local/share/lua/5.1/torch/init.lua:91: in function </usr/local/share/lua/5.1/torch/init.lua:87>
        [C]: in function 'BLSTM'
        [string "_RESULT={cnn:add(cudnn.BLSTM(7040, 7040 * 16,..."]:1: in main chunk
        [C]: in function 'xpcall'
        /usr/local/share/lua/5.1/trepl/init.lua:651: in function 'repl'
        /usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
        [C]: at 0x00405ea0

@SeanNaren
Author

SeanNaren commented Apr 16, 2016

@elbamos that's a big hidden dim you have there; I don't think I can even fit that into memory to test! It works alright with a smaller input dim on my end, however.

Added separate modules for Tanh and for ReLU. @ngimel Sounds great, but I'm sure you're much better informed than me, so any guidance on how to approach this would be great :)

EDIT: should've done this some time ago, but I've added you as a collaborator @ngimel; feel free to change anything you see fit!

EDIT2: Don't want to spam too much, but the tests are now much faster. I should've realised earlier that it was the manual for loops that were taking up all the time...

@SeanNaren
Author

SeanNaren commented Apr 17, 2016

@borisfom I think the support is mostly done and the tests are looking good. How would you suggest we merge the changes here and RNN into main? Shall I rebase onto R5 in a new branch and open a new pull request from that branch into R5?

@borisfom
Owner

Please hold on: I am rebasing my R5 onto upstream R5 (a nontrivial exercise due to the selective merge they did earlier). This will likely result in a new branch; then I will ask you to do a PR against that branch.

@ngimel
Collaborator

ngimel commented Apr 18, 2016

@SeanNaren Thanks! Looks like neither you nor @elbamos need more flexibility for hidden layers, so we can leave that for a later date. You guys serve as proxies for real RNN users ;-)
@elbamos - currently you cannot keep all your hidden outputs if you are processing your whole sequence in a single call to cuDNN. If you really need that, you have to construct your network as a series of calls to cuDNN, one for every time step, pretty much like it's done in other Torch RNN packages. Then you'll have all your hidden outputs (but you'll also lose the performance optimizations beyond "Optimizing a Single Iteration" listed in the blog post; "Pre-Transposing the Weight Matrix" could possibly be used even in this case, but it would require changing APIs, so not in this cuDNN version :-). Let me know if you think this is an important case; maybe we should have an API call for changing the weight matrix layout.
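
For reference, an illustrative sketch of that per-time-step alternative (assuming the module carries state via hiddenInput/hiddenOutput and cellInput/cellOutput fields, as discussed earlier in this thread; inputSize, hiddenSize, seqLength and input are placeholders):

local step = cudnn.RNN(inputSize, hiddenSize, 1)
step.mode = 'CUDNN_LSTM'
step:reset()

local allHidden = {}
for t = 1, seqLength do
    step:forward(input:narrow(1, t, 1))            -- one time step per cuDNN call
    allHidden[t] = step.hiddenOutput:clone()       -- keep every hidden output
    step.hiddenInput = step.hiddenOutput:clone()   -- feed hy back in as hx
    step.cellInput = step.cellOutput:clone()
end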

@borisfom
Owner

@SeanNaren: actually, please never mind, I will merge the PR before the rebase.
I will have to do some squashing (rebase -i HEAD~n) before merging it back into soumith's branch (ideally down to 2 or 3 commits); please let me know if you would rather do it yourself on your branch now.

@SeanNaren
Author

@ngimel, I think I see what you mean now. If it's a feature people want once the PR is merged, I'll be more than happy to take a look :)
@borisfom I'll help you out on that part; give me some time and I'll rebase to a new branch ASAP.

@borisfom
Owner

@SeanNaren: yes, the best course would be for you to create a new branch from the current one, squash it down to very few commits, and then send a new PR from that branch (unless you can change the branch in the PR directly).

@SeanNaren
Author

SeanNaren commented Apr 18, 2016

@borisfom, right here is the rebased branch; hopefully it's alright as just one commit. Let me know what you think!

If it's good, let me know which branch you want the PR against.

@borisfom
Owner

@SeanNaren: yes, one commit is perfect! Please do a PR against branch R5-re.

SeanNaren closed this Apr 23, 2016
@SeanNaren
Author

@borisfom @ngimel I thought this might be a suitable place to ask: how difficult would it be to support summation of the output layers rather than concatenation for BRNNs?

@ngimel
Collaborator

ngimel commented Apr 26, 2016

You can always add your outputs outside of cuDNN. We went with concatenation rather than addition because it preserves the information and leaves users free to postprocess the output the way they see fit. There is no overlap between layers in a bidirectional network, so if you split the layers and add the outputs yourself, you should not lose much performance.
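
For example, a minimal sketch of summing the two directions outside cuDNN, assuming the bidirectional output is laid out as seqLength x batch x (2 * hiddenSize) with the forward and backward halves concatenated along the last dimension:

local fwd = output:narrow(3, 1, hiddenSize)              -- forward direction
local bwd = output:narrow(3, hiddenSize + 1, hiddenSize) -- backward direction
local summed = fwd + bwd                                 -- seqLength x batch x hiddenSize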

@SeanNaren
Author

Awesome, thanks!

borisfom added a commit that referenced this pull request Aug 9, 2017
Spatial/Volumetric dilated convolution added with tests.
This pull request was closed.