
nn.BiSequencer, cudnn and non-contiguous input #178

Closed
willfrey opened this issue Mar 26, 2016 · 4 comments

@willfrey
Contributor

I'm having a problem using nn.BiSequencer() with cudnn.

Here's a simple example:

require 'rnn'
require 'cunn'
require 'cudnn'

batch_size = 16
maxLen = 100
nFeat = 201
hiddenSize = 256

net = nn.Sequential()
net:add(nn.SplitTable(1, 2))
net:add(nn.BiSequencer(nn.FastLSTM(nFeat, hiddenSize), nn.FastLSTM(nFeat, hiddenSize), nn.CAddTable(true)))
net:cuda()
cudnn.convert(net,cudnn)

inputs = torch.randn(batch_size,maxLen,nFeat):cuda()
outputs = net:forward(inputs)

Here's the error I get: cudnn/Pointwise.lua:11: Non-contiguous inputs not supported yet

I read through some closed issues and tried these variations of the network. None of them work, unfortunately.

net = nn.Sequential()
net:add(nn.Transpose({1,2}))
net:add(nn.SplitTable(1)) -- trying to maintain contiguous inputs for the BiSequencer
net:add(nn.Copy(nil, nil, true)) -- force a copy to really keep the data contiguous for the BiSequencer
net:add(nn.BiSequencer(nn.FastLSTM(nFeat, hiddenSize), nn.FastLSTM(nFeat, hiddenSize), nn.CAddTable(true)))
net:cuda()
cudnn.convert(net, cudnn)
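
One more variation along the same lines, as a sketch: force each timestep contiguous after the split with nn.Sequencer(nn.Contiguous()). (If the non-contiguous tensors actually come from inside FastLSTM's gate computation rather than from its input, this won't help either.)

net = nn.Sequential()
net:add(nn.SplitTable(1, 2))
net:add(nn.Sequencer(nn.Contiguous())) -- make every timestep tensor contiguous (assumes the SplitTable slices are the culprit)
net:add(nn.BiSequencer(nn.FastLSTM(nFeat, hiddenSize), nn.FastLSTM(nFeat, hiddenSize), nn.CAddTable(true)))
net:cuda()
cudnn.convert(net, cudnn)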

Any help would be appreciated.

@northanapon

I found a similar problem in nn.FastLSTM with cuDNN R5. Here's an example:

require 'rnn'
require 'cunn'
require 'cudnn'
lstm = nn.FastLSTM(2, 10)
lstm:cuda()
cudnn.convert(lstm, cudnn)

Non-batch mode works fine:

lstm:forget()
lstm:forward(torch.rand(1, 2):cuda())

Batch mode does not work:

lstm:forget()
lstm:forward(torch.rand(4, 2):cuda())

Here is the error message:

In 1 module of nn.Sequential:
In 1 module of nn.ConcatTable:
In 6 module of nn.Sequential:
In 1 module of nn.ParallelTable:
... Non-contiguous inputs not supported yet

So the failure is in cudnn.Sigmoid (and possibly the other three cudnn activations) when computing the LSTM gates.
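
A minimal repro sketch of just that failure (assuming any non-contiguous CudaTensor input triggers it):

require 'cunn'
require 'cudnn'
sig = cudnn.Sigmoid():cuda()
x = torch.rand(4, 20):cuda()
sig:forward(x) -- contiguous input: works
sig:forward(x:narrow(2, 1, 10)) -- non-contiguous slice: Non-contiguous inputs not supported yet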

Is there a plan to adopt cuDNN R5's LSTM and GRU API? I heard at NVIDIA's recent talk at my university that they are much faster.

@nicholas-leonard
Member

@northanapon Yes there is a plan: borisfom/cudnn.torch#3. In the meantime, you can use Justin's super fast SeqLSTM: #207. Not sure if it will solve your bug, though.

@northanapon

Thanks @nicholas-leonard, I tried SeqLSTM. It is faster than FastLSTM on the GPU.
cudnn.convert(seqlstm, cudnn) works as well, but it does not give any speed-up over basic SeqLSTM.
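
For reference, the setup I used looks roughly like this sketch (SeqLSTM consumes the whole sequence as one seqlen x batchsize x inputsize tensor, so no SplitTable is needed):

require 'rnn'
require 'cunn'
seqlstm = nn.SeqLSTM(201, 256) -- inputsize, hiddensize
seqlstm:cuda()
input = torch.randn(100, 16, 201):cuda() -- seqlen x batchsize x inputsize
output = seqlstm:forward(input) -- seqlen x batchsize x hiddensize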

@ngimel

ngimel commented Apr 15, 2016

@northanapon cudnn is not expected to give a speed-up over basic SeqLSTM: most of the work is in the Linear layers, which are mapped to cuBLAS, not cudnn. The only thing that gets mapped to cudnn is the activations, which are a small fraction of the computation, and the nn implementation of those is reasonable. The same would be true for FastLSTM: even if you could convert it to cudnn, you wouldn't see a speedup. But stay tuned for Torch bindings for the cudnn LSTM implementation.
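
Once those bindings land, usage should look roughly like this sketch (based on the cudnn.torch R5 API; details may change):

require 'cunn'
require 'cudnn'
rnn = cudnn.LSTM(201, 256, 1) -- inputSize, hiddenSize, numLayers
rnn:cuda()
input = torch.randn(100, 16, 201):cuda() -- seqLength x miniBatch x inputSize
output = rnn:forward(input) -- seqLength x miniBatch x hiddenSize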

This issue was closed.