Issues to Run on GPU #4

boleamol · 2016-04-15T05:45:42Z

Hi,
Thanks for your support so far. We are also trying to run on a GPU in parallel. We are using an entry-level NVIDIA GPU, a Quadro K420, which has 192 CUDA cores and 1024 MB of total memory. I installed all the dependencies mentioned in your README.md file, but I am getting the following error. After seeing it I checked the dependencies again, but nothing changed.

"Training Epoch: 1
lua: /root/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
In 1 module of nn.Sequential:
/root/torch/install/share/lua/5.1/cudnn/init.lua:58: Error in CuDNN: CUDNN_STATUS_BAD_PARAM (cudnnSetFilterNdDescriptor)
stack traceback:
[C]: in function 'error'
/root/torch/install/share/lua/5.1/cudnn/init.lua:58: in function 'errcheck'
...h/install/share/lua/5.1/cudnn/SpatialConvolution.lua:45: in function 'resetWeightDescriptors'
...h/install/share/lua/5.1/cudnn/SpatialConvolution.lua:358: in function <...h/install/share/lua/5.1/cudnn/SpatialConvolution.lua:357>
(tail call): ?
[C]: in function 'xpcall'
/root/torch/install/share/lua/5.1/nn/Container.lua:58: in function 'rethrowErrors'
/root/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </root/torch/install/share/lua/5.1/nn/Sequential.lua:41>
(tail call): ?
[C]: in function 'xpcall'
/root/torch/install/share/lua/5.1/nn/Container.lua:58: in function 'rethrowErrors'
/root/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </root/torch/install/share/lua/5.1/nn/Sequential.lua:41>
(tail call): ?
./Network.lua:95: in function 'opfunc'
/root/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
./Network.lua:111: in function 'trainNetwork'
AN4CTCTrain.lua:40: in main chunk
[C]: ?"

Any help would be appreciated.

SeanNaren · 2016-04-15T09:47:32Z

Hm, this is strange; it is working fine on my end. As a test, could you replace every `cudnn.` with `nn.` in the createSpeechNetwork() method of Network.lua and try running again? That will tell us whether it is just a cuDNN problem or something within the code.
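One way to make that test edit without touching the rest of the file is a `sed` address range. This is a minimal sketch, assuming the method is defined on a line matching `function.*createSpeechNetwork` and closes with an `end` at the start of a line (both are assumptions about Network.lua, which is not shown in this thread). The helper name `use_nn_in_create_speech_network` is hypothetical; `-i.bak` keeps a backup so the change is easy to undo.

```shell
# Hypothetical helper: inside createSpeechNetwork() only, rewrite every
# "cudnn." module reference to "nn." so those layers use the plain nn
# implementations instead of cuDNN.
# Assumes the method starts on a line matching "function.*createSpeechNetwork"
# and ends at the first following line that begins with "end".
use_nn_in_create_speech_network() {
  file="${1:-Network.lua}"
  # -i.bak edits in place and saves the original as "$file.bak"
  # (this spelling works with both GNU and BSD sed).
  sed -i.bak '/function.*createSpeechNetwork/,/^end/ s/cudnn\./nn./g' "$file"
}
```

Usage: `use_nn_in_create_speech_network Network.lua`, then rerun training as before. Lines outside the method (for example a top-level `require 'cudnn'`) are left alone.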

boleamol · 2016-04-15T11:44:44Z

Following your guidance, I modified the createSpeechNetwork() method and now it runs, but the GPU does not have enough memory, so I get this error:
"Training Epoch: 1
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-2631/cutorch/lib/THC/generic/THCStorage.cu line=41 error=2 : out of memory
lua: .../speech/torch/install/share/lua/5.1/nn/Container.lua:69:
In 1 module of nn.Sequential:
In 5 module of nn.Sequential:
/home/speech/torch/install/share/lua/5.1/nn/THNN.lua:109: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-2631/cutorch/lib/THC/generic/THCStorage.cu:41
stack traceback:
[C]: in function 'v'
/home/speech/torch/install/share/lua/5.1/nn/THNN.lua:109: in function 'SpatialConvolutionMM_updateOutput'
...orch/install/share/lua/5.1/nn/SpatialConvolution.lua:104: in function <...orch/install/share/lua/5.1/nn/SpatialConvolution.lua:100>
"
I also observed that GPU memory usage reached 97%.
Now I am waiting for a new GPU with a higher configuration.
Anyway, thanks for the support.
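To watch how close training gets to the K420's 1 GB, `nvidia-smi` can print used and total memory repeatedly. The small `awk` filter below, which turns those numbers into a percentage, is only an illustration, and the helper name `gpu_mem_percent` is hypothetical.

```shell
# Hypothetical helper: read "used, total" pairs (MiB), as printed by
# nvidia-smi's CSV output, and print the percentage used for each line.
gpu_mem_percent() {
  awk -F', ' '{ printf "%.0f%%\n", 100 * $1 / $2; fflush() }'
}

# Example (needs the NVIDIA driver; -l 1 repeats every second, Ctrl-C to stop):
# nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits -l 1 | gpu_mem_percent
```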

SeanNaren · 2016-04-15T11:49:52Z

Ah, so it is a cuDNN issue. Have you installed cuDNN via the NVIDIA library (copying the .so files etc. to the /usr/local/cuda install location and adding it to your ~/.bashrc)?
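For reference, the manual install described above usually looks something like the following. This is a sketch, not NVIDIA's official procedure: the archive name is a placeholder for whichever cuDNN tarball you downloaded, and it assumes CUDA lives in /usr/local/cuda and that the archive unpacks into a `cuda/` directory with `include/` and `lib64/` subfolders.

```shell
# Placeholder archive name: substitute the cuDNN tarball you downloaded.
CUDNN_TGZ="cudnn-linux-x64.tgz"

tar -xzvf "$CUDNN_TGZ"                                    # unpacks into ./cuda/
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64/    # -P keeps the .so symlinks
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

# Make the libraries visible to Torch in new shells.
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
```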

And yes, because of the batching it may use a fair amount of memory. I'm running a GTX 970 with 4 GB and it fits into memory alright.

Hopefully in the coming weeks I will completely redo the master branch with what's coming in the voxforge update branch, which will allow the minibatch size to be customised (by setting a maximum minibatch size) and so reduce memory overhead.

boleamol · 2016-04-15T12:01:17Z

Yes, I installed cuDNN via the NVIDIA library, copied the .so files to the /usr/local/cuda install location, and added it to ~/.bashrc, but the issue was still there. I will try again.
If you reduce the batch size, that would be good for me. Thank you.

SeanNaren · 2016-04-16T12:07:27Z

Hopefully once I merge the branches you will be able to run the model on your PC. I'll close the issue for now!

boleamol · 2016-04-16T19:49:59Z

OK, fine. Thank you, sir.

slbinilkumar · 2016-04-21T06:48:06Z

The CUDNN_STATUS_BAD_PARAM issue can be solved by using cuDNN version 4 and putting it in the LD_LIBRARY_PATH.
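A sketch of that fix, assuming cuDNN v4 was unpacked to a directory of your choosing (`$HOME/cudnn-v4/cuda` below is only a placeholder). In cuDNN releases of this era, cudnn.h defines `CUDNN_MAJOR`, so reading it shows which version a header belongs to; putting the v4 `lib64` directory first in `LD_LIBRARY_PATH` makes the dynamic loader find that copy of libcudnn before any other. Both helper names are hypothetical.

```shell
# Hypothetical helper: print the CUDNN_MAJOR value from a cudnn.h header.
cudnn_major() {
  awk '/#define CUDNN_MAJOR/ { print $3; exit }' "${1:-/usr/local/cuda/include/cudnn.h}"
}

# Hypothetical helper: put a directory at the front of LD_LIBRARY_PATH
# (no stray ":" is added when the variable is empty).
prepend_ld_path() {
  LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  export LD_LIBRARY_PATH
}

# Placeholder location of the unpacked cuDNN v4 archive.
CUDNN_HOME="$HOME/cudnn-v4/cuda"
# cudnn_major "$CUDNN_HOME/include/cudnn.h"    # should print 4
# prepend_ld_path "$CUDNN_HOME/lib64"
```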

boleamol changed the title ~~Issue to Run on GPU~~ Issues to Run on GPU Apr 15, 2016

SeanNaren closed this as completed Apr 16, 2016

boleamol mentioned this issue May 3, 2016 in "To run on small size GPU memory #16" (Closed)
