Issues to Run on GPU #4

Closed
boleamol opened this issue Apr 15, 2016 · 7 comments

Comments

@boleamol

boleamol commented Apr 15, 2016

Hi,
Thanks for your support so far. We are now also running on a GPU. We are using an entry-level NVIDIA GPU, the Quadro K420, which has 192 CUDA cores and 1024 MB of total memory. I installed all the dependencies mentioned in the README.md file, but I am facing the following error. After the error I checked the dependencies again, but nothing changed.

"Training Epoch: 1
lua: /root/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
In 1 module of nn.Sequential:
/root/torch/install/share/lua/5.1/cudnn/init.lua:58: Error in CuDNN: CUDNN_STATUS_BAD_PARAM (cudnnSetFilterNdDescriptor)
stack traceback:
[C]: in function 'error'
/root/torch/install/share/lua/5.1/cudnn/init.lua:58: in function 'errcheck'
...h/install/share/lua/5.1/cudnn/SpatialConvolution.lua:45: in function 'resetWeightDescriptors'
...h/install/share/lua/5.1/cudnn/SpatialConvolution.lua:358: in function <...h/install/share/lua/5.1/cudnn/SpatialConvolution.lua:357>
(tail call): ?
[C]: in function 'xpcall'
/root/torch/install/share/lua/5.1/nn/Container.lua:58: in function 'rethrowErrors'
/root/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </root/torch/install/share/lua/5.1/nn/Sequential.lua:41>
(tail call): ?
[C]: in function 'xpcall'
/root/torch/install/share/lua/5.1/nn/Container.lua:58: in function 'rethrowErrors'
/root/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </root/torch/install/share/lua/5.1/nn/Sequential.lua:41>
(tail call): ?
./Network.lua:95: in function 'opfunc'
/root/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
./Network.lua:111: in function 'trainNetwork'
AN4CTCTrain.lua:40: in main chunk
[C]: ?
"

Please help.

@boleamol changed the title from "Issue to Run on GPU" to "Issues to Run on GPU" on Apr 15, 2016
@SeanNaren
Owner

Hm, this is strange; it is working fine on my end. Just as a test, could you replace all "cudnn." with "nn." in the createSpeechNetwork() method of the Network.lua class and try running again? That way we can find out whether it is just a cudnn problem or something within the code.
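
The swap suggested above can be sketched in shell. This is only a rough helper, assuming a plain textual replacement is good enough: it rewrites every "cudnn." prefix in the file it is given, not just the ones inside createSpeechNetwork(), and any cudnn module without an nn counterpart would still need fixing by hand. The function names are hypothetical.

```shell
# Print the lines of FILE that reference a cudnn module, so they can be
# reviewed before anything is changed.
list_cudnn_lines() {
    grep -n 'cudnn\.' "$1"
}

# Print FILE to stdout with every "cudnn." prefix rewritten to "nn.".
# The file itself is not modified.
swap_cudnn_for_nn() {
    sed 's/cudnn\./nn./g' "$1"
}

# Example (not run automatically):
#   cp Network.lua Network.lua.bak
#   swap_cudnn_for_nn Network.lua.bak > Network.lua
```

Keeping the .bak copy makes it easy to restore the cudnn version once the test run is done.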

@boleamol
Author

As per your guidance, I modified the createSpeechNetwork() method and now it runs, but the GPU has too little memory, so it gives this error:
"Training Epoch: 1
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-2631/cutorch/lib/THC/generic/THCStorage.cu line=41 error=2 : out of memory
lua: .../speech/torch/install/share/lua/5.1/nn/Container.lua:69:
In 1 module of nn.Sequential:
In 5 module of nn.Sequential:
/home/speech/torch/install/share/lua/5.1/nn/THNN.lua:109: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-2631/cutorch/lib/THC/generic/THCStorage.cu:41
stack traceback:
[C]: in function 'v'
/home/speech/torch/install/share/lua/5.1/nn/THNN.lua:109: in function 'SpatialConvolutionMM_updateOutput'
...orch/install/share/lua/5.1/nn/SpatialConvolution.lua:104: in function <...orch/install/share/lua/5.1/nn/SpatialConvolution.lua:100>
"

I also watched the memory usage; it reached 97%.
I am now waiting for a new GPU with a higher configuration.
Anyhow, thanks for the support.
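
One way to watch the memory figure mentioned above is to query nvidia-smi while training runs. This is a minimal sketch, assuming nvidia-smi is installed with the driver; mem_percent is a hypothetical helper that turns a "used, total" line (in MiB) into a whole-number percentage.

```shell
# Read lines of the form "used, total" and print used memory as an
# integer percentage of total (fraction truncated).
mem_percent() {
    awk -F', *' '{ printf "%d\n", $1 * 100 / $2 }'
}

# Query each GPU once, but only if nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=memory.used,memory.total \
               --format=csv,noheader,nounits | mem_percent
fi
```

Running the query in a loop (for example with watch) shows how close a run gets to the card's limit before the out of memory error appears.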

@SeanNaren
Owner

Ah, so it is a cudnn issue. Have you installed cudnn via the NVIDIA library (copying the .so files etc. to the /usr/local/cuda install location, and adding it to the ~/.bashrc)?

And yeah, because of the batching it might use a fair bit of memory. I'm running a GTX 970 with 4 GB and it fits in memory fine.

Hopefully in the coming weeks I will completely redo the master branch with what's coming in the voxforge update branch, which will allow the minibatch size to be customised (by setting a max minibatch size) and so reduce memory overhead.

@boleamol
Author

Yes, I installed cudnn via the NVIDIA library, copied the .so files to the /usr/local/cuda install location, and added it to the ~/.bashrc. The issue was still there. I will try again.
If you are reducing the batch size, that is good for me. Thank you.
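
A quick way to double-check the install described above is to confirm that a libcudnn shared library is actually reachable through LD_LIBRARY_PATH. The sketch below assumes a bash-like shell; find_cudnn is a hypothetical helper, and the /usr/local/cuda/lib64 directory in the comment is an assumption about where the .so files were copied.

```shell
# Print every libcudnn.so* found in the colon-separated directory list $1.
# Returns 0 if at least one library was found, 1 otherwise.
find_cudnn() {
    local old_ifs=$IFS dir lib status=1
    IFS=:
    for dir in $1; do
        for lib in "$dir"/libcudnn.so*; do
            if [ -e "$lib" ]; then
                echo "$lib"
                status=0
            fi
        done
    done
    IFS=$old_ifs
    return $status
}

# A typical ~/.bashrc line (the directory is an assumption):
#   export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
find_cudnn "${LD_LIBRARY_PATH:-}" || echo "no libcudnn found on LD_LIBRARY_PATH"
```

If nothing is printed for the directory the files were copied to, the ~/.bashrc change has probably not been picked up by the current shell.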

@SeanNaren
Owner

Hopefully once I merge the branches it will allow you to run the model on your PC. I'll close the issue for now!

@boleamol
Author

OK, fine. Thank you, sir.

@slbinilkumar

The CUDNN_STATUS_BAD_PARAM error can be solved by using cuDNN version 4 and adding its location to LD_LIBRARY_PATH.
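
To see which cuDNN version is installed before trying this, the major version can be read from the CUDNN_MAJOR define in the cudnn.h header. This is a rough sketch: cudnn_major is a hypothetical helper, and the header path in the comment is an assumption based on the /usr/local/cuda location mentioned earlier in the thread.

```shell
# Print the value of the CUDNN_MAJOR define from the header file $1.
cudnn_major() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR" { print $3 }' "$1"
}

# Example (the path is an assumption):
#   cudnn_major /usr/local/cuda/include/cudnn.h
```

If the printed number is not 4, the installed library does not match the version suggested here.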

3 participants