cuDNN bug in Caffe with "group" in conv layer (misaligned address) #5729
Comments
Could you please post the error log?
F0630 15:37:53.939421 12138 benchmark.cpp:92] Check failed: error == cudaSuccess (74 vs. 0) misaligned address
I ran into the same problem: a group size of 2 works, but 3 or larger fails.
I tried it and get the same error even after removing group. The network converges well without cuDNN, but slowly. Did you manage to find a fix for the problem, @svobora?
Same here. With group=output it requires a huge amount of memory. Even with the batch size reduced to the minimum, it crashes with an out-of-memory error after 2000-3000 iterations.
Consider using ConvolutionDepthwise (#5665) to replace convolutions that use the group parameter.
I got the same error with the following layer
I don't know why, but with an output num of 128 or a kernel_size of 5 there is no problem.
I'm unable to reproduce the problem; more specific instructions are needed.
@Noiredd |
@douzsh I just ran this network with no problems - both in Python and
@Noiredd |
@douzsh I'm pretty sure you need at least CUDA 7.5 to run cuDNN 6 - see the download page for Nvidia cuDNN for a list of compatible releases. I ran my test on CUDA 9.0.176 and cuDNN 7.0.5.
@svobora this is a bug in Caffe. I solved it by modifying cudnn_conv_layer.cpp and aligning the address to be a multiple of 32. You can insert two lines of code before
BTW, I think there is another bug; these lines should be put in an else block:
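For readers unfamiliar with the rounding trick described in the comment above, here is a minimal C++ sketch of aligning a byte count to a multiple of 32. It is not the actual patch to cudnn_conv_layer.cpp, which is not shown in this thread. It assumes (an assumption, not something confirmed here) that each group gets its workspace pointer by offsetting into one shared buffer, so an unaligned per-group size would leave every group after the first at an unaligned address. The names `align_up` and `group_offsets` are hypothetical helpers, not Caffe or cuDNN functions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: round n up to the nearest multiple of m (m must be > 0).
std::size_t align_up(std::size_t n, std::size_t m) {
  return (n + m - 1) / m * m;
}

// Hypothetical helper: byte offsets of each group's slice inside one shared
// buffer. The stride between slices is the per-group size rounded up to
// `alignment`, so every offset is a multiple of `alignment`.
std::vector<std::size_t> group_offsets(std::size_t per_group_bytes,
                                       int groups,
                                       std::size_t alignment) {
  const std::size_t stride = align_up(per_group_bytes, alignment);
  std::vector<std::size_t> offsets;
  offsets.reserve(static_cast<std::size_t>(groups));
  for (int g = 0; g < groups; ++g) {
    offsets.push_back(static_cast<std::size_t>(g) * stride);
  }
  return offsets;
}
```

With a per-group size of 100 bytes and 3 groups, unaligned strides would place the slices at offsets 0, 100 and 200; with the stride rounded up to 128, they land at 0, 128 and 256, all multiples of 32. The shared buffer then needs `groups * stride` bytes rather than `groups * per_group_bytes`.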
@hoszbh Just wanted to confirm that your fix is working, thanks a lot. Do you know why this fix is not on master yet?
See also #6548 |
…. This fixes the error by aligning the address to be a multiple of m (32). Also fixes another bug where an if..else was not grouped correctly; see: BVLC#5729
After fixing the code, should I compile Caffe again?
Issue summary
Using the "group" parameter in any convolution layer with CUDNN, I get a "misaligned address" error when the training phase starts. The (first?) test phase is not affected. The error disappears when I build Caffe with CUDA but without CUDNN. However, such training is 2x slower...
Steps to reproduce
Check out the repo, build with CUDNN, use the "group" parameter of a Convolution layer in some net, and run training.
Your system configuration
Operating system: Ubuntu 16.04
Compiler: gcc 5.4
CUDA version (if applicable): 8
CUDNN version (if applicable): 5.1
BLAS: open