
Incompatible with current cudnn 8.0.3 ? #6970

Open
peijason opened this issue Sep 19, 2020 · 12 comments

@peijason

Trying to build Caffe 1.0.0, but the build fails against cuDNN.

System configuration

  • Operating system: Ubuntu 20.04
  • Compiler: 9.3.0
  • CUDA version (if applicable): 11
  • CUDNN version (if applicable): 8.0.3
  • BLAS: 3.9.0
  • Python version (if using pycaffe): 3.8.2
  • MATLAB version (if using matcaffe): N/A

The build fails with errors like the following:

error: ‘CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT’ was not declared in this scope
error: ‘CUDNN_CONVOLUTION_BWD_FILTER_SPECIFY_WORKSPACE_LIMIT’ was not declared in this scope

and many more errors of the same kind.

@gembancud

Apparently so. I have a project study hinging on its use, but the repo is stale. Windows works, by the way, with CUDA 11 and cuDNN 8.0.3.33.

@Qengineering

I've got the same problem with Caffe and cuDNN version 8.

As of version 8, NVIDIA has dropped cudnnGetConvolutionBackwardFilterAlgorithm.
The other two obsolete API calls, cudnnGetConvolutionForwardAlgorithm and cudnnGetConvolutionBackwardDataAlgorithm, do have replacements.

Because there is no replacement for cudnnGetConvolutionBackwardFilterAlgorithm, I followed the strategy of the PaddlePaddle framework: the outcome is hard-coded to the constant CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1, and the workspace is set to twice the memory found earlier with cudnnGetConvolutionForwardAlgorithm.
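
Roughly, the idea looks like this (a sketch only; the handle and descriptor names are placeholders, not the actual Caffe members or the exact patch):

#include <cudnn.h>

// Sketch of the cuDNN 8 fallback described above (illustrative only).
// handle, bottom_desc, top_desc, conv_desc and filter_desc are assumed to be
// already-initialized cuDNN handles/descriptors.
size_t PickBwdFilterWorkspace(cudnnHandle_t handle,
                              cudnnTensorDescriptor_t bottom_desc,
                              cudnnTensorDescriptor_t top_desc,
                              cudnnConvolutionDescriptor_t conv_desc,
                              cudnnFilterDescriptor_t filter_desc,
                              cudnnConvolutionBwdFilterAlgo_t* algo) {
  // No heuristic is queried: the algorithm is simply fixed, following the
  // PaddlePaddle-style workaround.
  *algo = CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1;

  // The workspace size for that fixed algorithm can still be queried in cuDNN 8.
  size_t workspace_bytes = 0;
  cudnnGetConvolutionBackwardFilterWorkspaceSize(
      handle, bottom_desc, top_desc, conv_desc, filter_desc, *algo,
      &workspace_bytes);
  return workspace_bytes;
}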

I could request a merge into this repo, but since I'm not quite sure the solution will work in all cases, I decided to put it in our own GitHub repo first. If it turns out to work fine, I will submit the merge.

For now, please use this repo.

@astropiu

astropiu commented Nov 13, 2020

(Quoting @Qengineering's comment above.)

I have followed their tutorial and used their repo, but I'm running into this issue:

CXX src/caffe/layers/softmax_layer.cpp
src/caffe/layers/cudnn_conv_layer.cpp: In member function ‘virtual void caffe::CuDNNConvolutionLayer<Dtype>::Reshape(const std::vector<caffe::Blob<Dtype>*>&, const std::vector<caffe::Blob<Dtype>*>&)’:
src/caffe/layers/cudnn_conv_layer.cpp:300:1: error: a template declaration cannot appear at block scope
  300 | template <typename Dtype>
      | ^~~~~~~~
In file included from ./include/caffe/blob.hpp:8,
                 from ./include/caffe/layers/cudnn_conv_layer.hpp:6,
                 from src/caffe/layers/cudnn_conv_layer.cpp:5:
src/caffe/layers/cudnn_conv_layer.cpp:331:1: error: expected primary-expression before ‘template’
  331 | INSTANTIATE_CLASS(CuDNNConvolutionLayer);
      | ^~~~~~~~~~~~~~~~~
src/caffe/layers/cudnn_conv_layer.cpp:331:1: error: expected primary-expression before ‘template’
  331 | INSTANTIATE_CLASS(CuDNNConvolutionLayer);
      | ^~~~~~~~~~~~~~~~~
src/caffe/layers/cudnn_conv_layer.cpp: At global scope:
src/caffe/layers/cudnn_conv_layer.cpp:333:1: error: expected ‘}’ at end of input
  333 | } // namespace caffe
      | ^
src/caffe/layers/cudnn_conv_layer.cpp:7:17: note: to match this ‘{’
    7 | namespace caffe {
      | ^
make: *** [Makefile:586: .build_release/src/caffe/layers/cudnn_conv_layer.o] Error 1
make: *** Waiting for unfinished jobs....

@Qengineering

Qengineering commented Nov 13, 2020

First guess: you are missing a brace somewhere. The first error only occurs when a template declaration appears inside a function body, for instance when the closing brace of the preceding function is missing. The later errors point in the same direction: the expected closing brace is missing. Best to download the repo again.
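
For illustration, this deliberately broken snippet (nothing to do with the Caffe sources, and it will not compile) reproduces that first diagnostic: the missing closing brace leaves the previous function body open, so the template declaration that follows is parsed at block scope.

// Deliberately does NOT compile; it only demonstrates the error message.
void Reshape() {
  int unused = 0;
  // <-- the closing '}' of Reshape() is missing here

template <typename Dtype>   // error: a template declaration cannot appear at block scope
void Forward() {}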

@mgomez0

mgomez0 commented Dec 2, 2020

Hi @Qengineering, I am having the exact same issue as @astropiu. I followed your instructions and also cloned the latest version of your repo. It is very strange: I inspected cudnn_conv_layer.cpp myself, and the braces seem to be fine. I'm wondering whether we should continue this discussion here or open a new issue on your repo.

src/caffe/layers/cudnn_conv_layer.cpp: In member function ‘virtual void caffe::CuDNNConvolutionLayer<Dtype>::Reshape(const std::vector<caffe::Blob<Dtype>*>&, const std::vector<caffe::Blob<Dtype>*>&)’:
src/caffe/layers/cudnn_conv_layer.cpp:300:1: error: a template declaration cannot appear at block scope
  300 | template <typename Dtype>
      | ^~~~~~~~
In file included from ./include/caffe/blob.hpp:8,
                 from ./include/caffe/layers/cudnn_conv_layer.hpp:6,
                 from src/caffe/layers/cudnn_conv_layer.cpp:5:
src/caffe/layers/cudnn_conv_layer.cpp:331:1: error: expected primary-expression before ‘template’
  331 | INSTANTIATE_CLASS(CuDNNConvolutionLayer);
      | ^~~~~~~~~~~~~~~~~
src/caffe/layers/cudnn_conv_layer.cpp:331:1: error: expected primary-expression before ‘template’
  331 | INSTANTIATE_CLASS(CuDNNConvolutionLayer);
      | ^~~~~~~~~~~~~~~~~
src/caffe/layers/cudnn_conv_layer.cpp: At global scope:
src/caffe/layers/cudnn_conv_layer.cpp:333:1: error: expected ‘}’ at end of input
  333 | } // namespace caffe
      | ^
src/caffe/layers/cudnn_conv_layer.cpp:7:17: note: to match this ‘{’
    7 | namespace caffe {
      | ^
make: *** [Makefile:586: .build_release/src/caffe/layers/cudnn_conv_layer.o] Error 1
make: *** Waiting for unfinished jobs....

@Qengineering

@mgomez0 You are more than welcome on my repo. I will review the code now and get back to you asap.

@Qengineering

Solved the problem.
In cudnn_conv_layer.cpp line 235:

} 
#endif

should be

#endif
 }
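
In other words, the closing brace had ended up inside the conditionally compiled block, so it disappeared whenever that preprocessor branch was not taken, and everything after it (the template instantiations, the namespace) fell over. Schematically, with a placeholder guard macro rather than the real one:

// Schematic only, not the actual Caffe source.
void ReshapeSketch(int bottom_size) {
  for (int i = 0; i < bottom_size; ++i) {
    // ... unconditional per-blob work ...
#if EXAMPLE_CUDNN_GUARD            // placeholder guard, not Caffe's real macro
    // ... version-specific work ...
#endif
  }  // the closing brace belongs after the #endif, as in the fix above
}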

@borisgribkov

@Qengineering Thanks for your Caffe patch! I have applied it, but I sometimes observe strange behavior: for some models, memory usage is about twice as large as in a CUDA 10 / cuDNN 7 environment. Have you observed something like this?

@Qengineering

Indeed, in certain situations the memory consumption is substantially larger than with cuDNN 7.
It all has to do with cudnnGetConvolutionBackwardDataAlgorithm and cudnnGetConvolutionBackwardFilterAlgorithm being removed in version 8.
These heuristic routines looked for the layout with the best memory use and performance in CUDA memory. Since cuDNN version 8 no longer provides them, I had to generate some dummy output so that the routines that come after the heuristic step still work. There are only four possible outcomes, and I selected the most common one (CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1). At the same time, the required amount of memory has to be determined; for this I allocate twice the amount found in the earlier (forward) determination. See line 169 in src/caffe/layers/cudnn_conv_layer.cpp. Twice, for safety reasons.
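
Schematically, that fallback amounts to nothing more than this (placeholder names, not the actual members at that line):

#include <cstddef>

// Illustrative only: with the heuristic gone in cuDNN 8, the backward-filter
// workspace is simply over-provisioned from the forward estimate.
size_t GuessBwdFilterWorkspace(size_t forward_workspace_bytes) {
  return 2 * forward_workspace_bytes;  // twice, for safety reasons
}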

@borisgribkov

I see, thank you!

@borisgribkov

@Qengineering Thanks for your answer again! I agree about the backward pass, but as far as I can see the forward pass needs more memory too. I tried a model with a single conv layer and a (20 * 3 * 1280 * 720) input; it's the "head" of a ResNet used for a detection task. With CUDA 10 and cuDNN 7.6 I observed about 1.7 GB of memory use for a forward pass; with CUDA 11 and cuDNN 8, about 2.6 GB. The comparison may not be entirely fair, because different GPUs were used: a Titan Xp in the first case and a 3060 in the second.

@Qengineering

In the forward pass too, I had to make an educated guess about memory usage, as cudnnGetConvolutionForwardAlgorithm is also missing in cuDNN 8 (see line 141 of src/caffe/layers/cudnn_conv_layer.cpp).
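
For what it's worth, cuDNN 8 does still ship a ranked heuristic for the forward pass, cudnnGetConvolutionForwardAlgorithm_v7. A sketch of using it (placeholder handle/descriptor names, not the actual Caffe code) would be:

#include <cudnn.h>

// Sketch only: ask the cuDNN 8 heuristic for its top-ranked forward algorithm
// and report the workspace that algorithm needs. Names are placeholders.
cudnnConvolutionFwdAlgo_t PickFwdAlgo(cudnnHandle_t handle,
                                      cudnnTensorDescriptor_t bottom_desc,
                                      cudnnFilterDescriptor_t filter_desc,
                                      cudnnConvolutionDescriptor_t conv_desc,
                                      cudnnTensorDescriptor_t top_desc,
                                      size_t* workspace_bytes) {
  int returned = 0;
  cudnnConvolutionFwdAlgoPerf_t perf;
  cudnnGetConvolutionForwardAlgorithm_v7(handle, bottom_desc, filter_desc,
                                         conv_desc, top_desc,
                                         /*requestedAlgoCount=*/1, &returned,
                                         &perf);
  *workspace_bytes = perf.memory;  // workspace required by the chosen algorithm
  return perf.algo;
}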
