cuDNN v3 can be much slower than v2 #3239
I've recently bumped into @jdemouth, who appears to be one of the main devs of cuDNN; maybe he can tell us more. EDIT: I remember he told me there will be ways to make things faster, by using batches of 64 and by recoding some parts.
Hi, I'd be happy to help if I can. I need to know the GPU that you are having issues with, and I need to know the layer sizes. cuDNN v3 adds new algorithms, and there's a heuristic to select the "best" one for a given convolution size. The more we know, the more we can tune the heuristic to get good performance. In general, a given algorithm is either as fast in v3 as it was in v2, or it is faster, no matter which GPU architecture you target. Of course, there may be counterexamples, but I'm not aware of them. Cheers,
The GPU is a K20. The convolutional layers are defined in Caffe proto format as shown in the snippet.
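The original snippet is not reproduced here, but for readers unfamiliar with the format, a first convolutional layer with the 9-pixel kernel mentioned below would look roughly like this in Caffe prototxt. The layer name, blob names, and filter count are hypothetical placeholders, not values from the reporter's model:

```
layer {
  name: "conv1"          # hypothetical layer name
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 64       # hypothetical number of filters
    kernel_size: 9       # matches the first-layer kernel size reported below
    stride: 1
  }
}
```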
The problem has been solved using the library released on 2015/08/21, and I have updated my repo.
Running the same model on the same dataset, the latest master with v3 took 25 minutes to finish 1000 iterations, while v2 took 10 minutes. The kernels of the first three convolutional layers are 9, 7, and 5 pixels wide. This problem was also reported in a fork of Caffe and in several other places.
Maybe cuDNN v3 only works well on Maxwell-architecture GPUs, or it simply fails to choose the most efficient convolution algorithm. The simplest solution is to revert to an older version of Caffe, since v2 is no longer supported, and wait for NVIDIA to fix the problem. But we still want to keep Caffe up to date to experiment with new features such as batch normalization.
The second option is to specify the convolution algorithm explicitly, which is not flexible enough for varied model configurations.
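At the cuDNN API level, specifying the algorithm explicitly means skipping the heuristic and passing a fixed algorithm enum straight to `cudnnConvolutionForward`. A minimal sketch of that idea follows; all descriptors and device buffers are assumed to be set up elsewhere, error handling is omitted, and the particular algorithm chosen here is only an example:

```c
#include <cudnn.h>

/* Sketch: run the forward pass with a hard-coded algorithm instead of
 * one chosen by cudnnGetConvolutionForwardAlgorithm(), whose heuristic
 * may pick a slow option for large kernels on Kepler GPUs like the K20.
 * All descriptors and buffers are assumed to be configured already. */
static void forward_with_fixed_algo(cudnnHandle_t handle,
                                    cudnnTensorDescriptor_t xDesc, const void *x,
                                    cudnnFilterDescriptor_t wDesc, const void *w,
                                    cudnnConvolutionDescriptor_t convDesc,
                                    cudnnTensorDescriptor_t yDesc, void *y,
                                    void *workspace, size_t workspaceSize)
{
    const float alpha = 1.0f, beta = 0.0f;

    /* Pin the algorithm choice; IMPLICIT_GEMM is just an illustrative
     * pick, not a recommendation for any particular layer. */
    cudnnConvolutionFwdAlgo_t algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM;

    cudnnConvolutionForward(handle, &alpha,
                            xDesc, x, wDesc, w, convDesc,
                            algo, workspace, workspaceSize,
                            &beta, yDesc, y);
}
```

This illustrates the inflexibility noted above: a hard-coded choice has to be revisited by hand for every new model configuration and GPU.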
The last resort is to keep backwards compatibility with cuDNN v2 in Caffe.