cuDNN-based implementation for Deconv (formerly: doing group in one call is needed) #386
Comments
I have tried using cuDNN v7 in the original Caffe, but I found it was slower than the original implementation in group mode.
@fengziyong could you paste v6 and v7 logs here?
This is the log of `caffe time` with cuDNN v7:
I0808 10:10:09.390646 30966 caffe.cpp:406] Average time per layer:
======================================================================
I0808 10:30:20.463733 15472 caffe.cpp:406] Average time per layer:
I ran Caffe on a GTX 1080, and my Caffe is based on https://github.com/BVLC/caffe
Hi @fengziyong, I can't help with BVLC Caffe, sorry. Have you tried NVCaffe with v7? How are the timings?
OK. I will try later.
NVCaffe launches the conv of each group through a for-loop, but it might dispatch all the calls to one CUDA stream. @drnikolaev, could you provide a quick fix for that?
@ChenFengAndy I'm working on new grouping. The current implementation runs a for-loop, which is correct (inherited from BVLC), but it does use different streams. How do I reproduce the accuracy issue? Could you provide a minimal example, please?
It will be quite helpful if the cases with a large number of groups can be accelerated. Currently it's too slow. I haven't completed the training, so I don't know if there is any accuracy issue. Here is an example of MobileNet 1.0: Since it takes too much time to train, I started the training from the pre-trained model given at: But you can change the scripts to not use the pre-trained model. (But increase max_iter and base_lr to about 10 times what I used in that case.)
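For reference, "not using the pre-trained model" translates to changes in the solver configuration. A hypothetical `solver.prototxt` fragment (field names are standard Caffe; the values are only illustrative of the "about 10 times" adjustment mentioned above):

```protobuf
# Hypothetical solver fragment: when training from scratch instead of
# fine-tuning a pre-trained model, raise max_iter and base_lr roughly 10x.
net: "train_val.prototxt"   # assumed network definition file
base_lr: 0.01               # ~10x the fine-tuning learning rate
max_iter: 300000            # ~10x the fine-tuning iteration count
lr_policy: "step"
stepsize: 100000
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005
solver_mode: GPU
```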
Hi @mathmanu, yes, it's my current project. And thanks for the model example!
So far I cannot tell whether it's a bug in NVCaffe or an issue caused by floating-point behavior differences introduced by NVCaffe's optimizations.
@ChenFengAndy to run the non-cuDNN implementation, please set `engine: CAFFE`.
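For context, the engine is selected per layer in the prototxt; a minimal sketch (layer name, shapes, and sizes are made up for illustration):

```protobuf
# Force the native (non-cuDNN) convolution path for one layer.
layer {
  name: "conv1"            # illustrative layer name
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 64
    kernel_size: 3
    engine: CAFFE          # CAFFE = native path, CUDNN = cuDNN path
  }
}
```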
@drnikolaev Sergi, it's not a cuDNN issue. BVLC Caffe uses cuDNN and the performance is still usable. Using `engine: CAFFE` loses 10x perf.
Got it, thank you. Still working on this. Looks promising so far.
Hi Sergi, it might be an issue introduced starting from 0.16. I tested 0.15.13 with cuDNN v5 and v6, and both are fine. But starting from 0.16.1, the training-loss issue exists. Shall I open a separate issue and mark it as a bug? The learning rate is 0.01.
Hi Andy, please open a bug, thank you. Please also attach a model to reproduce this, and mention which particular dataset you use.
@drnikolaev I have an interesting observation. If I add a few Crop layers to the MobileNet model, it becomes quite slow, almost 4x slower. I have a multi-GPU setup. Is this happening because Crop layers are not cuDNN-accelerated and there are overheads in data movement between GPUs? I can't think of any other reason.
@mathmanu @drnikolaev, I also confirm the problem with the Crop layers; it is 1.5x slower...
Setting that parameter for the Crop layer had no impact on speed; it's still slow.
@mathmanu actually yes, this setting has nothing to do with CropLayer. Could you please open a request for this? Thank you!
@CFAndy Supported since 0.16.4.
@drnikolaev looks like CUDNN_GROUPING is only supported from cuDNN 7.0.2 onwards, in include/caffe/layers/cudnn_conv_layer.hpp: However, I can see only cuDNN version 7.0.1 on the cuDNN website. Is 7.0.2 going to be released soon?
Correct. Also, please note this:
Waiting for cuDNN 7.0.2 to be available to try this out.
@drnikolaev Is it possible to integrate this CUDNN_GROUPING mode for deconvolution as well? Doing channel-wise separate deconvolution is a useful feature, typically used for upsampling.
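The usual Caffe pattern for channel-wise upsampling is a Deconvolution layer whose `group` equals the channel count, as in the FCN models; a sketch assuming 256 channels and 2x bilinear upsampling (names and sizes are illustrative):

```protobuf
# Channel-wise (grouped) deconvolution for 2x upsampling: with
# group == num_output == channels, each channel is upsampled
# independently; the bilinear filler makes it fixed interpolation.
layer {
  name: "upsample"          # illustrative name
  type: "Deconvolution"
  bottom: "score"
  top: "upscore"
  param { lr_mult: 0 }      # freeze weights for pure interpolation
  convolution_param {
    num_output: 256
    group: 256              # one filter per channel
    kernel_size: 4
    stride: 2
    pad: 1
    weight_filler { type: "bilinear" }
    bias_term: false
  }
}
```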
@drnikolaev Looking through the code, I found that the cuDNN layer itself is missing for deconvolution. That might be an opportunity for enhancement.
Upvote for the deconv layer!
FYI
Agreed. Reopened for visibility.
@CFAndy Hi Andy, I want to use center_loss in NVCaffe, but version 0.16 doesn't have this layer. Could you please tell me how to add the center_loss layer in NVCaffe, or share the version you use?
v0.17.0
cuDNN supports grouped convolutions starting from cuDNN v7.
Please enable this feature to boost the training of MobileNet.
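The MobileNet case is the extreme of grouping: a depthwise convolution where `group` equals the input channel count. A sketch of such a layer in Caffe prototxt (names and channel count are illustrative):

```protobuf
# Depthwise 3x3 convolution: group == input channels, so cuDNN v7's
# grouped-convolution support (cudnnSetConvolutionGroupCount) can run
# it in a single call instead of a per-group for-loop.
layer {
  name: "conv2_1_dw"        # illustrative name
  type: "Convolution"
  bottom: "conv1"
  top: "conv2_1_dw"
  convolution_param {
    num_output: 32
    group: 32               # depthwise: one group per input channel
    kernel_size: 3
    stride: 1
    pad: 1
    bias_term: false
  }
}
```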