This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-491] Use depthwise convolution by cuDNNv7 if available, updated version #11076

Merged (6 commits) on May 29, 2018

Conversation

@nihui (Contributor) commented May 28, 2018

This pull request is based on #10804, with the following further changes:

  1. reduce indentation changes
  2. prefer the cuDNN depthwise convolution over the MXNet implementation

It still uses explicit #if/#else/#endif statements on the backward code path rather than the new effective_num_group variable, for compatibility: the name effective_num_group may confuse readers by suggesting standard group convolution.
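A minimal sketch of the dispatch being discussed (hypothetical Python pseudologic, not the actual C++ code in this PR): prefer cuDNN's native grouped convolution for depthwise cases when cuDNN >= 7, otherwise fall back to MXNet's own depthwise kernel. The function name and return labels are illustrative only.

```python
def choose_conv_impl(cudnn_version, num_group, in_channels):
    """Return which implementation would handle this convolution.

    cudnn_version uses cuDNN's integer encoding, e.g. 7100 for 7.1.
    """
    is_depthwise = num_group == in_channels and in_channels > 1
    if is_depthwise and cudnn_version >= 7000:
        # cuDNN >= 7 supports grouped convolution natively
        # (cudnnSetConvolutionGroupCount), which covers depthwise conv.
        return "cudnn_group_conv"
    if is_depthwise:
        # Older cuDNN: use MXNet's own depthwise kernel instead.
        return "mxnet_depthwise_kernel"
    return "cudnn_conv"
```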

Some feedback on speed:

hardware: Tesla M40 24 GB x 2
system: CentOS 7
NVIDIA driver 387.26
CUDA 9.1
cuDNN v7.1

model: MobileNet-v2
batch size: 256 (128 per GPU)

MXNet implementation: 68 s / 10 iterations
cuDNN v7 implementation: 9.5 s / 10 iterations
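For reference, the timings above work out to roughly a 7x speedup:

```python
# Derived from the M40 numbers reported above.
mxnet_time = 68.0   # seconds per 10 iterations, MXNet depthwise kernel
cudnn_time = 9.5    # seconds per 10 iterations, cuDNN v7 group conv
speedup = mxnet_time / cudnn_time
print(f"speedup: {speedup:.1f}x")  # roughly 7.2x
```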

@nihui changed the title from "Use depthwise convolution by cuDNNv7 if available, updated version" to "[MXNET-491] Use depthwise convolution by cuDNNv7 if available, updated version" on May 28, 2018
@piiswrong (Contributor)
I still think this is too much duplicated code.
We can add comments to explain what effective_num_group is.

@piiswrong (Contributor)
Also, please correct the indentation. '#if/#endif' statements should not be indented.

@piiswrong (Contributor)
Actually, I guess this is fine, since we'll eventually remove all the logic for older versions of cuDNN.

@piiswrong (Contributor)
@austingg I merged this before seeing your comment. Do you have any concerns?

@austingg
@piiswrong we need more speed benchmarks on different GPU architectures, and a more precise cuDNN version macro check, like NVIDIA Caffe uses.
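On the version macro: as far as I know, cudnn.h encodes the version as a single integer, CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL, so a guard such as `#if CUDNN_VERSION >= 7000` covers every 7.x release. A small Python mirror of that encoding:

```python
def cudnn_version(major, minor, patch):
    # Mirrors how cudnn.h composes CUDNN_VERSION:
    # CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL
    return major * 1000 + minor * 100 + patch

assert cudnn_version(7, 1, 4) == 7104
assert cudnn_version(7, 1, 4) >= 7000   # guard for "any cuDNN 7.x"
assert cudnn_version(6, 0, 21) < 7000   # older cuDNN falls back
```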

rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
…d version (apache#11076)

* Use group convolution by cuDNNv7 if available

* Fix coding style

* ident-- for #if statements

* more ident--

* more ident--

* prefer cudnnv7 depthwise convolution
@BiranLi commented Jun 5, 2018

I have tested MobileNet-v2 on a V100.

hardware: Tesla V100 32 GB
system: CentOS 7.2
NVIDIA driver 396.26
CUDA 9.2
cuDNN v7.1

model: MobileNet-v2 (forward only)
batch size: 1
TF-style depthwise conv: 194 samples/s
cuDNN group conv: 235 samples/s

@austingg commented Jun 5, 2018

@BiranLi could you run some more benchmarks, e.g. with larger batch sizes and with the backward pass included?

@BiranLi commented Jun 5, 2018

@austingg Yes, I have tested the same case with batch size 128.

model: MobileNet-v2 (forward only)
batch size: 128
TF-style depthwise conv: 2000 samples/s
cuDNN group conv: 2335 samples/s
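In relative terms, the V100 numbers in this and the previous comment come out to roughly a 21% (batch 1) and 17% (batch 128) throughput gain:

```python
# samples/s pairs reported above: (tf depthwise conv, cuDNN group conv)
gains = {
    "batch 1":   (194, 235),
    "batch 128": (2000, 2335),
}
for name, (before, after) in gains.items():
    print(f"{name}: +{(after / before - 1) * 100:.0f}%")
```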

@haojin2 (Contributor) commented Jun 8, 2018

@BiranLi can you share some more details on how you're running this benchmark? Thanks!

@haojin2 (Contributor) commented Jun 11, 2018

Ran some extra benchmarks and verified the multi-precision training speed improvement on a single V100 GPU with MobileNet on the ImageNet dataset:
before:
INFO:root:Epoch[0] Batch [20] Speed: 95.60 samples/sec accuracy=0.013765
INFO:root:Epoch[0] Batch [40] Speed: 95.73 samples/sec accuracy=0.148047
INFO:root:Epoch[0] Batch [60] Speed: 95.73 samples/sec accuracy=0.865234
INFO:root:Epoch[0] Batch [80] Speed: 95.75 samples/sec accuracy=1.000000
INFO:root:Epoch[0] Batch [100] Speed: 95.72 samples/sec accuracy=1.000000
after:
INFO:root:Epoch[0] Batch [20] Speed: 1011.35 samples/sec accuracy=0.013765
INFO:root:Epoch[0] Batch [40] Speed: 1032.15 samples/sec accuracy=0.112109
INFO:root:Epoch[0] Batch [60] Speed: 1038.41 samples/sec accuracy=0.832812
INFO:root:Epoch[0] Batch [80] Speed: 1034.26 samples/sec accuracy=1.000000
INFO:root:Epoch[0] Batch [100] Speed: 1032.14 samples/sec accuracy=1.000000
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request Jun 11, 2018
…d version (apache#11076)

* Use group convolution by cuDNNv7 if available

* Fix coding style

* ident-- for #if statements

* more ident--

* more ident--

* prefer cudnnv7 depthwise convolution
piiswrong pushed a commit that referenced this pull request Jun 12, 2018
…d version (#11076) (#11233)

* Use group convolution by cuDNNv7 if available

* Fix coding style

* ident-- for #if statements

* more ident--

* more ident--

* prefer cudnnv7 depthwise convolution
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018
…d version (apache#11076)

* Use group convolution by cuDNNv7 if available

* Fix coding style

* ident-- for #if statements

* more ident--

* more ident--

* prefer cudnnv7 depthwise convolution
@shesung (Contributor) commented Aug 28, 2018

I observed almost no improvement when using MXNet 1.2.1 + CUDA 8 + cuDNN 7.2.1 on a 1080 Ti.
When setting MXNET_CUDNN_AUTOTUNE_DEFAULT=0, performance drops sharply.
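For context: MXNET_CUDNN_AUTOTUNE_DEFAULT controls whether MXNet benchmarks the available cuDNN convolution algorithms on first use and caches the fastest one; with it set to 0, MXNet skips autotuning and uses a fixed heuristic choice, which is one plausible explanation for the slowdown. A minimal sketch (the variable must be set before mxnet is imported):

```python
import os

# 1 is the default: autotune cuDNN conv algorithms on first use.
# 0 disables autotuning; can be much slower for grouped/depthwise conv.
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "1"

# import mxnet as mx  # must happen after the env var is set
```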

7 participants