add ConvolutionDepthwise layer #5665

Open · wants to merge 9 commits

sp2823 commented Jun 2, 2017

https://arxiv.org/pdf/1704.04861v1.pdf
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Convolution depthwise layer
Faster and uses less memory than the "convolution layer with group" (both with and without cuDNN).

sp2823 closed this Jun 2, 2017

sp2823 reopened this Jun 2, 2017

willyd referenced this pull request Jun 7, 2017


depth wise convolution #5649

+ weight_multiplier_shape.push_back(top[0]->height());
+ weight_multiplier_shape.push_back(top[0]->width());
+ weight_multiplier_.Reshape(weight_multiplier_shape);
+ caffe_set(weight_multiplier_.count(), Dtype(1),

fengziyong Jun 9, 2017

caffe_set is just for cpu_data @sp2823


sp2823 Jun 9, 2017

We only need to set mutable_cpu_data or mutable_gpu_data once.
There is a similar implementation of batch_sum_multiplier_ in BatchNormLayer.
If it is necessary, we should use caffe_set in Forward_cpu and caffe_gpu_set in Forward_gpu.


fengziyong Jun 9, 2017

I mean caffe_set only works on a cpu_data pointer; using it to set data through a gpu_data pointer would crash.

sp2823 abc 013377e

zj19921221 commented Jun 19, 2017 edited

Two questions, please:
1. What are the advantages of this implementation over the group-based implementation in Caffe?
2. I only vaguely understand why group convolution cannot be parallelized; why does adding the for loop prevent parallelism?

NHZlX commented Jun 19, 2017

The CPU implementation still needs optimization.

zjchuyp commented Jun 20, 2017

@sp2823
Is it faster by using forloop than using gemm in CPU mode?

Great to see this work - I hope it gets merged soon. The correct name for this should be "DepthwiseSeparable". Just "Depthwise" gives almost the opposite meaning.

sp2823 commented Jun 27, 2017

I didn't optimize the CPU mode because the Convolution layer with group is slow in GPU mode. You can use this code for training and use Convolution layer for prediction.

Could you share your .prototxt showing how to set the parameters, or some test examples?

mathmanu commented Jul 5, 2017 edited

I have attached the files required to train the popular mobilenet model:

imagenet_mobilenet1.0_2017-07-04_10-44-00.zip

I added the following code in layer_factory.cpp, GetConvolutionLayer(), so that this layer will be used whenever it is appropriate:
if (conv_param.num_output() == conv_param.group()) {
  return shared_ptr<Layer<Dtype> >(new ConvolutionDepthwiseLayer<Dtype>(param));
}

There is a speedup when the proposed ConvolutionDepthwise layer is used instead of the Convolution layer, but it is not as much as I expected.

In fact, if I just comment out the group parameter in all convolution layers in both train.prototxt and test.prototxt, so that each 3x3 convolution becomes a traditional 3x3 convolution instead of a depthwise separable one, it becomes slightly faster! This was not what I was expecting.

Is there something that I am missing? Please try the files that I shared.

sp2823 commented Jul 9, 2017

You only need to edit the .prototxt file, changing
type: "Convolution"
to
type: "ConvolutionDepthwise"
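As an illustration, a depthwise 3x3 layer in the .prototxt would then look something like this (layer names and parameter values here are hypothetical, not taken from a shared model):

```protobuf
layer {
  name: "conv2_1_dw"
  type: "ConvolutionDepthwise"  # instead of "Convolution" with group set
  bottom: "conv1"
  top: "conv2_1_dw"
  convolution_param {
    num_output: 32
    kernel_size: 3
    stride: 1
    pad: 1
    # no group parameter needed: the layer is depthwise by construction
  }
}
```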

@sp2823 How do I merge your implementation into my Caffe? Is just downloading the hpp/cpp files OK? Thanks :)

sp2823 commented Jul 10, 2017

Download the .hpp/.cpp/.cu files and compile.

Hi @sp2823 ,
I am new to Caffe. When trying to compile, I get an error in conv_dw_layer.cpp: "‘class caffe::ConvolutionParameter’ has no member named ‘kernel_size_size’".
Any idea about this error?

Hi, @sp2823
I have trained MobileNet with your code for 20 epochs; top-1 accuracy is about 52% and top-5 about 76%. Do you have any experimental results?
By the way, the code works very well: training is much faster than the group-convolution approach. I hope the final results are good too.

@SophieZhou Hi, may I ask: did training become much faster on CPU, and what accuracy did you reach?

birdwcp commented Jul 28, 2017

up

birdwcp commented Aug 1, 2017

You did not implement a CuDNNConvolutionDepthWiseLayer. Isn't it necessary?
