add ConvolutionDepthwise layer #5665
Conversation
sp2823 added some commits Jun 2, 2017
sp2823 closed this Jun 2, 2017
sp2823 reopened this Jun 2, 2017
+ weight_multiplier_shape.push_back(top[0]->height());
+ weight_multiplier_shape.push_back(top[0]->width());
+ weight_multiplier_.Reshape(weight_multiplier_shape);
+ caffe_set(weight_multiplier_.count(), Dtype(1),
sp2823
Jun 9, 2017
We only need to set mutable_cpu_data or mutable_gpu_data once.
There is a similar implementation of batch_sum_multiplier_ in BatchNormLayer.
If it is necessary, we should use caffe_set in Forward_cpu and caffe_gpu_set in Forward_gpu.
fengziyong
Jun 9, 2017
I mean that caffe_set only works through a cpu_data pointer; setting data through a gpu_data pointer with it would crash.
zj19921221
commented
Jun 19, 2017
Hello, I have two questions to ask:
NHZlX
commented
Jun 19, 2017
The CPU implementation still needs to be optimized.
zjchuyp
commented
Jun 20, 2017
@sp2823
mathmanu
commented
Jun 20, 2017
Great to see this work - I hope it gets merged soon. The correct name for this should be "DepthwiseSeparable"; just "Depthwise" gives almost the opposite meaning.
sp2823
commented
Jun 27, 2017
I didn't optimize the CPU mode because the Convolution layer with group is slow in GPU mode. You can use this code for training and use the Convolution layer for prediction.
youngwanLEE
commented
Jul 5, 2017
Could you share your .prototxt showing how to set the parameters, or a test example?
mathmanu
commented
Jul 5, 2017
I have attached the files required to train the popular MobileNet model: imagenet_mobilenet1.0_2017-07-04_10-44-00.zip

I added code in layer_factory.cpp, GetConvolutionLayer(), so that this layer is selected whenever it is appropriate to use.

There is a speedup when the proposed ConvolutionDepthwise layer is used instead of the Convolution layer, but it is not as much as I expected. In fact, if I just comment out the group parameter in all convolution layers in both train.prototxt and test.prototxt, so that each 3x3 convolution becomes a traditional 3x3 convolution instead of a depthwise separable one, it becomes slightly faster! This was not what I was expecting. Is there something that I am missing? Please try the files that I shared.
sp2823
commented
Jul 9, 2017
You only need to edit the .prototxt file like this.
ryusaeba
commented
Jul 10, 2017
@sp2823 How do I merge your implementation into my CAFFE? Is just downloading the hpp/cpp files OK? Thanks :)
sp2823
commented
Jul 10, 2017
Download the .hpp/.cpp/.cu files and compile.
leolee96
commented
Jul 18, 2017
Hi @sp2823,
SophieZhou
commented
Jul 21, 2017
Hi, @sp2823
zj19921221
commented
Jul 25, 2017
@SophieZhou Hello, may I ask: did training become much faster on the CPU? What level of performance did it reach?
birdwcp
commented
Jul 28, 2017
up
birdwcp
commented
Aug 1, 2017
You did not implement a CuDNNConvolutionDepthWiseLayer. Isn't it necessary?
sp2823 commented Jun 2, 2017
https://arxiv.org/pdf/1704.04861v1.pdf
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
convolution depthwise layer
faster and uses less memory than the Convolution layer with group (both with and without CuDNN)