
depth wise convolution #5649

Open
zjchuyp opened this issue May 26, 2017 · 16 comments
Comments

@zjchuyp

zjchuyp commented May 26, 2017

Training depthwise convolutions in Caffe is very slow. Is there a plan to reimplement depthwise convolution?

@lolongcovas

lolongcovas commented May 26, 2017 via email

@zjchuyp
Author

zjchuyp commented May 26, 2017

Yes, depthwise convolution is described in the paper "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications" (https://arxiv.org/abs/1704.04861).
Caffe can train this kind of network by setting the group number equal to the input channel number, but training is very slow because Caffe uses a "for" loop to run im2col + sgemm once per group. TF has a new implementation of depthwise convolution.
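[Editorial note] The grouped-convolution setup described above can be sketched without any Caffe dependency: with group == input channels, each channel is convolved with its own single filter. A minimal standalone sketch (hypothetical code, not Caffe's or TF's implementation; stride 1, no padding):

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of depthwise convolution in NCHW layout:
// one KxK filter per input channel, stride 1, no padding.
std::vector<float> depthwise_conv(const std::vector<float>& in, int C, int H, int W,
                                  const std::vector<float>& filt, int K) {
  const int OH = H - K + 1, OW = W - K + 1;
  std::vector<float> out(C * OH * OW, 0.f);
  for (int c = 0; c < C; ++c)        // each channel independently --
    for (int oh = 0; oh < OH; ++oh)  // this is the group == channels case
      for (int ow = 0; ow < OW; ++ow) {
        float acc = 0.f;
        for (int kh = 0; kh < K; ++kh)
          for (int kw = 0; kw < K; ++kw)
            acc += in[(c * H + oh + kh) * W + ow + kw] *
                   filt[(c * K + kh) * K + kw];
        out[(c * OH + oh) * OW + ow] = acc;
      }
  return out;
}
```

A grouped im2col + sgemm path computes exactly this when group equals the channel count; the question in the thread is why Caffe's version of it is slow.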

@lolongcovas

I also tried it several weeks ago. You are right: low speed and high memory consumption.

@ccJia

ccJia commented Jun 7, 2017

I ran into this problem too. I looked at the TF function called "DepthwiseConv2DKernel", and I didn't find any difference except that TF uses Eigen. Did you solve this problem?

@willyd
Contributor

willyd commented Jun 7, 2017

You may be interested in this #5665

@lolongcovas

@zjchuyp

Yes, depthwise convolution is described in the paper "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications" (https://arxiv.org/abs/1704.04861).
Caffe can train this kind of network by setting the group number equal to the input channel number, but training is very slow because Caffe uses a "for" loop to run im2col + sgemm once per group. TF has a new implementation of depthwise convolution.

I think Caffe doesn't perform im2col group_ times; im2col runs once, and only the gemm loops over the groups:

template <typename Dtype>
void BaseConvolutionLayer<Dtype>::forward_cpu_gemm(const Dtype* input,
    const Dtype* weights, Dtype* output, bool skip_im2col) {
  const Dtype* col_buff = input;
  if (!is_1x1_) {
    if (!skip_im2col) {
      // im2col runs once for the whole input, not once per group.
      conv_im2col_cpu(input, col_buffer_.mutable_cpu_data());
    }
    col_buff = col_buffer_.cpu_data();
  }
  // Only the gemm is repeated group_ times, on per-group slices.
  for (int g = 0; g < group_; ++g) {
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans,
        conv_out_channels_ / group_, conv_out_spatial_dim_, kernel_dim_,
        (Dtype)1., weights + weight_offset_ * g, col_buff + col_offset_ * g,
        (Dtype)0., output + output_offset_ * g);
  }
}
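[Editorial note] Even with im2col running only once, the loop above is still slow for depthwise layers: with group_ equal to the channel count, each of the C gemm calls multiplies a 1 x K² weight row by a K² x (OH·OW) column slice, which is far too small for BLAS to amortize its per-call overhead. A hypothetical helper (not Caffe code) makes the per-group shapes concrete:

```cpp
#include <cassert>

// Hypothetical helper: the M, N, K dimensions of each per-group gemm
// issued by Caffe's loop, for a given convolution configuration.
struct GemmShape { int calls, M, N, K; };

GemmShape group_gemm_shape(int in_ch, int out_ch, int group,
                           int k, int oh, int ow) {
  return GemmShape{ group,
                    out_ch / group,           // conv_out_channels_ / group_
                    oh * ow,                  // conv_out_spatial_dim_
                    (in_ch / group) * k * k   // kernel_dim_ per group
                  };
}
```

For a 32-channel depthwise 3x3 layer with a 112x112 output, this gives 32 calls of a 1x9 by 9x12544 product: each call is essentially a batch of tiny dot products, so call and dispatch overhead dominates.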

@zjchuyp
Author

zjchuyp commented Jun 13, 2017

@lolongcovas
You are right! Thanks.

@zjchuyp
Author

zjchuyp commented Jun 13, 2017

@willyd
thanks a lot, I'll try it.

@mathmanu

@lolongcovas, @willyd
Can you please share your commit/code if you have one for this? Thanks.

@ccJia

ccJia commented Jun 21, 2017

@zjchuyp
Hi, TF also performs a transformation to gather contiguous memory for the gemm. And because of its data layout (traversed along the channel dimension), it has more contiguous memory, which lets SIMD reach high speed. It also has a packing step like im2col in Caffe. So why is this approach several times faster than Caffe's?
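[Editorial note] The layout point can be illustrated: in channels-last (NHWC) order, the innermost loop of a depthwise convolution walks memory that is contiguous across channels, which auto-vectorizes well, whereas the NCHW grouped im2col path touches many small strided slices. A hypothetical sketch (stride 1, no padding; not TF's actual kernel):

```cpp
#include <cassert>
#include <vector>

// Hypothetical depthwise convolution in NHWC (channels-last) layout.
// The inner channel loop reads and writes contiguous memory, so the
// compiler can vectorize it with SIMD.
std::vector<float> depthwise_nhwc(const std::vector<float>& in, int H, int W, int C,
                                  const std::vector<float>& filt, int K) {
  const int OH = H - K + 1, OW = W - K + 1;
  std::vector<float> out(OH * OW * C, 0.f);
  for (int oh = 0; oh < OH; ++oh)
    for (int ow = 0; ow < OW; ++ow)
      for (int kh = 0; kh < K; ++kh)
        for (int kw = 0; kw < K; ++kw) {
          const float* src = &in[((oh + kh) * W + ow + kw) * C];
          const float* w = &filt[(kh * K + kw) * C];
          float* dst = &out[(oh * OW + ow) * C];
          for (int c = 0; c < C; ++c)  // contiguous across channels
            dst[c] += src[c] * w[c];
        }
  return out;
}
```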

@winggan

winggan commented Jul 24, 2017

Is it still slow with the cuDNN implementation? According to the code, the cuDNN convolution calls for all groups are asynchronous on different CUDA streams and are synchronized at the end of forward/backward, so the GPU should be utilized as much as possible.

@birdwcp

birdwcp commented Jul 28, 2017

@gzygzy9211
I turned off cuDNN, otherwise it crashes (Check failed: status == CUDNN_STATUS_SUCCESS).

@birdwcp

birdwcp commented Jul 28, 2017

@willyd
thanks a lot

@winggan

winggan commented Jul 28, 2017

@birdwcp I think you should dig into it to find the reason.

@alialbawi

Hi, how are you all?
I am looking for a conv layer without im2col; I want it to take its input from an im2col output.

@mprat
Contributor

mprat commented Oct 2, 2018

To get faster depthwise convolutions, a separate gemm call needs to be implemented. As far as I know, no one has submitted a PR against this version of Caffe to do so.


9 participants