
depth wise convolution #5649

Open
zjchuyp opened this issue May 26, 2017 · 16 comments
Comments

@zjchuyp

zjchuyp commented May 26, 2017

Training depthwise convolutions in Caffe is very slow. Is there a plan to reimplement depthwise convolution?

@lolongcovas

lolongcovas commented May 26, 2017 via email

@zjchuyp
Author

zjchuyp commented May 26, 2017

Yes, depthwise convolution is described in the paper "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications" (https://arxiv.org/abs/1704.04861).
Caffe can train this kind of network by setting the group number equal to the input channel number, but training is very slow because Caffe uses a "for" loop to run im2col + sgemm once per group. TF has a new implementation of depthwise convolution.
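[Editorial note] The grouped-convolution setup described above can be sketched without any Caffe dependency: with group == input channels, each channel is convolved with its own single filter. A minimal standalone sketch (hypothetical code, not Caffe's or TF's implementation; stride 1, no padding):

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of depthwise convolution in NCHW layout:
// one KxK filter per input channel, stride 1, no padding.
std::vector<float> depthwise_conv(const std::vector<float>& in, int C, int H, int W,
                                  const std::vector<float>& filt, int K) {
  const int OH = H - K + 1, OW = W - K + 1;
  std::vector<float> out(C * OH * OW, 0.f);
  for (int c = 0; c < C; ++c)        // each channel independently --
    for (int oh = 0; oh < OH; ++oh)  // this is the group == channels case
      for (int ow = 0; ow < OW; ++ow) {
        float acc = 0.f;
        for (int kh = 0; kh < K; ++kh)
          for (int kw = 0; kw < K; ++kw)
            acc += in[(c * H + oh + kh) * W + ow + kw] *
                   filt[(c * K + kh) * K + kw];
        out[(c * OH + oh) * OW + ow] = acc;
      }
  return out;
}
```

A grouped im2col + sgemm path computes exactly this when group equals the channel count; the question in the thread is why Caffe's version of it is slow.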

@lolongcovas

I also tried it several weeks ago. You are right: low speed and high memory consumption.

@ccJia

ccJia commented Jun 7, 2017

I ran into this problem too. I looked at the TF function called "DepthwiseConv2DKernel", and I didn't find any difference except that TF uses Eigen. Did you solve this problem?

@willyd
Contributor

willyd commented Jun 7, 2017

You may be interested in this #5665

@lolongcovas

@zjchuyp

Yes, depthwise convolution is described in the paper "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications" (https://arxiv.org/abs/1704.04861).
Caffe can train this kind of network by setting the group number equal to the input channel number, but training is very slow because Caffe uses a "for" loop to run im2col + sgemm once per group. TF has a new implementation of depthwise convolution.

I think Caffe doesn't perform im2col group_ times; im2col runs once, and only the gemm loops over the groups:

template <typename Dtype>
void BaseConvolutionLayer<Dtype>::forward_cpu_gemm(const Dtype* input,
    const Dtype* weights, Dtype* output, bool skip_im2col) {
  const Dtype* col_buff = input;
  if (!is_1x1_) {
    if (!skip_im2col) {
      // im2col runs once for the whole input, not once per group.
      conv_im2col_cpu(input, col_buffer_.mutable_cpu_data());
    }
    col_buff = col_buffer_.cpu_data();
  }
  // Only the gemm is repeated group_ times, on per-group slices.
  for (int g = 0; g < group_; ++g) {
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans,
        conv_out_channels_ / group_, conv_out_spatial_dim_, kernel_dim_,
        (Dtype)1., weights + weight_offset_ * g, col_buff + col_offset_ * g,
        (Dtype)0., output + output_offset_ * g);
  }
}
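[Editorial note] Even with im2col running only once, the loop above is still slow for depthwise layers: with group_ equal to the channel count, each of the C gemm calls multiplies a 1 x K² weight row by a K² x (OH·OW) column slice, which is far too small for BLAS to amortize its per-call overhead. A hypothetical helper (not Caffe code) makes the per-group shapes concrete:

```cpp
#include <cassert>

// Hypothetical helper: the M, N, K dimensions of each per-group gemm
// issued by Caffe's loop, for a given convolution configuration.
struct GemmShape { int calls, M, N, K; };

GemmShape group_gemm_shape(int in_ch, int out_ch, int group,
                           int k, int oh, int ow) {
  return GemmShape{ group,
                    out_ch / group,           // conv_out_channels_ / group_
                    oh * ow,                  // conv_out_spatial_dim_
                    (in_ch / group) * k * k   // kernel_dim_ per group
                  };
}
```

For a 32-channel depthwise 3x3 layer with a 112x112 output, this gives 32 calls of a 1x9 by 9x12544 product: each call is essentially a batch of tiny dot products, so call and dispatch overhead dominates.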

@zjchuyp
Author

zjchuyp commented Jun 13, 2017

@lolongcovas
You are right! Thanks.

@zjchuyp
Author

zjchuyp commented Jun 13, 2017

@willyd
thanks a lot, I'll try it.

@mathmanu

@lolongcovas, @willyd
Can you please share your commit/code if you have one for this? Thanks.

@ccJia

ccJia commented Jun 21, 2017

@zjchuyp
Hi, TF also performs a transformation to gather contiguous memory for the gemm. And because of its data layout (traversed along the channel dimension), it has more contiguous memory, which lets SIMD reach high speed. It also has a packing step like im2col in Caffe. So why is this approach several times faster than Caffe's?
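[Editorial note] The layout point can be illustrated: in channels-last (NHWC) order, the innermost loop of a depthwise convolution walks memory that is contiguous across channels, which auto-vectorizes well, whereas the NCHW grouped im2col path touches many small strided slices. A hypothetical sketch (stride 1, no padding; not TF's actual kernel):

```cpp
#include <cassert>
#include <vector>

// Hypothetical depthwise convolution in NHWC (channels-last) layout.
// The inner channel loop reads and writes contiguous memory, so the
// compiler can vectorize it with SIMD.
std::vector<float> depthwise_nhwc(const std::vector<float>& in, int H, int W, int C,
                                  const std::vector<float>& filt, int K) {
  const int OH = H - K + 1, OW = W - K + 1;
  std::vector<float> out(OH * OW * C, 0.f);
  for (int oh = 0; oh < OH; ++oh)
    for (int ow = 0; ow < OW; ++ow)
      for (int kh = 0; kh < K; ++kh)
        for (int kw = 0; kw < K; ++kw) {
          const float* src = &in[((oh + kh) * W + ow + kw) * C];
          const float* w = &filt[(kh * K + kw) * C];
          float* dst = &out[(oh * OW + ow) * C];
          for (int c = 0; c < C; ++c)  // contiguous across channels
            dst[c] += src[c] * w[c];
        }
  return out;
}
```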

@winggan

winggan commented Jul 24, 2017

Is it still slow with the cuDNN implementation? According to the code, the cuDNN convolution calls for all groups are asynchronous on different CUDA streams and are synchronized at the end of forward/backward, so the GPU should be utilized as much as possible.

@birdwcp

birdwcp commented Jul 28, 2017

@gzygzy9211
I turned off cuDNN, otherwise it crashes (Check failed: status == CUDNN_STATUS_SUCCESS).

@birdwcp

birdwcp commented Jul 28, 2017

@willyd
thanks a lot

@winggan

winggan commented Jul 28, 2017

@birdwcp I think you should dig into it to find the reason.

@alialbawi

Hi, how are you all?
I am looking for a conv layer without im2col; I want it to take its input from an im2col output.

@mprat
Contributor

mprat commented Oct 2, 2018

To get faster depthwise convolutions, a separate gemm call needs to be implemented. As far as I know, no one has submitted a PR against this version of Caffe to do so.


9 participants