Shared col buffer between Convolution Layers #520

Closed
sguada wants to merge 7 commits

Conversation

sguada
Contributor

@sguada sguada commented Jun 19, 2014

This PR replaces #517.

Inspired by conversations with @forresti about using a shared col_buffer between different ConvolutionLayers, this PR creates a shared_col_buffer_ in Net to be shared by the ConvolutionLayers.

The size of the shared_col_buffer_ will be the maximum of the col_buffer sizes of the ConvolutionLayers that share it.
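
Conceptually, the Net-side logic is something like the sketch below (this is not the exact code in this PR; col_buffer() and share_col_buffer() are the accessors added in the review diff further down, while ShareColBuffers and set_col_buffer are hypothetical names used here only for illustration):

template <typename Dtype>
void Net<Dtype>::ShareColBuffers() {  // hypothetical helper called from Net::Init
  int max_count = 0;
  // First pass: find the largest col_buffer among the layers that opted in.
  for (int i = 0; i < layers_.size(); ++i) {
    ConvolutionLayer<Dtype>* conv =
        dynamic_cast<ConvolutionLayer<Dtype>*>(layers_[i].get());
    if (conv && conv->share_col_buffer()) {
      max_count = std::max(max_count, conv->col_buffer()->count());
    }
  }
  // One allocation, big enough for every sharing layer.
  shared_col_buffer_.Reshape(1, 1, 1, max_count);
  // Second pass: point each opted-in layer at the shared buffer.
  for (int i = 0; i < layers_.size(); ++i) {
    ConvolutionLayer<Dtype>* conv =
        dynamic_cast<ConvolutionLayer<Dtype>*>(layers_[i].get());
    if (conv && conv->share_col_buffer()) {
      conv->set_col_buffer(&shared_col_buffer_);  // hypothetical setter
    }
  }
}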

By default every ConvolutionLayer now uses the shared col_buffer, but a layer can keep its own col_buffer by setting ConvolutionParam.share_col_buffer to false in the corresponding prototxt.
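
For illustration, opting a single layer out would look roughly like this in the prototxt (share_col_buffer is the field proposed in this PR; the rest is just a typical convolution layer definition of that era):

layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    share_col_buffer: false  # keep a private col_buffer for this layer
  }
}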

This PR further reduces the memory consumption of Caffe.

@sguada
Contributor Author

sguada commented Jun 19, 2014

Although the memory reduction for inputs of 227x227 is small, for bigger inputs the savings are big.

@@ -102,6 +102,8 @@ class ConvolutionLayer : public Layer<Dtype> {
}
virtual inline int ExactNumBottomBlobs() const { return 1; }
virtual inline int ExactNumTopBlobs() const { return 1; }
virtual Blob<Dtype>* col_buffer() { return &col_buffer_; }
virtual bool share_col_buffer() { return share_col_buffer_; }

Contributor

share_col_buffer_ looks too similar to shared_col_buffer_ and sounds like a verb rather than a noun. Wouldn't is_col_buffer_shared_ make it easier for users to see at a glance that it is a boolean than share_col_buffer_ does?

@sguada sguada mentioned this pull request Jun 21, 2014
@sguada
Contributor Author

sguada commented Jul 2, 2014

@jeffdonahue although in the future we may want to handle buffer_blobs for all layers, let's merge this now and open a new PR for that.

@jeffdonahue
Contributor

Can you give some details about what kind of memory savings you've seen with what kinds of architectures? I still think it's pretty messy to have the net aware of specific layer type implementations, and it doesn't really seem like it would be much more work to avoid that messiness. Couldn't we just add to the base Layer interface a vector<Blob<Dtype>*> intermediate_results_ (or some less crappy name) and then layers needing storage for intermediate results add to that? Then Net::Init can just check if that vector is non-empty after SetUp and create shared storage blobs if so.
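
A minimal self-contained sketch of that interface, with a stub Blob so it compiles on its own (all names here are assumptions, not code from any actual PR):

#include <algorithm>
#include <cstddef>
#include <vector>

template <typename Dtype>
struct Blob {  // stub standing in for caffe::Blob
  std::size_t count;
  explicit Blob(std::size_t n = 0) : count(n) {}
};

// The base Layer carries a vector of scratch blobs; layers that need
// storage for intermediate results push entries into it during SetUp.
template <typename Dtype>
class Layer {
 public:
  virtual ~Layer() {}
  std::vector<Blob<Dtype>*>& intermediate_results() { return intermediate_results_; }
 protected:
  std::vector<Blob<Dtype>*> intermediate_results_;
};

// The Net::Init side: after SetUp, if any layer registered scratch blobs,
// create shared storage sized to the largest request.
template <typename Dtype>
Blob<Dtype> MakeSharedStorage(const std::vector<Layer<Dtype>*>& layers) {
  std::size_t max_count = 0;
  for (std::size_t i = 0; i < layers.size(); ++i) {
    const std::vector<Blob<Dtype>*>& r = layers[i]->intermediate_results();
    for (std::size_t j = 0; j < r.size(); ++j) {
      max_count = std::max(max_count, r[j]->count);
    }
  }
  return Blob<Dtype>(max_count);
}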

@shelhamer
Member

+1 for doing this nicely now and then not having to look at it again, via something along the lines of @jeffdonahue's suggestion.


@sguada
Contributor Author

sguada commented Jul 2, 2014

For the usual input size of 227x227 the savings are small, about 50MB, but for bigger inputs they are substantial: for images of 1723x1723 the savings are 665MB.
This PR makes it possible to feed 7 images of 3446x3446 on a K40.
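
For reference on where the memory goes: the im2col buffer of a convolution layer holds channels * kernel_h * kernel_w * height_out * width_out values, so it grows with the spatial output area, i.e. quadratically in the input side length. As a back-of-the-envelope example, assuming a CaffeNet-style conv1 (3 input channels, 11x11 kernel, stride 4, no padding), a 1723x1723 input gives a 429x429 output and a buffer of 3 * 11 * 11 * 429 * 429 ≈ 67 million floats, roughly 267MB for that single layer; without sharing, every convolution layer holds a buffer like this of its own.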

I like the idea of having a more general way to handle temporary Blobs, but either the layers become aware of the Net and request them from it, or they declare them explicitly in the prototxt and then Net.Init initializes them after Layer.SetUp or in a Layer.FurtherSetup.

One question is which intermediate_results_ or temporary_blobs could be shared across layers; for instance, could all the temporary_blobs at the same position in the vector be shared?

Right now I don't have the time to do that change, but feel free to contribute.

@shelhamer
Member

Although trimming the memory of the Caffe convolution and having a general mechanism for temporaries would be helpful, the cuDNN convolution reduces memory usage too. Note, however, that the Caffe matrix multiplication approach can be faster in certain cases when it does fit in memory, so a shared buffer is still valuable.

@longjon
Contributor

longjon commented Oct 14, 2014

In a post-#594 world, I do believe sharing the column buffer is much simpler:

diff --git a/include/caffe/vision_layers.hpp b/include/caffe/vision_layers.hpp
index f94ac07..8b4373c 100644
--- a/include/caffe/vision_layers.hpp
+++ b/include/caffe/vision_layers.hpp
@@ -61,7 +61,7 @@ class ConvolutionLayer : public Layer<Dtype> {
   bool bias_term_;
   // For the Caffe matrix multiplication convolution.
   int M_, K_, N_;
-  Blob<Dtype> col_buffer_;
+  static Blob<Dtype> col_buffer_;
   Blob<Dtype> bias_multiplier_;
 };

diff --git a/src/caffe/layers/conv_layer.cpp b/src/caffe/layers/conv_layer.cpp
index f4f2b6b..a6649ed 100644
--- a/src/caffe/layers/conv_layer.cpp
+++ b/src/caffe/layers/conv_layer.cpp
@@ -9,6 +9,9 @@
 namespace caffe {

 template <typename Dtype>
+Blob<Dtype> ConvolutionLayer<Dtype>::col_buffer_;
+
+template <typename Dtype>
 void ConvolutionLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
       const vector<Blob<Dtype>*>& top) {
   ConvolutionParameter conv_param = this->layer_param_.convolution_param();

No worries about computing the max size or coordinating through Net, but note that this may not play nicely with parallelization efforts.
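
One detail to keep in mind with the static-member approach: since every layer now touches the same blob, each layer would have to re-Reshape it to its own dimensions right before using it, e.g. at the top of Forward (a sketch only, with member names approximate for that era):

  // Another layer may have resized the shared static buffer since this
  // layer's Reshape ran, so size it for this layer before im2col.
  col_buffer_.Reshape(1, channels_ * kernel_h_ * kernel_w_,
      height_out_, width_out_);

This is also one concrete reason it may clash with parallelism: two layers running Forward concurrently would reshape and fill the same storage.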

@sguada
Contributor Author

sguada commented Oct 16, 2014

Solved by #1291
