Shared col buffer between Convolution Layers #520

Closed
sguada wants to merge 7 commits

Conversation

sguada
Contributor

@sguada sguada commented Jun 19, 2014

This PR replaces #517.

Inspired by conversations with @forresti about using a shared col_buffer between different ConvolutionLayers, this PR creates a shared_col_buffer_ in Net to be shared by the ConvolutionLayers.

The size of the shared_col_buffer_ will be the maximum of the col_buffer sizes of the ConvolutionLayers that share it.
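
Conceptually, the Net-side logic is something like the sketch below (this is not the exact code in this PR; col_buffer() and share_col_buffer() are the accessors added in the review diff further down, while ShareColBuffers and set_col_buffer are hypothetical names used here only for illustration):

template <typename Dtype>
void Net<Dtype>::ShareColBuffers() {  // hypothetical helper called from Net::Init
  int max_count = 0;
  // First pass: find the largest col_buffer among the layers that opted in.
  for (int i = 0; i < layers_.size(); ++i) {
    ConvolutionLayer<Dtype>* conv =
        dynamic_cast<ConvolutionLayer<Dtype>*>(layers_[i].get());
    if (conv && conv->share_col_buffer()) {
      max_count = std::max(max_count, conv->col_buffer()->count());
    }
  }
  // One allocation, big enough for every sharing layer.
  shared_col_buffer_.Reshape(1, 1, 1, max_count);
  // Second pass: point each opted-in layer at the shared buffer.
  for (int i = 0; i < layers_.size(); ++i) {
    ConvolutionLayer<Dtype>* conv =
        dynamic_cast<ConvolutionLayer<Dtype>*>(layers_[i].get());
    if (conv && conv->share_col_buffer()) {
      conv->set_col_buffer(&shared_col_buffer_);  // hypothetical setter
    }
  }
}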

By default every ConvolutionLayer now uses the shared col_buffer, but a layer can keep its own col_buffer by setting ConvolutionParam.share_col_buffer to false in the corresponding prototxt.
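
For illustration, opting a single layer out would look roughly like this in the prototxt (share_col_buffer is the field proposed in this PR; the rest is just a typical convolution layer definition of that era):

layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    share_col_buffer: false  # keep a private col_buffer for this layer
  }
}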

This PR further reduces the memory consumption of Caffe.

@sguada
Contributor Author

sguada commented Jun 19, 2014

Although the memory reduction for inputs of 227x227 is small, for bigger inputs the savings are big.

@@ -102,6 +102,8 @@ class ConvolutionLayer : public Layer<Dtype> {
}
virtual inline int ExactNumBottomBlobs() const { return 1; }
virtual inline int ExactNumTopBlobs() const { return 1; }
virtual Blob<Dtype>* col_buffer() { return &col_buffer_; }
virtual bool share_col_buffer() { return share_col_buffer_; }

Contributor

share_col_buffer_ looks too similar to shared_col_buffer_ and sounds like a verb rather than a noun. Wouldn't is_col_buffer_shared_ make it easier for users to see at a glance that it is a boolean than share_col_buffer_ does?

@sguada sguada mentioned this pull request Jun 21, 2014
@sguada
Contributor Author

sguada commented Jul 2, 2014

@jeffdonahue although in the future we may want to handle buffer_blobs for all layers, let's merge this now and open a new PR for that.

@jeffdonahue
Contributor

Can you give some details about what kind of memory savings you've seen with what kinds of architectures? I still think it's pretty messy to have the net aware of specific layer type implementations, and it doesn't really seem like it would be much more work to avoid that messiness. Couldn't we just add to the base Layer interface a vector<Blob<Dtype>*> intermediate_results_ (or some less crappy name) and then layers needing storage for intermediate results add to that? Then Net::Init can just check if that vector is non-empty after SetUp and create shared storage blobs if so.
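
A minimal self-contained sketch of that interface, with a stub Blob so it compiles on its own (all names here are assumptions, not code from any actual PR):

#include <algorithm>
#include <cstddef>
#include <vector>

template <typename Dtype>
struct Blob {  // stub standing in for caffe::Blob
  std::size_t count;
  explicit Blob(std::size_t n = 0) : count(n) {}
};

// The base Layer carries a vector of scratch blobs; layers that need
// storage for intermediate results push entries into it during SetUp.
template <typename Dtype>
class Layer {
 public:
  virtual ~Layer() {}
  std::vector<Blob<Dtype>*>& intermediate_results() { return intermediate_results_; }
 protected:
  std::vector<Blob<Dtype>*> intermediate_results_;
};

// The Net::Init side: after SetUp, if any layer registered scratch blobs,
// create shared storage sized to the largest request.
template <typename Dtype>
Blob<Dtype> MakeSharedStorage(const std::vector<Layer<Dtype>*>& layers) {
  std::size_t max_count = 0;
  for (std::size_t i = 0; i < layers.size(); ++i) {
    const std::vector<Blob<Dtype>*>& r = layers[i]->intermediate_results();
    for (std::size_t j = 0; j < r.size(); ++j) {
      max_count = std::max(max_count, r[j]->count);
    }
  }
  return Blob<Dtype>(max_count);
}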

@shelhamer
Member

+1 for doing this nicely now and then not having to look at it again, via something along the lines of @jeffdonahue's suggestion.


@sguada
Contributor Author

sguada commented Jul 2, 2014

For the usual input size of 227x227 the savings are small, about 50MB, but for bigger inputs they are substantial: for images of 1723x1723 the savings are 665MB.
This PR makes it possible to feed 7 images of 3446x3446 on a K40.
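
For reference on where the memory goes: the im2col buffer of a convolution layer holds channels * kernel_h * kernel_w * height_out * width_out values, so it grows with the spatial output area, i.e. quadratically in the input side length. As a back-of-the-envelope example, assuming a CaffeNet-style conv1 (3 input channels, 11x11 kernel, stride 4, no padding), a 1723x1723 input gives a 429x429 output and a buffer of 3 * 11 * 11 * 429 * 429 ≈ 67 million floats, roughly 267MB for that single layer; without sharing, every convolution layer holds a buffer like this of its own.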

I like the idea of having a more general way to handle temporary Blobs, but either the layers become aware of the Net and request them from it, or they declare them explicitly in the prototxt and then Net.Init initializes them after Layer.SetUp or in a Layer.FurtherSetup.

One question is which intermediate_results_ or temporary_blobs could be shared across layers; for instance, could all the temporary_blobs at the same position in the vector be shared?

Right now I don't have the time to do that change, but feel free to contribute.

@shelhamer
Member

Although trimming the memory of the Caffe convolution and having a general mechanism for temporaries would be helpful, the cuDNN convolution reduces memory usage too. Note, however, that the Caffe matrix multiplication approach can be faster in certain cases when it does fit in memory, so a shared buffer is still valuable.

@longjon
Contributor

longjon commented Oct 14, 2014

In a post-#594 world, I do believe sharing the column buffer is much simpler:

diff --git a/include/caffe/vision_layers.hpp b/include/caffe/vision_layers.hpp
index f94ac07..8b4373c 100644
--- a/include/caffe/vision_layers.hpp
+++ b/include/caffe/vision_layers.hpp
@@ -61,7 +61,7 @@ class ConvolutionLayer : public Layer<Dtype> {
   bool bias_term_;
   // For the Caffe matrix multiplication convolution.
   int M_, K_, N_;
-  Blob<Dtype> col_buffer_;
+  static Blob<Dtype> col_buffer_;
   Blob<Dtype> bias_multiplier_;
 };

diff --git a/src/caffe/layers/conv_layer.cpp b/src/caffe/layers/conv_layer.cpp
index f4f2b6b..a6649ed 100644
--- a/src/caffe/layers/conv_layer.cpp
+++ b/src/caffe/layers/conv_layer.cpp
@@ -9,6 +9,9 @@
 namespace caffe {

 template <typename Dtype>
+Blob<Dtype> ConvolutionLayer<Dtype>::col_buffer_;
+
+template <typename Dtype>
 void ConvolutionLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
       const vector<Blob<Dtype>*>& top) {
   ConvolutionParameter conv_param = this->layer_param_.convolution_param();

No worries about computing the max size or coordinating through Net, but note that this may not play nicely with parallelization efforts.
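
One detail to keep in mind with the static-member approach: since every layer now touches the same blob, each layer would have to re-Reshape it to its own dimensions right before using it, e.g. at the top of Forward (a sketch only, with member names approximate for that era):

  // Another layer may have resized the shared static buffer since this
  // layer's Reshape ran, so size it for this layer before im2col.
  col_buffer_.Reshape(1, channels_ * kernel_h_ * kernel_w_,
      height_out_, width_out_);

This is also one concrete reason it may clash with parallelism: two layers running Forward concurrently would reshape and fill the same storage.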

@sguada
Contributor Author

sguada commented Oct 16, 2014

Solved by #1291
