
Should pooling regions be identical to convolution regions? #1318

Closed
longjon opened this issue Oct 18, 2014 · 16 comments
@longjon
Contributor

longjon commented Oct 18, 2014

Pooling and convolution both aggregate inputs over rectangular regions defined by kernel size, stride, and padding parameters. (In fact, average pooling is convolution with a fixed filter, and max pooling is a special case of "tropical convolution".)
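
(As a concrete example: 2x2 average pooling with stride 2 is convolution with the fixed filter [[1/4, 1/4], [1/4, 1/4]] at stride 2, and replacing sum-of-products with max-of-sums, i.e. the tropical semiring, with an all-zero kernel turns the same computation into 2x2 max pooling.)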

However, Caffe currently uses different sets of regions (and consequently, produces different output sizes) for pooling and convolution. Convolution regions are never allowed outside of the padded input, while pooling is performed when a strided pooling region extends off the ends of the input.

This inconsistency causes some annoyance when the exact sizes of things need to be computed and can vary (due to #594). (The issue doesn't normally come up with networks like AlexNet, which ensure that their pooling regions don't encounter this edge case.)

Should the behavior be made consistent, or is there a good reason for its current state?

(See also #988, but note that the expression presented there is different from both the current behaviors.)

@ronghanghu
Member

If I understand it correctly, pooling currently uses top_size = ceil((bottom_size + 2*pad - kernel_size) / stride) + 1, while convolution uses top_size = floor((bottom_size + 2*pad - kernel_size) / stride) + 1.
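
To make the difference concrete, here is a minimal sketch that evaluates both formulas (the helper names are illustrative, not Caffe code); with bottom_size = 14, kernel_size = 3, stride = 2, pad = 0, convolution yields 6 outputs while pooling yields 7:

#include <cmath>
#include <cstdio>

// Convolution rule quoted above: floor((bottom + 2*pad - kernel) / stride) + 1.
int conv_out(int bottom, int pad, int kernel, int stride) {
  return static_cast<int>(std::floor(
      double(bottom + 2 * pad - kernel) / stride)) + 1;
}

// Pooling ("cover") rule quoted above: ceil((bottom + 2*pad - kernel) / stride) + 1.
int pool_out(int bottom, int pad, int kernel, int stride) {
  return static_cast<int>(std::ceil(
      double(bottom + 2 * pad - kernel) / stride)) + 1;
}

int main() {
  // bottom = 14, pad = 0, kernel = 3, stride = 2: conv gives 6, pooling gives 7.
  std::printf("conv: %d, pool: %d\n",
              conv_out(14, 0, 3, 2), pool_out(14, 0, 3, 2));
  return 0;
}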

I also have this issue with my fully-convolutional model. I think it should be made consistent.

@jeffdonahue
Contributor

Yeah, I've run into problems with this as well -- there should be a RectangularTileParameter or some such that both ConvolutionParameter and PoolingParameter understand.

"while pooling is performed when a strided pooling region extends off the ends of the input."

Hmm, I don't think I understand; are you saying the behavior of PoolingLayer::Reshape (https://github.com/BVLC/caffe/blob/dev/src/caffe/layers/pooling_layer.cpp#L81) differs between the stride_h_ == 1 and stride_h_ > 1 cases (and/or for stride_w_)? If so, that's not obvious to me from the code, but maybe it's more subtle than an if (stride > 1) kind of statement.

@shelhamer
Member

I vote yes on standardizing the regions for convolution and pooling. This was raised earlier in #473 (comment); although @jeffdonahue suggested at the time that the "cover" mode be kept for compatibility, I'd like to hear if he's still into it.

@shelhamer
Member

[image: tropics]

Above all, I raise my glass to tropical pools.

@longjon
Contributor Author

longjon commented Oct 18, 2014

@jeffdonahue, all I meant was that in the stride 1 case, the behavior of pooling and convolution is the same; the pooling and convolution regions will always cover the entire input without padding.
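
(With stride 1, (bottom_size + 2*pad - kernel_size) / stride is an integer, so the ceil and floor formulas above agree and both give bottom_size + 2*pad - kernel_size + 1 outputs.)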

@longjon
Contributor Author

longjon commented Oct 18, 2014

@shelhamer and I have discussed, and concluded that it's reasonable to change the behavior of pooling to match that of convolution. In particular:

  • in the unpadded case, allowing pooling regions to extend off the end of the input may change the statistics of the input in undesirable ways;
  • padding of the bottom-most input can always ensure that any clipped input cells do not cover real data, if needed; and
  • padding can be specified explicitly to compute all of the inputs available in the "cover" mode.
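
As a worked example of the last point: with bottom_size = 14, kernel_size = 3, stride = 2, the current "cover" pooling gives ceil(11/2) + 1 = 7 outputs; specifying pad = 1 under the convolution rule gives floor(13/2) + 1 = 7 as well, and every input cell again falls inside some pooling region.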

If a reason for keeping the "cover" mode still exists, comment here. Otherwise I'll implement the change eventually... others are welcome to implement it soon if desired.

@jeffdonahue
Contributor

I don't particularly care about having the "cover" mode in terms of wanting to use it for future training, but I'd think changing the default pooling behavior would cause backwards compatibility issues -- e.g. if you have a saved net where a pooling layer's output is fed into an inner product layer, and the pooling layer now has a different output dimension, the inner product layer's weights would expect a different input dimension and would thus be incompatible. And even if you go fully convolutional, it might cause undesired effects on the sampling behavior if the dimensions were carefully planned (e.g. to have exactly 1x1 outputs from some pooling layer).
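
(Concretely, if a pooling layer's output shrank from, say, 7x7 to 6x6 per channel, the flattened input to the following inner product layer would drop from 49*channels to 36*channels values, and the saved weight matrix would no longer fit.)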

Maybe we should keep the current behavior for the current set of proto parameters specifying the tiling behavior and deprecate them. Then add a new common proto message with the new behavior used by convolution and pooling.

@longjon
Contributor Author

longjon commented Oct 18, 2014

@jeffdonahue, yeah, that's a good point. I doubt many (any?) are relying on edge-pooling in networks that include inner product layers, but it's desirable to have a common message anyway. So your suggestion makes sense as a nice way to avoid some potential backwards compatibility issues, while getting the new behavior and a common message that can simplify code at the same time.

@futurely

The padding stuff is also one of the trickiest parts of #560. Mixing the padding logic with everything else is what caused so much confusion. Padding belongs more to the data transformation family than to any particular layer.

MATLAB's conv2 has three padding modes: full, same, and valid. All of them have use cases. Once an independent class is created to manage the padding, supporting any of these modes is not very hard.
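
(For a length-10 signal and a length-3 kernel, for instance, the three modes give 12, 10, and 8 samples along that dimension, respectively.)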

@futurely

As a side note, the size-compatibility validation between trained models and network definitions is currently mixed with many other things in the layer setup methods. Separating it out would make the code much easier to reason about and maintain.

@longjon
Contributor Author

longjon commented Oct 21, 2014

@futurely, it might be nice to factor out padding (thus allowing it to be used by itself, even when not followed by a conv or pooling layer), but I don't see a way to do that at a negligible cost. The data transformation stuff works because the data is first read into host memory, and only the transformed version goes to the GPU. For an intermediate layer, one can't pad in-place, so a naive approach has a memory cost (and I don't see a less naive approach).

@futurely

// Sketch: factor the padding logic into a helper shared by composition.
template <typename Dtype>
class InPlacePadding {
 public:
  void pad(const PaddingParameter& param, const vector<Blob<Dtype>*>& bottom,
           vector<Blob<Dtype>*>* top);
 protected:
  // One method per padding mode (e.g. full / same / valid).
  void pad_mode1(vector<Blob<Dtype>*>* top);
  void pad_mode2(vector<Blob<Dtype>*>* top);
  void pad_mode3(vector<Blob<Dtype>*>* top);
};

template <typename Dtype>
class PoolingLayer {
 public:
  void Method(const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
    // ...
    padding_.pad(this->layer_param_.padding_param(), bottom, top);
    // ...
  }
 protected:
  InPlacePadding<Dtype> padding_;
};

By using composition instead of adding a separate layer above the conv or pooling layer, no extra memory has to be allocated.

@longjon
Contributor Author

longjon commented Oct 22, 2014

@futurely I can't tell what you're trying to get at above... what is padding_.pad supposed to do? Presumably it puts the padded version of its bottom in top, but then one has to allocate extra memory for the padded version (note that PoolingLayer can't use its own top to store the padded bottom; it's too small). But your claim is that no extra memory has to be allocated, so one of us is confused.

There is a way to abstract away the padding: an indexing routine that mediates access to blob memory, performing bounds checking and returning zero rather than actual blob contents when the index falls in the padding. But I worry that would be overengineering.
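
A minimal sketch of that indexing idea (a hypothetical helper, not anything in Caffe): out-of-range coordinates read as zero, so callers see a virtually padded blob without a padded copy ever being allocated.

// Hypothetical accessor: reads behave as if the blob were zero-padded,
// without materializing a padded copy in memory.
template <typename Dtype>
Dtype padded_at(const Dtype* data, int height, int width, int h, int w) {
  // Coordinates outside [0, height) x [0, width) fall in the virtual padding
  // and read as zero instead of touching blob memory.
  if (h < 0 || h >= height || w < 0 || w >= width) {
    return Dtype(0);
  }
  return data[h * width + w];
}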

@futurely

The above sketch tried to share padding code without using inheritance. I thought the Blob* bottom could be reshaped in place, with the memory automatically managed by the layers' Reshape methods. To see whether there is any commonality to extract, the padding logic of the convolution and pooling layers could first be refactored into separate methods.

@wenwei202

Thanks for the details. The weird output dimensions of pooling are confusing.

@drcege

drcege commented May 10, 2017

Hey guys, it has been a very long time since the issue was opened.
When can we solve this problem?
