
Should pooling regions be identical to convolution regions? #1318

Closed
longjon opened this issue Oct 18, 2014 · 16 comments
@longjon
Contributor

longjon commented Oct 18, 2014

Pooling and convolution both aggregate inputs over rectangular regions defined by kernel size, stride, and padding parameters. (In fact, average pooling is convolution with a fixed filter, and max pooling is a special case of "tropical convolution".)
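
(As a concrete example: 2x2 average pooling with stride 2 is convolution with the fixed filter [[1/4, 1/4], [1/4, 1/4]] at stride 2, and replacing sum-of-products with max-of-sums, i.e. the tropical semiring, with an all-zero kernel turns the same computation into 2x2 max pooling.)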

However, Caffe currently uses different sets of regions (and consequently, produces different output sizes) for pooling and convolution. Convolution regions are never allowed outside of the padded input, while pooling is performed when a strided pooling region extends off the ends of the input.

This inconsistency causes some annoyance when the exact sizes of things need to be computed and can vary (due to #594). (The issue doesn't normally come up with networks like AlexNet, which ensure that their pooling regions don't encounter this edge case.)

Should the behavior be made consistent, or is there a good reason for its current state?

(See also #988, but note that the expression presented there is different from both the current behaviors.)

@ronghanghu
Member

If I understand it correctly, pooling currently uses top_size = ceil((bottom_size + 2*pad - kernel_size) / stride) + 1, while convolution uses top_size = floor((bottom_size + 2*pad - kernel_size) / stride) + 1.
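
To make the difference concrete, here is a minimal sketch that evaluates both formulas (the helper names are illustrative, not Caffe code); with bottom_size = 14, kernel_size = 3, stride = 2, pad = 0, convolution yields 6 outputs while pooling yields 7:

#include <cmath>
#include <cstdio>

// Convolution rule quoted above: floor((bottom + 2*pad - kernel) / stride) + 1.
int conv_out(int bottom, int pad, int kernel, int stride) {
  return static_cast<int>(std::floor(
      double(bottom + 2 * pad - kernel) / stride)) + 1;
}

// Pooling ("cover") rule quoted above: ceil((bottom + 2*pad - kernel) / stride) + 1.
int pool_out(int bottom, int pad, int kernel, int stride) {
  return static_cast<int>(std::ceil(
      double(bottom + 2 * pad - kernel) / stride)) + 1;
}

int main() {
  // bottom = 14, pad = 0, kernel = 3, stride = 2: conv gives 6, pooling gives 7.
  std::printf("conv: %d, pool: %d\n",
              conv_out(14, 0, 3, 2), pool_out(14, 0, 3, 2));
  return 0;
}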

I also have this issue with my fully-convolutional model. I think it should be made consistent.

@jeffdonahue
Contributor

Yeah, I've run into problems with this as well -- there should be a RectangularTileParameter or some such that both ConvolutionParameter and PoolingParameter understand.

"while pooling is performed when a strided pooling region extends off the ends of the input."

Hmm, I don't think I understand; are you saying the behavior of PoolingLayer::Reshape (https://github.com/BVLC/caffe/blob/dev/src/caffe/layers/pooling_layer.cpp#L81) differs between the stride_h_ == 1 and stride_h_ > 1 cases (and/or for stride_w_)? If so, that's not obvious to me from the code, but maybe it's more subtle than an if (stride > 1) kind of statement.

@shelhamer
Member

I vote yes on standardizing the regions for convolution and pooling. This was raised earlier in #473 (comment); although @jeffdonahue suggested at the time that the "cover" mode be kept for compatibility, I'd like to hear if he's still into it.

@shelhamer
Member

[image: tropics]

Above all, I raise my glass to tropical pools.

@longjon
Contributor Author

longjon commented Oct 18, 2014

@jeffdonahue, all I meant was that in the stride 1 case, the behavior of pooling and convolution is the same; the pooling and convolution regions will always cover the entire input without padding.
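
(With stride 1, (bottom_size + 2*pad - kernel_size) / stride is an integer, so the ceil and floor formulas above agree and both give bottom_size + 2*pad - kernel_size + 1 outputs.)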

@longjon
Contributor Author

longjon commented Oct 18, 2014

@shelhamer and I have discussed, and concluded that it's reasonable to change the behavior of pooling to match that of convolution. In particular:

  • in the unpadded case, allowing pooling regions to extend off the end of the input may change the statistics of the input in undesirable ways;
  • padding of the bottom-most input can always ensure that any clipped input cells do not cover real data, if needed; and
  • padding can be specified explicitly to compute all of the inputs available in the "cover" mode.
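
As a worked example of the last point: with bottom_size = 14, kernel_size = 3, stride = 2, the current "cover" pooling gives ceil(11/2) + 1 = 7 outputs; specifying pad = 1 under the convolution rule gives floor(13/2) + 1 = 7 as well, and every input cell again falls inside some pooling region.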

If a reason for keeping the "cover" mode still exists, comment here. Otherwise I'll implement the change eventually... others are welcome to implement it soon if desired.

@jeffdonahue
Contributor

I don't particularly care about having the "cover" mode in terms of wanting to use it for future training, but I'd think changing the default pooling behavior would cause backwards compatibility issues -- e.g. if you have a saved net where a pooling layer's output is fed into an inner product layer, and the pooling layer now has a different output dimension, the inner product layer's weights would expect a different input dimension and would thus be incompatible. And even if you go fully convolutional, it might cause undesired effects on the sampling behavior if the dimensions were carefully planned (e.g. to have exactly 1x1 outputs from some pooling layer).
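
(Concretely, if a pooling layer's output shrank from, say, 7x7 to 6x6 per channel, the flattened input to the following inner product layer would drop from 49*channels to 36*channels values, and the saved weight matrix would no longer fit.)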

Maybe we should keep the current behavior for the current set of proto parameters specifying the tiling behavior and deprecate them. Then add a new common proto message with the new behavior used by convolution and pooling.

@longjon
Contributor Author

longjon commented Oct 18, 2014

@jeffdonahue, yeah, that's a good point. I doubt many (any?) are relying on edge-pooling in networks that include inner product layers, but it's desirable to have a common message anyway. So your suggestion makes sense as a nice way to avoid some potential backwards compatibility issues, while getting the new behavior and a common message that can simplify code at the same time.

@futurely

The padding stuff is also one of the trickiest parts of #560. Mixing the padding logic with everything else is what caused so much confusion. Padding belongs more to the data transformation family than to any particular layer.

MATLAB's conv2 has three padding modes: full, same, and valid. All of them have use cases. Once an independent class is created to manage the padding, supporting any of these modes is not very hard.
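
(For a length-10 signal and a length-3 kernel, for instance, the three modes give 12, 10, and 8 samples along that dimension, respectively.)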

@futurely

As a side note, the size-compatibility validation between trained models and network definitions is currently mixed with many other things in the layer setup methods. Separating it out would make the code much easier to reason about and maintain.

@longjon
Contributor Author

longjon commented Oct 21, 2014

@futurely, it might be nice to factor out padding (thus allowing it to be used by itself, even when not followed by a conv or pooling layer), but I don't see a way to do that at a negligible cost. The data transformation stuff works because the data is first read into host memory, and only the transformed version goes to the GPU. For an intermediate layer, one can't pad in-place, so a naive approach has a memory cost (and I don't see a less naive approach).

@futurely

// Sketch: factor the padding logic into a helper shared by composition.
template <typename Dtype>
class InPlacePadding {
 public:
  void pad(const PaddingParameter& param, const vector<Blob<Dtype>*>& bottom,
           vector<Blob<Dtype>*>* top);
 protected:
  // One method per padding mode (e.g. full / same / valid).
  void pad_mode1(vector<Blob<Dtype>*>* top);
  void pad_mode2(vector<Blob<Dtype>*>* top);
  void pad_mode3(vector<Blob<Dtype>*>* top);
};

template <typename Dtype>
class PoolingLayer {
 public:
  void Method(const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
    // ...
    padding_.pad(this->layer_param_.padding_param(), bottom, top);
    // ...
  }
 protected:
  InPlacePadding<Dtype> padding_;
};

By using composition instead of adding a separate layer above the conv or pooling layer, no extra memory has to be allocated.

@longjon
Contributor Author

longjon commented Oct 22, 2014

@futurely I can't tell what you're trying to get at above... what is padding_.pad supposed to do? Presumably it puts the padded version of its bottom in top, but then one has to allocate extra memory for the padded version (note that PoolingLayer can't use its own top to store the padded bottom; it's too small). But your claim is that no extra memory has to be allocated, so one of us is confused.

There is a way to abstract away the padding: an indexing routine that mediates access to blob memory, performing bounds checking and returning zero rather than actual blob contents when the index falls in the padding. But I worry that would be overengineering.
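
A minimal sketch of that indexing idea (a hypothetical helper, not anything in Caffe): out-of-range coordinates read as zero, so callers see a virtually padded blob without a padded copy ever being allocated.

// Hypothetical accessor: reads behave as if the blob were zero-padded,
// without materializing a padded copy in memory.
template <typename Dtype>
Dtype padded_at(const Dtype* data, int height, int width, int h, int w) {
  // Coordinates outside [0, height) x [0, width) fall in the virtual padding
  // and read as zero instead of touching blob memory.
  if (h < 0 || h >= height || w < 0 || w >= width) {
    return Dtype(0);
  }
  return data[h * width + w];
}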

@futurely

The above sketch tried to share padding code without using inheritance. I thought the Blob* bottom could be reshaped in place, with the memory automatically managed by the layers' Reshape methods. To see whether there is any commonality to extract, the padding logic of the convolution and pooling layers could first be refactored into separate methods.

@wenwei202

Thanks for the details. The weird output dimensions of pooling are confusing.

@drcege

drcege commented May 10, 2017

Hey guys, it has been a very long time since the issue was opened.
When can we solve this problem?
