Should pooling regions be identical to convolution regions? #1318
Comments
If I understand it correctly, currently for pooling it is … I also have this issue with my fully-convolutional model. I think it should be made consistent.
Yeah, I've run into problems with this as well -- there should be a …
Hmm, I don't think I understand; are you saying the behavior of …
I vote yes on standardizing the regions for convolution and pooling. This was raised earlier in #473 (comment), and although @jeffdonahue suggested the "cover" mode be kept for compatibility at the time, I'd like to hear if he's still into it.
@jeffdonahue, all I meant was that in the stride 1 case, the behavior of pooling and convolution is the same; the pooling and convolution regions will always cover the entire input without padding.
@shelhamer and I have discussed, and concluded that it's reasonable to change the behavior of pooling to match that of convolution. In particular:
If a reason for keeping the "cover" mode still exists, comment here. Otherwise I'll implement the change eventually... others are welcome to implement it sooner if desired.
I don't particularly care about having the "cover" mode in terms of wanting to use it for future training, but I'd think changing the default pooling behavior would cause backwards-compatibility issues -- e.g., if you have a saved net whose pooling layer's output is fed into an inner product layer, and the pooling layer now has a different output dimension, the inner product layer's weights will have a different input dimension and thus be incompatible. And even if you go fully convolutional, it might cause undesired effects on the sampling behavior if the dimensions were carefully planned (e.g., to have exactly 1x1 outputs from some pooling layer). Maybe we should keep the current behavior for the current set of proto parameters specifying the tiling behavior and deprecate them, then add a new common proto …
@jeffdonahue, yeah, that's a good point. I doubt many (any?) are relying on edge-pooling in networks that include inner product layers, but it's desirable to have a common message anyway. So your suggestion makes sense as a nice way to avoid some potential backwards-compatibility issues, while getting the new behavior and a common message that can simplify code at the same time.
The padding stuff is also one of the trickiest parts of #560. Mixing the padding logic in with everything else is what caused so much confusion. Padding arguably belongs to the data transformation family rather than being an inherent part of any layer. The padding of MATLAB's conv2 has three modes: full, same, and valid, all of which have use cases. Once an independent class is created to manage the padding, it is not very hard to support any of these modes.
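For reference, the three conv2 modes mentioned above map directly onto output lengths; a minimal 1-D sketch using NumPy, whose `numpy.convolve` exposes the same trio of modes:

```python
import numpy as np

# The three padding/validity modes: for an input of length n and a kernel
# of length m, "full" pads fully (n + m - 1 outputs), "same" keeps the
# input length (n outputs), and "valid" allows no padding (n - m + 1).
x = np.array([1.0, 2.0, 3.0, 4.0])   # n = 4
k = np.array([1.0, 1.0, 1.0])        # m = 3

full = np.convolve(x, k, mode="full")    # length 4 + 3 - 1 = 6
same = np.convolve(x, k, mode="same")    # length 4
valid = np.convolve(x, k, mode="valid")  # length 4 - 3 + 1 = 2

print(len(full), len(same), len(valid))  # 6 4 2
```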
As a side note, the size-compatibility validations between trained models and network definitions are currently mixed in with many other things in the layers' setup methods. They would be much easier to reason about and maintain if separated out.
@futurely, it might be nice to factor out padding (thus allowing it to be used by itself, even when not followed by a conv or pooling layer), but I don't see a way to do that without a non-negligible cost. The data transformation stuff works because the data is first read into host memory, and only the transformed version goes to the GPU. For an intermediate layer, one can't pad in place, so a naive approach has a memory cost (and I don't see a less naive approach).
By using composition instead of adding a layer above the conv or pooling layer, no extra memory would have to be allocated.
@futurely I can't tell what you're trying to get at above... what is … There is a way to abstract away the padding, which is to have an indexing routine that indirects direct access to blob memory, performing bounds checking and sometimes returning zero rather than actual blob contents. But I worry that would be overengineering.
The above sketch tried to share padding code without using inheritance. I thought the Blob* bottom could be reshaped in place, with the memory automatically managed by the layers' Reshape methods. To see whether there is any commonality to extract, the padding logic of the convolution and pooling layers could first be refactored into separate methods.
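The bounds-checked indexing idea floated above can be sketched in a few lines. This is purely illustrative (not Caffe code, and the name `padded_at` is hypothetical): instead of materializing a padded copy of the input, an accessor treats out-of-range coordinates as zeros.

```python
def padded_at(data, h, w):
    """Read data[h][w], treating out-of-bounds coordinates as zero padding.

    Hypothetical helper: 'data' is a 2-D array (list of equal-length rows).
    A conv/pool loop could call this instead of indexing a padded blob,
    trading a bounds check per access for the extra padded-copy memory.
    """
    H, W = len(data), len(data[0])
    if 0 <= h < H and 0 <= w < W:
        return data[h][w]
    return 0.0
```

The per-access branch is exactly the overhead that makes this feel like overengineering for a hot inner loop, which is the trade-off being debated above.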
Thanks for the details. The weird output dimensions of pooling are confusing.
Hey guys, it has been a very long time since the issue was opened. |
Pooling and convolution both aggregate inputs over rectangular regions defined by kernel size, stride, and padding parameters. (In fact, average pooling is convolution with a fixed filter, and max pooling is a special case of "tropical convolution".)
However, Caffe currently uses different sets of regions (and consequently produces different output sizes) for pooling and convolution. Convolution regions are never allowed to extend outside the padded input, while pooling is still performed when a strided pooling region hangs off the end of the input.
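The two output-size rules described above can be written down concretely. A minimal sketch (function names are mine, not Caffe's): convolution floors the quotient, so windows that would hang off the padded input are dropped, while pooling ceils it, so a partial edge window still produces an output.

```python
import math

def conv_out(size, kernel, stride=1, pad=0):
    # Convolution: only windows fully inside the padded input count (floor).
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride=1, pad=0):
    # Pooling: a window that extends off the input still counts (ceil).
    return int(math.ceil((size + 2 * pad - kernel) / float(stride))) + 1

# size=10, kernel=3, stride=2: the pooling window starting at index 8
# hangs off the input, so pooling emits one more output than convolution.
print(conv_out(10, 3, 2), pool_out(10, 3, 2))  # 4 5
```

When `(size + 2*pad - kernel)` is divisible by the stride, the two rules agree, which is why carefully sized networks never notice the difference.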
This inconsistency causes some annoyance when the exact sizes of things need to be computed and can vary (due to #594). (The issue doesn't normally come up with networks like AlexNet, which ensure that their pooling regions never hit this edge case.)
Should the behavior be made consistent, or is there a good reason for its current state?
(See also #988, but note that the expression presented there is different from both the current behaviors.)