Spatial Pyramid Pooling Layer #2177
Conversation
Thanks very much! Could you tell me if this implementation and Caffe allow multi-size training? I mean training with both 227x227 and 300x300 images using only one database?
This implementation produces fixed-length pooling outputs from variable-sized inputs. I'm not sure how Caffe handles variable-sized inputs: most of the nets I've seen use center-cropping and/or resizing to produce fixed-size inputs.
@pgao, nice PR!
@ducha-aiki I haven't done any benchmarking, but my intuition was that calculating everything from the input (rather than from the previous pooling outputs) was:
pooling_top_vecs_.push_back(new vector<Blob<Dtype>*>);
pooling_top_vecs_[i]->push_back(pooling_outputs_[i]);

// pooling layer setup
The kernel size and stride logic need to be in Reshape(). The number of spatial pyramid pooling bins should stay constant, but their dimensions will need to change for each input. Inputs can change shape with (1) reshaping data layers #1313 or (2) calls to net or blob reshape(). When this happens, the kernel size and stride need re-configuring.
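(To make this concrete, here is a minimal sketch of what such a Reshape() could look like. It reuses the GetPoolingParam helper and member names that appear elsewhere in this PR, but the exact signatures and members are assumptions, not the merged code:)

// Re-derive kernel size, stride, and padding whenever the bottom blob
// changes shape, so each pyramid level keeps a constant number of bins
// over a variable-sized input.
template <typename Dtype>
void SPPLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  bottom_h_ = bottom[0]->height();
  bottom_w_ = bottom[0]->width();
  SPPParameter spp_param = this->layer_param_.spp_param();
  for (int i = 0; i < pyramid_height_; ++i) {
    // Recompute the pooling parameters for the new input size.
    LayerParameter pooling_param = GetPoolingParam(
        i, bottom_h_, bottom_w_, spp_param);
    // Rebuild the i-th pooling layer with the updated parameters.
    pooling_layers_[i].reset(new PoolingLayer<Dtype>(pooling_param));
    pooling_layers_[i]->SetUp(*pooling_bottom_vecs_[i], *pooling_top_vecs_[i]);
  }
}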
Is there a way to change the parameters of a layer without having to set it up again? The only way I could figure out to re-configure the kernel size and stride is by constructing a new LayerParameter, resetting the PoolingLayer with that LayerParameter, and calling the PoolingLayer's SetUp.
I don't think there is a way to change parameters without deleting and reinitializing the layer -- you could add a setter to Layer, but I don't think it would really save anything, since the constructor itself is probably basically free (SetUp is probably a little more expensive, but you'd have to call that regardless). Do you know if it's an issue in practice?
Nice job @pgao! This is well on the way. Once you've done another pass and taken care of the inline comments, we can look at merging this.
@melgor one could include variable-size inputs via the reshaping data layer #1313 or by calling net or blob reshape().
@shelhamer I think it's ready for review now.
Anyone have any comments on this?
So how do I use the layer?
Hello, I am a student in Taiwan. My training log shows:

I0410 16:29:24.782274 20130 solver.cpp:204] Train net output #0: loss = 0.482958 (* 1 = 0.482958 loss)

The modified train_val.prototxt follows. When pyramid_height is 2, an out-of-memory problem happens around iteration 10060. The other question is: is pyramid_height the pyramid level height? Sorry for asking these questions, and for my poor English.
@lsy1993311 You use it the same way you would use a regular pooling layer, except you can only specify the pyramid height as a parameter. @kyodaisuki Sounds like a memory leak. I'll take a look at my code and see what's wrong. The pyramid_height is the pyramid level height, that's right. What's the dimensionality of your data? I might have miscalculated the dimensions.
@kyodaisuki I removed what I think was causing the memory leak, but I can't seem to repro your dimension problem. Could you try it now and see if it works?
@pgao I am training the network with pyramid_height = 2 now.
Good to hear. Would you like me to optimise that further? Also, what about the issue with the dimensions?
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. The 13th European Conference on Computer Vision (ECCV), 2014. In this paper, they can decide the pyramid level and the total number of bins.
Well, here I'm choosing a kernel height and width so that the output of the layer is the same size regardless of the input size. I checked that this works using the unit tests, so I'm not sure what is going wrong with the dimensionality. Check lines 50-82 of the test_spp_layer.cpp file for the tests that enforce this. The actual kernel height/width calculation is done in GetPoolingParam, line 17 of spp_layer.cpp.
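(For readers following along, a rough sketch of how that calculation can work; the authoritative version is GetPoolingParam in spp_layer.cpp, and the variable names below are assumptions based on this thread:)

int num_bins = pow(2, pyramid_level);
// Choose a kernel that covers the input height in exactly num_bins steps.
int kernel_h = ceil(bottom_h / static_cast<double>(num_bins));
// Padding needed so the kernels tile the entire height.
int remainder_h = kernel_h * num_bins - bottom_h;
int pad_h = (remainder_h + 1) / 2;
// Stride equals the kernel, so the bins do not overlap and each level
// always outputs num_bins x num_bins values per channel, regardless of
// the input size. Width is handled the same way.
int stride_h = kernel_h;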
Maybe I have something wrong.
@shelhamer @jeffdonahue @longjon Any chance of having this pull request reviewed?
i, bottom_h_, bottom_w_, spp_param);

delete pooling_layers_[i];
pooling_layers_[i] = new PoolingLayer<Dtype>(pooling_param);
Can you make pooling_layers_ a vector<shared_ptr<PoolingLayer<Dtype> > > and change the above two lines to just pooling_layers_[i].reset(new PoolingLayer...)? I don't think the last set of PoolingLayers you create here will be deleted with your current implementation.
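(Sketched out, the suggested change would look roughly like this; the member declaration shown is an assumption:)

// shared_ptr owns each pooling layer: reset() frees the previous one,
// and the final set is freed automatically when the SPP layer is destroyed.
vector<shared_ptr<PoolingLayer<Dtype> > > pooling_layers_;
// ...
pooling_layers_[i].reset(new PoolingLayer<Dtype>(pooling_param));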
@pgao sorry for the silence -- I added a couple minor comments above. In general, looks great!
@jeffdonahue No problem! I made the changes you requested.
Hi @pgao, when pyramid_height = 3, num_bins = 2^0, 2^1, 2^2, so finally it will produce 1x1 kernel pooling at num_bins = 2^2. Could you help me check this problem?
@kyodaisuki Sorry about that. I fixed what I think is the issue (line 27 of spp_layer.cpp). I added your example as a test case, and I'm waiting for Travis to finish running.
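(As a concrete check, using the kernel formula sketched above with a hypothetical 10x10 bottom: at level 2, num_bins = 4, so kernel_h = ceil(10/4) = 3, remainder_h = 3*4 - 10 = 2, pad_h = 1, and stride_h = 3, which pools into ceil((10 + 2*1 - 3) / 3) + 1 = 4 bins per dimension, not a 1x1 kernel.)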
@jeffdonahue Does it look good to merge?
Thanks, pgao.
Hey @pgao -- looks good! Please squash and I'll merge.
@jeffdonahue Squashed!
Thanks!
How do I use this? Can I configure it through the prototxt file? Say I am interested in configuring 3 pools in a layer, each with a different kernel size and stride (e.g., kernel sizes (2,3,5) and strides (1,3,5)); is there a way to do this through the prototxt file in Caffe?
@siddharthm83 Have you figured out how to use the SPP layer? Could you let me know how to set the parameters? Thanks.
@hermitman Have you figured out how to use the SPP layer? Can you write an example of it?
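(For anyone else looking: a minimal sketch of what an SPP layer definition might look like in prototxt, based on the parameters described in this PR -- only the pyramid height is required, and the pooling method is optional. The layer and blob names here are placeholders:)

layer {
  name: "spp"
  type: "SPP"
  bottom: "conv5"
  top: "spp"
  spp_param {
    pyramid_height: 3
    pool: MAX
  }
}

Note that, per the discussion above, per-pool kernel sizes and strides are not set by hand; they are derived from the input size and the pyramid height.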
@pgao Hello, I've tried the SPP layer in Caffe; my input data is of type ImageData, where the images are of different sizes. But when I train the network, I ran into an error saying that a check on the image height failed. It seems Caffe can't accept image data of different sizes without resizing or cropping to a fixed size.
@mollyStark I am having the same problem. Currently you can either use a batch size of 1 or ensure that each batch contains images of the same size; see the comments by Zakum here: http://stackoverflow.com/questions/34697519/caffe-variable-input-image-size.
@davidstutz I don't know if the result will be good with a batch size of 1, but I'll give it a try. Thank you for sharing!
@davidstutz @mollyStark I am having the same problem. But it failed and shows "Check failed: pad_w_ < kernel_w_ (1 vs. 1)" even when using a batch size of 1. Can you help me with it?
@pgao Hello, I've tried the SPP layer in Caffe. I want to fine-tune bvlc_reference_caffenet.caffemodel using my own data; my input images are of different sizes. I only changed pool5 to an SPP layer in the prototxt, like this:
How could I generate a deploy file for a net with an SPP layer, since the input dimensions vary with the size of the test data? Thanks!
Implements Spatial Pyramid Pooling as described in this paper: http://arxiv.org/pdf/1406.4729.pdf
Implemented using a composition of Pooling, Flatten, and Concat layers. It takes one required parameter, the desired height of the pyramid; optional parameters include the padding amount and the pooling method.
The flow is: (pyramid_height) PoolingLayers -> (pyramid_height) FlattenLayers -> ConcatLayer. The end result is a one-dimensional vector containing all the pooling results from the different heights of the pyramid.
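(To make the output size concrete -- assuming, per the discussion above, that level l pools into 2^l x 2^l bins -- an input with C channels and pyramid_height = 3 produces a vector of length C x (1 + 4 + 16); for example, 256 x 21 = 5376 for a 256-channel conv5.)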