Spatial Pyramid Pooling Layer #2177

Merged 1 commit into BVLC:master on May 15, 2015

Conversation

@pgao (Contributor) commented Mar 22, 2015

Implements Spatial Pyramid Pooling as described in this paper: http://arxiv.org/pdf/1406.4729.pdf

Implemented using a composition of Pooling, Flatten, and Concat layers. Takes in one required parameter, which is the desired height of the pyramid. Optional parameters include padding amount and pooling method.

The flow is: (pyramid_height) PoolingLayers -> (pyramid_height) FlattenLayers -> ConcatLayer. The end result is a one-dimensional vector containing all the pooling results from the different levels of the pyramid.
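
For a pyramid of height H, level l pools the input into a 2^l x 2^l grid, so the flattened output length per channel is fixed no matter the input size. A minimal sketch of that arithmetic (a hypothetical helper, not code from this PR):

// Flattened SPP output length: level l contributes a 2^l x 2^l grid
// of bins for each channel, independent of the input's height/width.
int spp_output_dim(int pyramid_height, int channels) {
  int total_bins = 0;
  for (int l = 0; l < pyramid_height; ++l) {
    int bins_per_dim = 1 << l;  // 2^l bins along each spatial axis
    total_bins += bins_per_dim * bins_per_dim;
  }
  return total_bins * channels;  // e.g. height 3: (1 + 4 + 16) * channels
}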

@melgor commented Mar 22, 2015

Thanks very much! Could you tell me whether this implementation and Caffe allow multi-size training? I mean, training with both 227x227 and 300x300 images using only one database?

@pgao (Contributor, Author) commented Mar 22, 2015

This implementation produces fixed length pooling outputs from variable-sized inputs. I'm not sure about how Caffe handles variable-sized inputs: most of the nets I've seen use center-cropping and/or resizing to produce fixed-size inputs.

@ducha-aiki (Contributor) commented:

@pgao, nice PR!
Have you checked whether it is better to compute the coarsest pooling levels from the previous pooling outputs (13x13 -> 6x6 -> 3x3 -> 1x1) instead of all from the input (13x13 -> 1x1, 13x13 -> 6x6, 13x13 -> 3x3)?

@pgao (Contributor, Author) commented Mar 23, 2015

@ducha-aiki I haven't done any benchmarking, but my intuition was that calculating everything from the input (rather than from the previous pooling outputs) was:

  • easier to implement because the pooling layers all feed into flatten layers that feed into a concat layer, as opposed to each pooling layer feeding into another pooling layer and also into a flatten layer.
  • more parallelisable since the pooling layers don't need to wait for other pooling layers to complete their calculations.

pooling_top_vecs_.push_back(new vector<Blob<Dtype>*>);
pooling_top_vecs_[i]->push_back(pooling_outputs_[i]);

// pooling layer setup
A Member commented on this code:

The kernel size and stride logic need to be in Reshape(). The number of spatial pyramid pooling bins should stay constant but their dimensions will need to change for each input. Inputs can change shape with (1) reshaping data layers #1313 or (2) calls to net or blob reshape(). When this happens, the kernel size and stride need re-configuring.
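
A rough sketch of what that could look like (assuming this PR's pooling_layers_ / GetPoolingParam names and per-level bottom/top vector bookkeeping; a sketch of the suggestion, not the merged code):

template <typename Dtype>
void SPPLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  bottom_h_ = bottom[0]->height();
  bottom_w_ = bottom[0]->width();
  SPPParameter spp_param = this->layer_param_.spp_param();
  for (int i = 0; i < pyramid_height_; ++i) {
    // Bin counts per level stay fixed; kernel, stride, and pad are
    // re-derived so those bins tile the new input size.
    LayerParameter pooling_param = GetPoolingParam(
        i, bottom_h_, bottom_w_, spp_param);
    pooling_layers_[i].reset(new PoolingLayer<Dtype>(pooling_param));
    // pooling_bottom_vecs_ is assumed bookkeeping for each level's input.
    pooling_layers_[i]->SetUp(*pooling_bottom_vecs_[i], *pooling_top_vecs_[i]);
  }
}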

@pgao (Contributor, Author) replied:

Is there a way to change the parameters of a layer without setting it up again? The only way I could figure out to re-configure the kernel size and stride is by constructing a new LayerParameter, resetting the PoolingLayer with that LayerParameter, and calling the PoolingLayer's SetUp.

A Contributor replied:

I don't think there is a way to change parameters without deleting and reinitializing the layer -- you could add a setter to Layer but I don't think it would really save anything since the constructor itself is probably basically free (SetUp is probably a little more expensive but you'd have to call that regardless). Do you know if it's an issue in practice?

@shelhamer (Member) commented:

Nice job @pgao! This is well on the way. Once you've done another pass and taken care of the inline comments we can look at merging this.

@shelhamer (Member) commented:

@melgor one could handle variable-size inputs via the reshaping data layer (#1313) or by calling net.blobs['data'].reshape(...) or the like in Python. For batching multi-scale inputs together, one could make a Python data layer to carry out the pyramid packing of #308, although you might need a companion layer to unpack the multi-scale feature map. That would make a good example.
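
In C++ terms, a sketch of the same idea (assuming a loaded caffe::Net<float>* net with a single input blob):

// Run one 300x300 image through a net originally set up for 227x227.
caffe::Blob<float>* input = net->input_blobs()[0];
input->Reshape(1, 3, 300, 300);  // new input geometry
net->Reshape();                  // every layer re-derives its output shapes
// ... copy the image into input->mutable_cpu_data() ...
net->ForwardPrefilled();         // forward pass at the new size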

@pgao (Contributor, Author) commented Apr 4, 2015

@shelhamer I think it's ready for review now.

@pgao (Contributor, Author) commented Apr 8, 2015

Anyone have any comments on this?

@lsy1993311 commented:

So how does one use the layer?

@kyodaisuki commented:

Hello, I am a student in Taiwan.
I am trying to fine-tune the Caffe reference model on the PASCAL 2007 database, using the SPP layer between conv5 and fc6. The SPP code compiles, but when I fine-tune the model it runs into a problem after 2000 iterations, like this:

I0410 16:29:24.782274 20130 solver.cpp:204] Train net output #0: loss = 0.482958 (* 1 = 0.482958 loss)
I0410 16:29:24.782297 20130 solver.cpp:464] Iteration 1980, lr = 0.001
I0410 16:29:27.842135 20130 solver.cpp:266] Iteration 2000, Testing net (#0)
F0410 16:29:30.335808 20130 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
@ 0x7f04e3ab1daa (unknown)
@ 0x7f04e3ab1ce4 (unknown)
@ 0x7f04e3ab16e6 (unknown)
@ 0x7f04e3ab4687 (unknown)
@ 0x7f04e3e0d9db caffe::SyncedMemory::mutable_gpu_data()
@ 0x7f04e3ec1ef2 caffe::Blob<>::mutable_gpu_data()
@ 0x7f04e3f05717 caffe::PoolingLayer<>::Forward_gpu()
@ 0x7f04e3e97c61 caffe::Layer<>::Forward()
@ 0x7f04e3eaac82 caffe::SPPLayer<>::Forward_cpu()
@ 0x7f04e3dfbe9f caffe::Net<>::ForwardFromTo()
@ 0x7f04e3dfc2c7 caffe::Net<>::ForwardPrefilled()
@ 0x7f04e3eb5f77 caffe::Solver<>::Test()
@ 0x7f04e3eb6836 caffe::Solver<>::TestAll()
@ 0x7f04e3ebe3e9 caffe::Solver<>::Step()
@ 0x7f04e3ebed3f caffe::Solver<>::Solve()
@ 0x4064e6 train()
@ 0x404a21 main
@ 0x7f04e2fc2ec5 (unknown)
@ 0x404fcd (unknown)
@ (nil) (unknown)
Aborted (core dumped)

The modified train_val.prototxt follows:

layer {
  name: "spatial_pyramid_pooling"
  type: "SPP"
  bottom: "conv5"
  top: "spatial_pyramid_pooling"
  spp_param {
    pool: MAX
    pyramid_height: 3
  }
}

When pyramid_height is 2, the out-of-memory error happens at around iteration 10060.
When pyramid_height is 3, the out-of-memory error happens at around iteration 2000.
When pyramid_height is 4, the out-of-memory error happens at around iteration 460.

The other question: is pyramid_height the height (number of levels) of the pyramid?
I also checked the SPP layer output dimension:
with pyramid_height = 2, the dimension is 5 (1+4, OK);
with pyramid_height = 3, the dimension is 21 (1+4+16, OK);
with pyramid_height = 4, the dimension is 190?? (1+4+16+64 = 85???).

Sorry for asking so many questions in my poor English.
If I have made a mistake in setting up the SPP layer, please tell me.
Thanks.

@pgao (Contributor, Author) commented Apr 10, 2015

@lsy1993311 You use it the same way you would use a regular pooling layer, except you can only specify the pyramid height as a parameter.
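
For example, a minimal prototxt along these lines (a sketch; with this PR's spp_param message, pool is optional and defaults to MAX):

layer {
  name: "spp"
  type: "SPP"
  bottom: "conv5"
  top: "spp"
  spp_param {
    pyramid_height: 2
  }
}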

@kyodaisuki Sounds like a memory leak. I'll take a look at my code and see what's wrong. The pyramid_height is the pyramid level height, that's right. What's the dimensionality of your data? I might have miscalculated the dimensions.

@pgao (Contributor, Author) commented Apr 10, 2015

@kyodaisuki I removed what I think was causing the memory leak, but I can't seem to repro your dimension problem. Could you try it now and see if it works?

@kyodaisuki commented:

@pgao, thanks a lot. I will try it tomorrow.

@kyodaisuki commented:

I am training the network with pyramid_height = 2 now.
The memory leak problem has not happened through 96000 iterations.
Good~

@pgao (Contributor, Author) commented Apr 13, 2015

Good to hear. Would you like me to optimise that further? Also, what about the issue with the dimensions?

@kyodaisuki commented:

[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. The 13th European Conference on Computer Vision (ECCV), 2014.

In this paper, they can choose the pyramid levels and the total number of bins, like this:
a 4-level spatial pyramid (1×1, 2×2, 3×3, 6×6; 50 bins in total).
Can you support configuring it that way?
Thanks.

@pgao (Contributor, Author) commented Apr 14, 2015

Well, here I'm choosing a kernel height and width so that the output of the layer is the same size regardless of the input size. I verified this works using the unit tests, so I'm not sure what is going wrong with the dimensionality.

Check lines 50-82 of the test_spp_layer.cpp file for the tests that enforce this. The actual kernel height/width calculation is done in GetPoolingParam, at line 17 of spp_layer.cpp.

@kyodaisuki commented:

Maybe I had something wrong.
In your test_spp_layer.cpp file, with pyramid_height = 5, it matches my calculation (1+4+16+64+256 = 341).
It is right.

@pgao (Contributor, Author) commented May 6, 2015

@shelhamer @jeffdonahue @longjon Any chance of having this pull request reviewed?

i, bottom_h_, bottom_w_, spp_param);

delete pooling_layers_[i];
pooling_layers_[i] = new PoolingLayer<Dtype>(pooling_param);
A Contributor commented on this code:

Can you make pooling_layers_ a vector<shared_ptr<PoolingLayer<Dtype> > > and change the above two lines to just pooling_layers_[i].reset(new PoolingLayer...)? I don't think the last set of PoolingLayers you create here will be deleted with your current implementation.
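
In sketch form, the suggestion amounts to (using this PR's names):

// in the layer's header:
vector<shared_ptr<PoolingLayer<Dtype> > > pooling_layers_;

// replacing the delete + new pair above:
pooling_layers_[i].reset(new PoolingLayer<Dtype>(pooling_param));
// shared_ptr now owns each PoolingLayer, so the final set is also
// freed automatically when the SPPLayer is destroyed.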

@jeffdonahue (Contributor) commented:

@pgao sorry for the silence -- I added a couple minor comments above. In general, looks great!

@pgao (Contributor, Author) commented May 6, 2015

@jeffdonahue No problem! I made the changes you requested.

@kyodaisuki commented:

Hi @pgao,
I found an issue in the auto-padding process.
When the input feature map is 7x7 and pyramid_height = 3, it produces a wrong result (an output of 54, which I think comes from 1 + 4 + 49).
For example:

int num_bins = pow(2, pyramid_level);
// find padding and kernel size so that the pooling is
// performed across the entire image
int kernel_h = ceil(bottom_h / static_cast<double>(num_bins));
int pad_h = 0;
int remainder_h = kernel_h - bottom_h % num_bins + 1;
if (bottom_h % num_bins > 0) {
  pad_h = (remainder_h + 1) / 2;
  kernel_h = (bottom_h + 2 * pad_h) / num_bins;
}

When pyramid_height = 3, num_bins takes the values 2^0, 2^1, 2^2. For num_bins = 2^2:

kernel_h = ceil(7/4) = ceil(1.75) = 2
remainder_h = 2 - 7%4 + 1 = 2 - 3 + 1 = 0 (<-- ?)
if (3 > 0) {
  pad_h = (0+1)/2 = 0
  kernel_h = (7 + 2*0) / 4 = 1.75 (double) = 1 (int)
}

So it ends up doing 1x1 kernel pooling for num_bins = 2^2, and the output becomes 1 + 4 + 49 != 21.

Could you help me check this problem?

@pgao (Contributor, Author) commented May 13, 2015

@kyodaisuki Sorry about that. I fixed what I think is the issue (line 27 of spp_layer.cpp). I added your example as a test case, and I'm waiting for Travis to finish running.
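
For reference, the corrected calculation goes along these lines (a sketch of the fix: the padding is derived from the kernel that was already rounded up, and the kernel is not re-derived afterwards):

int num_bins = pow(2, pyramid_level);
// kernel size rounded up so that num_bins kernels span the input
int kernel_h = ceil(bottom_h / static_cast<double>(num_bins));
// minimum number of padded pixels so num_bins * kernel_h covers the
// input; for bottom_h = 7, num_bins = 4: kernel_h = 2, remainder_h = 1
int remainder_h = kernel_h * num_bins - bottom_h;
// the pooling layer pads pad_h pixels on each side
int pad_h = (remainder_h + 1) / 2;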

@pgao (Contributor, Author) commented May 13, 2015

@jeffdonahue Does it look good to merge?

@kyodaisuki commented:

Thanks, @pgao. I will try it again.

@jeffdonahue (Contributor) commented:

Hey @pgao -- looks good! Please squash and I'll merge.

@pgao (Contributor, Author) commented May 15, 2015

@jeffdonahue Squashed!

jeffdonahue added a commit referencing this pull request on May 15, 2015: Spatial Pyramid Pooling Layer
jeffdonahue merged commit 35a5df5 into BVLC:master on May 15, 2015
@jeffdonahue (Contributor) commented:

Thanks!

@BlGene mentioned this pull request on Jun 15, 2015
@siddharthm83 commented:

How do I use this? Can I configure it through the prototxt file? Say I am interested in configuring 3 pools in a layer, each with a different kernel size and stride (e.g. kernel sizes (2, 3, 5) and strides (1, 3, 5)); is there a way to do this through the prototxt file in Caffe?

@hermitman commented:

@siddharthm83 Have you figured out how to use the SPP layer? Could you let me know how to set the parameters?

Thanks,

@roeiherz commented Apr 9, 2016

@hermitman Have you figured out how to use the SPP layer? Can you write an example of it?

@mollyStark commented:

@pgao Hello~ I've tried the SPP layer in Caffe. My input data is of type ImageData, with images of different sizes, but when I train the network I run into an error saying an image height check failed. It seems Caffe can't accept image data of different sizes without resizing or cropping to a fixed size.
Did I misunderstand the purpose of the SPP layer, which "produces fixed length pooling outputs from variable-sized inputs" as you said? Does it still need the images resized to the same size before the SPP layer is used?

@davidstutz commented:

@mollyStark I am having the same problem. Currently you can either use a batch size of 1 or ensure that each batch contains images of the same size; see the comments by Zakum here: http://stackoverflow.com/questions/34697519/caffe-variable-input-image-size.

@mollyStark commented:

@davidstutz I don't know if the results will be good with a batch size of 1, but I'll give it a try. Thank you for sharing~

@CurtisLi commented May 26, 2016

@davidstutz @mollyStark I am having the same problem. But even with a batch size of 1 it fails, showing "Check failed: pad_w_ < kernel_w_ (1 vs. 1)". Can you help me with it?

@ftraining commented:

@pgao Hello, I've tried the SPP layer in Caffe. I want to fine-tune bvlc_reference_caffenet.caffemodel using my own data, where the input images are of different sizes. I only changed pool5 to an SPP layer in the prototxt, like this:

layer {
  name: "pool5"
  type: "SPP"
  bottom: "conv5"
  top: "pool5"
  spp_param {
    pool: MAX
    pyramid_height: 3
  }
}

As I understand it, there is no need to crop the images when using an SPP layer, but when I delete the crop_size parameter from the data layer, I get an error. Can you help me with it?

@Jinglei5 commented:

How can I generate a deploy file for a net with an SPP layer, given that the input dimensions vary with the size of the test data? Thanks!
