
Implement SpatialPyramidPoolingLayer with the Split, Pooling, Flatten & Concat layers #560

Closed
wants to merge 12 commits into from

Conversation

kloudkl
Contributor

kloudkl commented Jun 30, 2014

The spatial pyramid pooling layer [1] mentioned in #548 is a combination of the existing PoolingLayer and ConcatLayer. It automatically computes the sliding window sizes and strides for the multiple pyramid levels, applies the PoolingLayer at each level, and finally concatenates the outputs of all the levels into fixed-size vectors to feed into classifiers or fully connected layers.

[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. The 13th European Conference on Computer Vision (ECCV), 2014
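For reference, a minimal sketch of the per-level parameter computation described above, using the kernel/stride rule that appears in the PR's diff further down (kernel = ceil(side / bins), stride = floor(side / bins)). The helper name and the square-input simplification are assumptions of this sketch, not code from the PR:

#include <cmath>
#include <vector>

// Sketch only: per-level pooling parameters for a square bottom blob of
// side `image_side_length`, one entry per pyramid level.
struct LevelPoolingParams {
  int kernel_size;
  int stride;
};

std::vector<LevelPoolingParams> ComputePyramidLevels(
    int image_side_length, const std::vector<int>& bins_per_side) {
  std::vector<LevelPoolingParams> levels;
  for (size_t l = 0; l < bins_per_side.size(); ++l) {
    const float bin_size =
        static_cast<float>(image_side_length) / bins_per_side[l];
    LevelPoolingParams params;
    params.kernel_size = static_cast<int>(std::ceil(bin_size));   // cover every pixel
    params.stride = static_cast<int>(std::floor(bin_size));       // step between bins
    levels.push_back(params);
  }
  return levels;
}

Each level's pooled output is then flattened and all the levels are concatenated, so the top blob has a fixed length regardless of the bottom blob's spatial size.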

kloudkl changed the title from "Add the layer template changes for the SpatialPyramidPoolingLayer" to "Implement SpatialPyramidPoolingLayer with PoolingLayer and ConcatLayer" on Jun 30, 2014
@shelhamer
Member

I appreciate how quickly this contribution has appeared, but this should almost certainly be done by composition and not copy-paste.

For example, consider the within-channel LRN https://github.com/BVLC/caffe/blob/master/src/caffe/layers/lrn_layer.cpp#L31

@kloudkl
Contributor Author

kloudkl commented Jul 1, 2014

Tests passed but the gradient checks were slow.

Cuda number of devices: 0
Current device id: 0
Note: Google Test filter = -*GPU*
[==========] Running 6 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 3 tests from SpatialPyramidPoolingLayerTest/0, where TypeParam = float
[ RUN      ] SpatialPyramidPoolingLayerTest/0.TestSetup
E0701 17:11:59.761909   963 common.cpp:30] Cannot create Cublas handle. Cublas won't be available.
E0701 17:11:59.762352   963 common.cpp:37] Cannot create Curand generator. Curand won't be available.
E0701 17:11:59.762387   963 common.cpp:61] Curand not available. Skipping setting the curand seed.
[       OK ] SpatialPyramidPoolingLayerTest/0.TestSetup (1 ms)
[ RUN      ] SpatialPyramidPoolingLayerTest/0.TestCPUForwardMax
[       OK ] SpatialPyramidPoolingLayerTest/0.TestCPUForwardMax (1 ms)
[ RUN      ] SpatialPyramidPoolingLayerTest/0.TestCPUGradientMax
[       OK ] SpatialPyramidPoolingLayerTest/0.TestCPUGradientMax (25774 ms)
[----------] 3 tests from SpatialPyramidPoolingLayerTest/0 (25776 ms total)

[----------] 3 tests from SpatialPyramidPoolingLayerTest/1, where TypeParam = double
[ RUN      ] SpatialPyramidPoolingLayerTest/1.TestSetup
[       OK ] SpatialPyramidPoolingLayerTest/1.TestSetup (0 ms)
[ RUN      ] SpatialPyramidPoolingLayerTest/1.TestCPUForwardMax
[       OK ] SpatialPyramidPoolingLayerTest/1.TestCPUForwardMax (0 ms)
[ RUN      ] SpatialPyramidPoolingLayerTest/1.TestCPUGradientMax
[       OK ] SpatialPyramidPoolingLayerTest/1.TestCPUGradientMax (25519 ms)
[----------] 3 tests from SpatialPyramidPoolingLayerTest/1 (25519 ms total)

[----------] Global test environment tear-down
[==========] 6 tests from 2 test cases ran. (51295 ms total)
[  PASSED  ] 6 tests.

kloudkl changed the title from "Implement SpatialPyramidPoolingLayer with PoolingLayer and ConcatLayer" to "Implement SpatialPyramidPoolingLayer with the Split, Pooling, Flatten & Concat layers" on Jul 1, 2014
@kloudkl
Contributor Author

kloudkl commented Jul 1, 2014

@bhack, would you like to review whether the implementation is consistent with the algorithm described in section 2.3 of the SPP-net paper?

@bhack
Contributor

bhack commented Jul 1, 2014

@kloudkl I hope I can do it this evening or tomorrow.

@Yangqing
Member

Yangqing commented Jul 2, 2014

(Sorry posted this before seeing recent changes, please ignore my previous post - deleted)

@bhack
Contributor

bhack commented Jul 3, 2014

@kloudkl I haven't compiled the code to try it in depth, but it seems that you have simply handled concatenating the pooling levels and accumulating the loss.
How will multi-size training be handled? Do we need to wait for the transformation layer (#569)? It seems that several different feature requests currently depend on the separation of data and transformation.

@kloudkl
Contributor Author

kloudkl commented Jul 4, 2014

I don't think multi-size training is blocked by the transformation layers. In the paper, the authors simulated multi-size training with multiple fixed-size networks. As the output vectors of the conv5 layers are pooled into fixed-length features by the SpatialPyramidPooling layer, the networks of different sizes can share the same fully-connected layers as their last layers.

I would prefer to follow the path of @moskewcz's DenseNet feature pyramid computation in #308, but their code seems too heavyweight to integrate with the SPP. More likely, I will implement a Caffe version of Torch7's PyramidPacker and PyramidUnpacker to extract features for multiple scales of an image, as discussed in #189.
#189 (comment)
#189 (comment)

@bhack
Contributor

bhack commented Jul 4, 2014

@kloudkl Right

const float spatial_bin_size =
static_cast<float>(image_side_length) / spatial_bin;
pooling_param->set_kernel_size(ceil(spatial_bin_size));
pooling_param->set_stride(floor(spatial_bin_size));
Member

While this is written in Kaiming's paper, I guess there will be some problems with this pooling approach. For example, if image_side_length == 17 and spatial_bin == 6, then you have kernel_size == 3 and stride == 2, so you actually get 8×8 bins instead of 6×6 bins. @kloudkl Could you tell me whether I am right?
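As a sanity check of the arithmetic above (a sketch using Yangqing's example numbers, not values from the PR's tests), Caffe's pooling layer at the time computed the pooled size as ceil((H - kernel) / stride) + 1 when there is no padding:

#include <cmath>
#include <cstdio>

int main() {
  const int side = 17, bins = 6;
  const int kernel = static_cast<int>(std::ceil(static_cast<double>(side) / bins));   // 3
  const int stride = static_cast<int>(std::floor(static_cast<double>(side) / bins));  // 2
  // Pooled-size formula with no padding: ceil((H - kernel) / stride) + 1.
  const int pooled = static_cast<int>(
      std::ceil(static_cast<double>(side - kernel) / stride)) + 1;
  std::printf("%dx%d bins instead of the intended %dx%d\n",
              pooled, pooled, bins, bins);  // prints 8x8 instead of 6x6
  return 0;
}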

Member

Hi, @kloudkl I emailed Dr. Kaiming He for details, and he told me that this is how they perform spatial pyramid pooling:

Denote the width and height of the conv5 feature maps (can be the full image or a window) as w and h. For a pyramid level with n×n bins, the (i,j)-th bin is in the range [floor((i-1)*w/n), ceil(i*w/n)] × [floor((j-1)*h/n), ceil(j*h/n)].
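A small sketch of that bin layout (the function is illustrative only, assuming 1-indexed bins and the left-inclusive/right-exclusive convention clarified later in this thread):

#include <cmath>
#include <cstdio>

// Print the column ranges of the n bins along one axis of width w,
// following the floor/ceil rule quoted above.
void PrintBinRanges(int w, int n) {
  for (int i = 1; i <= n; ++i) {
    const int start = static_cast<int>(
        std::floor((i - 1) * w / static_cast<double>(n)));
    const int end = static_cast<int>(
        std::ceil(i * w / static_cast<double>(n)));
    std::printf("bin %d: [%d, %d)\n", i, start, end);
  }
}

int main() {
  // With w == 17 and n == 6 this yields exactly six slightly overlapping
  // bins, [0,3) [2,6) [5,9) [8,12) [11,15) [14,17), covering all 17 columns.
  PrintBinRanges(17, 6);
  return 0;
}

Unlike the single kernel/stride approach, the bins can differ in size by a pixel and may overlap slightly, but there are always exactly n per side.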

I copied this PR and am currently trying to implement a PyramidLevelLayer with this pooling behavior, based on the rectangular pooling in #614.

Contributor Author

Yes, you are right. I realized the problem when I wrote the test cases.

Thank you for contacting the authors for clarification!

Contributor Author

I'm solving it now.

Member

And I think the range above includes the left border but excludes the right border, i.e. [0, 3] contains 0, 1, 2 but not 3.

@kloudkl
Contributor Author

kloudkl commented Jul 8, 2014

To be more faithful to the implementation of the SPP-net paper's authors, the pooling layer has been extended to support floating-point kernel and stride heights and widths. All 36 test cases of the pooling layer pass.
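Roughly, the pooled region for output bin (ph, pw) with floating-point kernel and stride might be computed as below. This is a sketch of the idea only, not this PR's actual code; the function name and rounding choices are assumptions:

#include <algorithm>
#include <cmath>

// Sketch: bounds of the input region pooled into output bin (ph, pw).
// The start is floored and the end is ceiled so that adjacent bins
// together cover every input pixel.
void PooledRegion(int ph, int pw, float kernel_h, float kernel_w,
                  float stride_h, float stride_w, int height, int width,
                  int* hstart, int* hend, int* wstart, int* wend) {
  *hstart = static_cast<int>(std::floor(ph * stride_h));
  *wstart = static_cast<int>(std::floor(pw * stride_w));
  *hend = std::min(static_cast<int>(std::ceil(ph * stride_h + kernel_h)), height);
  *wend = std::min(static_cast<int>(std::ceil(pw * stride_w + kernel_w)), width);
  // Max pooling then runs over bottom(h, w) for hstart <= h < hend and
  // wstart <= w < wend.
}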

The spatial pyramid pooling layer is also tested on both the CPU and the GPU.

[==========] Running 10 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 5 tests from SpatialPyramidPoolingLayerTest/0, where TypeParam = float
[ RUN      ] SpatialPyramidPoolingLayerTest/0.TestSetup
[       OK ] SpatialPyramidPoolingLayerTest/0.TestSetup (303 ms)
[ RUN      ] SpatialPyramidPoolingLayerTest/0.TestCPUForwardMax
[       OK ] SpatialPyramidPoolingLayerTest/0.TestCPUForwardMax (5 ms)
[ RUN      ] SpatialPyramidPoolingLayerTest/0.TestGPUForwardMax
[       OK ] SpatialPyramidPoolingLayerTest/0.TestGPUForwardMax (2 ms)
[ RUN      ] SpatialPyramidPoolingLayerTest/0.TestCPUGradientMax
[       OK ] SpatialPyramidPoolingLayerTest/0.TestCPUGradientMax (85298 ms)
[ RUN      ] SpatialPyramidPoolingLayerTest/0.TestGPUGradientMax
[       OK ] SpatialPyramidPoolingLayerTest/0.TestGPUGradientMax (154611 ms)
[----------] 5 tests from SpatialPyramidPoolingLayerTest/0 (240220 ms total)

[----------] 5 tests from SpatialPyramidPoolingLayerTest/1, where TypeParam = double
[ RUN      ] SpatialPyramidPoolingLayerTest/1.TestSetup
[       OK ] SpatialPyramidPoolingLayerTest/1.TestSetup (0 ms)
[ RUN      ] SpatialPyramidPoolingLayerTest/1.TestCPUForwardMax
[       OK ] SpatialPyramidPoolingLayerTest/1.TestCPUForwardMax (1 ms)
[ RUN      ] SpatialPyramidPoolingLayerTest/1.TestGPUForwardMax
[       OK ] SpatialPyramidPoolingLayerTest/1.TestGPUForwardMax (1 ms)
[ RUN      ] SpatialPyramidPoolingLayerTest/1.TestCPUGradientMax
[       OK ] SpatialPyramidPoolingLayerTest/1.TestCPUGradientMax (85730 ms)
[ RUN      ] SpatialPyramidPoolingLayerTest/1.TestGPUGradientMax
[       OK ] SpatialPyramidPoolingLayerTest/1.TestGPUGradientMax (165432 ms)
[----------] 5 tests from SpatialPyramidPoolingLayerTest/1 (251165 ms total)

[----------] Global test environment tear-down
[==========] 10 tests from 2 test cases ran. (491385 ms total)
[  PASSED  ] 10 tests.

@kloudkl
Contributor Author

kloudkl commented Jul 10, 2014

Classification accuracy on the VOC 2012 dataset:

Pooling layer after the conv5 layer    Accuracy (%)
max pooling                            71.5
spatial pyramid max pooling            68.3

The spatial pyramid pooling layer consists of four pyramid levels which respectively split the image evenly into 1, 2, 3, and 6 bins along each of the vertical and horizontal directions, i.e. 1×1, 2×2, 3×3, and 6×6 bins.
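For reference on the output dimensionality (assuming the 256 conv5 feature maps of the reference ImageNet model, which is not stated explicitly above): the four levels give 1 + 4 + 9 + 36 = 50 bins per channel, so the concatenated feature is a 50 × 256 = 12,800-dimensional vector per image, independent of the input size.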

@kloudkl
Contributor Author

kloudkl commented Jul 12, 2014

The SPP-net performed worse because the fully connected layer after the last convolution layer has larger input dimensions than in the reference ImageNet model. Its parameters were randomly initialized and caused over-fitting on the relatively small VOC 2012 dataset. If it were first fine-tuned on a much larger dataset, its performance would certainly be superior, as described in the paper.

@@ -0,0 +1,328 @@
name: "ImagenetSpatialPyramidPoolingNet"


I am wondering: the VOC 2012 classification task has multiple labels per image, so how do you build the leveldb?

Contributor Author

HDF5DataLayer

@sanchom

sanchom commented Nov 27, 2014

What's going on with this? Can I help?

@kloudkl
Contributor Author

kloudkl commented Dec 1, 2014

This algorithm involves some very complicated corner cases. For example, a candidate region in the original image may be mapped into a very small region with a width or height equal to or smaller than 1 pixel. It's very hard to detect objects whose sizes are small relative to the image.

GoogLeNet combined with RCNN is a much more robust but much slower solution.

In practice, you may find the object detectors included in the latest OpenCV quite handy for most use cases if you are required to quickly complete a project.

@sanchom

sanchom commented Dec 1, 2014

@kloudkl I'm interested in helping. Maybe we can chat about what is holding up this PR. How can we do that?

shelhamer added the ES label on Mar 10, 2015
@shelhamer
Member

Closing since this PR is abandoned and the code is non-compositional. This is better achieved through layer composition with Pooling and Concat layers than by duplicating implementations.

There is an expected replacement: spatial pyramid pooling has been given to a student as Caffe practice.

@ghost

ghost commented May 8, 2015

@shelhamer Can you please update us on the current status of this?

@shelhamer
Member

See #2177 for spatial pyramid pooling.
