Sliding window vs. Selective search detection #197

rodrigob · 2014-03-10T11:43:59Z

To the best of my understanding, there is nothing stopping Caffe from being used as a sliding window detector (instead of using selective search crop proposals as in http://caffe.berkeleyvision.org/imagenet_detection.html).

Is there any technical blocker to implementing something along the lines of OverFeat?
http://arxiv.org/abs/1312.6229

(test-time code is provided, but no training code)

Is there any existing branch exploring such an idea?
(I am lost amongst all the Caffe branches; it is hard to see who is doing what...)

shelhamer · 2014-03-10T17:16:48Z

Please see #189 to continue the conversation.

It's true that the majority of the work is already accomplished in Caffe as it is. However, there are details for dense, multiscale extraction that need addressing.

1. Make the model fully convolutional, that is, replace inner product layers with convolution such that the output is a classification map and not a single vector.
2. Pack/unpack images and pyramids into planes for high-throughput processing. Running windows separately is highly redundant, so to amortize computation all the convolution should be done at once. There is also overhead to constructing the net for a given input size. The solution is to define a large input "plane" in which images or pyramid scales are placed, run it through the network in one go, and then unpack the features/output from the plane back into the image or pyramid scale they came from.
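Making the model fully convolutional rests on the fact that an inner product layer applied to a C×k×k input computes exactly what a k×k convolution with the same weights (reshaped) computes; on a larger input, that convolution slides the classifier and yields a map of scores. The NumPy sketch below checks this numerically. `conv2d_valid` and `fc_to_conv` are hypothetical helpers written for illustration, not Caffe functions, and the weight reshape assumes the inner product flattens its input in (channel, row, column) order.

```python
import numpy as np

def conv2d_valid(x, w, b):
    """Naive cross-correlation with stride 1 and no padding.
    x: (C, H, W) input, w: (N, C, kh, kw) filters, b: (N,) biases.
    Returns an (N, H - kh + 1, W - kw + 1) map."""
    _, H, W = x.shape
    N, _, kh, kw = w.shape
    out = np.empty((N, H - kh + 1, W - kw + 1))
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            window = x[:, i:i + kh, j:j + kw]
            # Contract over (C, kh, kw) for all N filters at once.
            out[:, i, j] = np.tensordot(w, window, axes=([1, 2, 3], [0, 1, 2])) + b
    return out

def fc_to_conv(fc_weight, fc_bias, in_shape):
    """Reshape inner-product weights (N, C*kh*kw) into conv filters (N, C, kh, kw)."""
    C, kh, kw = in_shape
    return fc_weight.reshape(-1, C, kh, kw), fc_bias

rng = np.random.default_rng(0)
C, k, N = 3, 4, 5  # illustrative sizes: 3x4x4 input to the FC layer, 5 classes
fc_w = rng.standard_normal((N, C * k * k))
fc_b = rng.standard_normal(N)
conv_w, conv_b = fc_to_conv(fc_w, fc_b, (C, k, k))

# At the training input size, the convolution gives a 1x1 map equal to the FC output.
x_small = rng.standard_normal((C, k, k))
assert np.allclose(conv2d_valid(x_small, conv_w, conv_b)[:, 0, 0],
                   fc_w @ x_small.ravel() + fc_b)

# On a larger input, the same weights yield a classification map instead of a vector.
x_big = rng.standard_normal((C, 10, 12))
score_map = conv2d_valid(x_big, conv_w, conv_b)
print(score_map.shape)  # (5, 7, 9)
```

In an AlexNet-style net, the first fully connected layer would become a convolution whose kernel matches the spatial size of the last pooling output, and later fully connected layers become 1×1 convolutions; each output cell then corresponds to an input window spaced by the net's total stride.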

Progress-wise: 1 is done (we'll try to write it up soon!), and 2 has a private implementation that is planned for release but currently delayed. A BVLC + community implementation of 2 is being thought out at #189.
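The packing/unpacking step can be sketched as follows, assuming a net built from unpadded convolutions and poolings, so that output cell (i, j) sees a `window`-pixel square of the input starting at (i·`stride`, j·`stride`). Images (e.g. the scales of a pyramid) are shelf-packed into one zero-filled plane, with offsets rounded up to multiples of `stride` so each image's output grid lines up with the plane's; cells whose window straddles two images are simply never read back. `pack_plane`, `unpack_plane` and the stand-in `toy_net` are hypothetical names, not part of Caffe, and shelf packing is just one simple layout choice.

```python
import numpy as np

def _round_up(v, m):
    """Smallest multiple of m that is >= v."""
    return -(-v // m) * m

def pack_plane(images, plane_width, pad, align):
    """Shelf-pack (C, h, w) arrays into one zero-filled (C, H, plane_width) plane.
    Offsets are rounded up to multiples of `align`, which should equal the net's total stride."""
    C = images[0].shape[0]
    offsets = []
    x = y = _round_up(pad, align)
    shelf_h = 0
    for img in images:
        _, h, w = img.shape
        if x + w + pad > plane_width:  # current shelf is full: open a new one below it
            x = _round_up(pad, align)
            y = _round_up(y + shelf_h + pad, align)
            shelf_h = 0
        if x + w + pad > plane_width:
            raise ValueError("image is wider than the plane")
        offsets.append((y, x))
        x = _round_up(x + w + pad, align)
        shelf_h = max(shelf_h, h)
    plane = np.zeros((C, y + shelf_h + pad, plane_width), dtype=images[0].dtype)
    for img, (top, left) in zip(images, offsets):
        _, h, w = img.shape
        plane[:, top:top + h, left:left + w] = img
    return plane, offsets

def unpack_plane(out_map, offsets, sizes, stride, window):
    """For each packed image, cut out the output cells whose input window
    lies entirely inside that image."""
    crops = []
    for (top, left), (h, w) in zip(offsets, sizes):
        i0, j0 = top // stride, left // stride  # exact, since offsets are aligned
        i1 = (top + h - window) // stride + 1   # exclusive end row
        j1 = (left + w - window) // stride + 1  # exclusive end column
        crops.append(out_map[:, i0:i1, j0:j1])
    return crops

def toy_net(x, window, stride):
    """Stand-in for a fully convolutional net: one score (the mean) per strided window."""
    _, H, W = x.shape
    oh, ow = (H - window) // stride + 1, (W - window) // stride + 1
    out = np.empty((1, oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[0, i, j] = x[:, i * stride:i * stride + window,
                             j * stride:j * stride + window].mean()
    return out

# Three "pyramid scales" of a 3-channel image (random data for illustration).
rng = np.random.default_rng(0)
pyramid = [rng.standard_normal((3, h, w)) for h, w in [(20, 24), (14, 17), (10, 12)]]
sizes = [img.shape[1:] for img in pyramid]

# One pass over the whole plane, then per-scale score maps are cut back out.
plane, offsets = pack_plane(pyramid, plane_width=48, pad=2, align=2)
crops = unpack_plane(toy_net(plane, window=6, stride=2), offsets, sizes, stride=2, window=6)
```

Because every offset is a multiple of `stride`, each crop equals what the net would have produced on that image alone, while the forward pass runs only once over the whole plane.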

To me, reference implementations of both OverFeat [1] and R-CNN [2] in Caffe would be helpful in further analyzing these models and comparing the tradeoffs of dense, sliding window detection and attentive, region proposal detection.

Pull requests toward implementing OverFeat in Caffe would be welcomed. R-CNN is already on its way to having a Caffe reference model.

[1] Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, Yann LeCun. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks. arXiv:1312.6229 [cs.CV].

[2] Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv:1311.2524 [cs.CV].

rodrigob · 2014-03-10T21:08:22Z

Sorry, I did not see #189.

Yes, indeed, the idea would be to have something similar to OverFeat available in Caffe.
Let us see how we can push #189 forward.

shelhamer added the question label Mar 10, 2014

rodrigob closed this as completed Mar 10, 2014

ivendrov mentioned this issue Jan 15, 2015 in "Make a matrix output and ground truth example (segmentation, sliding window detection, etc.)" #1698 (closed)
