Sliding window vs. Selective search detection #197

Closed
rodrigob opened this issue Mar 10, 2014 · 2 comments

@rodrigob
Contributor

To the best of my understanding, there is nothing stopping Caffe from being used as a sliding window detector (instead of using selective search crop proposals as in http://caffe.berkeleyvision.org/imagenet_detection.html).

Is there any technical blocker to implementing something along the lines of OverFeat?
http://arxiv.org/abs/1312.6229

(OverFeat provides test-time code, but no training code.)

Is there any existing branch exploring this idea?
(I am lost among all the Caffe branches; it is hard to see who is doing what...)

@shelhamer
Member

Please see #189 to continue the conversation.

It's true that most of the work is already accomplished in Caffe as it is. However, there are details of dense, multiscale extraction that still need addressing.

  1. Make the model fully convolutional: replace the inner product layers with convolution layers so that the output is a classification map rather than a single vector.
  2. Pack and unpack images and pyramids into planes for high-throughput processing. Running windows separately is highly redundant, so to amortize computation all of the convolution should be done at once. There is also overhead in constructing the net for a given input size. The solution is to define a large input "plane" in which images or pyramid scales are placed, run it through the network in one go, and then unpack the features/output from the plane back into the image or pyramid they came from.
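
As a rough illustration of item 1 (plain NumPy, not Caffe code), the sketch below reshapes the weights of an inner product layer into convolution kernels and slides them over a larger feature map, so each output location holds what the inner product layer would have produced on that window. The function names and the assumption that the flattened input is laid out as channels × height × width in row-major order are mine, chosen for illustration.

```python
import numpy as np

def inner_product(x, W, b):
    """Fully connected layer on one (C, kh, kw) window, flattened in row-major order."""
    return W @ x.ravel() + b

def as_convolution(fmap, W, b, kernel_shape):
    """Apply the same weights as a stride-1, unpadded convolution over a larger map.

    fmap: (C, H, Wd) feature map; W: (K, C*kh*kw) inner product weights; b: (K,).
    Returns a (K, H-kh+1, Wd-kw+1) classification map.
    """
    C, kh, kw = kernel_shape
    K = W.shape[0]
    kernels = W.reshape(K, C, kh, kw)  # assumes C x H x W row-major flattening
    _, H, Wd = fmap.shape
    out = np.empty((K, H - kh + 1, Wd - kw + 1))
    for y in range(out.shape[1]):
        for x in range(out.shape[2]):
            window = fmap[:, y:y + kh, x:x + kw]
            # Sum over (C, kh, kw): one dot product per output class.
            out[:, y, x] = np.tensordot(kernels, window, axes=3) + b
    return out

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3 * 2 * 2))   # 4 classes, 3x2x2 input window
b = rng.standard_normal(4)
fmap = rng.standard_normal((3, 5, 6))     # larger map than the training window
score_map = as_convolution(fmap, W, b, (3, 2, 2))
```

Each column `score_map[:, y, x]` should match `inner_product` applied to the window starting at `(y, x)`, which is why no retraining is needed for this conversion under the layout assumption above.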

Progress-wise: 1 is done (we'll try to write it up soon!), and 2 has a private implementation that is planned for release but currently delayed. A BVLC + community implementation of 2 is being thought out at #189.
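
For readers unfamiliar with the packing idea in item 2, here is a minimal NumPy sketch under my own assumptions. It is not the private implementation mentioned above nor the design discussed in #189. It places single-channel images left to right in rows ("shelves") of a plane, records where each one landed, and slices them back out. The names `pack` and `unpack`, the shelf layout, and the `pad` parameter are hypothetical. Unlike the fixed-size plane described above, this sketch grows the plane height to fit.

```python
import numpy as np

def pack(images, plane_w, pad=0):
    """Place 2-D arrays on shelves of width plane_w; return the plane and offsets."""
    offsets = []
    x = y = shelf_h = 0
    for im in images:
        h, w = im.shape
        if w > plane_w:
            raise ValueError("image wider than plane")
        if x + w > plane_w:          # current shelf is full: start a new one
            x = 0
            y += shelf_h + pad
            shelf_h = 0
        offsets.append((y, x, h, w))
        x += w + pad
        shelf_h = max(shelf_h, h)
    plane = np.zeros((y + shelf_h, plane_w), dtype=images[0].dtype)
    for im, (oy, ox, h, w) in zip(images, offsets):
        plane[oy:oy + h, ox:ox + w] = im
    return plane, offsets

def unpack(plane, offsets):
    """Slice each packed region back out of the plane."""
    return [plane[oy:oy + h, ox:ox + w] for oy, ox, h, w in offsets]

# Three "pyramid levels" of different sizes packed into one plane.
levels = [np.full((4, 6), 1.0), np.full((3, 3), 2.0), np.full((2, 5), 3.0)]
plane, offsets = pack(levels, plane_w=8, pad=1)
restored = unpack(plane, offsets)
```

In a real pipeline the plane would go through the network once, and the offsets would presumably need to be divided by the network's stride before unpacking the output map; the padding between images would also have to be large enough that receptive fields do not mix neighbouring images. Both points are left out of this sketch.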

To me, reference implementations of both OverFeat [1] and R-CNN [2] in Caffe would help in further analyzing these models and in comparing the tradeoffs between dense, sliding window detection and attentive, region proposal detection.

Pull requests toward implementing OverFeat in Caffe would be welcomed. R-CNN is already on its way to having a Caffe reference model.

[1] Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, Yann LeCun. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks. arXiv:1312.6229 [cs.CV].

[2] Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv:1311.2524 [cs.CV].

@rodrigob
Contributor Author

Sorry, I did not see #189.

Yes, the idea would indeed be to have something similar to OverFeat available in Caffe.
Let us see how we can push #189 forward.
