
Allow images of different sizes as inputs #557

Closed
sguada opened this issue Jun 28, 2014 · 11 comments

Comments

@sguada
Contributor

sguada commented Jun 28, 2014

Based on recent experiments, cropping from images whose smallest side is 256 performs better.
http://arxiv.org/pdf/1405.3531v2.pdf

The idea is to allow images to have different sizes before cropping, so that they become the same size after cropping. This would require removing the mean_file and replacing it with a mean_value.

LevelDB, LMDB and the ImageDataLayer should not assume that the images are all the same size.
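The proposal can be sketched in a few lines: take a fixed-size random crop from a variable-size image and subtract a scalar per-channel mean_value, which (unlike a fixed-size mean_file) broadcasts over any crop. This is only an illustrative sketch, not Caffe's implementation; the 227x227 crop size and the BGR mean values are assumptions.

```python
import numpy as np

def random_crop_minus_mean(img, crop_size, mean_value, rng=None):
    """img: H x W x K array with H, W >= crop_size; mean_value: length-K values."""
    rng = rng or np.random.default_rng(0)
    h, w, _ = img.shape
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    crop = img[top:top + crop_size, left:left + crop_size, :].astype(np.float32)
    # A scalar per channel works for any crop size, unlike a full mean image.
    return crop - np.asarray(mean_value, dtype=np.float32)

# Two differently sized inputs yield identically shaped network inputs.
a = random_crop_minus_mean(np.zeros((256, 340, 3)), 227, [104, 117, 123])
b = random_crop_minus_mean(np.zeros((300, 256, 3)), 227, [104, 117, 123])
assert a.shape == b.shape == (227, 227, 3)
```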

@sguada
Contributor Author

sguada commented Jun 28, 2014

@Yangqing when you do the data_layers re-design in #407 #244 keep this in mind.

@jamt9000
Contributor

Regarding that paper, I believe they will be releasing their source code (and models) soon

http://www.robots.ox.ac.uk/~vgg/research/deep_eval/

@kloudkl
Contributor

kloudkl commented Jun 30, 2014

The paper that #548 wants to implement [1] proposes a very natural and general way to extract convolutional features from images of any size and then pool the feature maps into fixed-length vectors with spatial pyramids. The spatial pyramid pooling (SPP) idea is not new, but until now most people have only done pooling with sliding windows in CNNs. On the other hand, SPP-net only experimented with max pooling in each spatial bin, while sliding-window pooling has also used other aggregation methods.

[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. The 13th European Conference on Computer Vision (ECCV), 2014
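The key property of SPP is that the output length depends only on the number of channels and pyramid levels, not on the feature map's spatial size. A minimal sketch of the idea follows; the bin partitioning via rounded `linspace` edges and the (1, 2, 4) pyramid are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

def spp_max_pool(fmap, levels=(1, 2, 4)):
    """Pool a K x H x W feature map into K * sum(n*n for n in levels) values."""
    k, h, w = fmap.shape
    out = []
    for n in levels:
        ys = np.linspace(0, h, n + 1).round().astype(int)
        xs = np.linspace(0, w, n + 1).round().astype(int)
        for i in range(n):
            for j in range(n):
                bin_ = fmap[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                out.append(bin_.max(axis=(1, 2)))  # max over each spatial bin
    return np.concatenate(out)  # fixed length: K * (1 + 4 + 16)

# Feature maps of different spatial sizes pool to the same vector length.
v1 = spp_max_pool(np.random.rand(256, 13, 13))
v2 = spp_max_pool(np.random.rand(256, 10, 17))
assert v1.shape == v2.shape == (256 * 21,)
```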

@kloudkl
Contributor

kloudkl commented Jun 30, 2014

This is complementary to #505.

@kloudkl
Contributor

kloudkl commented Jul 1, 2014

@sguada, do you think #355 is a prerequisite for this issue?

@sguada
Contributor Author

sguada commented Jul 1, 2014

The idea is to allow images of different sizes as inputs, but keep them at a fixed size after cropping, so the rest of the network works as usual.

Therefore #355 is not needed for now, although it could be combined with this later on.

@kloudkl
Contributor

kloudkl commented Jul 2, 2014

Got it. The ImageDataLayer resizes the images before cropping and mirroring them, and the convert_imageset tool ensures that the images stored in the LevelDB are all the same size, so there is basically no requirement on the original image sizes. Only LMDB needs to be enhanced.

The mean_value is just a simplification of the mean_file and doesn't have to replace the latter.

@qingqing01

I use the ImageDataLayer and I don't understand what you mean by "replace mean_file with a mean_value". How is the mean_value computed? Does Caffe have a tool to compute the mean_file from the input images? I want to confirm before I write such a tool myself. Thanks!

@shelhamer
Member

@Dcocoa it turns out the spatial mean, i.e. the mean over images with dimensions K x H x W, is almost everywhere the same across height and width, so averaging over the spatial dimensions into a channel mean with dimensions K x 1 x 1 achieves virtually the same network performance while making preprocessing simpler and more flexible.

compute_image_mean computes the mean.
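Concretely, a mean image (the K x H x W blob a tool like compute_image_mean produces) collapses to a per-channel mean by averaging over the spatial axes; the K x 1 x 1 result then broadcasts over crops of any size. A small sketch with synthetic values (the shapes here are assumptions):

```python
import numpy as np

mean_file = np.random.rand(3, 256, 256)  # stand-in for a mean_file blob
# Average over H and W, keeping a K x 1 x 1 shape for broadcasting.
mean_value = mean_file.mean(axis=(1, 2), keepdims=True)

# Subtraction broadcasts over any crop size, so crops need not match H x W.
crop = np.random.rand(3, 227, 227)
centered = crop - mean_value
assert mean_value.shape == (3, 1, 1) and centered.shape == crop.shape
```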

@hayderm

hayderm commented Apr 13, 2015

Please, I want to use SPP-net and it works well. However, when I change the number of layers it gives an error. Do I need to recompile Caffe, or can I just use the provided caffe.mex?

@longjon
Contributor

longjon commented May 9, 2015

Closing as we now have per-channel mean so this should work, and should be doable with gradient accumulation. (If it's broken for batch size > 1, you're welcome to open a new issue for that.)

@longjon longjon closed this as completed May 9, 2015
frankier pushed a commit to frankier/caffe that referenced this issue May 19, 2020