DenseNet feature pyramid computation #308

Closed
wants to merge 92 commits

Conversation

moskewcz

@moskewcz moskewcz commented Apr 8, 2014

DenseNet PR current state / TODOs (for discussion)

explanatory tech report:

DenseNet: Implementing Efficient ConvNet Descriptor Pyramids
Forrest Iandola, Matt Moskewicz, Sergey Karayev, Ross Girshick, Trevor Darrell, and Kurt Keutzer

additional integration notes (see the main notes below, inlined from the DENSENET_MERGE_TODO file)

inlined from: DENSENET_MERGE_TODO

list of the issues blocking the merge of the DenseNet feature branch:

  • critical
    • replacement of GPL'd code (including removal from history)
    • update build process / Makefile to match current practice
    • tests
    • trivial: remove DenseNet README.md header
    • trivial: remove this todo file
  • unclear necessity, semantics, and/or priority
    • general cleanup of commit sequence (probably mostly squashing)
    • input interface changes (i.e. jpeg filename as input -> ?)
    • output interface changes (?, but probably something: image support size, multiple layer output, alignment, etc.)
    • if still reading image files after any iface changes and removal of GPL code, use XX instead of YY

forresti and others added 30 commits March 18, 2014 14:01
…ew data type is dict['feat'] = list of feature scales
-- note: this commit almost certainly breaks compilation of matcaffe.

mwm
…caffe output in matlab wrapper.

mwm

fix matlab vs octave compile stuff for matlab pyramid API
…rk; other minor changes.

-- fill corners of padding with interpolation of edge padding (which is in turn already an interpolation from the image edge to the imagenet mean); see the sketch after this list.
-- add .PHONY stitch target to top level makefile for building just libStitchPyramid
-- minor fix to testing makefile: add a -L. to src/stitch_pyramid/build/Makefile
-- update matcaffe.cpp comment with current mkoctfile-based build command (can be used to test building matcaffe under octave)
-- condense str() in featpyramid_common.hpp (str() is for debugging printfs)
-- add a copy of str() in JPEGPyramid.cpp (FIXME: use some common copy?)
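
A minimal numpy sketch of the corner-fill scheme described above -- the function name and the exact interpolation weights are illustrative, not necessarily what this commit implements:

```python
import numpy as np

def pad_edge_to_mean(img, pad, mean):
    """Pad a float (H, W, C) image by `pad` pixels per side, blending
    linearly from each border pixel to the dataset mean, then fill the
    corners by interpolating between the two adjacent edge pads."""
    H, W, C = img.shape
    out = np.empty((H + 2 * pad, W + 2 * pad, C), dtype=img.dtype)
    out[pad:pad + H, pad:pad + W] = img
    # blend weight: 1/pad next to the image, 1.0 at the outermost pixel
    a = np.linspace(1.0 / pad, 1.0, pad)[:, None, None]   # (pad, 1, 1)
    out[:pad, pad:pad + W] = ((1 - a) * img[0][None] + a * mean)[::-1]
    out[pad + H:, pad:pad + W] = (1 - a) * img[-1][None] + a * mean
    aL = a.transpose(1, 0, 2)                             # (1, pad, 1)
    out[pad:pad + H, :pad] = ((1 - aL) * img[:, :1] + aL * mean)[:, ::-1]
    out[pad:pad + H, pad + W:] = (1 - aL) * img[:, -1:] + aL * mean
    # corners: distance-weighted blend of the nearest top/bottom pad value
    # and the nearest left/right pad value
    for y in range(pad):
        dy = pad - y                      # row distance from the image
        for x in range(pad):
            dx = pad - x                  # column distance from the image
            w = float(dx + dy)
            out[y, x] = (dx * out[y, pad] + dy * out[pad, x]) / w
            out[y, -1 - x] = (dx * out[y, -1 - pad] + dy * out[pad, -1 - x]) / w
            out[-1 - y, x] = (dx * out[-1 - y, pad] + dy * out[-1 - pad, x]) / w
            out[-1 - y, -1 - x] = (dx * out[-1 - y, -1 - pad]
                                   + dy * out[-1 - pad, -1 - x]) / w
    return out
```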

mwm
…ject*) casts (to fix compile errors for some build envs).

note also that there is a new SHARED_LDFLAGS makefile var that can be set to include -Wl,--no-undefined (for gcc) to avoid accidentally linking an .so that is missing some of its dependencies. but, since people use other compilers, it's not enabled by default, and only the stitch library build line uses the macro at this point.

mwm
moskewcz and others added 2 commits April 8, 2014 12:10
Added some DenseNet API documentation. This will probably percolate from this top-level README.md to the caffe.berkeleyvision.org gh-pages.
@forresti
Contributor

Following up with people in #189 who are interested in sliding-window, dense, and multiscale CNN descriptors. Do you have any suggestions/feedback for this DenseNet PR?
@kloudkl @mavenlin @sguada @shelhamer @rodrigob

@kloudkl
Contributor

kloudkl commented May 3, 2014

I've read the paper and most of the code. It seems there would be quite a lot of work to do to replace the external code.

The winner of the Fine-Grained Challenge 2013, held along with ILSVRC2013, was still using Fisher Vectors, which aggregate dense SIFT local features. @moskewcz, is it possible to use DenseNet to extract similar dense local features? And if so, how?

@moskewcz
Author

moskewcz commented May 3, 2014

in short, i think the answer is no, at least not in the current code or in the BSD replacement. but, i'm not quite sure what you're asking.

in principle, if you have any dense feature, and a way to compute it on an image, then you can use an image pyramid to compute it across scales. modulo alignment and edge effects, you can also use a stitched image pyramid, if you want to or need to for some reason. i guess it's a pretty standard technique?
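
here's a minimal sketch of that recipe, for what it's worth -- `compute_feat`, the scale schedule, and the use of scipy's zoom are placeholders, not the DenseNet code:

```python
import numpy as np
from scipy.ndimage import zoom

def feature_pyramid(img, compute_feat, num_scales=10, step=2 ** -0.5):
    """run any dense feature extractor over an image pyramid.
    img: float (H, W, C) array; compute_feat: any function mapping an
    image to a dense feature map (HOG, a convnet forward pass, ...)."""
    feats = []
    for i in range(num_scales):
        s = step ** i
        level = img if i == 0 else zoom(img, (s, s, 1), order=1)
        feats.append(compute_feat(level))  # alignment/padding glue omitted
    return feats
```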

i'm still slowly chipping away at replacing all the GPL code and integrating the result into something with roughly equivalent functionality to the existing GPL DenseNet impl:
https://github.com/moskewcz/boda/commits/master

i'd guess that for either the GPL/ffld code or the BSD version (when it is done) of the DenseNet code it wouldn't be hard to modify it to compute dense multiscale descriptor pyramids for some arbitrary feature(s). it's just a matter of having a function to compute the desired feature(s) on an image and various (admittedly potentially non-trivial) feature-dependent glue issues: padding, alignment, mapping.

i'm not sure exactly what the value of that would be, though: i'd think there are other existing ways to go from image -> image pyramid -> feature pyramid for various dense features in C++ (or at least from matlab/python). but maybe not, i dunno.

@kloudkl
Contributor

kloudkl commented May 4, 2014

Because this work focuses on constructing multiscale image and feature pyramids rather than extracting dense local features at a single scale, maybe DenseNet would be better called PyramidNet.

The motivation for extracting such dense local features is fine-grained object classification. I fine-tuned the Caffe reference ImageNet model on 23 knife ImageNet synsets and the test accuracy was only about 50%. But the winning results of the Fine-Grained Challenge 2013 were much better, although those object categories were likely easier to classify. It's surprising that the winner, which did not utilize a deep neural network, beat those who did.

There are three possible reasons. First, the features of the last convolutional layer have receptive fields that are too large, and they are not dense enough to capture fine-grained local information. Second, the fully connected layers do not preserve as much of the local features' statistics as Fisher Vectors do. Third, the winner ensembled two complementary Fisher Vector models with quite different design choices and parameters.

As you have pointed out, the solution to the first issue involves non-trivial processing. But the output features of the last convolutional layer at each position can be used as (not so dense) local features. The second problem can be mitigated by encoding the local features with Fisher Vectors or VLAD using the open source VLFeat implementation. Finally, the features of multiple different CNN architectures combined together are superior to the features of a single net.
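
For concreteness, a minimal sketch of the VLAD step of this scheme -- this is the textbook formulation, not VLFeat's implementation, and the reshape at the end assumes Caffe's (C, H, W) blob layout:

```python
import numpy as np

def vlad_encode(descs, centers):
    """VLAD: sum, per codebook center, the residuals of the local
    descriptors assigned to it, then power- and L2-normalize.
    descs: (N, D) local features; centers: (K, D) k-means codebook."""
    d2 = ((descs[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)              # nearest-center assignment
    v = np.zeros_like(centers)
    for k in range(len(centers)):
        members = descs[assign == k]
        if len(members):
            v[k] = (members - centers[k]).sum(axis=0)
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))     # signed square root (power norm)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# each spatial position of a (C, H, W) conv-layer blob is one D=C descriptor:
# descs = conv_feat.reshape(conv_feat.shape[0], -1).T
```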

I have just figured out the outline of this scheme and only the experimental results can tell us whether it works or not.

@kloudkl
Contributor

kloudkl commented Jul 2, 2014

A recent performance evaluation (#557) indicates that the Fisher Vector cannot beat CNNs.
[1] K. Chatfield, K. Simonyan, A. Vedaldi, A. Zisserman. Return of the Devil in the Details: Delving Deep into Convolutional Nets. http://arxiv.org/abs/1405.3531

@bhack
Contributor

bhack commented Jul 2, 2014

@kloudkl There are also experiments with caffe and VLAD:
http://arxiv.org/abs/1403.1840v1

@kloudkl
Contributor

kloudkl commented Jul 3, 2014

That's strange, since the Fisher Vector used to be more effective than VLAD.

@melgor

melgor commented Oct 27, 2014

Hi, when will DenseNet be merged into the dev branch? Since it works, I assume only a little work is needed to merge it.

@shelhamer
Member

Thanks Forrest @forresti and Matt @moskewcz for posting the code for your tech report. Now that the PR has been made, others can find this snapshot of the implementation.

However, this cannot be merged. There are the steps helpfully listed by Matt, but more fundamentally there was, and still is, the question of goodness of fit. As implemented, this was essentially done independently of the framework and then attached, which shows in the 5000+ line diff.

Caffe is capable of dense inference and learning -- blobs can take the necessary shapes, convolution is convolution and matrix multiplications can be cast to convolution, and losses can take input and truth of different shapes. (The editing model parameters example shows how to make a model fully convolutional for inference.)
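
For instance, a sketch of that conversion in the spirit of the net surgery example -- the prototxt/caffemodel paths and the layer names here are placeholders:

```python
import caffe

# original net, plus a variant prototxt in which each InnerProduct layer
# is replaced by an equivalent Convolution layer (e.g. fc6 -> fc6-conv
# with a kernel covering fc6's full input, fc7/fc8 -> 1x1 convolutions)
net = caffe.Net('net.prototxt', 'net.caffemodel', caffe.TEST)
net_fc = caffe.Net('net_full_conv.prototxt', 'net.caffemodel', caffe.TEST)

for fc, conv in zip(['fc6', 'fc7', 'fc8'], ['fc6-conv', 'fc7-conv', 'fc8-conv']):
    # an inner product and a convolution compute the same dot product, so
    # the weight matrix reshapes directly into the (out, in, kh, kw) kernel
    net_fc.params[conv][0].data[...] = net.params[fc][0].data.reshape(
        net_fc.params[conv][0].data.shape)
    net_fc.params[conv][1].data[...] = net.params[fc][1].data  # biases copy as-is
```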

The contribution of this PR is to assemble pyramids and show how to do extraction. This could be done most clearly and effectively as an example instead of through further code. For instance, (1) the stitching approach could be turned into a data layer, or (2) each level of the pyramid could be an input to a weight-shared pyramid net. At its simplest this could be a Python data layer, which need not even be slow if one does multi-processing for prefetching.
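
As a sketch of option (2), one forward pass per pyramid level through a fully convolutional net, sharing weights across scales by construction -- the blob and layer names, scale schedule, and resizer here are illustrative assumptions:

```python
import caffe
from scipy.ndimage import zoom

net = caffe.Net('net_full_conv.prototxt', 'net.caffemodel', caffe.TEST)

def descriptor_pyramid(img, scales=(1.0, 2 ** -0.5, 0.5, 2 ** -1.5, 0.25)):
    """img: preprocessed (C, H, W) float array. every level runs through
    the same net, so the pyramid's weights are shared by definition."""
    feats = []
    for s in scales:
        level = img if s == 1.0 else zoom(img, (1, s, s), order=1)
        net.blobs['data'].reshape(1, *level.shape)   # variable-sized input
        net.blobs['data'].data[0] = level
        net.forward()
        feats.append(net.blobs['conv5'].data[0].copy())
    return feats
```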

For a pointer to dense models and matrix output, follow #1698. Closing.
