
ImageNet LRN/MaxPool ordering #296

Closed
kmatzen opened this issue Apr 6, 2014 · 9 comments

@kmatzen
Contributor

kmatzen commented Apr 6, 2014

I don't think it's explicitly stated anywhere that the ImageNet example is supposed to be an exact reimplementation of the Krizhevsky 2012 architecture, but if it is, then the order of the LRN and max pool layers in Caffe's implementation seems to be backwards.

This network uses conv -> max pool -> LRN.
https://github.com/BVLC/caffe/blob/master/examples/imagenet/imagenet_train.prototxt#L48

The paper's text, on the other hand, suggests that he used conv -> LRN -> max pool:
"Response-normalization layers follow the first and second convolutional layers. Max-pooling layers, of the kind described in Section 3.4, follow both response-normalization layers as well as the fifth convolutional layer."

Either ordering seems to get good results, but for people reimplementing papers that say Krizhevsky's architecture was used, it might be worthwhile to make sure the implementation matches his paper.
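To make the difference concrete, here is a minimal numpy sketch (not Caffe code; LRN hyperparameters taken from sec. 3.3 of the paper, with random data standing in for the conv1 output) that applies both orderings and shows they are not equivalent:

```python
import numpy as np

def lrn_across_channels(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Cross-channel LRN on a (C, H, W) map, hyperparameters as in sec. 3.3 of the paper."""
    C = a.shape[0]
    b = np.empty_like(a)
    for i in range(C):
        lo, hi = max(0, i - n // 2), min(C, i + n // 2 + 1)
        b[i] = a[i] / (k + alpha * np.sum(a[lo:hi] ** 2, axis=0)) ** beta
    return b

def max_pool(a, size=3, stride=2):
    """Overlapping 3x3/stride-2 max pooling on a (C, H, W) map."""
    C, H, W = a.shape
    Ho, Wo = (H - size) // stride + 1, (W - size) // stride + 1
    out = np.empty((C, Ho, Wo), dtype=a.dtype)
    for y in range(Ho):
        for x in range(Wo):
            patch = a[:, y * stride:y * stride + size, x * stride:x * stride + size]
            out[:, y, x] = patch.max(axis=(1, 2))
    return out

rng = np.random.default_rng(0)
conv1 = np.abs(rng.standard_normal((96, 55, 55)))    # stand-in for a (ReLU'd) conv1 output

caffe_order = lrn_across_channels(max_pool(conv1))   # conv -> pool -> norm (the prototxt)
paper_order = max_pool(lrn_across_channels(conv1))   # conv -> norm -> pool (the paper)
print(np.abs(caffe_order - paper_order).max())       # nonzero: the two orderings differ
```

Both orders produce a 96x27x27 map here, but the values differ because the normalization denominators are computed over different activations.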

@jeffdonahue
Contributor

Huh, looks like you're correct - interesting that nobody else has ever pointed this out after 8+ months of us using this reimplementation of the architecture (first in cuda-convnet, then decaf, now caffe).

Feel free to send a PR with a note in the documentation that our implementation differs from Krizhevsky's published architecture in this way. (And if someone from Berkeley cares to train an instance of the corrected version and finds it matches or outperforms the reference model, we could replace it. It does seem more natural to normalize then max-pool.)

Edit: actually, we probably don't want to ever 'replace' the current reference model at this point, as it's been used in many results that have already been disseminated in various forms, but we could (and probably will) add additional reference model(s).

@shelhamer
Member

I'm happy to re-train. What should we do with the result? The caffe_reference_imagenet_model is already in use, so it shouldn't be replaced outright.

@jeffdonahue's suggestion of caffe_reference_alexnet_model should work fine. Note that for further exactness we should train with "relighting", or state more clearly that we train without it.
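For reference, the "relighting" in the paper (sec. 4.1) adds multiples of the principal components of the RGB pixel covariance, scaled by the matching eigenvalues and a per-image Gaussian draw with standard deviation 0.1. A rough numpy sketch of that augmentation (assuming the PCA statistics are computed from training-set pixels; this is not code from any Caffe training script):

```python
import numpy as np

def fit_rgb_pca(pixels):
    """pixels: (N, 3) array of RGB values sampled from the training set."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(pixels, rowvar=False))
    return eigvals, eigvecs                      # lambda_i, and columns p_i

def relight(img, eigvals, eigvecs, sigma=0.1, rng=None):
    """img: (H, W, 3) float image. Adds [p1 p2 p3][a1*l1, a2*l2, a3*l3]^T to every
    pixel, with a_i ~ N(0, sigma^2) drawn once per presentation of the image."""
    rng = np.random.default_rng() if rng is None else rng
    alphas = rng.normal(0.0, sigma, size=3)
    return img + eigvecs @ (alphas * eigvals)    # one 3-vector shift, broadcast over pixels
```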


@jeffdonahue
Contributor

Yeah, sounds good - maybe you could change 'imagenet' to 'alexnet' or some other descriptive name to make it clear it's a different architecture.

@kloudkl
Contributor

kloudkl commented Apr 7, 2014

If the model is going to be re-trained, why don't we choose the ZeilerNet (#33) that outperformed the AlexNet last year?

@shelhamer
Member

@Yangqing our Caffe reference ImageNet model does max pool -> LRN instead of LRN -> max pool as in the Krizhevsky architecture.

I'm training now. I'll check back with AlexNet model results later this week and we can decide exactly how to package it.

@kloudkl we plan to release a ZF model too, but #33 (comment) still needs implementing to do it exactly right.

@sguada
Contributor

sguada commented Apr 7, 2014

@shelhamer I will do a small test to check whether the order is likely to affect the results. What I can tell you is that it increases memory consumption by at least 1 GB.
Also, we train with data warped to fit 256x256 instead of the resize-and-crop procedure stated in the paper.
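For anyone unsure what the two preprocessing choices look like, here is a small Pillow sketch (just an illustration of the two options, not the actual conversion code used here):

```python
from PIL import Image

def warp_to_square(path, size=256):
    """Current behaviour: ignore the aspect ratio and warp straight to size x size."""
    return Image.open(path).resize((size, size), Image.BILINEAR)

def resize_then_center_crop(path, size=256):
    """Paper's procedure: scale the shorter side to `size`, then take the central crop."""
    img = Image.open(path)
    w, h = img.size
    s = size / min(w, h)
    img = img.resize((round(w * s), round(h * s)), Image.BILINEAR)
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2
    return img.crop((left, top, left + size, top + size))
```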

Maybe we should differentiate caffe_reference_model and alexnet_reference_model more explicitly.

@shelhamer
Member

I'm already running the training.

Update: after three days the loss is ~1.8 and val accuracy is ~54% at 170,000 or so iterations.

@shelhamer
Member

The AlexNet / Krizhevsky '12 architecture model was released in #327. Follow up there for the details of training (and note that there are small differences from the training regime described in the paper).

@harvinderdabas

I work on hardware acceleration for CNN inference and came across this while comparing GoogLeNet with AlexNet. GoogLeNet does the LRN after pooling, which is efficient from a computation point of view. But given the intent of the LRN layer, I feel LRN should be done before max pooling, because normalizing first can affect the max-pooling decisions.
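For a sense of the computational difference (back-of-the-envelope numbers for AlexNet's first stage, not measurements from this thread):

```python
# conv1 produces 96 maps of 55x55; 3x3/stride-2 pooling shrinks them to 27x27,
# so normalizing after pooling touches roughly 4x fewer activations.
c, h = 96, 55
p = (h - 3) // 2 + 1                        # pooled side length: 27
print(c * h * h, c * p * p)                 # 290400 vs 69984 activations
print(round((h * h) / (p * p), 2))          # ~4.15x reduction
```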
