
ImageNet LRN/MaxPool ordering #296

Closed
kmatzen opened this issue Apr 6, 2014 · 9 comments

@kmatzen
Contributor

kmatzen commented Apr 6, 2014

I don't think it's explicitly stated anywhere that the ImageNet example is supposed to be an exact reimplementation of the Krizhevsky 2012 architecture, but if it is, then the order of the LRN and max pool layers in Caffe's implementation seems to be backwards.

This network uses conv -> max pool -> LRN.
https://github.com/BVLC/caffe/blob/master/examples/imagenet/imagenet_train.prototxt#L48

The paper's text, on the other hand, suggests that he used conv -> LRN -> max pool:
"Response-normalization layers follow the first and second convolutional layers. Max-pooling layers, of the kind described in Section 3.4, follow both response-normalization layers as well as the fifth convolutional layer."

Either ordering seems to get good results, but for people reimplementing papers that say Krizhevsky's architecture was used, it might be worthwhile to make sure the implementation matches his paper.
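To make the difference concrete, here is a minimal numpy sketch (not Caffe code; LRN hyperparameters taken from sec. 3.3 of the paper, with random data standing in for the conv1 output) that applies both orderings and shows they are not equivalent:

```python
import numpy as np

def lrn_across_channels(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Cross-channel LRN on a (C, H, W) map, hyperparameters as in sec. 3.3 of the paper."""
    C = a.shape[0]
    b = np.empty_like(a)
    for i in range(C):
        lo, hi = max(0, i - n // 2), min(C, i + n // 2 + 1)
        b[i] = a[i] / (k + alpha * np.sum(a[lo:hi] ** 2, axis=0)) ** beta
    return b

def max_pool(a, size=3, stride=2):
    """Overlapping 3x3/stride-2 max pooling on a (C, H, W) map."""
    C, H, W = a.shape
    Ho, Wo = (H - size) // stride + 1, (W - size) // stride + 1
    out = np.empty((C, Ho, Wo), dtype=a.dtype)
    for y in range(Ho):
        for x in range(Wo):
            patch = a[:, y * stride:y * stride + size, x * stride:x * stride + size]
            out[:, y, x] = patch.max(axis=(1, 2))
    return out

rng = np.random.default_rng(0)
conv1 = np.abs(rng.standard_normal((96, 55, 55)))    # stand-in for a (ReLU'd) conv1 output

caffe_order = lrn_across_channels(max_pool(conv1))   # conv -> pool -> norm (the prototxt)
paper_order = max_pool(lrn_across_channels(conv1))   # conv -> norm -> pool (the paper)
print(np.abs(caffe_order - paper_order).max())       # nonzero: the two orderings differ
```

Both orders produce a 96x27x27 map here, but the values differ because the normalization denominators are computed over different activations.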

@jeffdonahue
Contributor

Huh, looks like you're correct - interesting that nobody else has ever pointed this out after 8+ months of us using this reimplementation of the architecture (first in cuda-convnet, then decaf, now caffe).

Feel free to send a PR with a note in the documentation that our implementation differs from Krizhevsky's published architecture in this way. (And if someone from Berkeley cares to train an instance of the corrected version and finds it matches or outperforms the reference model, we could replace it. It does seem more natural to normalize then max-pool.)

Edit: actually, we probably don't want to ever 'replace' the current reference model at this point, as it's been used in many results that have already been disseminated in various forms, but we could (and probably will) add additional reference model(s).

@shelhamer
Member

I'm happy to re-train. What should we do with the result? The caffe_reference_imagenet_model is already in use, so it shouldn't be replaced outright.

@jeffdonahue's suggestion of caffe_reference_alexnet_model should work fine. Note that for further exactness we should train with "relighting", or state more clearly that we train without it.
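For reference, the "relighting" in the paper (sec. 4.1) adds multiples of the principal components of the RGB pixel covariance, scaled by the matching eigenvalues and a per-image Gaussian draw with standard deviation 0.1. A rough numpy sketch of that augmentation (assuming the PCA statistics are computed from training-set pixels; this is not code from any Caffe training script):

```python
import numpy as np

def fit_rgb_pca(pixels):
    """pixels: (N, 3) array of RGB values sampled from the training set."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(pixels, rowvar=False))
    return eigvals, eigvecs                      # lambda_i, and columns p_i

def relight(img, eigvals, eigvecs, sigma=0.1, rng=None):
    """img: (H, W, 3) float image. Adds [p1 p2 p3][a1*l1, a2*l2, a3*l3]^T to every
    pixel, with a_i ~ N(0, sigma^2) drawn once per presentation of the image."""
    rng = np.random.default_rng() if rng is None else rng
    alphas = rng.normal(0.0, sigma, size=3)
    return img + eigvecs @ (alphas * eigvals)    # one 3-vector shift, broadcast over pixels
```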


@jeffdonahue
Contributor

Yeah, sounds good - maybe you could change 'imagenet' to 'alexnet' or some other descriptive name to make it clear it's a different architecture.

@kloudkl
Contributor

kloudkl commented Apr 7, 2014

If the model is going to be re-trained, why don't we choose the ZeilerNet (#33) that outperformed the AlexNet last year?

@shelhamer
Member

@Yangqing our Caffe reference ImageNet model does max pool -> LRN instead of LRN -> max pool as in the Krizhevsky architecture.

I'm training now. I'll check back with AlexNet model results later this week and we can decide exactly how to package it.

@kloudkl we plan to release a ZF model too, but #33 (comment) still needs implementing to do it exactly right.

@sguada
Contributor

sguada commented Apr 7, 2014

@shelhamer I will do a small test to check whether the order is likely to affect the results. What I can tell you is that it increases memory consumption by at least 1 GB.
Also, we train with data warped to fit 256x256 instead of the resize-and-crop procedure stated in the paper.
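For anyone unsure what the two preprocessing choices look like, here is a small Pillow sketch (just an illustration of the two options, not the actual conversion code used here):

```python
from PIL import Image

def warp_to_square(path, size=256):
    """Current behaviour: ignore the aspect ratio and warp straight to size x size."""
    return Image.open(path).resize((size, size), Image.BILINEAR)

def resize_then_center_crop(path, size=256):
    """Paper's procedure: scale the shorter side to `size`, then take the central crop."""
    img = Image.open(path)
    w, h = img.size
    s = size / min(w, h)
    img = img.resize((round(w * s), round(h * s)), Image.BILINEAR)
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2
    return img.crop((left, top, left + size, top + size))
```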

Maybe we should differentiate caffe_reference_model and alexnet_reference_model more explicitly.

@shelhamer
Member

I'm already running the training.

Update: after three days the loss is ~1.8 and val accuracy is ~54% at 170,000 or so iterations.

@shelhamer
Member

The AlexNet / Krizhevsky '12 architecture model was released in #327. Follow up there for the details of training (and note that there are small differences from the training regime described in the paper).

@harvinderdabas

I work on hardware acceleration for CNN inference and came across this while comparing GoogLeNet with AlexNet. GoogLeNet does the LRN after pooling, which is efficient from a computation point of view. But given the intent of the LRN layer, I feel LRN should be done before max pooling, because normalizing first can affect the max-pooling decisions.
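For a sense of the computational difference (back-of-the-envelope numbers for AlexNet's first stage, not measurements from this thread):

```python
# conv1 produces 96 maps of 55x55; 3x3/stride-2 pooling shrinks them to 27x27,
# so normalizing after pooling touches roughly 4x fewer activations.
c, h = 96, 55
p = (h - 3) // 2 + 1                        # pooled side length: 27
print(c * h * h, c * p * p)                 # 290400 vs 69984 activations
print(round((h * h) / (p * p), 2))          # ~4.15x reduction
```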
