GoogLeNet training in Caffe #1367
Conversation
I have also trained the networks for a few weeks. What top-1 accuracy on val have you achieved?
I checked the definition file and, to my knowledge, it looks the same as the original paper. Just a small question about the Inception structure: why is the "pad" added to conv3_reduce/conv5_reduce rather than to conv3/conv5 themselves? The final answer is exactly the same, but it wastes computation, since there is nothing to learn from the padding area in a 1x1-convolution reduce layer.
I added the pad in conv3/conv5 instead of conv3_reduce/conv5_reduce, and it trained well.
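As a rough illustration of the change being discussed, here is a hedged prototxt sketch in the old-style Caffe syntax used by this PR. The layer names follow the thread's naming (inc3_conv3_reduce / inc3_conv3); the bottom blob and num_output values are illustrative assumptions, not taken from the actual PR diff:

```
# Sketch only: padding moved from the 1x1 reduce to the 3x3 conv itself.
layers {
  name: "inc3_conv3_reduce"
  type: CONVOLUTION
  bottom: "pool2"                  # assumed input blob
  top: "inc3_conv3_reduce"
  convolution_param {
    num_output: 96                 # illustrative value
    kernel_size: 1                 # no pad on the 1x1 reduce
  }
}
layers {
  name: "inc3_conv3"
  type: CONVOLUTION
  bottom: "inc3_conv3_reduce"
  top: "inc3_conv3"
  convolution_param {
    num_output: 128                # illustrative value
    kernel_size: 3
    pad: 1                         # pad here preserves the spatial size
  }
}
```

Padding the 3x3 convolution directly gives the same output shape while avoiding convolving the 1x1 reduce over padded zeros.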
@weiliu89 I want to try the network. Could you share some information about GPU memory usage and the time cost? Thank you!
@amiralush I don't see any accuracy number; I only saw the plot of loss vs. iterations. What top-1 accuracy on val have you achieved?
@weiliu89 Since I'm using my own dataset, it's not relevant.
@amiralush OK, I see. Thanks!
batch_size: 50
}
transform_param {
crop_size: 227
I think that the original paper used a 224x224 crop. It probably makes minimal difference, but at the least you should probably make crop_size the same for the TRAIN and TEST phases (they're currently 228 and 227, respectively).
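The suggested fix, sketched in old-style Caffe prototxt (the data-layer fields other than crop_size are illustrative assumptions):

```
# Sketch: use the same crop in both phases; 224 matches the paper.
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    crop_size: 224   # was 228
    mirror: true     # assumed augmentation setting
  }
}
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  include { phase: TEST }
  transform_param {
    crop_size: 224   # was 227
  }
}
```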
bottom: "clf1_fc2"
bottom: "label"
top: "clf1_loss"
}
Add loss_weight: 0.3. From the paper (page 6):
"During training, their loss gets added to the total loss of the network with a discount weight (the losses of the auxiliary classifiers were weighted by 0.3)."
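Applied to the layer from the diff above, the suggested change would look roughly like this (the layer name and bottoms are from the diff; the loss type is an assumption, since the full layer definition isn't shown here):

```
layers {
  name: "clf1_loss"
  type: SOFTMAX_LOSS      # assumed loss type
  bottom: "clf1_fc2"
  bottom: "label"
  top: "clf1_loss"
  loss_weight: 0.3        # discount the auxiliary classifier, per the paper
}
```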
Note that this PR has some competition; @kmatzen has written a Python script to automatically generate GoogLeNet protobuffers: #1169 (comment). Direct link: https://github.com/kmatzen/caffe/blob/inception_plan/python/inception_plan.py
Thanks for your comments! I have committed these changes: (2) moved padding from conv3_reduce/conv5_reduce to conv3/conv5; (3) changed the test crop size to be consistent with the train crop size. @drdan14 thanks for pointing this out; I wasn't aware of this before making this PR. I actually trained this configuration and tested its convergence before making the PR, after getting several requests.
@amiralush - also inc3_conv3 (4a)
layers {
bottom: "inc3_conv5_reduce"
top: "inc3_conv5_reduce"
name: "inc3_conv5_relu"
Typo: this should be inc3_conv5_reduce_relu. As it stands now, this name is a duplicate of a different layer.
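For clarity, the corrected layer from the diff would read as follows (the RELU type is assumed from context, since the diff excerpt cuts off before the type field):

```
layers {
  bottom: "inc3_conv5_reduce"
  top: "inc3_conv5_reduce"          # in-place ReLU, as in the diff
  name: "inc3_conv5_reduce_relu"    # renamed to avoid duplicating inc3_conv5_relu
  type: RELU                        # assumed layer type
}
```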
@amiralush - What do clf1_loss and clf2_loss refer to?
@marutiagarwal these are the auxiliary losses.
@amiralush - I am training GoogLeNet (on the 1000-class ILSVRC 2012 data) using the train_test.prototxt provided in your repo. I am using batch size = 32, since beyond this I get a memory error. All other values are the same as in your proto files. After 400,000 iterations: clf1_accuracy = 0.434675. Meanwhile, the GoogLeNet publication reported classification accuracy close to ~93.3%. I am wondering what the problem in my training might be. Could you please advise based on your experience? Thanks!
@marutiagarwal
@marutiagarwal the performance of a single GoogleNet according to the authors is top-1 70% and top-5 89%, but that is after approximately 250 epochs, i.e. with batch_size=32 roughly 10M iterations. In my implementation of GoogleNet I got top-1 68.7% and top-5 89% after 60 epochs, which with batch_size=32 is about 2.4M iterations. I will be sharing my prototxt and the model next week. So your results after 400K iterations look pretty good; you probably just need to let it run longer. The results also depend on the data augmentation you use: in the paper they used many scales and aspect ratios, while in Caffe we usually resize the image to 256x256 and distort the aspect ratio.
@marutiagarwal if you used the solver.prototxt posted by @amiralush, then you are probably not going to get much better, because it decreases the learning rate too aggressively.
Yes, I second that. Here is an excerpt of my solver's config file:
Maybe the step change is slightly too frequent / too 'slow', but I do reach 66.2% after 2.2M iterations on a slightly different dataset than the ImageNet competition one (I built mine from the Web from scratch a while ago, though I could use the original one now).
@marutiagarwal, @sguada is right. As I stated at the beginning of this PR, the solver is a default one. I have personally optimized the learning-rate steps, but I didn't publish that solver.
@marutiagarwal the paper says they decrease the learning rate by 4% every 8 epochs. 8 epochs was a bit long, but I've kept their step. My hunch, untested, is that it should be possible to take larger steps for the first 500K iterations or so, and then slow down.
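As a rough illustration only (not the solver anyone in this thread published), the paper's schedule of dropping the learning rate by 4% every 8 epochs could be expressed in a Caffe solver along these lines. The arithmetic assumes ~1.28M ILSVRC training images at batch_size=32, so one epoch is ~40K iterations and 8 epochs is ~320K; base_lr and momentum are assumed values:

```
# Hypothetical solver sketch for the paper's schedule; not from this PR.
net: "train_test.prototxt"
base_lr: 0.01            # assumed starting learning rate
lr_policy: "step"
gamma: 0.96              # multiply lr by 0.96, i.e. a 4% decrease...
stepsize: 320000         # ...every ~8 epochs (1.28M images / 32 per batch * 8)
momentum: 0.9            # assumed
max_iter: 10000000       # ~250 epochs, matching @sguada's estimate above
solver_mode: GPU
```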
@amiralush many thanks for your contribution. Take a look at #1598 for a closer replication of GoogleNet, including solvers.
@sguada Hi, I've trained GoogLeNet using my own training data.
Top-1 is accuracy.
This is the GoogLeNet (as devised by Szegedy et al.) train/test definition in Caffe, which I did as part of my work at Superfish Ltd.
I've trained it on a large-scale dataset we have here at Superfish, and it's very robust. The attached solver is a default one; one can follow the original paper for a more specific solver implementation.
The network definition is a bit cumbersome; one might want to implement an Inception module to make things simpler.
I did my best to follow the original paper.