GoogLeNet training in Caffe #1367

Closed
wants to merge 4 commits into from

amiralush

This is the GoogLeNet (as devised by Szegedy et al.) train/test definition in Caffe, which I wrote as part of my work at Superfish Ltd.

I've trained it on a large-scale dataset we have here at Superfish, and it is very robust. The attached solver is a default one; follow the original paper for a more specific solver configuration.

The network definition is a bit cumbersome; one might want to implement an Inception module to make things simpler.

I did my best to follow the original paper.

@weiliu89

I have also trained the networks for a few weeks. What top1 accuracy on val have you achieved?

@bigiceberg

I checked the definition file and, to my knowledge, it matches the original paper. Just a small question about the Inception structure: why is "pad" added to conv3_reduce/conv5_reduce rather than to conv3/conv5 themselves? The final answer is exactly the same, but it wastes computation, since there is nothing to learn in the padded area of the 1x1 convolution reduce layers.

@weiliu89

I added the pad in conv3/conv5 instead of conv3_reduce/conv5_reduce, and it trained well.
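
The trade-off discussed above can be sketched with the usual convolution output-size formula. This is only an illustration: the 28x28 input size and the helper names `conv_out` and `branch` are assumptions, not taken from the prototxt. It compares spatial sizes and the number of positions the 1x1 reduce layer has to compute; it does not compare the numeric values at the border.

```python
def conv_out(size, kernel, pad=0, stride=1):
    """Spatial output size of a convolution: floor((size + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1


def branch(size, pad_on_reduce):
    """Return (positions computed by the 1x1 reduce layer, final 3x3 output size)."""
    if pad_on_reduce:
        # pad: 1 on the 1x1 reduce layer, no pad on the 3x3 convolution
        reduce_out = conv_out(size, kernel=1, pad=1)
        final = conv_out(reduce_out, kernel=3, pad=0)
    else:
        # no pad on the 1x1 reduce layer, pad: 1 on the 3x3 convolution
        reduce_out = conv_out(size, kernel=1, pad=0)
        final = conv_out(reduce_out, kernel=3, pad=1)
    return reduce_out * reduce_out, final


size = 28  # hypothetical feature-map size, for illustration only
print(branch(size, pad_on_reduce=True))   # (900, 28)
print(branch(size, pad_on_reduce=False))  # (784, 28)
```

Both variants end with the same spatial size, but padding the reduce layer makes the 1x1 convolution run over a larger map.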

@dabilied

@weiliu89 I want to try the network. Could you share some information about GPU memory usage and training time? Thank you!

@amiralush
Author

@dabilied please check #1317

@weiliu89

@dabilied I use a batch size of 32 for training and 50 for testing. It takes 3693 MB of GPU memory and 1 min 23 s for 160 forward/backward iterations. For more detailed timing information, see #1317.
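
As a rough reading of those timing numbers, the arithmetic below converts them into per-iteration time and image throughput. The helper name `throughput` is hypothetical, and the figures only apply to the hardware used in that run.

```python
def throughput(iterations, batch_size, seconds):
    """Return (seconds per iteration, images per second)."""
    return seconds / iterations, iterations * batch_size / seconds


# 160 forward/backward iterations at batch size 32 in 1 min 23 s (83 s)
sec_per_iter, img_per_sec = throughput(iterations=160, batch_size=32, seconds=83)
print(f"{sec_per_iter:.3f} s/iteration, {img_per_sec:.1f} images/s")
```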

@weiliu89

@amiralush I don't see any accuracy numbers; I only saw your plot of loss vs. iterations. What top-1 accuracy on val have you achieved?

@amiralush
Author

@weiliu89 Since I'm using my own dataset it's not relevant.

@weiliu89

@amiralush Ok. I see. Thanks!

batch_size: 50
}
transform_param {
crop_size: 227

I think that the original paper used a 224x224 crop. It probably makes minimal difference, but at least, you should probably make crop_size the same for the TRAIN and TEST phases (they're currently 228 and 227, respectively).

@dgolden1 dgolden1 mentioned this pull request Oct 31, 2014
bottom: "clf1_fc2"
bottom: "label"
top: "clf1_loss"
}

Add loss_weight: 0.3

From the paper (page 6):

During training, their loss gets added to the total loss of the network with a discount weight (the losses of the auxiliary classifiers were weighted by 0.3).
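
A minimal sketch of what that weighting means for the training objective, assuming the total loss is simply the main loss plus the discounted auxiliary losses. The function name `total_loss` and the loss values are made up for illustration.

```python
def total_loss(main_loss, aux_losses, aux_weight=0.3):
    """Main classifier loss plus auxiliary classifier losses discounted by aux_weight."""
    return main_loss + aux_weight * sum(aux_losses)


# Hypothetical loss values for the main classifier and the two auxiliary classifiers
print(total_loss(2.0, [2.5, 2.3]))
```

Without a loss_weight on the auxiliary loss layers, each one would contribute with weight 1 instead of 0.3.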

@dgolden1
Contributor

Note that this PR has some competition; @kmatzen has written a Python script to automatically generate GoogLeNet protobuffers: #1169 (comment)

Direct link: https://github.com/kmatzen/caffe/blob/inception_plan/python/inception_plan.py

(2) Moved padding from conv3_reduce/conv5_reduce to conv3/conv5.
(3)  Changed test crop size to be consistent with train crop-size.
@amiralush
Author

Thanks for your comments! I have committed these changes.

@drdan14 thanks for pointing this out, I wasn't aware of this before making this PR. I've actually trained this configuration and tested its convergence before making the PR after getting several requests.

@okn2020

okn2020 commented Nov 1, 2014

@amiralush
Please check:

  • inc2_conv5_reduce (3b): why does it have, there and only there,
    weight_decay: 1.0
    weight_decay: 0.0
  • inc3_conv3 (4a): why is std: 0.01 used there?

layers {
bottom: "inc3_conv5_reduce"
top: "inc3_conv5_reduce"
name: "inc3_conv5_relu"

Typo: this should be inc3_conv5_reduce_relu. As it stands now, this name is a duplicate of a different layer.

@futurely

#1169 (comment)

@marutiagarwal

@amiralush - What do clf1_loss and clf2_loss refer to?

@amiralush
Author

@marutiagarwal these are the auxiliary losses.

@marutiagarwal

@amiralush - I am training GoogLeNet (on the 1000-class ILSVRC 2012 data) using the train_test.prototxt provided in your repo. I am using a batch size of 32, since anything larger gives me a memory error. All other values are the same as in your proto files. After 400,000 iterations:

clf1_accuracy = 0.434675
clf2_accuracy = 0.458775
clf3_accuracy = 0.526625

In the GoogLeNet publication, however, the reported classification accuracy was close to 93.3%. I am wondering what might be wrong with my training. Could you please suggest something based on your experience? Thanks!

@ducha-aiki
Contributor

@marutiagarwal
They report top-5 accuracy (the network is allowed 5 guesses). They also averaged results from several nets and many image crops (~144 network passes). I had slightly lower accuracy than you at 400K iterations, and 0.644 top-1 accuracy after 1.4M iterations. The reimplementation by the Caffe maintainers reaches 0.688.
#1106 (comment)
#1106 (comment)

@sguada
Contributor

sguada commented Dec 5, 2014

@marutiagarwal according to the authors, a single GoogLeNet reaches top-1 70% and top-5 89%, but only after approximately 250 epochs, which at batch_size=32 is about 10M iterations.

In my implementation of GoogLeNet I got top-1 68.7% and top-5 89% after 60 epochs, which at batch_size=32 is 2.4M iterations. I will be sharing my prototxt and the model next week.

So your results after 400K iterations look pretty good; you probably just need to let it run longer. The results also depend on the data augmentation you use. In their paper they used many scales and aspect ratios, while in Caffe we usually resize the image to 256x256, which distorts the aspect ratio.
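
The epoch and iteration figures in this comment can be checked with a little arithmetic. The sketch assumes the standard ILSVRC 2012 training set of 1,281,167 images; the helper names are hypothetical.

```python
ILSVRC2012_TRAIN_IMAGES = 1_281_167  # assumed: standard ILSVRC 2012 training set size


def iterations_for(epochs, batch_size, num_images=ILSVRC2012_TRAIN_IMAGES):
    """Number of solver iterations needed to see the training set `epochs` times."""
    return epochs * num_images / batch_size


def epochs_for(iterations, batch_size, num_images=ILSVRC2012_TRAIN_IMAGES):
    """Number of passes over the training set covered by `iterations`."""
    return iterations * batch_size / num_images


print(f"{iterations_for(250, 32) / 1e6:.1f}M iterations for 250 epochs")
print(f"{iterations_for(60, 32) / 1e6:.1f}M iterations for 60 epochs")
print(f"{epochs_for(400_000, 32):.1f} epochs in 400K iterations")
```

Under that assumption, 400K iterations at batch size 32 is only about 10 epochs, which is consistent with the advice to let training run longer.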

@sguada
Contributor

sguada commented Dec 5, 2014

@marutiagarwal if you used the solver.prototxt posted by @amiralush, you are probably not going to get much better, because it decreases the learning rate too aggressively.

@beniz

beniz commented Dec 5, 2014

Yes, I do second that. Here is an excerpt of my solver's config file:

base_lr: 0.01
display: 20
test_iter: 200
test_interval: 1000
max_iter: 2850000
lr_policy: "step"
stepsize: 25000
gamma: 0.96
momentum: 0.9
weight_decay: 0.0005

Maybe the step change is slightly too frequent / too 'slow', but I do reach 66.2% after 2.2M iterations on a slightly different dataset from the ImageNet competition one (I built mine from the Web from scratch a while ago, though I could use the original one now).
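
For reference, Caffe's "step" policy computes the learning rate as base_lr * gamma ^ floor(iter / stepsize). The sketch below evaluates that formula with the values from the excerpt above; the function name `step_lr` is hypothetical and this is not Caffe's own code.

```python
def step_lr(iteration, base_lr=0.01, gamma=0.96, stepsize=25000):
    """Learning rate under a step policy: base_lr * gamma ** floor(iteration / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)


for it in (0, 25_000, 500_000, 2_200_000, 2_850_000):
    print(f"iter {it:>9,}: lr = {step_lr(it):.6f}")
```

With gamma = 0.96 each step is small, but 88 steps have been applied by 2.2M iterations, so the rate is already well over an order of magnitude below base_lr at that point.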

@amiralush
Author

@marutiagarwal , @sguada is right. As I stated at the beginning of this PR, the solver is a default one. Personally I've optimized the learning rate steps. I didn't publish this solver.

@marutiagarwal

@beniz - gamma = 0.96? Isn't it too high?
@sguada - Thanks for the pointers. I'll be glad to see your proto files.

@beniz

beniz commented Dec 8, 2014

@marutiagarwal the paper says they decrease the learning rate by 4% every 8 epochs. 8 epochs was a bit long, but I've kept their step. My hunch, untested, is that it should be possible to take larger steps for 500K iterations or so, and then slow down.

@sguada
Contributor

sguada commented Dec 19, 2014

@amiralush many thanks for your contribution.

Take a look at #1598 for a closer replication of GoogleNet, including solvers.

@sguada sguada closed this Dec 19, 2014
@mengfanr

@sguada hi, I've trained GoogLeNet using my own training data.
Test net output #0: loss1/loss1 = 2.17452 (* 0.3 = 0.652355 loss)
I0915 15:48:17.370787 2951 solver.cpp:414] Test net output #1: loss1/top-1 = 0.38588
I0915 15:48:17.370795 2951 solver.cpp:414] Test net output #2: loss1/top-5 = 0.727879
I0915 15:48:17.370802 2951 solver.cpp:414] Test net output #3: loss2/loss1 = 2.0714 (* 0.3 = 0.621419 loss)
I0915 15:48:17.370808 2951 solver.cpp:414] Test net output #4: loss2/top-1 = 0.40742
I0915 15:48:17.370813 2951 solver.cpp:414] Test net output #5: loss2/top-5 = 0.750621
I0915 15:48:17.370820 2951 solver.cpp:414] Test net output #6: loss3/loss3 = 2.10992 (* 1 = 2.10992 loss)
I0915 15:48:17.370825 2951 solver.cpp:414] Test net output #7: loss3/top-1 = 0.4066
I0915 15:48:17.370828 2951 solver.cpp:414] Test net output #8: loss3/top-5 = 0.74664
I don't understand the results I get. Could you explain how I can get the accuracy of my network?
In particular, what do loss1/loss1, loss2/loss1, and loss3/loss3 mean, and what do loss1/top-5, loss2/top-5, and loss3/top-5 mean? Which one can be considered the final accuracy of my work? Thanks!

@ducha-aiki
Contributor

Top-1 is accuracy.
Top-5 is the accuracy when you are allowed to make 5 guesses.
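
A minimal sketch of how top-k accuracy can be computed from per-class scores. The function name `top_k_accuracy` and the toy scores are assumptions for illustration; this is not Caffe's Accuracy layer.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best k
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy example: 3 samples, 3 classes
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(top_k_accuracy(scores, labels, k=1))
print(top_k_accuracy(scores, labels, k=2))
```

In the log above, the outputs weighted by 0.3 appear to come from the auxiliary classifiers, so loss3/top-1 and loss3/top-5 would presumably be the numbers to report for the full network.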
