GoogLeNet training in Caffe #1367
Conversation
I have also trained the networks for a few weeks. What top-1 accuracy on val have you achieved?
I checked the definition file and, to my knowledge, it looks the same as the original paper. Just a small question about the Inception structure: why is the "pad" added to conv3_reduce/conv5_reduce rather than to conv3/conv5 themselves? The final answer is exactly the same, but it wastes computation, since there is nothing to learn from the padding area in a 1x1-convolution reduce layer.
I added the pad in conv3/conv5 instead of conv3_reduce/conv5_reduce, and it trained well.
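As a rough illustration of the change being discussed, here is a hedged prototxt sketch in the old-style Caffe syntax used by this PR. The layer names follow the thread's naming (inc3_conv3_reduce / inc3_conv3); the bottom blob and num_output values are illustrative assumptions, not taken from the actual PR diff:

```
# Sketch only: padding moved from the 1x1 reduce to the 3x3 conv itself.
layers {
  name: "inc3_conv3_reduce"
  type: CONVOLUTION
  bottom: "pool2"                  # assumed input blob
  top: "inc3_conv3_reduce"
  convolution_param {
    num_output: 96                 # illustrative value
    kernel_size: 1                 # no pad on the 1x1 reduce
  }
}
layers {
  name: "inc3_conv3"
  type: CONVOLUTION
  bottom: "inc3_conv3_reduce"
  top: "inc3_conv3"
  convolution_param {
    num_output: 128                # illustrative value
    kernel_size: 3
    pad: 1                         # pad here preserves the spatial size
  }
}
```

Padding the 3x3 convolution directly gives the same output shape while avoiding convolving the 1x1 reduce over padded zeros.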
@weiliu89 I want to try the network. Could you share some information about GPU memory usage and the time cost? Thank you!
@amiralush I don't see any accuracy number; I only saw the plot of loss vs. iterations. What top-1 accuracy on val have you achieved?
@weiliu89 Since I'm using my own dataset, it's not relevant.
@amiralush OK, I see. Thanks!
batch_size: 50
}
transform_param {
crop_size: 227
I think that the original paper used a 224x224 crop. It probably makes minimal difference, but at the least you should probably make crop_size the same for the TRAIN and TEST phases (they're currently 228 and 227, respectively).
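The suggested fix, sketched in old-style Caffe prototxt (the data-layer fields other than crop_size are illustrative assumptions):

```
# Sketch: use the same crop in both phases; 224 matches the paper.
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    crop_size: 224   # was 228
    mirror: true     # assumed augmentation setting
  }
}
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  include { phase: TEST }
  transform_param {
    crop_size: 224   # was 227
  }
}
```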
bottom: "clf1_fc2"
bottom: "label"
top: "clf1_loss"
}
Add loss_weight: 0.3. From the paper (page 6):
"During training, their loss gets added to the total loss of the network with a discount weight (the losses of the auxiliary classifiers were weighted by 0.3)."
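Applied to the layer from the diff above, the suggested change would look roughly like this (the layer name and bottoms are from the diff; the loss type is an assumption, since the full layer definition isn't shown here):

```
layers {
  name: "clf1_loss"
  type: SOFTMAX_LOSS      # assumed loss type
  bottom: "clf1_fc2"
  bottom: "label"
  top: "clf1_loss"
  loss_weight: 0.3        # discount the auxiliary classifier, per the paper
}
```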
Note that this PR has some competition; @kmatzen has written a Python script to automatically generate GoogLeNet protobuffers: #1169 (comment). Direct link: https://github.com/kmatzen/caffe/blob/inception_plan/python/inception_plan.py
Thanks for your comments! I have committed these changes: (2) moved padding from conv3_reduce/conv5_reduce to conv3/conv5; (3) changed the test crop size to be consistent with the train crop size. @drdan14 thanks for pointing this out; I wasn't aware of this before making this PR. I actually trained this configuration and tested its convergence before making the PR, after getting several requests.
@amiralush - also inc3_conv3 (4a)
layers {
bottom: "inc3_conv5_reduce"
top: "inc3_conv5_reduce"
name: "inc3_conv5_relu"
Typo: this should be inc3_conv5_reduce_relu. As it stands now, this name is a duplicate of a different layer.
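For clarity, the corrected layer from the diff would read as follows (the RELU type is assumed from context, since the diff excerpt cuts off before the type field):

```
layers {
  bottom: "inc3_conv5_reduce"
  top: "inc3_conv5_reduce"          # in-place ReLU, as in the diff
  name: "inc3_conv5_reduce_relu"    # renamed to avoid duplicating inc3_conv5_relu
  type: RELU                        # assumed layer type
}
```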
@amiralush - What do clf1_loss and clf2_loss refer to?
@marutiagarwal these are the auxiliary losses.
@amiralush - I am training GoogLeNet (on the 1000-class ILSVRC 2012 data) using the train_test.prototxt provided in your repo. I am using batch size = 32, since beyond this I get a memory error. All other values are the same as in your proto files. After 400,000 iterations: clf1_accuracy = 0.434675. Meanwhile, the GoogLeNet publication reported classification accuracy close to ~93.3%. I am wondering what the problem in my training might be. Could you please advise based on your experience? Thanks!
@marutiagarwal
@marutiagarwal the performance of a single GoogleNet according to the authors is top-1 70% and top-5 89%, but that is after approximately 250 epochs, i.e. with batch_size=32 roughly 10M iterations. In my implementation of GoogleNet I got top-1 68.7% and top-5 89% after 60 epochs, which with batch_size=32 is about 2.4M iterations. I will be sharing my prototxt and the model next week. So your results after 400K iterations look pretty good; you probably just need to let it run longer. The results also depend on the data augmentation you use: in the paper they used many scales and aspect ratios, while in Caffe we usually resize the image to 256x256 and distort the aspect ratio.
@marutiagarwal if you used the solver.prototxt posted by @amiralush, then you are probably not going to get much better, because it decreases the learning rate too aggressively.
Yes, I second that. Here is an excerpt of my solver's config file:
Maybe the step change is slightly too frequent / too 'slow', but I do reach 66.2% after 2.2M iterations on a slightly different dataset than the ImageNet competition one (I built mine from the Web from scratch a while ago, though I could use the original one now).
@marutiagarwal, @sguada is right. As I stated at the beginning of this PR, the solver is a default one. I have personally optimized the learning-rate steps, but I didn't publish that solver.
@marutiagarwal the paper says they decrease the learning rate by 4% every 8 epochs. 8 epochs was a bit long, but I've kept their step. My hunch, untested, is that it should be possible to take larger steps for the first 500K iterations or so, and then slow down.
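As a rough illustration only (not the solver anyone in this thread published), the paper's schedule of dropping the learning rate by 4% every 8 epochs could be expressed in a Caffe solver along these lines. The arithmetic assumes ~1.28M ILSVRC training images at batch_size=32, so one epoch is ~40K iterations and 8 epochs is ~320K; base_lr and momentum are assumed values:

```
# Hypothetical solver sketch for the paper's schedule; not from this PR.
net: "train_test.prototxt"
base_lr: 0.01            # assumed starting learning rate
lr_policy: "step"
gamma: 0.96              # multiply lr by 0.96, i.e. a 4% decrease...
stepsize: 320000         # ...every ~8 epochs (1.28M images / 32 per batch * 8)
momentum: 0.9            # assumed
max_iter: 10000000       # ~250 epochs, matching @sguada's estimate above
solver_mode: GPU
```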
@amiralush many thanks for your contribution. Take a look at #1598 for a closer replication of GoogleNet, including solvers.
@sguada Hi, I've trained GoogLeNet using my own training data.
Top-1 is accuracy.
This is the GoogLeNet (as devised by Szegedy et al.) train/test definition in Caffe, which I did as part of my work at Superfish Ltd.
I've trained it on a large-scale dataset we have here at Superfish, and it's very robust. The attached solver is a default one; one can follow the original paper for a more specific solver implementation.
The network definition is a bit cumbersome; one might want to implement an Inception module to make things simpler.
I did my best to follow the original paper.