strange training log of imagenet #1102

Closed

zcyang opened this issue Sep 18, 2014 · 8 comments

zcyang commented Sep 18, 2014

Hi,

I am using the default model (imagenet_train_val.prototxt) to train ImageNet. After many iterations, I got something strange:

I0917 15:36:23.184696 2254 solver.cpp:270] Test score #0: 0.00699999
I0917 15:36:23.184826 2254 solver.cpp:270] Test score #1: 15.3443
I0917 15:36:24.644654 2254 solver.cpp:195] Iteration 30000, loss = 0.00245684
I0917 15:36:24.644719 2254 solver.cpp:365] Iteration 30000, lr = 0.01
I0917 15:36:59.604066 2254 solver.cpp:195] Iteration 30020, loss = 0.00182616
I0917 15:36:59.604195 2254 solver.cpp:365] Iteration 30020, lr = 0.01
I0917 15:37:34.563117 2254 solver.cpp:195] Iteration 30040, loss = 0.000589138
I0917 15:37:34.563244 2254 solver.cpp:365] Iteration 30040, lr = 0.01
I0917 15:38:09.522680 2254 solver.cpp:195] Iteration 30060, loss = 0.00313978
I0917 15:38:09.522809 2254 solver.cpp:365] Iteration 30060, lr = 0.01
I0917 15:38:44.481019 2254 solver.cpp:195] Iteration 30080, loss = 0.00256942
I0917 15:38:44.481150 2254 solver.cpp:365] Iteration 30080, lr = 0.01
I0917 15:39:19.437052 2254 solver.cpp:195] Iteration 30100, loss = 0.000853064
I0917 15:39:19.437180 2254 solver.cpp:365] Iteration 30100, lr = 0.01
I0917 15:39:54.397054 2254 solver.cpp:195] Iteration 30120, loss = 0.00521982
I0917 15:39:54.397181 2254 solver.cpp:365] Iteration 30120, lr = 0.01

Test score #0 has stayed around 0.005 from the very beginning and does not change much over the whole run.
Moreover, test score #1 is around 15, which is much greater than 0.005.

Has anyone had this problem? What are the potential causes?

Could someone share their training log?

Thanks,

zcyang commented Sep 19, 2014

I also checked the training accuracy: it reaches 100%!
Is it a problem with the data? But I did everything according to the manual...

wkal commented Sep 19, 2014

I tried it on another dataset and it looked fine; the only thing I'm not happy about is that the accuracy did not reach my expectations.

zcyang commented Sep 19, 2014

How did you generate the data? Did you use convert_imageset.cpp to resize the images and then convert them to leveldb?

wkal commented Sep 19, 2014

I generated the data by calling the convert_imageset.bin and compute_image_mean.bin tools located in the build directory; the detailed steps followed the official tutorial.
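
For reference, the tutorial steps boil down to something like the following (a rough sketch; the paths and database names are placeholders, and the exact arguments differ a bit between Caffe versions, so check each tool's --h usage info for your build):

```sh
# Convert the training images listed in train.txt ("relative/path.JPEG label"
# per line, paths relative to the root folder) into a leveldb.
./build/tools/convert_imageset.bin \
    /path/to/imagenet/train/ \
    train.txt \
    imagenet_train_leveldb

# Compute the per-pixel mean image over that leveldb for mean subtraction.
./build/tools/compute_image_mean.bin \
    imagenet_train_leveldb \
    imagenet_mean.binaryproto
```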

wkal commented Sep 19, 2014

And when you run convert_imageset.bin and compute_image_mean.bin on the command line with the --h option, they show the usage info; I followed that info and did it.

zcyang commented Sep 19, 2014

I followed the tutorial strictly as well... weird...
Did you first resize the images to 256*256 and then use convert_imageset.bin to store them in leveldb?
Or did you use convert_imageset.bin to do both in one step?

wkal commented Sep 19, 2014

Well, I forget the details! I think the two approaches are probably equivalent, but I suggest resizing the images to 256*256 first; that way convert_imageset.bin takes less time. I vaguely remember that I may have used this approach, but I can't be sure, because I tried this preprocessing many times.
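
If you resize first, something like this with ImageMagick works (a sketch; it assumes the JPEGs are in the current directory and overwrites them in place, so run it on a copy if you want to keep the originals):

```sh
# Force every JPEG to exactly 256x256 (the "!" ignores the aspect ratio).
for name in *.JPEG; do
    convert -resize 256x256\! "$name" "$name"
done
```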

zcyang commented Sep 21, 2014

Problem solved; it was a problem with the data...
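
For anyone who hits the same symptom: a quick sanity check on the list files passed to convert_imageset.bin can catch this kind of data problem early. A sketch, assuming train.txt/val.txt use the usual "path label" format with one image per line:

```sh
# Spot-check a few random entries: do the paths and labels look plausible?
shuf -n 5 train.txt

# Label distribution: for ILSVRC you expect ~1000 distinct labels,
# not one label (or a handful) dominating everything.
awk '{print $2}' train.txt | sort | uniq -c | sort -rn | head
awk '{print $2}' train.txt | sort -u | wc -l
```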

zcyang closed this as completed Sep 21, 2014