Finetuning issues, loss: -nan after 100 iterations #644

Closed
wendlerc opened this issue Jul 8, 2014 · 5 comments

Comments

wendlerc commented Jul 8, 2014

Hello,

First of all, I know that several issues on this topic already exist; unfortunately, none of them provided enough information for me to solve my problem.

I was trying to reuse the pretrained ImageNet model to solve a binary classification task. Here is what I did:

  1. I took the aeroplane images and labels from the PASCAL VOC2007 dataset and converted them into a format suitable for convert_imageset.bin (resized the images to 256x256 and assigned label 1 for aeroplane, 0 for not aeroplane)
  2. I generated the leveldb and mean.protobin files using convert_imageset.bin (in create_imagenet.sh) and make_imagenet_mean.sh
  3. I renamed the last fully connected layer in imagenet_train/val.prototxt and reduced its number of outputs to 2 (see the sketch after this list); as the solver, I took the solver definition from the pascal-finetune example
  4. I called finetune_net.bin
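
For concreteness, a minimal sketch of what the renamed layer could look like in the V1 prototxt syntax of the time (the name "fc8_aeroplane" and the learning-rate multipliers are illustrative choices of mine, not taken from the actual files):

```
layers {
  name: "fc8_aeroplane"   # renamed: weights are copied by layer name, so a
  type: INNER_PRODUCT     # new name means this layer starts from scratch
  bottom: "fc7"
  top: "fc8_aeroplane"
  blobs_lr: 10            # illustrative: let the fresh layer learn faster
  blobs_lr: 20            # than the pretrained layers below it
  inner_product_param {
    num_output: 2         # aeroplane vs. not aeroplane
  }
}
```

(The renaming matters because finetune_net.bin copies pretrained weights into layers whose names match the snapshot; a layer with a new name is reinitialized while the rest of the net keeps the ImageNet weights.)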

What I got was the following:

  1. Tuning was really slow (my image set consists of ~2500 images and I ran the solver in CPU mode)
  2. After a certain number of iterations, the loss becomes -nan

I0707 17:01:49.294651 13063 solver.cpp:106] Iteration 0, Testing net
I0707 17:20:26.931828 13063 solver.cpp:142] Test score #0: 0.002
I0707 17:20:26.931887 13063 solver.cpp:142] Test score #1: 1.84863
I0707 22:03:18.925554 13063 solver.cpp:237] Iteration 100, lr = 0.001
I0707 22:03:19.511451 13063 solver.cpp:87] Iteration 100, loss = -nan

Additionally, I made a few runs with slightly different network definitions, e.g. keeping all the ImageNet layers and putting an extra fully connected layer with 2 outputs on top, or using just 1 output, but these failed as well with the same result.

I did not find much documentation on finetuning, apart from the slides in the presentation and several issues (#31, #328, #140, among others).

I am new to Caffe and this is my first time working with neural networks, so please don't be afraid of writing detailed answers. For example: is it sufficient to just reduce the number of outputs of the last fully connected layer to make the ImageNet model suitable for a binary classification task?

Best regards,

Chris

sguada commented Jul 8, 2014

Try a smaller base_lr
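
For example, a sketch of a solver with the rate cut 10x from the 0.001 in your log (the values are illustrative, in the spirit of the pascal-finetune solver, and the net file names are placeholders):

```
train_net: "pascal_finetune_train.prototxt"   # placeholder file names
test_net: "pascal_finetune_val.prototxt"
test_iter: 100
test_interval: 500
base_lr: 0.0001        # 10x smaller than the 0.001 shown in your log
lr_policy: "step"
gamma: 0.1
stepsize: 20000
momentum: 0.9
weight_decay: 0.0005
display: 20
max_iter: 10000
snapshot: 5000
snapshot_prefix: "pascal_finetune"
```

If the loss still diverges, keep dividing base_lr by 10 until training is stable.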

Sergio

wendlerc commented Jul 8, 2014

I reduced the size of my dataset and also changed the proportions, e.g. 50 aeroplanes to 100 non-aeroplanes; I also reduced the learning rate, and now it seems to work. Thanks! Would you mind telling me what exactly momentum and weight_decay do? I now understand all the parameters in solver.prototxt except those :)

sguada commented Jul 8, 2014

Take a look here
http://leon.bottou.org/research/stochastic
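
In short (a standard formulation, consistent with what the SGD solver does): momentum accumulates a velocity over past gradients, so each update is a smoothed combination of recent ones, while weight_decay adds an L2 penalty that shrinks the weights a little at every step. With learning rate $\alpha$, momentum $\mu$, and weight decay $\lambda$:

$$v_{t+1} = \mu\, v_t - \alpha \left( \nabla L(w_t) + \lambda\, w_t \right), \qquad w_{t+1} = w_t + v_{t+1}$$

With $\mu = 0.9$, past gradients keep contributing (geometrically decayed) to later updates, which speeds up progress along consistent directions and damps the oscillations that a too-large learning rate can otherwise blow up to nan.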

Sergio

wendlerc commented Jul 8, 2014

Thanks, I closed this issue :)

@caffecuda

@Mezn Hi, can you please look at #631? Many thanks.
