From the iteration 0,loss =NAN #5986

Phalange96 · 2017-10-16T10:19:34Z

I am training bvlc_alexnet on my own data, and I did not change the AlexNet structure.
When I train with solver.prototxt, the loss is NaN at every iteration, starting from iteration 0.
I tried reducing the learning rate to 0.000001, but it did not help.
I even set base_lr = 0, and the loss is still NaN from iteration 0.
This confuses me, because Yangqing answered in issue #409 (comment):

For a sanity check, try running with a learning rate 0 to see if any nan
errors pop up (they shouldn't, since no learning takes place). If data is
not initialized well, it might be possible that even 0.0001 is a too high
learning rate.

Yangqing

Here is the solver.prototxt:

base_lr: 0.000001
lr_policy: "step"
gamma: 0.1
stepsize: 500
display: 1
max_iter: 2000
momentum: 0.9
weight_decay: 0.000
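
For reference, Caffe's "step" policy computes the learning rate as base_lr * gamma ^ floor(iter / stepsize). The sketch below reproduces that formula in plain Python; `step_lr` is a hypothetical helper name used only for illustration, not part of Caffe. Under this formula the rate at iteration 0 equals base_lr, so a logged "lr = 0" would presumably come from a run with base_lr = 0 rather than from the 0.000001 value listed above.

```python
def step_lr(base_lr, gamma, stepsize, iteration):
    """Learning rate under a "step" policy: base_lr * gamma^floor(iter / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)


# Values taken from the solver settings above.
for it in (0, 499, 500, 1000):
    print(it, step_lr(0.000001, 0.1, 500, it))

# With base_lr = 0 the rate is 0 at every iteration, so no weights are updated
# and any NaN must already arise in the forward pass.
print(step_lr(0.0, 0.1, 500, 3))
```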

Here is the output:

1508138445 INFO: src/caffe/solver.cpp : line 218 : Iteration 0 (-0.382098 iter/s, 183.199s/1 iters), loss = nan
1508138445 INFO: src/caffe/solver.cpp : line 237 : Train net output #0: loss = nan (* 1 = nan loss)
1508138445 INFO: src/caffe/solvers/sgd_solver.cpp : line 105 : Iteration 0, lr = 0
1508138506 INFO: src/caffe/solver.cpp : line 218 : Iteration 1 (0.0163991 iter/s, 60.979s/1 iters), loss = nan
1508138506 INFO: src/caffe/solver.cpp : line 237 : Train net output #0: loss = nan (* 1 = nan loss)
1508138506 INFO: src/caffe/solvers/sgd_solver.cpp : line 105 : Iteration 1, lr = 0
1508138567 INFO: src/caffe/solver.cpp : line 218 : Iteration 2 (0.0164096 iter/s, 60.94s/1 iters), loss = nan
1508138567 INFO: src/caffe/solver.cpp : line 237 : Train net output #0: loss = nan (* 1 = nan loss)
1508138567 INFO: src/caffe/solvers/sgd_solver.cpp : line 105 : Iteration 2, lr = 0
1508138628 INFO: src/caffe/solver.cpp : line 218 : Iteration 3 (0.0164134 iter/s, 60.926s/1 iters), loss = nan
1508138628 INFO: src/caffe/solver.cpp : line 237 : Train net output #0: loss = nan (* 1 = nan loss)
1508138628 INFO: src/caffe/solvers/sgd_solver.cpp : line 105 : Iteration 3, lr = 0
1508138689 INFO: src/caffe/solver.cpp : line 218 : Iteration 4 (0.0164325 iter/s, 60.855s/1 iters), loss = nan
1508138689 INFO: src/caffe/solver.cpp : line 237 : Train net output #0: loss = nan (* 1 = nan loss)

Should I simplify the network structure, or is there a problem with my data?
Thanks very much!

Noiredd · 2017-10-17T10:34:26Z

Please do not post usage, installation, or modeling questions, or other requests for help to Issues. Use the caffe-users list instead. This helps developers maintain a clear, uncluttered, and efficient view of the state of Caffe. Please read the guidelines for contributing before submitting an issue or a pull request.

You might want to review your data as it is likely to be the source of NaNs, potentially also the caffemodel. I suggest using the python interface to inspect the inside of each blob.
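
A minimal sketch of the kind of check suggested above, assuming the blob contents are available as NumPy arrays. `find_non_finite` is a hypothetical helper, and the `demo` dictionary is made-up stand-in data, not output from the network in this issue.

```python
import numpy as np


def find_non_finite(arrays):
    """Return the names of arrays that contain NaN or Inf, in iteration order."""
    bad = []
    for name, arr in arrays.items():
        if not np.all(np.isfinite(np.asarray(arr, dtype=np.float64))):
            bad.append(name)
    return bad


# Made-up stand-in for the blobs of a network after one forward pass.
demo = {
    "data": np.array([[0.5, np.nan], [1.0, 2.0]]),
    "conv1": np.ones((2, 3)),
    "fc8": np.array([np.inf, 0.0]),
}
print(find_non_finite(demo))  # ['data', 'fc8']
```

With pycaffe, the same function could be fed `{name: blob.data for name, blob in net.blobs.items()}` after a call to `net.forward()` (not run here). The first name reported points at the earliest place the values go bad: if it is the input blob, the data is the likely culprit; if it is a later layer, the loaded weights in `net.params` are worth checking the same way.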

shaibagon · 2017-10-17T10:36:32Z

You might find this SO thread useful.

Phalange96 · 2017-10-18T03:11:10Z

Thanks a lot! After I reduced the batch_size, the problem seems to have disappeared: at least from iteration 0 to iteration 5000 there is no NaN, and my program keeps running.
There might be other causes as well, so I'll keep checking. Thanks very much!

Phalange96 · 2017-10-20T11:27:10Z

Thanks a lot. I am new to Caffe, and I have been running into this problem recently: Check failed: error == cudaSuccess (3 vs. 0) initialization error. Luckily I found your question about this problem on Stack Overflow: https://stackoverflow.com/questions/43756686/check-failed-error-cudasuccess-3-vs-0-initialization-error-check-fail I think it's fate. LOL! Could you please give me some suggestions on this new problem? Thank you very much!

Phalange96 changed the title ~~From the iteration 0,loss has been NAN~~ From the iteration 0,loss =NAN Oct 16, 2017

Noiredd closed this as completed Oct 17, 2017
