From the iteration 0,loss =NAN #5986

Closed
Phalange96 opened this issue Oct 16, 2017 · 4 comments

Comments


Phalange96 commented Oct 16, 2017

I am training bvlc_alexnet on my own data, and I did not change the AlexNet structure.
When I run solver.prototxt, the loss is NaN at every iteration, starting from iteration 0.
I tried reducing the learning rate to 0.000001, but it did not help.
I even set base_lr = 0, and the loss is still NaN from iteration 0.
This confuses me, because Yangqing answered in issue #409 (comment):

For a sanity check, try running with a learning rate 0 to see if any nan
errors pop up (they shouldn't, since no learning takes place). If data is
not initialized well, it might be possible that even 0.0001 is a too high
learning rate.

Yangqing

Here is the solver.prototxt:

base_lr: 0.000001
lr_policy: "step"
gamma: 0.1
stepsize: 500
display: 1
max_iter: 2000
momentum: 0.9
weight_decay: 0.000

Here is the output:

1508138445 INFO: src/caffe/solver.cpp : line 218 : Iteration 0 (-0.382098 iter/s, 183.199s/1 iters), loss = nan
1508138445 INFO: src/caffe/solver.cpp : line 237 : Train net output #0: loss = nan (* 1 = nan loss)
1508138445 INFO: src/caffe/solvers/sgd_solver.cpp : line 105 : Iteration 0, lr = 0
1508138506 INFO: src/caffe/solver.cpp : line 218 : Iteration 1 (0.0163991 iter/s, 60.979s/1 iters), loss = nan
1508138506 INFO: src/caffe/solver.cpp : line 237 : Train net output #0: loss = nan (* 1 = nan loss)
1508138506 INFO: src/caffe/solvers/sgd_solver.cpp : line 105 : Iteration 1, lr = 0
1508138567 INFO: src/caffe/solver.cpp : line 218 : Iteration 2 (0.0164096 iter/s, 60.94s/1 iters), loss = nan
1508138567 INFO: src/caffe/solver.cpp : line 237 : Train net output #0: loss = nan (* 1 = nan loss)
1508138567 INFO: src/caffe/solvers/sgd_solver.cpp : line 105 : Iteration 2, lr = 0
1508138628 INFO: src/caffe/solver.cpp : line 218 : Iteration 3 (0.0164134 iter/s, 60.926s/1 iters), loss = nan
1508138628 INFO: src/caffe/solver.cpp : line 237 : Train net output #0: loss = nan (* 1 = nan loss)
1508138628 INFO: src/caffe/solvers/sgd_solver.cpp : line 105 : Iteration 3, lr = 0
1508138689 INFO: src/caffe/solver.cpp : line 218 : Iteration 4 (0.0164325 iter/s, 60.855s/1 iters), loss = nan
1508138689 INFO: src/caffe/solver.cpp : line 237 : Train net output #0: loss = nan (* 1 = nan loss)

Should I simplify the net structure, or is there a problem with my data?
Thanks very much!
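
A note on the `base_lr = 0` check: the loss reported at iteration 0 comes from a forward pass with the initial weights, before any update is applied. A NaN there therefore has to originate in the input batch or in the loaded weights, not in the step size. The following is a minimal pure-Python sketch, not Caffe code; `softmax_loss` and `sgd_step` are hypothetical helpers written only for illustration. It also shows that a zero learning rate does not shield the weights once a gradient is NaN, since `0.0 * nan` is still `nan` in IEEE floating-point arithmetic.

```python
import math


def softmax_loss(logits, label):
    """Softmax cross-entropy for one sample, using the log-sum-exp trick."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[label]


def sgd_step(weight, grad, lr):
    """Plain SGD update with no momentum or weight decay (an assumption)."""
    return weight - lr * grad


nan = float("nan")

# Finite inputs give a finite loss.
print(softmax_loss([2.0, 1.0, 0.1], 0))

# A single NaN in the forward pass reaches the loss, whatever the learning rate.
print(softmax_loss([nan, 1.0, 0.1], 1))

# With lr = 0 and a finite gradient the weight is unchanged ...
print(sgd_step(0.5, 3.0, 0.0))
# ... but a NaN gradient still turns the weight into NaN.
print(sgd_step(0.5, nan, 0.0))
```

In other words, running with a learning rate of 0 is a useful diagnostic precisely because it rules out divergence: if NaN still appears at iteration 0, the next things to examine are the data and the initial weights.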

@Phalange96 Phalange96 changed the title From the iteration 0,loss has been NAN From the iteration 0,loss =NAN Oct 16, 2017
Member

Noiredd commented Oct 17, 2017

Please do not post usage, installation, or modeling questions, or other requests for help to Issues. Use the caffe-users list instead. This helps developers maintain a clear, uncluttered, and efficient view of the state of Caffe. Please read the guidelines for contributing before submitting an issue or a pull request.

You might want to review your data as it is likely to be the source of NaNs, potentially also the caffemodel. I suggest using the python interface to inspect the inside of each blob.
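
The blob inspection suggested above could be sketched roughly as follows. `first_non_finite` is a hypothetical helper, not part of pycaffe: it takes any ordered mapping from a blob name to a flat iterable of floats and reports the first NaN or infinity it finds. With pycaffe, such a mapping could presumably be built from `net.blobs` and `net.params` after one `net.forward()` call, as outlined in the comment; the file names there are placeholders, not taken from the issue.

```python
import math
from collections import OrderedDict


def first_non_finite(blobs):
    """Return (name, index, value) of the first NaN/inf found, or None.

    `blobs` maps a blob name to a flat iterable of floats, in network order.
    """
    for name, values in blobs.items():
        for index, value in enumerate(values):
            if not math.isfinite(value):
                return name, index, float(value)
    return None


# Assumed pycaffe usage (paths are placeholders):
#   import caffe
#   net = caffe.Net("train_val.prototxt", caffe.TRAIN)
#   net.copy_from("bvlc_alexnet.caffemodel")
#   weights = OrderedDict((k, p[0].data.flat) for k, p in net.params.items())
#   net.forward()
#   acts = OrderedDict((k, b.data.flat) for k, b in net.blobs.items())
#   print(first_non_finite(weights), first_non_finite(acts))

# Stand-in data so the sketch runs without Caffe installed.
demo = OrderedDict([
    ("data", [0.1, 0.2, 0.3]),
    ("conv1", [1.5, float("inf"), 0.0]),
    ("fc8", [float("nan"), 0.0]),
])
print(first_non_finite(demo))
```

Because the mapping is scanned in network order, a bad input batch would be flagged at the data blob before any later layer. The pure-Python loop is slow on large blobs; on NumPy arrays a vectorised check such as `np.isfinite(blob.data).all()` is much faster.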

@Noiredd Noiredd closed this as completed Oct 17, 2017
@shaibagon
Member

You might find this SO thread useful.

@Phalange96
Author

Thanks a lot! After I reduced the batch_size, the problem seems to have disappeared: from iteration 0 to iteration 5000 there has been no NaN error, and training is running.
There may be other causes as well, so I'll keep checking. Thanks very much!

Author

Phalange96 commented Oct 20, 2017 via email
