weights become very large and then loss = nan #373
Comments
I had similar issues when the learning rate was too high for my network. Lowering the learning rate solved the problem for me (I don't think it was a numerical problem in Caffe). An indicator of a too-high learning rate is a diverging error value, and in my experiments this typically ends in NaN.
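A toy sketch of that divergence (plain NumPy, not Caffe code; the quadratic objective is just an illustrative assumption): gradient descent on f(w) = w^2 multiplies w by (1 - 2*lr) each step, so any lr with |1 - 2*lr| > 1 makes the iterates blow up, overflow to inf, and the next update computes inf - inf, which is NaN.

```python
import numpy as np

def train(lr, steps=1000):
    # Gradient descent on f(w) = w^2; the gradient is 2w.
    w = np.float64(1.0)
    for _ in range(steps):
        w = w - lr * 2.0 * w  # each step multiplies w by (1 - 2*lr)
    return w

print(train(lr=0.1))  # |1 - 2*lr| < 1: converges toward 0
print(train(lr=2.0))  # |1 - 2*lr| = 3: overflows to inf, then inf - inf = NaN
```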
I don't think it is the learning rate's fault. I dumped the weights of my network and found that only one bias value in the last inner product layer becomes 3e+29 after a specific iteration, while all the other weights look fine (e.g. between -10.0 and 10.0). Strange...
@chyh1990 I think I might have the same problem as you.
Try setting the initial bias to 0.1 in all layers, or add regularization.

Sergio
@chyh1990 Thanks for your help, you did me a big favor!
Re: #373 (comment)
(I assume this has been addressed, but feel free to reopen the issue should there be further questions.)
@chyh1990 Could you tell me how to get all the parameters of my training model? Thanks very much!
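A minimal pycaffe sketch of one way to do that, assuming the Python bindings are built; the deploy.prototxt and snapshot.caffemodel names are placeholders for your own files:

```python
import caffe

net = caffe.Net('deploy.prototxt', 'snapshot.caffemodel', caffe.TEST)
for layer_name, blobs in net.params.items():
    # blobs[0] holds the layer's weights, blobs[1] its biases (when present)
    for i, blob in enumerate(blobs):
        print(layer_name, i, blob.data.shape, blob.data.min(), blob.data.max())
```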
I understand that regularization keeps the biases and weights from taking unreasonable values, but I wonder how the loss can become NaN. Is it because those values drove the loss to -inf?
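A small NumPy sketch of one such mechanism (the 3e29 figure is borrowed from the weights reported earlier in this thread, and the naive softmax is an illustrative assumption): a logit that large overflows exp(), and both NaN and log(0) = -inf then leak into the loss.

```python
import numpy as np

logits = np.array([3e29, 1.0, -2.0], dtype=np.float32)
exp = np.exp(logits)      # exp(3e29) overflows float32 to inf
probs = exp / exp.sum()   # the inf entry becomes inf/inf = nan; the others become 0
print(-np.log(probs[0]))  # nan: the loss is no longer finite
print(np.log(probs[1]))   # -inf: log(0), the case asked about; negated, the loss is inf
```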
I use Caffe to train my CNN, but the loss becomes NaN after a few thousand iterations.
I dumped the weights just before that iteration and found that some weights in the inner product layers had become very large (e.g. +3e29). I checked my shuffled input data and made sure the values are in a reasonable range.
I can reproduce this in both GPU mode and CPU mode. Does this indicate a numeric problem in Caffe, or some other cause?
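For reference, a hedged pycaffe sketch of the kind of dump described above: load the snapshot saved just before the bad iteration and flag parameter blobs holding non-finite or extreme values. The file names and the 1e6 threshold are placeholders, not values from this issue.

```python
import numpy as np
import caffe

net = caffe.Net('train_val.prototxt', 'snapshot_before_nan.caffemodel', caffe.TEST)
for name, blobs in net.params.items():
    for i, blob in enumerate(blobs):
        peak = np.abs(blob.data).max()
        # Flag blobs that contain NaN/inf or have blown up in magnitude
        if not np.isfinite(blob.data).all() or peak > 1e6:
            print('suspicious parameter %s[%d]: max |value| = %g' % (name, i, peak))
```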