Added CHECK if loss is NaN #1479
I have 74 different softmax loss layers in one network, and some of them being NaN can be safely ignored. So this PR will negatively impact my currently working network training. This should be an option in the solver instead.
@netheril96 OK, I guess in that case the training can recover from NaNs. I will see how easy it is to add this as an option in the solver.
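A minimal sketch of what such an opt-in check could look like, for illustration only. The flag name `stop_on_nan` is borrowed from the commit message in this PR; the `SolverOptions` struct, the `ShouldStopOnNaN` helper, and the default value are assumptions and not Caffe's actual solver API.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for a solver setting; not Caffe's real SolverParameter.
struct SolverOptions {
  bool stop_on_nan = false;  // assumed default: keep training through NaN losses
};

// Returns true when training should stop: only if the option is enabled
// and at least one of the per-layer losses is NaN.
bool ShouldStopOnNaN(const std::vector<float>& layer_losses,
                     const SolverOptions& options) {
  assert(!layer_losses.empty());  // assumption: the net has at least one loss layer
  if (!options.stop_on_nan) return false;
  for (float loss : layer_losses) {
    if (std::isnan(loss)) return true;
  }
  return false;
}
```

With the flag left off, a network whose individual loss layers occasionally produce NaN keeps training as before; only users who enable the flag get the early stop.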
Could "due to" be misleading here? It may be that the NaN appears in some low layer but gets propagated up to the loss... (there are really two possible meanings of "due to" here, and I know which one you mean now, but I'm worried I'll forget when I see this message later).
… stop_on_nan is false
df2e1f8 to 4c174c5
More generally, perhaps this should die on any loss that isn't finite (i.e. NaN, +infinity, or -infinity).
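As a rough illustration of that broader check: `std::isfinite` from `<cmath>` rejects all three cases at once, whereas `std::isnan` catches only NaN. The helper names below are hypothetical and not taken from the PR.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// True only for ordinary finite values; false for NaN, +infinity and -infinity.
bool LossIsFinite(float loss) { return std::isfinite(loss); }

// Small self-check documenting the three non-finite cases.
void CheckLossIsFinite() {
  const float inf = std::numeric_limits<float>::infinity();
  const float not_a_number = std::numeric_limits<float>::quiet_NaN();
  assert(LossIsFinite(0.25f));
  assert(!LossIsFinite(not_a_number));
  assert(!LossIsFinite(inf));
  assert(!LossIsFinite(-inf));
}
```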
Thanks a lot - this is highly needed! I've spent way too long trying to understand why pycaffe was crashing when the network was diverging. Quite annoying when trying to do a parameter search over a large parameter space.
@PiranjaF you could change it to break from the loop instead of crashing.
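A sketch of the break-instead-of-crash idea, under the assumption that the training loop can be modelled as repeated calls to a step function that returns the loss. `TrainResult`, `TrainUntilNonFinite`, and the `step` callback are hypothetical names for illustration, not Caffe's solver code.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

struct TrainResult {
  int iterations_run;  // iterations completed with a finite loss
  bool diverged;       // true if the loop exited early on a non-finite loss
};

// `step` is a hypothetical callback that runs one training iteration
// and returns that iteration's loss.
TrainResult TrainUntilNonFinite(int max_iter,
                                const std::function<float(int)>& step) {
  assert(max_iter >= 0);
  TrainResult result{0, false};
  for (int iter = 0; iter < max_iter; ++iter) {
    const float loss = step(iter);
    if (!std::isfinite(loss)) {
      result.diverged = true;
      break;  // leave the loop; the caller decides what to do next
    }
    ++result.iterations_run;
  }
  return result;
}
```

Returning a result rather than aborting lets a caller running a parameter search record the diverged run and move on to the next configuration.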
@seanbell I did not think it could be done that easily. Thanks.
Closing since the dev branch is deprecated. Please send PRs to master.
As mentioned in #1349, if the loss is NaN it doesn't make sense to keep training.