too high loss #416

Closed
dccho opened this issue Nov 15, 2015 · 5 comments

Comments
dccho commented Nov 15, 2015

[screenshot: DIGITS loss graphs]

I'm using DIGITS 3.0.0-rc1 + NVIDIA/caffe v0.13.2.
The loss2 value for the training set (green graph) suddenly jumped very high. Here is the log from when it happened.

I1115 20:08:15.366227  6494 solver.cpp:217] Iteration 75776, loss = 10.4095
I1115 20:08:15.366282  6494 solver.cpp:234]     Train net output #0: loss = 5.77672 (* 1 = 5.77672 loss)
I1115 20:08:15.366291  6494 solver.cpp:234]     Train net output #1: loss1/loss = 3.67123 (* 0.2 = 0.734247 loss)
I1115 20:08:15.366297  6494 solver.cpp:234]     Train net output #2: loss2/loss = 3.37401 (* 0.3 = 1.0122 loss)
I1115 20:08:15.366300  6494 solver.cpp:234]     Train net output #3: loss3/loss = 5.77274 (* 0.5 = 2.88637 loss)
I1115 20:08:15.817288  6494 solver.cpp:511] Iteration 75776 (0.393751/s), lr = 0.05
I1115 20:09:36.631261  6494 solver.cpp:217] Iteration 75808, loss = 52.4953
I1115 20:09:36.631345  6494 solver.cpp:234]     Train net output #0: loss = 5.88328 (* 1 = 5.88328 loss)
I1115 20:09:36.631353  6494 solver.cpp:234]     Train net output #1: loss1/loss = 87.3365 (* 0.2 = 17.4673 loss)
I1115 20:09:36.631358  6494 solver.cpp:234]     Train net output #2: loss2/loss = 87.3365 (* 0.3 = 26.201 loss)
I1115 20:09:36.631363  6494 solver.cpp:234]     Train net output #3: loss3/loss = 5.88758 (* 0.5 = 2.94379 loss)
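For reference, the overall loss Caffe prints each iteration is just the weighted sum of the individual loss-layer outputs, so the jump from 10.41 to 52.50 comes entirely from the two auxiliary classifiers (loss1, loss2) diverging to ~87 while loss3 stays near 5.9. A minimal sketch reproducing the arithmetic, with the values copied verbatim from the log above:

```python
# Reproduce the solver's reported loss from the per-layer outputs in the log above.
# Caffe prints each raw loss and its loss_weight; the iteration loss is their weighted sum.

def total_loss(outputs):
    """Sum of raw_loss * loss_weight over all loss layers."""
    return sum(raw * weight for raw, weight in outputs)

# (raw loss, loss_weight) pairs copied from the log.
it_75776 = [(5.77672, 1.0), (3.67123, 0.2), (3.37401, 0.3), (5.77274, 0.5)]  # normal iteration
it_75808 = [(5.88328, 1.0), (87.3365, 0.2), (87.3365, 0.3), (5.88758, 0.5)]  # loss1/loss2 diverged

print(round(total_loss(it_75776), 4))  # 10.4095 -- matches "loss = 10.4095"
print(round(total_loss(it_75808), 4))  # 52.4953 -- matches "loss = 52.4953"
```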
lukeyeager added the bug label Nov 17, 2015

lukeyeager (Member) commented

Thanks for the bug report @dccho. Unfortunately, I don't know what would cause this.

Has this happened to you more than once? Is the problem reproducible if you try it again? If you can provide some extra information, the engineers working on NVIDIA/caffe should be able to help:

  • GPU type[s]
  • Driver version
  • CUDA version
  • cuDNN version
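A rough sketch (not DIGITS functionality, just a convenience) for gathering the details listed above on a Linux box; nvidia-smi and nvcc are the standard NVIDIA tools, and the cudnn.h location is an assumption that may differ per install:

```python
# Collect GPU, driver, CUDA, and cuDNN details by shelling out to standard tools.
import subprocess

def run(cmd):
    """Run a shell command and return its trimmed output, or a note on failure."""
    try:
        return subprocess.check_output(cmd, shell=True).decode().strip()
    except Exception as exc:
        return "unavailable ({})".format(exc)

print("GPU / driver :", run("nvidia-smi --query-gpu=name,driver_version --format=csv,noheader"))
print("CUDA         :", run("nvcc --version | tail -n 1"))
# cuDNN version macros live in cudnn.h; /usr/local/cuda/include is an assumed location.
print("cuDNN        :", run("grep -A 2 CUDNN_MAJOR /usr/local/cuda/include/cudnn.h | head -n 3"))
```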

dccho (Author) commented Dec 11, 2015

[screenshot from 2015-12-11: DIGITS loss graphs]

It happens frequently now. I just ran GoogLeNet from the standard networks without any changes.

GPU : 4 x Titan X
Driver : 352.63
CUDA : 7.5
cuDNN : v3

dccho (Author) commented Dec 11, 2015

It is probably a problem with the combination of several unstable libraries. I have now switched to DIGITS 3.0-rc3 with cuDNN v4 and NVCaffe-0.14, and it works fine.

lukeyeager (Member) commented

Glad you got your problem worked out - sorry I wasn't more help debugging! I'll close this now, but anyone who can add to the discussion should still feel free to do so.

gheinrich (Contributor) commented

It would be interesting to know whether the high-loss issue with NVCaffe-0.13 also occurred during single-GPU training.
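One way to check this, sketched below rather than taken from DIGITS itself, is to relaunch the same solver with the stand-alone caffe binary pinned to a single device; solver.prototxt is a placeholder for the solver file that DIGITS generates for the job:

```python
# Hypothetical helper: rerun the same solver on a single GPU (device 0) with the
# caffe CLI to see whether the loss spike still occurs outside multi-GPU training.
import subprocess

subprocess.check_call([
    "caffe", "train",
    "--solver=solver.prototxt",  # placeholder path to the DIGITS-generated solver
    "--gpu=0",                   # restrict training to one GPU
])
```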
