DetectNet training fails with error code -9 #1362

SimonBirrell · 2016-12-26T08:11:01Z

Hi,

I've been following the object detection tutorial with one difference: I've compiled NVCaffe to be CPU only.

After a few hours of training, I get the following:

ERROR: error code -9

Ignoring source layer loss2/classifier
Ignoring source layer loss2/loss
Ignoring source layer pool4/3x3_s2
Ignoring source layer pool4/3x3_s2_pool4/3x3_s2_0_split
Ignoring source layer pool5/7x7_s1
Ignoring source layer pool5/drop_7x7_s1
Ignoring source layer loss3/classifier
Ignoring source layer loss3/loss3
Starting Optimization
Solving
Learning Rate Policy: step
Iteration 0, Testing net (#0)
Ignoring source layer train_data
Ignoring source layer train_label
Ignoring source layer train_transform
Test net output #0: loss_bbox = 18.4677 (* 2 = 36.9353 loss)
Test net output #1: loss_coverage = 329.823 (* 1 = 329.823 loss)
Test net output #2: mAP = 0
Test net output #3: precision = 0
Test net output #4: recall = 0

Any ideas? I can't find "-9" in the source code of either Caffe or DIGITS.

Thanks!

Simon

gheinrich · 2016-12-28T21:34:44Z

Hi @SimonBirrell, a negative error code means the process was terminated by a Linux signal. Here, -9 corresponds to SIGKILL: your process was killed. A common cause is that it allocated too much memory. I have never tried to train DetectNet on CPU...
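For anyone else who runs into this, here is a minimal sketch of how a negative code maps to a signal name. It assumes the code is reported the way Python's `subprocess` module reports it on POSIX systems, where a return code of -N means the child was terminated by signal N. The helper name `describe_return_code` is hypothetical and is not part of DIGITS or Caffe.

```python
import signal
import subprocess
import sys


def describe_return_code(code: int) -> str:
    """Translate a subprocess-style return code into a readable message.

    Hypothetical helper for illustration only. On POSIX, subprocess
    reports a child killed by signal N as return code -N.
    """
    if code < 0:
        try:
            name = signal.Signals(-code).name
        except ValueError:
            # The number does not correspond to a signal on this platform.
            name = "unknown signal"
        return f"killed by signal {-code} ({name})"
    if code == 0:
        return "exited normally"
    return f"exited with status {code}"


def kill_child_and_get_code() -> int:
    """Start a sleeping child process, kill it, and return its return code."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    child.kill()  # sends SIGKILL on POSIX
    child.wait()
    return child.returncode


if __name__ == "__main__":
    print(describe_return_code(-9))
    print(describe_return_code(kill_child_and_get_code()))
```

Note that SIGKILL only says the process was killed from outside. If the kernel's out-of-memory killer was responsible, the kernel log on the machine usually records it.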

SimonBirrell · 2016-12-29T14:13:14Z

Thanks for that! I'm trying to run the training again with more physical memory.

Training without a GPU is obviously not ideal - I'm travelling away from the various NVIDIA workstations I have access to, and wanted to give DIGITS and object detection a try. So far, 2 hours on 1 CPU with 4GB RAM on a virtual machine haven't reached 1% of training, so we'll see if this is feasible at all. I'll report back!

gheinrich added object-detection question labels Dec 28, 2016
