Segmentation fault after training for a few iterations #2

Closed
IrenTang opened this issue Dec 4, 2019 · 1 comment

Comments

IrenTang commented Dec 4, 2019

@Zzh-tju Hi,
Thank you for your work, my loss converges a lot faster! But after a few iterations I hit the same error as #356 when training on a custom dataset.

14: 485.570526, 994.591980 avg, 0.000000 rate, 0.189616 seconds, 56 images
Loaded: 0.000045 seconds
v3 (ciou loss, Normalizer: (iou: 0.500000, cls: 1.000000) Region 82 Avg (IOU: 0.103565, CIOU: 0.060234), Class: 0.565292, Obj: 0.683245, No Obj: 0.368090, .5R: 0.000000, .75R: 0.000000, count: 2
v3 (ciou loss, Normalizer: (iou: 0.500000, cls: 1.000000) Region 94 Avg (IOU: -nan, CIOU: -nan), Class: -nan, Obj: -nan, No Obj: 0.528647, .5R: -nan, .75R: -nan, count: 0
v3 (ciou loss, Normalizer: (iou: 0.500000, cls: 1.000000) Region 106 Avg (IOU: 0.136653, CIOU: 0.131538), Class: 0.449005, Obj: 0.167284, No Obj: 0.423050, .5R: 0.000000, .75R: 0.000000, count: 2
train_network_err: 962.811768
v3 (ciou loss, Normalizer: (iou: 0.500000, cls: 1.000000) Region 82 Avg (IOU: -nan, CIOU: -nan), Class: -nan, Obj: -nan, No Obj: 0.367310, .5R: -nan, .75R: -nan, count: 0
v3 (ciou loss, Normalizer: (iou: 0.500000, cls: 1.000000) Region 94 Avg (IOU: 0.434457, CIOU: 0.430856), Class: 0.575645, Obj: 0.622357, No Obj: 0.528018, .5R: 0.000000, .75R: 0.000000, count: 1
v3 (ciou loss, Normalizer: (iou: 0.500000, cls: 1.000000) Region 106 Avg (IOU: 0.173894, CIOU: 0.167250), Class: 0.229916, Obj: 0.276545, No Obj: 0.422309, .5R: 0.000000, .75R: 0.000000, count: 2
Segmentation fault

##################
The same data list works on the official YOLOv3 (pjreddie/darknet).
It happens randomly, and there are no 0.0 values in the annotation .txt files.
##################
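
A quick way to double-check the labels is to scan every annotation file for zero, negative, or out-of-range boxes. Here is a minimal sketch, assuming a `train.txt` image list, label files next to the images with the extension swapped to `.txt`, and lines of the form `<class> <x> <y> <w> <h>` normalized to (0, 1] (all of these are assumptions, not details from this report):

```c
/* Minimal sketch: scan darknet-format label files for suspicious boxes.
 * Assumptions: "train.txt" lists one image path per line, and each image's
 * label file sits next to it with the extension replaced by ".txt",
 * containing lines of the form: <class> <x> <y> <w> <h>, normalized to (0,1]. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *list = fopen("train.txt", "r");    /* hypothetical image list path */
    if (!list) { perror("train.txt"); return 1; }

    char img[4096], lbl[4100];
    while (fgets(img, sizeof img, list)) {
        img[strcspn(img, "\r\n")] = '\0';
        strcpy(lbl, img);
        char *dot = strrchr(lbl, '.');
        if (dot) strcpy(dot, ".txt");        /* foo.jpg -> foo.txt */

        FILE *f = fopen(lbl, "r");
        if (!f) { printf("missing label file: %s\n", lbl); continue; }

        int cls; float x, y, w, h;
        while (fscanf(f, "%d %f %f %f %f", &cls, &x, &y, &w, &h) == 5) {
            if (x <= 0.f || y <= 0.f || w <= 0.f || h <= 0.f ||
                x > 1.f || y > 1.f || w > 1.f || h > 1.f)
                printf("suspicious box in %s: %d %g %g %g %g\n",
                       lbl, cls, x, y, w, h);
        }
        fclose(f);
    }
    fclose(list);
    return 0;
}
```

Any line this prints would be worth fixing or removing before retraining.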

What I have tried:

  1. tried random with 0 and 1
  2. tried batch/subdivisions with 64/32, 64/64, 32/16, 4/2
  3. tried width/height with 608, 512, 384
  4. tried nms_kind with diounms and greedynms

I didn't modify the bbox calculation in YOLOv3 the way AlexeyAB described, i.e.:
b.x = (i + logistic_activate(x[index + 0])) / w
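
For reference, a minimal sketch of the darknet-style YOLOv3 box decoding that line comes from, with the logistic activation written inline for clarity. In AlexeyAB's darknet the x/y outputs are already activated in the forward pass, so the inline logistic_activate, the function name, and the exact signature here are a paraphrase under those assumptions, not the repository's code:

```c
#include <math.h>

typedef struct { float x, y, w, h; } box;

static float logistic_activate(float v) { return 1.f / (1.f + expf(-v)); }

/* Sketch of YOLOv3 box decoding: (i, j) is the grid cell, (lw, lh) the grid
 * size, (w, h) the network input size, biases the anchor sizes in pixels,
 * n the anchor index, and stride the channel stride of the layer output. */
static box get_yolo_box_sketch(const float *x, const float *biases, int n,
                               int index, int i, int j, int lw, int lh,
                               int w, int h, int stride)
{
    box b;
    b.x = (i + logistic_activate(x[index + 0 * stride])) / lw; /* center x in [0,1] */
    b.y = (j + logistic_activate(x[index + 1 * stride])) / lh; /* center y in [0,1] */
    b.w = expf(x[index + 2 * stride]) * biases[2 * n]     / w; /* width  relative to input */
    b.h = expf(x[index + 3 * stride]) * biases[2 * n + 1] / h; /* height relative to input */
    return b;
}
```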

I am using a Tesla P40 GPU with 20 GB of memory. My anchors are 38,41, 86,52, 65,103, 146,79, 103,164, 180,164, 245,107, 152,256, 254,247 with width=512.

Is there anything else I have missed, and what else could cause this error? Thanks!

Zzh-tju (Owner) commented Dec 4, 2019

OK, I think this problem may be caused by the dataset. When I trained on COCO 2017 a while ago, the error was bound to appear within a few hundred iterations. Setting batch/subdivisions to 64/32 only delayed the error, to maybe a few thousand iterations. When I trained on COCO 2014 instead, it did not happen again.

Zzh-tju closed this as completed Apr 5, 2020