Segmentation fault after training for a few iterations #2

Closed
IrenTang opened this issue Dec 4, 2019 · 1 comment

Comments

IrenTang commented Dec 4, 2019

@Zzh-tju Hi,
Thank you for your work, my loss converges a lot faster! But after a few iterations I hit the same error as #356 when training on a custom dataset.

14: 485.570526, 994.591980 avg, 0.000000 rate, 0.189616 seconds, 56 images
Loaded: 0.000045 seconds
v3 (ciou loss, Normalizer: (iou: 0.500000, cls: 1.000000) Region 82 Avg (IOU: 0.103565, CIOU: 0.060234), Class: 0.565292, Obj: 0.683245, No Obj: 0.368090, .5R: 0.000000, .75R: 0.000000, count: 2
v3 (ciou loss, Normalizer: (iou: 0.500000, cls: 1.000000) Region 94 Avg (IOU: -nan, CIOU: -nan), Class: -nan, Obj: -nan, No Obj: 0.528647, .5R: -nan, .75R: -nan, count: 0
v3 (ciou loss, Normalizer: (iou: 0.500000, cls: 1.000000) Region 106 Avg (IOU: 0.136653, CIOU: 0.131538), Class: 0.449005, Obj: 0.167284, No Obj: 0.423050, .5R: 0.000000, .75R: 0.000000, count: 2
train_network_err: 962.811768
v3 (ciou loss, Normalizer: (iou: 0.500000, cls: 1.000000) Region 82 Avg (IOU: -nan, CIOU: -nan), Class: -nan, Obj: -nan, No Obj: 0.367310, .5R: -nan, .75R: -nan, count: 0
v3 (ciou loss, Normalizer: (iou: 0.500000, cls: 1.000000) Region 94 Avg (IOU: 0.434457, CIOU: 0.430856), Class: 0.575645, Obj: 0.622357, No Obj: 0.528018, .5R: 0.000000, .75R: 0.000000, count: 1
v3 (ciou loss, Normalizer: (iou: 0.500000, cls: 1.000000) Region 106 Avg (IOU: 0.173894, CIOU: 0.167250), Class: 0.229916, Obj: 0.276545, No Obj: 0.422309, .5R: 0.000000, .75R: 0.000000, count: 2
Segmentation fault

##################
The same data list works on the official YOLOv3 (pjreddie/darknet).
It happens randomly, and there are no 0.0 values in the annotation .txt files.
##################
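
A quick way to double-check the labels is to scan every annotation file for zero, negative, or out-of-range boxes. Here is a minimal sketch, assuming a `train.txt` image list, label files next to the images with the extension swapped to `.txt`, and lines of the form `<class> <x> <y> <w> <h>` normalized to (0, 1] (all of these are assumptions, not details from this report):

```c
/* Minimal sketch: scan darknet-format label files for suspicious boxes.
 * Assumptions: "train.txt" lists one image path per line, and each image's
 * label file sits next to it with the extension replaced by ".txt",
 * containing lines of the form: <class> <x> <y> <w> <h>, normalized to (0,1]. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *list = fopen("train.txt", "r");    /* hypothetical image list path */
    if (!list) { perror("train.txt"); return 1; }

    char img[4096], lbl[4100];
    while (fgets(img, sizeof img, list)) {
        img[strcspn(img, "\r\n")] = '\0';
        strcpy(lbl, img);
        char *dot = strrchr(lbl, '.');
        if (dot) strcpy(dot, ".txt");        /* foo.jpg -> foo.txt */

        FILE *f = fopen(lbl, "r");
        if (!f) { printf("missing label file: %s\n", lbl); continue; }

        int cls; float x, y, w, h;
        while (fscanf(f, "%d %f %f %f %f", &cls, &x, &y, &w, &h) == 5) {
            if (x <= 0.f || y <= 0.f || w <= 0.f || h <= 0.f ||
                x > 1.f || y > 1.f || w > 1.f || h > 1.f)
                printf("suspicious box in %s: %d %g %g %g %g\n",
                       lbl, cls, x, y, w, h);
        }
        fclose(f);
    }
    fclose(list);
    return 0;
}
```

Any line this prints would be worth fixing or removing before retraining.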

What I have tried:

  1. tried random with 0 and 1
  2. tried batch/subdivisions with 64/32, 64/64, 32/16, 4/2
  3. tried width/height with 608, 512, 384
  4. tried nms_kind with diounms and greedynms

I didn't modify the bbox calculation in YOLOv3 the way AlexeyAB described, i.e.:
b.x = (i + logistic_activate(x[index + 0])) / w
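
For reference, a minimal sketch of the darknet-style YOLOv3 box decoding that line comes from, with the logistic activation written inline for clarity. In AlexeyAB's darknet the x/y outputs are already activated in the forward pass, so the inline logistic_activate, the function name, and the exact signature here are a paraphrase under those assumptions, not the repository's code:

```c
#include <math.h>

typedef struct { float x, y, w, h; } box;

static float logistic_activate(float v) { return 1.f / (1.f + expf(-v)); }

/* Sketch of YOLOv3 box decoding: (i, j) is the grid cell, (lw, lh) the grid
 * size, (w, h) the network input size, biases the anchor sizes in pixels,
 * n the anchor index, and stride the channel stride of the layer output. */
static box get_yolo_box_sketch(const float *x, const float *biases, int n,
                               int index, int i, int j, int lw, int lh,
                               int w, int h, int stride)
{
    box b;
    b.x = (i + logistic_activate(x[index + 0 * stride])) / lw; /* center x in [0,1] */
    b.y = (j + logistic_activate(x[index + 1 * stride])) / lh; /* center y in [0,1] */
    b.w = expf(x[index + 2 * stride]) * biases[2 * n]     / w; /* width  relative to input */
    b.h = expf(x[index + 3 * stride]) * biases[2 * n + 1] / h; /* height relative to input */
    return b;
}
```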

I am using a Tesla P40 GPU with 20 GB of memory. My anchors are 38,41, 86,52, 65,103, 146,79, 103,164, 180,164, 245,107, 152,256, 254,247 with width=512.

Is there anything else I have missed, and what else could cause this error? Thanks!

Zzh-tju (Owner) commented Dec 4, 2019

OK, I think this problem may be caused by the dataset. When I trained on COCO 2017 a while ago, the error was bound to appear within a few hundred iterations. Setting batch/subdivisions to 64/32 only delayed the error, to maybe a few thousand iterations. When I trained on COCO 2014 instead, it did not happen again.

Zzh-tju closed this as completed Apr 5, 2020