
Got 'Segmentation fault' after training 1/299 epochs, 'resource_tracker: There appear to be 6 leaked semaphore objects to clean up at shutdown' #11

Closed
ShirleyHe2020 opened this issue Apr 14, 2021 · 2 comments

Comments

@ShirleyHe2020

GPU: 2 × 8 GB. I ran: python scripts/train.py --batch 16 --epochs 300 --cfg configs/model_mobilenet.yaml
Epoch 0 finished; while epoch 1 was in progress, I got the warnings below and training stopped automatically.

 Epoch   gpu_mem       box       obj       cls     total   targets  img_size
 0/299     5.55G   0.09329   0.01881         0    0.1121        11       640: 100%|██████████| 836/836 [05:06<00:00, 2.73it/s]
               Class    Images   Targets         P         R    mAP@.5  mAP@.5:.95: 100%|██████████| 74/74 [00:35<00:00, 2.11it/s]
                 all  2.36e+03  2.81e+03  0.000155    0.0121  7.23e-05  1.11e-05
Images sizes do not match. This will causes images to be display incorrectly in the UI.

 Epoch   gpu_mem       box       obj       cls     total   targets  img_size
 1/299     5.53G   0.08231   0.02054         0    0.1029        40       640:  36%|███▌      | 303/836 [01:46<03:02,  2.91it/s]Segmentation fault

(base) user@Debian:~/anaconda3/envs/ultra_YOLOv5/flexible-yolov5-main$ /home/user/anaconda3/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 6 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
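
(For context: the resource_tracker warning comes from Python's multiprocessing resource tracker and usually means the DataLoader worker processes died without releasing their semaphores, which matches the segmentation fault above. A common first step is to rule out worker-process crashes by loading data in the main process. The sketch below is illustrative only and assumes a standard torch.utils.data.DataLoader; the dataset object is a hypothetical stand-in, not this repo's actual loader.)

```python
# Illustrative sketch: run the data pipeline single-process to rule out worker segfaults.
# `dummy_dataset` is a hypothetical stand-in for the repo's real dataset object.
import torch
from torch.utils.data import DataLoader, TensorDataset

dummy_dataset = TensorDataset(torch.zeros(8, 3, 640, 640), torch.zeros(8, 6))

loader = DataLoader(
    dummy_dataset,
    batch_size=4,
    shuffle=True,
    num_workers=0,   # 0 = load in the main process; worker segfaults cannot occur here
    pin_memory=True,
)

for imgs, targets in loader:
    pass  # if a full pass over the real dataset survives this way, the crash is in the workers
```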

@ShirleyHe2020 (Author)

Segmentation fault again at epoch 179/299:

 Epoch   gpu_mem       box       obj       cls     total   targets  img_size

178/299     5.54G   0.02711  0.008768         0   0.03587         9       640: 100%|██████████| 836/836 [04:44<00:00, 2.94it/s]
               Class    Images   Targets         P         R    mAP@.5  mAP@.5:.95: 100%|██████████| 74/74 [00:27<00:00, 2.69it/s]
                 all  2.36e+03  2.81e+03    0.0533    0.0594    0.0554    0.0421

 Epoch   gpu_mem       box       obj       cls     total   targets  img_size

179/299     5.54G   0.02733  0.008876         0   0.03621         7       640: 100%|██████████| 836/836 [04:44<00:00, 2.93it/s]
               Class    Images   Targets         P         R    mAP@.5  mAP@.5:.95:  26%|██▌ | 19/74 [00:07<00:21, 2.59it/s]ERROR: Unexpected segmentation fault encountered in worker.
               Class    Images   Targets         P         R    mAP@.5  mAP@.5:.95:  26%|██▌ | 19/74 [00:07<00:21, 2.59it/s]
Traceback (most recent call last):
File "train.py", line 527, in
train(hyp, opt, device, tb_writer, wandb)
File "train.py", line 344, in train
results, maps, times = test(opt.data,
File "/home/qian/anaconda3/envs/ultra_YOLOv5/flexible-yolov5-main/eval.py", line 110, in test
t0 += time_synchronized() - t
File "/home/qian/anaconda3/envs/ultra_YOLOv5/flexible-yolov5-main/utils/torch_utils.py", line 83, in time_synchronized
torch.cuda.synchronize()
File "/home/qian/anaconda3/lib/python3.8/site-packages/torch/cuda/init.py", line 380, in synchronize
return torch._C._cuda_synchronize()
File "/home/qian/anaconda3/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 16352) is killed by signal: Segmentation fault.
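
(Note the failure does not originate in time_synchronized itself: torch.cuda.synchronize() is simply where PyTorch's worker signal handler gets a chance to report that a DataLoader worker was killed. For reference, the helper named in the traceback is roughly the following; this is a paraphrase of the upstream YOLOv5 utility, not an exact copy of this repo's file.)

```python
import time
import torch

def time_synchronized():
    # Block until all queued CUDA kernels finish so the wall-clock timestamp is accurate.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()
```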

@Bobo-y (Owner)

Bobo-y commented Apr 15, 2021

I'm sorry, I can't reproduce your problem.

@Bobo-y Bobo-y closed this as completed Apr 20, 2021