
Got 'Segmentation fault' after training 1/299 epochs, 'resource_tracker: There appear to be 6 leaked semaphore objects to clean up at shutdown' #11

Closed
ShirleyHe2020 opened this issue Apr 14, 2021 · 2 comments

Comments

@ShirleyHe2020

GPU: 2 × 8 GB. I ran: python scripts/train.py --batch 16 --epochs 300 --cfg configs/model_mobilenet.yaml
Epoch 0 finished; while epoch 1 was in progress, I got the warnings below and training stopped automatically.

 Epoch   gpu_mem       box       obj       cls     total   targets  img_size
 0/299     5.55G   0.09329   0.01881         0    0.1121        11       640: 100%|██████████| 836/836 [05:06<00:00, 2.73it/s]
               Class    Images   Targets         P         R    mAP@.5  mAP@.5:.95: 100%|██████████| 74/74 [00:35<00:00, 2.11it/s]
                 all  2.36e+03  2.81e+03  0.000155    0.0121  7.23e-05  1.11e-05
Images sizes do not match. This will causes images to be display incorrectly in the UI.

 Epoch   gpu_mem       box       obj       cls     total   targets  img_size
 1/299     5.53G   0.08231   0.02054         0    0.1029        40       640:  36%|███▌      | 303/836 [01:46<03:02,  2.91it/s]Segmentation fault

(base) user@Debian:~/anaconda3/envs/ultra_YOLOv5/flexible-yolov5-main$ /home/user/anaconda3/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 6 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
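
(For context: the resource_tracker warning comes from Python's multiprocessing resource tracker and usually means the DataLoader worker processes died without releasing their semaphores, which matches the segmentation fault above. A common first step is to rule out worker-process crashes by loading data in the main process. The sketch below is illustrative only and assumes a standard torch.utils.data.DataLoader; the dataset object is a hypothetical stand-in, not this repo's actual loader.)

```python
# Illustrative sketch: run the data pipeline single-process to rule out worker segfaults.
# `dummy_dataset` is a hypothetical stand-in for the repo's real dataset object.
import torch
from torch.utils.data import DataLoader, TensorDataset

dummy_dataset = TensorDataset(torch.zeros(8, 3, 640, 640), torch.zeros(8, 6))

loader = DataLoader(
    dummy_dataset,
    batch_size=4,
    shuffle=True,
    num_workers=0,   # 0 = load in the main process; worker segfaults cannot occur here
    pin_memory=True,
)

for imgs, targets in loader:
    pass  # if a full pass over the real dataset survives this way, the crash is in the workers
```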

@ShirleyHe2020 (Author)

Segmentation fault again at epoch 179/299:

 Epoch   gpu_mem       box       obj       cls     total   targets  img_size

178/299     5.54G   0.02711  0.008768         0   0.03587         9       640: 100%|██████████| 836/836 [04:44<00:00, 2.94it/s]
               Class    Images   Targets         P         R    mAP@.5  mAP@.5:.95: 100%|██████████| 74/74 [00:27<00:00, 2.69it/s]
                 all  2.36e+03  2.81e+03    0.0533    0.0594    0.0554    0.0421

 Epoch   gpu_mem       box       obj       cls     total   targets  img_size

179/299     5.54G   0.02733  0.008876         0   0.03621         7       640: 100%|██████████| 836/836 [04:44<00:00, 2.93it/s]
               Class    Images   Targets         P         R    mAP@.5  mAP@.5:.95:  26%|██▌ | 19/74 [00:07<00:21, 2.59it/s]ERROR: Unexpected segmentation fault encountered in worker.
               Class    Images   Targets         P         R    mAP@.5  mAP@.5:.95:  26%|██▌ | 19/74 [00:07<00:21, 2.59it/s]
Traceback (most recent call last):
File "train.py", line 527, in
train(hyp, opt, device, tb_writer, wandb)
File "train.py", line 344, in train
results, maps, times = test(opt.data,
File "/home/qian/anaconda3/envs/ultra_YOLOv5/flexible-yolov5-main/eval.py", line 110, in test
t0 += time_synchronized() - t
File "/home/qian/anaconda3/envs/ultra_YOLOv5/flexible-yolov5-main/utils/torch_utils.py", line 83, in time_synchronized
torch.cuda.synchronize()
File "/home/qian/anaconda3/lib/python3.8/site-packages/torch/cuda/init.py", line 380, in synchronize
return torch._C._cuda_synchronize()
File "/home/qian/anaconda3/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 16352) is killed by signal: Segmentation fault.
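
(Note the failure does not originate in time_synchronized itself: torch.cuda.synchronize() is simply where PyTorch's worker signal handler gets a chance to report that a DataLoader worker was killed. For reference, the helper named in the traceback is roughly the following; this is a paraphrase of the upstream YOLOv5 utility, not an exact copy of this repo's file.)

```python
import time
import torch

def time_synchronized():
    # Block until all queued CUDA kernels finish so the wall-clock timestamp is accurate.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()
```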

@Bobo-y (Owner)

Bobo-y commented Apr 15, 2021

I'm sorry, I can't reproduce your problem.

@Bobo-y Bobo-y closed this as completed Apr 20, 2021