Problems when training the DETR model on a custom COCO dataset #125

Closed
ezekielqu opened this issue Dec 16, 2021 · 4 comments · Fixed by #152
Labels
bug Something isn't working

Comments

@ezekielqu

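I get the error below when training DETR. It does not appear at a fixed point: sometimes it happens within the first few batches of training, and sometimes only after several epochs. I'm using a single RTX 3090 GPU, and GPU memory usage stays at around 50%.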

Training command:
CUDA_VISIBLE_DEVICES=1 \
python main_single_gpu.py \
-cfg='./configs/detr_resnet50.yaml' \
-dataset='coco' \
-batch_size=2 \
-data_path='/dataset/coco'

Error:
Traceback (most recent call last):
  File "main_single_gpu.py", line 321, in <module>
    main()
  File "main_single_gpu.py", line 289, in main
    accum_iter=config.TRAIN.ACCUM_ITER)
  File "main_single_gpu.py", line 91, in train
    loss_dict = criterion(outputs, targets)
  File "/home/cuiyuan/anaconda3/envs/paddle2/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 902, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/disk/disk1/quyi/PaddleViT/object_detection/DETR/detr.py", line 285, in forward
    indices = self.matcher(outputs_without_aux, targets) # list of index(tensor) pairs
  File "/home/cuiyuan/anaconda3/envs/paddle2/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 902, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/disk/disk1/quyi/PaddleViT/object_detection/DETR/matcher.py", line 129, in forward
    idx = linear_sum_assignment(c[i])
  File "/home/cuiyuan/anaconda3/envs/paddle2/lib/python3.7/site-packages/scipy/optimize/_lsap.py", line 100, in linear_sum_assignment
    return _lsap_module.calculate_assignment(cost_matrix)
ValueError: matrix contains invalid numeric entries
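For anyone hitting the same error: SciPy's `linear_sum_assignment` raises `ValueError: matrix contains invalid numeric entries` when the cost matrix contains NaN or -inf entries (+inf is accepted but can make the problem infeasible), so the matching cost `c[i]` built in `matcher.py` is the place to look, not the solver. Below is a minimal sketch of a diagnostic check, assuming `c[i]` is a 2-D NumPy array or nested list of floats. `find_invalid_cost_entries` is a hypothetical helper, not part of PaddleViT or SciPy, and it only locates the bad entries; it is not the fix merged in #152.

```python
import math
from typing import List, Sequence, Tuple


def find_invalid_cost_entries(
    cost: Sequence[Sequence[float]],
) -> List[Tuple[int, int, float]]:
    """Return (row, col, value) for each entry that SciPy's
    linear_sum_assignment rejects as an invalid numeric entry.

    Assumption: the rejected values are NaN and -inf; +inf passes this
    check but may still make the assignment infeasible.
    """
    bad = []
    for r, row in enumerate(cost):
        for c, value in enumerate(row):
            v = float(value)  # accepts Python floats and NumPy scalars
            if math.isnan(v) or v == -math.inf:
                bad.append((r, c, v))
    return bad


if __name__ == "__main__":
    # Hypothetical 2x2 cost matrix with one NaN entry.
    cost = [[0.5, float("nan")], [1.2, 0.3]]
    print(find_invalid_cost_entries(cost))  # [(0, 1, nan)]
```

Calling this on `c[i]` right before `linear_sum_assignment(c[i])` and printing the result (together with the predicted boxes for that image) would show whether the NaN comes from specific queries, which helps narrow down whether the class-cost, L1-box-cost, or GIoU-cost term is producing it.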

@xperzy
Collaborator

xperzy commented Dec 22, 2021

Thanks for the issue. DETR training is currently still being debugged; if you can provide more details, it would help us locate the error. Thanks.

xperzy added the bug (Something isn't working) label on Dec 22, 2021
@FL77N
Collaborator

FL77N commented Dec 23, 2021

Hi, does this problem also occur when you train on the MS-COCO 2017 data?

@ezekielqu
Author


I get the same error when training on the official COCO 2017 dataset as well.

@FL77N
Collaborator

FL77N commented Dec 23, 2021


Thanks for your feedback. We are working on fixing this bug.

xperzy linked a pull request on Jan 7, 2022 that will close this issue

3 participants