
Error when computing the loss on a custom dataset #28

Closed
zhangvia opened this issue Jun 18, 2021 · 4 comments
@zhangvia

Hi, I want to train this model on a custom dataset. I converted the annotations to COCO format as shown below, and I changed the dataset path in the config file, but an error is still raised when computing the loss:
```json
"images": [...],
"categories": [...],
"annotations": [
    {
        "area": 3366.6824649275077,
        "category_id": 3,
        "segmentation": [
            [
                356.7924715069118,
                54.362633790420304,
                337.6467271719448,
                29.497233060353793,
                252.64484276405284,
                94.9465843308558,
                271.79058709901983,
                119.81198506092231
            ]
        ],
        "iscrowd": 0,
        "bbox": [
            304.7186571354823,
            74.65460906063805,
            31.3822828934849,
            107.27971818858487
        ],
        "image_id": 18,
        "id": 1
    }
]
```
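As a quick sanity check of a custom COCO-format file (a sketch added here, not part of the original thread; `check_coco` is a hypothetical helper), one can verify that every annotation references a known image and category id and carries a bbox in `[x, y, width, height]` order with positive size:

```python
import json

def check_coco(path):
    """Sanity-check a COCO-format annotation file and return a list
    of (annotation id, problem) tuples; an empty list means OK."""
    with open(path) as f:
        coco = json.load(f)
    image_ids = {img["id"] for img in coco["images"]}
    cat_ids = {cat["id"] for cat in coco["categories"]}
    problems = []
    for ann in coco["annotations"]:
        if ann["image_id"] not in image_ids:
            problems.append((ann["id"], "unknown image_id"))
        if ann["category_id"] not in cat_ids:
            problems.append((ann["id"], "unknown category_id"))
        # COCO bboxes are [x, y, width, height], not [x1, y1, x2, y2]
        x, y, w, h = ann["bbox"]
        if w <= 0 or h <= 0:
            problems.append((ann["id"], "non-positive bbox size"))
    return problems
```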

```
Traceback (most recent call last):
  File "tools/train.py", line 95, in <module>
    main()
  File "tools/train.py", line 91, in main
    logger=logger)
  File "/home/localchao/LHX/jyz/ReDet/mmdet/apis/train.py", line 61, in train_detector
    _non_dist_train(model, dataset, cfg, validate=validate)
  File "/home/localchao/LHX/jyz/ReDet/mmdet/apis/train.py", line 197, in _non_dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/localchao/anaconda3/envs/CG-Net/lib/python3.7/site-packages/mmcv-0.2.13-py3.7-linux-x86_64.egg/mmcv/runner/runner.py", line 358, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/localchao/anaconda3/envs/CG-Net/lib/python3.7/site-packages/mmcv-0.2.13-py3.7-linux-x86_64.egg/mmcv/runner/runner.py", line 264, in train
    self.model, data_batch, train_mode=True, **kwargs)
  File "/home/localchao/LHX/jyz/ReDet/mmdet/apis/train.py", line 39, in batch_processor
    losses = model(**data)
  File "/home/localchao/anaconda3/envs/CG-Net/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/localchao/anaconda3/envs/CG-Net/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/localchao/anaconda3/envs/CG-Net/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/localchao/LHX/jyz/ReDet/mmdet/models/detectors/base_new.py", line 95, in forward
    return self.forward_train(img, img_meta, **kwargs)
  File "/home/localchao/LHX/jyz/ReDet/mmdet/models/detectors/ReDet.py", line 143, in forward_train
    *rpn_loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
  File "/home/localchao/LHX/jyz/ReDet/mmdet/models/anchor_heads/rpn_head.py", line 51, in loss
    gt_bboxes_ignore=gt_bboxes_ignore)
  File "/home/localchao/LHX/jyz/ReDet/mmdet/models/anchor_heads/anchor_head.py", line 177, in loss
    sampling=self.sampling)
  File "/home/localchao/LHX/jyz/ReDet/mmdet/core/anchor/anchor_target.py", line 63, in anchor_target
    unmap_outputs=unmap_outputs)
  File "/home/localchao/LHX/jyz/ReDet/mmdet/core/utils/misc.py", line 24, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "/home/localchao/LHX/jyz/ReDet/mmdet/core/anchor/anchor_target.py", line 108, in anchor_target_single
    cfg.allowed_border)
  File "/home/localchao/LHX/jyz/ReDet/mmdet/core/anchor/anchor_target.py", line 173, in anchor_inside_flags
    (flat_anchors[:, 2] < img_w + allowed_border) &
RuntimeError: Expected object of scalar type Byte but got scalar type Bool for argument #2 'other' in call to _th_and
```

@csuhan
Owner

csuhan commented Jun 18, 2021

  1. Use an older PyTorch version,
     or
  2. Apply the fix from: RuntimeError in: mmdet/core/anchor/anchor_target.py #15
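The error comes from mixing `uint8` and `bool` tensors: in this codebase `valid_flags` is `uint8`, while comparisons like `<` return `bool` tensors on PyTorch >= 1.2, and combining them with `&` raises the "Byte ... Bool" RuntimeError. A minimal sketch of the commonly applied patch for `anchor_inside_flags` (casting each comparison back to the flags' dtype with `.type_as()`; the anchor values below are made up for illustration):

```python
import torch

# Two example anchors in [x1, y1, x2, y2] format: the first lies
# inside a 100x100 image, the second sticks out on both sides.
flat_anchors = torch.tensor([[0.0, 0.0, 10.0, 10.0],
                             [-5.0, 0.0, 200.0, 10.0]])
valid_flags = torch.ones(2, dtype=torch.uint8)
img_h, img_w, allowed_border = 100, 100, 0

# Patched check: cast each bool comparison to uint8 before '&',
# so all operands share the dtype of valid_flags.
inside_flags = valid_flags & \
    (flat_anchors[:, 0] >= -allowed_border).type_as(valid_flags) & \
    (flat_anchors[:, 1] >= -allowed_border).type_as(valid_flags) & \
    (flat_anchors[:, 2] < img_w + allowed_border).type_as(valid_flags) & \
    (flat_anchors[:, 3] < img_h + allowed_border).type_as(valid_flags)
print(inside_flags)  # tensor([1, 0], dtype=torch.uint8)
```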

@zhangvia
Author

  1. Use an older PyTorch version,
     or
  2. Apply the fix from: RuntimeError in: mmdet/core/anchor/anchor_target.py #15

I tried both approaches, and both end with the same error: segmentation fault.

@zhangvia
Author

zhangvia commented Jun 19, 2021

I fixed the segmentation fault by updating gcc, then recompiled mmdet. I changed the data path in the config file, set the dataset type to CocoDataset, and adjusted the image scale, num_classes, and lr (since I only have a single 1080 Ti). Now the model starts but terminates on its own without printing anything. What could cause this? The output of running train.py is:
```
[localchao@localhost ReDet]$ python tools/train.py configs/ReDet/ReDet_re50_refpn_1x_dota1.py
ReResNet Orientation: 8 Fix Params: False
2021-06-20 00:37:38,683 - INFO - Distributed training: False
2021-06-20 00:38:11,890 - INFO - load model from: work_dirs/ReResNet_pretrain/re_resnet50_c8_batch256-25b16846.pth
2021-06-20 00:38:11,940 - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: head.fc.weight, head.fc.bias

missing keys in source state_dict: layer1.2.conv3.filter, layer3.1.conv1.filter, layer4.2.conv2.filter, layer3.1.conv3.filter, layer4.2.conv3.filter, layer1.1.conv1.filter, layer3.4.conv1.filter, layer4.2.conv1.filter, layer2.0.conv2.filter, layer1.2.conv2.filter, layer3.4.conv2.filter, conv1.filter, layer2.1.conv2.filter, layer3.5.conv1.filter, layer1.1.conv2.filter, layer2.3.conv1.filter, layer1.0.conv2.filter, layer2.3.conv2.filter, layer2.2.conv1.filter, layer2.2.conv2.filter, layer2.1.conv1.filter, layer3.2.conv1.filter, layer3.3.conv3.filter, layer1.2.conv1.filter, layer3.5.conv2.filter, layer3.1.conv2.filter, layer4.1.conv1.filter, layer4.0.conv3.filter, layer2.2.conv3.filter, layer3.0.downsample.0.filter, layer1.0.conv1.filter, layer2.3.conv3.filter, layer4.1.conv3.filter, layer3.2.conv2.filter, layer4.1.conv2.filter, layer3.0.conv2.filter, layer1.1.conv3.filter, layer3.4.conv3.filter, layer4.0.downsample.0.filter, layer3.3.conv1.filter, layer3.0.conv1.filter, layer2.0.downsample.0.filter, layer2.0.conv3.filter, layer3.5.conv3.filter, layer1.0.conv3.filter, layer3.0.conv3.filter, layer3.3.conv2.filter, layer2.1.conv3.filter, layer3.2.conv3.filter, layer4.0.conv1.filter, layer4.0.conv2.filter, layer2.0.conv1.filter, layer1.0.downsample.0.filter

loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
2021-06-20 00:38:13,770 - INFO - Start running, host: localchao@localhost.localdomain, work_dir: /home/localchao/LHX/jyz/ReDet/work_dirs/ReDet_re50_refpn_1x_dota1
2021-06-20 00:38:13,770 - INFO - workflow: [('train', 1)], max: 12 epochs
[localchao@localhost ReDet]$
```
@csuhan

@csuhan
Owner

csuhan commented Jun 19, 2021

You can set a breakpoint inside the model's forward_train to check whether it is actually being executed.
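A silent exit right after "Start running", with no Python traceback, often means a crash inside native code (CUDA/C++ ops), which a breakpoint alone cannot explain. As a complement to the suggestion above (my addition, not from the thread), enabling `faulthandler` near the top of tools/train.py makes such crashes print the Python frames they were called from:

```python
import faulthandler

# Dump the Python stack to stderr if the interpreter dies from a
# native-code fault (SIGSEGV, SIGABRT, ...). A crash inside a
# CUDA/C++ extension then shows which Python call triggered it
# instead of the process exiting silently.
faulthandler.enable()
print(faulthandler.is_enabled())  # True
```

For a pure-Python hang or early return, `import pdb; pdb.set_trace()` placed at the start of forward_train (as suggested above) confirms whether training ever reaches the model.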

@csuhan csuhan closed this as completed Jun 22, 2021