Runtime error occurs when I train on my own data #13

jhk623 · 2018-01-16T06:17:08Z

When I run python3 train.py train, I get the following output:
======user config========
{'caffe_pretrain': False,
'caffe_pretrain_path': '/home/garcons/simple-faster-rcnn-pytorch/fasterrcnn_12211511_0.701052458187_torchvision_pretrain.pth',
'data': 'voc',
'debug_file': '/tmp/debugf',
'env': 'faster-rcnn',
'epoch': 14,
'load_path': None,
'lr': 0.001,
'lr_decay': 0.1,
'max_size': 1000,
'min_size': 400,
'num_workers': 4,
'plot_every': 40,
'port': 8097,
'pretrained_model': 'vgg16',
'roi_sigma': 1.0,
'rpn_sigma': 3.0,
'test_num': 1000,
'test_num_workers': 4,
'use_adam': False,
'use_chainer': False,
'use_drop': False,
'voc_data_dir': '/home/garcons/simple-faster-rcnn-pytorch/garconsdata/',
'weight_decay': 0.0005}
==========end============
load data
model construct completed
0it [00:00, ?it/s]/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/THCTensorIndex.cu:382: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [32,0,0] Assertion indexAtDim < data.baseSizes[dim] failed.
/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/THCTensorIndex.cu:382: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [33,0,0] Assertion indexAtDim < data.baseSizes[dim] failed.
...
/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/THCTensorIndex.cu:382: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [31,0,0] Assertion indexAtDim < data.baseSizes[dim] failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCTensorIndex.cu line=648 error=59 : device-side assert triggered

Traceback (most recent call last):
  File "train.py", line 130, in <module>
    fire.Fire()
  File "/root/anaconda3/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/root/anaconda3/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/root/anaconda3/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "train.py", line 80, in train
    trainer.train_step(img, bbox, label, scale)
  File "/home/garcons/simple-faster-rcnn-pytorch/trainer.py", line 168, in train_step
    losses = self.forward(imgs, bboxes, labels, scale)
  File "/home/garcons/simple-faster-rcnn-pytorch/trainer.py", line 147, in forward
    at.totensor(gt_roi_label).long()]
  File "/root/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 78, in __getitem__
    return Index.apply(self, key)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 87, in forward
    result = i.index(ctx.index)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCTensorIndex.cu:648

This error occurs when I train on my own data.
With the pretrained model and the VOC2007 dataset, there was no such error.
I tried CUDA_LAUNCH_BLOCKING=1 python3 train.py train, but it didn't help.

How can I fix this error?


chenyuntc · 2018-01-16T07:11:19Z

trainer.py line 147

roi_loc = roi_cls_loc[t.arange(0, n_sample).long().cuda(), \
                      at.totensor(gt_roi_label).long()]

You can print roi_cls_loc and gt_roi_label to see what happens.
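The assertion in the log (indexAtDim < data.baseSizes[dim]) means an index was larger than the dimension it indexes, so one thing worth checking at this line is whether every value in gt_roi_label fits the class axis of roi_cls_loc. Below is a minimal sketch of that check on the CPU with NumPy. It assumes roi_cls_loc has shape (n_sample, n_class, 4) at this point; check_roi_labels is a hypothetical helper, not part of the repository.

```python
import numpy as np


def check_roi_labels(roi_cls_loc, gt_roi_label):
    """Index roi_cls_loc by label on the CPU, with a readable error.

    Hypothetical helper. Assumes roi_cls_loc has shape
    (n_sample, n_class, 4) and gt_roi_label has shape (n_sample,).
    """
    roi_cls_loc = np.asarray(roi_cls_loc)
    gt_roi_label = np.asarray(gt_roi_label).astype(np.int64)
    n_sample, n_class = roi_cls_loc.shape[0], roi_cls_loc.shape[1]

    if gt_roi_label.shape != (n_sample,):
        raise ValueError(
            "expected %d labels, got shape %s" % (n_sample, gt_roi_label.shape)
        )

    # Rows whose label cannot index the class axis.
    bad = np.flatnonzero((gt_roi_label < 0) | (gt_roi_label >= n_class))
    if bad.size:
        raise ValueError(
            "labels %s at rows %s are outside [0, %d)"
            % (gt_roi_label[bad].tolist(), bad.tolist(), n_class)
        )

    # Same selection as the failing line: one 4-vector per sample.
    return roi_cls_loc[np.arange(n_sample), gt_roi_label]
```

If this raises, the labels in the custom dataset and the number of classes the model head was built for probably disagree; that is a guess from the assertion text, not something confirmed in this thread.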

chenyuntc · 2018-01-24T13:47:37Z

I'll close it for now, feel free to reopen it if you have any questions.

Bigwode · 2018-03-27T02:27:12Z

I have the same error when I try to train on my own dataset. Could you please share how you solved this problem? Thanks.

Bigwode · 2018-03-27T02:34:42Z

@chenyuntc I tried to print roi_cls_loc and gt_roi_label, but nothing is printed.
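One possible explanation for the missing output: once a device-side assert has fired, later CUDA calls, including copying a tensor back to the host to print it, tend to fail as well. A way around that is to inspect the labels before anything reaches the GPU. The sketch below scans per-image label arrays with NumPy and reports values outside the expected range. summarize_labels and n_fg_class are hypothetical names, and the sketch assumes foreground labels are meant to be integers in [0, n_fg_class).

```python
import numpy as np


def summarize_labels(label_arrays, n_fg_class):
    """Count labels per class and collect out-of-range values.

    Hypothetical helper. label_arrays is any iterable of per-image
    label sequences; valid labels are assumed to lie in [0, n_fg_class).
    """
    counts = np.zeros(n_fg_class, dtype=np.int64)
    out_of_range = []  # (image index, offending label values)

    for i, labels in enumerate(label_arrays):
        labels = np.asarray(labels, dtype=np.int64).ravel()
        ok = (labels >= 0) & (labels < n_fg_class)
        counts += np.bincount(labels[ok], minlength=n_fg_class)
        if not ok.all():
            out_of_range.append((i, labels[~ok].tolist()))

    return counts, out_of_range


# Usage with made-up labels: image 1 contains a label (5) that a
# 3-class setup cannot index.
counts, bad = summarize_labels([[0, 1], [1, 2, 5], []], n_fg_class=3)
print(counts.tolist(), bad)
```

Running this over the whole custom dataset once, on the CPU, gives a readable list of offending images instead of a CUDA assert.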

shiontao · 2018-06-11T18:40:39Z

I have the same problem too = =
/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorIndex.cu:417: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [0,0,0] Assertion `indexAtDim < data.baseSizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorIndex.cu:417: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [1,0,0] Assertion `indexAtDim < data.baseSizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorIndex.cu:417: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [2,0,0] Assertion `indexAtDim < data.baseSizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorIndex.cu:417: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [3,0,0] Assertion `indexAtDim < data.baseSizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorIndex.cu:417: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [4,0,0] Assertion `indexAtDim < data.baseSizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorIndex.cu:417: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [5,0,0] Assertion `indexAtDim < data.baseSizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorIndex.cu:417: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [6,0,0] Assertion `indexAtDim < data.baseSizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorIndex.cu:417: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [7,0,0] Assertion `indexAtDim < data.baseSizes[dim]` failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generic/THCTensorCopy.c line=21 error=59 : device-side assert triggered

chenyuntc closed this as completed Jan 24, 2018
