Runtime error occurs when i train my own data #13

Closed
jhk623 opened this issue Jan 16, 2018 · 5 comments

jhk623 commented Jan 16, 2018

When I run python3 train.py train, I get the following output:
======user config========
{'caffe_pretrain': False,
'caffe_pretrain_path': '/home/garcons/simple-faster-rcnn-pytorch/fasterrcnn_12211511_0.701052458187_torchvision_pretrain.pth',
'data': 'voc',
'debug_file': '/tmp/debugf',
'env': 'faster-rcnn',
'epoch': 14,
'load_path': None,
'lr': 0.001,
'lr_decay': 0.1,
'max_size': 1000,
'min_size': 400,
'num_workers': 4,
'plot_every': 40,
'port': 8097,
'pretrained_model': 'vgg16',
'roi_sigma': 1.0,
'rpn_sigma': 3.0,
'test_num': 1000,
'test_num_workers': 4,
'use_adam': False,
'use_chainer': False,
'use_drop': False,
'voc_data_dir': '/home/garcons/simple-faster-rcnn-pytorch/garconsdata/',
'weight_decay': 0.0005}
==========end============
load data
model construct completed
0it [00:00, ?it/s]/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/THCTensorIndex.cu:382: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [32,0,0] Assertion indexAtDim < data.baseSizes[dim] failed.
/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/THCTensorIndex.cu:382: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [33,0,0] Assertion indexAtDim < data.baseSizes[dim] failed.
...
/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/THCTensorIndex.cu:382: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [31,0,0] Assertion indexAtDim < data.baseSizes[dim] failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCTensorIndex.cu line=648 error=59 : device-side assert triggered

Traceback (most recent call last):
File "train.py", line 130, in <module>
fire.Fire()
File "/root/anaconda3/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
component_trace = _Fire(component, args, context, name)
File "/root/anaconda3/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
component, remaining_args)
File "/root/anaconda3/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
result = fn(*varargs, **kwargs)
File "train.py", line 80, in train
trainer.train_step(img, bbox, label, scale)
File "/home/garcons/simple-faster-rcnn-pytorch/trainer.py", line 168, in train_step
losses = self.forward(imgs, bboxes, labels, scale)
File "/home/garcons/simple-faster-rcnn-pytorch/trainer.py", line 147, in forward
at.totensor(gt_roi_label).long()]
File "/root/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 78, in __getitem__
return Index.apply(self, key)
File "/root/anaconda3/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 87, in forward
result = i.index(ctx.index)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCTensorIndex.cu:648

This error occurs only when I train on my own data.
With the pretrained model and the VOC2007 dataset, there was no such error.
I tried CUDA_LAUNCH_BLOCKING=1 python3 train.py train, but it did not help.

How can I fix this error?
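
A note on the command above: CUDA_LAUNCH_BLOCKING=1 does not fix a device-side assert. It makes kernel launches synchronous, so the Python traceback points at the operation that actually failed. A minimal sketch of setting it from inside the script instead of the shell, assuming it runs before anything initializes CUDA (putting it at the top of train.py is an assumption for illustration, not something the repository does):

```python
import os

# Must be set before CUDA is initialized, so do it before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch only after the variable is set
```

With this in place the traceback should still end at the failing indexing call, which is the line worth inspecting.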

@chenyuntc
Owner

See trainer.py, line 147:

 roi_loc = roi_cls_loc[t.arange(0, n_sample).long().cuda(), \
                              at.totensor(gt_roi_label).long()]

You can print roi_cls_loc and gt_roi_label to see what is happening.
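
The assertion indexAtDim < data.baseSizes[dim] means an index is at least as large as the dimension it indexes. One likely cause (an assumption, not confirmed in this thread) is that gt_roi_label contains a class id the model has no output for, for example a custom dataset with more classes than the head was built for. A minimal sketch of a CPU-side check to run just before the indexing line; check_label_range is a hypothetical helper, and it assumes roi_cls_loc has shape (n_sample, n_class, 4):

```python
def check_label_range(labels, n_class):
    """Raise ValueError if any label cannot index an axis of size n_class."""
    # Accept NumPy arrays or tensors (anything with .tolist()) as well as lists.
    values = labels.tolist() if hasattr(labels, "tolist") else list(labels)
    # Collect every distinct label outside the valid range [0, n_class - 1].
    bad = sorted({int(v) for v in values if not 0 <= int(v) < n_class})
    if bad:
        raise ValueError(
            f"labels {bad} fall outside the valid range [0, {n_class - 1}]"
        )
    return True


# Hypothetical placement in trainer.py, just before line 147:
# check_label_range(gt_roi_label, roi_cls_loc.shape[1])
```

Because the check raises an ordinary Python exception before the GPU indexing runs, the offending label values appear in the message instead of being hidden behind error 59.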

@chenyuntc
Owner

I'll close it for now, feel free to reopen it if you have any questions.


Bigwode commented Mar 27, 2018

I get the same error when I try to train on my own dataset. Could you please share how you solved this problem? Thanks.


Bigwode commented Mar 27, 2018

@chenyuntc I tried printing roi_cls_loc and gt_roi_label, but nothing is printed.

@shiontao

I have the same problem too:
/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorIndex.cu:417: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [0,0,0] Assertion indexAtDim < data.baseSizes[dim] failed.
/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorIndex.cu:417: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [1,0,0] Assertion indexAtDim < data.baseSizes[dim] failed.
/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorIndex.cu:417: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [2,0,0] Assertion indexAtDim < data.baseSizes[dim] failed.
/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorIndex.cu:417: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [3,0,0] Assertion indexAtDim < data.baseSizes[dim] failed.
/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorIndex.cu:417: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [4,0,0] Assertion indexAtDim < data.baseSizes[dim] failed.
/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorIndex.cu:417: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [5,0,0] Assertion indexAtDim < data.baseSizes[dim] failed.
/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorIndex.cu:417: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [6,0,0] Assertion indexAtDim < data.baseSizes[dim] failed.
/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorIndex.cu:417: long calculateOffset(IndexType, LinearIndexCalcData<IndexType, Dims>) [with IndexType = unsigned int, Dims = 3U]: block: [0,0,0], thread: [7,0,0] Assertion indexAtDim < data.baseSizes[dim] failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generic/THCTensorCopy.c line=21 error=59 : device-side assert triggered
