
RuntimeError: Dimension out of range (expected to be in range of [-2, 1], but got 2) with CornerNet_Saccade #34

Closed
Sujith93 opened this issue May 8, 2020 · 15 comments
Labels: bug (Something isn't working)

Comments

@Sujith93 commented May 8, 2020:

I'm running

from train_detector import Detector
gtf = Detector();

root_dir = "/home/SK00495085/monk/Monk_Object_Detection/data";
coco_dir = "training_menu"
img_dir = "/"
set_dir = "Images"

gtf.Train_Dataset(root_dir, coco_dir, img_dir, set_dir, batch_size=4, num_workers=4)

root_dir = "/home/SK00495085/monk/Monk_Object_Detection/data";
coco_dir = "validation_menu"
img_dir = "/"
set_dir = "Images"

gtf.Val_Dataset(root_dir, coco_dir, img_dir, set_dir)
gtf.Model(model_name="CornerNet_Saccade")
gtf.Hyper_Params(lr=0.00025, total_iterations=6900000, val_interval=10000)
gtf.Setup();
gtf.Train();

I got this error:

loading annotations into memory...
Done (t=0.59s)
creating index...
index created!
loading annotations into memory...
Done (t=0.22s)
creating index...
index created!
Loading Model - core.models.CornerNet_Saccade
Model Loaded
start_iter = 0
distributed = False
world_size = 0
initialize = False
batch_size = 1
learning_rate = 0.00025
max_iteration = 6900000
stepsize = 5520000
snapshot = 3450000
val_iter = 10000
display = 100
decay_rate = 10
Process 0: building model...
total parameters: 116967797
start prefetching data...
shuffling indices...
setting learning rate to: 0.00025
training start...
start prefetching data...
shuffling indices...
0%| | 0/6900000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "training_saccade.py", line 31, in
gtf.Train();
File "/home/SK00495085/monk/Monk_Object_Detection/6_cornernet_lite/lib/train_detector.py", line 298, in Train
training_loss = self.system_dict["local"]["nnet"].train(**training)
File "/home/SK00495085/monk/Monk_Object_Detection/6_cornernet_lite/lib/core/nnet/py_factory.py", line 93, in train
loss = self.network(xs, ys)
File "/home/SK00495085/.conda/envs/monk_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/SK00495085/monk/Monk_Object_Detection/6_cornernet_lite/lib/core/models/py_utils/data_parallel.py", line 68, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/SK00495085/.conda/envs/monk_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/SK00495085/monk/Monk_Object_Detection/6_cornernet_lite/lib/core/nnet/py_factory.py", line 20, in forward
loss = self.loss(preds, ys, **kwargs)
File "/home/SK00495085/.conda/envs/monk_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/SK00495085/monk/Monk_Object_Detection/6_cornernet_lite/lib/core/models/py_utils/losses.py", line 150, in forward
pull, push = self.ae_loss(tl_tag, br_tag, gt_mask)
File "/home/SK00495085/monk/Monk_Object_Detection/6_cornernet_lite/lib/core/models/py_utils/losses.py", line 26, in _ae_loss
dist = tag_mean.unsqueeze(1) - tag_mean.unsqueeze(2)
RuntimeError: Dimension out of range (expected to be in range of [-2, 1], but got 2)
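
For context, this "Dimension out of range" message is what PyTorch raises when unsqueeze(2) is called on a tensor with fewer than two dimensions, which suggests tag_mean reaches the ae_loss with fewer dimensions than the loss expects. A minimal sketch that reproduces only the message (the shape below is hypothetical, not taken from the pipeline):

import torch

# Hypothetical 1-D tensor standing in for a tag_mean that has lost a dimension.
tag_mean = torch.randn(4)

# unsqueeze(1) is still valid for a 1-D tensor, but unsqueeze(2) is not,
# which raises the same "expected to be in range of [-2, 1], but got 2" error.
dist = tag_mean.unsqueeze(1) - tag_mean.unsqueeze(2)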

Sujith93 changed the title from "RuntimeError: Dimension out of range (expected to be in range of [-2, 1], but got 2)" to "RuntimeError: Dimension out of range (expected to be in range of [-2, 1], but got 2) with CornerNet_Saccade" on May 8, 2020
@THEFASHIONGEEK (Contributor) commented:

Are you training on multiple GPUs?

@Sujith93 (Author) commented May 8, 2020:

No, only one GPU.

@THEFASHIONGEEK (Contributor) commented:

Try increasing the batch size to 8. If the error still persists, please let us know.

@Sujith93 (Author) commented May 8, 2020:

I changed the batch size to 8 and it's working fine.
The reason I had reduced it to 1 was the error below:
"RuntimeError: CUDA out of memory. Tried to allocate 399.88 MiB"

Now it's running. Thank you

@Sujith93 (Author) commented May 8, 2020:

It ran for some time, but now it's throwing the error below:

batch_size = 8
learning_rate = 0.00025
max_iteration = 6900000
stepsize = 5520000
snapshot = 3450000
val_iter = 10000
display = 100
decay_rate = 10
Process 0: building model...
Traceback (most recent call last):
File "training_saccade.py", line 31, in
gtf.Train();
File "/home/SK00495085/monk/Monk_Object_Detection/6_cornernet_lite/lib/train_detector.py", line 232, in Train
self.system_dict["local"]["model"], distributed=distributed, gpu=gpu)
File "/home/SK00495085/monk/Monk_Object_Detection/6_cornernet_lite/lib/core/nnet/py_factory.py", line 51, in init
self.network = DataParallel(self.network, chunk_sizes=system_config.chunk_sizes)
File "/home/SK00495085/monk/Monk_Object_Detection/6_cornernet_lite/lib/core/models/py_utils/data_parallel.py", line 61, in init
self.module.cuda(device_ids[0])
File "/home/SK00495085/.conda/envs/monk_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 260, in cuda
return self._apply(lambda t: t.cuda(device))
File "/home/SK00495085/.conda/envs/monk_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 187, in _apply
module._apply(fn)
File "/home/SK00495085/.conda/envs/monk_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 187, in _apply
module._apply(fn)
File "/home/SK00495085/.conda/envs/monk_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 187, in _apply
module._apply(fn)
[Previous line repeated 7 more times]
File "/home/SK00495085/.conda/envs/monk_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 193, in _apply
param.data = fn(param.data)
File "/home/SK00495085/.conda/envs/monk_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 260, in
return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA out of memory. Tried to allocate 5.12 MiB (GPU 0; 7.93 GiB total capacity; 146.31 MiB already allocated; 5.56 MiB free; 839.00 KiB cached)
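
If a previous run or another process is still holding GPU memory, a fresh run can fail like this even though the model fits on its own. A small diagnostic sketch (assumes nvidia-smi is on the PATH; it is not part of the Monk pipeline):

import subprocess
import torch

# List the processes currently holding GPU memory; a stale run from an
# earlier session can leave almost no free memory for a new one.
print(subprocess.check_output(["nvidia-smi"]).decode())

# Within the same Python session, release cached blocks before rebuilding
# the model and report what is still allocated on GPU 0.
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated(0), "bytes still allocated on GPU 0")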

@Sujith93 (Author) commented May 8, 2020:

It ran the first time, but running the same code a second time throws this runtime error.

@Sujith93 (Author) commented May 8, 2020:

The issue is with CUDA. I got it.
Thanks.

Sujith93 closed this as completed May 8, 2020
@Sujith93 (Author) commented:

Once again, a runtime error:

  0%|          | 9999/6900000 [20:39:32<2965:10:45, 1.55s/it]
Process 0: training loss at iteration 10000: 3.72452449798584

Traceback (most recent call last):
File "training_saccade.py", line 32, in
gtf.Train();
File "/home/SK00495085/monk/Monk_Object_Detection/6_cornernet_lite/lib/train_detector.py", line 307, in Train
validation_loss = self.system_dict["local"]["nnet"].validate(**validation)
File "/home/SK00495085/monk/Monk_Object_Detection/6_cornernet_lite/lib/core/nnet/py_factory.py", line 105, in validate
loss = self.network(xs, ys)
File "/home/SK00495085/.conda/envs/monk_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/SK00495085/monk/Monk_Object_Detection/6_cornernet_lite/lib/core/models/py_utils/data_parallel.py", line 68, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/SK00495085/.conda/envs/monk_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/SK00495085/monk/Monk_Object_Detection/6_cornernet_lite/lib/core/nnet/py_factory.py", line 20, in forward
loss = self.loss(preds, ys, **kwargs)
File "/home/SK00495085/.conda/envs/monk_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/SK00495085/monk/Monk_Object_Detection/6_cornernet_lite/lib/core/models/py_utils/losses.py", line 134, in forward
focal_loss += self.focal_loss(tl_heats, gt_tl_heat, gt_tl_valid)
File "/home/SK00495085/monk/Monk_Object_Detection/6_cornernet_lite/lib/core/models/py_utils/losses.py", line 57, in _focal_loss_mask
pos_pred = pred[pos_inds]
RuntimeError: The shape of the mask [8, 18, 64, 64] at index 1 does not match the shape of the indexed tensor [8, 79, 64, 64] at index 1
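
The mask has 18 channels at dim 1 while the indexed prediction has 79, which looks like the validation annotations carry a different number of categories than the one the model heads were built from. A minimal comparison (the annotation paths are hypothetical placeholders for the files used above; pycocotools is assumed to be installed):

from pycocotools.coco import COCO

# Hypothetical paths; point these at the train/val annotation JSONs used above.
train_coco = COCO("data/training_menu/annotations/instances_Images.json")
val_coco = COCO("data/validation_menu/annotations/instances_Images.json")

train_cats = {c["name"] for c in train_coco.loadCats(train_coco.getCatIds())}
val_cats = {c["name"] for c in val_coco.loadCats(val_coco.getCatIds())}

print("train categories:", len(train_cats), "val categories:", len(val_cats))
print("only in train:", sorted(train_cats - val_cats))
print("only in val:", sorted(val_cats - train_cats))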

Sujith93 reopened this May 10, 2020
@abhi-kumar (Contributor) commented:

Which dataset are you working on? Please point to that dataset and share your code so that we can reproduce the error.

abhi-kumar added the bug (Something isn't working) label May 10, 2020
@Sujith93 (Author) commented May 10, 2020:

I'm working on a real-time project, so I'm unable to share the data.
But the steps I have done so far are listed below.

  1. Set up the required environment to run CornerNet_Saccade.
  2. Converted the VOC XML annotations to a COCO dataset using VOC Type to Coco - Via Monk Type Annotation.ipynb.
  3. Since I have both train and valid data, I followed Train With Validation Data.ipynb.

I kept the model running for the whole night, but it suddenly stopped, throwing that runtime error.

@abhi-kumar (Contributor) commented:

Please check whether there is any discrepancy within the annotation files. Since training started and ran for several hours, the issue could likely be traced back to an image with no labels or bounding boxes, or with box shapes crossing the image boundaries.
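
A rough sketch of such a check over a COCO-style annotation file, flagging images with no annotations and boxes that fall outside the image (the path is a hypothetical placeholder; pycocotools assumed):

from pycocotools.coco import COCO

# Hypothetical path; point this at the validation annotation file.
coco = COCO("data/validation_menu/annotations/instances_Images.json")

for img in coco.loadImgs(coco.getImgIds()):
    anns = coco.loadAnns(coco.getAnnIds(imgIds=img["id"]))
    if not anns:
        print("no annotations:", img["file_name"])
    for ann in anns:
        x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        if w <= 0 or h <= 0 or x < 0 or y < 0 \
                or x + w > img["width"] or y + h > img["height"]:
            print("suspect box:", img["file_name"], ann["bbox"])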

@Sujith93 (Author) commented:

Yes, I cross-checked both the train and valid data a couple of times for the bounding boxes and labels; all are good. But when I train with only training data and no validation data, everything works fine.
It seems there is an issue with the validation part.

@abhi-kumar (Contributor) commented:

Thank you for the detailed analysis. We will check the validation code.

@abhi-kumar (Contributor) commented:

We have run multiple tests on the CornerNet pipeline, yet the error wasn't reproduced. Have you reached a solution yet?

@abhi-kumar (Contributor) commented:

Closing due to inactivity.
