RuntimeError: Creating MTGP constants failed #15

ivanlengyel · 2021-11-29T19:59:36Z

Hi, I am trying to run this repo.
I've downloaded the ade20k checkpoints and created a conda env following your yaml file.

When I run the testing command

python test.py --name oasis_ade20k --dataset_mode ade20k --gpu_ids 0 \
--dataroot test_images --batch_size 1

I get the following error:

/opt/conda/conda-bld/pytorch_1544176307774/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [232,0,0], thread: [101,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
Traceback (most recent call last):
  File "test.py", line 25, in <module>
    generated = model(None, label, "generate", None)
  File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/azureuser/IM/OASIS/models/models.py", line 72, in forward
    fake = self.netEMA(label)
  File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/azureuser/IM/OASIS/models/generator.py", line 36, in forward
    z = torch.randn(seg.size(0), self.opt.z_dim, dtype=torch.float32, device=dev)
RuntimeError: Creating MTGP constants failed. at /opt/conda/conda-bld/pytorch_1544176307774/work/aten/src/THC/THCTensorRandom.cu:35

I am running it on the test_images folder, which contains some ADE20K images.

Any suggestion?
Thanks ;)

SushkoVadim · 2021-11-29T20:13:31Z

Hi,

I am not totally sure, but it seems that you are not following the official structure of the Ade20k dataset.
Our dataloader expects the test data to lie under --dataroot/${...}/validation, and also expects to have label maps:

OASIS/dataloaders/Ade20kDataset.py

Line 40 in 6e728ec

path_lab = os.path.join(self.opt.dataroot, "annotations", mode)

Could you try to re-arrange the folders so that they follow the official Ade20k structure?
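
A quick way to check the layout before running test.py is sketched below. It assumes, based on the path_lab line quoted above, that label maps live under <dataroot>/annotations/<mode>, and it additionally assumes that the images live under <dataroot>/images/<mode>. The helper name check_ade20k_layout is hypothetical and not part of the repo.

```python
import os


def check_ade20k_layout(dataroot, mode="validation"):
    """Return the expected sub-directories that are missing under dataroot."""
    expected = [
        # Assumption: images sit next to the annotations, mirroring path_lab.
        os.path.join(dataroot, "images", mode),
        # Taken from the dataloader line quoted above.
        os.path.join(dataroot, "annotations", mode),
    ]
    return [path for path in expected if not os.path.isdir(path)]


if __name__ == "__main__":
    missing = check_ade20k_layout("test_images")
    print("missing directories:", missing or "none")
```

An empty list only says the folders exist; it does not say anything about the format of the files inside them.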

ivanlengyel · 2021-11-29T20:21:40Z

Hi, thanks for the reply ;)
I think I followed the official Ade20k structure:

$ tree test_images -L 2
test_images
├── annotations
│   └── validation
└── images
    └── validation

images/validation/ contains the .jpg files and annotations/validation/ contains the .png annotation files.

I don't know if this is related, but I have been inspecting the output of the dataloader at this point:

OASIS/test.py

Lines 23 to 24 in 6e728ec

for i, data_i in enumerate(dataloader_val):
    _, label = models.preprocess_input(opt, data_i)

and for data_i I get:

np.unique(data_i['label'].cpu().numpy())
array([  0.,  10.,  26.,  39.,  51.,  56.,  70.,  77.,  80.,  83.,  90.,
       102., 116., 128., 153., 179., 181., 204., 230., 255.],
      dtype=float32)

and data_i['label'].cpu().numpy().shape = (1, 3, 256, 256)

I don't know whether this is fine or not. I mention it because of the error Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed,

and I read that the dataloader expects

OASIS/dataloaders/Ade20kDataset.py

Lines 16 to 18 in 6e728ec

opt.label_nc = 150
opt.contain_dontcare_label = True
opt.semantic_nc = 151 # label_nc + unknown

That is, up to 150 labels, so maybe I am somehow providing more?
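
To make that suspicion concrete: the scatter assertion in the traceback fires when an index falls outside the size of the dimension being filled. Assuming the label values are used as channel indices into a tensor with semantic_nc = 151 channels (an assumption, since the preprocessing code is not shown here), a small check over the unique values could look like this. The helper name out_of_range_labels is hypothetical.

```python
def out_of_range_labels(unique_values, semantic_nc=151):
    """Return the label values that cannot index a semantic_nc-channel tensor."""
    return [int(v) for v in unique_values if not 0 <= int(v) < semantic_nc]


# Unique values reported above for data_i['label'].
observed = [0, 10, 26, 39, 51, 56, 70, 77, 80, 83, 90,
            102, 116, 128, 153, 179, 181, 204, 230, 255]

bad = out_of_range_labels(observed)
print(bad)  # [153, 179, 181, 204, 230, 255]
```

The output of np.unique(...).tolist() can be passed in directly. Any non-empty result would be consistent with the assertion failure under the assumption above.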

SushkoVadim · 2021-11-29T20:33:34Z

Thanks for trying it out!
Another guess: could it happen that we use different versions of Ade20k?
See here: #7

In our version we had only 150 classes, yours seems to have more?

ivanlengyel · 2021-11-29T20:39:22Z

It seems to me that the annotation .png images are loaded as regular RGB images, so the labels have shape (batch_size, 3, 256, 256) instead of (batch_size, nc, 256, 256).

Do you think that this could be the problem?

Edit: in my case the values range from 0 to 255 because of the RGB colors.
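
One way to confirm this without loading the whole dataset is to read the color type from each PNG header. The sketch below uses only the standard library and relies on the PNG file layout: an 8-byte signature followed by the IHDR chunk, whose color-type byte is 0 for grayscale, 2 for RGB, 3 for palette, 4 for grayscale with alpha and 6 for RGBA. The function name png_color_type is made up for illustration.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
COLOR_TYPES = {0: "grayscale", 2: "rgb", 3: "palette",
               4: "grayscale+alpha", 6: "rgba"}


def png_color_type(path):
    """Return (width, height, color type name) read from a PNG's IHDR chunk."""
    with open(path, "rb") as f:
        # signature (8) + chunk length and type (8) + first 10 IHDR bytes
        header = f.read(26)
    if len(header) < 26 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} does not look like a PNG file")
    # width, height (4 bytes each, big-endian), bit depth, color type (1 byte each)
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", header[16:26])
    return width, height, COLOR_TYPES.get(color_type, "unknown")
```

Calling it on a file from annotations/validation and getting "rgb" or "rgba" back would suggest that the label map is a color image, which would likely explain the 3-channel labels. With Pillow installed, Image.open(path).mode gives similar information.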

ivanlengyel · 2021-11-29T20:42:55Z

I don't know whether there is a color-to-label mapping, or how the labels are supposed to be created from the 3-channel PNG images.

EDIT: I am following this answer to see if I can make it work:
#7 (comment)

SushkoVadim · 2021-11-29T20:50:17Z

The output of data_i['label'].shape is torch.Size([1, 1, 256, 256]) for me.
So something is wrong with the label map format (or the PIL version, but I don't think so).

ivanlengyel · 2021-11-29T21:01:50Z

Well, I have good and bad news.

The good news is that after downloading the dataset that you pointed out in #7 (comment) I don't obtain that error any more.
So I guess the validation annotation images should be grayscale (that is, 1-channel) images. I don't know why, but I had downloaded some ADE challenge images in which the annotations are in color. That was the cause of the error.

The bad news is that I have a new error XD:

python test.py --name oasis_ade20k --dataset_mode ade20k --gpu_ids 0  \
--dataroot ADEChallengeData2016 --batch_size 1
----------------- Options ---------------
                EMA_decay: 0.9999
               batch_size: 1
               channels_G: 64
          checkpoints_dir: ./checkpoints
                ckpt_iter: best
                 dataroot: ADEChallengeData2016          	[default: ./datasets/cityscapes/]
             dataset_mode: ade20k                        	[default: coco]
                  gpu_ids: 0
                     name: oasis_ade20k                  	[default: label2coco]
               no_3dnoise: False
                   no_EMA: False
                  no_flip: False
         no_spectral_norm: False
           num_res_blocks: 6
          param_free_norm: syncbatch
                    phase: test                          	[default: train]
              results_dir: ./results/
                     seed: 42
                 spade_ks: 3
                    z_dim: 64
----------------- End -------------------
Created Ade20kDataset, size train: 2000, size val: 2000
Created OASIS_Generator with 74314691 parameters
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1544176307774/work/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument
Traceback (most recent call last):
  File "test.py", line 25, in <module>
    generated = model(None, label, "generate", None)
  File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/azureuser/IM/OASIS/models/models.py", line 72, in forward
    fake = self.netEMA(label)
  File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/azureuser/IM/OASIS/models/generator.py", line 43, in forward
    x = self.body[i](x, seg)
  File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/azureuser/IM/OASIS/models/generator.py", line 78, in forward
    dx = self.conv_0(self.activ(self.norm_0(x, seg)))
  File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/modules/module.py", line 485, in __call__
    hook(self, input)
  File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/utils/spectral_norm.py", line 100, in __call__
    setattr(module, self.name, self.compute_weight(module, do_power_iteration=module.training))
  File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/utils/spectral_norm.py", line 86, in compute_weight
    sigma = torch.dot(u, torch.mv(weight_mat, v))
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1544176307774/work/aten/src/THC/THCBlas.cu:116
(oasis) -----

I guess I'll have to debug this new error. However there is not much info in the error message :S

ivanlengyel · 2021-11-29T21:13:46Z

It seems that this error is related to certain CUDA/PyTorch combinations.

I will try to use newer versions of cuda and torch to see if I can fix this error.

However, what intrigues me is that I should be using the same versions as you, since I created the env from the provided yaml.
Anyway, I will investigate this and see if I can fix it. If I can, I'll make sure to post the fix here so that other users with the same problem have an answer.
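
For anyone comparing setups, a small report of the installed versions might help. This is a sketch for gathering information, not a fix: it only collects what PyTorch itself reports, and it returns early when torch cannot be imported. The function name describe_torch_cuda is hypothetical.

```python
def describe_torch_cuda():
    """Collect the PyTorch/CUDA details that are usually compared first."""
    try:
        import torch
    except ImportError:
        return {"torch": None}

    info = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # CUDA version torch was built with
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```

A GPU that is newer than the CUDA build of the installed PyTorch is one commonly reported cause of cuBLAS runtime errors, though that is not confirmed for this case.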

ivanlengyel · 2021-11-30T13:31:11Z

I am closing this issue since the original problem is fixed.

🚫 The problem was that the dataloader expects 1-channel grayscale label images, and somehow I had downloaded an ADE test set in which the labels are in color. My labels therefore had shape (batch_size, 3, 256, 256) instead of (batch_size, 1, 256, 256).

✅ Using gray scale images fixed my original issue.

Thanks @SushkoVadim for the fast reply and the nice repo.

ivanlengyel closed this as completed Nov 30, 2021
