Launch train with custom dataset - Invalid index in scatter at .. #68

Open
Luke035 opened this issue Jul 29, 2019 · 17 comments

@Luke035

Luke035 commented Jul 29, 2019

Hi all,

thanks for the awesome work you've done!

I'm trying to launch a new train with a custom dataset I've generated using the custom_dataset option in training.

My dataset is composed of 2 classes plus a background, so counting the background there are 3 classes overall.
I'm trying to stick with the simplest launch configuration, but I always get the same error:

python train.py --name REDACTED --dataset_mode custom --label_dir REDACTED --image_dir REDACTED --instance_dir REDACTED  --gpu_ids -1

Traceback (most recent call last):
  File "train.py", line 40, in <module>
    trainer.run_generator_one_step(data_i)
  File "/root/SPADE/trainers/pix2pix_trainer.py", line 35, in run_generator_one_step
    g_losses, generated = self.pix2pix_model(data, mode='generator')
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/SPADE/models/pix2pix_model.py", line 43, in forward
    input_semantics, real_image = self.preprocess_input(data)
  File "/root/SPADE/models/pix2pix_model.py", line 125, in preprocess_input
    input_semantics = input_label.scatter_(1, label_map, 1.0)
RuntimeError: Invalid index in scatter at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:551

I also tried to launch the same code with the coco_stuff folder but used as a custom dataset and I'm getting the same error.

I'm pretty sure that this is due to some config option that I'm missing.
Can you guys give me any clue of what's happening?

Thanks!

@aviel08

aviel08 commented Jul 29, 2019

Hey, I think you have to properly specify the locations for --label_dir, --image_dir and --instance_dir. You can use --dataroot to make it easier.

@Luke035
Author

Luke035 commented Jul 30, 2019

Hi @aviel08, thanks for the answer. Unfortunately it doesn't seem to work: when I use the --dataroot option with --dataset_mode set to custom, I get the error train.py: error: the following arguments are required: --label_dir, --image_dir

In any case I made some progress and I've been able to make it work for coco_stuff by launching the train with --contain_dontcare_label option

 python train.py --name REDACTED  --dataset_mode custom --label_dir /root/SPADE/datasets/coco_stuff/train_label/ --image_dir /root/SPADE/datasets/coco_stuff/train_img --no_instance --gpu_ids -1 --label_nc 182 --contain_dontcare_label

But with my custom dataset I still haven't been able to make it work. Debugging the code, I saw that the label tensor from my data contains 3 unique values (from np.unique): [0, 2, 128]
Probably there's something wrong with the generated labels.
Any clue?
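
One possible reading of these values, offered as a sketch rather than a confirmed diagnosis: the traceback fails in `input_label.scatter_(1, label_map, 1.0)`, which builds a one-hot tensor, so every label value presumably has to be smaller than the number of one-hot channels. The NumPy snippet below only mimics that step and does not call SPADE or PyTorch. The helper `one_hot`, the 3-channel setup, and the 201-channel setup (assuming `--label_nc 200` plus one extra don't-care channel) are illustrative assumptions.

```python
import numpy as np

def one_hot(label_map, num_channels):
    """Build a (num_channels, H, W) one-hot array from an (H, W) integer label map.

    Hypothetical NumPy stand-in for a scatter-based one-hot step.
    """
    label_map = np.asarray(label_map)
    lo, hi = int(label_map.min()), int(label_map.max())
    if lo < 0 or hi >= num_channels:
        raise IndexError(
            f"label values must lie in [0, {num_channels}); got min={lo}, max={hi}"
        )
    h, w = label_map.shape
    out = np.zeros((num_channels, h, w), dtype=np.float32)
    rows, cols = np.indices((h, w))
    out[label_map, rows, cols] = 1.0  # set the channel named by each pixel's label
    return out

# Tiny label map using the values reported above.
labels = np.array([[0, 2], [128, 0]])

try:
    one_hot(labels, num_channels=3)  # 3 classes -> 3 channels (assumption)
    fits_in_3 = True
except IndexError:
    fits_in_3 = False

# With 201 channels the value 128 fits, so the one-hot step succeeds.
fits_in_201 = one_hot(labels, num_channels=201).shape == (201, 2, 2)
```

Under these assumptions, a value of 128 cannot be encoded with 3 channels, whatever the number of distinct classes is.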

@aviel08

aviel08 commented Jul 30, 2019

Well, the first thing you should try is testing with the provided dataset, and then you can use your own data. Make sure you specify the location of each folder for --label_dir and --image_dir. Then specify --label_nc, which is 182 for coco_stuff. This is the default anyway, but it's good practice since you'll have to write the number of your labels anyway.
If it works with coco_stuff but it doesn't with your dataset you might need to replicate their format.
I think it's something like this:
RGB - 8 bits - image
grayscale - 8 bits - label
grayscale - 16 bits - instance

Your command should be:

python train.py --name [name_of_experiment] --dataset_mode custom --label_dir [path_to_labels] --image_dir [path_to_images] --label_nc [num_labels]

@Luke035
Author

Luke035 commented Jul 30, 2019

@aviel08 Thanks again for your suggestions.

The fact is that for the provided dataset everything works fine even if I use it with the --dataset_mode custom option, but not when I try with my dataset.

In your last comment you wrote that label_nc should be set to 182, and that's what I've done with the coco_stuff dataset. Do you think that I have to do the same with my dataset even if it's composed of just 3 categories?

@aviel08

aviel08 commented Jul 30, 2019

So you narrowed it down to your dataset. It has to be in the same format as the sample ones.
If you have 3 categories then you have to write --label_nc 2

@Luke035
Author

Luke035 commented Jul 30, 2019

Yes that's exactly what I tried, but I'm still getting the error

RuntimeError: Invalid index in scatter at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:551

I'm also trying to use the label_map generated from the coco dataset with the input_label_map generated from my dataset but still no luck

@aviel08

aviel08 commented Jul 30, 2019 via email

@Luke035
Author

Luke035 commented Jul 30, 2019

It seems to work! Wow 😮
Any clue of the reason why it worked?

This is the command I used:

python train.py --name REDACTED  --dataset_mode custom --label_dir REDACTED --image_dir REDACTED --instance_dir REDACTED --label_nc 200 --contain_dontcare_label --gpu_ids -1 

@aviel08

aviel08 commented Jul 30, 2019 via email

@Luke035
Author

Luke035 commented Jul 30, 2019

Good to know, but it seems like a bug: the label_nc number that I set is not correct given the dataset that I have. I tried to debug it and my guess is that there's something in the dataloader
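
If the label images really do contain sparse values such as [0, 2, 128], one way to make a small `--label_nc` valid would be to remap them to contiguous IDs before training. This is a minimal sketch, assuming the labels are loaded as integer NumPy arrays and there are at most 256 classes. `build_mapping` and `remap` are hypothetical helpers, not part of SPADE.

```python
import numpy as np

def build_mapping(label_maps):
    """Collect the sorted unique label values across all label maps."""
    return np.unique(np.concatenate([np.asarray(m).ravel() for m in label_maps]))

def remap(label_map, values):
    """Replace each original value with its index in `values` (0..K-1).

    Assumes every value in label_map is present in `values`.
    """
    return np.searchsorted(values, np.asarray(label_map)).astype(np.uint8)

# Two tiny label maps standing in for a dataset.
dataset = [
    np.array([[0, 2], [128, 0]], dtype=np.uint8),
    np.array([[2, 2], [0, 128]], dtype=np.uint8),
]

values = build_mapping(dataset)            # original values, index = new ID
remapped = [remap(m, values) for m in dataset]
num_classes = len(values)                  # candidate value for --label_nc
```

Building the mapping over the whole dataset, rather than per image, keeps each class on the same ID in every file.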

@NseaBlu

NseaBlu commented Nov 25, 2019

thanks for the awesome work you've done!
"the label_nc number that I set is not correct given the dataset that I have". Will this affect the network structure and training results?
thank you very much!

@Ehsan-Yaghoubi

Hi @Luke035 and @aviel08 ,

I have the same error in another project. I could not find a solution for it. I appreciate your help. I have asked the question at michuanhaohao/reid-strong-baseline#120

@Sonatau

Sonatau commented Jun 2, 2021

I met the same problem that you mentioned, and it worked after I used the command:
python train.py --name MYTEST --dataset_mode custom --label_dir ./datasets/water_color/train_label --image_dir ./datasets/water_color/train_img --label_nc 300 --no_instance --contain_dontcare_label

I still don't know why it worked, but the key point seems to be "label_nc". I have seen another answer saying that this parameter needs to be set to a large value, such as 200 or 300. My custom dataset has 9 classes, and it worked with the large value.

@mjehanzaib999

@Sonatau what are you using for the labels and the instance maps? I am using RGB masks for labels and grayscale maps for the instance parameter, and I'm getting CUDA errors.

@Sonatau

Sonatau commented Sep 14, 2021

Hi @mjehanzaib999, you should first convert the segmentation mask (H, W, 3) to a grayscale label (H, W) according to the label map. For example, if the label map has id 1 mapping to RGB (22, 33, 11), then every pixel in the mask with RGB (22, 33, 11) should have the value 1 at the corresponding position in the label.
Unfortunately I didn't use the instance maps, so I might not be able to solve your puzzle about it.
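
The conversion described above could be sketched roughly as follows, assuming the mask is an (H, W, 3) uint8 NumPy array. Only the pair id 1 / RGB (22, 33, 11) comes from this comment. The rest of `PALETTE`, the helper name `rgb_mask_to_label`, and the `unknown_id` of 255 for unmatched pixels are illustrative assumptions.

```python
import numpy as np

# Hypothetical label map: class ID -> RGB colour.
PALETTE = {
    0: (0, 0, 0),       # assumed background colour
    1: (22, 33, 11),    # example from the comment above
}

def rgb_mask_to_label(mask, palette, unknown_id=255):
    """Convert an (H, W, 3) RGB mask to an (H, W) uint8 label image."""
    mask = np.asarray(mask)
    # Start with every pixel marked as unknown, then fill in matched classes.
    label = np.full(mask.shape[:2], unknown_id, dtype=np.uint8)
    for class_id, rgb in palette.items():
        matches = np.all(mask == np.array(rgb, dtype=mask.dtype), axis=-1)
        label[matches] = class_id
    return label

mask = np.array(
    [[[22, 33, 11], [0, 0, 0]],
     [[0, 0, 0], [9, 9, 9]]],   # (9, 9, 9) is not in the palette
    dtype=np.uint8,
)
label = rgb_mask_to_label(mask, PALETTE)
```

Pixels whose colour is missing from the palette keep `unknown_id`, which makes antialiased or mislabeled mask edges easy to spot before training.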

@sareyu0753

Same issue here. It works when I set a larger label_nc.

@NseaBlu

NseaBlu commented Mar 1, 2022 via email
