
Runtime error during testing panoptic segmentation (tensors on different devices) #3

Closed
dv-rastogi opened this issue Oct 14, 2021 · 7 comments

Comments

@dv-rastogi

During testing panoptic segmentation, with the following command:

python3 test_panoptic.py --dataset_folder "../PASTIS" --weight_folder "../UTAE_PAPs"

I ran into the following error:

  File "test_panoptic.py", line 142, in <module>
    main(config)
  File "test_panoptic.py", line 125, in main
    device=device,
  File "/utae-paps/train_panoptic.py", line 243, in iterate
    pano_meter.add(predictions, y)
  File "/utae-paps/src/panoptic/metrics.py", line 126, in add
    self.cumulative_ious[i] += torch.stack(ious).sum()
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

It seems that tensor self.cumulative_ious is on cpu while the other tensor is on cuda.

The following change in file /utae-paps/src/panoptic/metrics.py:

   self.cumulative_ious[i] += torch.stack(ious).sum().to(device='cpu')

fixes the issue. Could you confirm that this fix is valid?

Please also note that I tried transferring self.cumulative_ious to cuda instead, but that runs into errors later during execution.
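The mismatch described above can be reproduced in isolation. This is a minimal sketch with assumed names and shapes (not the actual code from metrics.py): a CPU accumulator receiving a sum computed on the compute device.

```python
import torch

# Pick the compute device the way the training script would.
device = "cuda" if torch.cuda.is_available() else "cpu"

cumulative_ious = torch.zeros(3)  # accumulator initialised on CPU
ious = [torch.rand(()).to(device) for _ in range(4)]  # per-instance IoUs on the compute device

# On a GPU machine the naive update raises:
#   RuntimeError: Expected all tensors to be on the same device ...
# cumulative_ious[0] += torch.stack(ious).sum()

# The proposed fix moves the sum back to CPU before accumulating:
cumulative_ious[0] += torch.stack(ious).sum().to(device="cpu")
```

On a CPU-only machine both variants run, which would explain why the error only surfaces on a GPU runtime.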

@VSainteuf
Owner

Yes this seems like a valid fix.
I'll just try to understand why I did not bump into this error on my setup, even though I also run it on GPU.
Can you share the environment in which you are running the code?

@dv-rastogi
Author

dv-rastogi commented Oct 14, 2021

I am running this on Google Colab with a GPU runtime.
Here is the CPU information for reference:
(screenshot of CPU information)
Here is the GPU information for reference:
(screenshot of GPU information)

@julianblue

@VSainteuf I would probably do something like

self.cumulative_ious[i] += torch.stack(ious).sum().to(device=self.cumulative_ious[i].device)

which resolves the accumulator's device at run time instead of hard-coding it.

@VSainteuf
Owner

Hi @julianblue,
Yes, but the self.cumulative_ious tensor is initialised on CPU on line 30, so hard-coding .to(device='cpu') does not matter too much.
Of course, it could be worth testing whether adding the option to keep cumulative_ious on GPU makes the .add(predictions, target) method a bit faster. In that case we would need something like what you suggest.
I'll have a look.
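The device-agnostic variant discussed above could be sketched with a hypothetical accumulator class (IoUMeter here is illustrative, not the repo's actual metrics class):

```python
import torch

class IoUMeter:
    """Toy IoU accumulator illustrating a device-agnostic update:
    the summed IoUs are moved to whatever device the accumulator
    was created on, instead of a hard-coded 'cpu'."""

    def __init__(self, num_classes, device="cpu"):
        self.cumulative_ious = torch.zeros(num_classes, device=device)

    def add(self, i, ious):
        # Resolve the target device at run time, so the same code works
        # whether the meter is kept on CPU or moved to GPU for speed.
        self.cumulative_ious[i] += (
            torch.stack(ious).sum().to(self.cumulative_ious.device)
        )

meter = IoUMeter(num_classes=3)
meter.add(0, [torch.tensor(0.5), torch.tensor(0.25)])
print(meter.cumulative_ious[0].item())  # → 0.75
```

Whether keeping the accumulator on GPU actually speeds up .add() would still need benchmarking, since the per-call transfer is tiny either way.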

@jhonjam

jhonjam commented Dec 15, 2021

Hello, were you able to train on your own dataset using this implementation?
I don't know what kind of data the zones are.
I am trying to understand what zones are and how they are generated for a new training run.

In train_panoptic.py, lines 218 through 235 run training in conjunction with zones.

Thank you

    if mode != "train":
        with torch.no_grad():
            predictions = model(
                x,
                batch_positions=dates,
                pseudo_nms=compute_metrics,
                heatmap_only=heatmap_only,
            )
    else:
        zones = y[:, :, :, 2] if config.supmax else None
        optimizer.zero_grad()
        predictions = model(
            x,

@jhonjam

jhonjam commented Dec 16, 2021

Hello @VSainteuf @watch24hrs-iiitd,

Is it possible to do a new training with a new dataset? I am studying the implementation and I find that train_panoptic.py needs the zone parameters for this. What are zones, and what type of data are they?

Thank you

    if mode != "train":
        with torch.no_grad():
            predictions = model(
                x,
                batch_positions=dates,
                pseudo_nms=compute_metrics,
                heatmap_only=heatmap_only,
            )
    else:
        zones = y[:, :, :, 2] if config.supmax else None
        optimizer.zero_grad()
        predictions = model(
            x,

@VSainteuf
Owner

Hi @jhonjam I'm moving your question to a new issue #6 .

VSainteuf added a commit that referenced this issue Dec 20, 2021
4 participants