
Torch device error in example "plot_sliced_wass_grad_flow_pytorch.py" #371

Closed · eloitanguy opened this issue May 5, 2022 · 4 comments · Fixed by #373

Comments

@eloitanguy (Contributor)

Describe the bug

Running the "plot_sliced_wass_grad_flow_pytorch.py" example raises a torch device-related RuntimeError.

To Reproduce

Steps to reproduce the behavior:

  1. From the POT source folder, navigate to examples/backends
  2. Run python plot_sliced_wass_grad_flow_pytorch.py

Terminal output (with edited paths)

2022-05-05 11:08:24.082850: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "POT/examples/backends/plot_sliced_wass_grad_flow_pytorch.py", line 82, in <module>
    loss = ot.sliced_wasserstein_distance(x1_torch, x2_torch, n_projections=20, seed=gen)
  File "POT/ot/sliced.py", line 149, in sliced_wasserstein_distance
    projections = get_random_projections(d, n_projections, seed, backend=nx, type_as=X_s)
  File "POT/ot/sliced.py", line 58, in get_random_projections
    projections = nx.randn(d, n_projections, type_as=type_as)
  File "POT/ot/backend.py", line 1777, in randn
    return torch.randn(size=size, dtype=type_as.dtype, generator=self.rng_, device=type_as.device)
RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'

Expected behavior

The script should run entirely on GPU and never expect CPU data, since torch.cuda.is_available() == True in this case.

Environment (please complete the following information):

  • OS (e.g. MacOS, Windows, Linux): Linux
  • Python version: 3.9.12
  • How was POT installed (source, pip, conda): source
  • Build command you used (if compiling from source): python setup.py build_ext --inplace
  • Only for GPU related bugs:
    • CUDA version: 10.1.24
    • GPU models and configuration: RTX 2070MQ
    • Any other relevant information: N/A

Output of the following code snippet:

import platform; print(platform.platform())

Linux-5.13.0-40-generic-x86_64-with-glibc2.31

import sys; print("Python", sys.version)

Python 3.9.12 (main, Apr 5 2022, 06:56:58)
[GCC 7.5.0]

import numpy; print("NumPy", numpy.__version__)

NumPy 1.22.3

import scipy; print("SciPy", scipy.__version__)

SciPy 1.8.0

import ot; print("POT", ot.__version__)

2022-05-05 11:25:05.637911: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
POT 0.8.3dev

import torch;print("torch", torch.__version__)

torch 1.11.0+cu102

(yes, my CUDA version is old as dirt, but this should be irrelevant)

Additional Context

I prepare my conda env as follows:

conda create -n ot_dev
conda activate ot_dev
conda install pip
pip install -r requirements.txt
cd docs/
pip install -r requirements.txt

@eloitanguy changed the title from Torch device error in example "lot_sliced_wass_grad_flow_pytorch.py" to Torch device error in example "plot_sliced_wass_grad_flow_pytorch.py" on May 5, 2022
@ncassereau-idris (Contributor)

ncassereau-idris commented May 5, 2022

Hi, thanks for your report. I can reproduce your issue.

It comes from the fact that torch.randn does not accept a device different from the one where the generator is located. It went unnoticed because, on GitHub, POT does not have access to a GPU, so the examples are computed on CPU. From what I understand of https://pytorch.org/docs/stable/generated/torch.randn.html, it seems to be a PyTorch bug; I don't think this behaviour is intended.
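
For reference, here is a minimal sketch of the failure mode (assuming a CUDA-capable machine), which mimics what the backend ends up doing when type_as lives on the GPU:

import torch

# A CPU generator combined with a CUDA device argument is rejected by torch.randn.
rng = torch.Generator(device="cpu")
rng.seed()
torch.randn(5, generator=rng, device="cuda")
# RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'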

I can see two ways of fixing it. We could either:

  • Replace, in POT/ot/backend.py (line 1777 at commit eccb138),

    return torch.randn(size=size, dtype=type_as.dtype, generator=self.rng_, device=type_as.device)

    with

    return torch.randn(size=size, dtype=type_as.dtype, generator=self.rng_).to(type_as.device)

    and do the same for torch.rand (line 1771 at the same commit):

    return torch.rand(size=size, generator=self.rng_, dtype=type_as.dtype, device=type_as.device)

  • Have one generator on CPU and another on GPU, and choose the right one each time we want a random tensor (see the sketch after the benchmark below).

The first solution is more straightforward, but it is noticeably slower at runtime. See the following benchmark (done on PyTorch 1.10 with a V100):

rng = torch.Generator("cpu")
rng.seed()
rng_gpu = torch.Generator("cuda")
rng_gpu.seed()
# Option 1: sample on CPU with the CPU generator, then move the result to GPU
%timeit torch.randn(100000, generator=rng).to("cuda")
# Option 2: sample directly on GPU with a GPU generator
%timeit torch.randn(100000, generator=rng_gpu, device="cuda")

returns

774 µs ± 135 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
10.5 µs ± 37.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

So the second solution is more efficient, but requires more code.
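
For illustration, here is a minimal sketch of what the second option could look like (a hypothetical helper, not the actual POT implementation): keep one torch.Generator per device and pick the matching one for every draw.

import torch

class PerDeviceRandomState:
    """Hypothetical helper: maintain one torch.Generator per device."""

    def __init__(self):
        self._generators = {}

    def _generator_for(self, device):
        # Create the generator for this device lazily and cache it.
        key = str(device)
        if key not in self._generators:
            gen = torch.Generator(device=device)
            gen.seed()
            self._generators[key] = gen
        return self._generators[key]

    def randn(self, *size, type_as):
        # Sample directly on the device of `type_as`, with the matching generator,
        # so no CPU/GPU transfer is needed.
        gen = self._generator_for(type_as.device)
        return torch.randn(size, dtype=type_as.dtype, generator=gen, device=type_as.device)

With such a helper, a call like state.randn(d, n_projections, type_as=X_s) samples on whatever device X_s lives on, keeping the fast path measured above.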

What do you think @rflamary ?

@rflamary (Collaborator)

rflamary commented May 5, 2022

I think that we need fast generators in a lot of potential applications, so I'm sorry @ncassereau-idris, but I prefer the second ;). But in this case it means that you need one generator per device, because if you have two GPUs then it will still be a problem, no?

@ncassereau-idris (Contributor)

> I think that we need fast generators in a lot of potential applications, so I'm sorry @ncassereau-idris, but I prefer the second ;). But in this case it means that you need one generator per device, because if you have two GPUs then it will still be a problem, no?

No, actually; I just tested it with 2 V100s.
With two devices we can use the generator of one GPU and set the device argument to the second GPU, and it works just fine. This issue really appears with the CPU; maybe a TPU would be an issue as well, I don't know.
It might come from the fact that the type changes with the device (torch.Tensor vs torch.cuda.Tensor). Maybe the implementation is different as well.
I tried setting a CPU generator with the state of a GPU generator. It turns out that the shape of the internal state changes as well, so there are profound differences which apparently do not cope well with mixing device types.
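
For completeness, a sketch of the two-GPU check described above (assuming at least two CUDA devices are visible):

import torch

# A generator living on cuda:0 can sample directly onto cuda:1 ...
rng_gpu0 = torch.Generator(device="cuda:0")
rng_gpu0.seed()
x = torch.randn(5, generator=rng_gpu0, device="cuda:1")  # works
print(x.device)  # cuda:1

# ... whereas a CPU generator cannot sample onto a CUDA device (the original error).
rng_cpu = torch.Generator(device="cpu")
rng_cpu.seed()
torch.randn(5, generator=rng_cpu, device="cuda:0")  # RuntimeError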

As for fixing the bug, I will open a PR tomorrow, or this afternoon if I have time.

@rflamary (Collaborator)

rflamary commented May 5, 2022

Great, thanks for checking; it was not obvious on my side.
