torch.cuda.current_device() is changed by CuPy after 10.0 #6569

Closed
fangwei123456 opened this issue Mar 20, 2022 · 6 comments

fangwei123456 commented Mar 20, 2022

Description

Recently, @Yanqi-Chen reported a bug when using a PyTorch module accelerated by CuPy in Distributed Data Parallel (DDP) training.

In DDP training, each process uses torch.cuda.current_device() as its default device. He found that CuPy changes torch.cuda.current_device(). For example, when training with 4 GPUs, torch.cuda.current_device() should be 0, 1, 2, 3 across the four processes, but after using CuPy, every process's torch.cuda.current_device() outputs 0.
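For context, a minimal sketch of the per-rank device setup that DDP training relies on (a hypothetical launcher-driven script; the backend and rank-to-GPU mapping here are assumptions, not part of the original report):

import torch
import torch.distributed as dist

# One process per GPU; the launcher (e.g. torchrun) assigns ranks.
dist.init_process_group(backend='nccl')
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

# DDP code then relies on the current device staying put:
assert torch.cuda.current_device() == local_rank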

To Reproduce

I ran the following code to reproduce the problem:

import torch
import cupy
kernel_code = r'''
extern "C" __global__
void relu(const float* x, float* y, const int& N)
{
    const int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < N)
    {
        y[index] = (float) (x[index] >= 0.0f);
    }
}
'''

def relu(x: torch.Tensor):
    device_id = x.get_device()
    torch.cuda.set_device(device_id)
    print('1:', torch.cuda.current_device())
    y = torch.zeros_like(x)
    assert device_id >= 0

    with cupy.cuda.Device(device_id):
        kernel = cupy.RawKernel(kernel_code, 'relu')
        threads = 1024
        N = x.numel()

        blocks = (N + threads - 1) // threads
        x = x.contiguous()
        y = y.contiguous()
        N = cupy.asarray(N)
        kernel((blocks,), (threads,), (x.data_ptr(), y.data_ptr(), N))
    print('2:', torch.cuda.current_device())
    return y

device = 'cuda:1'
x = torch.rand([8], device=device) - 0.5
y = relu(x)
print(f'x={x}')
print(f'y={y}')

On machine A, I get the following output:

(pytorch-env) wfang@ubuntu:~/temp_dir$ python test.py 
1: 1
2: 0
x=tensor([-0.1473, -0.3093, -0.0547, -0.1389,  0.4446, -0.3286, -0.4435,  0.0105],
       device='cuda:1')
y=tensor([0., 0., 0., 0., 1., 0., 0., 1.], device='cuda:1')

You can see that the current device changes from 1 to 0 after the with cupy.cuda.Device(device_id) block.

However, on machine B, I get:

(pytorch-env) wfang@Precision-5820-Tower-X-Series:~/tempdir$ python test.py 
1: 1
2: 1
x=tensor([ 0.0060,  0.0141,  0.4118, -0.4813,  0.4609, -0.3557,  0.3739, -0.3464],
       device='cuda:1')
y=tensor([1., 1., 1., 0., 1., 0., 1., 0.], device='cuda:1')

On machine B, the current device is not changed.

Installation

Source (pip install cupy)

Environment

On machine A:

(pytorch-env) wfang@ubuntu:~/temp_dir$ conda list torch
# packages in environment at /home/wfang/anaconda3/envs/pytorch-env:
#
# Name                    Version                   Build  Channel
pytorch                   1.10.1          py3.9_cuda11.3_cudnn8.2.0_0    pytorch
pytorch-mutex             1.0                        cuda    pytorch
torchaudio                0.10.1               py39_cu113    pytorch
torchvision               0.11.2               py39_cu113    pytorch

(pytorch-env) wfang@ubuntu:~/temp_dir$ conda list cu
# packages in environment at /home/wfang/anaconda3/envs/pytorch-env:
#
# Name                    Version                   Build  Channel
cudatoolkit               11.3.1               h2bc3f7f_2    defaults
cupy-cuda113              10.2.0                   pypi_0    pypi
ncurses                   6.3                  h7f8727e_2    defaults

(pytorch-env) wfang@ubuntu:~/temp_dir$ nvidia-smi
Sun Mar 20 19:41:32 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.74       Driver Version: 470.74       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:18:00.0 Off |                    0 |
| N/A   56C    P0   241W / 400W |  13828MiB / 81251MiB |     81%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM...  On   | 00000000:3B:00.0 Off |                    0 |
| N/A   61C    P0   218W / 400W |  13830MiB / 81251MiB |     97%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-SXM...  On   | 00000000:86:00.0 Off |                   98 |
| N/A   56C    P0   222W / 400W |  13828MiB / 81251MiB |     96%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-SXM...  On   | 00000000:AF:00.0 Off |                    0 |
| N/A   57C    P0   215W / 400W |  13828MiB / 81251MiB |     88%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     20830      C   ...vs/pytorch-env/bin/python    13817MiB |
|    1   N/A  N/A     20831      C   ...vs/pytorch-env/bin/python    13819MiB |
|    2   N/A  N/A     20832      C   ...vs/pytorch-env/bin/python    13817MiB |
|    3   N/A  N/A     20833      C   ...vs/pytorch-env/bin/python    13817MiB |
+-----------------------------------------------------------------------------+

(pytorch-env) wfang@ubuntu:~/temp_dir$ gpustat 
ubuntu                   Sun Mar 20 19:44:07 2022  470.74
[0] NVIDIA A100-SXM-80GB | 58'C, 100 % | 13828 / 81251 MB | wfang(13817M)
[1] NVIDIA A100-SXM-80GB | 63'C,  81 % | 13830 / 81251 MB | wfang(13819M)
[2] NVIDIA A100-SXM-80GB | 57'C,  91 % | 13828 / 81251 MB | wfang(13817M)
[3] NVIDIA A100-SXM-80GB | 59'C,  81 % | 13828 / 81251 MB | wfang(13817M)

(pytorch-env) wfang@ubuntu:/usr/local/cuda/bin$ ./nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0

On machine B:

(pytorch-env) wfang@Precision-5820-Tower-X-Series:~/tempdir$ conda list torch
# packages in environment at /home/wfang/anaconda3/envs/pytorch-env:
#
# Name                    Version                   Build  Channel
pytorch                   1.10.1          py3.9_cuda11.3_cudnn8.2.0_0    pytorch
pytorch-mutex             1.0                        cuda    pytorch
torch-tb-profiler         0.3.1                    pypi_0    pypi
torchaudio                0.10.1               py39_cu113    pytorch
torchvision               0.11.2               py39_cu113    pytorch

(pytorch-env) wfang@Precision-5820-Tower-X-Series:~/tempdir$ conda list cu
# packages in environment at /home/wfang/anaconda3/envs/pytorch-env:
#
# Name                    Version                   Build  Channel
cudatoolkit               11.3.1               h2bc3f7f_2    defaults
cupy-cuda111              9.4.0                    pypi_0    pypi
ncurses                   6.2                  h58526e2_4    conda-forge

(pytorch-env) wfang@Precision-5820-Tower-X-Series:~/tempdir$ nvidia-smi
Sun Mar 20 19:42:23 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:17:00.0 Off |                  N/A |
| 19%   36C    P8    16W / 250W |   1448MiB / 11011MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:B3:00.0 Off |                  N/A |
| 18%   36C    P8    21W / 250W |      3MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   2335870      C   python                           1445MiB |
+-----------------------------------------------------------------------------+

(pytorch-env) wfang@Precision-5820-Tower-X-Series:~/tempdir$ gpustat 
Precision-5820-Tower-X-Series  Sun Mar 20 19:44:46 2022  465.19.01
[0] NVIDIA GeForce RTX 2080 Ti | 36'C,   0 % |  1448 / 11011 MB | wfang(1445M)
[1] NVIDIA GeForce RTX 2080 Ti | 36'C,   0 % |     3 / 11019 MB |

(pytorch-env) wfang@Precision-5820-Tower-X-Series:/usr/local/cuda/bin$ ./nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0

Additional Information

No response

fangwei123456 added the cat:bug label on Mar 20, 2022
fangwei123456 changed the title from "torch.cuda.current_device() is changed by CuPy in some machines" to "torch.cuda.current_device() is changed by CuPy" on Mar 21, 2022

Yanqi-Chen commented Mar 21, 2022

I further tested multiple versions of CuPy and confirmed that versions 10.0.0, 10.1.0, and 10.2.0 exhibit this behavior (torch.cuda.current_device() unexpectedly falls back to cuda:0), while versions 9.4.0 and 9.6.0 do not.

fangwei123456 changed the title from "torch.cuda.current_device() is changed by CuPy" to "torch.cuda.current_device() is changed by CuPy after 10.0" on Mar 21, 2022
kmaehashi (Member) commented

Remove this line and it should work:

with cupy.cuda.Device(device_id):

The "current device" is semantics provided by CUDA and not by each library. torch.cuda.set_device() will change the current device of the current thread, so it will take effect on CuPy as well. Mixing multiple libraries to switch the current device may cause unexpected behavior.

fangwei123456 (Author) commented

Thanks, it works well. But if I also remove this line:

torch.cuda.set_device(device_id)

def relu(x: torch.Tensor):
    device_id = x.get_device()
    # torch.cuda.set_device(device_id)
    print('1:', torch.cuda.current_device())
    y = torch.zeros_like(x)
    assert device_id >= 0

    # with cupy.cuda.Device(device_id):

    kernel = cupy.RawKernel(kernel_code, 'relu')
    threads = 1024
    N = x.numel()

    blocks = (N + threads - 1) // threads
    x = x.contiguous()
    y = y.contiguous()
    N = cupy.asarray(N)
    print(f'N.device={N.device}')
    kernel((blocks,), (threads,), (x.data_ptr(), y.data_ptr(), N))
    print('2:', torch.cuda.current_device())
    return y

device = 'cuda:1'
x = torch.rand([8], device=device) - 0.5
y = relu(x)
print(f'x={x}')
print(f'y={y}')

Then I get incorrect output:

(pytorch-env) wfang@ubuntu:~/temp_dir$ python test.py 
1: 0
N.device=<CUDA Device 0>
2: 0
x=tensor([-0.4328, -0.0702, -0.1543, -0.1021, -0.2400, -0.0898, -0.3499, -0.0559],
       device='cuda:1')
y=tensor([0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:1')

How can I place a CuPy array (e.g., N = cupy.asarray(N)) on a specific device without using with cupy.cuda.Device(device_id)?

fangwei123456 (Author) commented

Or is the best way to call torch.cuda.set_device(device_id) before creating a new CuPy array?

kmaehashi (Member) commented

Or is the best way to call torch.cuda.set_device(device_id) before creating a new CuPy array?

Yes, the point is to use the same library to switch the current device across the codebase.
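Putting this together, a sketch of the relu helper from above with PyTorch as the only device switcher (same kernel_code as in the original report; this version is not from the thread itself):

def relu(x: torch.Tensor):
    device_id = x.get_device()
    assert device_id >= 0
    # Switch the current device with PyTorch only; CuPy follows the
    # same per-thread CUDA runtime state, so no cupy.cuda.Device
    # block is needed.
    torch.cuda.set_device(device_id)
    y = torch.zeros_like(x)

    kernel = cupy.RawKernel(kernel_code, 'relu')
    threads = 1024
    N = x.numel()
    blocks = (N + threads - 1) // threads
    x = x.contiguous()
    N = cupy.asarray(N)  # created on the current device, i.e. x's device
    kernel((blocks,), (threads,), (x.data_ptr(), y.data_ptr(), N))
    return y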

fangwei123456 (Author) commented

OK, thanks!
