Description
- cuda-python 11.6.1
- CUDA Toolkit 11.2
- Ubuntu Linux
If you run something like the following on a multi-GPU machine:

```python
from cuda import cuda, cudart

device_num = 5

err, = cuda.cuInit(0)                            # initialize the driver API
err, device = cuda.cuDeviceGet(device_num)       # get a handle for device 5
err, cuda_context = cuda.cuCtxCreate(0, device)  # create a context on device 5
err, = cudart.cudaSetDevice(device_num)          # runtime API: select device 5
```
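To make the stray allocation visible, here is a minimal sketch using NVML (the `pynvml` / nvidia-ml-py package, which is not part of the original report). NVML queries do not create CUDA contexts of their own, so the measurement does not perturb the readings:

```python
# Sketch: watch GPU 0's used memory before and after cudaSetDevice.
# Assumes the nvidia-ml-py (pynvml) package is installed.
import pynvml
from cuda import cuda, cudart

pynvml.nvmlInit()
dev0 = pynvml.nvmlDeviceGetHandleByIndex(0)

def used_mb():
    # Total used memory on GPU 0, across all processes
    return pynvml.nvmlDeviceGetMemoryInfo(dev0).used / 1024**2

device_num = 5
err, = cuda.cuInit(0)
err, device = cuda.cuDeviceGet(device_num)
err, ctx = cuda.cuCtxCreate(0, device)
print(f"before cudaSetDevice: {used_mb():.0f} MB used on GPU 0")

err, = cudart.cudaSetDevice(device_num)
print(f"after  cudaSetDevice: {used_mb():.0f} MB used on GPU 0")  # per the report, jumps by ~305 MB
```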
The call to cudart.cudaSetDevice correctly sets the current device to 5, but it also allocates ~305 MB of memory on device 0 (or whichever device is first in the list given by CUDA_VISIBLE_DEVICES). I suspect this behavior (possibly in the underlying C CUDA runtime?) may be the root of many downstream issues in libraries like TensorFlow and PyTorch, where a user selects a device but still gets large allocations on other devices.

305 MB may not sound like much, but I am running a program on an NVIDIA DGX with 16 GPUs and 64 worker processes, so 64 × 305 MB ≈ 19 GB of unusable memory gets allocated on GPU 0, which crashes the program. I cannot simply set CUDA_VISIBLE_DEVICES to work around this, because the workers communicate with their parent process via shared GPU memory (cuIPCMemHandle), and the parent process needs access to all GPUs. In addition, each worker performs data augmentation on one GPU while writing its output to another GPU with a different device ID.
As a workaround, I am investigating not calling cudart.cudaSetDevice at all, but when it is not called I cannot use the pointer returned by cuda.cuMemAlloc to create a PyTorch tensor. When I do call cudart.cudaSetDevice, the pointer works as expected.
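One direction worth probing: cudaSetDevice binds the runtime to the device's primary context, and runtime-based libraries such as PyTorch operate in that context, which would explain why a cuMemAlloc pointer from a freshly created driver context is not usable without it. Below is a minimal, untested sketch that stays entirely in the driver API by retaining and activating the primary context instead; whether this also sidesteps the device-0 allocation is exactly what would need verifying.

```python
# Sketch of a possible workaround: replace cudart.cudaSetDevice with a
# driver-API retain of the device's primary context. This relies on the
# documented runtime/driver interop (the runtime uses the primary context);
# it is an assumption, not verified behavior.
from cuda import cuda

device_num = 5
err, = cuda.cuInit(0)
err, device = cuda.cuDeviceGet(device_num)

# Retain and activate the primary context on device 5, i.e. the context
# cudaSetDevice would bind the runtime to, without touching the runtime API.
err, primary_ctx = cuda.cuDevicePrimaryCtxRetain(device)
err, = cuda.cuCtxSetCurrent(primary_ctx)

# Allocate in the primary context; if the interop assumption holds, this
# pointer should be valid for runtime-API consumers (e.g. PyTorch) on device 5.
err, dptr = cuda.cuMemAlloc(1 << 20)

# ... hand int(dptr) to the consumer here ...

err, = cuda.cuMemFree(dptr)
err, = cuda.cuDevicePrimaryCtxRelease(device)
```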