
Set current device in cupy.ndarray.get()/set() #2169

Merged

Conversation

hyabe
Contributor

@hyabe hyabe commented May 2, 2019

This PR fixes unexpected GPU#0 memory usage during array copies between the CPU and a GPU, which occurs even when the array is not on GPU#0.
It reproduces on CuPy 6.0.0rc1 / CUDA 9.0 / V100×4 with the code below.

I noticed the problem through Chainer's ImageNet example.
When I run the training program with --gpu 1 2 to use GPU#1 and GPU#2, it consumes ~400 MiB on GPU#0 after the first call to the LogReport extension (which internally calls cupy.ndarray.get() via cupy.ndarray.__float__()).

The code to reproduce the problem (on cupy.ndarray.set()) is as follows:

import os
import cupy
import numpy

print(f'CuPy: {cupy.__version__}')
print(f'\nBefore allocation\n==========')
os.system('nvidia-smi --query-gpu=memory.used,memory.total --format=csv')

print(f'\nAllocate ~1 GiB on device #1\n==========')
with cupy.cuda.Device(1):
    gpu_array = cupy.empty(1<<30, dtype=cupy.uint8)
os.system('nvidia-smi --query-gpu=memory.used,memory.total --format=csv')

print(f'\nCopy to the array from CPU.\n==========')
cpu_array = numpy.arange(gpu_array.size, dtype=gpu_array.dtype)
gpu_array.set(cpu_array)
os.system('nvidia-smi --query-gpu=memory.used,memory.total --format=csv')

The following is the result on my environment (equipped with four 16 GiB GPUs); it shows that 438 MiB on GPU#0 is unexpectedly used after the CPU-to-GPU copy:

CuPy: 6.0.0rc1

Before allocation
==========
memory.used [MiB], memory.total [MiB]
10 MiB, 16152 MiB
10 MiB, 16152 MiB
10 MiB, 16152 MiB
10 MiB, 16152 MiB

Allocate ~1 GiB on device #1
==========
memory.used [MiB], memory.total [MiB]
10 MiB, 16152 MiB
1462 MiB, 16152 MiB
10 MiB, 16152 MiB
10 MiB, 16152 MiB

Copy to the array from CPU.
==========
memory.used [MiB], memory.total [MiB]
438 MiB, 16152 MiB
1462 MiB, 16152 MiB
10 MiB, 16152 MiB
10 MiB, 16152 MiB

@niboshi niboshi self-assigned this May 7, 2019
@niboshi
Member

niboshi commented May 27, 2019

I'm sorry for taking so long to respond.
As a basic policy of CuPy, we make users responsible for switching the current device.
ndarray.set/get are no exception.
We consider Chainer's PlotReport to be at fault in this case.

Thank you for reporting, anyway!
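
Under the policy described above, the caller is expected to make the array's device current around the copy. A minimal sketch of that workaround, using only public CuPy API (the device index and variable names are illustrative, not taken from the PR):

import cupy
import numpy

# Allocate an array on a non-default device.
with cupy.cuda.Device(1):
    gpu_array = cupy.empty(1 << 30, dtype=cupy.uint8)

cpu_array = numpy.arange(gpu_array.size, dtype=gpu_array.dtype)

# Making the array's own device current for the copy avoids creating a
# CUDA context (and consuming memory) on GPU#0 as a side effect.
with gpu_array.device:
    gpu_array.set(cpu_array)
    cpu_copy = gpu_array.get()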

@niboshi
Member

niboshi commented May 31, 2019

We discussed this topic internally and have come to the conclusion that we may relax the policy for these copy functions, including the asynchronous cases.

So, sorry for reversing the decision, but could you move the with self.device: statements so that the *_async variants are covered as well?
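
The requested change, sketched at a high level (this is not CuPy's actual Cython source; the function name and copy mechanism below are illustrative only): the with self.device: block wraps both the synchronous and the asynchronous copy paths.

import numpy
import cupy

def set_sketch(self, arr, stream=None):
    # Illustrative stand-in for cupy.ndarray.set(); not the real implementation.
    cpu_data = numpy.ascontiguousarray(arr)
    # The device switch covers BOTH branches, so the *_async path also runs
    # with the array's own device current and nothing is allocated on GPU#0.
    with self.device:
        if stream is None:
            self[...] = cupy.asarray(cpu_data)          # synchronous copy
        else:
            with stream:
                self[...] = cupy.asarray(cpu_data)      # copy issued on the given stream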

@hyabe hyabe force-pushed the fix-array-copy-fromto-CPU-always-uses-GPU0 branch from 0e56808 to 11dedd6 on June 3, 2019 09:30
@hyabe
Contributor Author

hyabe commented Jun 3, 2019

Thanks for your reply, and sorry for the late response.

I rolled back to the first commit, in which the with self.device: block applies to the asynchronous case too.

(I had misunderstood that the stream and the current device must always be consistent, including when calling cudaMemcpy(), because I had overlooked the second sentence of the Stream and Event Behavior section of the CUDA documentation.)

@niboshi
Member

niboshi commented Jun 3, 2019

Thanks!
Jenkins, test this please

@pfn-ci-bot
Collaborator

Successfully created a job for commit 11dedd6:

@niboshi niboshi added the labels cat:enhancement (Improvements to existing features) and st:test-and-merge (deprecated: ready to merge after tests pass) on Jun 3, 2019
@niboshi niboshi added this to the v7.0.0b1 milestone Jun 3, 2019
@chainer-ci
Member

Jenkins CI test (for commit 11dedd6, target branch master) failed with status FAILURE.

@niboshi niboshi changed the title from "Fix cupy.ndarray.get()/set() always use GPU#0 even if the array is not on GPU#0" to "Set current device in cupy.ndarray.get()/set()" on Jun 3, 2019
@niboshi
Member

niboshi commented Jun 3, 2019

LGTM
Thanks!

@niboshi niboshi merged commit cb5872d into cupy:master Jun 3, 2019
@hyabe hyabe deleted the fix-array-copy-fromto-CPU-always-uses-GPU0 branch June 4, 2019 05:05