
Set current device in cupy.ndarray.get()/set() #2169

Merged

Conversation

hyabe
Contributor

@hyabe hyabe commented May 2, 2019

This PR fixes unexpected GPU#0 memory usage during array copies between the CPU and a GPU, which occurs even when the array is not on GPU#0.
It reproduces on CuPy 6.0.0rc1 / CUDA 9.0 / V100×4 with the code below.

I noticed the problem through Chainer's ImageNet example.
When I run the training program with --gpu 1 2 to use GPU#1 and GPU#2, it consumes ~400 MiB on GPU#0 after the first call to the LogReport extension (which internally calls cupy.ndarray.get() via cupy.ndarray.__float__()).

The code to reproduce the problem (on cupy.ndarray.set()) is as follows:

import os
import cupy
import numpy

print(f'CuPy: {cupy.__version__}')
print(f'\nBefore allocation\n==========')
os.system('nvidia-smi --query-gpu=memory.used,memory.total --format=csv')

print(f'\nAllocate ~1 GiB on device #1\n==========')
with cupy.cuda.Device(1):
    gpu_array = cupy.empty(1<<30, dtype=cupy.uint8)
os.system('nvidia-smi --query-gpu=memory.used,memory.total --format=csv')

print(f'\nCopy to the array from CPU.\n==========')
cpu_array = numpy.arange(gpu_array.size, dtype=gpu_array.dtype)
gpu_array.set(cpu_array)
os.system('nvidia-smi --query-gpu=memory.used,memory.total --format=csv')

The following is the result on my environment (equipped with four 16 GiB GPUs); it shows that 438 MiB on GPU#0 is unexpectedly used after the CPU-to-GPU copy:

CuPy: 6.0.0rc1

Before allocation
==========
memory.used [MiB], memory.total [MiB]
10 MiB, 16152 MiB
10 MiB, 16152 MiB
10 MiB, 16152 MiB
10 MiB, 16152 MiB

Allocate ~1 GiB on device #1
==========
memory.used [MiB], memory.total [MiB]
10 MiB, 16152 MiB
1462 MiB, 16152 MiB
10 MiB, 16152 MiB
10 MiB, 16152 MiB

Copy to the array from CPU.
==========
memory.used [MiB], memory.total [MiB]
438 MiB, 16152 MiB
1462 MiB, 16152 MiB
10 MiB, 16152 MiB
10 MiB, 16152 MiB

@niboshi niboshi self-assigned this May 7, 2019
@niboshi
Member

niboshi commented May 27, 2019

I'm sorry for taking so long to respond.
As a basic policy of CuPy, we make users responsible for switching the current device.
ndarray.set/get are no exception.
We consider Chainer's PlotReport to be at fault in this case.

Thank you for reporting, anyway!
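
Under the policy described above, the caller is expected to make the array's device current around the copy. A minimal sketch of that workaround, using only public CuPy API (the device index and variable names are illustrative, not taken from the PR):

import cupy
import numpy

# Allocate an array on a non-default device.
with cupy.cuda.Device(1):
    gpu_array = cupy.empty(1 << 30, dtype=cupy.uint8)

cpu_array = numpy.arange(gpu_array.size, dtype=gpu_array.dtype)

# Making the array's own device current for the copy avoids creating a
# CUDA context (and consuming memory) on GPU#0 as a side effect.
with gpu_array.device:
    gpu_array.set(cpu_array)
    cpu_copy = gpu_array.get()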

@niboshi
Member

niboshi commented May 31, 2019

We discussed this topic internally and have come to the conclusion that we may relax the policy for these copy functions, including the asynchronous cases.

So, sorry for reversing the decision, but could you move the with self.device: statements so that the *_async variants are covered as well?
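
The requested change, sketched at a high level (this is not CuPy's actual Cython source; the function name and copy mechanism below are illustrative only): the with self.device: block wraps both the synchronous and the asynchronous copy paths.

import numpy
import cupy

def set_sketch(self, arr, stream=None):
    # Illustrative stand-in for cupy.ndarray.set(); not the real implementation.
    cpu_data = numpy.ascontiguousarray(arr)
    # The device switch covers BOTH branches, so the *_async path also runs
    # with the array's own device current and nothing is allocated on GPU#0.
    with self.device:
        if stream is None:
            self[...] = cupy.asarray(cpu_data)          # synchronous copy
        else:
            with stream:
                self[...] = cupy.asarray(cpu_data)      # copy issued on the given stream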

@hyabe hyabe force-pushed the fix-array-copy-fromto-CPU-always-uses-GPU0 branch from 0e56808 to 11dedd6 on June 3, 2019 09:30
@hyabe
Contributor Author

hyabe commented Jun 3, 2019

Thanks for your reply, and sorry for the late response.

I rolled back to the first commit, in which the with self.device: block applies to the asynchronous case too.

(I had misunderstood that the stream and the current device must always be consistent, including when calling cudaMemcpy(), because I had overlooked the second sentence of the Stream and Event Behavior section of the CUDA documentation.)

@niboshi
Member

niboshi commented Jun 3, 2019

Thanks!
Jenkins, test this please

@pfn-ci-bot
Collaborator

Successfully created a job for commit 11dedd6:

@niboshi niboshi added the labels cat:enhancement (Improvements to existing features) and st:test-and-merge (deprecated: ready to merge after tests pass) on Jun 3, 2019
@niboshi niboshi added this to the v7.0.0b1 milestone Jun 3, 2019
@chainer-ci
Member

Jenkins CI test (for commit 11dedd6, target branch master) failed with status FAILURE.

@niboshi niboshi changed the title from "Fix cupy.ndarray.get()/set() always use GPU#0 even if the array is not on GPU#0" to "Set current device in cupy.ndarray.get()/set()" on Jun 3, 2019
@niboshi
Member

niboshi commented Jun 3, 2019

LGTM
Thanks!

@niboshi niboshi merged commit cb5872d into cupy:master Jun 3, 2019
@hyabe hyabe deleted the fix-array-copy-fromto-CPU-always-uses-GPU0 branch June 4, 2019 05:05