CUDA 11.2: Add `MemoryAsyncPool` to support `malloc_async` #4592
Conversation
Jenkins, test this please
Jenkins CI test (for commit b05f5a8, target branch master) failed with status FAILURE.
Jenkins, test this please
Jenkins CI test (for commit 5b5ed3a, target branch master) succeeded!
Jenkins CI test (for commit e6538b1, target branch master) failed with status FAILURE.
@emcastillo Looks like the CI could randomly hit OOM in the added tests. What should I do? Make the tests less memory-hungry? Mark them as slow?
Jenkins, test this please
Jenkins CI test (for commit a5e5255, target branch master) failed with status FAILURE.
Jenkins, test this please
Jenkins CI test (for commit a5e5255, target branch master) succeeded!
@emcastillo The tests added in this PR passed twice in a row. Should be safe now?
@emcastillo This pull-request is marked as
Let's rebase.
Done! Jenkins, test this please
Jenkins CI test (for commit 9e1d636, target branch master) succeeded!
Jenkins, test this please
Jenkins CI test (for commit c49e805, target branch master) succeeded!
Blocked by #4537. The new stuff starts at 0b840b1. This PR enables the following canonical usage, similar to CuPy's other memory pools:
Similar to `MemoryPool`, this new `MemoryAsyncPool` supports multiple devices, which is not transparent when just using the bare `malloc_async()` from #4537. Moreover, having a wrapper like `MemoryAsyncPool` allows more possibilities, such as choosing the `cudaMemPool_t` handle to use for each device via the pool constructor. In the future we could support
`MemoryAsyncPool.set_limit()` by using `cudaMemPoolTrimTo()`. I am not adding it in this PR because it is not yet clear to me how well it would interact with other libraries that also use the async pool.
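To make the trimming semantics concrete, here is a GPU-free toy model of what a `cudaMemPoolTrimTo()`-based `set_limit()` would do: release cached-but-unused blocks back to the system until at most N bytes stay reserved (all names below are illustrative, not CuPy or CUDA API):

```python
class ToyAsyncPool:
    """GPU-free model of the cached (freed-but-held) memory in a pool."""

    def __init__(self):
        self.cached_blocks = []  # sizes (bytes) of freed blocks kept for reuse

    def free(self, size):
        self.cached_blocks.append(size)

    def trim_to(self, keep_bytes):
        # Mimics cudaMemPoolTrimTo: return cached blocks to the system
        # until at most `keep_bytes` bytes remain held by the pool.
        while self.cached_blocks and sum(self.cached_blocks) > keep_bytes:
            self.cached_blocks.pop()

pool = ToyAsyncPool()
for size in (256, 512, 1024):
    pool.free(size)
pool.trim_to(800)
print(sum(pool.cached_blocks))  # → 768 (the 1024-byte block was released)
```

Note that trimming only releases already-cached memory; it does not cap future growth of the pool.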