CUDA 11.2: Add `MemoryAsyncPool` to support `malloc_async` #4592
Conversation
Jenkins, test this please
Jenkins CI test (for commit b05f5a8, target branch master) failed with status FAILURE.
Jenkins, test this please
Jenkins CI test (for commit 5b5ed3a, target branch master) succeeded!
Jenkins CI test (for commit e6538b1, target branch master) failed with status FAILURE.
@emcastillo Looks like the CI could randomly hit OOM in the added tests. What should I do? Make the tests less memory-hungry? Mark them as slow?
Jenkins, test this please
Jenkins CI test (for commit a5e5255, target branch master) failed with status FAILURE.
Jenkins, test this please
Jenkins CI test (for commit a5e5255, target branch master) succeeded!
@emcastillo The tests added in this PR passed twice in a row. Should be safe now?
@emcastillo This pull-request is marked as
Let's rebase.
Done! Jenkins, test this please
Jenkins CI test (for commit 9e1d636, target branch master) succeeded!
Jenkins, test this please
Jenkins CI test (for commit c49e805, target branch master) succeeded!
Blocked by #4537. The new stuff starts at 0b840b1. This PR enables the following canonical usage, similar to CuPy's other memory pools:
Similar to `MemoryPool`, this new `MemoryAsyncPool` supports multiple devices, which is not transparent when just using the bare `malloc_async()` from #4537. Moreover, having a wrapper like `MemoryAsyncPool` allows more possibilities, such as choosing the `cudaMemPool_t` handle to use for each device via the pool constructor. In the future we could support
`MemoryAsyncPool.set_limit()` by using `cudaMemPoolTrimTo()`. I am not adding it in this PR because it is not yet clear to me how well it would interact with other libraries that also use the async pool.
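To make the trimming semantics concrete, here is a GPU-free toy model of what a `cudaMemPoolTrimTo()`-based `set_limit()` would do: release cached-but-unused blocks back to the system until at most N bytes stay reserved (all names below are illustrative, not CuPy or CUDA API):

```python
class ToyAsyncPool:
    """GPU-free model of the cached (freed-but-held) memory in a pool."""

    def __init__(self):
        self.cached_blocks = []  # sizes (bytes) of freed blocks kept for reuse

    def free(self, size):
        self.cached_blocks.append(size)

    def trim_to(self, keep_bytes):
        # Mimics cudaMemPoolTrimTo: return cached blocks to the system
        # until at most `keep_bytes` bytes remain held by the pool.
        while self.cached_blocks and sum(self.cached_blocks) > keep_bytes:
            self.cached_blocks.pop()

pool = ToyAsyncPool()
for size in (256, 512, 1024):
    pool.free(size)
pool.trim_to(800)
print(sum(pool.cached_blocks))  # → 768 (the 1024-byte block was released)
```

Note that trimming only releases already-cached memory; it does not cap future growth of the pool.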