About unified memory in CuPy #3127
Comments
This looks like an error in cuSOLVER caused by overly large input arrays, but I don't think it is related to unified memory. It seems related to, or a duplicate of, #2351. Can you try CUDA 9? @anaruse, can you guys double-check, please?
Seems this is because the input matrix is a bit too large, so it requires a work buffer with more than 2G elements, causing an overflow in cusolverDnDgesvd_bufferSize... Maybe size_t should be used as the data type for lwork.
The situation might be better in CUDA 10.2, since it should require a smaller work buffer than CUDA 10.1. But another issue seems to occur in cusolverDnDgesvd when it is called after cusolverDnDgesvd_bufferSize...
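To illustrate the overflow described above: cusolverDnDgesvd_bufferSize reports the workspace size through a plain C int, so any element count above 2**31 − 1 wraps around. A quick sketch of the arithmetic (the workspace element count used here is illustrative; the real formula is internal to cuSOLVER, but it grows with m × n):

```python
import ctypes

INT32_MAX = 2**31 - 1  # lwork is declared as a 32-bit C int

# Hypothetical workspace element count for a 60000 x 60000 gesvd call;
# the exact formula is internal to cuSOLVER, but it scales with m * n.
m = n = 60_000
work_elems = m * n  # 3,600,000,000 elements

print(work_elems > INT32_MAX)            # True: cannot be represented in lwork
print(ctypes.c_int32(work_elems).value)  # -694967296 after 32-bit wraparound
```

A negative or garbage lwork then propagates into the actual gesvd call, which matches the failure mode reported here.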
Thanks Akira! Very insightful as always 😄 A somewhat recent discussion came up in this regard. So if users want to scale beyond this, I guess they should use some out-of-core processing library like Dask. Is that right? Or are there other options before reaching for out-of-core tools? 😉
Not sure if this counts as your "out-of-core" solution, but I think cuSOLVER has multi-GPU routines. Unlike multi-GPU cuFFT, though, this is not yet supported in CuPy. An ongoing discussion is in #2742.
Good point! Thanks Leo 😄 Yeah, I would call that multi-GPU. Normally I think of out-of-core as a solution for when not everything can fit in memory (admittedly that is not exactly the limitation here). Agreed, multi-GPU solutions are worth exploring as well 🙂
Thank you all for your comments and feedback. Good to know it is not a problem directly related to how CuPy uses unified memory. @emcastillo @anaruse @leofang We are testing/benchmarking CuPy and NVIDIA RAPIDS with large memory allocations on the Summit supercomputer, using its production environment. Our ultimate goal is to offer scalable CPU- and GPU-based analytics to our users.
FYI, response from the cuSOLVER team.
From what @anaruse reported, the problem is not coming from CuPy itself, so I think the issue is solved. |
Which version of the CUDA Toolkit is OK?
There is currently no working version; you will have to wait for the next CUDA release, and its date has not been announced yet.
I am hitting the same issue: my input matrix is too large (60k × 60k). How can I solve this problem?
We are waiting for the NVIDIA folks to solve this, as it is a CUDA-related issue and not a CuPy one.
Hi CuPy team,
Is there any documentation describing which CuPy functions support unified memory?
So far I've tested two examples. The first one is a dot product between large vectors, which worked for me:
and the second is a simple SVD test:
which fails with the following error:
We are benchmarking on Power9 to understand the behavior of CuPy for datasets larger than 16 GB; knowing which CuPy features work with unified memory and which don't will allow us to progress faster.
PS: According to section 3.6 of this technical report,
https://developer.nvidia.com/sites/default/files/akamai/cuda/files/Misc/mygpu.pdf
unified memory can be used with cuSOLVER.
System configuration
IBM Power System AC922: 2× POWER9 CPUs (84 SMT cores each), 512 GB RAM, 6× NVIDIA Volta GPUs with 16 GB HBM2 each
GCC 6.4
CUDA 10.1.168
NVIDIA Driver 418.67
CuPy 7.1.1
Thanks,
Benjamin