
sgesvd_bufferSize int32 overflow with CUDA 10.1 #2351

Closed

econtal opened this issue Jul 30, 2019 · 11 comments
@econtal
Contributor

econtal commented Jul 30, 2019

After upgrading from CUDA 9.0 to CUDA 10.1, I noticed I'm no longer able to compute the SVD of large matrices because of an int32 overflow in sgesvd_bufferSize, called here: https://github.com/cupy/cupy/blob/master/cupy/linalg/decomposition.py#L257
When the buffer size just exceeds 2**31 = 2147483648, sgesvd_bufferSize fails with CUSOLVER_STATUS_INVALID_VALUE, likely because the value wraps around to a negative number. If you keep increasing the matrix size, you get positive values again, but wrong ones. See the graph below.

This might be an issue with cuSOLVER itself rather than CuPy, but since I'm not familiar with testing CUDA without CuPy I can't tell.
It might also be related to #1365, but I haven't tested on CUDA 9.1 or 10.0.

import numpy
from cupy.cuda import cusolver
from cupy.cuda import device

handle = device.get_cusolver_handle()

def test(m):
    # Query the workspace size for an m x 1 single-precision SVD;
    # return NaN when cuSOLVER rejects the call (CUSOLVER_STATUS_INVALID_VALUE).
    try:
        return cusolver.sgesvd_bufferSize(handle, m, 1)
    except Exception:
        return numpy.nan

# Sample m from 1 to 150000 and record the reported buffer size.
values = [(m, test(m)) for m in numpy.linspace(1, 150000, 200).astype('int')]

[Plot "sgesvd_buffersize_overflow": sgesvd_bufferSize(handle, m, 1) as a function of m, wrapping to negative and then to wrong positive values past 2**31]

CuPy Version          : 6.2.0
CUDA Build Version    : 10010
@toslunar
Member

toslunar commented Aug 6, 2019

@kmaehashi and I confirmed that cupy.linalg.svd(cupy.random.randn(50000, 10), full_matrices=False) runs with CUDA 9.0 but not with CUDA 10.0.
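
For completeness, here is a minimal self-contained sketch of that reproduction (the shape and the full_matrices=False argument come from the sentence above; the failure mode is the one reported in this issue, not something re-verified here):

import cupy

# Tall, skinny random matrix: the SVD result is small, but on affected
# CUDA versions the workspace query for it overflows int32.
a = cupy.random.randn(50000, 10)

# Reported to work on CUDA 9.0 and to fail on CUDA 10.0 / 10.1.
u, s, vh = cupy.linalg.svd(a, full_matrices=False)
print(u.shape, s.shape, vh.shape)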

@econtal
Contributor Author

econtal commented Aug 6, 2019

In my opinion the first issue is on CUDA's side, because the memory required to perform an SVD on a tall matrix should be linear in the input size, while here it is clearly quadratic. I posted about this on the NVIDIA developer forum (https://devtalk.nvidia.com/default/topic/1057891/gpu-accelerated-libraries/cusolver-can-t-compute-svd-on-tall-matrices-with-cuda-10-1-buffersize-grows-quadratically/) but haven't received an answer yet.
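
To illustrate why linear workspace should be enough for a tall matrix, here is a rough sketch (not a proposed fix for the library, just an illustration using CuPy's public qr/svd API): reduce the problem with a QR factorization first, then run the SVD only on the small square R factor, whose workspace depends on the number of columns rather than the number of rows.

import cupy

def tall_svd(a):
    # a is (m, n) with m >> n. Reduced QR gives q: (m, n) and r: (n, n).
    q, r = cupy.linalg.qr(a, mode='reduced')
    # SVD of the small n x n factor; its workspace does not depend on m.
    u_r, s, vh = cupy.linalg.svd(r, full_matrices=False)
    # Recombine: a = q @ r = (q @ u_r) @ diag(s) @ vh
    return q @ u_r, s, vh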

The second issue is the integer overflow. If CUDA's gesvd really does require a quadratic amount of memory, then for values above 2**31 we either get CUSOLVER_STATUS_INVALID_VALUE (the size wraps to a negative number), or we get no error at all but likely don't allocate enough memory (it wraps back around to a small positive number).
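
To make the two failure modes concrete, here is a purely illustrative piece of arithmetic (the quadratic m*m requirement is hypothetical, not cuSOLVER's actual formula): once the true workspace count exceeds 2**31, truncating it to a signed 32-bit integer first yields negative values and then wraps back to small positive ones.

def as_int32(x):
    # Wrap a Python int the way a signed 32-bit C int would.
    return (x + 2**31) % 2**32 - 2**31

for m in (40000, 50000, 70000):
    need = m * m                      # hypothetical true requirement
    print(m, need, as_int32(need))

# 40000 1600000000  1600000000  -> still fits in int32
# 50000 2500000000 -1794967296  -> negative: CUSOLVER_STATUS_INVALID_VALUE
# 70000 4900000000   605032704  -> positive, but far too small an allocation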

@emcastillo
Member

emcastillo commented Aug 7, 2019

cusolverDnSgesvd_bufferSize grows linearly in CUDA 9.0.
I guess they changed the allocation algorithm, and together with the int return type it blew up.

@anaruse do you have any insight on this issue?

@econtal
Contributor Author

econtal commented Aug 7, 2019

Indeed, with previous versions of CUDA cusolverDnSgesvd_bufferSize was strictly linear (in fact cusolver.sgesvd_bufferSize(handle, m, 1) == m + 224).

After doing more tests, I can confirm the overflow is also present in CUDA 9.2, but you would need enormous matrices to hit it.

Here is the result on CUDA 9.2 with the same test function as before:

>>> test((1<<31)-224-1) 
2147483647
>>> test((1<<31)-224)
-2147483648
>>> test((1<<31)-1)
-2147483553
>>> test(1<<31)
nan

This suggests the issue is not in CuPy at all, but in faulty logic in CUDA 10's cusolverDnSgesvd_bufferSize, and that the overflow has always been there but never showed up in practical cases before this logic change.

[Plot: the same bufferSize query on CUDA 9.2 near the int32 boundary]

@anaruse
Contributor

anaruse commented Aug 7, 2019

Please wait a while; we are inquiring with the library team about this issue.

@anaruse
Contributor

anaruse commented Aug 8, 2019

Thanks again for the information.
The library team has now acknowledged this issue. The much larger workspace requirement seems to be a side effect of performance improvements made in CUDA 10.1 over older versions.

@emcastillo
Member

@anaruse thanks a lot!
I guess the library team will release a hotfix for this :).

@jakirkham
Member

@anaruse, is this still an issue with the recent updates applied to CUDA 10.1?

cc @pentschev (for awareness)

@anaruse
Contributor

anaruse commented Sep 24, 2019

Yes, this is still an issue even with CUDA 10.1 Update 2.

@leofang
Member

leofang commented Nov 19, 2020

@anaruse Is this fixed in CUDA 11.x?

@anaruse
Contributor

anaruse commented Nov 20, 2020

This issue should have been fixed in CUDA 11.0.
