Make SVD overwrite temporary array x #2277
@pentschev SVD can now auto overwrite the temporary array x. Can improve performance possibly (haven't tested yet).
| @@ -290,12 +309,14 @@ def svd(a, full_matrices=True, compute_uv=True): | |||
| raise linalg.LinAlgError( | |||
| 'Parameter error (maybe caused by a bug in cupy.linalg?)') | |||
|
|
|||
| # Note that the returned array may need to be transporsed | |||
| # Note that the returned array may need to be transprosed | |||
A nitpick but a typo.
| # Note that the returned array may need to be transprosed | |
| # Note that the returned array may need to be transposed |
|
Thanks for the PR. Could you point us to the convo that you're mentioning? And as you touch upon it, it'd be great if you could provide some benchmarks as well. |
|
Oh don't worry now :) It was about some performance issues of SVD. It seems like my changes didn't make SVD faster, they just reduced overall memory usage. |
| trans_flag = False | ||
| X = a.astype(a_dtype, order = 'F', copy = True) | ||
|
|
||
|
|
A detail but let's remove one redundant empty line.
| X = a.astype(a_dtype, order = 'F', copy = True) | ||
|
|
||
|
|
||
| MIN = min(n, p) |
Upper-case variables look like constants. How about min_mn?
I think k is a quite common convention in the context of SVD.
Also, could you change the other upper-cased variables back to lower case too?
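For reference, a minimal sketch of what k = min(m, n) means for the output shapes. It uses NumPy's `np.linalg.svd` as a CPU stand-in (an assumption; the PR itself targets `cupy.linalg.svd`), and `svd_shapes` is a hypothetical helper, not code from the PR.

```python
import numpy as np  # CPU stand-in; the PR itself changes cupy.linalg.svd


def svd_shapes(m, n, full_matrices):
    """Return the shapes of (u, s, vt) for a random m x n input."""
    rng = np.random.default_rng(0)
    a = rng.standard_normal((m, n))
    u, s, vt = np.linalg.svd(a, full_matrices=full_matrices)
    return u.shape, s.shape, vt.shape


if __name__ == "__main__":
    m, n = 5, 3
    k = min(m, n)  # number of singular values
    print(k, svd_shapes(m, n, full_matrices=False))
    print(k, svd_shapes(m, n, full_matrices=True))
```

With `full_matrices=False` the factors are (m, k), (k,) and (k, n); with `full_matrices=True` they are (m, m), (k,) and (n, n).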
| m, n = a.shape | ||
| x = a.transpose().astype(a_dtype, order='C', copy=True) | ||
|
|
||
| n, p = a.shape |
Why not keep the previous m?
Ye soz it was a quick refactor to test if my changes actually made SVD faster. So to make my life easier I decided to use n, p (statistics convention) rather than the general m, n.
Could you change it back to the previous naming convention?
| n, p = X.shape | ||
| else: | ||
| trans_flag = False | ||
| X = a.astype(a_dtype, order = 'F', copy = True) |
A style convention, but can you skip the whitespace around `=` when specifying kwargs?
| if a.flags.c_contiguous: | ||
| X = a.astype(a_dtype, order = 'C', copy = True).transpose() | ||
| else: | ||
| X = a.transpose().astype(a_dtype, order = 'F', copy = True) |
Could you add tests for this code path?
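A test for this branch could look roughly like the sketch below: feed the same matrix in C-contiguous and Fortran-contiguous layout and check that the results agree. This is only an illustration using NumPy as a stand-in (an assumption; a real test would call `cupy.linalg.svd` and live in the CuPy test suite), and `check_svd_layouts` is a hypothetical name.

```python
import numpy as np  # CPU stand-in for the cupy code path under review


def check_svd_layouts(m, n, seed=0):
    """Check that SVD gives the same result for C- and F-ordered inputs."""
    rng = np.random.default_rng(seed)
    a_c = np.ascontiguousarray(rng.standard_normal((m, n)))
    a_f = np.asfortranarray(a_c)  # same values, column-major memory layout
    assert a_c.flags.c_contiguous and a_f.flags.f_contiguous

    u, s_c, vt = np.linalg.svd(a_c, full_matrices=False)
    s_f = np.linalg.svd(a_f, compute_uv=False)

    same_singular_values = np.allclose(s_c, s_f)
    reconstructs_input = np.allclose((u * s_c) @ vt, a_c)
    return bool(same_singular_values and reconstructs_input)


if __name__ == "__main__":
    # Cover both the tall (m >= n) and the wide (m < n) case.
    print(check_svd_layouts(6, 4), check_svd_layouts(4, 6))
```

Running both a tall and a wide shape exercises each side of the `m >= n` / `m < n` split as well as both memory layouts.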
| trans_flag = True | ||
| mn = min(m, n) | ||
| # Perform SVD(X.T) not SVD(X) This reduces complexity from | ||
| # O(np^2) to O(n^2p) if p > n |
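The identity behind the comment above: if a.T = U S Vt, then a = Vt.T S U.T, so the factors of `a` can be read off from the SVD of its transpose. A minimal sketch, assuming NumPy as a stand-in for CuPy; `svd_via_transpose` is a hypothetical helper and not the PR's implementation.

```python
import numpy as np  # CPU stand-in; names below are illustrative only


def svd_via_transpose(a):
    """Reduced SVD of `a`, computed from the SVD of `a.T`.

    If a.T = U_t @ diag(s) @ Vt_t, then a = Vt_t.T @ diag(s) @ U_t.T.
    """
    u_t, s, vt_t = np.linalg.svd(a.T, full_matrices=False)
    return vt_t.T, s, u_t.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((3, 7))  # wide matrix: more columns than rows
    u, s, vt = svd_via_transpose(a)
    print(u.shape, s.shape, vt.shape)
    print(np.allclose((u * s) @ vt, a))
```

The singular values are unchanged by transposition; only the roles of the left and right singular vectors swap.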
Would it be possible to provide some benchmarking results (speed and memory)? Also does this comment mean that you didn't see any speed improvements? #2277 (comment)
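A speed comparison of the kind requested here could be sketched as below. It times NumPy's SVD on a matrix and on its transposed shape, purely as a CPU illustration (an assumption; on a GPU the device would also have to be synchronized before each clock read, and peak memory is not measured). `time_svd` and `compare_orientations` are hypothetical helpers.

```python
import time

import numpy as np  # CPU stand-in; a real benchmark would use cupy on a GPU


def time_svd(shape, repeats=3, seed=0):
    """Return the best wall-clock time (seconds) of a reduced SVD."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(shape)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        np.linalg.svd(a, full_matrices=False)
        best = min(best, time.perf_counter() - start)
    return best


def compare_orientations(m, n):
    """Time the SVD of an m x n matrix and of an n x m matrix."""
    return {"a": time_svd((m, n)), "a.T": time_svd((n, m))}


if __name__ == "__main__":
    for label, seconds in compare_orientations(200, 2000).items():
        print(f"{label}: {seconds:.4f} s")
```

Taking the best of several repeats reduces noise from warm-up and scheduling; no timings are quoted here because they depend entirely on the hardware and library versions.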
|
Oh hey @hvy :) Yep, so the PR I'm placing was primarily for speed improvements. I think @pentschev tested my code, and we found out that the latest version of cuSOLVER was far faster than the current version + it's faster than MKL. However, I guess the only benefit of my PR is that memory usage is reduced (i.e. by approx np or mn). Oh lol, yeah, the styling is probs from my C / C++ coding style. |
|
I see. It makes sense to make use of |
|
|
||
| if compute_uv: | ||
| if full_matrices: | ||
| u = cupy.empty((m, m), dtype=a_dtype) | ||
| vt = cupy.empty((n, n), dtype=a_dtype) | ||
| U = cupy.empty((n, n), dtype = a_dtype, order = 'F') |
Sorry another comment on the style but can you stick to the following?
| U = cupy.empty((n, n), dtype = a_dtype, order = 'F') | |
| U = cupy.empty((n, n), dtype=a_dtype, order='F') |
| workspace.data.ptr, buffersize, 0, dev_info.data.ptr) | ||
|
|
||
|
|
Let's skip this line.
|
@danielhanchen Do you still have interest in this PR? Otherwise, can I take this over or close? |
|
@takagi Hey! Sorry, I kind of forgot about this PR lol. Possibly it's best if someone else takes over or closes this. Thanks! |
|
Thanks for your response, then let me take it over. |
|
Let me close this PR for now and open another issue to reduce |
|
@danielhanchen Is it ensured that, if you remember, |
|
@takagi In https://github.com/cupy/cupy/blob/master/cupy/linalg/decomposition.py, both paths m >= n and m < n copy the input matrix A: In https://docs.nvidia.com/cuda/cusolver/index.html#cuds-lt-t-gt-gesvd, only if |
|
Thanks, I got it! |
@pentschev
Regarding our convo.
SVD can now auto overwrite the temporary array x. Can improve performance possibly (haven't tested yet).
Should support slightly larger data sizes also.