Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make SVD overwrite temporary array x #2277

Closed
wants to merge 3 commits into from

Conversation

danielhanchen
Copy link

@pentschev
Regarding our convo.
SVD can now auto overwrite the temporary array x. Can improve performance possibly (haven't tested yet).
Should support slightly larger data sizes also.

@pentschev
SVD can now auto overwrite the temporary array x. Can improve performance possibly (haven't tested yet).
@@ -290,12 +309,14 @@ def svd(a, full_matrices=True, compute_uv=True):
raise linalg.LinAlgError(
'Parameter error (maybe caused by a bug in cupy.linalg?)')

# Note that the returned array may need to be transporsed
# Note that the returned array may need to be transprosed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nitpick but a typo.

Suggested change
# Note that the returned array may need to be transprosed
# Note that the returned array may need to be transposed

@hvy
Copy link
Member

hvy commented Jun 27, 2019

Thanks for the PR. Could you point us to the convo that you're mentioning? And as you touch upon it, it'd be great if you could provide some benchmarks as well.

@hvy hvy added the cat:performance Performance in terms of speed or memory consumption label Jun 27, 2019
@danielhanchen
Copy link
Author

Oh don't worry now :) It was about some performance issues of SVD. It seems like my changes didn't make SVD faster, just it reduced overall memory usage.
However, if memory reduction is useful, then I can continue on this PR :)

trans_flag = False
X = a.astype(a_dtype, order = 'F', copy = True)


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A detail but let's remove one redundant empty line.

Suggested change

X = a.astype(a_dtype, order = 'F', copy = True)


MIN = min(n, p)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upper case variables looks like constants. How about min_mn?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think k is a quite common convention in the context of SVD.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, could you change the other upper-cased variables back to lower case too?

m, n = a.shape
x = a.transpose().astype(a_dtype, order='C', copy=True)

n, p = a.shape
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not keep the previous m?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ye soz it was a quick refactor to test if my changes actually made SVD faster. So to make my life easier I decided to use n, p (statistics convention) rather than the general m, n.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change it back to the previous naming convention?

n, p = X.shape
else:
trans_flag = False
X = a.astype(a_dtype, order = 'F', copy = True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A style convention but can you skip the whitespaces when specifying kwargs?

if a.flags.c_contiguous:
X = a.astype(a_dtype, order = 'C', copy = True).transpose()
else:
X = a.transpose().astype(a_dtype, order = 'F', copy = True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add tests for this code path?

trans_flag = True
mn = min(m, n)
# Perform SVD(X.T) not SVD(X) This reduces complexity from
# O(np^2) to O(n^2p) if p > n
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to provide some benchmarking results (speed and memory)? Also does this comment mean that you didn't see any speed improvements? #2277 (comment)

@danielhanchen
Copy link
Author

Oh hey @hvy :) Yep so the PR Im placing was primarily for speed improvements. I think @pentschev tested my code, and we found out that the latest version of cuSOLVER was far faster than the current version + it's faster than MKL.

However, I guess the only benefit of my PR is that memory usage is reduced (ie by approx np or mn).
I haven't tested if internally in cuSOLVER memory is reduced though.

Oh loll ye the styling is probs from my C / C++ coding style.......

@hvy
Copy link
Member

hvy commented Jul 26, 2019

I see. It makes sense to make use of 'O' when it's possible but I think we still want some benchmarks (both mem and speed) before merging this PR, does that sound reasonable? Also, could you take a look at my comments above? :)


if compute_uv:
if full_matrices:
u = cupy.empty((m, m), dtype=a_dtype)
vt = cupy.empty((n, n), dtype=a_dtype)
U = cupy.empty((n, n), dtype = a_dtype, order = 'F')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry another comment on the style but can you stick to the following?

Suggested change
U = cupy.empty((n, n), dtype = a_dtype, order = 'F')
U = cupy.empty((n, n), dtype=a_dtype, order='F')

workspace.data.ptr, buffersize, 0, dev_info.data.ptr)


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's skip this line.

Suggested change

@hvy hvy added the st:awaiting-author Awaiting response from author label Oct 30, 2019
@emcastillo emcastillo added this to the v8.0.0b2 milestone Mar 23, 2020
@kmaehashi kmaehashi assigned takagi and unassigned hvy Mar 31, 2020
@takagi
Copy link
Member

takagi commented Apr 16, 2020

@danielhanchen Do you still have interest in this PR? Otherwise, can I take this over or close?

@danielhanchen
Copy link
Author

@takagi Hey! Sorry I kind of forget about this PR lol. Possibly it's best if someone else takes over / or closes this. Thanks!

@takagi
Copy link
Member

takagi commented Apr 17, 2020

Thanks for your response, then let me take it over.

@takagi takagi removed the st:awaiting-author Awaiting response from author label Apr 23, 2020
@asi1024 asi1024 modified the milestones: v8.0.0b2, v8.0.0b3 Apr 23, 2020
@takagi
Copy link
Member

takagi commented May 10, 2020

Let me close this PR for now and open another issue to reduce cupy.svd memory usage.

@takagi takagi closed this May 10, 2020
@takagi
Copy link
Member

takagi commented May 12, 2020

@danielhanchen Is it ensured that, if you remember, cusolverDn<t>gesvd() can safely take the same memory region for A and U (or VT)? I've read through cuSOLVER reference, but am not sure that. Just not officially documented?

@danielhanchen
Copy link
Author

@takagi In https://github.com/cupy/cupy/blob/master/cupy/linalg/decomposition.py, both paths m >=n and m < n copy the input matrix A: x = a.astype(a_dtype, order='C', copy=True) and x = a.transpose().astype(a_dtype, order='C', copy=True).

In https://docs.nvidia.com/cuda/cusolver/index.html#cuds-lt-t-gt-gesvd, only if full_matrices=False (which is generally the use case for SVD), jobu = 'S'/'O' and jobvt = 'S'/'O' already overwrites the data on A. (ie the docs mention that). So, to save an extra nn or mm memory space, reuse the already copied A.

@takagi
Copy link
Member

takagi commented May 12, 2020

Thanks, I got it!

@asi1024 asi1024 modified the milestones: v8.0.0b3, Closed PRs Dec 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cat:performance Performance in terms of speed or memory consumption
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants