
Remove memory copy in matmul #6179

Merged: 10 commits into cupy:master on Dec 1, 2021

Conversation

@okuta (Member) commented Nov 30, 2021

The current implementation incurs extra memory consumption and an extra memory copy in the matmul operation. This PR removes them.
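For illustration, here is a minimal sketch (not code from the PR) of the behavior being fixed; the memory-pool check is an assumption about one way to observe the extra allocation:

import cupy as cp

a = cp.ones((1024, 1024), dtype=cp.float32)
b = cp.ones((1024, 1024), dtype=cp.float32)
out = cp.empty((1024, 1024), dtype=cp.float32)

pool = cp.get_default_memory_pool()
before = pool.total_bytes()
cp.matmul(a, b, out=out)  # should write the GEMM result directly into `out`
# Before this PR, a temporary result array was allocated and then copied
# into `out`, growing the pool by roughly one extra result-sized buffer.
print(pool.total_bytes() - before)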

@@ -449,7 +449,7 @@ cpdef ndarray tensordot_core(
         out = _ndarray_init(ret_shape, dtype)
     else:
         if out.dtype != dtype:
-            out = _ndarray_init(ret_shape, dtype)
+            raise NotImplementedError("The out array dtype is mismatched")
Member

I agree that the current implementation, which ignores out, seems wrong. I'm not sure whether this change is better than adding a comment like # TODO: Fix to write to out.

@jakirkham (Member) commented Jan 5, 2022

FWIW, NumPy's matmul simply errors in this case (though it does allow casting to lower precision, e.g. float64 to float32).

import numpy as np

a = np.random.random((2, 3))
b = np.random.random((3, 2))
c = np.empty((2, 2), dtype=int)

np.matmul(a, b, out=c)
---------------------------------------------------------------------------
UFuncTypeError                            Traceback (most recent call last)
<ipython-input-7-a4f34170f335> in <module>
      5 c = np.empty((2, 2), dtype=int)
      6 
----> 7 np.matmul(a, b, out=c)

UFuncTypeError: Cannot cast ufunc 'matmul' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'

Note: UFuncTypeError is just a TypeError subclass.
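For contrast, a quick check (assuming NumPy's default casting='same_kind' rules) that the lower-precision cast mentioned above is accepted:

import numpy as np

a = np.random.random((2, 3))
b = np.random.random((3, 2))
c = np.empty((2, 2), dtype=np.float32)

np.matmul(a, b, out=c)  # OK: float64 -> float32 is a 'same_kind' cast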

@@ -10,7 +10,8 @@
 from cupy.linalg import _util

 _gu_func_matmul = _GUFunc(
-    _core.matmul, '(n?,k),(k,m?)->(n?,m?)', supports_batched=True)
+    _core.matmul, '(n?,k),(k,m?)->(n?,m?)', supports_batched=True,
+    supports_out=True)
Member

Yes. This is necessary to eliminate the copy. The general out support in cupy._core._gufuncs._GUFunc cannot know that a C-contiguous output is assumed by the cuBLAS call in cupy._core.matmul.
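To illustrate the point, a hypothetical helper (not CuPy's actual code) showing the copy a generic wrapper needs when it cannot assume a C-contiguous out:

import cupy as cp

def matmul_into(a, b, out):
    # matmul_into is an illustrative name, not a CuPy API.
    if out.flags.c_contiguous:
        # The cuBLAS-backed kernel can write into `out` directly.
        return cp.matmul(a, b, out=out)
    tmp = cp.matmul(a, b)  # extra allocation ...
    out[...] = tmp         # ... plus a device-to-device copy
    return out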

@okuta (Member Author)

I think there is no problem in normal usage. Do you think any measures need to be taken?

Member

With supports_out=False (the default), cupy.matmul never hit this NotImplementedError. Now that supports_out=True is declared, the out support must be complete for correctness.

Please test something like

out = xp.zeros((2, 4), xp.float32)[::-1]
return xp.matmul(xp.ones((2, 3)), xp.ones((3, 4)), out=out)

and

out = xp.zeros((2, 4), bool)
xp.matmul(xp.ones((2, 3)), xp.ones((3, 4)), out=out, casting='unsafe')

BTW, I found a bug: cupy.matmul returns a view of out instead of out itself.
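A sketch of a regression check for that view bug (an assumption, not necessarily the test added in this PR):

import cupy as cp

out = cp.zeros((2, 4), cp.float32)
ret = cp.matmul(cp.ones((2, 3)), cp.ones((3, 4)), out=out)
assert ret is out  # must return `out` itself, not a view of it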

@kmaehashi added the cat:performance and prio:medium labels on Nov 30, 2021
@toslunar (Member) previously approved these changes Nov 30, 2021

LGTM

@toslunar (Member)

/test mini

okuta and others added 2 commits December 1, 2021 11:12
Co-authored-by: Toshiki Kataoka <tos.lunar@gmail.com>
@toslunar (Member) commented Dec 1, 2021

/test mini

@toslunar merged commit 48f00cc into cupy:master on Dec 1, 2021
@toslunar added this to the v11.0.0a1 milestone on Dec 8, 2021
@toslunar added a commit to toslunar/cupy that referenced this pull request on Dec 15, 2021
Labels: cat:performance, prio:medium