
Remove memory copy in matmul #6179

Merged: 10 commits into cupy:master on Dec 1, 2021

Conversation

@okuta (Member) commented Nov 30, 2021

The current implementation incurs extra memory consumption and an extra memory copy in the matmul operation. This PR removes them.
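For illustration, here is a minimal sketch (not code from the PR) of the behavior being fixed; the memory-pool check is an assumption about one way to observe the extra allocation:

import cupy as cp

a = cp.ones((1024, 1024), dtype=cp.float32)
b = cp.ones((1024, 1024), dtype=cp.float32)
out = cp.empty((1024, 1024), dtype=cp.float32)

pool = cp.get_default_memory_pool()
before = pool.total_bytes()
cp.matmul(a, b, out=out)  # should write the GEMM result directly into `out`
# Before this PR, a temporary result array was allocated and then copied
# into `out`, growing the pool by roughly one extra result-sized buffer.
print(pool.total_bytes() - before)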

@@ -449,7 +449,7 @@ cpdef ndarray tensordot_core(
         out = _ndarray_init(ret_shape, dtype)
     else:
         if out.dtype != dtype:
-            out = _ndarray_init(ret_shape, dtype)
+            raise NotImplementedError("The out array dtype is mismatched")
Member

I agree that the current implementation, which ignores out, seems wrong. I'm not sure whether this change is better than adding a comment like # TODO: Fix to write to out.

@jakirkham (Member) commented Jan 5, 2022

FWIW, NumPy's matmul simply errors in this case (though it does allow casting to lower precision, e.g. float64 to float32).

import numpy as np

a = np.random.random((2, 3))
b = np.random.random((3, 2))
c = np.empty((2, 2), dtype=int)

np.matmul(a, b, out=c)
---------------------------------------------------------------------------
UFuncTypeError                            Traceback (most recent call last)
<ipython-input-7-a4f34170f335> in <module>
      5 c = np.empty((2, 2), dtype=int)
      6 
----> 7 np.matmul(a, b, out=c)

UFuncTypeError: Cannot cast ufunc 'matmul' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'

Note: UFuncTypeError is just a TypeError subclass.
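For contrast, a quick check (assuming NumPy's default casting='same_kind' rules) that the lower-precision cast mentioned above is accepted:

import numpy as np

a = np.random.random((2, 3))
b = np.random.random((3, 2))
c = np.empty((2, 2), dtype=np.float32)

np.matmul(a, b, out=c)  # OK: float64 -> float32 is a 'same_kind' cast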

@@ -10,7 +10,8 @@
 from cupy.linalg import _util

 _gu_func_matmul = _GUFunc(
-    _core.matmul, '(n?,k),(k,m?)->(n?,m?)', supports_batched=True)
+    _core.matmul, '(n?,k),(k,m?)->(n?,m?)', supports_batched=True,
+    supports_out=True)
Member

Yes. This is necessary to eliminate the copy. The general out support in cupy._core._gufuncs._GUFunc cannot know that a C-contiguous output is assumed by the cuBLAS call in cupy._core.matmul.
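To illustrate the point, a hypothetical helper (not CuPy's actual code) showing the copy a generic wrapper needs when it cannot assume a C-contiguous out:

import cupy as cp

def matmul_into(a, b, out):
    # matmul_into is an illustrative name, not a CuPy API.
    if out.flags.c_contiguous:
        # The cuBLAS-backed kernel can write into `out` directly.
        return cp.matmul(a, b, out=out)
    tmp = cp.matmul(a, b)  # extra allocation ...
    out[...] = tmp         # ... plus a device-to-device copy
    return out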

@okuta (Member Author)

I think there is no problem in normal usage. Do you think any measures need to be taken?

Member

With supports_out=False (the default), cupy.matmul never hit this NotImplementedError. Now that supports_out=True is declared, the out support must be complete for correctness.

Please test something like

out = xp.zeros((2, 4), xp.float32)[::-1]
return xp.matmul(xp.ones((2, 3)), xp.ones((3, 4)), out=out)

and

out = xp.zeros((2, 4), bool)
xp.matmul(xp.ones((2, 3)), xp.ones((3, 4)), out=out, casting='unsafe')

BTW, I found a bug: cupy.matmul returns a view of out instead of out itself.
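A sketch of a regression check for that view bug (an assumption, not necessarily the test added in this PR):

import cupy as cp

out = cp.zeros((2, 4), cp.float32)
ret = cp.matmul(cp.ones((2, 3)), cp.ones((3, 4)), out=out)
assert ret is out  # must return `out` itself, not a view of it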

@kmaehashi added the cat:performance and prio:medium labels on Nov 30, 2021
@toslunar (Member) previously approved these changes Nov 30, 2021

LGTM

@toslunar (Member)

/test mini

okuta and others added 2 commits December 1, 2021 11:12
Co-authored-by: Toshiki Kataoka <tos.lunar@gmail.com>
@toslunar (Member) commented Dec 1, 2021

/test mini

@toslunar merged commit 48f00cc into cupy:master on Dec 1, 2021
@toslunar added this to the v11.0.0a1 milestone on Dec 8, 2021
@toslunar added a commit to toslunar/cupy that referenced this pull request on Dec 15, 2021
Labels: cat:performance, prio:medium