Use cuTENSOR in cupy.sum #2939

Merged: 4 commits, Jul 27, 2020
Changes from 3 commits
16 changes: 14 additions & 2 deletions cupy/core/_routines_math.pyx
@@ -23,6 +23,13 @@ if not cupy.cuda.runtime.is_hip:
else:
    cub = None

if cupy.cuda.cutensor_enabled:
    import cupy_backends.cuda.libs.cutensor as cuda_cutensor

Member: Is this needed?

Member: Yes, we want to move everything to cupy_backends for these libs.

Member: Ah OK.

    from cupy import cutensor
else:
    cuda_cutensor = None
    cutensor = None


# ndarray members

@@ -95,12 +102,17 @@ cdef ndarray _ndarray_prod(ndarray self, axis, dtype, out, keepdims):

cdef ndarray _ndarray_sum(ndarray self, axis, dtype, out, keepdims):
    for accelerator in _accelerator._routine_accelerators:
        result = None
        if accelerator == _accelerator.ACCELERATOR_CUB:
            # result will be None if the reduction is not compatible with CUB
            result = cub.cub_reduction(
                self, cub.CUPY_CUB_SUM, axis, dtype, out, keepdims)
            if result is not None:
                return result
        if accelerator == _accelerator.ACCELERATOR_CUTENSOR:

Member: I think you need to check if cutensor is None?

Member: Does _accelerator allow setting ACCELERATOR_CUTENSOR if cutensor is not available? If that's the case, then just checking this is fine.

Member Author (asi1024, Jul 22, 2020): I don't want the routines to fall back silently to CuPy's default reduction in such cases. I will fix _set_{routine/reduction}_accelerator in another PR.

Member: By "in such cases" did you mean the library is absent but the user still requests to use it? If so, I think the current implementation makes sense!

            result = cutensor._try_reduction_routine(
                self, axis, dtype, out, keepdims, cuda_cutensor.OP_ADD, 1, 0)
            if result is not None:
                return result

    if dtype is None:
        return _sum_auto_dtype(self, axis, dtype, out, keepdims)
    else:
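
As a usage note for the dispatch loop above: here is a minimal sketch of how the cuTENSOR path could be exercised, assuming the accelerator list is configured through cupy.core._accelerator.set_routine_accelerators (the setter the author mentions revisiting in another PR); the exact configuration API may differ between CuPy versions.

```python
# Hedged sketch, not part of this PR: enable the cuTENSOR-backed reduction
# for cupy.sum, assuming set_routine_accelerators is available and accepts
# the string names 'cutensor' and 'cub'.
import cupy
from cupy.core import _accelerator

# Try cuTENSOR first, then CUB; when _try_reduction_routine (or the CUB
# helper) returns None, _ndarray_sum falls through to the default kernel.
_accelerator.set_routine_accelerators(['cutensor', 'cub'])

x = cupy.random.random((256, 512)).astype(cupy.float32)
y = x.sum(axis=1)  # goes through the dispatch loop shown above
```
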
64 changes: 64 additions & 0 deletions cupy/cutensor.py
@@ -5,6 +5,7 @@
from cupy_backends.cuda.api import runtime
from cupy.cuda import cutensor
from cupy.cuda import device
from cupy.core import _reduction

_handles = {}
_tensor_descriptors = {}
@@ -496,3 +497,66 @@ def reduction(alpha, A, desc_A, mode_A, beta, C, desc_C, mode_C,
        out.data.ptr, desc_C, mode_C.data,
        reduce_op, cutensor_dtype, ws.data.ptr, ws_size)
    return out


_cutensor_dtypes = [
    # TODO(asi1024): Support float16
    # numpy.float16,
    numpy.float32,
    numpy.float64,
    numpy.complex64,
    numpy.complex128,
]


def _try_reduction_routine(x, axis, dtype, out, keepdims, op, alpha, beta):
    if dtype is None:
        dtype = x.dtype

    if dtype not in _cutensor_dtypes:
        return None
    if dtype != x.dtype:
        return None

    if x.size == 0:
        return None
    if not x._c_contiguous:
        # TODO(asi1024): Support also for F-contiguous array
        return None

    in_arg = x

    reduce_axis, out_axis = _reduction._get_axis(axis, x.ndim)
    out_shape = _reduction._get_out_shape(
        x.shape, reduce_axis, out_axis, keepdims)
    if out is None:
        out = cupy.empty(out_shape, dtype)
    elif out.shape != out_shape:
        # TODO(asi1024): Support broadcast
        return None
    elif out.dtype != dtype:
        return None
    elif not out._c_contiguous:
        # TODO(asi1024): Support also for F-contiguous array
        return None

    if keepdims:
        out_arg = out.reshape(
            _reduction._get_out_shape(x.shape, reduce_axis, out_axis, False))
    else:
        out_arg = out

    # TODO(asi1024): Remove temporary fix
    in_arg._set_contiguous_strides(in_arg.itemsize, True)
    out_arg._set_contiguous_strides(out_arg.itemsize, True)

    desc_in = create_tensor_descriptor(in_arg)
    desc_out = create_tensor_descriptor(out_arg)
    mode_in = list(range(in_arg.ndim))
    mode_out = [axis for axis in mode_in if (axis not in reduce_axis)]

    reduction(
        alpha, in_arg, desc_in, mode_in, beta, out_arg, desc_out, mode_out,
        op, dtype)

    return out
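
To make the mode bookkeeping at the end of _try_reduction_routine concrete, here is a small NumPy-only illustration (no cuTENSOR required; reduce_axis and out_axis are recomputed by hand rather than via _reduction._get_axis) for a sum over axis 1 of a (2, 3, 4) array.

```python
import numpy

# Hedged illustration of the mode bookkeeping above, using plain NumPy.
x = numpy.arange(24, dtype=numpy.float32).reshape(2, 3, 4)
axis = 1

reduce_axis = (axis,)  # axes collapsed by the reduction
out_axis = tuple(a for a in range(x.ndim) if a not in reduce_axis)  # (0, 2)

# Each dimension gets a mode label; the output keeps only the labels of the
# surviving axes, mirroring how mode_in / mode_out are built above.
mode_in = list(range(x.ndim))                            # [0, 1, 2]
mode_out = [a for a in mode_in if a not in reduce_axis]  # [0, 2]

out_shape = tuple(x.shape[a] for a in out_axis)          # (2, 4)
assert numpy.sum(x, axis=axis).shape == out_shape
```
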