Use cuTENSOR in `cupy.prod`, `cupy.max`, `cupy.min`, `cupy.ptp` and `cupy.mean` #3765

Conversation
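This PR routes five reductions to cuTENSOR when that accelerator is enabled. As a reference for what those reductions compute, here is a minimal sketch using NumPy, whose semantics `cupy.prod`/`max`/`min`/`ptp`/`mean` mirror on the GPU; the array and values below are illustrative only, not taken from the PR:

```python
import numpy as np

# NumPy stand-in for the five reductions this PR accelerates in CuPy;
# cupy.prod/max/min/ptp/mean follow the same semantics on device arrays.
a = np.arange(1, 7, dtype=np.float64).reshape(2, 3)  # [[1, 2, 3], [4, 5, 6]]

print(np.prod(a, axis=1))  # per-row product: 6 and 120
print(np.max(a, axis=1))   # per-row maximum: 3 and 6
print(np.min(a, axis=1))   # per-row minimum: 1 and 4
print(np.ptp(a, axis=1))   # per-row peak-to-peak (max - min): 2 and 2
print(np.mean(a, axis=1))  # per-row mean: 2 and 5
```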
Nice! I was wondering when this PR would show up! 😁 Have you done any benchmarks?

pfnCI, test this please.

Jenkins, test this please.

Jenkins CI test (for commit be9bd9c, target branch master) failed with status FAILURE.

Jenkins, test this please.

Jenkins CI test (for commit 5285907, target branch master) failed with status FAILURE.

@asi1024 Could you check test failures?
Co-authored-by: Kenichi Maehashi <webmaster@kenichimaehashi.com>
Force-pushed from 5285907 to 2ccad4d.
Rebased.

pfnCI, test this please.
pfnCI, test this please.

Jenkins CI test (for commit a45dae1, target branch master) failed with status FAILURE.

pfnCI, test this please.

Jenkins CI test (for commit a45dae1, target branch master) failed with status FAILURE.
Review comment on `cupy/core/_routines_statistics.pyx` (outdated):

```
@@ -7,6 +7,7 @@ import cupy
 from cupy.core import _reduction
 from cupy.core._reduction import create_reduction_func
 from cupy.core._reduction import ReductionKernel
+from cupy import cutensor
```

This line needs to be removed.
pfnCI, test this please.

Jenkins CI test (for commit 28ba922, target branch master) failed with status FAILURE.

Jenkins, test this please.

Jenkins CI test (for commit 28ba922, target branch master) failed with status FAILURE.

pfnCI, test this please.

Jenkins CI test (for commit 28ba922, target branch master) succeeded!

LGTM!
Looks like each backend has its own strength and there's no clear dominator.

```python
import cupy as cp
import numpy as np
from cupyx.time import repeat

# cp.show_config()

# CUB_device_targets = ['sum', 'prod', 'min', 'max', 'argmax', 'argmin',
#                       'cumsum', 'cumprod', 'mean']
# full_targets = CUB_device_targets + \
#     ['amin', 'amax', 'nanmin', 'nanmax', 'nanargmin', 'nanargmax',
#      'nanmean',
#      'var', 'nanvar', 'nansum', 'nanprod',
#      'all', 'any', 'count_nonzero']
CUB_device_targets = ['sum', 'prod', 'min', 'max', 'ptp', 'mean']
full_targets = CUB_device_targets

dtypes = [cp.float32, cp.float64, cp.complex64, cp.complex128]
shape = (512, 512, 512)
axes = ((2,), (1, 2), (0, 1, 2))

for dtype in dtypes:
    a = cp.random.random(shape)
    if dtype in (cp.complex64, cp.complex128):
        a = a + 1j*cp.random.random(shape)
    a = a.astype(dtype)
    for target in full_targets:
        for axis in axes:
            if target in ('argmax', 'argmin', 'nanargmax', 'nanargmin',
                          'cumsum', 'cumprod'):
                if len(axis) != a.ndim:
                    continue
                else:
                    axis = None  # NumPy limitation
            if (dtype in (cp.complex64, cp.complex128)
                    and target in ('nanmin', 'nanmax', 'var', 'nanvar')):
                continue
            print(f"testing {target} with dtype={dtype} and axes={axis}...")
            func_cp = getattr(cp, target)
            func_np = getattr(np, target)

            cp.core.set_routine_accelerators([])
            cp.core.set_reduction_accelerators([])
            print(repeat(func_cp, (a, axis), n_repeat=20, name=f'no acce, {target}'))
            if not cp.allclose(func_cp(a, axis), func_np(cp.asnumpy(a), axis)):
                print(f"WARNING: CuPy's kernel might have a problem with {target} and {dtype}")

            if target in CUB_device_targets:
                cp.core.set_routine_accelerators(['cub'])
                cp.core.set_reduction_accelerators([])
                print(repeat(func_cp, (a, axis), n_repeat=20, name=f'CUB device, {target}'))
                if not cp.allclose(func_cp(a, axis), func_np(cp.asnumpy(a), axis)):
                    print(f"WARNING: CUB device might have a problem with {target} and {dtype}")

            cp.core.set_routine_accelerators([])
            cp.core.set_reduction_accelerators(['cub'])
            print(repeat(func_cp, (a, axis), n_repeat=20, name=f'CUB block, {target}'))
            if not cp.allclose(func_cp(a, axis), func_np(cp.asnumpy(a), axis)):
                print(f"WARNING: CUB block might have a problem with {target} and {dtype}")

            cp.core.set_routine_accelerators(['cutensor'])
            cp.core.set_reduction_accelerators([])
            print(repeat(func_cp, (a, axis), n_repeat=20, name=f'cuTENSOR, {target}'))
            if not cp.allclose(func_cp(a, axis), func_np(cp.asnumpy(a), axis)):
                print(f"WARNING: cuTENSOR might have a problem with {target} and {dtype}")
```

Output (on the master branch + GTX 2080 Ti + CUDA 10.2):
Blocked by #3700 and #3732.