Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add apply_along_axis #4008

Merged
merged 7 commits into from
Oct 2, 2020
Merged

Add apply_along_axis #4008

merged 7 commits into from
Oct 2, 2020

Conversation

grlee77
Copy link
Contributor

@grlee77 grlee77 commented Sep 17, 2020

This PR implements apply_along_axis which is a utility to repeatedly apply a 1d function along a given axis, looping over all other axes.

This function is slightly simplified from NumPy's because it doesn't support a separate matrix class and doesn't try to preserve array subclasses like numpy.ma.core.MaskedArray via __array_wrap__ (neither masked arrays or __array_wrap__ are currently implemented by CuPy)

@grlee77
Copy link
Contributor Author

grlee77 commented Sep 17, 2020

In most cases, for ufuncs that take an axis argument it will be preferable to use that rather than this function. However, here is a demo of a case where reduction along a non-contiguous axis via cp.apply_along_axis is currently faster than using the axis argument for cp.mean (with CUB enabled):

import cupy as cp
from cupyx.time import repeat

a = cp.random.randn(1000000, 16)
perf1 = repeat(cp.mean, (a, 0), n_warmup=20, n_repeat=20)
perf2 = repeat(cp.apply_along_axis, (cp.mean, 0, a), n_warmup=20, n_repeat=20)

print("Result for cp.mean:")
print(perf1)

print("Result for cp.apply_along_axis:")
print(perf2)
Result for cp.mean:
mean                :    CPU:   31.015 us   +/-13.272 (min:   21.701 / max:   81.904) us     GPU-0:15901.830 us   +/-213.947 (min:15736.736 / max:16391.808) us
Result for cp.apply_along_axis:
apply_along_axis    :    CPU:  532.148 us   +/-26.029 (min:  504.228 / max:  612.164) us     GPU-0:10424.643 us   +/- 8.711 (min:10407.936 / max:10451.968) us

On the contrary, if I create a with shape (16, 1000000) and take the reduction along axis 1 (contiguous), the computation is much faster and the built-in cupy.mean comes out twice as fast.

a = cp.random.randn(16, 1000000)
perf1 = repeat(cp.mean, (a, 1), n_warmup=20, n_repeat=20)
perf2 = repeat(cp.apply_along_axis, (cp.mean, 1, a), n_warmup=20, n_repeat=20)

print("Result for cp.mean:")
print(perf1)

print("Result for cp.apply_along_axis:")
print(perf2)
Result for cp.mean:
mean                :    CPU:   67.724 us   +/-23.307 (min:   51.729 / max:  147.081) us     GPU-0:  394.077 us   +/-19.734 (min:  380.032 / max:  465.088) us
Result for cp.apply_along_axis:
apply_along_axis    :    CPU:  793.639 us   +/-105.092 (min:  723.818 / max: 1096.841) us     GPU-0:  800.146 us   +/-105.610 (min:  730.112 / max: 1104.896) us

cupy/lib/shape_base.py Outdated Show resolved Hide resolved
cupy/lib/shape_base.py Outdated Show resolved Hide resolved
@grlee77
Copy link
Contributor Author

grlee77 commented Sep 18, 2020

If the new scalar moveaxis function is not desired let me know and I will refactor to only include the minor Cython improvements to the existing moveaxis function.

This PR reduces the time for a call like cp.moveaxis(arr, 0, 1) from just over 3 us to around 1.8 us (or 1.5 us if source and destination axis are the same). I found four additional functions that use moveaxis with scalar axes only and may also benefit from this: linspace, pad, polynomial and linalg.product.

@grlee77
Copy link
Contributor Author

grlee77 commented Sep 18, 2020

The _normalize_axis_index helper in _routines_manipulation is a cdef copy of the one defined in cupy/_util.pyx. I'm not sure if the one in _util.pyx should now just call the one from _routines_manipulation internally?

@leofang
Copy link
Member

leofang commented Sep 21, 2020

The _normalize_axis_index helper in _routines_manipulation is a cdef copy of the one defined in cupy/_util.pyx. I'm not sure if the one in _util.pyx should now just call the one from _routines_manipulation internally?

Just my two cents, you should wait for @asi1024's reply: The helper should be moved to cupy/core/internal.pyx.

cupy/lib/shape_base.py Outdated Show resolved Hide resolved
cupy/lib/shape_base.py Outdated Show resolved Hide resolved
@grlee77
Copy link
Contributor Author

grlee77 commented Sep 21, 2020

Just my two cents, you should wait for @asi1024's reply: The helper should be moved to cupy/core/internal.pyx

That seems like a reasonable location. If we do that then the existing _normalize_axis_tuple should also probably be moved there.

The other style I see is in _routines.sorting.pyx, the functions _ndarray_sort, _ndarray_argsort, _ndarray_partition, _ndarray_argpartition all just use duplicate in-line code like the following:

if axis < 0:
axis += ndim
if not (0 <= axis < ndim):
raise numpy.AxisError('Axis out of range')

I am not sure if that helps avoid any function call overhead or if those should also just call this utility.

A couple of other places in the core module (_fusion_trace.pyx and _routines_math.pyx), currently use the version from cupy/_util.pyx.

I think my preference would be to move to internal as you suggest and just use this version for everything within core.

@asi1024 asi1024 added this to the v9.0.0a1 milestone Sep 28, 2020
'Cannot apply_along_axis when any iteration dimensions are 0'
)
# cupy.asarray needed in case func1d returns a scalar
res = cupy.asarray(func1d(inarr_view[ind0], *args, **kwargs))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If func1d returns a numpy.ndarray, it will be converted to cupy.ndarray in this line. However, ValueError will be raised in L64 only if the length of the reduction axis is greater than 1.
How about checking if the return value of func1d is scalar or cupy.ndarray value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See if the change in 3f6ec11 matches what your were suggesting? Now if func1d returns a non-scalar numpy array, a value ValueError ("non-scalar numpy.ndarray cannot be used for fill") will occur when trying to assign the NumPy result to buff.

cupy/lib/shape_base.py Outdated Show resolved Hide resolved
@asi1024
Copy link
Member

asi1024 commented Oct 1, 2020

I agree to move _normalize_axis_{index, indices} to cupy/core/internal.pyx too! I will work on it after this PR is merged.

@asi1024
Copy link
Member

asi1024 commented Oct 2, 2020

Jenkins, test this please.

@asi1024 asi1024 changed the title Add apply_along_axis Add apply_along_axis Oct 2, 2020
@chainer-ci
Copy link
Member

Jenkins CI test (for commit 3f6ec11, target branch master) succeeded!

@asi1024
Copy link
Member

asi1024 commented Oct 2, 2020

LGTM!

@asi1024 asi1024 merged commit f5417c4 into cupy:master Oct 2, 2020
@leofang leofang mentioned this pull request Oct 22, 2020
@grlee77 grlee77 deleted the apply_along_axis branch December 18, 2020 00:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants