Add `apply_along_axis` #4008

grlee77 · 2020-09-17T22:03:59Z

This PR implements apply_along_axis, a utility that repeatedly applies a 1-D function along a given axis, looping over all other axes.

This function is slightly simplified from NumPy's: it doesn't support a separate matrix class and doesn't try to preserve array subclasses like numpy.ma.core.MaskedArray via __array_wrap__ (neither masked arrays nor __array_wrap__ is currently implemented by CuPy).
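For readers unfamiliar with the function, the looping strategy can be sketched in plain NumPy. This is a simplified illustration, not the code in this PR: the name `apply_along_axis_sketch` is hypothetical, NumPy stands in for CuPy, and only the basic shape handling is reproduced.

```python
import numpy as np


def apply_along_axis_sketch(func1d, axis, arr, *args, **kwargs):
    """Hypothetical simplified apply_along_axis (NumPy stand-in, not CuPy's code)."""
    arr = np.asarray(arr)
    ndim = arr.ndim
    if not -ndim <= axis < ndim:
        raise ValueError(f"axis {axis} is out of bounds for {ndim}-d array")
    axis %= ndim

    # Move the target axis to the end so every 1-D slice is arr_view[idx].
    arr_view = np.moveaxis(arr, axis, -1)
    outer_shape = arr_view.shape[:-1]
    if 0 in outer_shape:
        raise ValueError(
            "Cannot apply_along_axis when any iteration dimensions are 0")

    # Apply func1d to every 1-D slice, looping over all other axes.
    indices = list(np.ndindex(*outer_shape))
    results = [np.asarray(func1d(arr_view[idx], *args, **kwargs))
               for idx in indices]

    # Allocate the output from the first result's shape and dtype.
    first = results[0]
    out = np.empty(outer_shape + first.shape, dtype=first.dtype)
    for idx, res in zip(indices, results):
        out[idx] = res

    # The result's own axes sit at the end; move them to where `axis` was.
    n_outer = len(outer_shape)
    src = list(range(n_outer, out.ndim))
    dst = list(range(axis, axis + first.ndim))
    return np.moveaxis(out, src, dst)
```

Allocating the output only after seeing the first result is what lets func1d return a scalar, a vector, or a higher-dimensional array; the final moveaxis places those result axes where the original axis was, matching NumPy's output shape.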

grlee77 · 2020-09-17T22:04:07Z

In most cases, for ufuncs that take an axis argument, it is preferable to use that argument rather than this function. However, here is a demo of a case where a reduction along a non-contiguous axis via cp.apply_along_axis is currently faster than using the axis argument of cp.mean (with CUB enabled):

```python
import cupy as cp
from cupyx.time import repeat

a = cp.random.randn(1000000, 16)
perf1 = repeat(cp.mean, (a, 0), n_warmup=20, n_repeat=20)
perf2 = repeat(cp.apply_along_axis, (cp.mean, 0, a), n_warmup=20, n_repeat=20)

print("Result for cp.mean:")
print(perf1)

print("Result for cp.apply_along_axis:")
print(perf2)
```

```
Result for cp.mean:
mean                :    CPU:   31.015 us   +/-13.272 (min:   21.701 / max:   81.904) us     GPU-0:15901.830 us   +/-213.947 (min:15736.736 / max:16391.808) us
Result for cp.apply_along_axis:
apply_along_axis    :    CPU:  532.148 us   +/-26.029 (min:  504.228 / max:  612.164) us     GPU-0:10424.643 us   +/- 8.711 (min:10407.936 / max:10451.968) us
```
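Both calls in this benchmark compute the same per-column means, so only the timing differs. A quick sanity check, using NumPy as a CPU stand-in for CuPy (assuming the CuPy functions mirror NumPy's argument order, as cp.apply_along_axis(func1d, axis, arr) does here) and a smaller array than the benchmark:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((10000, 16))  # smaller than the 1000000 x 16 benchmark array

# Reduction via the built-in axis argument.
by_axis = np.mean(a, 0)
# Same reduction via apply_along_axis: one 1-D mean per column (16 calls).
by_apply = np.apply_along_axis(np.mean, 0, a)

print(by_axis.shape, by_apply.shape)   # (16,) (16,)
print(np.allclose(by_axis, by_apply))  # True
```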

Conversely, if I create the array a with shape (16, 1000000) and reduce along axis 1 (contiguous), the computation is much faster and the built-in cupy.mean comes out about twice as fast.

```python
a = cp.random.randn(16, 1000000)
perf1 = repeat(cp.mean, (a, 1), n_warmup=20, n_repeat=20)
perf2 = repeat(cp.apply_along_axis, (cp.mean, 1, a), n_warmup=20, n_repeat=20)

print("Result for cp.mean:")
print(perf1)

print("Result for cp.apply_along_axis:")
print(perf2)
```

```
Result for cp.mean:
mean                :    CPU:   67.724 us   +/-23.307 (min:   51.729 / max:  147.081) us     GPU-0:  394.077 us   +/-19.734 (min:  380.032 / max:  465.088) us
Result for cp.apply_along_axis:
apply_along_axis    :    CPU:  793.639 us   +/-105.092 (min:  723.818 / max: 1096.841) us     GPU-0:  800.146 us   +/-105.610 (min:  730.112 / max: 1104.896) us
```
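The distinction drawn between the two benchmarks is memory layout. In a C-ordered array of shape (N, 16), consecutive elements along axis 0 are 16 items apart in memory, while those along axis 1 are adjacent. NumPy's strides and flags show this directly (C-ordered CuPy arrays follow the same layout rules); a smaller N is used here:

```python
import numpy as np

a = np.zeros((1000, 16))  # C-ordered float64, itemsize 8 bytes
print(a.strides)          # (128, 8): one step along axis 0 skips 16 * 8 bytes

column = a[:, 0]  # 1-D slice along axis 0 (strided)
row = a[0, :]     # 1-D slice along axis 1 (contiguous)
print(column.strides, column.flags['C_CONTIGUOUS'])  # (128,) False
print(row.strides, row.flags['C_CONTIGUOUS'])        # (8,) True
```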

cupy/lib/shape_base.py

grlee77 · 2020-09-18T17:08:48Z

If the new scalar moveaxis function is not desired, let me know and I will refactor to include only the minor Cython improvements to the existing moveaxis function.

This PR reduces the time for a call like cp.moveaxis(arr, 0, 1) from just over 3 us to around 1.8 us (or 1.5 us if the source and destination axes are the same). I found four additional functions that use moveaxis with scalar axes only and may also benefit from this: linspace, pad, polynomial and linalg.product.
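The idea behind a scalar fast path can be sketched in Python: with a single source and destination axis, the permutation is built by removing one axis index and reinserting it, and the case where source equals destination can return early. This is a hypothetical illustration (`moveaxis_scalar_sketch` is not a CuPy function, and NumPy arrays stand in for CuPy ones); the actual Cython code in the PR may differ.

```python
import numpy as np


def moveaxis_scalar_sketch(a, source, destination):
    """Hypothetical single-axis moveaxis; not CuPy's implementation."""
    ndim = a.ndim  # read once instead of repeatedly accessing a.ndim
    for ax in (source, destination):
        if not -ndim <= ax < ndim:
            raise ValueError(f"axis {ax} is out of bounds for {ndim}-d array")
    source %= ndim
    destination %= ndim

    # Cheapest case: nothing moves, so just return a new view.
    if source == destination:
        return a.view()

    # Remove `source` from the identity permutation, reinsert it at
    # `destination`, then transpose once.
    axes = [i for i in range(ndim) if i != source]
    axes.insert(destination, source)
    return a.transpose(axes)
```

Compared with the general tuple-based moveaxis, this skips normalizing and sorting sequences of axes, which is where a scalar-only path can save time.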

grlee77 · 2020-09-18T17:40:10Z

The _normalize_axis_index helper in _routines_manipulation is a cdef copy of the one defined in cupy/_util.pyx. I'm not sure if the one in _util.pyx should now just call the one from _routines_manipulation internally?

leofang · 2020-09-21T14:20:12Z

> The _normalize_axis_index helper in _routines_manipulation is a cdef copy of the one defined in cupy/_util.pyx. I'm not sure if the one in _util.pyx should now just call the one from _routines_manipulation internally?

Just my two cents, you should wait for @asi1024's reply: The helper should be moved to cupy/core/internal.pyx.

cupy/lib/shape_base.py

grlee77 · 2020-09-21T16:48:44Z

> Just my two cents, you should wait for @asi1024's reply: The helper should be moved to cupy/core/internal.pyx

That seems like a reasonable location. If we do that, the existing _normalize_axis_tuple should probably be moved there as well.

The other style I see is in _routines_sorting.pyx: the functions _ndarray_sort, _ndarray_argsort, _ndarray_partition and _ndarray_argpartition all duplicate the check inline, like the following:

cupy/cupy/core/_routines_sorting.pyx

Lines 34 to 37 in dd2fb09

    
```python
if axis < 0:
    axis += ndim
if not (0 <= axis < ndim):
    raise numpy.AxisError('Axis out of range')
```

I am not sure whether inlining helps avoid function call overhead or whether those functions should also just call this utility.
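For comparison, the inline check can be factored into a small helper with the same logic that returns the normalized axis. The sketch is hypothetical: the name `normalize_axis_index_sketch` is illustrative, and it raises a plain ValueError where the Cython code raises numpy.AxisError (a ValueError subclass).

```python
def normalize_axis_index_sketch(axis, ndim):
    """Hypothetical helper with the same logic as the inline sorting-code check."""
    if axis < 0:
        axis += ndim
    if not (0 <= axis < ndim):
        raise ValueError('Axis out of range')
    return axis


print(normalize_axis_index_sketch(-1, 3))  # 2
print(normalize_axis_index_sketch(1, 3))   # 1
```

In Cython, declaring such a helper as cdef (as the _routines_manipulation copy is) keeps the call itself cheap, which is the trade-off being weighed against duplicating the lines inline.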

A couple of other places in the core module (_fusion_trace.pyx and _routines_math.pyx) currently use the version from cupy/_util.pyx.

I think my preference would be to move to internal as you suggest and just use this version for everything within core.

asi1024 · 2020-10-01T08:47:45Z

cupy/lib/shape_base.py

```diff
+            'Cannot apply_along_axis when any iteration dimensions are 0'
+        )
+    # cupy.asarray needed in case func1d returns a scalar
+    res = cupy.asarray(func1d(inarr_view[ind0], *args, **kwargs))
```

If func1d returns a numpy.ndarray, it will be converted to a cupy.ndarray in this line. However, a ValueError will be raised at L64 only if the length of the reduction axis is greater than 1.
How about checking whether the return value of func1d is a scalar or a cupy.ndarray?

See if the change in 3f6ec11 matches what you were suggesting. Now, if func1d returns a non-scalar NumPy array, a ValueError ("non-scalar numpy.ndarray cannot be used for fill") will occur when trying to assign the NumPy result to buff.
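The rule in commit 3f6ec11 ("only call cupy.asarray on scalar results") can be illustrated with a hypothetical helper. Here np.isscalar is an assumption about how a scalar might be detected, and np.asarray stands in for cupy.asarray, so this only shows the branching, not the CuPy code: scalars are wrapped into 0-d arrays, while array results pass through untouched, leaving a host/device mismatch to surface later when the result is assigned into the output buffer.

```python
import numpy as np


def wrap_scalar_result(res, asarray=np.asarray):
    """Hypothetical: wrap scalar func1d results, pass arrays through unchanged."""
    if np.isscalar(res):
        # e.g. a Python float or a NumPy scalar such as np.float64(2.5)
        return asarray(res)
    # Arrays (including 0-d arrays) are returned as-is, so a NumPy array
    # is not silently converted at this point.
    return res


wrapped = wrap_scalar_result(np.float64(2.5))
print(type(wrapped).__name__, wrapped.ndim)  # ndarray 0
arr = np.arange(3)
print(wrap_scalar_result(arr) is arr)        # True
```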

cupy/lib/shape_base.py

asi1024 · 2020-10-01T08:58:56Z

I agree with moving _normalize_axis_{index, indices} to cupy/core/internal.pyx too! I will work on it after this PR is merged.

asi1024 · 2020-10-02T04:49:24Z

Jenkins, test this please.

chainer-ci · 2020-10-02T05:25:07Z

Jenkins CI test (for commit 3f6ec11, target branch master) succeeded!

asi1024 · 2020-10-02T16:49:02Z

LGTM!

kmaehashi added the cat:feature (New features/APIs) and prio:medium labels Sep 18, 2020

kmaehashi assigned asi1024 Sep 18, 2020

asi1024 reviewed Sep 18, 2020

cupy/lib/shape_base.py (outdated, resolved)

cupy/lib/shape_base.py (outdated, resolved)

leofang reviewed Sep 21, 2020

cupy/lib/shape_base.py (outdated, resolved)

leofang reviewed Sep 21, 2020

cupy/lib/shape_base.py (outdated, resolved)

asi1024 added this to the v9.0.0a1 milestone Sep 28, 2020

asi1024 reviewed Oct 1, 2020

grlee77 added 7 commits October 1, 2020 15:37

- c512eb3 ENH: add apply_along_axis
  DOC: add apply_along_axis to the API docs
- eb99be3 simplify using cupy.moveaxis
- f037063 ENH: add faster moveaxis code path for integer case
  improve efficiency of _routines_manipulation.moveaxis by avoiding repeated acces of a.ndim TST: add tests for invalid scalar inputs to moveaxis
- 99e1f4b remove unused import
- f5a72ce use cupy.empty for the buffer
- 3b2b63a MAINT: simplify code permuting the axes for the output in apply_along…
  …_axes
- 3f6ec11 only call cupy.asarray on scalar results

grlee77 force-pushed the apply_along_axis branch from e968b55 to 3f6ec11 October 1, 2020 19:57

asi1024 changed the title from "Add apply_along_axis" to "Add `apply_along_axis`" Oct 2, 2020

asi1024 merged commit f5417c4 into cupy:master Oct 2, 2020

asi1024 mentioned this pull request Oct 5, 2020

Move _normalize_axis_index to cupy/core/internal.pyx #4057

Merged

leofang mentioned this pull request Oct 22, 2020

apply_along_axis #2332

Closed

grlee77 deleted the apply_along_axis branch December 18, 2020 00:16

leofang mentioned this pull request Mar 8, 2021

Initializing CuPy array ~30x slower than NumPy #4767

Closed
