`max` and `min` for float32 cupy array may be slow #2085

xu3kev · 2019-03-05T06:34:20Z

Conditions (you can just paste the output of python -c 'import cupy; cupy.show_config()')
- CuPy version = commit 2146ce2
- OS/Platform = Ubuntu 16.04/ V100
- CUDA version = 10.0
Code to reproduce

import cupy as cp
from contextlib import contextmanager

@contextmanager
def sync_time(name):
    start = cp.cuda.Event()
    end = cp.cuda.Event()
    start.record()
    start.synchronize()
    yield
    end.record()
    end.synchronize()
    t = cp.cuda.get_elapsed_time(start,end)
    print("{} : {} ms".format(name,t))

x = cp.random.normal(size=((400, 32, 28, 28))).astype(cp.float32)

with sync_time("cupy"):
    for i in range(1000):
        x.max()

x = cp.asnumpy(x)  # move the array to host (CPU) memory

with sync_time("numpy"):
    for i in range(1000):
        x.max()

Results

cupy : 8457.2451171875 ms
numpy : 3005.8154296875 ms

In this case, CuPy is slower than NumPy.

anaruse · 2019-03-05T11:49:29Z

"Reduction" operations in CuPy, including "max" and "min", are currently implemented in a rather general way and are not heavily optimized for performance, as far as I know. You may be able to get better performance from a reduction implementation in cuDNN, CUB, or Thrust, though CuPy does not use any of those for now.

CuPy team: Is anyone already working on improving the performance of "reduction" operations? I'm considering speeding up CuPy reductions with CUB. Are there any concerns about using CUB in CuPy?
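
For readers on a later CuPy release that ships a CUB backend, the following is a minimal sketch of how a CUB-backed reduction might be requested. It assumes the installed version honors the `CUPY_ACCELERATORS` environment variable and that the variable is set before `cupy` is imported; the helper names `enable_cub_backend` and `max_with_cub` are hypothetical and exist only for illustration.

```python
import os


def enable_cub_backend():
    """Request the CUB backend through the CUPY_ACCELERATORS variable.

    Assumption: the installed CuPy reads this variable at import time,
    so this must run before the first `import cupy`.
    """
    os.environ["CUPY_ACCELERATORS"] = "cub"
    return os.environ["CUPY_ACCELERATORS"]


def max_with_cub(shape=(400, 32, 28, 28)):
    """Return the max of a random float32 array, or None without CuPy."""
    enable_cub_backend()
    try:
        import cupy as cp  # imported late so the variable is already set
    except ImportError:
        return None
    x = cp.random.normal(size=shape).astype(cp.float32)
    # Full reduction over a contiguous array, the case reported above.
    return float(x.max())
```

Whether this changes the timings depends on the CuPy version, the array layout, and the reduction axis, so it is worth re-running the benchmark above with and without the variable set.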

samrere · 2022-04-11T12:31:46Z

Hi, as of the latest version of CuPy, the max function is still much slower than its NumPy equivalent:

cupy : 8153.27001953125 ms
numpy : 2700.735595703125 ms

May I please get more information on what may be a workaround? Thanks

asi1024 · 2022-04-11T13:16:39Z

#6549 resolved this issue.

cupy : 49.38854217529297 ms
numpy : 3057.1181640625 ms

anaruse mentioned this issue Mar 7, 2019

Use CUB to speed up sum/min/max #2090

Merged

kmaehashi added the cat:performance and pr-ongoing labels Mar 12, 2019

wonghang mentioned this issue Jul 8, 2019

Improve reduction performance by cudnnReduceTensor #2294

Closed

leofang mentioned this issue Sep 29, 2019

Fix bug in CUB + support complex numbers using CUB #2508

Closed

leofang mentioned this issue Apr 2, 2021

More documentation on the supported backends #5019

Merged

asi1024 closed this as completed in #5019 Apr 6, 2021

samrere mentioned this issue Apr 11, 2022

slow max/min #6645

Closed
