max and min for float32 cupy array may be slow #2085

Closed
xu3kev opened this issue Mar 5, 2019 · 3 comments · Fixed by #5019
Labels
cat:performance Performance in terms of speed or memory consumption pr-ongoing

Comments

xu3kev commented Mar 5, 2019

  • Conditions (you can just paste the output of python -c 'import cupy; cupy.show_config()')
    • CuPy version = commit 2146ce2
    • OS/Platform = Ubuntu 16.04/ V100
    • CUDA version = 10.0
  • Code to reproduce
import cupy as cp
from contextlib import contextmanager

@contextmanager
def sync_time(name):
    # Time the enclosed block with CUDA events. Synchronizing on the end
    # event waits for all GPU work queued before it to finish.
    start = cp.cuda.Event()
    end = cp.cuda.Event()
    start.record()
    start.synchronize()
    yield
    end.record()
    end.synchronize()
    t = cp.cuda.get_elapsed_time(start, end)  # elapsed time in milliseconds
    print("{} : {} ms".format(name, t))

# float32 array of shape (400, 32, 28, 28) on the GPU
x = cp.random.normal(size=(400, 32, 28, 28)).astype(cp.float32)

with sync_time("cupy"):
    for i in range(1000):
        x.max()

x = cp.asnumpy(x)  # move to CPU

with sync_time("numpy"):
    for i in range(1000):
        x.max()
  • Results
cupy : 8457.2451171875 ms
numpy : 3005.8154296875 ms

In this case, CuPy is slower than NumPy.

Contributor

anaruse commented Mar 5, 2019

As far as I know, reduction operations in CuPy, including max and min, are currently implemented in a rather general way and are not heavily optimized for performance. You may be able to get better performance from a reduction implementation in cuDNN, CUB, or Thrust, though CuPy does not use any of those for now.

CuPy team: Is anyone already working on improving the performance of reduction operations? I'm considering speeding up CuPy reductions with CUB. Are there any concerns about using CUB in CuPy?


samrere commented Apr 11, 2022

Hi, as of the latest version of CuPy, the max function is still much slower than its NumPy equivalent:

cupy : 8153.27001953125 ms
numpy : 2700.735595703125 ms

Could I please get more information on a possible workaround? Thanks.
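
One possible workaround, sketched here under assumptions rather than taken from this thread, is to ask CuPy to route reductions through CUB via the documented CUPY_ACCELERATORS environment variable. The helper name enable_cub is hypothetical, and the sketch assumes that CuPy reads the variable when it is imported and that the installed release includes the CUB backend. Whether this removes the slowdown for a given version and array shape has not been measured here.

```python
import os


def enable_cub(env=None):
    """Put 'cub' first in CUPY_ACCELERATORS (hypothetical helper).

    Assumption: CuPy reads this variable once, when it is imported, so
    this must run before the first `import cupy` in the process.
    """
    if env is None:
        env = os.environ
    # Keep any backends the user already requested, dropping empty entries.
    current = [s.strip() for s in env.get("CUPY_ACCELERATORS", "").split(",")]
    current = [s for s in current if s]
    if "cub" not in current:
        current.insert(0, "cub")
    env["CUPY_ACCELERATORS"] = ",".join(current)
    return env["CUPY_ACCELERATORS"]


enable_cub()  # set the variable before CuPy is imported anywhere


def demo():
    """Run the reduction from the report; needs CuPy and a CUDA device."""
    import cupy as cp  # imported late so the variable above is already set

    x = cp.random.normal(size=(400, 32, 28, 28)).astype(cp.float32)
    return float(x.max())
```

On a machine with a GPU, calling demo() after this setup and timing it with the sync_time helper from the original report would show whether the CUB path makes a difference for your CuPy version.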

samrere mentioned this issue Apr 11, 2022

Member

asi1024 commented Apr 11, 2022

#6549 resolved this issue.

cupy : 49.38854217529297 ms
numpy : 3057.1181640625 ms
