Newly created instances of cupy.ElementwiseKernel always synchronize #2469

Closed
niboshi opened this issue Sep 17, 2019 · 1 comment · Fixed by #2474

Comments

niboshi commented Sep 17, 2019

#725 moved the kernel cache from a global one (@util.memoize) to each instance of ElementwiseKernel (ElementwiseKernel._kernel_memo). As a result, if a new ElementwiseKernel instance is created on every repeated call with identical CUDA code, the loaded module is not reused; it is loaded from disk each time, which causes host-device synchronization at this line.
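
To make the difference between the two caching schemes concrete, here is a toy model in plain Python. It is not CuPy code: `load_module`, `GlobalCacheKernel`, and `InstanceCacheKernel` are hypothetical names made up for illustration, and `load_module` merely stands in for the expensive "load the module from disk" step.

```python
# Toy model only; none of these names exist in CuPy.
load_count = 0


def load_module(source):
    """Stand-in for loading a compiled module from disk (the costly step)."""
    global load_count
    load_count += 1
    return ('module', source)


_global_memo = {}


class GlobalCacheKernel:
    """Cache shared by all instances, keyed by the kernel source."""

    def __init__(self, source):
        self.source = source

    def __call__(self):
        if self.source not in _global_memo:
            _global_memo[self.source] = load_module(self.source)
        return _global_memo[self.source]


class InstanceCacheKernel:
    """Cache stored on the instance, so a fresh instance starts empty."""

    def __init__(self, source):
        self.source = source
        self._kernel_memo = {}

    def __call__(self):
        if self.source not in self._kernel_memo:
            self._kernel_memo[self.source] = load_module(self.source)
        return self._kernel_memo[self.source]


def count_loads(kernel_cls, n_calls=10):
    """Create a new kernel object per call and count how often a load happens."""
    global load_count
    _global_memo.clear()
    load_count = 0
    for _ in range(n_calls):
        kernel_cls('out = a + b')()
    return load_count


print(count_loads(GlobalCacheKernel))    # 1
print(count_loads(InstanceCacheKernel))  # 10
```

With a shared cache the load happens once; with a per-instance cache every freshly constructed object pays for the load again, which is the pattern described above.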

Example:

import cupy

h = cupy.ones([1000, 1000, 1000], 'float32')
a = cupy.array([1, 2], 'float32')
b = cupy.array([1, 2], 'float32')

for i in range(10):
    h.sum()  # dummy load on GPU

    # This call synchronizes
    cupy.ElementwiseKernel(
        'T a, T b',
        'T out',
        'out = a + b')(a, b, a)

print(a)
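
A possible user-side workaround, sketched under the assumption that reusing one ElementwiseKernel instance lets its per-instance cache be hit on later calls: build the kernel once and fetch the same object every time. `get_add_kernel`, `run`, and the kernel name `'add_kernel'` are illustrative names, not part of CuPy.

```python
import functools


@functools.lru_cache(maxsize=None)
def get_add_kernel():
    """Build the kernel once per process and return the same instance afterwards."""
    import cupy  # imported lazily so this module can be imported without a GPU

    return cupy.ElementwiseKernel(
        'T a, T b',
        'T out',
        'out = a + b',
        'add_kernel')  # illustrative kernel name


def run(n_iter=10):
    import cupy

    a = cupy.array([1, 2], 'float32')
    b = cupy.array([1, 2], 'float32')
    for _ in range(n_iter):
        # Same kernel object on every iteration, so whatever it cached on
        # the first call is available to the following calls.
        get_add_kernel()(a, b, a)
    return a
```

This only sidesteps the problem in user code. Library functions that construct a kernel internally on every call are not helped by it.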

niboshi commented Sep 20, 2019

Several CuPy functions are affected by this, such as tri, percentile, random.shuffle, and random.permutation, as well as anything that uses them indirectly.
I consider this a bug.

niboshi added the cat:bug label and removed the cat:enhancement label on Sep 20, 2019.