#725 moved the kernel cache from a global one (@util.memoize) to each instance of ElementwiseKernel (ElementwiseKernel._kernel_memo). With this change, if an ElementwiseKernel instance is created on every repeated call with identical CUDA code, the loaded module is not reused and is always loaded from disk, causing host-device synchronization at this line.
Example:
import cupy

h = cupy.ones([1000, 1000, 1000], 'float32')
a = cupy.array([1, 2], 'float32')
b = cupy.array([1, 2], 'float32')

for i in range(10):
    h.sum()  # dummy load on GPU

    # This call synchronizes
    cupy.ElementwiseKernel(
        'T a, T b',
        'T out',
        'out = a + b')(a, b, a)

print(a)
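To make the caching difference concrete without a GPU, here is a minimal pure-Python sketch. It is not CuPy's implementation: `load_module`, `PerInstanceKernel`, and `get_kernel` are hypothetical stand-ins, and the "load from disk" step is only simulated by a counter. It shows why a per-instance memo is defeated when a new instance is built on every call, and how reusing a single instance per identical source restores the reuse.

```python
import functools

load_count = 0  # number of simulated "load module from disk" events


def load_module(code):
    """Hypothetical stand-in for the expensive, synchronizing module load."""
    global load_count
    load_count += 1
    return ('module', code)


class PerInstanceKernel:
    """Toy model of a kernel whose cache lives on the instance."""

    def __init__(self, code):
        self.code = code
        self._kernel_memo = {}  # per-instance cache, keyed by dtype

    def __call__(self, dtype):
        if dtype not in self._kernel_memo:
            self._kernel_memo[dtype] = load_module(self.code)
        return self._kernel_memo[dtype]


@functools.lru_cache(maxsize=None)
def get_kernel(code):
    """Caller-side mitigation: one shared instance per identical source."""
    return PerInstanceKernel(code)


def run_fresh(n):
    """Build a new instance on every iteration, as in the example above."""
    global load_count
    load_count = 0
    for _ in range(n):
        PerInstanceKernel('out = a + b')('float32')
    return load_count


def run_cached(n):
    """Reuse the same instance across iterations."""
    global load_count
    load_count = 0
    get_kernel.cache_clear()
    for _ in range(n):
        get_kernel('out = a + b')('float32')
    return load_count
```

In this model, `run_fresh(10)` triggers a load on every iteration, while `run_cached(10)` loads only once. In the real reproduction, the analogous caller-side mitigation would be to construct the `ElementwiseKernel` once outside the loop, though that does not address the affected functions inside CuPy itself.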
Several functions in CuPy are affected by this (like tri, percentile, random.shuffle, random.permutation and those indirectly using them).
I consider this a bug.