Description
Is this a duplicate?
- I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct
Type of Bug
Runtime Error
Component
cuda.core
Describe the bug
Hello!
There seems to be a bug in the explicit shape/stride tuple construction. A program that accesses the shape or strides repeatedly in a loop ends up with memory corruption. I think I tracked the problem down to the carray_int64_t_to_tuple utility: replacing it with a tuple(<list comprehension>) call makes the problem go away.
Digging deeper into the C++ code Cython generates, it looks like Cython decrements the refcount of the newly created Python int even though the tuple is supposed to steal that reference (last line below). The tuple would then be left pointing at an object whose refcount has already dropped to zero.
    /* "cuda/core/experimental/_utils/cuda_utils.pxd":39
 *     result = cpython.PyTuple_New(length)
 *     for i in range(length):
 *         cpython.PyTuple_SET_ITEM(result, i, cpython.PyLong_FromLongLong(ptr[i]))             # <<<<<<<<<<<<<<
 *     return result
*/
    __pyx_t_1 = PyLong_FromLongLong((__pyx_v_ptr[__pyx_v_i])); if (unlikely(!__pyx_t_1)) __PYX_ERR(1, 39, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_1);
    PyTuple_SET_ITEM(__pyx_v_result, __pyx_v_i, __pyx_t_1);
    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
If I add an explicit incref, the problem seems to go away:
cdef inline tuple carray_int64_t_to_tuple(int64_t *ptr, int length):
    # Construct shape and strides tuples using the Python/C API for speed
    result = cpython.PyTuple_New(length)
    cdef object item
    for i in range(length):
        item = cpython.PyLong_FromLongLong(ptr[i])
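        # PyTuple_SET_ITEM steals a reference, but Cython still releases its
        # own reference to `item`, so take an extra one for the tuple to own.
        cpython.Py_INCREF(item)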
        cpython.PyTuple_SET_ITEM(result, i, item)
    return result
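The refcount bookkeeping for the two versions can be traced with a toy model. This is only a sketch in plain Python: `FakeObject` and `build_slot` are hypothetical names made up for illustration, and the steps mirror my reading of the generated code above rather than anything CPython or Cython actually exposes.

```python
class FakeObject:
    """Toy stand-in for a PyObject that tracks only its reference count."""

    def __init__(self):
        self.refcount = 1      # a freshly created object starts with one reference
        self.freed = False

    def incref(self):
        self.refcount += 1

    def decref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # real CPython would release the memory here


def build_slot(extra_incref):
    """Replay the per-item steps of the loop; return the object left in the tuple slot."""
    item = FakeObject()        # PyLong_FromLongLong: new reference, refcount == 1
    if extra_incref:
        item.incref()          # Py_INCREF from the proposed fix: refcount == 2
    slot = item                # PyTuple_SET_ITEM: stores the pointer, refcount unchanged
    item.decref()              # Cython's __Pyx_DECREF on its temporary
    return slot


original = build_slot(extra_incref=False)
fixed = build_slot(extra_incref=True)
print(original.refcount, original.freed)  # 0 True
print(fixed.refcount, fixed.freed)        # 1 False
```

In the model, the original sequence leaves the tuple slot pointing at a freed object, while the extra incref leaves exactly one reference, owned by the tuple.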
Since low-level Python reference counting is tricky, I'd like someone to take a look and confirm the root cause and the proposed fix. Thank you!
How to Reproduce
Repro: the following code segfaults at random:
import cupy as cp
import cuda.core.experimental as ccx
d = ccx.Device()
d.set_current()
s = d.default_stream
a = cp.arange(11171, dtype=cp.uint8)
av = ccx.utils.StridedMemoryView(a, stream_ptr=s.handle)
for i in range(1000*1000):
    av.shape
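A variant of the loop above that compares each read against the first one might be easier to turn into a regression test than waiting for a segfault. This is a rough sketch: `count_unstable_reads` and `FakeView` are hypothetical names, and `FakeView` only stands in for the `StridedMemoryView` so that the snippet runs without a GPU. With the bug present, the process may still crash before any mismatch is reported.

```python
def count_unstable_reads(obj, attr_name, iterations=1_000_000):
    """Read an attribute repeatedly and count reads that differ from the first one."""
    expected = getattr(obj, attr_name)
    mismatches = 0
    for _ in range(iterations):
        if getattr(obj, attr_name) != expected:
            mismatches += 1
    return mismatches


class FakeView:
    """Hypothetical stand-in; swap in the StridedMemoryView `av` from the repro."""

    @property
    def shape(self):
        # Builds a fresh tuple on every access, like the real property does.
        return (11171,)


print(count_unstable_reads(FakeView(), "shape", iterations=1000))  # 0
```

For the real check, the call would be `count_unstable_reads(av, "shape")`, and the same helper could be pointed at `"strides"`.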
Expected behavior
No memory corruption :)
Possibly by adding an explicit incref when creating the tuple.
Operating System
Ubuntu 22.04/Python 3.10.12
nvidia-smi output
No response