[BUG]: Segmentation fault with StridedMemoryView.shape #1186

@stiepan

Description

Type of Bug

Runtime Error

Component

cuda.core

Describe the bug

Hello!

There seems to be a bug in the explicit construction of the shape/strides tuples. A program that accesses the shape or strides repeatedly in a loop ends up with memory corruption. I think I tracked the problem down to the carray_int64_t_to_tuple utility, because replacing it with a tuple(<list comprehension>) call makes the problem go away.
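For comparison, here is a rough pure-Python analogue of the `tuple(<list comprehension>)` workaround. It is only an illustration: a ctypes `int64` array stands in for the C shape buffer, and the function name `carray_int64_to_tuple_safe` is hypothetical. In Cython the equivalent body would be something like `return tuple([ptr[i] for i in range(length)])`. In both cases the tuple constructor does all of the reference counting, so there is no manual `PyTuple_SET_ITEM` step to get wrong.

```python
import ctypes


def carray_int64_to_tuple_safe(ptr, length):
    """Hypothetical helper: build a tuple of Python ints from an int64 C array.

    Mirrors the tuple(<list comprehension>) workaround; tuple() and the list
    own every reference, so no manual incref/decref is involved.
    """
    return tuple([ptr[i] for i in range(length)])


# A ctypes int64 buffer standing in for StridedMemoryView's shape array
shape_buf = (ctypes.c_int64 * 2)(11171, 1)
ptr = ctypes.cast(shape_buf, ctypes.POINTER(ctypes.c_int64))
print(carray_int64_to_tuple_safe(ptr, 2))  # (11171, 1)
```

The trade-off is one extra temporary list per call, which is presumably why the original code builds the tuple directly through the C API.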

Digging deeper into the C++ code that Cython generates, it looks like Cython decrements the refcount of the newly created Python int, even though the tuple is supposed to steal that reference (last line below):

```c
    /* "cuda/core/experimental/_utils/cuda_utils.pxd":39
 *     result = cpython.PyTuple_New(length)
 *     for i in range(length):
 *         cpython.PyTuple_SET_ITEM(result, i, cpython.PyLong_FromLongLong(ptr[i]))             # <<<<<<<<<<<<<<
 *     return result
 */
    __pyx_t_1 = PyLong_FromLongLong((__pyx_v_ptr[__pyx_v_i])); if (unlikely(!__pyx_t_1)) __PYX_ERR(1, 39, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_1);
    PyTuple_SET_ITEM(__pyx_v_result, __pyx_v_i, __pyx_t_1);
    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
```

If I add an explicit incref, the problem seems to go away:

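```cython
cimport cpython
from libc.stdint cimport int64_t

cdef inline tuple carray_int64_t_to_tuple(int64_t *ptr, int length):
    # Construct shape and strides tuples using the Python/C API for speed
    cdef object item
    cdef int i
    result = cpython.PyTuple_New(length)
    for i in range(length):
        # New reference owned by Cython; released when `item` is reassigned
        # or when the function returns
        item = cpython.PyLong_FromLongLong(ptr[i])
        # PyTuple_SET_ITEM steals a reference, so give the tuple its own
        cpython.Py_INCREF(item)
        cpython.PyTuple_SET_ITEM(result, i, item)
    return result
```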
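The refcount bookkeeping behind this can be walked through with a toy model. This is not CPython code: `ModelObject`, `new_long`, `tuple_set_item`, `incref` and `decref` are hypothetical stand-ins that only track a counter, assuming the documented semantics that `PyLong_FromLongLong` returns a new reference and `PyTuple_SET_ITEM` steals one. It replays the generated C sequence and the variant with the extra incref.

```python
from dataclasses import dataclass


@dataclass
class ModelObject:
    """Toy stand-in for a PyObject that only tracks its reference count."""

    refcnt: int = 1  # a freshly created object starts with one owned reference

    @property
    def freed(self) -> bool:
        # CPython deallocates an object as soon as its refcount hits zero
        return self.refcnt == 0


def new_long() -> ModelObject:
    # Models PyLong_FromLongLong: returns a NEW reference
    return ModelObject()


def incref(obj: ModelObject) -> None:
    obj.refcnt += 1


def decref(obj: ModelObject) -> None:
    obj.refcnt -= 1


def tuple_set_item(slots: list, i: int, obj: ModelObject) -> None:
    # Models PyTuple_SET_ITEM: stores the pointer and STEALS the caller's
    # reference, so the refcount is deliberately left unchanged
    slots[i] = obj


def generated_code_sequence() -> ModelObject:
    """Steps from the generated code: create, steal, then DECREF."""
    slots = [None]
    tmp = new_long()               # __pyx_t_1 = PyLong_FromLongLong(...)
    tuple_set_item(slots, 0, tmp)  # PyTuple_SET_ITEM(result, i, __pyx_t_1)
    decref(tmp)                    # __Pyx_DECREF(__pyx_t_1)
    return slots[0]


def fixed_sequence() -> ModelObject:
    """Same steps with the explicit Py_INCREF before the steal."""
    slots = [None]
    item = new_long()
    incref(item)                   # cpython.Py_INCREF(item)
    tuple_set_item(slots, 0, item)
    decref(item)                   # Cython later drops its own reference to item
    return slots[0]


bad = generated_code_sequence()
good = fixed_sequence()
print(bad.refcnt, bad.freed)    # 0 True
print(good.refcnt, good.freed)  # 1 False
```

In the first sequence the tuple ends up pointing at an object whose count already reached zero, which matches the use-after-free symptoms; with the incref, the tuple holds exactly the one reference it owns.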

As low-level Python reference counting is tricky, I'd like someone to take a look and confirm the root cause and the proposed fix. Thank you!

How to Reproduce

Repro: the following code segfaults intermittently:

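```python
import cupy as cp

import cuda.core.experimental as ccx

d = ccx.Device()
d.set_current()
s = d.default_stream
a = cp.arange(11171, dtype=cp.uint8)
av = ccx.utils.StridedMemoryView(a, stream_ptr=s.handle)
# Each access to .shape builds a new tuple via carray_int64_t_to_tuple;
# repeating it many times eventually corrupts memory and crashes
for i in range(1000 * 1000):
    av.shape
```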
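Because the crash takes down the whole interpreter, a regression test for this would likely need to run the repro in a child process. A minimal sketch, assuming a POSIX system (where a process killed by signal N reports return code -N); `run_snippet_isolated` and `crashed_with_segfault` are hypothetical helper names, not part of cuda.core:

```python
import signal
import subprocess
import sys


def run_snippet_isolated(snippet: str, timeout: float = 600.0) -> int:
    """Run `snippet` in a fresh interpreter and return its exit code.

    A segfault only kills the child, so the calling test runner survives
    and can report the failure.
    """
    proc = subprocess.run([sys.executable, "-c", snippet], timeout=timeout)
    return proc.returncode


def crashed_with_segfault(returncode: int) -> bool:
    # POSIX convention: death by SIGSEGV shows up as -SIGSEGV (-11 on Linux)
    return returncode == -signal.SIGSEGV
```

For example, passing the repro loop as a string to `run_snippet_isolated` and asserting `not crashed_with_segfault(rc)` would turn an intermittent segfault into an ordinary test failure. Since the crash is random, a single clean run does not prove the fix.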

Expected behavior

No memory corruption :)

Possibly fixed by adding an explicit incref when creating the tuple.

Operating System

Ubuntu 22.04/Python 3.10.12

nvidia-smi output

No response

Metadata

Labels

P0: High priority - Must do!
bug: Something isn't working
cuda.core: Everything related to the cuda.core module
triage: Needs the team's attention

Development

No branches or pull requests