Fix #1186: Fix segmentation fault when accessing StridedMemoryView #1190
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
|  | @@ -3,6 +3,7 @@ | |
| # SPDX-License-Identifier: Apache-2.0 | ||
|  | ||
| cimport cpython | ||
| from cpython.object cimport PyObject | ||
| from libc.stdint cimport int64_t | ||
|  | ||
| from cuda.bindings cimport cydriver | ||
|  | @@ -32,9 +33,17 @@ cpdef int _check_nvrtc_error(error) except?-1 | |
| cpdef check_or_create_options(type cls, options, str options_description=*, bint keep_none=*) | ||
|  | ||
|  | ||
| # Create low-level externs so Cython won't "helpfully" handle reference counting | ||
| # for us. Prefixing with an underscore to distinguish it from the definition in | ||
| # cpython.long. | ||
| cdef extern from "Python.h": | ||
| PyObject *_PyLong_FromLongLong "PyLong_FromLongLong" (long long val) except NULL | ||
| void _PyTuple_SET_ITEM "PyTuple_SET_ITEM" (object p, Py_ssize_t pos, PyObject *o) | ||
leofang marked this conversation as resolved.

Review comments on the extern declarations above:

- It is wild that this is the way to handle the problem.

- It could have been handled by adding an explicit incref to counteract the implicit decref that Cython generates, but this is the only way I could find to avoid the unnecessary nonsense. Cython and performance optimization are at odds. This fixes the problem at hand, but it caused me to look more closely at the reference counting that Cython generates, and there is a /lot/ of unnecessary work it does. I don't know how much of that the C compiler can see through and optimize away, but I doubt it's everything. Cython really needs its own reduction pass, probably. This will become more of a problem for free-threaded builds where reference counting is more expensive.

- So what was the code that Cython generated without this 

- This is the code it generated before: That DECREF is incorrect because 

- @mdboom I see. IIUC this will generate the equivalent code with less boilerplate and without unnecessary churn (untested):

      item = cpython.PyLong_FromLongLong(ptr[i])
      cpython.Py_INCREF(item)
      cpython.PyTuple_SET_ITEM(result, i, <cpython.PyObject*>(item))

  This is because in Cython the 3rd arg is typed as 

- Yeah, I tried this. The cast doesn't work because Cython still wants you to cast back into an  It also causes unnecessary reference count churn. (We incref the item only because Cython is implicitly and unnecessarily decref'ing it later.) The C compiler generally can't see through and optimize away that kind of thing because it can't reason that the reference count won't ever go to zero and that freeing the object is always a no-op. As I look at this stuff more, I think a better approach is probably to drop all the way down to C for these performance-critical sections. The whole 

- Ah right. This makes me sad...

|  | ||
|  | ||
| cdef inline tuple carray_int64_t_to_tuple(int64_t *ptr, int length): | ||
| # Construct shape and strides tuples using the Python/C API for speed | ||
| - result = cpython.PyTuple_New(length) | ||
| + cdef tuple result = cpython.PyTuple_New(length) | ||
| for i in range(length): | ||
| - cpython.PyTuple_SET_ITEM(result, i, cpython.PyLong_FromLongLong(ptr[i])) | ||
| + _PyTuple_SET_ITEM(result, i, _PyLong_FromLongLong(ptr[i])) | ||
| return result | ||
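
The reference-counting issue discussed in the review thread can be sketched with a toy model. `PyTuple_SET_ITEM` is documented as stealing a reference to the item it stores, so the caller must not release that reference afterwards. The Python sketch below is not CPython's or Cython's actual implementation: `Obj`, `incref`, `decref`, `set_item_steal`, and the three `build_*` functions are hypothetical names invented for illustration. It compares a steal followed by an extra decref, the explicit-incref workaround mentioned in the thread, and a direct hand-off of the new reference.

```python
class Obj:
    """Toy stand-in for a reference-counted object (not a real PyObject)."""

    def __init__(self, value):
        self.value = value
        self.refcount = 1   # a freshly created object carries one owned reference
        self.freed = False


def incref(obj):
    obj.refcount += 1


def decref(obj):
    obj.refcount -= 1
    if obj.refcount == 0:
        obj.freed = True    # a real runtime would release the memory here


def set_item_steal(slots, pos, obj):
    """Store obj without changing its refcount: the container takes over
    the caller's reference (the 'stealing' convention)."""
    slots[pos] = obj


def build_with_extra_decref(values):
    # Models the faulty pattern: the reference is stolen, then released again.
    slots = [None] * len(values)
    for i, v in enumerate(values):
        item = Obj(v)
        set_item_steal(slots, i, item)
        decref(item)        # releases a reference the caller no longer owns
    return slots


def build_with_incref(values):
    # Models the workaround from the thread: incref to cancel the later decref.
    slots = [None] * len(values)
    for i, v in enumerate(values):
        item = Obj(v)
        incref(item)
        set_item_steal(slots, i, item)
        decref(item)
    return slots


def build_direct(values):
    # Models handing the new reference straight to the container.
    slots = [None] * len(values)
    for i, v in enumerate(values):
        set_item_steal(slots, i, Obj(v))
    return slots


buggy = build_with_extra_decref([2, 3, 4])
workaround = build_with_incref([2, 3, 4])
direct = build_direct([2, 3, 4])

print([o.freed for o in buggy])          # [True, True, True]
print([o.refcount for o in workaround])  # [1, 1, 1]
print([o.refcount for o in direct])      # [1, 1, 1]
```

In the model, every item built by `build_with_extra_decref` is marked freed while the container still points at it; in real C code that would be a dangling pointer of the kind that can cause a segmentation fault. The workaround and the direct hand-off end in the same state, but the workaround pays for an extra incref/decref pair per item.
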
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
| @@ -0,0 +1,29 @@ | ||
| .. SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| .. SPDX-License-Identifier: Apache-2.0 | ||
|  | ||
| .. currentmodule:: cuda.core.experimental | ||
|  | ||
| ``cuda.core`` 0.4.X Release Notes | ||
| ================================= | ||
|  | ||
|  | ||
| Highlights | ||
| ---------- | ||
|  | ||
|  | ||
| Breaking Changes | ||
| ---------------- | ||
|  | ||
|  | ||
| New features | ||
| ------------ | ||
|  | ||
|  | ||
| New examples | ||
| ------------ | ||
|  | ||
|  | ||
| Fixes and enhancements | ||
| ---------------------- | ||
|  | ||
| - Fixed a segmentation fault when accessing :class:`StridedMemoryView` ``shape`` and ``strides`` members. | 