Bug description
cuda-core==1.0.0 fails to import when paired with cuda-bindings==13.0.x, despite the documented compatibility claim that cuda-core 1.0 works with cuda-bindings 13.0, 13.1, 13.2, etc.
Error
ImportError while loading conftest '/.../tests/conftest.py'.
from cuda.core import Device
/.../cuda/core/__init__.py:71: in <module>
from cuda.core import checkpoint, system, utils
/.../cuda/core/cu13/checkpoint.py:12: in <module>
from cuda.core.typing import ProcessStateType as _ProcessStateType
/.../cuda/core/cu13/typing.py:13: in <module>
from cuda.core._context import DeviceResourcesType
cuda/core/_context.pyx:1: in init cuda.core._context
cuda/core/_device_resources.pyx:1: in init cuda.core._device_resources
E ImportError: cuda.bindings.cydriver does not export expected C function cuDevSmResourceSplit
Observed in: https://github.com/NVIDIA/numba-cuda-mlir/actions/runs/25706811043/job/75479358550?pr=23#step:18:155
Root cause
_device_resources.pyx uses Cython cimport to access cydriver.cuDevSmResourceSplit(...) inside an IF CUDA_CORE_BUILD_MAJOR >= 13: block (line 303). When Cython cimports a module, it validates at C-level module init (PyInit__device_resources) that every referenced cdef function exists in the target module's __pyx_capi__ dict — before any Python-level code can execute.
cuDevSmResourceSplit was introduced in CUDA 13.1. cuda-bindings 13.0 only has cuDevSmResourceSplitByCount.
- The IF CUDA_CORE_BUILD_MAJOR >= 13 compile-time guard doesn't distinguish 13.0 from 13.1, so the cuDevSmResourceSplit reference is compiled into the cu13/ variant of the wheel.
- cuda-core 1.0.0 was built against cuda-bindings ≥ 13.1 (which exports cuDevSmResourceSplit), so the compiled .so has a hard ABI dependency on that symbol.
- The runtime version check in _can_use_structured_sm_split() (lines 208–218) correctly guards against calling the function on older bindings, but it never gets a chance to run because the module fails to load at the Cython __Pyx_ImportFunction level.
This is a different layer from the PyCapsule-based indirection used in _resource_handles.pyx, which does Python-level __pyx_capi__ dict lookups with try/except and therefore handles missing functions gracefully.
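The difference between the two layers can be modeled in plain Python. This is a toy sketch, not Cython's actual implementation: FAKE_CAPI stands in for the __pyx_capi__ dict of cuda-bindings 13.0, and eager_import and optional_lookup are hypothetical names that mimic, respectively, the all-or-nothing check at module init and the tolerant lookup.

```python
# Toy model of the two lookup strategies; every name here is illustrative.

# Stand-in for cydriver.__pyx_capi__ as shipped by cuda-bindings 13.0:
# only the older split function is exported.
FAKE_CAPI = {"cuDevSmResourceSplitByCount": object()}


def eager_import(capi, required):
    """Mimic cimport-style validation: fail if any referenced name is missing."""
    for name in required:
        if name not in capi:
            raise ImportError(
                f"module does not export expected C function {name}"
            )
    return {name: capi[name] for name in required}


def optional_lookup(capi, name):
    """Mimic the tolerant pattern: a missing function becomes None."""
    try:
        return capi[name]
    except KeyError:
        return None


# The eager path fails as a whole, even though one of the two names exists.
try:
    eager_import(FAKE_CAPI, ["cuDevSmResourceSplitByCount", "cuDevSmResourceSplit"])
    eager_failed = False
except ImportError:
    eager_failed = True

# The tolerant path reports the missing function and lets the caller decide.
split_fn = optional_lookup(FAKE_CAPI, "cuDevSmResourceSplit")
```

In this model, the eager path corresponds to what happens to _device_resources today: one missing name aborts the whole import, so no later guard can run.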
Verified
$ pip download cuda-bindings==13.0.3 ...
# cydriver.pxd only declares cuDevSmResourceSplitByCount — no cuDevSmResourceSplit
# CU_DEV_SM_RESOURCE_GROUP_PARAMS and CUdevSmResourceGroup_flags also absent from .pxd
# (but struct/enum types don't cause __pyx_capi__ failures — only cdef functions do)
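When repeating this check, note that cuDevSmResourceSplit is a prefix of cuDevSmResourceSplitByCount, so a plain substring search over the .pxd gives a false positive. A minimal sketch of a stricter check follows; declares_function is a hypothetical helper, and SAMPLE_PXD is a placeholder line, not the real declaration from cydriver.pxd.

```python
import re


def declares_function(pxd_text, name):
    """Return True if pxd_text declares a function called exactly `name`."""
    # \b anchors the start of the identifier; requiring "(" right after it
    # rejects longer identifiers that merely begin with `name`.
    pattern = r"\b" + re.escape(name) + r"\s*\("
    return re.search(pattern, pxd_text) is not None


# Placeholder excerpt (assumption): the argument list is elided on purpose.
SAMPLE_PXD = """
cdef CUresult cuDevSmResourceSplitByCount(...) nogil
"""

has_by_count = declares_function(SAMPLE_PXD, "cuDevSmResourceSplitByCount")
has_split = declares_function(SAMPLE_PXD, "cuDevSmResourceSplit")
naive_hit = "cuDevSmResourceSplit" in SAMPLE_PXD  # misleading substring match
```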
Suggested fix
Replace the direct cydriver.cuDevSmResourceSplit(...) call in _device_resources.pyx with a PyCapsule-based function pointer lookup, following the pattern established in _resource_handles.pyx:
- Add p_cuDevSmResourceSplit = _get_optional_driver_fn("cuDevSmResourceSplit") in _resource_handles.pyx
- Declare the function pointer in resource_handles.hpp / resource_handles.cpp
- Replace cydriver.cuDevSmResourceSplit(...) in _device_resources.pyx with a call through the function pointer
- The struct/enum types (CU_DEV_SM_RESOURCE_GROUP_PARAMS, CUdevSmResourceGroup_flags) can remain as direct cimports: they are resolved at compile time and do not trigger __pyx_capi__ lookups
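The intended shape of the fix, modeled in plain Python rather than Cython/C++, might look like the sketch below. It is only an illustration under assumptions: resolve_optional and SmSplitter are hypothetical names, the dict stands in for __pyx_capi__, and the stored callables are placeholders for real function pointers. The actual signature of _get_optional_driver_fn is not shown in this issue and is not reproduced here.

```python
# Illustrative model of "resolve once, guard at the call site".

def resolve_optional(capi, name):
    """Return the exported entry for `name`, or None if the bindings lack it."""
    return capi.get(name)


class SmSplitter:
    def __init__(self, capi):
        # Resolved once at construction; a missing symbol is not an error here,
        # so the equivalent of module import still succeeds.
        self._p_split = resolve_optional(capi, "cuDevSmResourceSplit")

    def can_use_structured_split(self):
        # Plays the role of the runtime guard, which can now actually run.
        return self._p_split is not None

    def split(self):
        if not self.can_use_structured_split():
            raise RuntimeError(
                "cuDevSmResourceSplit is not exported by the installed cuda-bindings"
            )
        return self._p_split()
```

With this structure, pairing with older bindings degrades to a clear runtime error at the one call that needs the newer function, instead of an ImportError for the whole package.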
Environment
- cuda-core 1.0.0
- cuda-bindings 13.0.3
- Python 3.11, Linux aarch64