cuda-core 1.0 fails to import with cuda-bindings 13.0: cuDevSmResourceSplit not found #2063

@leofang

Bug description

cuda-core==1.0.0 fails to import when paired with cuda-bindings==13.0.x, despite the documented compatibility claim that cuda-core 1.0 works with cuda-bindings 13.0, 13.1, 13.2, etc.

Error

ImportError while loading conftest '/.../tests/conftest.py'.
    from cuda.core import Device
/.../cuda/core/__init__.py:71: in <module>
    from cuda.core import checkpoint, system, utils
/.../cuda/core/cu13/checkpoint.py:12: in <module>
    from cuda.core.typing import ProcessStateType as _ProcessStateType
/.../cuda/core/cu13/typing.py:13: in <module>
    from cuda.core._context import DeviceResourcesType
cuda/core/_context.pyx:1: in init cuda.core._context
cuda/core/_device_resources.pyx:1: in init cuda.core._device_resources
E   ImportError: cuda.bindings.cydriver does not export expected C function cuDevSmResourceSplit

Observed in: https://github.com/NVIDIA/numba-cuda-mlir/actions/runs/25706811043/job/75479358550?pr=23#step:18:155

Root cause

_device_resources.pyx uses Cython cimport to access cydriver.cuDevSmResourceSplit(...) inside an IF CUDA_CORE_BUILD_MAJOR >= 13: block (line 303). When Cython cimports a module, it validates at C-level module init (PyInit__device_resources) that every referenced cdef function exists in the target module's __pyx_capi__ dict — before any Python-level code can execute.

  • cuDevSmResourceSplit was introduced in CUDA 13.1. cuda-bindings 13.0 only has cuDevSmResourceSplitByCount.
  • The IF CUDA_CORE_BUILD_MAJOR >= 13 compile-time guard doesn't distinguish 13.0 from 13.1, so the cuDevSmResourceSplit reference is compiled into the cu13/ variant of the wheel.
  • cuda-core 1.0.0 was built against cuda-bindings ≥ 13.1 (which exports cuDevSmResourceSplit), so the compiled .so has a hard ABI dependency on that symbol.
  • The runtime version check in _can_use_structured_sm_split() (lines 208–218) correctly guards against calling the function on older bindings, but it never gets a chance to run because the module already fails to load at the Cython __Pyx_ImportFunction level.

This is a different layer from the PyCapsule-based indirection used in _resource_handles.pyx, which does Python-level __pyx_capi__ dict lookups with try/except and therefore handles missing functions gracefully.
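The difference between the two layers can be modelled in plain Python. This is only an analogy, not the Cython-generated code: `fake_cydriver_13_0`, `eager_import`, and `optional_lookup` are hypothetical names, and plain objects stand in for the PyCapsules that a real `__pyx_capi__` dict holds.

```python
from types import SimpleNamespace

# Stand-in for cuda.bindings.cydriver from a 13.0 wheel: only the older
# function is exported. Real entries are PyCapsules; plain objects suffice here.
fake_cydriver_13_0 = SimpleNamespace(
    __name__="cuda.bindings.cydriver",
    __pyx_capi__={"cuDevSmResourceSplitByCount": object()},
)


def eager_import(provider, required):
    """Model of the module-init check: every cimported cdef function must exist."""
    for name in required:
        if name not in provider.__pyx_capi__:
            raise ImportError(
                f"{provider.__name__} does not export expected C function {name}"
            )


def optional_lookup(provider, name):
    """Model of a tolerant lookup: a missing function becomes None."""
    return provider.__pyx_capi__.get(name)


required = ["cuDevSmResourceSplitByCount", "cuDevSmResourceSplit"]
try:
    eager_import(fake_cydriver_13_0, required)
    eager_result = "imported"
except ImportError as exc:
    eager_result = str(exc)

print(eager_result)
print(optional_lookup(fake_cydriver_13_0, "cuDevSmResourceSplit"))
```

In the eager model the failure happens before any caller code runs, which mirrors why the runtime version check is never reached. In the tolerant model the caller receives `None` and can decide what to do.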

Verified

$ pip download cuda-bindings==13.0.3 ...
# cydriver.pxd only declares cuDevSmResourceSplitByCount — no cuDevSmResourceSplit
# CU_DEV_SM_RESOURCE_GROUP_PARAMS and CUdevSmResourceGroup_flags also absent from .pxd
# (but struct/enum types don't cause __pyx_capi__ failures — only cdef functions do)
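As a complement to inspecting the .pxd in the downloaded wheel, the exported functions can also be checked at runtime on an installed package. The following is a minimal sketch: `exported_cdef_functions` is a hypothetical helper, and it assumes the target module exposes a `__pyx_capi__` dict, as the error message above implies for cuda.bindings.cydriver.

```python
import importlib


def exported_cdef_functions(module_name):
    """Return the sorted names in a module's __pyx_capi__, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    capi = getattr(module, "__pyx_capi__", None)
    return None if capi is None else sorted(capi)


names = exported_cdef_functions("cuda.bindings.cydriver")
if names is None:
    print("cuda.bindings.cydriver is not importable or has no __pyx_capi__")
else:
    for symbol in ("cuDevSmResourceSplitByCount", "cuDevSmResourceSplit"):
        print(symbol, "exported" if symbol in names else "MISSING")
```

Running this in the failing environment should report which of the two functions the installed cuda-bindings actually exports, without importing cuda.core.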

Suggested fix

Replace the direct cydriver.cuDevSmResourceSplit(...) call in _device_resources.pyx with a PyCapsule-based function pointer lookup, following the pattern established in _resource_handles.pyx:

  1. Add p_cuDevSmResourceSplit = _get_optional_driver_fn("cuDevSmResourceSplit") in _resource_handles.pyx
  2. Declare the function pointer in resource_handles.hpp / resource_handles.cpp
  3. Replace cydriver.cuDevSmResourceSplit(...) in _device_resources.pyx with a call through the function pointer
  4. The struct/enum types (CU_DEV_SM_RESOURCE_GROUP_PARAMS, CUdevSmResourceGroup_flags) can remain as direct cimports — they are resolved at compile time and do not trigger __pyx_capi__ lookups
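Step 3 can be sketched in Python to show the intended call-site behaviour once the lookup is optional. This is not the Cython/C++ change itself: `make_structured_split` is a hypothetical helper, and `fn_ptr` stands in for whatever `_get_optional_driver_fn("cuDevSmResourceSplit")` would yield, assumed here to be `None` when the function is missing.

```python
def make_structured_split(fn_ptr):
    """Wrap an optionally resolved function so a missing one fails at call time."""

    def split(*args):
        if fn_ptr is None:
            # Raised only when the feature is actually used, not at import.
            raise RuntimeError(
                "cuDevSmResourceSplit is unavailable; it requires cuda-bindings >= 13.1"
            )
        return fn_ptr(*args)

    return split


# With cuda-bindings 13.0 the lookup would yield None: importing still works.
split_missing = make_structured_split(None)

# With cuda-bindings >= 13.1 the lookup yields a callable (a dummy here).
split_present = make_structured_split(lambda *args: ("called", args))

print(split_present(1, 2))
try:
    split_missing(1, 2)
except RuntimeError as exc:
    print(exc)
```

The point of the design is that the error moves from module load to the first use of the structured split, so the rest of cuda.core stays importable on 13.0 bindings.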

Environment

  • cuda-core 1.0.0
  • cuda-bindings 13.0.3
  • Python 3.11, Linux aarch64

Labels

bug (Something isn't working), cuda.core (Everything related to the cuda.core module)