Motivation
cuda.core is intended to be a high-level Pythonic wrapper around lower-level bindings in cuda.bindings. In idiomatic Python, errors should be communicated via exceptions rather than requiring callers to inspect return values. This audit looked at all public functions and methods in cuda.core for places where the C convention of returning error/status codes leaks through — or more broadly, anywhere the caller must inspect the returned object for correctness rather than relying on exception flow.
Summary
The codebase is largely well-designed. The HANDLE_RETURN() macro and handle_return() function consistently convert CUDA error codes into Python exceptions across the vast majority of the API. However, there are several notable deviations.
Findings
1. Event.is_done — boolean derived from CUDA error code
_event.pyx: Converts CUDA_SUCCESS → True and CUDA_ERROR_NOT_READY → False. The caller must inspect the return value rather than relying on exception flow. This is a common idiom in async GPU APIs and is arguably reasonable for polling, but it is worth noting as a deliberate deviation from pure exception-based error handling.
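The polling idiom described above can be sketched as follows. This is an illustrative stand-in, not cuda.core's actual source, and the numeric status codes are assumptions for the example:

```python
# Sketch of the is_done polling pattern: two specific status codes map to
# a boolean, while any other code still raises.
CUDA_SUCCESS = 0
CUDA_ERROR_NOT_READY = 600  # illustrative numeric values


def is_done(status: int) -> bool:
    """Return True/False for the two expected codes, raise otherwise."""
    if status == CUDA_SUCCESS:
        return True
    if status == CUDA_ERROR_NOT_READY:
        return False
    raise RuntimeError(f"unexpected CUDA status code: {status}")
```

Note that this is still exception-safe for genuinely unexpected codes; only the two polling outcomes are folded into the return value.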
2. Program.pch_status — string status code the caller must interpret
_program.pyx: Returns "created", "not_attempted", "failed", or None. The "failed" case is notable — PCH creation failure is reported as a string value rather than raised as an exception. The caller must know to check for "failed" and handle it. Internally, the helper _read_pch_status() also uses None as a sentinel for "heap exhausted, retry needed" (a classic C-style error pattern, though internal-only).
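A caller-side sketch of the burden this creates, assuming only that pch_status exposes the string values listed above (the helper name is hypothetical):

```python
# Hypothetical caller-side check: because failure is encoded as a string,
# every caller must remember to perform this test after compile().
def check_pch(prog) -> None:
    status = prog.pch_status  # "created", "not_attempted", "failed", or None
    if status == "failed":
        raise RuntimeError("PCH creation failed")
```

If compile() raised (or warned) on PCH failure instead, this boilerplate would be unnecessary.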
3. Linker.get_error_log() / get_info_log() — unchecked CUDA calls
_linker.pyx: These return diagnostic strings, but the underlying CUDA calls to nvJitLinkGetErrorLogSize / nvJitLinkGetErrorLog are not checked via HANDLE_RETURN — the results are used directly without error checking. If these calls fail, the failure is silently ignored.
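A sketch of the checked alternative, using stand-in callables rather than the real nvJitLink bindings (the two-step size/retrieval shape mirrors the C API but the signatures here are assumptions):

```python
# Hypothetical checked log retrieval: each underlying call's status is
# inspected before its result is used, instead of being ignored.
NVJITLINK_SUCCESS = 0  # illustrative value


def get_error_log(get_size, get_log) -> str:
    """Fetch a diagnostic log, raising if either underlying call fails."""
    status, size = get_size()
    if status != NVJITLINK_SUCCESS:
        raise RuntimeError(f"log-size query failed with status {status}")
    status, log = get_log(size)
    if status != NVJITLINK_SUCCESS:
        raise RuntimeError(f"log retrieval failed with status {status}")
    return log
```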
4. _MP_deallocate silently swallows CUDA_ERROR_INVALID_CONTEXT
_memory_pool.pyx: The deallocation path explicitly suppresses CUDA_ERROR_INVALID_CONTEXT. The function is marked noexcept so it cannot raise, but this means a real error (e.g., deallocating after context destruction) is silently ignored. Callers have no way to know deallocation failed.
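The fix suggested below can be sketched in plain Python: the path still cannot raise, but the suppressed failure becomes observable. The status code value is an assumption for illustration:

```python
import warnings

CUDA_SUCCESS = 0
CUDA_ERROR_INVALID_CONTEXT = 201  # illustrative value


def deallocate_noexcept(free, ptr) -> None:
    """Sketch of a deallocation path that cannot raise: surface the
    suppressed error as a warning rather than swallowing it silently."""
    status = free(ptr)
    if status == CUDA_ERROR_INVALID_CONTEXT:
        warnings.warn("deallocation skipped: CUDA context already destroyed",
                      RuntimeWarning)
```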
5. DeviceProperties._get_attribute() returns a default on CUDA_ERROR_INVALID_VALUE
_device.pyx: When querying device attributes, CUDA_ERROR_INVALID_VALUE (which often means "this attribute isn't supported on this GPU") is silently converted to a default value (typically 0) rather than raising. A caller reading device.properties.some_attribute could get 0 and not know whether the attribute is genuinely 0 or unsupported on their hardware.
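A sketch of the sentinel-based alternative suggested below, with illustrative status codes and a hypothetical query callable:

```python
CUDA_SUCCESS = 0
CUDA_ERROR_INVALID_VALUE = 1  # illustrative values

UNSUPPORTED = object()  # distinct sentinel, so a genuine 0 stays unambiguous


def get_attribute(query):
    """Sketch: return a sentinel for unsupported attributes instead of
    defaulting to 0, so callers can tell the two cases apart."""
    status, value = query()
    if status == CUDA_ERROR_INVALID_VALUE:
        return UNSUPPORTED
    if status != CUDA_SUCCESS:
        raise RuntimeError(f"CUDA error {status}")
    return value
```

Raising AttributeError is the other option mentioned in the suggestions; the sentinel variant keeps attribute access non-throwing.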
6. Kernel._get_arguments_info() uses CUDA_ERROR_INVALID_VALUE as end-of-list sentinel
_module.pyx: Loops calling cuKernelGetParamInfo until it gets CUDA_ERROR_INVALID_VALUE, which it interprets as "no more parameters" rather than an error. This mirrors the C API convention. Any genuinely invalid-value error would also be silently consumed.
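The sentinel loop can be sketched like this (illustrative status codes, hypothetical query callable):

```python
CUDA_SUCCESS = 0
CUDA_ERROR_INVALID_VALUE = 1  # illustrative values


def get_param_sizes(get_param_info):
    """Sketch of the sentinel loop: query parameter info by index until the
    first CUDA_ERROR_INVALID_VALUE, which is treated as end-of-list."""
    sizes = []
    index = 0
    while True:
        status, size = get_param_info(index)
        if status == CUDA_ERROR_INVALID_VALUE:
            break  # end of parameters -- or a masked real error
        if status != CUDA_SUCCESS:
            raise RuntimeError(f"CUDA error {status}")
        sizes.append(size)
        index += 1
    return sizes
```

The `break` is exactly where a genuine invalid-value error would be consumed, which is the risk noted above.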
7. Device_resolve_device_id() returns 0 on CUDA_ERROR_INVALID_CONTEXT
_device.pyx: When no context exists, instead of raising, it defaults to device 0 (mimicking cudart behavior). This is an internal function but affects public API behavior — Device(None) silently falls back to device 0 rather than informing the caller there is no active context.
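A sketch of the raising alternative suggested below, with an illustrative status code and a hypothetical query callable:

```python
CUDA_SUCCESS = 0
CUDA_ERROR_INVALID_CONTEXT = 201  # illustrative value


def resolve_device_id(get_current_device):
    """Sketch: raise when no context is active, instead of silently
    defaulting to device 0."""
    status, dev = get_current_device()
    if status == CUDA_ERROR_INVALID_CONTEXT:
        raise RuntimeError("no active CUDA context; specify a device explicitly")
    if status != CUDA_SUCCESS:
        raise RuntimeError(f"CUDA error {status}")
    return dev
```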
8. DMR_mempool_get_access() — returns magic strings instead of a typed enum
_device_memory_resource.pyx: Returns "rw", "r", or "". The empty string "" (meaning "no access") is a value the caller must check — attempting to use a buffer without access would only fail later at a less helpful point. A proper enum would make this more self-documenting and less error-prone.
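A sketch of the enum suggested below. The enum name and mapping are hypothetical; only the three string values come from the current API:

```python
from enum import Flag


class PoolAccess(Flag):
    """Hypothetical typed replacement for the "rw"/"r"/"" magic strings."""
    NONE = 0
    READ = 1
    WRITE = 2
    READ_WRITE = 3  # READ | WRITE


_FROM_STRING = {"rw": PoolAccess.READ_WRITE,
                "r": PoolAccess.READ,
                "": PoolAccess.NONE}


def parse_access(text: str) -> PoolAccess:
    """Convert the current string form into the typed enum."""
    return _FROM_STRING[text]
```

A nice side effect of Flag is that the no-access member is falsy, so `if not access:` reads naturally where callers currently compare against `""`.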
Suggestions
Ranked roughly from most to least impactful:
1. Program.pch_status returning "failed" — consider raising an exception (or at least a warning) during compile() when PCH creation fails, rather than silently storing a status string the user must remember to check.
2. Linker.get_error_log() / get_info_log() — check the CUDA return values from the underlying log-retrieval calls via HANDLE_RETURN.
3. _MP_deallocate suppressing CUDA_ERROR_INVALID_CONTEXT — at minimum log a warning so failures are observable.
4. DeviceProperties returning 0 for unsupported attributes — consider raising AttributeError or returning a distinct sentinel so callers can distinguish "genuinely 0" from "not supported".
5. DMR_mempool_get_access — return a proper enum rather than magic strings.
6. Kernel._get_arguments_info() end-of-list sentinel — document or assert that CUDA_ERROR_INVALID_VALUE is only expected at the boundary, to avoid masking real errors.
7. Device_resolve_device_id() defaulting to device 0 — consider raising when there is no active context, rather than silently choosing a device.
Not flagged (correct patterns)
For completeness, these were reviewed and found to handle errors properly:
- Graph.update() — raises CUDAError with diagnostic info on GRAPH_EXEC_UPDATE_FAILURE
- GraphBuilder.complete() / _instantiate_graph() — raises RuntimeError with error reason
- Event.__sub__() — handles error codes inline but always raises exceptions with contextual messages
- All close() methods — delegate to C++ RAII handles; idempotent no-op behavior is standard
- All memory resource allocate() / deallocate() public methods — consistently use HANDLE_RETURN or raise_if_driver_error()
- All Stream, Device, Context public methods — consistently raise via HANDLE_RETURN
- All graph node factory methods — consistently raise via HANDLE_RETURN
- system subpackage functions — consistently raise ValueError / RuntimeError on failure