Motivation
cuda.core is intended to be a high-level Pythonic wrapper around lower-level bindings in cuda.bindings. In idiomatic Python, errors should be communicated via exceptions rather than requiring callers to inspect return values. This audit looked at all public functions and methods in cuda.core for places where the C convention of returning error/status codes leaks through — or more broadly, anywhere the caller must inspect the returned object for correctness rather than relying on exception flow.
Summary
The codebase is largely well-designed. The HANDLE_RETURN() macro and handle_return() function consistently convert CUDA error codes into Python exceptions across the vast majority of the API. However, there are several notable deviations.
Findings
1. Event.is_done — boolean derived from CUDA error code
_event.pyx: Converts CUDA_SUCCESS → True and CUDA_ERROR_NOT_READY → False. The caller must inspect the return value rather than relying on exception flow. This is a common idiom in async GPU APIs and is arguably reasonable for polling, but it is worth noting as a deliberate deviation from pure exception-based error handling.
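The polling idiom described above can be sketched as follows. This is an illustrative stand-in, not cuda.core's actual source, and the numeric status codes are assumptions for the example:

```python
# Sketch of the is_done polling pattern: two specific status codes map to
# a boolean, while any other code still raises.
CUDA_SUCCESS = 0
CUDA_ERROR_NOT_READY = 600  # illustrative numeric values


def is_done(status: int) -> bool:
    """Return True/False for the two expected codes, raise otherwise."""
    if status == CUDA_SUCCESS:
        return True
    if status == CUDA_ERROR_NOT_READY:
        return False
    raise RuntimeError(f"unexpected CUDA status code: {status}")
```

Note that this is still exception-safe for genuinely unexpected codes; only the two polling outcomes are folded into the return value.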
2. Program.pch_status — string status code the caller must interpret
_program.pyx: Returns "created", "not_attempted", "failed", or None. The "failed" case is notable — PCH creation failure is reported as a string value rather than raised as an exception. The caller must know to check for "failed" and handle it. Internally, the helper _read_pch_status() also uses None as a sentinel for "heap exhausted, retry needed" (a classic C-style error pattern, though internal-only).
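A caller-side sketch of the burden this creates, assuming only that pch_status exposes the string values listed above (the helper name is hypothetical):

```python
# Hypothetical caller-side check: because failure is encoded as a string,
# every caller must remember to perform this test after compile().
def check_pch(prog) -> None:
    status = prog.pch_status  # "created", "not_attempted", "failed", or None
    if status == "failed":
        raise RuntimeError("PCH creation failed")
```

If compile() raised (or warned) on PCH failure instead, this boilerplate would be unnecessary.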
3. Linker.get_error_log() / get_info_log() — unchecked CUDA calls
_linker.pyx: These return diagnostic strings, but the underlying CUDA calls to nvJitLinkGetErrorLogSize / nvJitLinkGetErrorLog are not checked via HANDLE_RETURN — the results are used directly without error checking. If these calls fail, the failure is silently ignored.
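A sketch of the checked alternative, using stand-in callables rather than the real nvJitLink bindings (the two-step size/retrieval shape mirrors the C API but the signatures here are assumptions):

```python
# Hypothetical checked log retrieval: each underlying call's status is
# inspected before its result is used, instead of being ignored.
NVJITLINK_SUCCESS = 0  # illustrative value


def get_error_log(get_size, get_log) -> str:
    """Fetch a diagnostic log, raising if either underlying call fails."""
    status, size = get_size()
    if status != NVJITLINK_SUCCESS:
        raise RuntimeError(f"log-size query failed with status {status}")
    status, log = get_log(size)
    if status != NVJITLINK_SUCCESS:
        raise RuntimeError(f"log retrieval failed with status {status}")
    return log
```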
4. _MP_deallocate silently swallows CUDA_ERROR_INVALID_CONTEXT
_memory_pool.pyx: The deallocation path explicitly suppresses CUDA_ERROR_INVALID_CONTEXT. The function is marked noexcept so it cannot raise, but this means a real error (e.g., deallocating after context destruction) is silently ignored. Callers have no way to know deallocation failed.
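The fix suggested below can be sketched in plain Python: the path still cannot raise, but the suppressed failure becomes observable. The status code value is an assumption for illustration:

```python
import warnings

CUDA_SUCCESS = 0
CUDA_ERROR_INVALID_CONTEXT = 201  # illustrative value


def deallocate_noexcept(free, ptr) -> None:
    """Sketch of a deallocation path that cannot raise: surface the
    suppressed error as a warning rather than swallowing it silently."""
    status = free(ptr)
    if status == CUDA_ERROR_INVALID_CONTEXT:
        warnings.warn("deallocation skipped: CUDA context already destroyed",
                      RuntimeWarning)
```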
5. DeviceProperties._get_attribute() returns a default on CUDA_ERROR_INVALID_VALUE
_device.pyx: When querying device attributes, CUDA_ERROR_INVALID_VALUE (which often means "this attribute isn't supported on this GPU") is silently converted to a default value (typically 0) rather than raising. A caller reading device.properties.some_attribute could get 0 and not know whether the attribute is genuinely 0 or unsupported on their hardware.
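A sketch of the sentinel-based alternative suggested below, with illustrative status codes and a hypothetical query callable:

```python
CUDA_SUCCESS = 0
CUDA_ERROR_INVALID_VALUE = 1  # illustrative values

UNSUPPORTED = object()  # distinct sentinel, so a genuine 0 stays unambiguous


def get_attribute(query):
    """Sketch: return a sentinel for unsupported attributes instead of
    defaulting to 0, so callers can tell the two cases apart."""
    status, value = query()
    if status == CUDA_ERROR_INVALID_VALUE:
        return UNSUPPORTED
    if status != CUDA_SUCCESS:
        raise RuntimeError(f"CUDA error {status}")
    return value
```

Raising AttributeError is the other option mentioned in the suggestions; the sentinel variant keeps attribute access non-throwing.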
6. Kernel._get_arguments_info() uses CUDA_ERROR_INVALID_VALUE as end-of-list sentinel
_module.pyx: Loops calling cuKernelGetParamInfo until it gets CUDA_ERROR_INVALID_VALUE, which it interprets as "no more parameters" rather than an error. This mirrors the C API convention. Any genuinely invalid-value error would also be silently consumed.
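The sentinel loop can be sketched like this (illustrative status codes, hypothetical query callable):

```python
CUDA_SUCCESS = 0
CUDA_ERROR_INVALID_VALUE = 1  # illustrative values


def get_param_sizes(get_param_info):
    """Sketch of the sentinel loop: query parameter info by index until the
    first CUDA_ERROR_INVALID_VALUE, which is treated as end-of-list."""
    sizes = []
    index = 0
    while True:
        status, size = get_param_info(index)
        if status == CUDA_ERROR_INVALID_VALUE:
            break  # end of parameters -- or a masked real error
        if status != CUDA_SUCCESS:
            raise RuntimeError(f"CUDA error {status}")
        sizes.append(size)
        index += 1
    return sizes
```

The `break` is exactly where a genuine invalid-value error would be consumed, which is the risk noted above.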
7. Device_resolve_device_id() returns 0 on CUDA_ERROR_INVALID_CONTEXT
_device.pyx: When no context exists, instead of raising, it defaults to device 0 (mimicking cudart behavior). This is an internal function but affects public API behavior — Device(None) silently falls back to device 0 rather than informing the caller there is no active context.
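A sketch of the raising alternative suggested below, with an illustrative status code and a hypothetical query callable:

```python
CUDA_SUCCESS = 0
CUDA_ERROR_INVALID_CONTEXT = 201  # illustrative value


def resolve_device_id(get_current_device):
    """Sketch: raise when no context is active, instead of silently
    defaulting to device 0."""
    status, dev = get_current_device()
    if status == CUDA_ERROR_INVALID_CONTEXT:
        raise RuntimeError("no active CUDA context; specify a device explicitly")
    if status != CUDA_SUCCESS:
        raise RuntimeError(f"CUDA error {status}")
    return dev
```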
8. DMR_mempool_get_access() — returns magic strings instead of a typed enum
_device_memory_resource.pyx: Returns "rw", "r", or "". The empty string "" (meaning "no access") is a value the caller must check — attempting to use a buffer without access would only fail later at a less helpful point. A proper enum would make this more self-documenting and less error-prone.
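A sketch of the enum suggested below. The enum name and mapping are hypothetical; only the three string values come from the current API:

```python
from enum import Flag


class PoolAccess(Flag):
    """Hypothetical typed replacement for the "rw"/"r"/"" magic strings."""
    NONE = 0
    READ = 1
    WRITE = 2
    READ_WRITE = 3  # READ | WRITE


_FROM_STRING = {"rw": PoolAccess.READ_WRITE,
                "r": PoolAccess.READ,
                "": PoolAccess.NONE}


def parse_access(text: str) -> PoolAccess:
    """Convert the current string form into the typed enum."""
    return _FROM_STRING[text]
```

A nice side effect of Flag is that the no-access member is falsy, so `if not access:` reads naturally where callers currently compare against `""`.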
Suggestions
Ranked roughly from most to least impactful:
1. Program.pch_status returning "failed" — consider raising an exception (or at least a warning) during compile() when PCH creation fails, rather than silently storing a status string the user must remember to check.
2. Linker.get_error_log() / get_info_log() — check the CUDA return values from the underlying log-retrieval calls via HANDLE_RETURN.
3. _MP_deallocate suppressing CUDA_ERROR_INVALID_CONTEXT — at minimum log a warning so failures are observable.
4. DeviceProperties returning 0 for unsupported attributes — consider raising AttributeError or returning a distinct sentinel so callers can distinguish "genuinely 0" from "not supported".
5. DMR_mempool_get_access — return a proper enum rather than magic strings.
6. Kernel._get_arguments_info() end-of-list sentinel — document or assert that CUDA_ERROR_INVALID_VALUE is only expected at the boundary, to avoid masking real errors.
7. Device_resolve_device_id() defaulting to device 0 — consider raising when there is no active context, rather than silently choosing a device.
Not flagged (correct patterns)
For completeness, these were reviewed and found to handle errors properly:
- Graph.update() — raises CUDAError with diagnostic info on GRAPH_EXEC_UPDATE_FAILURE
- GraphBuilder.complete() / _instantiate_graph() — raises RuntimeError with error reason
- Event.__sub__() — handles error codes inline but always raises exceptions with contextual messages
- All close() methods — delegate to C++ RAII handles; idempotent no-op behavior is standard
- All memory resource allocate() / deallocate() public methods — consistently use HANDLE_RETURN or raise_if_driver_error()
- All Stream, Device, Context public methods — consistently raise via HANDLE_RETURN
- All graph node factory methods — consistently raise via HANDLE_RETURN
- system subpackage functions — consistently raise ValueError / RuntimeError on failure