Move enum explanations and health checks from cuda_core to cuda_bindings by rwgk · Pull Request #1805 · NVIDIA/cuda-python

rwgk · 2026-03-23T05:21:46Z

The DRIVER_CU_RESULT_EXPLANATIONS and RUNTIME_CUDA_ERROR_EXPLANATIONS dicts are fundamentally tied to the cuda-bindings release (they must match the enums shipped in that release). Having them live exclusively in cuda_core meant the health-check tests failed whenever cuda_core was tested against a different version of cuda-bindings (nvbug 5932944).
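To illustrate why such a health check breaks across releases, here is a minimal sketch. It assumes the check compares the enum's integer codes against the dict keys; the enums `ResultV1` and `ResultV2`, the explanation text, and the helper `find_mismatches` are hypothetical stand-ins, not the actual cuda-bindings enums or test code.

```python
import enum


class ResultV1(enum.IntEnum):
    """Stand-in for an error enum shipped by one bindings release."""

    SUCCESS = 0
    ERROR_INVALID_VALUE = 1


class ResultV2(enum.IntEnum):
    """Stand-in for the same enum in a later release with one new member."""

    SUCCESS = 0
    ERROR_INVALID_VALUE = 1
    ERROR_NEW_FEATURE = 2


# Explanations written against ResultV1 (hypothetical text).
EXPLANATIONS_V1 = {
    0: "The API call returned with no errors.",
    1: "One or more parameters are outside the acceptable range.",
}


def find_mismatches(enum_cls, explanations):
    """Return (codes lacking an explanation, explanation keys with no enum member)."""
    codes = {int(member) for member in enum_cls}
    keys = set(explanations)
    return sorted(codes - keys), sorted(keys - codes)


# Dict and enum from the same release: nothing to report.
same_release = find_mismatches(ResultV1, EXPLANATIONS_V1)
# Dict from one release checked against a newer enum: code 2 has no explanation.
cross_release = find_mismatches(ResultV2, EXPLANATIONS_V1)
```

When the dict and the enum ship in the same package, the two cannot drift apart between releases, which is the motivation stated above.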

Changes

Move the dicts to cuda_bindings/cuda/bindings/_utils/ as the single authoritative source (renamed to _EXPLANATIONS with a _CTK_MAJOR_MINOR_PATCH version tag).
Delete the copies from cuda_core. cuda_utils.pyx now imports directly from cuda.bindings._utils, with a ModuleNotFoundError fallback to an empty dict.
Move the exhaustive health-check tests to cuda_bindings/tests/test_enum_explanations.py, where they belong alongside the dicts they verify.
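The guarded import described above might look roughly like the following sketch. The helper `load_explanations` is hypothetical, and the demonstration uses the standard-library `errno` module plus a deliberately nonexistent package so that it runs without cuda-bindings installed; the actual code lives in cuda_utils.pyx and may be structured differently.

```python
import importlib


def load_explanations(module_name, dict_name):
    """Return the dict `dict_name` from `module_name`, or {} if the module is missing."""
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError:
        # The module is not installed (for example, an older release that
        # does not ship it): degrade gracefully to "no explanations".
        return {}
    return getattr(module, dict_name)


# A module that does not exist yields an empty dict instead of an ImportError.
missing = load_explanations("no_such_bindings_pkg._utils.explanations", "EXPLANATIONS")

# A module that does exist yields its dict (errno.errorcode is used purely as a
# stand-in for an explanations dict).
present = load_explanations("errno", "errorcode")

# In cuda-core the call would target a module under cuda.bindings._utils; the
# exact module name is not stated in the PR text, so it is not spelled out here.
```

Catching only `ModuleNotFoundError`, rather than every `ImportError`, keeps genuine import failures inside an installed module visible.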

Impact on error messages for cuda-core users

When cuda-core raises a CUDAError, it tries to include a human-readable explanation of the error code (e.g. "This indicates that one or more of the parameters passed to the API call is not within an acceptable range of values").

With this change:

cuda-bindings >= this PR: Error messages continue to include explanations, exactly as before.
cuda-bindings < this PR (older releases that don't ship _utils): Error messages fall back to the driver/runtime error name and description string obtained from cuGetErrorString / cudaGetErrorString. The explanations are a nice-to-have supplement, and the error name + description are still informative. Upgrading to a current cuda-bindings release restores the full explanations.
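The two cases above can be sketched as follows. The function `format_error_message` and the message layout are assumptions made for illustration, not the actual cuda-core formatting; the name and description strings stand in for what the driver would report for error code 1.

```python
def format_error_message(name, description, code, explanations):
    """Build an error message, appending an explanation when one is available."""
    message = f"{name}: {description}"
    explanation = explanations.get(code)
    if explanation:
        message += f"\n{explanation}"
    return message


# Illustrative values for error code 1.
NAME = "CUDA_ERROR_INVALID_VALUE"
DESCRIPTION = "invalid argument"

# Explanations available (cuda-bindings >= this PR).
with_explanations = format_error_message(
    NAME,
    DESCRIPTION,
    1,
    {
        1: "This indicates that one or more of the parameters passed to the "
        "API call is not within an acceptable range of values."
    },
)

# Explanations unavailable (older cuda-bindings): the empty-dict fallback
# leaves only the error name and description.
without_explanations = format_error_message(NAME, DESCRIPTION, 1, {})
```

With an empty dict the lookup simply finds nothing, so the fallback path needs no special-casing at the call site.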

…VIDIA#1712) The explanation dicts are fundamentally tied to the bindings version, so they belong in cuda_bindings. This copies them (keeping the cuda_core originals for backward compatibility) and adds the corresponding health tests under cuda_bindings/tests/. Made-with: Cursor

These tests now live in cuda_bindings/tests/test_enum_explanations.py, where they belong alongside the explanation dicts they verify. Made-with: Cursor

…llback (NVIDIA#1712) Each explanation module now tries to import the authoritative dict from cuda.bindings._utils (ModuleNotFoundError-guarded) and falls back to its own copy for older cuda-bindings that don't ship it yet. Smoke tests added for both dicts. Made-with: Cursor

NVIDIA#1712) Rename explanation dicts to _EXPLANATIONS / _FALLBACK_EXPLANATIONS, add _CTK_MAJOR_MINOR_PATCH to each module, and enforce that the cuda_core fallback copy is as new as (and in-sync with) cuda_bindings. Parametrize the smoke and version-check tests to cover both driver and runtime without duplication. Made-with: Cursor

…tring (NVIDIA#1712) Made-with: Cursor

copy-pr-bot · 2026-03-23T05:21:51Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

rwgk · 2026-03-23T05:23:20Z

/ok to test

github-actions · 2026-03-23T05:39:14Z

Doc Preview CI
🚀 View preview at https://nvidia.github.io/cuda-python/pr-preview/pr-1805/
https://nvidia.github.io/cuda-python/pr-preview/pr-1805/cuda-core/
https://nvidia.github.io/cuda-python/pr-preview/pr-1805/cuda-bindings/
https://nvidia.github.io/cuda-python/pr-preview/pr-1805/cuda-pathfinder/
Preview will be ready when the GitHub Pages deployment is complete.

rwgk · 2026-03-23T06:26:04Z

/ok to test

rwgk · 2026-03-24T19:20:40Z

For easy reference, the CI at commit fb12195 was successful:

https://github.com/NVIDIA/cuda-python/actions/runs/23444195523?pr=1805

(I'm about to push git merge master, which will hide it. Not rerunning the CI for now, waiting for a review.)

cpcloud · 2026-03-25T15:29:24Z

What's stopping us from moving this codegen into the code generator and re-exporting it here to avoid breaking stuff?

We can't continue to live with steps like "copy x manually". Let's just do the work to move it to the generator. It doesn't really make sense that we've got tools parsing C headers in Python and producing code from that, and yet we're still copying dictionaries by hand.

rparolin · 2026-03-25T21:33:03Z

cuda_core/cuda/core/_utils/runtime_cuda_error_explanations.py

-RUNTIME_CUDA_ERROR_EXPLANATIONS = {
+_FALLBACK_EXPLANATIONS = {
    0: (
        "The API call returned with no errors. In the case of query calls, this"


Do we have to duplicate this error text list? Can we hoist it into a central location?

rwgk · Mar 25, 2026 (reply)

> Do we have to duplicate this error text list?

Originally my proposal was to avoid this copy (see the #1712 issue description, Backward compatibility section), but @leofang argued for vendoring (see issue comments).

> Can we hoist it into a central location?

Not if we want future cuda-core releases to produce the enhanced error messages even if used in combination with cuda-bindings releases made before this PR was merged.

On balance, I still feel the better compromise is to delete this copy, and to change cuda_core/cuda/core/_utils/cuda_utils.pyx to skip enhancing the error messages if the dict is not in cuda-bindings. It's really only a nice-to-have that will be easy to get back by using the latest cuda-bindings.

rparolin

Please remove the duplicated error text array.

rwgk · 2026-03-25T22:07:36Z

> What's stopping us from moving this codegen into the code generator and re-exporting it here to avoid breaking stuff?
>
> We can't continue to live with steps like "copy x manually". Let's just do the work to move it to the generator. It doesn't really make sense that we've got tools parsing C headers in Python and producing code from that, and yet we're still copying dictionaries by hand.

I totally agree, but this PR is about solving nvbug 5932944, which is related to but different from the code-gen question. I opened cuda-python-private issue 289 to track your suggestion.

…gs (NVIDIA#1712) Remove the vendored explanation dicts from cuda_core. cuda_utils.pyx now imports directly from cuda.bindings._utils with a ModuleNotFoundError fallback to an empty dict, so error messages gracefully degrade when paired with older cuda-bindings that don't ship the dicts. Made-with: Cursor

rwgk · 2026-03-26T00:13:17Z

> Please remove the duplicated error text array.

Done, Cursor said this:

> Committed as 6fc77b7. Net -966 lines -- much cleaner.

I converted this PR back to Draft mode while retesting.

rwgk · 2026-03-26T00:20:45Z

/ok to test

…#1712) Restore DRIVER_CU_RESULT_EXPLANATIONS / RUNTIME_CUDA_ERROR_EXPLANATIONS as the dict names in cuda_bindings and remove the _CTK_MAJOR_MINOR_PATCH / _EXPLANATIONS indirection that is no longer needed without the cuda_core fallback copies. Made-with: Cursor

rwgk · 2026-03-26T04:51:30Z

/ok to test

rwgk added 5 commits March 22, 2026 20:44

Remove enum explanation health tests from cuda_core (NVIDIA#1712)

f35670a

Clean up test code: parametrize bindings health tests, drop no-op f-s…

3f6c30f

rwgk self-assigned this Mar 23, 2026

rwgk added the bug, P0, cuda.bindings, and cuda.core labels Mar 23, 2026

Update pathfinder descriptor catalogs for cusparseLt release 0.9.0

b1645c0

rwgk marked this pull request as ready for review March 23, 2026 06:50

Merge branch 'main' into move_enum_explanations

fb12195

rwgk requested a review from leofang March 23, 2026 18:43

Merge branch 'main' into move_enum_explanations

47e9c2d

rparolin reviewed Mar 25, 2026

rparolin requested changes Mar 25, 2026

rwgk added 2 commits March 25, 2026 16:25

Merge branch 'main' into move_enum_explanations

89a2052

rwgk marked this pull request as draft March 26, 2026 00:10

Merge branch 'main' into move_enum_explanations

112fb41
