Cythonize `cuda.core` #1041

leofang · 2025-09-28T23:00:58Z

Description

closes #866

related to #846.

This PR implements the infrastructure needed to cythonize all cuda.core modules properly and to make access to the cuda.bindings Cython layer (from cuda.bindings.cyXXXXX cimport ...) safe. Local development is unchanged, except that CUDA_PATH must now be specified, the same as when building cuda.bindings.

The changes involve:

- Build system: We implement a custom build backend that defers cythonization, so that we can detect the CUDA version and install the matching cuda-bindings version into the build-isolation environment before firing up the Cython compiler.
- Internal implementation: Many calls to the Python bindings of driver APIs are now lowered to their Cython counterparts, though the change is not exhaustive and can be followed up in future PRs. The cu12/cu13 extension modules are selected at import time.
- CI: We build cuda.core wheels against two CUDA major versions and use a script to merge the payloads. Only the final wheel is stored and used in the test stage (and can be released as before using the release workflow); the cu12/cu13 wheels are thrown away after the merge.
  - PIP_FIND_LINKS is the key to building against the local, unreleased cuda.bindings wheels.
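
To illustrate the build-system item above, here is a minimal sketch of how a PEP 517 backend could detect the CUDA major version and request a matching cuda-bindings before any compilation happens. It is not the backend added by this PR: the helper names (`detect_cuda_major`, `bindings_requirement`, `read_cuda_header`), the `include/cuda.h` location under CUDA_PATH, and the exact `==<major>.*` pin are assumptions made for illustration. Only the hook names and the `CUDA_VERSION` macro encoding (1000 * major + 10 * minor) are standard.

```python
import os
import re
from typing import Optional


def detect_cuda_major(cuda_h_text: str) -> int:
    """Extract the CUDA major version from the text of cuda.h."""
    # cuda.h encodes the version as 1000 * major + 10 * minor, e.g. 12080.
    match = re.search(
        r"^\s*#\s*define\s+CUDA_VERSION\s+(\d+)", cuda_h_text, re.MULTILINE
    )
    if match is None:
        raise RuntimeError("CUDA_VERSION not found in cuda.h")
    return int(match.group(1)) // 1000


def bindings_requirement(cuda_major: int) -> str:
    """Build a requirement string (hypothetical pin style) for cuda-bindings."""
    return f"cuda-bindings=={cuda_major}.*"


def read_cuda_header(cuda_path: Optional[str] = None) -> str:
    """Read cuda.h from CUDA_PATH (assumed layout: <CUDA_PATH>/include/cuda.h)."""
    cuda_path = cuda_path or os.environ.get("CUDA_PATH")
    if not cuda_path:
        raise RuntimeError("CUDA_PATH must be set to build from source")
    header = os.path.join(cuda_path, "include", "cuda.h")
    with open(header, encoding="utf-8") as f:
        return f.read()


def get_requires_for_build_wheel(config_settings=None):
    """PEP 517 hook: the frontend installs these before calling build_wheel."""
    from setuptools import build_meta  # imported lazily; delegate to setuptools

    base = build_meta.get_requires_for_build_wheel(config_settings)
    major = detect_cuda_major(read_cuda_header())
    return [*base, bindings_requirement(major)]


def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    """PEP 517 hook: by now the matching cuda-bindings is installed."""
    from setuptools import build_meta

    # A real backend would run the Cython compiler at this point, which is
    # what "deferring cythonization" refers to in this sketch.
    return build_meta.build_wheel(
        wheel_directory, config_settings, metadata_directory
    )
```

Because the frontend calls `get_requires_for_build_wheel` first and installs whatever it returns into the isolated environment, the Cython sources can safely `cimport` from cuda.bindings by the time `build_wheel` runs.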

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2025-09-28T23:01:02Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

leofang · 2025-09-28T23:01:42Z

/ok to test 277a5b7

leofang · 2025-09-28T23:10:50Z

/ok to test 0b945cb

leofang · 2025-09-28T23:26:59Z

/ok to test 8a636e5

leofang · 2025-09-28T23:45:13Z

/ok to test fbe9325

leofang · 2025-09-29T00:03:23Z

/ok to test d3c7877

leofang · 2025-09-29T00:11:51Z

/ok to test a59e7c0

leofang · 2025-09-29T00:24:29Z

/ok to test 37fe7eb

leofang · 2025-10-01T23:24:42Z

/ok to test 473f8f2

kkraus14 · 2025-10-02T00:56:51Z

@leofang given the scope of this change, I think we should consider punting this until after the 0.4.0 release. That way we can kick the tires a bit on it in development before releasing it outwards?

leofang · 2025-10-02T01:03:00Z

What would be the concern(s) that we must address before merging? I'd rather delay 0.4.0 by a few more days than punt on this PR. It's an arbitrary timeline that we set for ourselves, and it seems that even without considering this PR we have other P0s that could potentially also slip.

leofang · 2025-10-02T01:20:48Z

There is yet another merge conflict... We need to enforce the practice that only P0 PRs can be merged on the week of release 🤬

leofang · 2025-10-02T01:25:17Z

I'll review/merge #1020 first, and then solve the merge conflicts at once.

kkraus14 · 2025-10-02T01:38:01Z

> What would be the concern(s) that we must address before merging? I'd rather delay 0.4.0 by a few more days than punt on this PR. It's an arbitrary timeline that we set for ourselves, and it seems that even without considering this PR we have other P0s that could potentially also slip.

If you feel confident, we can always roll it back if it turns out to be an issue, but that sounds like a potentially painful 0.4.1.

The biggest concern that I have is that we now have two different wheel builds and only the combined version is tested in CI. Local users building from source will be using a package that isn't tested in CI.


leofang · 2025-10-02T02:53:41Z

> The biggest concern that I have is that we now have two different wheel builds and only the combined version is tested in CI. Local users building from source will be using a package that isn't tested in CI.

Ah! I guess the approach of the current PR is based on the implicit assumption that I've been sticking to... that we optimize for end user experience over developer experience 😅

I think we will find issues during daily development if it does not work, because the expectation is that locally one would only build against a single CUDA version. But if adding a CI job can help ensure this, I can add a separate CI pipeline in the next PR? (It'd have to be a standalone pipeline, though, with no participation in the build/test matrices.)

kkraus14 · 2025-10-02T04:52:09Z

> Ah! I guess the approach of the current PR is based on the implicit assumption that I've been sticking to... that we optimize for end user experience over developer experience 😅
>
> I think we will find issues during daily development if it does not work, because the expectation is that locally one would only build against a single CUDA version. But if adding a CI job can help ensure this, I can add a separate CI pipeline in the next PR? (It'd have to be a standalone pipeline, though, with no participation in the build/test matrices.)

Of course we optimize for the end user, but if we accidentally break local builds because they aren't covered in CI, that will inevitably impact the quality of the software we deliver.

Adding a separate CI pipeline sounds reasonable and doing it in a follow up is fine. Ideally we cover both CUDA 12.x and 13.x on both Windows and Linux?

rwgk

A couple of minor suggestions. (This isn't a real review; I only looked through from top to bottom for general background.)

rwgk · 2025-10-02T19:05:39Z

cuda_core/cuda/core/experimental/__init__.py

-from cuda.core.experimental._graph import (
+try:
+    import cuda.bindings
+except ImportError:


Could ModuleNotFoundError be better here (less likely to mask bugs)?

(Also below, current line 21)

leofang · 2025-10-03

Since we've been catching ImportError just fine and the linter does not complain, I suppose it is OK.
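
For context on the suggestion above, a small sketch of the difference between the two exception classes. ModuleNotFoundError is a subclass of ImportError that is raised only when the module itself cannot be located, whereas a plain ImportError also covers failures inside a module that does exist (for example, a missing name). The `try_import` helper and the deliberately bogus module and attribute names are made up for this illustration.

```python
def try_import(source: str) -> str:
    """Run an import statement and report which exception class it raises."""
    try:
        exec(source, {})
    except ModuleNotFoundError:
        # The module could not be found at all.
        return "ModuleNotFoundError"
    except ImportError:
        # The module was found, but the import still failed.
        return "ImportError"
    return "ok"


# A module that does not exist anywhere on the path.
missing_module = try_import("import no_such_module_hopefully_xyz")

# An existing module (math) that lacks the requested name.
missing_name = try_import("from math import no_such_name_xyz")

print(missing_module, missing_name)
```

Catching only ModuleNotFoundError would let the second kind of failure propagate, which is the "less likely to mask bugs" argument; catching ImportError handles both.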

rwgk · 2025-10-02T19:09:15Z

cuda_core/cuda/core/experimental/__init__.py

+    # Import all symbols from the module
+    globals().update(versioned_mod.__dict__)


Could this be moved into the else: branch a few lines down?

leofang · 2025-10-03

Yes, we can, but my slight preference is to keep everything in the try block to make the intention clear, and to use the else block only for cleanup (to keep the main namespace tidy).
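
The pattern under discussion can be sketched as follows. This is not the code from `__init__.py`: the module name `demo_pkg_cu12`, the `load_versioned` helper, and the stand-in `Stream` class are hypothetical, and the sketch copies only public names into the target namespace, whereas the snippet in the diff updates `globals()` with the whole module `__dict__`.

```python
import importlib
import sys
import types

# Stand-in for a versioned extension subpackage (hypothetical name),
# registered in sys.modules so the sketch runs without compiled code.
fake = types.ModuleType("demo_pkg_cu12")
fake.Stream = type("Stream", (), {})
sys.modules["demo_pkg_cu12"] = fake


def load_versioned(namespace: dict, cuda_major: int) -> bool:
    """Copy public symbols of the matching versioned module into namespace."""
    try:
        versioned_mod = importlib.import_module(f"demo_pkg_cu{cuda_major}")
        # All of the real work stays inside the try block.
        namespace.update(
            {k: v for k, v in vars(versioned_mod).items() if not k.startswith("_")}
        )
    except ImportError:
        # No versioned module for this CUDA major version.
        return False
    else:
        # The else branch is used only for cleanup.
        del versioned_mod
        return True


ns = {}
found_cu12 = load_versioned(ns, 12)
found_cu13 = load_versioned({}, 13)
print(found_cu12, found_cu13, sorted(ns))
```

Keeping the `update` call inside `try` groups the whole "load the versioned module" intent in one place, at the cost of also catching an ImportError that the update step could in principle raise.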

leofang · 2025-10-03T02:50:45Z

/ok to test 4a37f97

github-actions · 2025-10-03T03:39:33Z

Doc Preview CI
Preview removed because the pull request was closed or merged.

leofang added 8 commits September 27, 2025 04:51

set up build system for targeting different cuda-bindings major verions

7557967

defer cythonization until cuda-bindings is installed

1976597

cythonize stream module

67db25e

nit: move dlpack.h to the include dir

07df441

purge cu11

6be8e7d

check in a working merger script

021e0f3

support loading from the versioned module if any exists

19020b2

fix linter errors

e51f910

leofang self-assigned this Sep 28, 2025

leofang added the enhancement, P0, CI/CD, and cuda.core labels Sep 28, 2025

leofang added this to the cuda.core beta 7 milestone Sep 28, 2025

leofang force-pushed the merge_wheel branch from 277a5b7 to 0b945cb Compare September 28, 2025 23:10

leofang force-pushed the merge_wheel branch from 0b945cb to 8a636e5 Compare September 28, 2025 23:26

leofang force-pushed the merge_wheel branch from fbe9325 to d3c7877 Compare September 29, 2025 00:02

leofang force-pushed the merge_wheel branch from d3c7877 to a59e7c0 Compare September 29, 2025 00:11

leofang force-pushed the merge_wheel branch from a59e7c0 to 37fe7eb Compare September 29, 2025 00:24

leofang added 2 commits September 29, 2025 00:34

set up double-build CI workflow

61617cf

ensure CUDA_PATH is honored by the build backend

9e799e4

leofang force-pushed the merge_wheel branch from 37fe7eb to 9e799e4 Compare September 29, 2025 00:34

leofang mentioned this pull request Oct 1, 2025

CI: Add a spellchecker to pre-commit #1064

Open

kkraus14 previously approved these changes Oct 2, 2025


switch to use FutureWarning

d79e317

leofang dismissed kkraus14’s stale review via d79e317 October 2, 2025 14:14

kkraus14 previously approved these changes Oct 2, 2025


rwgk reviewed Oct 2, 2025


This was referenced Oct 2, 2025

Support Python 3.14 & Drop 3.9 #846

Open

Cythonize cuda.core more #1070

Merged

cuda-core: Release GIL when calling cimport'd CUDA APIs #1065

Closed

CI: Create a mini pipeline to test developer UX (build + test) #1073

Open

Merge branch 'main' into merge_wheel

4a37f97

leofang dismissed kkraus14’s stale review via 4a37f97 October 3, 2025 02:49

leofang requested a review from kkraus14 October 3, 2025 03:02

leofang mentioned this pull request Oct 3, 2025

[DONT MERGE] Make stream creation faster #677

Closed

kkraus14 approved these changes Oct 3, 2025


leofang enabled auto-merge (squash) October 3, 2025 03:14

leofang merged commit 279c943 into NVIDIA:main Oct 3, 2025
71 checks passed

leofang deleted the merge_wheel branch October 3, 2025 03:27

This was referenced Oct 6, 2025

cuda-core editable install is broken #1088

Closed

Fix editable installation for cuda-core #1091

Merged
