@leofang (Member) commented Sep 28, 2025

Description

closes #866

related to #846.

This PR implements the infrastructure so that all cuda.core modules can be properly cythonized and so that accessing the cuda.bindings Cython layer (from cuda.bindings.cyXXXXX cimport ...) is safe. For local development, nothing changes except that CUDA_PATH must be set, as when building cuda.bindings.

The changes involve:

  • Build system: We implement a custom build backend that defers cythonization, so that we can detect the CUDA version and install the matching cuda-bindings version into the build-isolation environment before invoking the Cython compiler.
  • Internal implementation: Many calls to the Python bindings of driver APIs are now lowered to their Cython counterparts, though not exhaustively; the rest can be followed up in future PRs. The cu12/cu13 extension modules are selected at import time.
  • CI: We build cuda.core wheels against two CUDA major versions, and use a script to merge the payloads. Only the final wheel is stored and used in the test stage (and can be released as before using the release workflow); the cu12/cu13 wheels are thrown away after the merge.
    • PIP_FIND_LINKS is the key to building against local, unreleased cuda.bindings wheels.
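The CI step above is described only as "a script to merge the payloads." The idea can be sketched as follows, assuming the per-CUDA wheels share the same metadata and differ only in version-specific extension files. The layout (`pkg/cu12/...`, `pkg/cu13/...`) and the function name `merge_wheels` are hypothetical; this is not the PR's actual script.

```python
import base64
import hashlib
import zipfile


def _record_entry(name: str, data: bytes) -> str:
    """Format one RECORD line: path,sha256=<urlsafe-b64 digest, no padding>,size."""
    digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest()).rstrip(b"=").decode("ascii")
    return f"{name},sha256={digest},{len(data)}"


def merge_wheels(base_whl: str, extra_whl: str, out_whl: str) -> list:
    """Write out_whl = all files of base_whl plus payload files found only in extra_whl.

    Metadata (*.dist-info/*) comes from base_whl; RECORD is regenerated so its
    hashes cover the merged payload. Returns the names copied from extra_whl.
    """
    members = {}  # name -> (ZipInfo, bytes); keeping ZipInfo preserves file modes
    with zipfile.ZipFile(base_whl) as base:
        for info in base.infolist():
            if not info.filename.endswith("/"):  # skip directory entries
                members[info.filename] = (info, base.read(info))

    added = []
    with zipfile.ZipFile(extra_whl) as extra:
        for info in extra.infolist():
            name = info.filename
            if name.endswith("/") or ".dist-info/" in name or name in members:
                continue
            members[name] = (info, extra.read(info))
            added.append(name)

    record = next(n for n in members if n.endswith(".dist-info/RECORD"))
    lines = [_record_entry(n, data) for n, (_, data) in members.items() if n != record]
    lines.append(f"{record},,")  # RECORD does not list a hash for itself
    members[record] = (members[record][0], ("\n".join(lines) + "\n").encode("utf-8"))

    with zipfile.ZipFile(out_whl, "w") as out:
        for info, data in members.values():
            out.writestr(info, data)
    return added
```

Because the `RECORD` hashes must match the archive contents, the sketch rebuilds `RECORD` rather than copying it. It leaves `METADATA` untouched, so a real merge would also need to adjust any dependency pin inherited from the base wheel that is specific to one CUDA major version.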

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

copy-pr-bot (bot, Contributor) commented Sep 28, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

@leofang self-assigned this Sep 28, 2025
@leofang added the enhancement, P0, CI/CD, and cuda.core labels Sep 28, 2025
@leofang added this to the cuda.core beta 7 milestone Sep 28, 2025
@leofang (Member Author) commented Sep 28, 2025

/ok to test 277a5b7

@leofang (Member Author) commented Sep 28, 2025

/ok to test 0b945cb

@leofang (Member Author) commented Sep 28, 2025

/ok to test 8a636e5

@leofang (Member Author) commented Sep 28, 2025

/ok to test fbe9325

@leofang (Member Author) commented Sep 29, 2025

/ok to test d3c7877

@leofang (Member Author) commented Sep 29, 2025

/ok to test a59e7c0

@leofang (Member Author) commented Sep 29, 2025

/ok to test 37fe7eb

@leofang (Member Author) commented Oct 1, 2025

/ok to test 473f8f2

@kkraus14 (Collaborator) commented Oct 2, 2025

@leofang given the scope of this change, I think we should consider punting this until after the 0.4.0 release. That way we can kick the tires a bit on it in development before releasing it outwards?

@leofang (Member Author) commented Oct 2, 2025

What would be the concern(s) that we must address before merging? I'd rather delay 0.4.0 by a few more days than punt on this PR. It's an arbitrary timeline that we set for ourselves, and it seems that, even without considering this PR, we have other P0 items that could potentially slip as well.

@leofang (Member Author) commented Oct 2, 2025

There is yet another merge conflict... We need to enforce the practice that only P0 PRs can be merged during release week 🤬

@leofang (Member Author) commented Oct 2, 2025

I'll review/merge #1020 first, and then resolve all the merge conflicts in one go.

@kkraus14 (Collaborator) commented Oct 2, 2025

> What would be the concern(s) that we must address before merging? I'd rather delay 0.4.0 by a few more days than punt on this PR. It's an arbitrary timeline that we set for ourselves, and it seems that, even without considering this PR, we have other P0 items that could potentially slip as well.

If you feel confident, we can always roll it back if it turns out to be an issue, but that sounds like a potentially painful 0.4.1.

The biggest concern that I have is that we now have two different wheel builds and only the combined version is tested in CI. Local users building from source will be using a package that isn't tested in CI.

kkraus14 previously approved these changes Oct 2, 2025
@leofang (Member Author) commented Oct 2, 2025

> The biggest concern that I have is that we now have two different wheel builds and only the combined version is tested in CI. Local users building from source will be using a package that isn't tested in CI.

Ah! I guess the approach of the current PR is based on an implicit assumption that I've been sticking to... that we optimize for end-user experience over developer experience 😅

I think we will find issues during daily development if it does not work, because the expectation is that locally one would only build against a single CUDA version. But if adding a CI job can help ensure this, I can add a separate CI pipeline in the next PR? (It'd have to be a standalone pipeline, though, not part of the build/test matrices.)

@kkraus14 (Collaborator) commented Oct 2, 2025

> Ah! I guess the approach of the current PR is based on an implicit assumption that I've been sticking to... that we optimize for end-user experience over developer experience 😅
>
> I think we will find issues during daily development if it does not work, because the expectation is that locally one would only build against a single CUDA version. But if adding a CI job can help ensure this, I can add a separate CI pipeline in the next PR? (It'd have to be a standalone pipeline, though, not part of the build/test matrices.)

Of course we optimize for the end user, but if we accidentally break the ability to build locally because it's not covered in CI, that will inevitably impact the quality of the software we deliver.

Adding a separate CI pipeline sounds reasonable and doing it in a follow up is fine. Ideally we cover both CUDA 12.x and 13.x on both Windows and Linux?

kkraus14 previously approved these changes Oct 2, 2025
@rwgk (Collaborator) left a comment

A couple of minor suggestions. (This isn't a real review; I only skimmed top to bottom for general background.)

from cuda.core.experimental._graph import (
try:
    import cuda.bindings
except ImportError:

Collaborator

Could ModuleNotFoundError be better here (less likely to mask bugs)?

(Also below, current line 21)

Member Author

Since we've been catching ImportError just fine and the linter does not complain, I suppose it is OK.
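The reviewer's point rests on Python's exception hierarchy: `ModuleNotFoundError` is a subclass of `ImportError` that is raised only when a module cannot be found, whereas plain `ImportError` also covers failures such as `from x import name` when `name` is missing. A small check of which clause fires in each case (the name `package_that_does_not_exist` is a placeholder):

```python
def which_clause(statement: str) -> str:
    """Execute an import statement and report which except clause catches it."""
    try:
        exec(statement, {})
    except ModuleNotFoundError:  # the narrower subclass must be listed first
        return "ModuleNotFoundError"
    except ImportError:
        return "ImportError"
    return "no error"


assert issubclass(ModuleNotFoundError, ImportError)
print(which_clause("import package_that_does_not_exist"))       # ModuleNotFoundError
print(which_clause("from os import name_that_does_not_exist"))  # ImportError
print(which_clause("import os"))                                # no error
```

So `except ImportError` around `import cuda.bindings` also swallows a plain `ImportError` raised from inside an installed but broken cuda.bindings, while `except ModuleNotFoundError` lets it propagate. Both behave the same when the package is simply absent.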

Comment on lines +19 to +20
# Import all symbols from the module
globals().update(versioned_mod.__dict__)

Collaborator

Could this be moved into the else: branch a few lines down?

Member Author

Yes, we can, but my slight preference is to keep everything in the try block to make the intention clear, and to use the else block only for cleanup (to keep the main namespace tidy).
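The try/except/else layout discussed here, which selects a cu12 or cu13 module at import time, can be sketched as follows. This is a minimal sketch under stated assumptions: two stand-in modules (`_demo_impl_cu12`, `_demo_impl_cu13`) are registered in `sys.modules`, and `_detected_cuda_major()` is a hypothetical placeholder for the real version check against the installed cuda.bindings, so the sketch runs without CUDA.

```python
import importlib
import sys
import types

# --- Stand-ins so the sketch runs without CUDA (hypothetical names) -----------
for _major in (12, 13):
    _fake = types.ModuleType(f"_demo_impl_cu{_major}")
    _fake.CUDA_MAJOR = _major
    _fake.launch = lambda major=_major: f"launched via cu{major} bindings"
    sys.modules[_fake.__name__] = _fake


def _detected_cuda_major() -> int:
    # Hypothetical placeholder: real code would inspect the installed cuda.bindings.
    return 13
# -------------------------------------------------------------------------------

try:
    # Everything that may fail stays inside the try block.
    versioned_mod = importlib.import_module(f"_demo_impl_cu{_detected_cuda_major()}")
    # Import all public symbols from the selected module.
    globals().update({k: v for k, v in vars(versioned_mod).items() if not k.startswith("_")})
except ImportError as e:
    raise ImportError("no extension module matches the installed CUDA major version") from e
else:
    # Cleanup only: keep the module namespace tidy.
    del versioned_mod
```

Copying only public names is a choice made in this sketch, so that the selected module's `__name__`, `__spec__`, and similar attributes do not overwrite the host module's own. The snippet under review copies the whole `__dict__`.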

@leofang (Member Author) commented Oct 3, 2025

/ok to test 4a37f97

@leofang enabled auto-merge (squash) October 3, 2025 03:14
@leofang merged commit 279c943 into NVIDIA:main Oct 3, 2025 (71 checks passed)
@leofang deleted the merge_wheel branch October 3, 2025 03:27

github-actions (bot) commented Oct 3, 2025

Doc Preview CI
Preview removed because the pull request was closed or merged.


Labels

CI/CD (CI/CD infrastructure), cuda.core (Everything related to the cuda.core module), enhancement (Any code-related improvements), P0 (High priority - Must do!)

Development

Successfully merging this pull request may close these issues.

RFC: Cythonize cuda.core while keeping it CUDA-agnostic

8 participants