Skip to content

Conversation

kkraus14
Copy link
Collaborator

@kkraus14 kkraus14 commented Sep 9, 2025

Description

closes #820

TODO:

  • Rerun codegen with 12.9.1
  • Update driver / runtime result explanations
  • Add release notes

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Copy link
Contributor

copy-pr-bot bot commented Sep 9, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@kkraus14 kkraus14 changed the base branch from main to 12.9.x September 9, 2025 21:05
@leofang
Copy link
Member

leofang commented Sep 9, 2025

ah.. this reminds me, we haven't backported #889 to 12.9.x, should we do it before this PR?

@kkraus14
Copy link
Collaborator Author

kkraus14 commented Sep 9, 2025

ah.. this reminds me, we haven't backported #889 to 12.9.x, should we do it before this PR?

100%. I'll do that as part of this PR.

@leofang
Copy link
Member

leofang commented Sep 9, 2025

I wonder if the backport bot can handle it

@leofang leofang added enhancement Any code-related improvements P1 Medium priority - Should do cuda.bindings Everything related to the cuda.bindings module labels Sep 10, 2025
@leofang leofang added this to the cuda-python 13-next, 12-next milestone Sep 10, 2025
@leofang
Copy link
Member

leofang commented Sep 10, 2025

I wonder if the backport bot can handle it

Looks like not... #889 (comment)

Let me know if I (or Copilot) should handle this for you.

@kkraus14
Copy link
Collaborator Author

I wonder if the backport bot can handle it

Looks like not... #889 (comment)

Let me know if I (or Copilot) should handle this for you.

I'll take care of it, thanks!

…ith hard-coded values, optimized checkout, and prepared Windows self-hosted runner migration (NVIDIA#889)

* Initial plan

* Consolidate test matrices from workflows into ci/test-matrix.json

Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>

* Hard-code all GPU and ARCH values in test-matrix.json with 6 fields per entry

Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>

* Update Windows test matrix with a100 GPU and latest-1 driver, configure self-hosted runners

Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>

* fix

* Revert eed0b71 and change Windows DRIVER from latest-1 to latest

Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>

* Add proxy cache setup to Windows workflow for self-hosted runners

Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>

* Remove Git for Windows and gh CLI installation steps, add T4 GPU support to Windows matrix

Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>

* Set fetch-depth: 1 for checkout steps and favor L4/T4 over A100 GPUs for Windows testing

Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>

* Revert Windows workflow to GitHub-hosted runners with TODO comments for future self-hosted migration

Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>

* [pre-commit.ci] auto code formatting

* Revert Win runner name change for now

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
Co-authored-by: Leo Fang <leof@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Copy link
Contributor

copy-pr-bot bot commented Sep 10, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@kkraus14
Copy link
Collaborator Author

/ok to test

@kkraus14
Copy link
Collaborator Author

/ok to test

@kkraus14 kkraus14 marked this pull request as ready for review September 10, 2025 03:28
Copy link
Contributor

copy-pr-bot bot commented Sep 10, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cryos
Copy link
Collaborator

cryos commented Sep 10, 2025

/ok to test

@cryos
Copy link
Collaborator

cryos commented Sep 10, 2025

/ok to test

build-type: pull-request
host-platform: ${{ matrix.host-platform }}
build-ctk-ver: ${{ needs.ci-vars.outputs.CUDA_BUILD_VER }}
matrix_filter: "select(([.CUDA_VER // empty | split(\".\")[] | tonumber] as $v | ($v[0] < 13)))"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: can we programmatically derive 13 here and so we can have this unified across main and 12.9.x and allow it to nicely carry over to the future where we have main and 13.x.y?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want to derive the CUDA major version from the branch name? I see the motivation, just thinking through what we want to key from programatically.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or from the version json file since it would always be aligned to the major version we want to use? It's not a big deal though where if we want to punt to a future PR that's totally fine.

Copy link
Collaborator

@cryos cryos Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Third time was the charm 🎉 I can look at it for a follow up, it shouldn't be much, but it would be good to get this merged now. Lots of meetings today sadly...

@cryos
Copy link
Collaborator

cryos commented Sep 10, 2025

/ok to test

@cryos
Copy link
Collaborator

cryos commented Sep 10, 2025

/ok to test

cryos
cryos previously approved these changes Sep 10, 2025
Copy link
Collaborator

@cryos cryos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI changes look good to me

@kkraus14
Copy link
Collaborator Author

@cryos I think we need to backport using the self hosted Windows runners as well. Want to do that here or a separate PR?

@cryos
Copy link
Collaborator

cryos commented Sep 10, 2025

@cryos I think we need to backport using the self hosted Windows runners as well. Want to do that here or a separate PR?

I was about to ask - I think we should, but I was going to do it in a follow up. Happy to do it here if preferred (it is a pretty small patch).

@kkraus14
Copy link
Collaborator Author

@cryos I think we need to backport using the self hosted Windows runners as well. Want to do that here or a separate PR?

I was about to ask - I think we should, but I was going to do it in a follow up. Happy to do it here if preferred (it is a pretty small patch).

If you don't mind lets do it here. Thanks!

Migrate the Windows testing to use the new NV GHA runners.
Cherry-pick NVIDIA#958.
@cryos
Copy link
Collaborator

cryos commented Sep 10, 2025

/ok to test

@kkraus14
Copy link
Collaborator Author

LGTM, thanks @cryos. There's one open question related to cufile that I'd like to hear from @rwgk or @leofang before we merge

@cryos
Copy link
Collaborator

cryos commented Sep 10, 2025

It looks like everything is passing in CI @kkraus14, it looks good to go from my perspective pending the open questions you mentioned.

Copy link
Member

@leofang leofang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the matrix_filter is about (I assume it's to avoid editing the json file?), but LGTM

@leofang leofang merged commit 3508f75 into NVIDIA:12.9.x Sep 11, 2025
35 checks passed
@leofang
Copy link
Member

leofang commented Sep 11, 2025

Thanks, Keith, Marcus, Ralf!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda.bindings Everything related to the cuda.bindings module enhancement Any code-related improvements P1 Medium priority - Should do

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MNT: Use CTK 12.9.1 to build cuda-bindings 12.9.x

4 participants