Skip to content

Conversation

mdboom
Copy link
Contributor

@mdboom mdboom commented Sep 4, 2025

Description

Instead of using pywin32, just calls the win32 APIs directly using Cython extern.

closes #938

This has a measurable impact on import time of about 9% (import cuda.bindings.driver in a fresh interpreter), mainly by not spending time importing win32api:

  • Before: 92.1 ms
  • After: 84.6 ms

It also improves "time to first call" by about 10% (since the first call resolves all of the dynamic function pointers and makes many win32 API calls):

  • Before: 103.2 ms
  • After: 93.4 ms

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Sep 4, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@mdboom mdboom marked this pull request as draft September 4, 2025 12:47
@mdboom
Copy link
Contributor Author

mdboom commented Sep 4, 2025

/ok to test

@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Sep 4, 2025

/ok to test

@mdboom, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@mdboom
Copy link
Contributor Author

mdboom commented Sep 4, 2025

/ok to test

@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Sep 4, 2025

/ok to test

@mdboom, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@mdboom
Copy link
Contributor Author

mdboom commented Sep 4, 2025

/ok to test 7828876

@github-actions

This comment has been minimized.

@mdboom mdboom marked this pull request as ready for review September 4, 2025 14:36
@mdboom mdboom requested review from leofang and rwgk September 4, 2025 14:36
rwgk
rwgk previously approved these changes Sep 4, 2025
Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful!

It'd be ideal to also reduce the code duplication (get_cuda_version(), cdef extern from "windows.h":) in a set of follow-on PRs. I believe it'll be pretty straightforward after this PR and the associated codegen PRs are merged.

@mdboom
Copy link
Contributor Author

mdboom commented Sep 4, 2025

It'd be ideal to also reduce the code duplication (get_cuda_version(), cdef extern from "windows.h":) in a set of follow-on PRs. I believe it'll be pretty straightforward after this PR and the associated codegen PRs are merged.

The files that are generated by cybind are including the externs in a template, so they at least aren't copy-pasted. That is, all of the templates for each library have a ${snippet_windows_externs} that inserts all of this boilerplate (including the get_cuda_version function).

It might have been nicer to cimport cuda.bindings._lib.windll instead, but that would add a build-time dependency on cuda_bindings that not all of the cybind-generated packages have (though I might be misunderstanding that part). So I thought it better to make them "standalone" by having cybind put this snippet in all of them. See the cybind PR I also filed for more details.

@NVIDIA NVIDIA deleted a comment from copy-pr-bot bot Sep 4, 2025
@mdboom
Copy link
Contributor Author

mdboom commented Sep 4, 2025

/ok to test 8c7ea2e

@rwgk
Copy link
Collaborator

rwgk commented Sep 4, 2025

Wow

CI / Test linux-64 / py3.10, 13.0.0, wheels, GPU l4 (push) Failing after 4m

>       assert delay_ms - generous_tolerance <= elapsed_time_ms < delay_ms + generous_tolerance
E       assert 520.4592895507812 < (500.0 + 20)

I haven't seen that flake for a while.

Just rerun and ignore. The timing being off by a small margin certainly isn't due to a problem in cuda-bindings.

rwgk
rwgk previously approved these changes Sep 4, 2025
Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests passed, except for that one flake. I think a rerun of that test will resolve the flake.

@leofang leofang added the cuda.bindings Everything related to the cuda.bindings module label Sep 4, 2025
@leofang
Copy link
Member

leofang commented Sep 4, 2025

Let's merge not merge until the internal MR is approved, since it involves other teams as well.

@leofang leofang added enhancement Any code-related improvements P2 Low priority - Nice to have labels Sep 4, 2025
@leofang leofang added this to the cuda-python 13-next, 12-next milestone Sep 4, 2025
Copy link
Member

@leofang leofang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I noticed the cuFile module is not changed?

@mdboom
Copy link
Contributor Author

mdboom commented Sep 5, 2025

FYI, I noticed the cuFile module is not changed?

Ah, I think that's just an oversight on my part. I will regenerate that as well.

@mdboom
Copy link
Contributor Author

mdboom commented Sep 5, 2025

/ok to test e0e868a

@mdboom
Copy link
Contributor Author

mdboom commented Sep 5, 2025

/ok to test 7a46173

@mdboom
Copy link
Contributor Author

mdboom commented Sep 5, 2025

/ok to test 488bdc2

@leofang leofang requested a review from tpn September 5, 2025 20:35
tpn
tpn previously approved these changes Sep 5, 2025
kkraus14
kkraus14 previously approved these changes Sep 9, 2025
@mdboom
Copy link
Contributor Author

mdboom commented Sep 10, 2025

/ok to test 673974c

@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Sep 10, 2025

/ok to test 673974c

@mdboom, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

@mdboom
Copy link
Contributor Author

mdboom commented Sep 10, 2025

/ok to test dcce4f5

@mdboom mdboom requested a review from kkraus14 September 10, 2025 12:33
@mdboom
Copy link
Contributor Author

mdboom commented Sep 10, 2025

@kkraus14: I merged main into here for one final test, but otherwise no changes since your last "approved" review.

@mdboom mdboom enabled auto-merge (squash) September 10, 2025 12:37
Copy link
Member

@leofang leofang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, @mdboom! Be sure to backport this PR.

@mdboom mdboom merged commit acfe654 into NVIDIA:main Sep 10, 2025
49 checks passed
@github-actions
Copy link

Doc Preview CI
Preview removed because the pull request was closed or merged.

@kkraus14 kkraus14 mentioned this pull request Sep 29, 2025
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda.bindings Everything related to the cuda.bindings module enhancement Any code-related improvements P2 Low priority - Nice to have

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ENH]: Remove the use of win32api

5 participants