Fix #938: Call win32 APIs directly #942
Conversation
/ok to test
@mdboom, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/
/ok to test
@mdboom, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/
/ok to test 7828876
Wonderful!
It'd be ideal to also reduce the code duplication (`get_cuda_version()`, `cdef extern from "windows.h":`) in a set of follow-on PRs. I believe it'll be pretty straightforward after this PR and the associated codegen PRs are merged.
The files that are generated by cybind include the externs in a template, so they at least aren't copy-pasted. That is, all of the templates for each library have a It might have been nicer to
/ok to test 8c7ea2e
Wow CI / Test linux-64 / py3.10, 13.0.0, wheels, GPU l4 (push) Failing after 4m
I haven't seen that flake for a while. Just rerun and ignore. The timing being off by a small margin certainly isn't due to a problem in cuda-bindings.
Tests passed, except for that one flake. I think a rerun of that test will resolve the flake.
Let's |
FYI, I noticed the cuFile module is not changed?
Ah, I think that's just an oversight on my part. I will regenerate that as well.
/ok to test e0e868a
/ok to test 7a46173
/ok to test 488bdc2
/ok to test 673974c
@mdboom, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/
/ok to test dcce4f5
@kkraus14: I merged main into here for one final test, but otherwise no changes since your last "approved" review.
Thanks a lot, @mdboom! Be sure to backport this PR.
Description
Instead of using `pywin32`, this PR calls the win32 APIs directly using Cython `extern`.

Closes #938
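To illustrate what "calling the win32 APIs directly" means, here is a minimal sketch in plain Python using `ctypes` rather than Cython `extern` (a swapped-in technique, chosen only so the example runs without a compiler). The helper name `resolve_symbol` is hypothetical, and the choice of `LoadLibraryExW` and `GetProcAddress` is an assumption about which loader functions are involved; the actual declarations in this PR may differ.

```python
import sys


def resolve_symbol(dll_name: str, symbol: str):
    """Return the address of `symbol` in `dll_name`, or None if unavailable.

    Hypothetical helper: it binds the kernel32 loader functions directly
    instead of going through the pywin32 `win32api` wrappers.
    """
    if sys.platform != "win32":
        return None  # the win32 loader APIs only exist on Windows

    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

    # Declare signatures explicitly so handles and pointers are not truncated.
    kernel32.LoadLibraryExW.argtypes = [
        wintypes.LPCWSTR, wintypes.HANDLE, wintypes.DWORD,
    ]
    kernel32.LoadLibraryExW.restype = wintypes.HMODULE
    kernel32.GetProcAddress.argtypes = [wintypes.HMODULE, wintypes.LPCSTR]
    kernel32.GetProcAddress.restype = ctypes.c_void_p

    handle = kernel32.LoadLibraryExW(dll_name, None, 0)
    if not handle:
        return None
    # GetProcAddress takes an ANSI symbol name and yields NULL on failure.
    return kernel32.GetProcAddress(handle, symbol.encode("ascii"))


if __name__ == "__main__":
    print(resolve_symbol("kernel32.dll", "GetTickCount"))
```

The Cython `extern` approach goes one step further than this sketch: the declarations are compiled in, so no Python-level wrapper module has to be imported at all.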
This improves import time by about 9% (`import cuda.bindings.driver` in a fresh interpreter), mainly by not spending time importing `win32api`.

It also improves "time to first call" by about 10% (since the first call resolves all of the dynamic function pointers and makes many win32 API calls).
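For readers who want to check the import-time figure themselves, the following is one possible way to time an import in a fresh interpreter. It is a rough sketch, not the benchmark used for the numbers above; the function name `time_fresh_import` is hypothetical, and the default module `json` is only a stand-in so the script runs without `cuda.bindings` installed.

```python
import statistics
import subprocess
import sys
import time


def time_fresh_import(module: str, repeats: int = 5) -> float:
    """Median wall-clock seconds to start a new interpreter and import `module`."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        # A new process each time, so nothing is cached in sys.modules.
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Pass e.g. "cuda.bindings.driver" on the command line to time that module.
    target = sys.argv[1] if len(sys.argv) > 1 else "json"
    seconds = time_fresh_import(target)
    print(f"import {target}: {seconds * 1000:.1f} ms (includes interpreter startup)")
```

Running it once on the base branch and once on this branch gives two medians to compare. For a per-module breakdown, `python -X importtime -c "import cuda.bindings.driver"` shows where the import time is spent.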
Checklist