Fix #938: Call win32 APIs directly #942

mdboom · 2025-09-04T12:40:25Z

Description

Instead of going through pywin32, this PR calls the Win32 APIs directly using Cython cdef extern declarations.

closes #938
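For readers unfamiliar with the approach, here is a rough analogue in plain Python using ctypes. It is not the Cython cdef extern code this PR adds; it only illustrates the idea of declaring the Win32 prototypes yourself and calling kernel32 directly, with no pywin32 import. The helper name `resolve_symbol` is hypothetical and not part of cuda.bindings. On non-Windows platforms it returns `None`.

```python
import ctypes
import sys


def resolve_symbol(dll_name, symbol):
    """Load dll_name and return the address of symbol (None if not found).

    Hypothetical helper for illustration only. Returns None on non-Windows
    platforms, where kernel32 does not exist.
    """
    if sys.platform != "win32":
        return None

    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

    # Declare the prototypes explicitly so 64-bit handles are not truncated.
    kernel32.LoadLibraryExW.argtypes = [wintypes.LPCWSTR, wintypes.HANDLE, wintypes.DWORD]
    kernel32.LoadLibraryExW.restype = wintypes.HMODULE
    kernel32.GetProcAddress.argtypes = [wintypes.HMODULE, wintypes.LPCSTR]
    kernel32.GetProcAddress.restype = ctypes.c_void_p

    handle = kernel32.LoadLibraryExW(dll_name, None, 0)
    if not handle:
        raise ctypes.WinError(ctypes.get_last_error())

    # GetProcAddress takes an ANSI name; a NULL result comes back as None.
    return kernel32.GetProcAddress(handle, symbol.encode("ascii"))


if __name__ == "__main__":
    print(resolve_symbol("kernel32.dll", "GetTickCount64"))
```

The only dependency is the standard library, which is the same property the PR is after: nothing extra has to be imported before the first Win32 call.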

This measurably improves import time, by about 9% (import cuda.bindings.driver in a fresh interpreter), mainly by no longer spending time importing win32api:

Before: 92.1 ms
After: 84.6 ms
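The description does not say how these numbers were collected. One way to take a comparable "fresh interpreter" measurement is sketched below; the function name `time_fresh_import` and the choice of taking the best of several runs are assumptions for illustration, and the figure includes interpreter startup time.

```python
import subprocess
import sys
import time


def time_fresh_import(module, repeats=5):
    """Return the best wall-clock time in ms to import `module` in a new interpreter.

    Each run starts a separate Python process, so nothing is cached in
    sys.modules. The result includes interpreter startup.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best


if __name__ == "__main__":
    # Substitute "cuda.bindings.driver" where the package is installed.
    print(f"{time_fresh_import('json'):.1f} ms")
```

For a per-module breakdown, `python -X importtime -c "import <module>"` reports the cumulative time spent in each imported module.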

It also improves "time to first call" by about 10% (since the first call resolves all of the dynamic function pointers and makes many win32 API calls):

Before: 103.2 ms
After: 93.4 ms
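As a quick check on the percentages, the snippet below computes both common readings of "X% faster" from the four timings quoted above. The function names are illustrative.

```python
def percent_reduction(before_ms, after_ms):
    """Share of the original time that was saved."""
    return (before_ms - after_ms) / before_ms * 100.0


def percent_speedup(before_ms, after_ms):
    """How much faster the new code is, as a ratio of before to after."""
    return (before_ms / after_ms - 1.0) * 100.0


measurements = {
    "import": (92.1, 84.6),
    "first call": (103.2, 93.4),
}

for name, (before, after) in measurements.items():
    print(
        f"{name}: {percent_reduction(before, after):.1f}% less time, "
        f"{percent_speedup(before, after):.1f}% speedup"
    )
# import: 8.1% less time, 8.9% speedup
# first call: 9.5% less time, 10.5% speedup
```

The "about 9%" and "about 10%" figures match the speedup reading; measured as time saved, the improvements are roughly 8% and 9.5%.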

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2025-09-04T12:40:29Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

mdboom · 2025-09-04T12:47:45Z

/ok to test

copy-pr-bot · 2025-09-04T12:47:48Z

/ok to test

@mdboom, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

mdboom · 2025-09-04T13:05:20Z

/ok to test

copy-pr-bot · 2025-09-04T13:05:23Z

/ok to test

@mdboom, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

mdboom · 2025-09-04T13:23:05Z

/ok to test 7828876

rwgk

Wonderful!

It'd be ideal to also reduce the code duplication (get_cuda_version(), cdef extern from "windows.h":) in a set of follow-on PRs. I believe it'll be pretty straightforward after this PR and the associated codegen PRs are merged.

mdboom · 2025-09-04T18:41:21Z

> It'd be ideal to also reduce the code duplication (get_cuda_version(), cdef extern from "windows.h":) in a set of follow-on PRs. I believe it'll be pretty straightforward after this PR and the associated codegen PRs are merged.

The files generated by cybind include the externs via a template, so at least they aren't copy-pasted. That is, the template for each library has a ${snippet_windows_externs} placeholder that inserts all of this boilerplate (including the get_cuda_version function).

It might have been nicer to cimport cuda.bindings._lib.windll instead, but that would add a build-time dependency on cuda_bindings that not all of the cybind-generated packages have (though I might be misunderstanding that part). So I thought it better to make them "standalone" by having cybind put this snippet in all of them. See the cybind PR I also filed for more details.
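To make the trade-off concrete, here is a minimal sketch of the "shared snippet in every template" idea using Python's `string.Template`, which happens to use the same `${name}` placeholder syntax. This is an analogy only: cybind's actual templating mechanism is not shown in this thread, and the snippet text and the second file name below are stand-ins.

```python
from string import Template

# Stand-in for the shared boilerplate (Win32 externs plus get_cuda_version).
WINDOWS_EXTERNS = "# shared Win32 extern declarations would go here\n"

# One template per generated module; each carries the same placeholder.
TEMPLATES = {
    "nvjitlink_windows.pyx": Template(
        "# nvjitlink bindings\n${snippet_windows_externs}# rest of module\n"
    ),
    "example_windows.pyx": Template(  # hypothetical second module
        "# example bindings\n${snippet_windows_externs}# rest of module\n"
    ),
}


def render_all(templates, snippet):
    """Substitute the shared snippet into every template."""
    return {
        name: template.substitute(snippet_windows_externs=snippet)
        for name, template in templates.items()
    }


if __name__ == "__main__":
    for name, source in render_all(TEMPLATES, WINDOWS_EXTERNS).items():
        print(f"--- {name} ---\n{source}")
```

Each generated file ends up with its own copy of the boilerplate, so it stays standalone at build time, while the boilerplate itself is maintained in one place.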

mdboom · 2025-09-04T18:42:18Z

/ok to test 8c7ea2e

rwgk · 2025-09-04T19:10:34Z

Wow

CI / Test linux-64 / py3.10, 13.0.0, wheels, GPU l4 (push) Failing after 4m

>       assert delay_ms - generous_tolerance <= elapsed_time_ms < delay_ms + generous_tolerance
E       assert 520.4592895507812 < (500.0 + 20)

I haven't seen that flake for a while.

Just rerun and ignore. The timing being off by a small margin certainly isn't due to a problem in cuda-bindings.
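For context on how narrowly this check failed, the sketch below re-evaluates the same window test with the values from the CI log. The function name `within_tolerance` is illustrative and is not the name used in the test suite.

```python
def within_tolerance(elapsed_ms, delay_ms, tolerance_ms):
    """Mirror of the failing check: lower bound inclusive, upper bound exclusive."""
    return delay_ms - tolerance_ms <= elapsed_ms < delay_ms + tolerance_ms


observed_ms = 520.4592895507812  # value reported in the CI failure
delay_ms = 500.0
tolerance_ms = 20

passed = within_tolerance(observed_ms, delay_ms, tolerance_ms)
overshoot_ms = observed_ms - (delay_ms + tolerance_ms)
print(f"passed={passed}, overshoot={overshoot_ms:.2f} ms")
# passed=False, overshoot=0.46 ms
```

The measurement missed the upper bound by under half a millisecond, which is consistent with scheduling noise on a shared runner.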

rwgk

Tests passed, except for that one flake. I think a rerun of that test will resolve the flake.

leofang · 2025-09-04T20:18:17Z

Let's ~~merge~~ not merge until the internal MR is approved, since it involves other teams as well.

leofang

FYI, I noticed the cuFile module is not changed?

mdboom · 2025-09-05T16:26:14Z

> FYI, I noticed the cuFile module is not changed?

Ah, I think that's just an oversight on my part. I will regenerate that as well.

mdboom · 2025-09-05T17:02:19Z

/ok to test e0e868a

mdboom · 2025-09-05T17:04:49Z

/ok to test 7a46173

mdboom · 2025-09-05T19:17:48Z

/ok to test 488bdc2

mdboom · 2025-09-10T12:30:03Z

/ok to test 673974c

copy-pr-bot · 2025-09-10T12:30:06Z

/ok to test 673974c

@mdboom, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

mdboom · 2025-09-10T12:31:01Z

/ok to test dcce4f5

mdboom · 2025-09-10T12:34:00Z

@kkraus14: I merged main into here for one final test, but otherwise no changes since your last "approved" review.

leofang

Thanks a lot, @mdboom! Be sure to backport this PR.

github-actions · 2025-09-10T23:21:53Z

Doc Preview CI
Preview removed because the pull request was closed or merged.

mdboom marked this pull request as draft September 4, 2025 12:47

Fix NVIDIA#938: Call win32 APIs directly

7828876

mdboom force-pushed the issue938 branch from d2d1542 to 7828876 Compare September 4, 2025 13:05

mdboom mentioned this pull request Sep 4, 2025

Fix #702: Update cyruntime.getLocalRuntimeVersion to use pathfinder #929

Merged

2 tasks

mdboom marked this pull request as ready for review September 4, 2025 14:36

mdboom requested review from leofang and rwgk September 4, 2025 14:36

rwgk previously approved these changes Sep 4, 2025

Address comments from PR

8c7ea2e

mdboom dismissed rwgk’s stale review via 8c7ea2e September 4, 2025 18:37

NVIDIA deleted a comment from copy-pr-bot bot Sep 4, 2025

rwgk previously approved these changes Sep 4, 2025

leofang assigned mdboom Sep 4, 2025

leofang added the cuda.bindings label (Everything related to the cuda.bindings module) Sep 4, 2025

leofang added the enhancement (Any code-related improvements) and P2 (Low priority - Nice to have) labels Sep 4, 2025

leofang added this to the cuda-python 13-next, 12-next milestone Sep 4, 2025

leofang reviewed Sep 4, 2025

leofang requested changes Sep 5, 2025

Address comments from PR

e0e868a

mdboom dismissed rwgk’s stale review via e0e868a September 5, 2025 17:01

Remove APIs

7a46173

leofang reviewed Sep 5, 2025

cuda_bindings/cuda/bindings/_lib/windll.pxd (outdated, resolved)

Don't check return type

488bdc2

leofang requested a review from tpn September 5, 2025 20:35

tpn previously approved these changes Sep 5, 2025

kkraus14 reviewed Sep 6, 2025

cuda_bindings/cuda/bindings/_internal/cufile_linux.pyx (resolved)

cuda_bindings/cuda/bindings/_internal/nvjitlink_windows.pyx (resolved)

cuda_bindings/cuda/bindings/_internal/nvjitlink_windows.pyx (resolved)

Address comments in PR

673974c

mdboom dismissed tpn’s stale review via 673974c September 8, 2025 12:52

kkraus14 previously approved these changes Sep 9, 2025

mdboom dismissed kkraus14’s stale review via 4da19ad September 10, 2025 12:27

mdboom force-pushed the issue938 branch from 4da19ad to 673974c Compare September 10, 2025 12:29

Merge branch 'main' into issue938

dcce4f5

mdboom requested a review from kkraus14 September 10, 2025 12:33

mdboom enabled auto-merge (squash) September 10, 2025 12:37

leofang approved these changes Sep 10, 2025

mdboom merged commit acfe654 into NVIDIA:main Sep 10, 2025
49 checks passed

kkraus14 mentioned this pull request Sep 29, 2025

Backport #986 & #1005 #1042

Merged

2 tasks
