Replace OS sleep with GPU nanosleep kernel in event timing test #1285

rwgk · 2025-11-24T23:03:15Z

This change replaces the time.sleep() call in the timing-based event test with a GPU-side nanosleep kernel. This eliminates flakiness caused by OS/driver timing characteristics and makes the test deterministic.

Changes

Added a NanosleepKernel helper that uses __nanosleep() in a clock64()-timed loop to create a guaranteed 20 ms GPU-side delay
Updated test_timing_success to use the nanosleep kernel instead of time.sleep()
Removed OS-specific timing tolerance logic (Windows/WSL special cases)
Simplified assertions to check for a finite elapsed time above a minimum threshold (> 10 ms)
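The simplified assertions reduce to a single check on the value returned by Event.__sub__. A minimal sketch, assuming the elapsed time arrives as a Python float in milliseconds; the helper name check_elapsed_ms is hypothetical and is not part of cuda.core or this PR:

```python
import math


def check_elapsed_ms(delta_ms: float, threshold_ms: float = 10.0) -> None:
    """Hypothetical helper mirroring the simplified test assertions."""
    # Event.__sub__ reports the elapsed time between two events in milliseconds.
    assert isinstance(delta_ms, float), f"expected float, got {type(delta_ms)!r}"
    # Rejects NaN and infinities.
    assert math.isfinite(delta_ms), f"non-finite elapsed time: {delta_ms}"
    # A 20 ms GPU-side delay should comfortably clear a 10 ms threshold.
    assert delta_ms > threshold_ms, f"{delta_ms} ms <= {threshold_ms} ms"


check_elapsed_ms(20.03)  # passes: finite and well above 10 ms
```

Keeping only "finite" and "above a generous threshold" is what lets the platform-specific tolerance logic go away.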

Benefits

Deterministic: GPU-side delay is consistent across platforms, eliminating flakiness on Windows/WDDM and WSL
Simpler: Removes platform-specific tolerance calculations and OS timing dependencies
Reliable: Tests Event.__sub__ functionality without depending on OS timer resolution or driver scheduling behavior

This PR makes PR #1279 obsolete.

The previous test attempted to measure a real sleep delay between two event records, which introduced flakiness (especially on Windows/WDDM) and tested OS/driver timing behavior rather than the __sub__ implementation itself. This change replaces the test with a minimal, deterministic version that:

* records two back-to-back events on the same stream
* synchronizes on the second event to ensure both timestamps are valid
* asserts that cuEventElapsedTime returns a finite, non-negative float

This exercises the success path of Event.__sub__ without depending on actual GPU/OS timing characteristics or requiring artificial GPU work.
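The back-to-back pattern can be sketched as a small function that receives the stream and the event options as parameters, so the logic can be exercised without a GPU. This assumes cuda.core's Stream.record(options=...), Event.sync(), and Event.__sub__; the function name back_to_back_elapsed_ms is hypothetical.

```python
import math


def back_to_back_elapsed_ms(stream, timing_options) -> float:
    """Hypothetical sketch: time two events recorded back to back on one stream."""
    e1 = stream.record(options=timing_options)
    e2 = stream.record(options=timing_options)
    # Once the later event has completed, both timestamps are valid.
    e2.sync()
    # Event.__sub__ wraps cuEventElapsedTime and returns milliseconds.
    delta_ms = e2 - e1
    assert isinstance(delta_ms, float)
    assert math.isfinite(delta_ms)
    assert delta_ms >= 0.0
    return delta_ms
```

On real hardware, stream would come from Device().create_stream() and timing_options would be EventOptions(enable_timing=True); both events need timing enabled for the elapsed-time query to succeed.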

copy-pr-bot · 2025-11-24T23:03:18Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


rwgk · 2025-11-24T23:10:40Z

/ok to test

copy-pr-bot · 2025-11-25T00:40:18Z

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.


leofang · 2025-11-25T00:43:02Z

LGTM but it'd be nice for @kkraus14 to take another look.

kkraus14

LGTM! Thanks @rwgk

…ic timing

Replace the back-to-back event record test with a version that uses a __nanosleep kernel between the events. This ensures a guaranteed positive elapsed time (delta_ms > 10) without depending on OS/driver timing characteristics or requiring artificial GPU work beyond the minimal nanosleep delay. The kernel sleeps for 20 ms (double the 10 ms assertion threshold), providing a large safety margin above the ~0.5 microsecond resolution of cudaEventElapsedTime and making this test deterministic and non-flaky across platforms, including Windows/WDDM.
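The safety margin claimed in this commit message can be checked with a few lines of arithmetic, using only the values stated there (20 ms sleep, 10 ms threshold, ~0.5 µs event-timer resolution), expressed in microseconds:

```python
SLEEP_US = 20_000        # kernel delay: 20 ms
THRESHOLD_US = 10_000    # assertion threshold: 10 ms
RESOLUTION_US = 0.5      # approximate cudaEventElapsedTime resolution

# How far the measured delay could fall short before the assertion fails.
margin_us = SLEEP_US - THRESHOLD_US
# The same slack expressed in units of the timer resolution.
margin_in_resolution_steps = margin_us / RESOLUTION_US

print(margin_us, margin_in_resolution_steps)  # 10000 20000.0
```

The measurement could come in 10 ms short, roughly 20,000 resolution steps, before the test fails, so timer resolution has no practical influence on the outcome.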

Replace the single __nanosleep() call with a clock64()-based loop to ensure the kernel actually waits for the full 20 ms. A single __nanosleep() call doesn't guarantee the full sleep duration, which caused measured times to be orders of magnitude lower than expected (~0.2 ms instead of ~20 ms). The new implementation:

- uses clock64() to measure actual elapsed time
- loops until 20 ms worth of clock cycles have elapsed
- calls __nanosleep(1000000) inside the loop to back off between checks instead of busy-spinning the GPU thread

This ensures the delta_ms > 10 assertion is reliable and the test passes deterministically.
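The loop described in this commit can be sketched as follows, with the CUDA source held in a Python string for runtime compilation. This is a hypothetical reconstruction, not the PR's final helper in cuda_core/tests/helpers/nanosleep_kernel.py (which was simplified in a later commit); the kernel name nanosleep_kernel and the function ms_to_clock_cycles are illustrative. It assumes the cycle budget is computed on the host from the device clock rate in kHz (cycles per millisecond), which makes the delay approximate if the SM clock changes while the kernel runs. The CUDA C++ Programming Guide describes __nanosleep() as an approximate sleep capped at about 1 ms (compute capability 7.0 and later), which is consistent with a single 20 ms request returning early.

```python
# Hypothetical kernel source: poll clock64() until the requested number of
# SM clock cycles has passed, backing off with __nanosleep() between polls.
NANOSLEEP_KERNEL_SOURCE = r"""
extern "C" __global__ void nanosleep_kernel(long long int sleep_cycles) {
    // clock64() is documented to return long long int.
    long long int start = clock64();
    while (clock64() - start < sleep_cycles) {
        // A single __nanosleep() is approximate and capped near 1 ms, so it is
        // only used here to back off between clock64() checks (sm_70+).
        __nanosleep(1000000);
    }
}
"""


def ms_to_clock_cycles(sleep_duration_ms: float, clock_rate_khz: int) -> int:
    """Convert a delay in ms to SM clock cycles; kHz == cycles per millisecond."""
    if sleep_duration_ms < 0 or clock_rate_khz <= 0:
        raise ValueError("expected a non-negative duration and a positive clock rate")
    return int(sleep_duration_ms * clock_rate_khz)


# Example: 20 ms on an SM clocked at 1.5 GHz (1_500_000 kHz).
print(ms_to_clock_cycles(20, 1_500_000))  # 30000000
```

Because the loop exits only once clock64() has advanced by the full budget, an early return from any individual __nanosleep() call just costs one extra iteration.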

rwgk · 2025-11-25T06:34:59Z

/ok to test

rwgk · 2025-11-25T06:36:35Z

I (actually cursor-agent) added a couple of commits to insert a kernel that causes a delay of 20 ms. I was surprised to see that __nanosleep by itself didn't lead to predictable behavior; is that expected?

rwgk · 2025-11-25T06:39:13Z

Quoting @kkraus14: "LGTM! Thanks @rwgk"

Oops, sorry, I somehow missed this response.

I was curious to see how much trouble it would be to add the kernel, and apart from the nanosleep surprise, Cursor did it in seconds.

I'll let the test finish, then we can still decide if we want to keep the kernel, or remove it again.

kkraus14 · 2025-11-25T17:20:38Z

My 2c is that we should remove it in the name of simplicity. This adds a lot of machinery, and a lot of things that can go wrong, just to test event timing, but I don't have a strong opinion if you think this is valuable.

I would move all of the kernel definition and compilation out of the test into the module if we decide to keep it though.

leofang · 2025-11-25T18:12:06Z

cuda_core/tests/test_event.py

+    # Using a 10 ms threshold (half the sleep duration) provides a large safety margin above
+    # the ~0.5 microsecond resolution of cudaEventElapsedTime, making this test deterministic
+    # and non-flaky.
+    assert delta_ms > 10


Q: Should equality be included?

Suggested change:

- assert delta_ms > 10
+ assert delta_ms >= 10

Technically, because of the large safety margin (10 ms expected), it shouldn't matter at all.

Readability: making an effort to be precise here would send the wrong message by distracting from the large safety margin.

rwgk · 2025-11-25T19:08:22Z

I verified conclusively that test_event_elapsed_time_basic (as of commit 5eba5ac) PASSES with the Windows WDDM driver model and HAGS OFF.

Command used:

python -m pytest -ra -s -vv tests\test_event.py::test_event_elapsed_time_basic --count=1000

Complete test script: test_event_elapsed_time_basic.cmd

Full log: test_event_elapsed_time_basic_hags_off_2025-11-25+105446_log.txt

For completeness: The same pytest command also passes with HAGS ON.

rwgk · 2025-11-25T19:12:38Z

Quoting @kkraus14: "I would move all of the kernel definition and compilation out of the test into the module if we decide to keep it though."

Given all the effort that went into this already (mostly before this PR, including the back-and-forth with SWQA), I don't want to stop a few inches short of the finish line.

I'll move the new kernel code into a module and address Leo's comments.

https://docs.nvidia.com/cuda/cuda-c-programming-guide/#time-function

rwgk · 2025-11-25T23:21:12Z

Quoting @kkraus14: "I would move all of the kernel definition and compilation out of the test into the module if we decide to keep it though."

Done. That particular step was again just seconds' worth of cursor-agent effort.

I changed the naming back to be more similar to the original code, and polished the comments manually, to convince myself it's all accurate.

Interactive testing with HAGS OFF still passes (pytest -ra -s -v tests\test_event.py::test_timing_success --count=1000).

I'll reboot my machine to get it back to HAGS ON (where I want to keep it) and then test again, while the CI is running here.

rwgk · 2025-11-25T23:21:54Z

/ok to test

rwgk · 2025-11-26T05:31:34Z

/ok to test

kkraus14

LGTM!

rwgk · 2025-11-26T16:35:09Z

Thanks for all the feedback!

github-actions · 2025-11-26T16:44:27Z

Doc Preview CI
Preview removed because the pull request was closed or merged.

cuda_core/tests/helpers/__init__.py: also use CUDA_HOME

605f1ef

rwgk mentioned this pull request Nov 24, 2025

Check Hardware Accelerated GPU Scheduling (HAGS) status on Windows #1279

Closed

rwgk marked this pull request as ready for review November 25, 2025 00:40

leofang self-assigned this Nov 25, 2025

leofang added the enhancement, P0, and cuda.core labels Nov 25, 2025

leofang added this to the cuda.core beta 10 milestone Nov 25, 2025

rparolin reviewed Nov 25, 2025

cuda_core/tests/helpers/__init__.py (outdated, resolved)

rparolin reviewed Nov 25, 2025

cuda_core/tests/test_event.py (outdated, resolved)

rparolin reviewed Nov 25, 2025

cuda_core/tests/test_event.py (outdated, resolved)

Revert "cuda_core/tests/helpers/__init__.py: also use CUDA_HOME"

29f7882

This reverts commit 605f1ef.

kkraus14 previously approved these changes Nov 25, 2025


rwgk added 2 commits November 24, 2025 22:33

leofang assigned rwgk and unassigned leofang Nov 25, 2025

leofang approved these changes Nov 25, 2025


rwgk added 8 commits November 25, 2025 14:10

Merge branch 'main' into cuda_core_test_event_sub_basic

58f6685

clock64() return type is documented as long long int:

ad16933

https://docs.nvidia.com/cuda/cuda-c-programming-guide/#time-function

Use device.arch instead of joining device.compute_capability

55d9b44

cursor-generated cuda_core/tests/helpers/nanosleep_kernel.py

3762490

Change NanosleepKernel API to sleep_duration_ms

528e77e

Rename back to test_timing_success

f585ce0

Streamline a comment

909f380

Polish comments. Make the code more similar to the existing code.

18563e8

kkraus14 reviewed Nov 26, 2025

cuda_core/tests/helpers/nanosleep_kernel.py (outdated, resolved)

Simplify nanosleep_kernel implementation.

35510e6

kkraus14 approved these changes Nov 26, 2025


rwgk changed the title from "Replace timing-based event test with deterministic elapsed-time check" to "Replace OS sleep with GPU nanosleep kernel in event timing test" Nov 26, 2025

rwgk merged commit f0af76d into NVIDIA:main Nov 26, 2025
61 checks passed

rwgk deleted the cuda_core_test_event_sub_basic branch November 26, 2025 16:35
