-
Notifications
You must be signed in to change notification settings - Fork 224
Replace OS sleep with GPU nanosleep kernel in event timing test #1285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The previous test attempted to measure a real sleep delay between two event records, which introduced flakiness (especially on Windows/WDDM) and tested OS/driver timing behavior rather than the __sub__ implementation itself. This change replaces the test with a minimal, deterministic version that: * records two back-to-back events on the same stream * synchronizes on the second event to ensure both timestamps are valid * asserts that cuEventElapsedTime returns a finite, non-negative float This exercises the success path of Event.__sub__ without depending on actual GPU/OS timing characteristics, or requiring artificial GPU work.
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
This comment has been minimized.
This comment has been minimized.
|
Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
LGTM but it'd be nice for @kkraus14 to take another look. |
This reverts commit 605f1ef.
kkraus14
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @rwgk
…ic timing Replace the back-to-back event record test with a version that uses a __nanosleep kernel between events. This ensures a guaranteed positive elapsed time (delta_ms > 10) without depending on OS/driver timing characteristics or requiring artificial GPU work beyond the minimal nanosleep delay. The kernel sleeps for 20ms (double the assertion threshold of 10ms), providing a large safety margin above the ~0.5 microsecond resolution of cudaEventElapsedTime, making this test deterministic and non-flaky across platforms including Windows/WDDM.
Replace single __nanosleep() call with clock64()-based loop to ensure the kernel actually waits for the full 20ms duration. A single __nanosleep() call doesn't guarantee the full sleep duration, which caused measured times to be orders of magnitude less than expected (~0.2ms instead of ~20ms). The new implementation: - Uses clock64() to measure actual elapsed time - Loops until 20ms worth of clock cycles have elapsed - Uses __nanosleep(1000000) inside the loop to yield and avoid 100% CPU spin This ensures delta_ms > 10 assertion is reliable and the test passes deterministically.
|
/ok to test |
|
I (actually cursor-agent) added a couple commits to insert a kernel that causes a delay of 20 ms. I was surprised to see that |
Oops, sorry, I somehow missed this response. I was curious to see how much trouble it is to add the kernel, and apart from the nanosleep surprise, cursor did that in seconds. I'll let the test finish, then we can still decide if we want to keep the kernel, or remove it again. |
|
My 2c: is that we should remove it in the name of simplicity. This adds a lot of machinery and a lot of things that can go wrong in order to test event timing, but I don't have a strong opinion if you think that this is valuable. I would move all of the kernel definition and compilation out of the test into the module if we decide to keep it though. |
cuda_core/tests/test_event.py
Outdated
| # Using a 10 ms threshold (half the sleep duration) provides a large safety margin above | ||
| # the ~0.5 microsecond resolution of cudaEventElapsedTime, making this test deterministic | ||
| # and non-flaky. | ||
| assert delta_ms > 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: Should equality be included?
| assert delta_ms > 10 | |
| assert delta_ms >= 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically: Because of the large safety margin (expected 10 ms) it shouldn't matter at all.
Readability aspect: Making an effort to be precise here would send the wrong message, by distracting from the large safety margin.
|
I verified conclusively that Command used: Complete test script: test_event_elapsed_time_basic.cmd Full log: test_event_elapsed_time_basic_hags_off_2025-11-25+105446_log.txt For completeness: The same pytest command also passes with HAGS ON. |
Given all the effort that went into this already (mostly before this PR, including the back and forth with SWQA) I don't want to stop a few inches before the finish line. I'll move the new kernel code into a module and address Leo's comments. |
Done. That particular step was again just seconds worth of cursor-agent effort. I changed the naming back to be more similar to the original code, and polished the comments manually, to convince myself it's all accurate. Interactive testing with HAGS OFF still passes ( I'll reboot my machine to get it back to HAGS ON (where I want to keep it) and then test again, while the CI is running here. |
|
/ok to test |
|
/ok to test |
kkraus14
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
Thanks for all the feedback! |
|
This change replaces the timing-based event test's use of
time.sleep()with a GPU-side nanosleep kernel, eliminating flakiness from OS/driver timing characteristics while maintaining deterministic test behavior.Changes
NanosleepKernelhelper that uses__nanosleep()to create a guaranteed 20 ms GPU-side delaytest_timing_successto use the nanosleep kernel instead oftime.sleep()Benefits
Event.__sub__functionality without depending on OS timer resolution or driver scheduling behaviorThis PR makes PR #1279 obsolete.