WDDM and cudaEventElapsedTime #451
Comments
Instead, if the CUDAEventTimer's sync() method became part of It would probably also be worth making the precision/accuracy of the timer used available to the user via stdout/stderr, or via a method somewhere (
It appears difficult to get the accuracy / resolution of steady_clock timers in real terms. The std::ratio accessible from the class is just std::nano, which only describes the smallest duration the clock can represent, not the actual minimum interval between two timing events.
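The gap between the nominal period and the real resolution can be probed empirically. The following is a minimal sketch, not project code: `estimateSteadyClockTick` and `nominalSteadyClockPeriodSeconds` are hypothetical helper names. The first spins until `steady_clock::now()` visibly advances and keeps the smallest step seen, which gives an upper bound on the effective tick (it includes the cost of the `now()` calls themselves).

```cpp
#include <cassert>
#include <chrono>

// Nominal tick period in seconds, as advertised by the clock's std::ratio.
// This is only what the type can represent, not what the OS delivers.
inline double nominalSteadyClockPeriodSeconds() {
    using period = std::chrono::steady_clock::period;
    return static_cast<double>(period::num) / static_cast<double>(period::den);
}

// Hypothetical helper: smallest non-zero step observed between consecutive
// steady_clock::now() calls, over `samples` attempts.
inline std::chrono::nanoseconds estimateSteadyClockTick(int samples = 100) {
    assert(samples > 0);
    using Clock = std::chrono::steady_clock;
    Clock::duration smallest = Clock::duration::max();
    for (int i = 0; i < samples; ++i) {
        const Clock::time_point start = Clock::now();
        Clock::time_point next = Clock::now();
        // Spin until the clock reports a different time point.
        while (next == start) {
            next = Clock::now();
        }
        const Clock::duration delta = next - start;
        if (delta < smallest) {
            smallest = delta;
        }
    }
    return std::chrono::duration_cast<std::chrono::nanoseconds>(smallest);
}
```

A value like this could be reported to the user alongside timing results. On a platform with a coarse clock the spin loop takes longer, so the sample count is kept small by default.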
`util/CUDAEventTimer.h` uses `cudaEvent_t` events created in the default stream and `cudaEventElapsedTime` to record elapsed time with high precision.

Under Linux (and presumably TCC on Windows) this appears to be accurate regardless of what happens between the events.
Under Windows on WDDM devices, the `cudaEventRecord` calls appear to be handled through the WDDM command buffers (not too surprising). This, however, prevents accurate timing of host code.

Timing of device code between the events looks OK based on profiler output, but host code that runs after the first `cudaEventRecord` is issued and before the WDDM command buffer is flushed may be skipped (e.g. a thread sleep).

For now, the timing is probably fine, as it'll almost always be used on Linux or TCC mode devices in our case.
We should investigate this, and potentially drop back to using `chrono::steady_clock` on Windows WDDM, which will be fine for timing full simulations, but may not be good enough for timing individual steps or individual layers.

The precision of `steady_clock` is implementation specific. `std::chrono::high_resolution_clock` at a glance looks like a better choice, however it is also implementation specific, being either `std::chrono::steady_clock` or `std::chrono::system_clock` depending on the compiler. We want a monotonic clock for timing purposes, so `system_clock` is not a good choice.
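A host-side fallback along these lines could look like the sketch below. It is an illustration only: `SteadyClockTimer`, its method names, and the `MonotonicClock` alias are assumptions, not the project's existing interface. The alias picks `high_resolution_clock` only when the implementation reports it as steady, and otherwise falls back to `steady_clock`, so the monotonic requirement is checked at compile time.

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <type_traits>

// Use high_resolution_clock only if it is monotonic on this implementation;
// otherwise fall back to steady_clock.
using MonotonicClock = std::conditional<
    std::chrono::high_resolution_clock::is_steady,
    std::chrono::high_resolution_clock,
    std::chrono::steady_clock>::type;
static_assert(MonotonicClock::is_steady, "timing requires a monotonic clock");

// Hypothetical host-side timer (name and interface are assumptions).
class SteadyClockTimer {
 public:
    // Record the start time point.
    void start() { start_ = MonotonicClock::now(); }
    // Record the stop time point.
    void stop() { stop_ = MonotonicClock::now(); }
    // Elapsed time between start() and stop(), in milliseconds.
    float getElapsedMilliseconds() const {
        assert(stop_ >= start_);  // stop() must follow start()
        return std::chrono::duration<float, std::milli>(stop_ - start_).count();
    }

 private:
    MonotonicClock::time_point start_{};
    MonotonicClock::time_point stop_{};
};
```

Unlike the event-based approach on WDDM, a timer like this includes host work such as a thread sleep between `start()` and `stop()`. It does not wait for outstanding device work, so a caller timing GPU code would need to synchronize the device before calling `stop()`.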