
WDDM and cudaEventElapsedTime #451

Closed
ptheywood opened this issue Feb 23, 2021 · 1 comment · Fixed by #640

Comments

@ptheywood
Member

util/CUDAEventTimer.h uses cudaEvent_t objects created in the default stream and cudaEventElapsedTime to record elapsed time with high precision.
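As a rough illustration (not the actual util/CUDAEventTimer.h source), the event-based timing pattern in question looks roughly like this; it requires a CUDA-capable device to run:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Minimal sketch of default-stream cudaEvent_t timing, as described above.
int main() {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);  // enqueued in the default stream
    // ... work to be timed (kernel launches, memcpys, host code) ...
    cudaEventRecord(stop);

    // The elapsed time is only valid once the stop event has completed.
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("elapsed: %f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```

On WDDM, both cudaEventRecord calls pass through the command buffer, which is what makes host time between them unreliable.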

Under Linux (and presumably Windows TCC mode) this appears to be accurate regardless of what happens between the events.

Under Windows on WDDM devices, the cudaEventRecord calls appear to be handled through the WDDM command buffers (not too surprising). This, however, prevents accurate timing of host code.
Timing of device code between the events looks OK based on profiler output, but host code executed after the first cudaEventRecord is issued and before the WDDM command buffer is flushed (e.g. a thread sleep) may not be captured.

For now, the timing is probably fine, as in our case it will almost always be used on Linux or TCC-mode devices.
We should investigate this, and potentially fall back to using std::chrono::steady_clock on Windows WDDM. That would be fine for timing full simulations, but may not be precise enough for timing individual steps or individual layers.

The precision of steady_clock is implementation-specific. std::chrono::high_resolution_clock at a glance looks like a better choice; however, it is also implementation-specific, being an alias for either std::chrono::steady_clock or std::chrono::system_clock depending on the compiler. We want a monotonic clock for timing purposes, so system_clock is not a good choice.

@ptheywood
Member Author

ptheywood commented Jul 15, 2021

util::detail::CUDAEventTimer and util::detail::event::SteadyClockTimer provide similar but not identical APIs, as CUDA event timers require synchronisation prior to storing / accessing the elapsed time, rather than it being immediately available after stop().

Instead, if the CUDAEventTimer's sync() method became part of getElapsedMilliseconds(), then a common base class util::detail::Timer could be created. This would then allow CUDASimulation to select the timer based on the selected device, favouring the CUDAEventTimer, but falling back to SteadyClockTimer if required, i.e. if on Windows and the selected device is in WDDM mode, or just on Windows in general.

It would probably also be worth making the precision/accuracy of the timer used available to the user, via stdout/stderr or via a method somewhere (bool CUDASimulation::timerIsHighPrecision? float CUDASimulation::timerPrecisionMilliseconds?), as this might be useful if the timing is being used to report performance in a paper, for instance.

  • Move CUDAEventTimer::sync into the body of CUDAEventTimer::getElapsedMilliseconds
  • Create common base/virtual class util::detail::Timer
    • Default ctor/dtor
    • void start()
    • void stop()
    • float getElapsedMilliseconds()
  • Update use of CUDAEventTimers to instead get the appropriate timer for the active GPU?
    • Maybe a static factory method on the base timer class?
  • Optionally add a method/stdout logging so the precision of the timer is made available.
    • Static method / constexpr on each timer implementation reporting the precision value or if it is "high" precision as a bool?
    • Method on CUDASimulation either outputting to stdout/stderr (if -v -t) or getting the precision of the timer(s) used.
    • Method on CUDAEnsemble outputting the precision of the ensemble timer?
  • Ensure docstrings / online documentation report this somewhere.

It appears difficult to get the accuracy / resolution of steady_clock timers in real terms. The std::ratio accessible from the class is just std::nano, which only describes the values the representation could express, not the actual minimum interval measurable between two timing events.
