Timing #84

Open
Robadob opened this issue Apr 22, 2022 · 1 comment

Robadob commented Apr 22, 2022

There should probably be some coverage of timing.

  • Performance troubleshooting should reference how timing can be collected within FGPU2.
  • Timing/performance metrics should be detailed in the logging pages (emphasise that performance metrics are always logged in ensembles??).
  • Maybe some advanced discussion regarding accuracy/WDDM, the impact of CUDA event timers, etc.

ptheywood commented Apr 22, 2022

Maybe some advanced discussion regarding accuracy/WDDM, the impact of CUDA event timers, etc.

CUDA event timers have a resolution of "around 0.5 microseconds", and timing only behaves as intended when the events are recorded in the NULL (default) stream:

Computes the elapsed time between two events (in milliseconds with a resolution of around 0.5 microseconds).

If either event was last recorded in a non-NULL stream, the resulting time may be greater than expected (even if both used the same stream handle). This happens because the cudaEventRecord() operation takes place asynchronously and there is no guarantee that the measured latency is actually just between the two events. Any number of other different stream operations could execute in between the two measured events, thus altering the timing in a significant way.

source
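
For reference, a minimal sketch of the default-stream pattern the quoted documentation describes (the kernel and launch configuration below are placeholders, not FLAME GPU code):

```cpp
// Minimal sketch: CUDA event timing in the NULL (default) stream.
// Error checking omitted for brevity; the kernel is a placeholder.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernel() { /* work being timed */ }

int main() {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);      // recorded in the default stream
    kernel<<<256, 256>>>();
    cudaEventRecord(stop);       // also recorded in the default stream
    cudaEventSynchronize(stop);  // block until 'stop' has actually occurred

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // milliseconds, ~0.5us resolution
    printf("kernel took %f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```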

Under WDDM, due to how the WDDM command buffers work, cudaEvent-based timing is only meaningful for pure device code (unless you add an immediate stream/event/device sync after recording). See FLAMEGPU/FLAMEGPU2#451.
The current implementation in FLAME GPU uses std::steady_clock timers when the GPU is running under WDDM.
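
An illustrative sketch of that host-side fallback (not FLAME GPU's actual code), assuming a hypothetical kernel as the work being measured; the synchronise before the second sample is what makes the measurement meaningful under WDDM:

```cpp
// Illustrative sketch (not FLAME GPU's actual code): host-side timing with
// std::steady_clock. The cudaDeviceSynchronize() is what makes this valid
// under WDDM; without it only the asynchronous launch would be timed.
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernel() { /* work being timed */ }

int main() {
    const auto start = std::chrono::steady_clock::now();
    kernel<<<256, 256>>>();
    cudaDeviceSynchronize();  // ensure the GPU work has finished before stopping the clock
    const auto stop = std::chrono::steady_clock::now();

    const double ms = std::chrono::duration<double, std::milli>(stop - start).count();
    printf("kernel + sync took %f ms\n", ms);
    return 0;
}
```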

std::steady_clock timers are generally not as good, and because they are implementation- and hardware-specific, a known accuracy/precision can't be documented. It might be possible to estimate one at runtime, though (see the sketch below). They might not be precise enough to give useful per-step or per-layer timing, depending on the model.
std::high_resolution_clock sounds like it should be better, but it is implementation-defined: on MSVC it is just an alias for std::steady_clock, while GCC (libstdc++) aliases it to std::system_clock, which is not suitable for performance timing (it is not monotonic).
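
A rough sketch of such a runtime estimate: repeatedly sample the clock and keep the smallest non-zero step observed. Note this measures an effective resolution (tick granularity plus sampling overhead), not a hardware spec:

```cpp
// Rough sketch: estimating std::steady_clock's effective resolution at runtime
// by recording the smallest non-zero step observed between successive samples.
// This conflates tick granularity with call overhead, so it is an upper bound
// on achievable precision rather than a true hardware figure.
#include <algorithm>
#include <chrono>
#include <cstdio>

int main() {
    using clock = std::chrono::steady_clock;
    auto min_step = clock::duration::max();
    for (int i = 0; i < 1000; ++i) {
        const auto a = clock::now();
        auto b = clock::now();
        while (b == a) b = clock::now();  // spin until the clock advances
        min_step = std::min(min_step, b - a);
    }
    const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(min_step);
    printf("smallest observed step: %lld ns\n", static_cast<long long>(ns.count()));
    return 0;
}
```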
