CPU utilization relating to high-precision scheduling #640

@planetchili

ETW traces have reportedly shown a significant proportion of the service's CPU utilization attributed to SwitchToThread. Investigation has shown that, at least on the machines examined, utilization from SwitchToThread was far lower than utilization attributed to waiting on the Windows high-resolution timer (184 ms vs. 4687 ms of core time over 63 seconds).

The main source is the ETW flushing thread, which runs with an 8 ms period (125 Hz). If we can switch it to using Sleep, we can reduce service utilization (already low at 0.4% on the test machine) by a significant proportion. The flush does not need to occur at a precise cadence; it is not a sampling operation.

However, the maximum duration between flushes directly influences the latency of frame event delivery to clients. Observation has shown that latency increases at double the rate of the increase in flush period. Moving from 8 ms (current) to roughly 16 ms (Sleep(1)) would therefore potentially increase latency by 16 ms. Still, this does not seem to be a dealbreaker.
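The estimate above can be written out as a small calculation. This is only a sketch: the factor of 2 is the empirical observation stated in this issue, not a derived property, and the names `kObservedLatencyFactor` and `EstimateAddedLatencyMs` are hypothetical.

```cpp
// Rough estimate of the added frame-event latency when the ETW flush period changes.
// Assumption: latency grows at about 2x the increase in flush period (observed, not derived).
constexpr double kObservedLatencyFactor = 2.0;

// Hypothetical helper, not part of the existing code base.
constexpr double EstimateAddedLatencyMs(double currentPeriodMs, double newPeriodMs)
{
    return kObservedLatencyFactor * (newPeriodMs - currentPeriodMs);
}

// 8 ms (current) -> roughly 16 ms (Sleep(1)): about 16 ms of added latency.
static_assert(EstimateAddedLatencyMs(8.0, 16.0) == 16.0, "matches the figure in the issue");
```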

The next-highest source of precision-wait utilization is the telemetry poll loop. Here we are sampling at a fixed cadence, so jitter would be undesirable. However, telemetry typically runs with a 100 ms period, so ~15 ms of maximum jitter is likely not a dealbreaker either. Alternatively, since this operation runs at only 10 Hz, it may be acceptable to keep the high-precision wait for telemetry.

It is also possible to raise the system timer resolution, down to a 1 ms period, at the cost of some system overhead for the higher rate of servicing. This could be opt-in for users who want lower-latency ETW and/or higher precision in sampling cadence (lower jitter/skew).
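One way the opt-in could look is an RAII guard around the Windows multimedia timer calls `timeBeginPeriod` and `timeEndPeriod`, which must be paired with the same period value. This is a minimal sketch, not a proposal for the final design: the class name `TimerResolutionGuard` is hypothetical, and on non-Windows builds the guard is a no-op so the snippet still compiles.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>               // timeBeginPeriod / timeEndPeriod
#pragma comment(lib, "winmm.lib")   // MSVC-style link directive; other toolchains link winmm explicitly
#endif

// Hypothetical RAII guard: requests a finer system timer resolution for its lifetime.
class TimerResolutionGuard
{
public:
    explicit TimerResolutionGuard(unsigned periodMs) : periodMs_(periodMs)
    {
#ifdef _WIN32
        // The request can fail if the period is outside the range the system supports.
        active_ = (timeBeginPeriod(periodMs_) == TIMERR_NOERROR);
#endif
    }

    ~TimerResolutionGuard()
    {
#ifdef _WIN32
        // Every successful timeBeginPeriod must be matched by timeEndPeriod with the same value.
        if (active_) {
            timeEndPeriod(periodMs_);
        }
#endif
    }

    TimerResolutionGuard(const TimerResolutionGuard&) = delete;
    TimerResolutionGuard& operator=(const TimerResolutionGuard&) = delete;

    // True only if the finer resolution was actually granted.
    bool Active() const { return active_; }

private:
    unsigned periodMs_;
    bool active_ = false;
};
```

A service could hold one such guard only while a client has opted in, for example `TimerResolutionGuard guard(1);`, so the extra overhead is not paid by default.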

Limits for period settings in the API might need to be adjusted to take the reduced resolution into account (allowing the user to set a 4 ms ETW or telemetry period makes little sense with a 15.6 ms timer resolution). The default dynamic querying offset will need to be adjusted higher to account for the increased frame data delay. Testing should be done to find the new sweet spot.
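A possible shape for the adjusted limit is a clamp that raises any requested period to at least the effective timer resolution. This is an illustrative sketch only: `ClampPeriodMs` and `kDefaultTimerResolutionMs` are hypothetical names, and the 15.6 ms default is the figure quoted in this issue.

```cpp
#include <algorithm>

// Timer resolution quoted in this issue for the default system setting.
constexpr double kDefaultTimerResolutionMs = 15.6;

// Hypothetical helper: a period shorter than the timer resolution cannot be honored
// by a Sleep-based wait, so raise it to the resolution instead.
inline double ClampPeriodMs(double requestedMs,
                            double timerResolutionMs = kDefaultTimerResolutionMs)
{
    return std::max(requestedMs, timerResolutionMs);
}
```

With these assumptions a requested 4 ms period becomes 15.6 ms, while a 100 ms telemetry period is left unchanged. If the timer resolution has been raised to 1 ms, the 4 ms request passes through as-is.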

Consider also selecting between high-precision and Sleep waiting modes at runtime based on API user demands. The same goes for setting the system timer resolution.
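As a starting point for discussion, the runtime selection could be a simple policy function. The sketch below assumes one possible rule (use the high-precision wait only when the requested period is shorter than what a Sleep-based wait can deliver); the rule itself and the names `WaitMode` and `SelectWaitMode` are hypothetical, and this is an alternative to clamping the period rather than something to combine with it.

```cpp
// Hypothetical wait modes for a periodic worker thread.
enum class WaitMode
{
    Sleep,          // coarse wait, limited by the system timer resolution
    HighPrecision   // high-resolution timer wait, higher CPU cost
};

// Hypothetical policy: pay for the high-precision wait only when the requested
// period cannot be met at the current timer resolution.
inline WaitMode SelectWaitMode(double requestedPeriodMs, double timerResolutionMs)
{
    return requestedPeriodMs < timerResolutionMs ? WaitMode::HighPrecision
                                                 : WaitMode::Sleep;
}
```

Under this rule, an 8 ms flush period with a 15.6 ms timer resolution selects the high-precision wait, while a 100 ms telemetry period selects Sleep.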
