Fix flaky CounterSampleCalculator_ElapsedTime test#127204
Fix flaky CounterSampleCalculator_ElapsedTime test#127204danmoseley merged 4 commits intodotnet:mainfrom
Conversation
Use Stopwatch for both elapsed time measurements so they share the same clock source (QPC), instead of comparing QPC-based counter values against DateTime.Now. Under CI load the gap between the two clock sources could exceed the 0.3s tolerance. Also removed unused using directives. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Tagging subscribers to this area: @steveisok, @dotnet/area-system-diagnostics-tracing |
There was a problem hiding this comment.
Pull request overview
Addresses flakiness in CounterSampleCalculator_ElapsedTime by ensuring elapsed-time comparisons use a consistent clock source.
Changes:
- Replaced
DateTime.Now-based elapsed measurement with aStopwatch-based calculation. - Removed unused
usingdirectives.
Show a summary per file
| File | Description |
|---|---|
src/libraries/System.Diagnostics.PerformanceCounter/tests/CounterSampleCalculatorTests.cs |
Updates the elapsed-time test to compute expected duration using Stopwatch (QPC) and cleans up unused usings. |
Copilot's findings
- Files reviewed: 1/1 changed files
- Comments generated: 2
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
adamsitnik
left a comment
There was a problem hiding this comment.
LGTM, thank you for fixing another flaky test @danmoseley !
jkotas
left a comment
There was a problem hiding this comment.
the gap between Stopwatch.GetTimestamp() and DateTime.Now on consecutive lines can exceed the 0.3s tolerance.
This does not make sense. Both of these have sub-millisecond granularity.
I agree that tests with small hardcoded time spans are prone to failing intermittently on overloaded machines due to thread scheduling happening at the right moment. Changing the time source won't fix these failures. The fix for these failures should be to either increase the hardcoded time span or to add a retry loop.
|
I had read that the system clock was updated on a 15ms interrupt, while the fine grained timer was not? I'll add a loop. |
Wrap the timing-sensitive measurement and assertion in RetryHelper.Execute (3 attempts) so transient CI scheduling delays don't cause flaky failures. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
It might in some cases, but that's still 20x less than 0.3 seconds that we allow currently. |
Wrap test body in try/finally so DeleteCategory and Dispose always run even if retries are exhausted. Restrict retries to XunitException (assertion failures) so unexpected errors propagate immediately. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Yeah I looped against the Windows API's on a machine that was artificially loaded (quite different to CI hardware no doubt) GetSystemTimePreciseAsFileTime, QueryPerformanceCounter, DateTime.UtcNow.Ticks diverged by up to 0.1-0.2ms only. So I guess the cause was scheduling causing the perf counter read and the our time read to be separated by 30ms. |
I got this on an unrelated PR.
Note
This PR was authored with assistance from Copilot CLI.
The
CounterSampleCalculator_ElapsedTimetest occasionally fails in CI (0.01% rate) because it compares elapsed time measured by two different clock sources: QPC (via the perf counter infrastructure) andDateTime.Now(over system clock, updated by periodic interrupt). Under CI load, the gap betweenStopwatch.GetTimestamp()andDateTime.Nowon consecutive lines can exceed the 0.3s tolerance.Fix by using
Stopwatchfor both measurements so they share the same clock source (QPC). Can't use.GetElapsedTime()as that wasn't in .NET Framework.