Shrink ToolTaskThatTimeoutAndRetry budgets: slowDelay 20s -> 18s, timeout 5s -> 4.5s (follow-up to #13830) #13846
Closed
jankratochvilcz wants to merge 1 commit into
…eout 5s -> 4.5s

24h of post-merge telemetry from dnceng-public pipeline 75 (introduced by dotnet#13830) shows actual per-attempt elapsedMs distributions:

Fast path (target: should succeed within Timeout):
  net10.0/x64: min=102 p50=684 p95=1102 max=1136 (108 runs)
  net472/x86: min=1030 p50=1048 p95=1226 max=1370 (60 runs)

Slow path (target: should be terminated by Timeout):
  net10.0/x64: 5030-5269ms (essentially = configured 5000ms timeout)
  net472/x86: 5030-5991ms (essentially = configured 5000ms timeout)

Shrink Timeout 5000 -> 4500 (~3.3x of observed fast max=1370, ~4x of fast p95=1131) and slowDelay 20000 -> 18000 to preserve the same 4x slow/fast gap.

The original 2000ms timeout from before dotnet#13830 was 1.46x of max: too tight, hence the flake. 4500ms keeps a comfortable safety margin while reclaiming ~17s of wall-clock per slow attempt.

The Stopwatch + elapsedMs/configuredTimeoutMs telemetry stays in place so future regressions are visible in test output.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keeping this in draft for now: 24h on main showed 0 flakes across ~20 runs, but that sample is too small to confirm the shrunk budget is safe (the original flake rate was probably <1%). Will revisit once we have another ~3-7 days of CI data on main + PR validation to confirm the worst case stays well under the new budget.
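To make the "sample is too small" point concrete, here is a rough sketch of the arithmetic. It assumes runs are independent and uses a 1% flake rate as a stand-in for the "probably <1%" estimate above. Both are assumptions for illustration, not measured values, and the function names are hypothetical.

```python
import math


def prob_zero_flakes(flake_rate: float, runs: int) -> float:
    """Chance that `runs` independent runs all pass, given a per-run flake rate."""
    return (1.0 - flake_rate) ** runs


def runs_needed(flake_rate: float, confidence: float = 0.95) -> int:
    """Smallest run count at which seeing zero flakes would be unlikely
    (probability <= 1 - confidence) if the true flake rate were `flake_rate`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - flake_rate))


if __name__ == "__main__":
    # Assumed 1% flake rate, ~20 runs as in the comment above.
    print(f"P(0 flakes in 20 runs at 1%) = {prob_zero_flakes(0.01, 20):.2f}")
    print(f"Runs needed for 95% confidence at 1% = {runs_needed(0.01)}")
```

Under these assumptions, a clean streak of ~20 runs would be seen about 82% of the time even with a 1% flake rate, and ruling that rate out at 95% confidence would take roughly 300 clean runs. A lower true flake rate would need proportionally more.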
Follow-up to #13830, which widened ToolTaskThatTimeoutAndRetry's slow/fast gap (slowDelay 5 s → 20 s, Timeout 2 s → 5 s) and added per-attempt elapsedMs/configuredTimeoutMs telemetry explicitly so a follow-up could tighten the budgets once data accumulated.

24 h of post-merge main runs on dnceng-public pipeline 75 (builds 1430301, 1430551, 1430938, 1431045, 1431261):

Fast path (target: complete within Timeout)
  net10.0/x64: min=102 p50=684 p95=1102 max=1136 (108 runs)
  net472/x86: min=1030 p50=1048 p95=1226 max=1370 (60 runs)

Slow path (target: be killed by Timeout)
  net10.0/x64: 5030-5269 ms
  net472/x86: 5030-5991 ms

(Slow elapsed ≈ configured Timeout because the test asserts the timeout terminates the process.)

This PR shrinks:

Timeout 5 000 ms → 4 500 ms: ~3.3x of fast max=1370 ms, ~4x of fast p95=1131 ms.
slowDelay 20 000 ms → 18 000 ms: preserves the 4x slow/fast gap (essential so the slow path reliably hits the timeout, not natural completion).

For context: the original Timeout=2000ms before #13830 was only ~1.46x of fast max, which is exactly why it flaked. 4 500 ms keeps a comfortable safety margin while reclaiming ~17 s of wall-clock per slow attempt (and the test runs the slow path for many parameter combos).

The Stopwatch + per-attempt telemetry stays in place so future regressions show up in the test output.

Risk

Test-only change. Single file (src/Utilities.UnitTests/ToolTask_Tests.cs). No production code touched.

Why draft

Keeping in draft for one more day of main data to confirm no new outliers appear before requesting review.
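The safety-margin ratios quoted in the description can be re-derived with a short sketch. The millisecond values come from the description above. The dictionary layout and function names are hypothetical, chosen only for this illustration, and do not reflect how ToolTask_Tests.cs is structured.

```python
# Observed worst-case fast-path time (net472/x86 max), in milliseconds.
FAST_MAX_MS = 1370

# (Timeout, slowDelay) in milliseconds for each stage of the budget history.
BUDGETS = {
    "before #13830": (2000, 5000),
    "after #13830": (5000, 20000),
    "this PR": (4500, 18000),
}


def timeout_margin(timeout_ms: int, observed_ms: int = FAST_MAX_MS) -> float:
    """How many times the slowest observed fast run fits inside the timeout."""
    return timeout_ms / observed_ms


def slow_fast_gap(timeout_ms: int, slow_delay_ms: int) -> float:
    """Ratio of slowDelay to Timeout; values well above 1 mean the slow path
    is cut off by the timeout rather than finishing on its own."""
    return slow_delay_ms / timeout_ms


if __name__ == "__main__":
    for label, (timeout_ms, slow_delay_ms) in BUDGETS.items():
        print(
            f"{label}: margin={timeout_margin(timeout_ms):.2f}x, "
            f"gap={slow_fast_gap(timeout_ms, slow_delay_ms):.1f}x"
        )
```

This reproduces the ~1.46x margin of the original 2000 ms timeout, the ~3.3x margin of the proposed 4500 ms timeout, and the unchanged 4x slow/fast gap.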