Shrink ToolTaskThatTimeoutAndRetry budgets: slowDelay 20s -> 18s, timeout 5s -> 4.5s (follow-up to #13830) by jankratochvilcz · Pull Request #13846 · dotnet/msbuild

jankratochvilcz · 2026-05-22T14:08:02Z

Follow-up to #13830, which widened ToolTaskThatTimeoutAndRetry's slow/fast gap (slowDelay 5 s → 20 s, Timeout 2 s → 5 s) and added per-attempt elapsedMs / configuredTimeoutMs telemetry explicitly so a follow-up could tighten the budgets once data accumulated.

24 h of post-merge main runs on dnceng-public pipeline 75 (builds 1430301, 1430551, 1430938, 1431045, 1431261):

Fast path (target: complete within Timeout)

TFM/arch	Runs	min ms	p50 ms	p95 ms	max ms	Failed
net10.0/x64	108	102	684	1102	1136	0
net472/x86	60	1030	1048	1226	1370	0

Slow path (target: be killed by Timeout)

TFM/arch	Runs	min ms	p50 ms	p95 ms	max ms	Failed
net10.0/x64	18	5030	5069	5199	5269	0
net472/x86	10	5030	5068	5579	5991	0

(Slow elapsed ≈ configured Timeout because the test asserts the timeout terminates the process.)

This PR shrinks:

Timeout 5 000 ms → 4 500 ms — ~3.3x of fast max=1370 ms, ~4x of fast p95=1131 ms.
slowDelay 20 000 ms → 18 000 ms — preserves the 4x slow/fast gap (essential so the slow path reliably hits the timeout, not natural completion).

For context: the original Timeout=2000 ms before #13830 was only ~1.46x of fast max, which is exactly why it flaked. 4 500 ms keeps a comfortable safety margin while reclaiming ~17 s of wall-clock per slow attempt (and the test runs the slow path for many parameter combos).

The Stopwatch + per-attempt telemetry stays in place so future regressions show up in the test output.

Risk

Test-only change. Single file (src/Utilities.UnitTests/ToolTask_Tests.cs). No production code touched.

Why draft

Keeping in draft for one more day of main data to confirm no new outliers appear before requesting review.

…eout 5s -> 4.5s 24h of post-merge telemetry from dnceng-public pipeline 75 (introduced by dotnet#13830) shows actual per-attempt elapsedMs distributions: Fast path (target: should succeed within Timeout): net10.0/x64: min=102 p50=684 p95=1102 max=1136 (108 runs) net472/x86: min=1030 p50=1048 p95=1226 max=1370 (60 runs) Slow path (target: should be terminated by Timeout): net10.0/x64: 5030-5269ms (essentially = configured 5000ms timeout) net472/x86: 5030-5991ms (essentially = configured 5000ms timeout) Shrink Timeout 5000 -> 4500 (~3.3x of observed fast max=1370, ~4x of fast p95=1131) and slowDelay 20000 -> 18000 to preserve the same 4x slow/fast gap. The original 2000ms timeout from before dotnet#13830 was 1.46x of max — too tight, hence the flake. 4500ms keeps a comfortable safety margin while reclaiming ~17s of wall-clock per slow attempt. The Stopwatch + elapsedMs/configuredTimeoutMs telemetry stays in place so future regressions are visible in test output. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

jankratochvilcz · 2026-05-22T14:11:45Z

Keeping this in draft for now — 24h on main showed 0 flakes across ~20 runs, but that sample is too small to confirm the shrunk budget is safe (the original flake rate was probably <1%). Will revisit once we have another ~3-7 days of CI data on main + PR validation to confirm the worst-case stays well under the new budget.

jankratochvilcz · 2026-05-25T09:18:44Z

jankratochvilcz mentioned this pull request May 22, 2026

Stabilize 3 ToolTask_Tests flakes with diagnostics + timing fix #13830

Merged

jankratochvilcz closed this May 25, 2026

jankratochvilcz mentioned this pull request May 25, 2026

Fix or disable flaky tests #13667

Open

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Shrink ToolTaskThatTimeoutAndRetry budgets: slowDelay 20s -> 18s, timeout 5s -> 4.5s (follow-up to #13830)#13846

Shrink ToolTaskThatTimeoutAndRetry budgets: slowDelay 20s -> 18s, timeout 5s -> 4.5s (follow-up to #13830)#13846
jankratochvilcz wants to merge 1 commit into
dotnet:mainfrom
jankratochvilcz:jankratochvilcz/flake/tooltask-retry-shrink

jankratochvilcz commented May 22, 2026

Uh oh!

jankratochvilcz commented May 22, 2026

Uh oh!

jankratochvilcz commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jankratochvilcz commented May 22, 2026

Risk

Why draft

Uh oh!

jankratochvilcz commented May 22, 2026

Uh oh!

jankratochvilcz commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant