perf: specialize Task.WhenAny for 3 and 4 tasks #127216

Closed
unsafePtr wants to merge 1 commit into dotnet:main from unsafePtr:perf/whenany-3-4-fastpath

unsafePtr commented Apr 21, 2026

Mirrors the existing TwoTaskWhenAnyPromise pattern to eliminate the Task[] defensive-copy allocation for Count=3 and Count=4. Count>=5 unchanged.

Closes #126748
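
To make the idea concrete, here is a minimal user-level sketch of what a fixed-arity fast path looks like. With exactly three inputs held as separate parameters, no `Task[]` needs to be copied. `WhenAnySketch` and `WhenAny3` are hypothetical names invented for this illustration. The sketch uses only public APIs (`TaskCompletionSource<Task>` and `ContinueWith`), so it allocates in other places and is not the internal promise type the PR adds.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not the runtime's internal
// promise type and not a drop-in replacement for Task.WhenAny.
public static class WhenAnySketch
{
    public static Task<Task> WhenAny3(Task t1, Task t2, Task t3)
    {
        ArgumentNullException.ThrowIfNull(t1);
        ArgumentNullException.ThrowIfNull(t2);
        ArgumentNullException.ThrowIfNull(t3);

        // Already-completed path: return the first completed input in
        // argument order without registering any continuations.
        if (t1.IsCompleted) return Task.FromResult(t1);
        if (t2.IsCompleted) return Task.FromResult(t2);
        if (t3.IsCompleted) return Task.FromResult(t3);

        // Pending path: the first input to finish completes the promise.
        var promise = new TaskCompletionSource<Task>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        Action<Task> onCompleted = t => promise.TrySetResult(t);

        // The inputs are held in three locals, so no array is needed.
        // Unlike a production implementation, this sketch leaves the
        // continuations of the losing tasks registered until they complete.
        t1.ContinueWith(onCompleted, CancellationToken.None,
            TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default);
        t2.ContinueWith(onCompleted, CancellationToken.None,
            TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default);
        t3.ContinueWith(onCompleted, CancellationToken.None,
            TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default);

        return promise.Task;
    }
}
```

A caller would write `Task first = await WhenAnySketch.WhenAny3(a, b, c);` in place of `Task.WhenAny(a, b, c)`.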

Benchmarks


BenchmarkDotNet v0.16.0-nightly.20260320.467, Windows 11 (10.0.26200.8246/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i7-13700KF 3.40GHz, 1 CPU, 24 logical and 16 physical cores
Memory: 63.72 GB Total, 53.5 GB Available
.NET SDK 11.0.100-preview.3.26170.106
  [Host]     : .NET 11.0.0 (11.0.0-preview.3.26170.106, 11.0.26.17106), X64 RyuJIT x86-64-v3
  Job-UHUKNN : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), X64 RyuJIT x86-64-v3
  Job-SVDRLY : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), X64 RyuJIT x86-64-v3


| Method | Job | Toolchain | Count | Mean | Error | StdDev | Ratio | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|
| AllCompleted | Job-UHUKNN | `\runtime-baseline\artifacts\bin\testhost\net11.0-windows-Release-x64\shared\Microsoft.NETCore.App\11.0.0\corerun.exe` | 3 | 43.07 ns | 0.447 ns | 0.396 ns | 1.00 | 136 B | 1.00 |
| AllCompleted | Job-SVDRLY | `\runtime\artifacts\bin\testhost\net11.0-windows-Release-x64\shared\Microsoft.NETCore.App\11.0.0\corerun.exe` | 3 | 10.39 ns | 0.107 ns | 0.100 ns | 0.24 | 72 B | 0.53 |
| AllPendingThenComplete | Job-UHUKNN | `\runtime-baseline\artifacts\bin\testhost\net11.0-windows-Release-x64\shared\Microsoft.NETCore.App\11.0.0\corerun.exe` | 3 | 147.30 ns | 0.941 ns | 0.881 ns | 1.00 | 584 B | 1.00 |
| AllPendingThenComplete | Job-SVDRLY | `\runtime\artifacts\bin\testhost\net11.0-windows-Release-x64\shared\Microsoft.NETCore.App\11.0.0\corerun.exe` | 3 | 131.14 ns | 0.867 ns | 0.811 ns | 0.89 | 544 B | 0.93 |
| AllCompleted | Job-UHUKNN | `\runtime-baseline\artifacts\bin\testhost\net11.0-windows-Release-x64\shared\Microsoft.NETCore.App\11.0.0\corerun.exe` | 4 | 44.30 ns | 0.173 ns | 0.161 ns | 1.00 | 144 B | 1.00 |
| AllCompleted | Job-SVDRLY | `\runtime\artifacts\bin\testhost\net11.0-windows-Release-x64\shared\Microsoft.NETCore.App\11.0.0\corerun.exe` | 4 | 10.08 ns | 0.102 ns | 0.090 ns | 0.23 | 72 B | 0.50 |
| AllPendingThenComplete | Job-UHUKNN | `\runtime-baseline\artifacts\bin\testhost\net11.0-windows-Release-x64\shared\Microsoft.NETCore.App\11.0.0\corerun.exe` | 4 | 178.93 ns | 0.897 ns | 0.796 ns | 1.00 | 736 B | 1.00 |
| AllPendingThenComplete | Job-SVDRLY | `\runtime\artifacts\bin\testhost\net11.0-windows-Release-x64\shared\Microsoft.NETCore.App\11.0.0\corerun.exe` | 4 | 171.51 ns | 0.694 ns | 0.649 ns | 0.96 | 696 B | 0.95 |

Benchmark code

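using System.Runtime.CompilerServices;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;

// MemoryDiagnoser(false) reports allocated bytes without the per-generation GC columns.
[MemoryDiagnoser(false)]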
public class Perf_WhenAny
{
    [Params(3, 4)]
    public int Count;

    private Task[] _completed = default!;

    [GlobalSetup]
    public void Setup()
    {
        _completed = new Task[Count];
        for (int i = 0; i < Count; i++)
        {
            _completed[i] = Task.CompletedTask;
        }
    }

    [Benchmark]
    public Task AllCompleted() => WhenAnyNoInlining(_completed);

    [Benchmark]
    [MethodImpl(MethodImplOptions.NoInlining)]
    public object AllPendingThenComplete()
    {
        var sources = new TaskCompletionSource<int>[Count];
        var tasks = new Task<int>[Count];
        for (int i = 0; i < Count; i++)
        {
            sources[i] = new TaskCompletionSource<int>();
            tasks[i] = sources[i].Task;
        }

        object result = WhenAnyNoInlining(tasks);
        sources[0].SetResult(0);
        return result;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public Task WhenAnyNoInlining(Task[] tasks) => Task.WhenAny(tasks);
}

dotnet-policy-service bot added the community-contribution label Apr 21, 2026

dotnet-policy-service (Contributor) commented

Tagging subscribers to this area: @dotnet/area-system-threading-tasks
See info in area-owners.md if you want to be subscribed.

unsafePtr (Author) commented

@dotnet-policy-service agree

jkotas (Member) commented Apr 21, 2026

Do you have evidence that waiting for 3 or 4 tasks is common enough to make the extra code worth it?

unsafePtr (Author) commented

> Do you have evidence that waiting for 3 or 4 tasks is common enough to make the extra code worth it?

@jkotas I went searching across widely used .NET projects. Here's what I found:

  • dotnet/sdk (dotnet-watch) — DotNetWatcher.cs:108:

    finishedTask = await Task.WhenAny(processTask, fileSetTask, cancelledTaskSource.Task);

    On the hot path of dotnet watch — every .NET developer using file-watch runs this.

  • dotnet/runtime (BrowserRunner.cs), 4 call sites:

    // line 89, 95
    await Task.WhenAny(runTask, urlAvailable.Task, delayTask);
    // line 268, 284
    await Task.WhenAny(RunTask!, _exited.Task, Task.Delay(timeout));
  • rabbitmq/rabbitmq-stream-dotnet-client (Program.cs:61):

    await Task.WhenAny(producerTask, consumerTask, cancelTask).ConfigureAwait(false);

    Classic producer/consumer + cancellation idiom.

  • App-vNext/Polly (HedgingExecutionContext.cs#L169-L179):

    return _executingTasks.Count switch
    {
        1 => _executingTasks[0].ExecutionTaskSafe!,
        2 => Task.WhenAny(_executingTasks[0].ExecutionTaskSafe!, _executingTasks[1].ExecutionTaskSafe!),
        _ => Task.WhenAny(_executingTasks.Select(v => v.ExecutionTaskSafe!))
    };
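
The three-argument shape that recurs in these call sites (two units of work plus a cancellation or timeout signal) can be sketched as follows. This is an illustrative reconstruction, not code from any of the projects listed. `ThreeWayWait`, `WaitForFirstAsync`, and the returned strings are hypothetical names chosen for the example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper showing the "work + signal + cancellation" idiom.
public static class ThreeWayWait
{
    // Reports which of the three inputs finished first.
    public static async Task<string> WaitForFirstAsync(
        Task work, Task signal, CancellationToken token)
    {
        // Bridge the token to a task so it can take part in WhenAny.
        var cancelled = new TaskCompletionSource(
            TaskCreationOptions.RunContinuationsAsynchronously);

        // Dispose the registration once the wait is over.
        using (token.Register(() => cancelled.TrySetResult()))
        {
            // Exactly three tasks: the case the PR specializes.
            Task first = await Task.WhenAny(work, signal, cancelled.Task)
                .ConfigureAwait(false);

            if (first == work) return "work";
            if (first == signal) return "signal";
            return "cancelled";
        }
    }
}
```

In patterns like this the wait typically runs once per request or per user action, which is why call frequency matters when weighing the specialization.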

You are right, there are not many places. I found these using this tool with a regex.

The 4-task specialization is included because it follows the same pattern as the 3-task one at near-zero incremental cost. If you'd prefer to land the 3-task one only, I'm happy to narrow it.

Thank you, and I am sorry if I am being noisy 🙂

jkotas (Member) commented Apr 21, 2026

> On the hot path of dotnet watch - every .NET developer using file-watch runs this.

A 30-nanosecond improvement in a dotnet watch loop that runs once per edit is noise. The extra code is going to add much more than 30 nanoseconds to bootstrap, so just to break even the loop would have to run many times.

unsafePtr (Author) commented

@jkotas The actual motivation was reducing the ~50B per-call allocation (the internal Task[] defensive copy), not the 30ns. I understand that maintenance is a permanent cost and that it's hard to justify without a hot call site.
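
For readers who want to sanity-check a per-call allocation figure like this without a full BenchmarkDotNet run, one rough approach is to diff `GC.GetAllocatedBytesForCurrentThread()` around a loop. The sketch below is an assumption-laden approximation. `AllocationProbe` and `BytesPerCall` are hypothetical names, and the result is an average that can include unrelated allocations made on the same thread.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical measurement helper; BenchmarkDotNet's MemoryDiagnoser
// remains the more reliable tool for reported numbers.
public static class AllocationProbe
{
    // Average bytes allocated on the current thread per call to `action`.
    public static long BytesPerCall(Func<object?> action, int iterations = 10_000)
    {
        // Warm up so one-time JIT and static-initialization allocations
        // are not counted against the measured loop.
        GC.KeepAlive(action());

        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            GC.KeepAlive(action());
        }
        long after = GC.GetAllocatedBytesForCurrentThread();

        return (after - before) / iterations;
    }

    // Example probe: four already-completed tasks passed as an array,
    // matching the AllCompleted benchmark's shape for Count=4.
    public static long ProbeWhenAnyOfFour()
    {
        Task[] four =
        {
            Task.CompletedTask, Task.CompletedTask,
            Task.CompletedTask, Task.CompletedTask,
        };
        return BytesPerCall(() => Task.WhenAny(four));
    }
}
```

Comparing the value of `ProbeWhenAnyOfFour()` on a baseline build and a patched build would give an approximate per-call difference.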

I've run into this because we had 4 tasks with WhenAny in a hot loop, but we've since changed the flow to avoid WhenAny there altogether, so the original need is gone.

Closing. Thanks for the review.

unsafePtr closed this Apr 21, 2026

Labels

area-System.Threading.Tasks, community-contribution, tenet-performance

Development

Successfully merging this pull request may close these issues.

Task.WhenAny - avoid allocation when more than 2 tasks are provided

2 participants