Skip to content

Conversation

Henr1k80
Copy link

@Henr1k80 Henr1k80 commented Oct 2, 2025

I found that there were room for improvement in Array.FindAll

Now a small amount of matches are stack allocated and the List is only optionally allocated.

It is always faster with less allocations.

The stack allocated size can be made bigger, but it is fixed size, so it will have a price for no-match schenarios.

InlineArray16<T> is only in .NET 10, but if this should be backported to .NET 8, we can create our local InlineMatchArrayX

BenchmarkDotNet v0.15.4, Windows 11 (10.0.26200.6584)
Intel Core i7-10750H CPU 2.60GHz (Max: 2.59GHz), 1 CPU, 12 logical and 6 physical cores
.NET SDK 10.0.100-rc.1.25451.107
[Host]               : .NET 10.0.0 (10.0.0-rc.1.25451.107, 10.0.25.45207), X64 RyuJIT x86-64-v3
.NET 10.0            : .NET 10.0.0 (10.0.0-rc.1.25451.107, 10.0.25.45207), X64 RyuJIT x86-64-v3
Method Size Mean Error StdDev Ratio RatioSD Gen0 Allocated Alloc Ratio
NewFindAllMatchAlways 1 10.050 ns 6.3250 ns 0.3467 ns 0.38 0.02 0.0051 32 B 0.31
OldFindAllMatchAlways 1 26.206 ns 24.8534 ns 1.3623 ns 1.00 0.06 0.0166 104 B 1.00
NewFindAllMatchAlways 16 36.170 ns 9.8894 ns 0.5421 ns 0.48 0.01 0.0140 88 B 0.29
OldFindAllMatchAlways 16 75.915 ns 18.0667 ns 0.9903 ns 1.00 0.02 0.0484 304 B 1.00
NewFindAllMatchAlways 17 52.152 ns 9.3427 ns 0.5121 ns 0.52 0.02 0.0267 168 B 0.36
OldFindAllMatchAlways 17 101.104 ns 61.4162 ns 3.3664 ns 1.00 0.04 0.0739 464 B 1.00
NewFindAllMatchHalf 1 10.394 ns 3.0844 ns 0.1691 ns 0.43 0.03 0.0051 32 B 0.31
OldFindAllMatchHalf 1 24.386 ns 36.3012 ns 1.9898 ns 1.00 0.10 0.0166 104 B 1.00
NewFindAllMatchHalf 16 28.195 ns 13.3802 ns 0.7334 ns 0.55 0.02 0.0089 56 B 0.30
OldFindAllMatchHalf 16 51.558 ns 36.7246 ns 2.0130 ns 1.00 0.05 0.0293 184 B 1.00
NewFindAllMatchHalf 17 30.810 ns 6.4918 ns 0.3558 ns 0.40 0.03 0.0102 64 B 0.23
OldFindAllMatchHalf 17 77.267 ns 112.9780 ns 6.1927 ns 1.00 0.10 0.0446 280 B 1.00
NewFindAllMatchNever 1 2.626 ns 1.5578 ns 0.0854 ns 0.52 0.04 - - 0.00
OldFindAllMatchNever 1 5.021 ns 6.4986 ns 0.3562 ns 1.00 0.09 0.0051 32 B 1.00
NewFindAllMatchNever 16 8.160 ns 0.9575 ns 0.0525 ns 0.86 0.02 - - 0.00
OldFindAllMatchNever 16 9.492 ns 3.7944 ns 0.2080 ns 1.00 0.03 0.0051 32 B 1.00
NewFindAllMatchNever 17 8.574 ns 8.8062 ns 0.4827 ns 0.64 0.10 - - 0.00
OldFindAllMatchNever 17 13.646 ns 47.8793 ns 2.6244 ns 1.02 0.23 0.0051 32 B 1.00

@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Oct 2, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Oct 2, 2025
@huoyaoyuan
Copy link
Member

Thanks you for your contribution. The code is not performing well with best practices. Instead, just replace the List with ValueListBuilder and add a Dispose call to the builder will serve you all the optimizations.

@huoyaoyuan huoyaoyuan added area-System.Runtime and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Oct 3, 2025
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

@huoyaoyuan
Copy link
Member

huoyaoyuan commented Oct 3, 2025

There may be more unintentional affect after rethinking. ArrayPool<T> is quite heavy for each T, especially when the pool is not widely shared. That's also why the BCL only uses ArrayPool over limited set of types (byte, char and other integers). Even ArrayPool<object> is never used in BCL.

Initializing the array pool for each T has not acceptable overhead to whole program. Microbenchmark improvement doesn't result in benefit to whole program. This PR could not be accepted because of this.

@Henr1k80
Copy link
Author

Henr1k80 commented Oct 3, 2025

There may be more unintentional affect after rethinking. ArrayPool<T> is quite heavy for each T, especially when the pool is not widely shared. That's also why the BCL only uses ArrayPool over limited set of types (byte, char and other integers). Even ArrayPool<object> is never used in BCL.

Initializing the array pool for each T has not acceptable overhead to whole program. Microbenchmark improvement doesn't result in benefit to whole program. This PR could not be accepted because of this.

SegmentedArrayBuilder uses ArrayPool<T> and SegmentedArrayBuilder is used within LINQ

@Henr1k80 Henr1k80 marked this pull request as ready for review October 3, 2025 14:14
}

List<T> list = new List<T>();
List<T>? heapMatches = null; // only allocate if needed
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is possible to extract the verbose code into a struct collection. I mean code like this:

    public static T[] FindAll<T>(T[] array, Predicate<T> match)
    {
        if (array == null)
        {
            ThrowHelper.ThrowArgumentNullException(ExceptionArgument.array);
        }

        if (match == null)
        {
            ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
        }

        OptimizedArrayBuilder<T> arrayBuilder = new OptimizedArrayBuilder<T>();
        for (int i = 0; i < array.Length; i++)
        {
            if (match(array[i]))
            {
                arrayBuilder.Add(array[i]);
            }
        }

        if (arrayBuilder.Count == 0)
        {
            return [];
        }

        return arrayBuilder.Build();
    }

    private ref struct OptimizedArrayBuilder<T>
    {
        const int InlineArrayLength = 16;
        private List<T>? heapMatches;
        private InlineArray16<T> stackAllocatedMatches;
        private int stackAllocatedMatchesFound;

        public SmallArrayBuilder()
        {
            this.heapMatches = null; // only allocate if needed
            this.stackAllocatedMatches = default;
            this.stackAllocatedMatchesFound = 0;
        }

        public readonly int Count => this.stackAllocatedMatchesFound + (heapMatches?.Count ?? 0);

        public void Add(T item)
        {
            if (stackAllocatedMatchesFound < InlineArrayLength)
            {
                stackAllocatedMatches[stackAllocatedMatchesFound++] = item;
            }
            else
            {
                // Revert to the old logic, allocating and growing a List
                heapMatches ??= [];
                heapMatches.Add(item);
            }
        }

        public readonly T[] Build()
        {
            T[] result = new T[this.Count];

            int index = 0;
            foreach (T stackAllocatedMatch in stackAllocatedMatches)
            {
                result[index++] = stackAllocatedMatch;
                if (index >= stackAllocatedMatchesFound)
                {
                    break;
                }
            }

            heapMatches?.CopyTo(result.AsSpan(start: InlineArrayLength));

            return result;
        }
    }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was also thinking if it could be used instead of a regular List, in some places.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is ArrayBuilder<T> for non-pooling usage, and ValueListBuilder<T> for pooling usage. Array.FindAll isn't a common method and we aren't interested to introduce a new construct for it. Instead, it should use one of existing builder, optimizations should be done in the builder.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Runtime community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants