Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Enumerable.Chunk throughput regression #72811

Merged
merged 3 commits into from
Jul 27, 2022

Conversation

stephentoub
Copy link
Member

@stephentoub stephentoub commented Jul 25, 2022

We previously changed the implementation of Enumerable.Chunk to avoid significantly overallocating in the case of the chunk size being a lot larger than the actual number of elements. We instead switched to a doubling scheme ala List<T>, and I pushed for us to just use List<T> to keep things simple. However, in doing some perf measurements I noticed that for common cases Chunk is now around 20% slower in throughput than it was previously, which is a bit too much too swallow, and the code that just uses an array directly isn't all that much more complicated; it also affords the ability to avoid further overallocation when doubling the size of the storage, which should ideally be capped at the chunk size. This does so and fixes the throughput regression enough (not completely).

@stephentoub stephentoub added area-System.Linq tenet-performance Performance related issue labels Jul 25, 2022
@stephentoub stephentoub added this to the 7.0.0 milestone Jul 25, 2022
@ghost ghost assigned stephentoub Jul 25, 2022
@ghost
Copy link

ghost commented Jul 25, 2022

Tagging subscribers to this area: @dotnet/area-system-linq
See info in area-owners.md if you want to be subscribed.

Issue Details

We previously changed the implementation of Enumerable.Chunk to avoid significantly overallocating in the case of the chunk size being a lot larger than the actual number of elements. We instead switched to a doubling scheme ala List<T>, and I pushed for us to just use List<T> to keep things simple. However, in doing some perf measurements I noticed that for common cases Chunk is now around 20% slower in throughput than it was previously, which is a bit too much too swallow, and the code that just uses an array directly isn't all that much more complicated; it also affords the ability to avoid further overallocation when doubling the size of the storage, which should ideally be capped at the chunk size. This does so and fixes the throughput regression.

Author: stephentoub
Assignees: -
Labels:

area-System.Linq, tenet-performance

Milestone: 7.0.0

We previously changed the implementation of Enumerable.Chunk to avoid significantly overallocating in the case of the chunk size being a lot larger than the actual number of elements.  We instead switched to a doubling scheme ala `List<T>`, and I pushed for us to just use `List<T>` to keep things simple.  However, in doing some perf measurements I noticed that for common cases Chunk is now around 20% slower in throughput than it was previously, which is a bit too much too swallow, and the code that just uses an array directly isn't all that much more complicated; it also affords the ability to avoid further overallocation when doubling the size of the storage, which should ideally be capped at the chunk size.  This does so and fixes the throughput regression.
@stephentoub
Copy link
Member Author

private IEnumerable<int> _source = new int[1_000];

[Benchmark]
[Arguments(10)]
[Arguments(100)]
public void Chunk(int chunkSize)
{
    foreach (int[] _ in _source.Chunk(chunkSize)) { }
}
Method Toolchain chunkSize Mean Ratio Allocated Alloc Ratio
Chunk net6.0 10 5.276 us 1.00 6.34 KB 1.00
Chunk \main\corerun.exe 10 6.764 us 1.28 6.57 KB 1.04
Chunk \pr\corerun.exe 10 5.911 us 1.12 6.52 KB 1.03
Chunk net6.0 100 4.342 us 1.00 4.23 KB 1.00
Chunk \main\corerun.exe 100 5.100 us 1.16 5.41 KB 1.28
Chunk \pr\corerun.exe 100 4.906 us 1.12 5.27 KB 1.24

@stephentoub stephentoub merged commit 1f98974 into dotnet:main Jul 27, 2022
@stephentoub stephentoub deleted the chunkregression branch July 27, 2022 01:00
@ghost ghost locked as resolved and limited conversation to collaborators Aug 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Linq tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants