Fix Enumerable.Chunk throughput regression #72811
Conversation
Tagging subscribers to this area: @dotnet/area-system-linq
We previously changed the implementation of Enumerable.Chunk to avoid significantly overallocating when the chunk size is much larger than the actual number of elements. We switched to a doubling scheme à la `List<T>`, and I pushed for us to just use `List<T>` to keep things simple. However, in doing some perf measurements I noticed that for common cases Chunk is now around 20% slower in throughput than it was previously, which is a bit too much to swallow, and the code that uses an array directly isn't all that much more complicated; it also affords the ability to avoid further overallocation when doubling the size of the storage, which should ideally be capped at the chunk size. This does so and fixes the throughput regression.
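To illustrate the idea, here is a language-neutral sketch (in Python, not the actual C# implementation) of the capped-doubling strategy the description refers to: the buffer starts small and doubles as elements arrive, but its capacity is never grown beyond the chunk size, so a huge chunk size with few source elements doesn't overallocate. The starting capacity of 4 is an illustrative assumption, not taken from the PR.

```python
def chunk(source, chunk_size):
    """Yield successive chunks of at most chunk_size items.

    Sketch of the capped-doubling growth the PR describes: start with a
    small buffer and double it as needed, but never past chunk_size.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    it = iter(source)
    while True:
        try:
            first = next(it)
        except StopIteration:
            return  # source exhausted; no partial chunk pending
        # Start small instead of allocating chunk_size up front
        # (4 is an assumed initial capacity for illustration).
        capacity = min(4, chunk_size)
        buf = [first] + [None] * (capacity - 1)
        count = 1
        while count < chunk_size:
            try:
                item = next(it)
            except StopIteration:
                yield buf[:count]  # final, possibly short, chunk
                return
            if count == len(buf):
                # Double the buffer, capped at the chunk size.
                new_capacity = min(len(buf) * 2, chunk_size)
                buf.extend([None] * (new_capacity - len(buf)))
            buf[count] = item
            count += 1
        yield buf[:count]
```

With a chunk size of 100 and only two source elements, this allocates a buffer of 4 rather than 100, which is the overallocation the capped doubling avoids.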
3685687 to 492834f
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

public class ChunkBenchmarks
{
    private IEnumerable<int> _source = new int[1_000];

    [Benchmark]
    [Arguments(10)]
    [Arguments(100)]
    public void Chunk(int chunkSize)
    {
        // Enumerate every chunk to measure throughput; the arrays are discarded.
        foreach (int[] _ in _source.Chunk(chunkSize)) { }
    }
}