Description
[ In all the following, I'm ignoring arrays, because they're already special-cased ]
`Enumerable.Chunk()` uses a "list-like" exponential-growth approach to arriving at the size of the buffer it needs to hold a chunk. This optimises the case where the chunk size is much larger than the number of entries in the collection. But in cases where there is at least one whole chunk, there will have been a series of "wasted" reallocations as the first chunk buffer is repeatedly found to be too small and resized.
For example, a chunk size of 512 takes 8 allocations/resizes to arrive at its full size (4, 8, 16, 32, 64, 128, 256, 512), allocating almost twice as much space for the first chunk as it really needed, and doing 8x more allocations than may have been necessary.
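To make the waste concrete, here's a small standalone sketch (illustrative only, not the framework code) of that doubling pattern for a chunk size of 512:

```csharp
// Simulates the buffer-growth pattern described above: start at
// min(chunkSize, 4) and double until the buffer reaches chunkSize.
int chunkSize = 512;
int totalAllocated = 0;
int allocations = 0;
for (int size = Math.Min(chunkSize, 4); ; size = Math.Min(chunkSize, size * 2))
{
    allocations++;
    totalAllocated += size;   // each resize allocates a fresh array
    if (size == chunkSize) break;
}
Console.WriteLine($"{allocations} allocations, {totalAllocated} elements");
// 8 allocations, 1020 elements -- nearly 2x the 512 actually needed
```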
Whether or not this matters, or can be improved, depends:
- For collections with a small number of chunks, the 'wasted' allocations of the first chunk can be as numerous as, or more numerous than, the genuine ones.
- In collections with many chunks, we don't care, because the 'waste' is a small proportion of the genuine allocations.
- In uncountable collections, we don't know the size up front, so we're stuck with discovering it incrementally.
- But in collections which are countable, we could trivially avoid the waste by knowing up front how big the first chunk needs to be.
I have read #67132 and it's clear to me that nobody expects a great deal from `Chunk()` in the way of performance - it's allocation-heavy and that's that. But that issue does leave the door open in #67132 (comment) for "little tweaks" to make it better, and in the spirit of small tweakery, I propose changing the setup of the first chunk buffer in `EnumerableChunkIterator` from

```csharp
int arraySize = Math.Min(size, 4);
```

to

```csharp
int arraySize = Math.Min(size, source.TryGetNonEnumeratedCount(out int sourceCount) ? sourceCount : 4);
```
I think from the history that there was a time when calling `TryGetNonEnumeratedCount` might have seemed too expensive, but there is now another type-check looking for the array special case, so it seems that people are now OK with doing low-cost O(1) up-front checks if they help performance. I tend to assume that `TryGetNonEnumeratedCount` is not expensive (unless a particular `IEnumerable<T>` misbehaves), and have not been able to measure any deleterious effect of adding it to a vaguely realistic `Chunk()` benchmark - I simply can't get it out of the BenchmarkDotNet noise floor - see Data below.
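For illustration, a minimal sketch of why the added check is cheap: `TryGetNonEnumeratedCount` succeeds in O(1) for sources that already know their count, and fails without enumerating for lazy ones, so the fallback of 4 is preserved in the uncountable case:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A lazy source: no count is available without enumerating.
static IEnumerable<int> Lazy() { yield return 1; yield return 2; }

var list = new List<int> { 1, 2, 3 };

// O(1) success for a counted collection:
Console.WriteLine(list.TryGetNonEnumeratedCount(out int c1)); // True, c1 == 3

// Fails (without enumerating) for the lazy iterator:
Console.WriteLine(Lazy().TryGetNonEnumeratedCount(out int c2)); // False
```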
Configuration

```
BenchmarkDotNet v0.14.0, Windows 11 (10.0.26100.3775)
Intel Core Ultra 9 185H, 1 CPU, 22 logical and 16 physical cores
.NET SDK 9.0.203
  [Host]     : .NET 9.0.4 (9.0.425.16305), X64 RyuJIT AVX2
  DefaultJob : .NET 9.0.4 (9.0.425.16305), X64 RyuJIT AVX2
```
Regression?
No
Data
Method | CollectionSize | ChunkSize | Mean | Gen0 | Allocated |
---|---|---|---|---|---|
Enumerable | 10 | 5 | 99.53 ns | 0.0039 | 208 B |
FastEnumerable | 10 | 5 | 99.49 ns | 0.0039 | 208 B |
ListCollection | 10 | 5 | 97.82 ns | 0.0039 | 208 B |
FastListCollection | 10 | 5 | 93.75 ns | 0.0033 | 176 B |
Enumerable | 10 | 100 | 112.94 ns | 0.0048 | 256 B |
FastEnumerable | 10 | 100 | 105.72 ns | 0.0049 | 256 B |
ListCollection | 10 | 100 | 108.35 ns | 0.0049 | 256 B |
FastListCollection | 10 | 100 | 79.46 ns | 0.0029 | 152 B |
Enumerable | 10 | 512 | 110.42 ns | 0.0049 | 256 B |
FastEnumerable | 10 | 512 | 104.97 ns | 0.0049 | 256 B |
ListCollection | 10 | 512 | 105.93 ns | 0.0049 | 256 B |
FastListCollection | 10 | 512 | 76.74 ns | 0.0029 | 152 B |
Enumerable | 1500 | 5 | 6,419.44 ns | 0.1831 | 9744 B |
FastEnumerable | 1500 | 5 | 6,486.67 ns | 0.1831 | 9744 B |
ListCollection | 1500 | 5 | 6,955.54 ns | 0.1831 | 9744 B |
FastListCollection | 1500 | 5 | 6,829.16 ns | 0.1831 | 9712 B |
Enumerable | 1500 | 100 | 3,462.08 ns | 0.0420 | 2280 B |
FastEnumerable | 1500 | 100 | 3,586.67 ns | 0.0381 | 2280 B |
ListCollection | 1500 | 100 | 3,806.44 ns | 0.0420 | 2280 B |
FastListCollection | 1500 | 100 | 3,726.25 ns | 0.0381 | 2032 B |
Enumerable | 1500 | 512 | 3,593.68 ns | 0.0534 | 2904 B |
FastEnumerable | 1500 | 512 | 3,350.55 ns | 0.0534 | 2904 B |
ListCollection | 1500 | 512 | 3,888.40 ns | 0.0534 | 2904 B |
FastListCollection | 1500 | 512 | 3,560.02 ns | 0.0420 | 2224 B |
Enumerable | 15000 | 5 | 65,909.87 ns | 1.8311 | 96144 B |
FastEnumerable | 15000 | 5 | 65,062.50 ns | 1.8311 | 96144 B |
ListCollection | 15000 | 5 | 65,179.16 ns | 1.8311 | 96144 B |
FastListCollection | 15000 | 5 | 62,964.80 ns | 1.8311 | 96112 B |
Enumerable | 15000 | 100 | 32,222.36 ns | 0.3662 | 19560 B |
FastEnumerable | 15000 | 100 | 32,730.25 ns | 0.3662 | 19560 B |
ListCollection | 15000 | 100 | 32,583.27 ns | 0.3662 | 19560 B |
FastListCollection | 15000 | 100 | 33,289.45 ns | 0.9155 | 19312 B |
Enumerable | 15000 | 512 | 31,407.68 ns | 0.3052 | 17048 B |
FastEnumerable | 15000 | 512 | 30,778.50 ns | 0.3052 | 17048 B |
ListCollection | 15000 | 512 | 31,196.14 ns | 0.3052 | 17048 B |
FastListCollection | 15000 | 512 | 30,826.80 ns | 0.3052 | 16368 B |
- `Enumerable` and `ListCollection` are running the current framework source code, but compiled in my test project.
- `FastEnumerable` and `FastListCollection` are a copy of the same code, but with the modification suggested above.
- `Enumerable` directly uses an iterator method that yields bytes; `ListCollection` is the same iterator pushed through `ToList()` before the benchmark.
Analysis
Everyone has their own idea about how `Chunk()` is actually used, and I don't know how many real-world scenarios this change actually improves, but it seems a low-cost way of avoiding a few allocations - the CPU difference is apparently negligible, but the number of allocations is definitely reduced for countable collections (primarily `IList<>`, I would have guessed).
Of course, one could add another fast path for `IList<>`, but that would add considerably more complexity and risk of regression than this proposal.
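For comparison, a hypothetical sketch of what such an `IList<>` fast path might look like - each chunk allocated at exactly the right size and filled by index, with no enumerator and no resizing. This is shown only to illustrate the extra surface area involved; it is not part of this proposal, and it glosses over details the real iterator must handle (e.g. the list mutating mid-enumeration):

```csharp
// Hypothetical IList<T> fast path, for illustration only.
static IEnumerable<T[]> ChunkList<T>(IList<T> source, int size)
{
    for (int start = 0; start < source.Count; start += size)
    {
        // Exact-size allocation: the final chunk may be short.
        var chunk = new T[Math.Min(size, source.Count - start)];
        for (int i = 0; i < chunk.Length; i++)
            chunk[i] = source[start + i];
        yield return chunk;
    }
}
```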