Improve LINQ perf of chained Concats #6131
The Concat operator today is very simple: it iterates through the first source yielding all items, then does the same for the second. This works great in isolation, but when chained, the cost grows, as yielding each item from the Nth source results in calling through the MoveNext/Current interface methods of the previous N-1 concats. While this is the nature of LINQ operators in general, it's particularly pernicious with Concat, which is often used to assemble data from many sources.
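To make the chained cost concrete, here is a minimal sketch that assumes a naive two-source concat written with `yield`, roughly the shape described above. `NaiveConcatDemo`, `NaiveConcat`, `LayerYields`, and `CountLayerYields` are hypothetical names for illustration and are not part of System.Linq.

```csharp
using System;
using System.Collections.Generic;

public static class NaiveConcatDemo
{
    // Hypothetical counter: how many times any NaiveConcat layer yields an item.
    public static int LayerYields;

    // Naive concat: drain the first source, then the second.
    public static IEnumerable<T> NaiveConcat<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        foreach (T item in first) { LayerYields++; yield return item; }
        foreach (T item in second) { LayerYields++; yield return item; }
    }

    // Chains `sourceCount` single-item sources left to right, enumerates the
    // result once, and returns the total number of per-layer yields.
    public static int CountLayerYields(int sourceCount)
    {
        LayerYields = 0;
        IEnumerable<int> chain = new[] { 0 };
        for (int i = 1; i < sourceCount; i++)
        {
            chain = NaiveConcat(chain, new[] { i });
        }
        foreach (int unused in chain) { }
        return LayerYields;
    }
}
```

With four single-item sources there are three nested layers, which yield 2, 3, and 4 items respectively, so `CountLayerYields(4)` returns 9 even though only four items reach the caller. The per-layer work grows quadratically with the length of the chain when the sources are similar in size.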
This commit introduces a special concat iterator that avoids that recursive cost. This comes at the small expense of N+1 interface calls per iteration, where N is the number of sources involved in the concatenation chain. Chains of two sources and three sources are special-cased, after which an array is allocated and used to hold all of the sources (this could be tweaked in the future to have specializations for more sources if, for example, we found that four was a very common number). Other benefits include the size of the concat iterator being a bit smaller than it was previously when generated by the compiler, and it now taking part in the IIListProvider interface, so that for example ToList operations are faster when any of the sources are ILists.
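The flattening idea can be sketched as follows. This is an illustration under stated assumptions, not the code in this commit: `FlatConcat<T>` is a hypothetical type that always stores its sources in a list (the commit special-cases two and three sources before falling back to an array), and its `ToList` only hints at the kind of shortcut that becomes possible once all sources are visible in one place.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical flattened concat: one object holds every source in the chain.
public sealed class FlatConcat<T> : IEnumerable<T>
{
    private readonly List<IEnumerable<T>> _sources;

    public FlatConcat(IEnumerable<T> first, IEnumerable<T> second)
    {
        _sources = new List<IEnumerable<T>> { first, second };
    }

    private FlatConcat(List<IEnumerable<T>> sources)
    {
        _sources = sources;
    }

    public int SourceCount => _sources.Count;

    // Appending another source copies the list instead of wrapping this
    // object, so the chain never nests.
    public FlatConcat<T> Concat(IEnumerable<T> next)
    {
        var copy = new List<IEnumerable<T>>(_sources) { next };
        return new FlatConcat<T>(copy);
    }

    // Each item passes through a single iterator layer, whatever the chain length.
    public IEnumerator<T> GetEnumerator()
    {
        foreach (IEnumerable<T> source in _sources)
        {
            foreach (T item in source)
            {
                yield return item;
            }
        }
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    // Pre-sizes the result from sources that expose a count, then bulk-copies.
    public List<T> ToList()
    {
        int knownCount = 0;
        foreach (IEnumerable<T> source in _sources)
        {
            if (source is ICollection<T> collection)
            {
                knownCount += collection.Count;
            }
        }

        var result = new List<T>(knownCount);
        foreach (IEnumerable<T> source in _sources)
        {
            result.AddRange(source);
        }
        return result;
    }
}
```

For example, `new FlatConcat<int>(new[] { 1, 2 }, new[] { 3 }).Concat(new[] { 4, 5 })` holds three sources in one object and enumerates 1 through 5 in order. Copying the list on every `Concat` keeps earlier chains immutable at the cost of an allocation per append, which is one reason small chains might be worth special-casing.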
Example results on my machine:
You can take a look at JonHanna@9894d37 though it's far from ready.
Anyway, this LGTM, but you might find one or more of the ideas in mine worth considering.
I started on that path, looked at a bunch of existing use cases and what value would actually be had for doing the type checks, adding all the special paths, etc., and it didn't seem worthwhile. If it turns out to be valuable, it's just "more code" and could be added in the future.
Yeah, I think that's separate, and IMO chains of Concats are much more common than chains of Unions. Again, though, it's just "more code" that could be added later.
That's a good idea. I'll do that.
Sure. There are lots of potential combinations. I simply handled the one that seemed to provide the best return on investment. I'm trying to weigh the possible gains for the most common cases with keeping the code complexity low. It's possible additional cases would be valuable in the future.
Yeah, I was just led to think of it due to the way they correspond to two types of SQL
Feb 17, 2016
6 of 7 checks passed
In the unlikely case of this many concatenations, if we produced a ConcatNIterator with int.MaxValue then state would overflow before it matched its index.