
Improve LINQ perf of chained Concats #6131

Merged
merged 3 commits into dotnet:master from stephentoub:concat_perf Feb 17, 2016

Conversation

@stephentoub
Member

commented Feb 16, 2016

The Concat operator today is very simple: it iterates through the first source yielding all items, then does the same for the second. This works great in isolation, but when chained, the cost grows as yielding each item from the Nth source results in calling through the MoveNext/Current interface methods of the previous N-1 concats. While this is the nature of LINQ operators in general, it's particularly pernicious with Concat, which is often used to assemble data from many sources.

This commit introduces a special concat iterator that avoids that recursive cost. This comes at the small expense of N+1 interface calls per iteration, where N is the number of sources involved in the concatenation chain. Chains of two sources and three sources are special-cased, after which an array is allocated and used to hold all of the sources (this could be tweaked in the future to have specializations for more sources if, for example, we found that four was a very common number). Other benefits include the size of the concat iterator being a bit smaller than the compiler-generated one was previously, and its now taking part in the IIListProvider interface, so that, for example, ToList operations are faster when any of the sources are ILists.

Example results on my machine:

  • Enumerating a Concat of 2 Range(0, 10) enumerables: ~15% faster
  • Enumerating a Concat of 3 Range(0, 10) enumerables: ~30% faster
  • Enumerating a Concat of 10 Range(0, 100) enumerables: ~4x faster
  • Enumerating a Concat of 100 Range(0, 1) enumerables: ~2x faster
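
The shape of the fix can be sketched as follows. This is a simplified illustration of the approach rather than the actual corefx code; the `ConcatIterator`/`GetEnumerable` names mirror those that appear in the diff hunks later in the thread.

```csharp
// Simplified sketch: each chained Concat contributes one source to a
// logical sequence of sources, exposed through GetEnumerable(index).
// Enumeration walks the sources directly, so yielding an item costs a
// constant number of interface calls no matter how many Concats were
// chained, instead of recursing through N-1 wrapped enumerators.
internal abstract class ConcatIterator<TSource> : IEnumerable<TSource>
{
    // Returns the index'th source in the chain, or null once past the end.
    internal abstract IEnumerable<TSource> GetEnumerable(int index);

    public IEnumerator<TSource> GetEnumerator()
    {
        for (int i = 0; ; i++)
        {
            IEnumerable<TSource> source = GetEnumerable(i);
            if (source == null) yield break;

            foreach (TSource item in source)
            {
                yield return item;
            }
        }
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() => GetEnumerator();
}
```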

cc: @VSadov, @JonHanna
Related to #2075

Array.Copy(n._sources, 0, sources, 0, n._sources.Length);
sources[n._sources.Length] = second;
return new ConcatNIterator<TSource>(sources);
}

@stephentoub

stephentoub Feb 16, 2016

Author Member

I realized I can likely avoid these array allocations by chaining the concat iterators and changing how GetEnumerable is implemented. I'll try it out tomorrow. Not sure whether it'll be better or not.

@svick

Contributor

commented Feb 16, 2016

How much would it hurt the common "append" case (e.g. a.Concat(b).Concat(c).Concat(d)), if the less common "prepend" case (e.g. a.Concat(b.Concat(c.Concat(d)))) was optimized too?

@JonHanna

Collaborator

commented Feb 16, 2016

I was taking a look at the same thing, though planning to wait until after #6127 and particularly #6129 were in.

You can take a look at JonHanna@9894d37 though it's far from ready.

Differences:

  1. I also (though it's not completed here) optimise for IList<T> sources. That was in fact my original goal, but I made the same observation as you in the course of that, because it caused me to think about the costs of Concat more generally. (In particular, XUnit itself appears to do some concats of concats. It also behaves very strangely if your Concat has a bug in it and ends up telling you all your tests passed because it missed most of them 😉). There's no reason why that couldn't be added on to this PR at a later date.
  2. I also tackle Union. Likewise, there's no reason why that couldn't still be done if we take this PR.
  3. I use an abstract-method approach to finding the appropriate ConcatIterator to return. I don't know whether that would prove to be better or worse than that used here. (Cost of virtual lookup vs type-checking would be the main difference, I imagine).
  4. I handle x.Concat(y.Concat(z)) as well, though again there's no reason this couldn't be added here.
  5. I handle x.Concat(y).Concat(a.Concat(b)), though I always skip the explicitly-numbered classes in this case. That would be easy to bring to this.
  6. I've a very different approach to MoveNext(). It's hard to weigh the two just from looking at the code.
  7. I don't set a limit on how large an array can be created. Good idea!
  8. The most important difference; my approach isn't finished, in which regard this approach clearly wins 😄

Anyway, this LGTM, but you might find one or more of the ideas in mine worth considering.

IEnumerable<TSource> source = GetEnumerable(i);
if (source == null) break;

ICollection<TSource> c = source as ICollection<TSource>;

@JonHanna

JonHanna Feb 16, 2016

Collaborator

You could also test for source as IIListProvider and call GetCount(onlyIfCheap) on it.

@JonHanna

JonHanna Feb 16, 2016

Collaborator

Indeed, the approach used in OrderedEnumerable<T> would work here; it covers that case and also the non-generic ICollection by deferring to Count() in either case, while predicting whether Count() itself will be constant-time or not.

@stephentoub

stephentoub Feb 16, 2016

Author Member

Honestly, I was actually hesitant to even take it this far... I'm a bit concerned that having to look at all of the constituent enumerables, doing all of the casting and type checks, etc., could actually hurt cases where onlyIfCheap is set and the call needs to be really, really cheap. I think I may just scale this back to always return -1 if onlyIfCheap.
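
Scaled back that way, GetCount would look roughly like the following. This is a hypothetical sketch built on the GetEnumerable chain shown in the diff hunks, not the committed code.

```csharp
public int GetCount(bool onlyIfCheap)
{
    // Inspecting every constituent source just to decide whether counting
    // is cheap can itself be non-trivial, so bail out immediately.
    if (onlyIfCheap)
    {
        return -1;
    }

    int count = 0;
    for (int i = 0; ; i++)
    {
        IEnumerable<TSource> source = GetEnumerable(i);
        if (source == null)
        {
            break;
        }

        // Enumerable.Count() already takes the O(1) path for ICollection<T>
        // sources and enumerates the rest.
        checked
        {
            count += source.Count();
        }
    }

    return count;
}
```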

@JonHanna

JonHanna Feb 16, 2016

Collaborator

I didn't consider this to be a strong case for IIListProvider at all, for similar reasons (but I did if the sources are all lists, which was the main thing in my stab at it). Of course, since my plan was to have separate implementations when all inputs are lists, I'd already set things up to make cheap counts less likely here, with some of the most common cases handled elsewhere.

@JonHanna

JonHanna Feb 16, 2016

Collaborator

If you do scale it back that way, the check could be removed here entirely and just depend on Count() doing the right thing. Another possibility is to be more thorough in the "small" classes, and lazy in the case with more than 3 items.

@stephentoub

stephentoub Feb 16, 2016

Author Member

the check could be removed here entirely and just depend on Count() doing the right thing

Yup, done.


public List<TSource> ToList()
{
var list = new List<TSource>();

@JonHanna

JonHanna Feb 16, 2016

Collaborator

If you call GetCount(true) and the result >= 0 then the list can be preallocated.
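
The suggested preallocation, sketched (hypothetical code, not the committed implementation):

```csharp
public List<TSource> ToList()
{
    // Preallocate when the count is cheaply known; GetCount(true)
    // returns -1 when it is not, in which case List<T> grows as usual.
    int count = GetCount(onlyIfCheap: true);
    var list = count >= 0 ? new List<TSource>(count) : new List<TSource>();

    for (int i = 0; ; i++)
    {
        IEnumerable<TSource> source = GetEnumerable(i);
        if (source == null)
        {
            break;
        }

        list.AddRange(source);
    }

    return list;
}
```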

@stephentoub

stephentoub Feb 16, 2016

Author Member

With my suggested response above, GetCount(true) would always return -1.


public TSource[] ToArray()
{
return EnumerableHelpers.ToArray(this);

@JonHanna

JonHanna Feb 16, 2016

Collaborator

Likewise, GetCount(true) might let you preallocate, with this as the fall-back when the result is -1.

@stephentoub

stephentoub Feb 16, 2016

Author Member

Ditto.

@stephentoub

Member Author

commented Feb 16, 2016

I also (though it's not completed here) optimise for IList<T> sources.

I started on that path, looked at a bunch of existing use cases and what value would actually be had for doing the type checks, adding all the special paths, etc., and it didn't seem worthwhile. If it turns out to be valuable, it's just "more code" and could be added in the future.

I also tackle Union. Likewise, there's no reason why that couldn't still be done if we take this PR.

Yeah, I think that's separate, and IMO chains of concats is much more common than chains of Unions. Again, though, it's just "more code" that could be added later.

I use an abstract-method approach to finding the appropriate ConcatIterator to return.

That's a good idea. I'll do that.

I handle x.Concat(y.Concat(z)) as well, though again there's no reason this couldn't be added here.

Sure. There are lots of potential combinations. I simply handled the one that seemed to provide the best return on investment. I'm trying to weigh the possible gains for the most common cases with keeping the code complexity low. It's possible additional cases would be valuable in the future.

@JonHanna

Collaborator

commented Feb 16, 2016

I was thinking I'd probably keep the prepend as a reasonably likely case that needs just one more check, but drop the check within append that catches a concatenation of concatenations, as more trouble than it's worth.

@stephentoub

Member Author

commented Feb 16, 2016

Thanks for the review, @JonHanna. I updated it to avoid the arrays entirely and to address your feedback, plus added a few more tests.

@stephentoub stephentoub force-pushed the stephentoub:concat_perf branch from 1951db6 to ed635e1 Feb 16, 2016
@JonHanna

Collaborator

commented Feb 16, 2016

Yeah, I think that's separate, and IMO chains of concats is much more common than chains of Unions.

Yeah, I was just led to think of it due to the way they correspond to two types of SQL UNION. That said, since this makes most of the rest of that experiment obsolete, I'll look at adding that part to #6129, though probably cut-down to not care about prepends (chains of unions going backwards are going to be rarer still).

return new Concat2Iterator<TSource>(_first, _second);
}

internal override ConcatIterator<TSource> Append(IEnumerable<TSource> next)

@JonHanna

JonHanna Feb 16, 2016

Collaborator

Concat is perhaps a better name for what is a specialised Concat rather than a specialised Append (though mea culpa on having also used "Append").

@stephentoub

stephentoub Feb 16, 2016

Author Member

I didn't look at your changes, so we came up with "Append" independently... that probably means something ;) Even so, I've changed it to be Concat.

@JonHanna

JonHanna Feb 16, 2016

Collaborator

That it would be a perfectly good word if there weren't an Append in LINQ, and that we aren't yet used to there now being an Append in LINQ, probably.

}
}

private sealed class Concat3Iterator<TSource> : ConcatIterator<TSource>

@JonHanna

JonHanna Feb 16, 2016

Collaborator

I know there's no perfect answer here, but 4 seems to be popular as the "magic number" for number-of-item-based subclasses and overloads. Concat4Iterator? This comment though is made in full awareness of how cargo-cult it is to say "well, everyone else seems to do it" without actual analysis.

@stephentoub

stephentoub Feb 16, 2016

Author Member

I could. Though I actually considered deleting the 3 case even. Once I switched the N version to using a chain rather than an array, there's no longer a "steep cliff".

@stephentoub stephentoub force-pushed the stephentoub:concat_perf branch from ed635e1 to 450f29c Feb 16, 2016
internal override IEnumerable<TSource> GetEnumerable(int index)
{
return
index < _nextIndex ? _previousConcat.GetEnumerable(index) :

@VSadov

VSadov Feb 16, 2016

Member

Perhaps this can be done without recursion? Iterating through prev chain could be cheaper.

@stephentoub

stephentoub Feb 16, 2016

Author Member

I'm not against that, but this was by far the simplest mechanism I could come up with, and the cost here should in general be very minimal. Since we'd need to process from the oldest to the newest (which is the opposite direction of the chain), and since we can't rewrite the chain, how would you recommend doing this without manually building up a stack (which has its own costs)?

(The recursion is in effect no different than what's already being done today, just on each call to MoveNext and Current rather than once per enumerable here.)

@VSadov

VSadov Feb 16, 2016

Member

This seems to be O(n^2) on the number of concatenated fragments, so for long chains of short fragments this can become a dominant factor.
Would it make sense to memoize the chain into a List if we know the chain is sufficiently long? 16 would be my guess at "long enough" :-)
That would obviously make sense only at the top-level.

@stephentoub

stephentoub Feb 16, 2016

Author Member

This seems to be O(n^2) on the number of concatenated fragments

My point is, today it's O(n^2) on the number of items in the enumerables (for each of a MoveNext and a Current call per item). This change makes it O(n^2) on the number of enumerables (for a GetEnumerable call per enumerable). So, yes, for long chains of very short fragments, it could approach within a constant multiple of what it is today, though still much less.

Would it make sense to memoize the chain into a List if we know the chain is sufficiently long?

That's what I initially had, actually, where ConcatNIterator stored an array of the enumerables rather than a link back to its previous one, but it requires allocating such an array/list, which is why I moved away from it. Are you asking that I add back such a thing for use with long chains, e.g. use Concat2Iterator for chains of 2, Concat3Iterator for chains of 3, ConcatLinkedIterator for chains of 4-16 (what's currently in this PR called ConcatNIterator), and a new ConcatArrayIterator for chains longer than 16? Or are you suggesting that when enumeration starts, build up an array of the enumerables once and then iterate through that?

I'm open to doing things like that. I just want to highlight that what's here in the PR is strictly better than what's currently checked in, at least in this regard.

@VSadov

VSadov Feb 16, 2016

Member

And we could null-out _previousConcat if memoizing

@stephentoub

stephentoub Feb 16, 2016

Author Member

Maybe I misunderstood what you were trying to accomplish. This is still O(n^2) in the number of enumerables, you've just traded iteration for function calls... is that all you were going for? I thought you wanted an iteration mechanism to make it O(n).

@VSadov

VSadov Feb 16, 2016

Member

yes, this one just to avoid recursion. considering n^2 it could be a noticeable change

@stephentoub

stephentoub Feb 16, 2016

Author Member

Ok, sorry, I completely misunderstood what you were going for.

@VSadov

VSadov Feb 16, 2016

Member

memoization suggestion is to avoid n^2 for big n, but that is indeed allocation vs. cycles trade and as such, I agree, not necessarily a win.

@stephentoub

stephentoub Feb 16, 2016

Author Member

Yeah, I agree for the purpose of avoiding the deeply recursive call chain, this makes sense. I was misunderstanding what you were trying to achieve with it. I'll fix it up.
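
The non-recursive lookup could be sketched like this. It is illustrative only: field names follow the diff, and the handling of the chain's head is an assumption.

```csharp
internal override IEnumerable<TSource> GetEnumerable(int index)
{
    if (index > _nextIndex)
    {
        return null; // past the end of the chain
    }

    // Walk the _previousConcat links iteratively instead of recursing,
    // so a long chain doesn't build a correspondingly deep call stack.
    ConcatNIterator<TSource> node = this;
    while (index < node._nextIndex)
    {
        var previousN = node._previousConcat as ConcatNIterator<TSource>;
        if (previousN == null)
        {
            // Reached the Concat2/Concat3 iterator at the head of the chain.
            return node._previousConcat.GetEnumerable(index);
        }

        node = previousN;
    }

    return node._next; // index == node._nextIndex
}
```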

stephentoub added 3 commits Feb 16, 2016
And add a few more tests.
@stephentoub stephentoub force-pushed the stephentoub:concat_perf branch from 450f29c to 7f573ef Feb 16, 2016
stephentoub added a commit that referenced this pull request Feb 17, 2016
Improve LINQ perf of chained Concats
@stephentoub stephentoub merged commit 5790919 into dotnet:master Feb 17, 2016
6 of 7 checks passed
  • Innerloop Ubuntu Release Build and Test: Build finished. No test results found.
  • Innerloop CentOS7.1 Debug Build and Test: Build finished. No test results found.
  • Innerloop OSX Debug Build and Test: Build finished. No test results found.
  • Innerloop OSX Release Build and Test: Build finished. No test results found.
  • Innerloop Ubuntu Debug Build and Test: Build finished. No test results found.
  • Innerloop Windows_NT Debug Build and Test: Build finished. 130393 tests run, 34 skipped, 0 failed.
  • Innerloop Windows_NT Release Build and Test: Build finished. 130397 tests run, 34 skipped, 0 failed.
@stephentoub stephentoub deleted the stephentoub:concat_perf branch Feb 17, 2016
@karelz karelz modified the milestone: 1.0.0-rtm Dec 3, 2016

In the unlikely case of this many concatenations, if we produced a ConcatNIterator with int.MaxValue then state would overflow before it matched its index.

private readonly IEnumerable<TSource> _next;
private readonly int _nextIndex;

internal ConcatNIterator(ConcatIterator<TSource> previousConcat, IEnumerable<TSource> next, int nextIndex)

@lindexi

lindexi Mar 12, 2018

If you're sure that nextIndex is >= 0, why don't you use uint?
