Improve LINQ perf of chained Concats #6131

Merged
merged 3 commits into from Feb 17, 2016

7 participants
@stephentoub
Member

stephentoub commented Feb 16, 2016

The Concat operator today is very simple: it iterates through the first source yielding all items, then does the same for the second. This works great in isolation, but when chained, the cost grows as yielding each item from the Nth source results in calling through the MoveNext/Current interface methods of the previous N-1 concats. While this is the nature of LINQ operators in general, it's particularly pernicious with Concat, which is often used to assemble data from many sources.

This commit introduces a special concat iterator that avoids that recursive cost. This comes at the small expense of N+1 interface calls per iteration, where N is the number of sources involved in the concatenation chain. Chains of two sources and three sources are special-cased, after which an array is allocated and used to hold all of the sources (this could be tweaked in the future to have specializations for more sources if, for example, we found that four was a very common number). Other benefits include the size of the concat iterator being a bit smaller than it was previously when generated by the compiler, and it now taking part in the IIListProvider interface, so that for example ToList operations are faster when any of the sources are ILists.

Example results on my machine:

  • Enumerating a Concat of 2 Range(0, 10) enumerables: ~15% faster
  • Enumerating a Concat of 3 Range(0, 10) enumerables: ~30% faster
  • Enumerating a Concat of 10 Range(0, 100) enumerables: ~4x faster
  • Enumerating a Concat of 100 Range(0, 1) enumerables: ~2x faster

cc: @VSadov, @JonHanna
Related to #2075
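
To make the recursive cost described above concrete, here is a minimal C# sketch (not the code from this PR). `NaiveConcat` stands in for the simple two-source iterator and `FlatConcat` for a single iterator over all sources; both names, the `ConcatCostSketch` class, and the `Yields` counter are hypothetical and exist only for illustration. The counter records how many times an item is handed up through an iterator layer.

```csharp
using System;
using System.Collections.Generic;

static class ConcatCostSketch
{
    // Hypothetical counter: incremented each time any layer yields an item.
    public static int Yields;

    // Stand-in for the simple two-source Concat described in the PR text.
    public static IEnumerable<T> NaiveConcat<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        foreach (T item in first) { Yields++; yield return item; }
        foreach (T item in second) { Yields++; yield return item; }
    }

    // Stand-in for one iterator that knows about every source in the chain.
    public static IEnumerable<T> FlatConcat<T>(params IEnumerable<T>[] sources)
    {
        foreach (IEnumerable<T> source in sources)
        {
            foreach (T item in source) { Yields++; yield return item; }
        }
    }

    // Enumerates the sequence fully and reports how many hand-offs occurred.
    public static int CountYields<T>(IEnumerable<T> sequence)
    {
        Yields = 0;
        foreach (T unused in sequence) { }
        return Yields;
    }
}
```

With four single-item sources chained as `NaiveConcat(NaiveConcat(NaiveConcat(a, b), c), d)`, the items from `a` and `b` each pass through three layers, the item from `c` through two, and the item from `d` through one, for 9 hand-offs in total. `FlatConcat(a, b, c, d)` needs only 4.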

Contributor

svick commented Feb 16, 2016

How much would it hurt the common "append" case (e.g. a.Concat(b).Concat(c).Concat(d)), if the less common "prepend" case (e.g. a.Concat(b.Concat(c.Concat(d)))) was optimized too?
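
For reference, the two shapes in the question look like this in code. This is a small sketch using only the public `Enumerable.Concat`; the `ConcatShapes` class and its method names are hypothetical. Both forms produce the same sequence, but the nesting differs: in the append form the earlier concat is the first argument of the outer call, while in the prepend form it is the second.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ConcatShapes
{
    // "Append" shape: ((a + b) + c) + d. Each new Concat wraps the previous
    // result as its first source.
    public static IEnumerable<int> Append(
        IEnumerable<int> a, IEnumerable<int> b, IEnumerable<int> c, IEnumerable<int> d)
        => a.Concat(b).Concat(c).Concat(d);

    // "Prepend" shape: a + (b + (c + d)). Each new Concat wraps the previous
    // result as its second source.
    public static IEnumerable<int> Prepend(
        IEnumerable<int> a, IEnumerable<int> b, IEnumerable<int> c, IEnumerable<int> d)
        => a.Concat(b.Concat(c.Concat(d)));
}
```

An optimization that only inspects the first source for an existing concat iterator would recognize the append shape but not the prepend shape, which is presumably why the question treats them as separate cases.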

Collaborator

JonHanna commented Feb 16, 2016

I was taking a look at the same thing, though planning to wait until after #6127 and particularly #6129 were in.

You can take a look at JonHanna@9894d37 though it's far from ready.

Differences:

  1. I also (though it's not completed here) optimise for IList<T> sources. That was in fact my original goal, but I made the same observation as you in the course of that, because it caused me to think about the costs of Concat more generally. (In particular, XUnit itself appears to do some concats of concats. It also behaves very strangely if your Concat has a bug in it and ends up telling you all your tests passed because it missed most of them 😉). There's no reason why that couldn't be added on to this PR at a later date.
  2. I also tackle Union. Likewise, there's no reason why that couldn't still be done if we take this PR.
  3. I use an abstract-method approach to finding the appropriate ConcatIterator to return. I don't know whether that would prove to be better or worse than that used here. (Cost of virtual lookup vs type-checking would be the main difference, I imagine).
  4. I handle x.Concat(y.Concat(z)) as well, though again there's no reason this couldn't be added here.
  5. I handle x.Concat(y).Concat(a.Concat(b)), though I always skip the explicitly-numbered classes in this case. That would be easy to bring to this.
  6. I've a very different approach to MoveNext(). It's hard to weigh the two just from looking at the code.
  7. I don't set a limit on how large an array can be created. Good idea!
  8. The most important difference: my approach isn't finished, in which regard this approach clearly wins 😄

Anyway, this LGTM, but you might find one or more of the ideas in mine worth considering.

public List<TSource> ToList()
{
var list = new List<TSource>();

@JonHanna

JonHanna Feb 16, 2016

Collaborator

If you call GetCount(true) and the result >= 0 then the list can be preallocated.

@stephentoub

stephentoub Feb 16, 2016

Member

With my suggested response above, GetCount(true) would always return -1.
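
The preallocation idea in this thread can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `PreallocateSketch`, `GetCheapCount`, and the list-of-sources parameter are hypothetical, and the "cheap" test used here (every source implements `ICollection<T>`) is only one possible rule. A result of -1 means the count is not cheaply known, in which case the list simply grows as items are added.

```csharp
using System;
using System.Collections.Generic;

static class PreallocateSketch
{
    // Hypothetical helper: returns the total item count if every source can
    // report its count without being enumerated, otherwise -1.
    public static int GetCheapCount<T>(IReadOnlyList<IEnumerable<T>> sources)
    {
        int total = 0;
        foreach (IEnumerable<T> source in sources)
        {
            if (source is ICollection<T> collection)
            {
                checked { total += collection.Count; }
            }
            else
            {
                return -1; // at least one source would have to be enumerated
            }
        }
        return total;
    }

    // Preallocates the list when the count is known; otherwise falls back to
    // a default-capacity list.
    public static List<T> ToList<T>(IReadOnlyList<IEnumerable<T>> sources)
    {
        int count = GetCheapCount(sources);
        List<T> list = count >= 0 ? new List<T>(count) : new List<T>();
        foreach (IEnumerable<T> source in sources)
        {
            list.AddRange(source);
        }
        return list;
    }
}
```

If the count query always returns -1, as the reply above says it would, the preallocating branch is never taken and only the fallback path matters.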

public TSource[] ToArray()
{
return EnumerableHelpers.ToArray(this);

@JonHanna

JonHanna Feb 16, 2016

Collaborator

Likewise, GetCount(true) might let you preallocate, with this as the fall-back when the result is -1.

@stephentoub

stephentoub Feb 16, 2016

Member

Ditto.

Member

stephentoub commented Feb 16, 2016

> I also (though it's not completed here) optimise for IList sources.

I started on that path, looked at a bunch of existing use cases and what value would actually be had for doing the type checks, adding all the special paths, etc., and it didn't seem worthwhile. If it turns out to be valuable, it's just "more code" and could be added in the future.

> I also tackle Union. Likewise, there's no reason why that couldn't still be done if we take this PR.

Yeah, I think that's separate, and IMO chains of concats is much more common than chains of Unions. Again, though, it's just "more code" that could be added later.

> I use an abstract-method approach to finding the appropriate ConcatIterator to return.

That's a good idea. I'll do that.

> I handle x.Concat(y.Concat(z)) as well, though again there's no reason this couldn't be added here.

Sure. There are lots of potential combinations. I simply handled the one that seemed to provide the best return on investment. I'm trying to weigh the possible gains for the most common cases with keeping the code complexity low. It's possible additional cases would be valuable in the future.

Collaborator

JonHanna commented Feb 16, 2016

I was thinking I'd probably keep the prepend as a reasonably likely case that needs just one more check, but drop the check within append that catches a concatenation of concatenations as more trouble than it's worth.

Member

stephentoub commented Feb 16, 2016

Thanks for the review, @JonHanna. I updated it to avoid the arrays entirely and to address your feedback, plus added a few more tests.

Collaborator

JonHanna commented Feb 16, 2016

> Yeah, I think that's separate, and IMO chains of concats is much more common than chains of Unions.

Yeah, I was just led to think of it due to the way they correspond to two types of SQL UNION. That said, since this makes most of the rest of that experiment obsolete, I'll look at adding that part to #6129, though probably cut-down to not care about prepends (chains of unions going backwards are going to be rarer still).

}
}
private sealed class Concat3Iterator<TSource> : ConcatIterator<TSource>

@JonHanna

JonHanna Feb 16, 2016

Collaborator

I know there's no perfect answer here, but 4 seems to be popular as the "magic number" for number-of-item-based subclasses and overloads. Concat4Iterator? This comment though is made in full awareness of how cargo-cult it is to say "well, everyone else seems to do it" without actual analysis.

@stephentoub

stephentoub Feb 16, 2016

Member

I could. Though I actually considered deleting the 3 case even. Once I switched the N version to using a chain rather than an array, there's no longer a "steep cliff".
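
The "chain rather than an array" design mentioned here can be sketched as below. This is a simplified illustration, not the merged implementation: the type names `ConcatChain`, `Chain2`, and `ChainN`, and the members `SourceCount` and `GetSource`, are hypothetical, and only the fields `_next` and `_nextIndex` echo names visible in the diff on this page. Each appended source adds one small node that points at the previous node, so no array is allocated, and the lookup of the source at a given index is an abstract method rather than a type check.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

abstract class ConcatChain<T> : IEnumerable<T>
{
    // Number of sources reachable from this node.
    internal abstract int SourceCount { get; }

    // Returns the source at 'index', or null once 'index' is past the end.
    internal abstract IEnumerable<T> GetSource(int index);

    // Appending links a new node to this one instead of copying sources.
    public ConcatChain<T> Append(IEnumerable<T> next) =>
        new ChainN<T>(this, next, SourceCount);

    public IEnumerator<T> GetEnumerator()
    {
        for (int i = 0; ; i++)
        {
            IEnumerable<T> source = GetSource(i);
            if (source == null) yield break;
            foreach (T item in source) yield return item;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

sealed class Chain2<T> : ConcatChain<T>
{
    private readonly IEnumerable<T> _first, _second;

    internal Chain2(IEnumerable<T> first, IEnumerable<T> second)
    {
        _first = first;
        _second = second;
    }

    internal override int SourceCount => 2;

    internal override IEnumerable<T> GetSource(int index) =>
        index == 0 ? _first : index == 1 ? _second : null;
}

sealed class ChainN<T> : ConcatChain<T>
{
    private readonly ConcatChain<T> _previous;
    private readonly IEnumerable<T> _next;
    private readonly int _nextIndex;

    internal ChainN(ConcatChain<T> previous, IEnumerable<T> next, int nextIndex)
    {
        _previous = previous;
        _next = next;
        _nextIndex = nextIndex;
    }

    internal override int SourceCount => _nextIndex + 1;

    internal override IEnumerable<T> GetSource(int index)
    {
        if (index > _nextIndex) return null;

        // Walk back through the linked nodes without recursion.
        ChainN<T> node = this;
        while (true)
        {
            if (index == node._nextIndex) return node._next;
            if (node._previous is ChainN<T> previousN) { node = previousN; continue; }
            return node._previous.GetSource(index);
        }
    }
}
```

In this sketch the walk through the chain happens once per source rather than once per item, so every item is yielded through a single iterator layer however long the chain is. Growing the chain by one source costs one node allocation, which is the sense in which there is no "steep cliff" when moving past the special-cased sizes.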

stephentoub added some commits Feb 16, 2016

Improve LINQ perf of chained Concats
Address PR feedback
And add a few more tests.

stephentoub added a commit that referenced this pull request Feb 17, 2016

Merge pull request #6131 from stephentoub/concat_perf
Improve LINQ perf of chained Concats

@stephentoub stephentoub merged commit 5790919 into dotnet:master Feb 17, 2016

6 of 7 checks passed

  • Innerloop Ubuntu Release Build and Test: Build finished. No test results found.
  • Innerloop CentOS7.1 Debug Build and Test: Build finished. No test results found.
  • Innerloop OSX Debug Build and Test: Build finished. No test results found.
  • Innerloop OSX Release Build and Test: Build finished. No test results found.
  • Innerloop Ubuntu Debug Build and Test: Build finished. No test results found.
  • Innerloop Windows_NT Debug Build and Test: Build finished. 130393 tests run, 34 skipped, 0 failed.
  • Innerloop Windows_NT Release Build and Test: Build finished. 130397 tests run, 34 skipped, 0 failed.

@stephentoub stephentoub deleted the stephentoub:concat_perf branch Feb 17, 2016

@karelz karelz modified the milestone: 1.0.0-rtm Dec 3, 2016

@lindexi

In the unlikely case of this many concatenations, if we produced a ConcatNIterator with int.MaxValue then state would overflow before it matched its index.

private readonly IEnumerable<TSource> _next;
private readonly int _nextIndex;
internal ConcatNIterator(ConcatIterator<TSource> previousConcat, IEnumerable<TSource> next, int nextIndex)

@lindexi

lindexi Mar 12, 2018

If nextIndex is guaranteed to be >= 0, why not use uint?
