This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Optimise First, Last, FirstOrDefault & LastOrDefault for OrderedEnumerable #2401

Merged
merged 1 commit into from
Jan 14, 2016

Conversation

JonHanna
Contributor

Fixes #2400

If you only want the first or last element of an ordered enumerable, you do
not need the O(n log n) (worst case O(n²), but rare) operation of sorting
the entire set in O(n) space, but can instead do an O(n) retrieval of just
that element in O(1) space.
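
To illustrate the idea outside the library, here is a minimal sketch of a single-pass equivalent of `OrderBy(...).First()` and `OrderBy(...).Last()`. The names `OrderedEnds`, `FirstByKey` and `LastByKey` are hypothetical and are not the code from this PR. The sketch relies on `OrderBy` being a stable sort, so ties must resolve to the earliest element for `First()` and the latest for `Last()`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not the PR's implementation): one pass, O(n) time, O(1) extra space.
public static class OrderedEnds
{
    // Same result as source.OrderBy(keySelector, comparer).First().
    public static T FirstByKey<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector,
                                        IComparer<TKey> comparer = null)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        comparer = comparer ?? Comparer<TKey>.Default;

        using (IEnumerator<T> e = source.GetEnumerator())
        {
            if (!e.MoveNext()) throw new InvalidOperationException("Sequence contains no elements");
            T best = e.Current;
            TKey bestKey = keySelector(best);
            while (e.MoveNext())
            {
                TKey key = keySelector(e.Current);
                // Strictly smaller only: in a stable sort the earliest of equal keys comes first.
                if (comparer.Compare(key, bestKey) < 0)
                {
                    best = e.Current;
                    bestKey = key;
                }
            }
            return best;
        }
    }

    // Same result as source.OrderBy(keySelector, comparer).Last().
    public static T LastByKey<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector,
                                       IComparer<TKey> comparer = null)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        comparer = comparer ?? Comparer<TKey>.Default;

        using (IEnumerator<T> e = source.GetEnumerator())
        {
            if (!e.MoveNext()) throw new InvalidOperationException("Sequence contains no elements");
            T best = e.Current;
            TKey bestKey = keySelector(best);
            while (e.MoveNext())
            {
                TKey key = keySelector(e.Current);
                // Greater or equal: in a stable sort the latest of equal keys comes last.
                if (comparer.Compare(key, bestKey) >= 0)
                {
                    best = e.Current;
                    bestKey = key;
                }
            }
            return best;
        }
    }
}
```

Each key is computed once per element and only the current best element and key are retained, which is where the O(1) space comes from.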

@stephentoub
Member

cc: @VSadov

JonHanna added a commit to JonHanna/corefx that referenced this pull request Jul 22, 2015
Fixes #2238
(And I think this is the last bit to separate out of the original PR)

https://github.com/hackcraft/Enumerable-Tester/raw/master/Skip%20performance.ods
shows a comparison of the old and new approaches. Improvements are made in
the vast majority of cases, and with Skip in particular can be significant.

Edge cases of disposal and exception ordering are dealt with, duplicating the
behaviour of the current implementation.

(This ignores the matter of ordered skips as per dotnet#2401).
@@ -2958,7 +3136,67 @@ IEnumerator IEnumerable.GetEnumerator()
}
}

internal abstract class OrderedEnumerable<TElement> : IOrderedEnumerable<TElement>
internal interface IOrderedPartitonable<TElement> : IEnumerable<TElement>

Typo in interface name IOrderedPartit_i_onable

JonHanna added a commit to JonHanna/corefx that referenced this pull request Aug 12, 2015
Fixes #2238
(And I think this is the last bit to separate out of the original PR)

https://github.com/hackcraft/Enumerable-Tester/raw/master/Skip%20performance.ods and https://github.com/hackcraft/Enumerable-Tester/raw/master/new%20version%20skip%20improvements.ods
show comparisons of the old and new approaches. Improvements are made in
the vast majority of cases.

(This ignores the matter of ordered skips as per dotnet#2401).
JonHanna added a commit to JonHanna/corefx that referenced this pull request Aug 17, 2015
Fixes #2238
(And I think this is the last bit to separate out of the original PR)

https://github.com/hackcraft/Enumerable-Tester/raw/master/Skip%20performance.ods and https://github.com/hackcraft/Enumerable-Tester/raw/master/new%20version%20skip%20improvements.ods
show comparisons of the old and new approaches. Improvements are made in
the vast majority of cases.

(This ignores the matter of ordered skips as per dotnet#2401).

Also includes similar Take improvements. In particular, the common LINQ paging idiom of a Skip
followed by a Take is catered for specifically.
@@ -122,11 +122,6 @@ public IEnumerator<TSource> GetEnumerator()

public abstract IEnumerable<TSource> Where(Func<TSource, bool> predicate);

public virtual TSource[] ToArray()
Contributor Author

Removing this from here was mainly to be able to have both Iterator and IPartition share the same interface-path to ToArray. Some Iterators have a ToArray that ends up going through Buffer twice, so just removing them and letting Buffer do what it would do anyway is a side-effect benefit.

@JonHanna JonHanna changed the title from "[WIP] Optimise First, Last, FirstOrDefault & LastOrDefault for OrderedEnumerable" to "Optimise First, Last, FirstOrDefault & LastOrDefault for OrderedEnumerable" Aug 21, 2015
@JonHanna
Contributor Author

After some further changes, checks, and squashing, the state of this PR is:

The following operations have changed time complexity on the results of an OrderBy(), OrderBy().ThenBy(), etc:

| Operation | Old Time Complexity | New Time Complexity |
| --- | --- | --- |
| First(), Last(), FirstOrDefault(), LastOrDefault() | O(n log n) best, O(n log n) average, O(n²) worst; O(n) space complexity | Θ(n); O(1) space complexity |
| ElementAt(), ElementAtOrDefault() | O(n log n) best, O(n log n) average, O(n²) worst | O(n) best, O(n) average, O(n²) worst |

ElementAt() and ElementAtOrDefault() use a Θ(n) fast path when possible (when the requested element turns out to be the first or last), which is sometimes O(1) in space and sometimes O(n).
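
The O(n) average and O(n²) worst case listed for the general case are the usual bounds for a quickselect-style selection. The following is a minimal sketch of that technique, under the assumption that the source is already materialised as a list. `OrderedSelect` and `ElementAtByKey` are hypothetical names, and this is not the PR's actual code. Ties are broken on the original position so that the answer matches what a stable `OrderBy` would produce.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not the PR's implementation): quickselect over an index map.
public static class OrderedSelect
{
    // Same result as source.OrderBy(keySelector, comparer).ElementAt(index).
    public static T ElementAtByKey<T, TKey>(IReadOnlyList<T> source, Func<T, TKey> keySelector,
                                            int index, IComparer<TKey> comparer = null)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        if (index < 0 || index >= source.Count) throw new ArgumentOutOfRangeException(nameof(index));
        comparer = comparer ?? Comparer<TKey>.Default;

        int n = source.Count;
        var keys = new TKey[n];
        var map = new int[n];
        for (int k = 0; k < n; k++)
        {
            keys[k] = keySelector(source[k]);
            map[k] = k;
        }

        // Compare by key, then by original position, which mimics a stable sort.
        Comparison<int> cmp = (a, b) =>
        {
            int c = comparer.Compare(keys[a], keys[b]);
            return c != 0 ? c : a.CompareTo(b);
        };

        int lo = 0, hi = n - 1;
        while (lo < hi)
        {
            // Hoare-style partition around the middle entry of the current range.
            int pivot = map[lo + (hi - lo) / 2];
            int i = lo, j = hi;
            while (i <= j)
            {
                while (cmp(map[i], pivot) < 0) i++;
                while (cmp(map[j], pivot) > 0) j--;
                if (i <= j)
                {
                    int t = map[i]; map[i] = map[j]; map[j] = t;
                    i++;
                    j--;
                }
            }
            // [lo..j] holds entries <= pivot and [i..hi] holds entries >= pivot.
            if (index <= j) hi = j;       // wanted position is in the left part
            else if (index >= i) lo = i;  // wanted position is in the right part
            else break;                   // wanted position is the pivot, already in place
        }
        return source[map[index]];
    }
}
```

Only the side of each partition that contains the requested position is examined further, which is what brings the average cost down from O(n log n) to O(n).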

There is also some work to make the worst case less likely to be hit.

When .Skip(j).Take(k) is called on such results (a lone Take() can be considered the same with j being 0, and a lone Skip() the same with k covering everything that remains), the complexity of obtaining the results for any subsequent operation (or of enumerating them) changes as follows:

(k here is the number of results actually taken, rather than asked for, so if there are 5 items and you try to take 7, k is 5)

| Old Time Complexity | New Time Complexity |
| --- | --- |
| O(n log n) best, O(n log n) average, O(n²) worst | O(n + k log k) best, O(n + k log k) average, O(n²) worst |

Once the results are prepared, enumerating (directly or indirectly) is also reduced from Θ(j + k) to Θ(k).
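
A bound of the form O(n + k log k) is what a partial quicksort gives: partition as usual, but only keep sorting the partitions that overlap the requested window. The sketch below illustrates that technique for the paging idiom `OrderBy(...).Skip(j).Take(k)`. `OrderedPage`, `Page` and `PartialQuickSort` are hypothetical names here, the source is assumed to be a materialised list, and this is not a copy of the PR's code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not the PR's implementation): sort only the window [skip, skip + take).
public static class OrderedPage
{
    // Same elements, in the same order, as source.OrderBy(keySelector).Skip(skip).Take(take).
    public static List<T> Page<T, TKey>(IReadOnlyList<T> source, Func<T, TKey> keySelector,
                                        int skip, int take, IComparer<TKey> comparer = null)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        comparer = comparer ?? Comparer<TKey>.Default;

        int n = source.Count;
        var result = new List<T>();
        if (skip < 0) skip = 0;
        if (take <= 0 || skip >= n) return result;

        int min = skip;                                     // first wanted position
        int max = (int)Math.Min((long)skip + take, n) - 1;  // last wanted position, inclusive

        var keys = new TKey[n];
        var map = new int[n];
        for (int k = 0; k < n; k++)
        {
            keys[k] = keySelector(source[k]);
            map[k] = k;
        }

        // Compare by key, then by original position, which mimics a stable sort.
        Comparison<int> cmp = (a, b) =>
        {
            int c = comparer.Compare(keys[a], keys[b]);
            return c != 0 ? c : a.CompareTo(b);
        };

        PartialQuickSort(map, cmp, 0, n - 1, min, max);
        for (int i = min; i <= max; i++) result.Add(source[map[i]]);
        return result;
    }

    // Sorts map[lo..hi] only as far as needed to put positions min..max in final order.
    private static void PartialQuickSort(int[] map, Comparison<int> cmp, int lo, int hi, int min, int max)
    {
        while (lo < hi)
        {
            int pivot = map[lo + (hi - lo) / 2];
            int i = lo, j = hi;
            while (i <= j)
            {
                while (cmp(map[i], pivot) < 0) i++;
                while (cmp(map[j], pivot) > 0) j--;
                if (i <= j)
                {
                    int t = map[i]; map[i] = map[j]; map[j] = t;
                    i++;
                    j--;
                }
            }
            // Left part is [lo..j], right part is [i..hi]; skip parts outside the window.
            bool needLeft = lo < j && min <= j;
            bool needRight = i < hi && max >= i;
            if (needLeft && needRight)
            {
                // Recurse into the smaller part and loop on the larger to limit stack depth.
                if (j - lo <= hi - i) { PartialQuickSort(map, cmp, lo, j, min, max); lo = i; }
                else { PartialQuickSort(map, cmp, i, hi, min, max); hi = j; }
            }
            else if (needLeft) hi = j;
            else if (needRight) lo = i;
            else return;
        }
    }
}
```

With this shape, the result list holds exactly the k elements of the page, so enumerating it never touches the j skipped elements.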

These ordered Skip/Take results also have the following changes to their complexity:

| Operation | Old Time Complexity | New Time Complexity |
| --- | --- | --- |
| First(), Last()†, FirstOrDefault(), LastOrDefault()† | O(n log n) best, O(n log n) average, O(n²) worst | O(n) best, O(n) average, O(n²) worst |
| ElementAt()†, ElementAtOrDefault()† | O(n log n) best, O(n log n) average, O(n²) worst | O(n) best, O(n) average, O(n²) worst |

Those marked † also had an O(n) component that was completely removed, so the lower-order costs are also improved.

All of these use a Θ(n) fast-path when possible, which is sometimes O(1) in space and sometimes O(n).

Obtaining an array of all the above results is also optimised. Obtaining an array of those iterators that can't special-case the operation themselves is also optimised as a side effect of that.

@JonHanna
Contributor Author

There is also some work to make the worst case less likely to be hit.

Which is a complexity not directly relevant and may not be worth it. I had been doing median-of-three, but I don't think that should be done here, so I took it out again. Whether it helps or hinders should probably be considered separately.

@@ -605,6 +594,9 @@ public override TResult[] ToArray()
public static IEnumerable<TSource> Take<TSource>(this IEnumerable<TSource> source, int count)
{
if (source == null) throw Error.ArgumentNull("source");
if (count <= 0) return EmptyPartition<TSource>.Instance;
Contributor Author

Or perhaps Empty() would be better. Since GetEnumerator() isn't hit in the original form, there's no meaningful observable difference this way, but there could be with Empty().

Contributor Author

Correction: it does affect garbage collection. That effect is generally a good one (faster collection, and perhaps in an earlier generation), but it can be observed with finalisers or weak references, so it's not a completely unobservable change.

@dnfclas

dnfclas commented Aug 25, 2015

@JonHanna, Thanks for signing the contribution license agreement so quickly! Actual humans will now validate the agreement and then evaluate the PR.

Thanks, DNFBOT;

@JonHanna
Contributor Author

JonHanna commented Jan 6, 2016

Test Innerloop Windows_NT Debug Build and Test

VSadov added a commit that referenced this pull request Jan 14, 2016
Optimise First, Last, FirstOrDefault & LastOrDefault for OrderedEnumerable
@VSadov VSadov merged commit b2126e1 into dotnet:master Jan 14, 2016
@JonHanna JonHanna deleted the ordered_first_last branch January 14, 2016 23:21
JonHanna added a commit to JonHanna/corefx that referenced this pull request Jan 16, 2016
Optimisation of Skip() for IList sources from dotnet#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.
JonHanna added a commit to JonHanna/corefx that referenced this pull request Jan 29, 2016
Optimisation of Skip() for IList sources from dotnet#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.
JonHanna added a commit to JonHanna/corefx that referenced this pull request Jan 30, 2016
Anything that can serve as one can serve as the other, and also provide
a faster path for Count(). Merge the two interfaces and add a Count
property.

Have IList optimised result of Skip() partitionable.

Optimisation of Skip() for IList sources from dotnet#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.
JonHanna added a commit to JonHanna/corefx that referenced this pull request Jan 31, 2016
Anything that can serve as one can serve as the other, and also provide
a faster path for Count(). Merge the two interfaces and add a Count
property.

Have IList optimised result of Skip() partitionable.

Optimisation of Skip() for IList sources from dotnet#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.
JonHanna added a commit to JonHanna/corefx that referenced this pull request Feb 11, 2016
Anything that can serve as one can serve as the other, and also provide
a faster path for Count(). Merge the two interfaces and add a Count
property.

Have IList optimised result of Skip() partitionable.

Optimisation of Skip() for IList sources from dotnet#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.
JonHanna added a commit to JonHanna/corefx that referenced this pull request Feb 12, 2016
Anything that can serve as one can serve as the other, and also provide
a faster path for Count(). Merge the two interfaces and add a Count
property.

Have IList optimised result of Skip() partitionable.

Optimisation of Skip() for IList sources from dotnet#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.
@karelz karelz modified the milestone: 1.0.0-rtm Dec 3, 2016
@EngrMHanif

pulled

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Optimise First, Last, FirstOrDefault & LastOrDefault for OrderedEnumerable

Commit migrated from dotnet/corefx@b2126e1
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Anything that can serve as one can serve as the other, and also provide
a faster path for Count(). Merge the two interfaces and add a Count
property.

Have IList optimised result of Skip() partitionable.

Optimisation of Skip() for IList sources from dotnet/corefx#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet/corefx#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.


Commit migrated from dotnet/corefx@a087c2d