dotnet/corefx

Performance improvements to Skip and SkipWhile. #2446

Closed
wants to merge 1 commit

Conversation

JonHanna
Contributor

Fixes #2238
(And I think this is the last bit to separate out of the original PR)

https://github.com/hackcraft/Enumerable-Tester/raw/master/Skip%20performance.ods
shows a comparison of the old and new approaches. Improvements are made in
the vast majority of cases, and with Skip in particular they can be significant.

Edge cases of disposal and exception ordering are dealt with, duplicating the
behaviour of the current implementation.

(This ignores the matter of ordered skips as per #2401).

{
    // Don't short-circuit. These are almost always both true, so don't "save"
    // time by branching after a very likely case.
    var ret = _state == 0 & _threadId == Environment.CurrentManagedThreadId
Contributor Author

Interestingly, I found that simply allocating a new enumerator object beat this approach slightly more often than this approach beat it. However, the margin was only slight, and the approach that allocates the least is likely to be the one that best handles the memory pressures tests don't catch.
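[Editorial note: to make the trade-off being measured concrete, here is a minimal sketch of the reuse-versus-allocate strategies, based on the fields shown in the snippet above. The names and details are illustrative, not the PR's actual code.]

using System;
using System.Collections;
using System.Collections.Generic;

// Illustrative only: an iterator base class that either reuses itself as its own
// enumerator (first enumeration, on the thread that created it) or allocates a
// fresh copy. The field names mirror the snippet above but this is not the PR's code.
internal abstract class IteratorSketch<TSource> : IEnumerable<TSource>, IEnumerator<TSource>
{
    private readonly int _threadId = Environment.CurrentManagedThreadId;
    internal int _state;
    internal TSource _current;

    public IEnumerator<TSource> GetEnumerator()
    {
        // Non-short-circuiting '&': both operands are almost always true,
        // so evaluating both unconditionally avoids a branch on the common path.
        IteratorSketch<TSource> enumerator =
            _state == 0 & _threadId == Environment.CurrentManagedThreadId
                ? this     // reuse this instance: no allocation
                : Clone(); // second enumeration, or another thread: allocate
        enumerator._state = 1;
        return enumerator;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public TSource Current => _current;
    object IEnumerator.Current => Current;

    public abstract bool MoveNext();
    protected abstract IteratorSketch<TSource> Clone();
    public virtual void Dispose() => _state = -1;
    void IEnumerator.Reset() { throw new NotSupportedException(); }
}

[The alternative being measured simply allocates a fresh enumerator unconditionally; as noted in the reply below, the allocation itself is cheap, and the difference shows up mainly as later GC pressure.]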

Member

It's really going to depend on your benchmark. With the workstation GC, with a single-threaded benchmark, yeah, I'm not surprised that they'd be in a tight race. The actual act of allocating an object is usually very cheap, just some pointer manipulations and whatever the cost of invoking the ctor. The real impact comes later when the GC needs to run, causing GCs to take longer and forcing the GC to run more often because there's more garbage to be cleaned up. Such costs are typically less noticeable when collections can be done concurrently with the running code and the machine isn't saturated.

Contributor Author

Ah, of course. And I was generally taking time to ensure the GC didn't have an effect, because its unpredictability could give one or the other attempt unfair results.

@stephentoub
Member

cc: @VSadov

// It should also be harmless to dispose the enumerator here if we've
// exhausted it, but we guard against strange behaviour in either the
// enumerator or the caller.
if (!sourceEnumerator.MoveNext()) return new DeadEnumeratorWrapper<TSource>(sourceEnumerator);
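[Editorial note: the DeadEnumeratorWrapper<TSource> type referenced here isn't shown in this excerpt. Purely as an illustration of the behaviour being described, reporting the end of the sequence immediately while leaving disposal to the caller, a wrapper along these lines might look as follows; this is a sketch, not the PR's implementation.]

using System.Collections;
using System.Collections.Generic;

// Sketch only: wraps an enumerator that is already known to be exhausted.
// MoveNext always reports the end, while Dispose is still forwarded so the
// source is disposed by the caller at the usual time, preserving disposal order.
internal sealed class DeadEnumeratorWrapperSketch<TSource> : IEnumerator<TSource>
{
    private readonly IEnumerator<TSource> _source;

    public DeadEnumeratorWrapperSketch(IEnumerator<TSource> source)
    {
        _source = source;
    }

    public bool MoveNext() => false;             // the sequence is already exhausted
    public TSource Current => default(TSource);  // no element to expose
    object IEnumerator.Current => Current;
    public void Dispose() => _source.Dispose();  // caller-driven disposal still reaches the source
    void IEnumerator.Reset() => _source.Reset();
}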
Member

Even with the heroic acts being done here to minimize visible differences in behavior, this still has the very visible difference that it moves where the work is done. If I previously had:

var source = Enumerable.Range(0, Int32.MaxValue).Skip(Int32.MaxValue - 1);
var e = source.GetEnumerator();

that would previously have been very cheap and quick. With this change, it's going to iterate through 2 billion elements. That makes me nervous.

Contributor Author

Hmm. Calling GetEnumerator() without calling MoveNext() even once soon afterwards seems pretty obscure, even to someone like myself with a strange love of playing with enumerators, but it's certainly not impossible.
The only way around that is to wrap it entirely. While that proved to be a win with SkipWhile, I wouldn't expect it to be as big a win here, and it's perhaps not worth the effort.
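[Editorial note: as a rough illustration of the "wrap it entirely" option mentioned above (a sketch only, not the PR's code), a hand-written enumerator can defer the skipping to the first MoveNext, keeping GetEnumerator() cheap even for the Skip(Int32.MaxValue - 1) case.]

using System.Collections;
using System.Collections.Generic;

// Sketch only: performs the skip lazily, on the first MoveNext, rather than
// eagerly when the enumerator is obtained.
internal sealed class LazySkipEnumeratorSketch<TSource> : IEnumerator<TSource>
{
    private readonly IEnumerator<TSource> _source;
    private int _toSkip;

    public LazySkipEnumeratorSketch(IEnumerator<TSource> source, int count)
    {
        _source = source;
        _toSkip = count;
    }

    public bool MoveNext()
    {
        // Consume the skipped prefix only when the caller first asks for an element.
        while (_toSkip > 0)
        {
            if (!_source.MoveNext()) return false;
            --_toSkip;
        }
        return _source.MoveNext();
    }

    public TSource Current => _source.Current;
    object IEnumerator.Current => Current;
    public void Dispose() => _source.Dispose();
    void IEnumerator.Reset() => _source.Reset();
}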

Member

The other problem with ExceptionDispatchInfo is debuggability. It preserves the stack trace, but not the stack itself. Basically, if you have to do a postmortem analysis of a crash dump, you will not be able to dig through the application state at the time of the original throw; you will only have the state as it was at the time of ex.Throw(), which is not all that useful.
ExceptionDispatchInfo is a necessary evil in async. It is still inconvenient enough, though, to force us to use exception filters in Roslyn to guarantee robust and immediate failfast in async code.
In iterators, the dead-iterator trick with ExceptionDispatchInfo would exist just to gain some perf, and IMO it is probably not worth the overall trouble.
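[Editorial note: for reference, a minimal sketch of the ExceptionDispatchInfo pattern being discussed. The class and method names here are illustrative, but Capture and Throw are the real API in System.Runtime.ExceptionServices.]

using System;
using System.Runtime.ExceptionServices;

// Capture an exception now and rethrow it later with its original stack trace
// appended. Only the trace text is preserved: the frames (and their locals) that
// existed when the exception was first thrown are gone by the time Throw() runs,
// which is the crash-dump concern raised above.
internal static class EdiSketch
{
    private static ExceptionDispatchInfo s_captured;

    public static void Capture(Exception ex)
    {
        s_captured = ExceptionDispatchInfo.Capture(ex);
    }

    public static void RethrowLater()
    {
        // A dump taken here shows this method's stack, not the thrower's.
        s_captured.Throw();
    }
}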

Contributor Author

Okay, I'm inclined to abandon this chunk entirely then. Unless someone comes up with a "hey, but you could just…" that saves it in the next little while, I'll close this PR.

@stephentoub
Member

Unless someone comes up with a "hey, but you could just…" that saves it in the next little while, I'll close this PR

Thanks, @hackcraft. Should we go ahead and close this?

@JonHanna
Contributor Author

I think I'll see what happens if I redo Skip much as SkipWhile is done here (adjusted as per what @VSadov said about the && in the thread check), since it won't be a big job to get from here to there, and while it may not be as dramatic an improvement, it might still be worth it now that I nearly have it. As I won't have time to do that for a while, though, for now I'm just going to close this out.
