New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve SortedSet performance #1955

Merged
merged 3 commits into from Jun 9, 2015

Conversation

Projects
None yet
7 participants
@stephentoub
Member

stephentoub commented Jun 8, 2015

Primary changes:

  • SortedSet's ctor has a loop that's O(D * N), where N is the number of elements being added to the set and D is the number of duplicates in those elements. As the number of duplicates D grows, the time it takes to construct a sorted set grows polynomially, approaching O(N^2), which is particularly bad given that the purpose of a set is to help remove duplicates. The core cause is a loop over the elements to determine whether element n is equal to n-1; if it is, element n is removed, but it's removed via a call to List.RemoveAt, which will shift down all of the elements above it in the list. A better implementation can make this O(N) instead; the ctor overall is still then O(N log N), due to the sort, but that's better than O(N^2).
  • To sort the elements, the ctor builds a List of the elements but then uses ToArray to get an array used to actually build the set; this incurs an unnecessary (and potentially large) array allocation and copy. We can avoid that entirely by just building up the array manually using the same logic that List does. That same logic is actually duplicated in multiple places, including in Stack, so I've extracted it as a helper to use in both places (there are a few other places in corefx that use the same logic, hence I put the helper into a Common file, but I did not condense those other uses as part of this commit). This then further helps with Stack perf in a few corner cases, such as when a Stack is constructed with an empty source. (It's not clear to me why Queue doesn't have the same IColection-related logic; probably just an oversight, but I didn't want to change its behavior with the additional ICollection check.)
  • SortedSet's BreadthFirstTreeWalk (which is used by several members, such as RemoveWhere) is currently using a List as a queue, doing a RemoveAt(0) to dequeue, which will incur the cost of shifting all elements down. Using a Queue is both cleaner and better performing.
  • As I was touching Stack and Queue, I noticed that they had an empty array static field, which was just being initialized to Array.Empty(). The field is unnecessary, and the call site can just use Array.Empty directly.

Fixes #1953.

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub
Member

stephentoub commented Jun 8, 2015

@ellismg

This comment has been minimized.

Show comment
Hide comment
@ellismg

ellismg Jun 8, 2015

Contributor

Aside from the naming of the local, the change looks good to me.

Contributor

ellismg commented Jun 8, 2015

Aside from the naming of the local, the change looks good to me.

// Creates a queue with room for capacity objects. The default initial
// capacity and grow factor are used.
/// <include file='doc\Queue.uex' path='docs/doc[@for="Queue.Queue"]/*' />
public Queue()
{
_array = s_emptyArray;
_array = Array.Empty<T>();

This comment has been minimized.

@sharwell

sharwell Jun 8, 2015

Member

💡 this should probably have been part of a different commit.

@sharwell

sharwell Jun 8, 2015

Member

💡 this should probably have been part of a different commit.

@sharwell

This comment has been minimized.

Show comment
Hide comment
@sharwell

sharwell Jun 8, 2015

Member

No major problems stood out to me. I did comment on some minor items and had a question.

I do wish this was separated into one commit (or perhaps even one PR) per topic, but I say this as a reminder for the future and not as a reason to reject this PR.

Member

sharwell commented Jun 8, 2015

No major problems stood out to me. I did comment on some minor items and had a question.

I do wish this was separated into one commit (or perhaps even one PR) per topic, but I say this as a reminder for the future and not as a reason to reject this PR.

@sharwell

This comment has been minimized.

Show comment
Hide comment
@sharwell

sharwell Jun 8, 2015

Member

I take that back... I'm concerned about the current MaxArrayLength behavior. See my comment in the diff for details.

Member

sharwell commented Jun 8, 2015

I take that back... I'm concerned about the current MaxArrayLength behavior. See my comment in the diff for details.

}
_root = ConstructRootFromSortedArray(els.ToArray(), 0, els.Count - 1, null);
_count = els.Count;
_version = 0;

This comment has been minimized.

@AlexGhiondea

AlexGhiondea Jun 9, 2015

Member

You no longer need to set the version?

@AlexGhiondea

AlexGhiondea Jun 9, 2015

Member

You no longer need to set the version?

This comment has been minimized.

@sharwell

sharwell Jun 9, 2015

Member

This is the constructor. Setting _version to 0 here is redundant because the CLR initializes all fields to their default values.

@sharwell

sharwell Jun 9, 2015

Member

This is the constructor. Setting _version to 0 here is redundant because the CLR initializes all fields to their default values.

stephentoub added some commits Jun 9, 2015

Improve SortedSet ctor performance
- SortedSet's ctor has a loop that's O(D * N), where N is the number of elements being added to the set and D is the number of duplicates in those elements.  As the number of duplicates D grows, the time it takes to construct a sorted set grows polynomially, approaching O(N^2), which is particularly bad given that the purpose of a set is to help remove duplicates.  The core cause is a loop over the elements to determine whether element n is equal to n-1; if it is, element n is removed, but it's removed via a call to List<T>.RemoveAt, which will shift down all of the elements above it in the list.  A better implementation can make this O(N) instead; the ctor overall is still then O(N log N), due to the sort, but that's better than O(N^2).
- To sort the elements, the ctor builds a List<T> of the elements but then uses ToArray to get an array used to actually build the set; this incurs an unnecessary (and potentially large) array allocation and copy.  We can avoid that entirely by just building up the array manually using the same logic that List<T> does.  That same logic is actually duplicated in multiple places, including in Stack<T>, so I've extracted it as a helper to use in both places (there are a few other places in corefx that use the same logic, hence I put the helper into a Common file, but I did not condense those other uses as part of this commit).  This then further helps with Stack<T> perf in a few corner cases, such as when a Stack<T> is constructed with an empty enumerable. (It's not clear to me why Queue<T> doesn't have the same IColection<T>-related logic; probably just an oversight, but I didn't want to change its behavior with the additional ICollection<T> check.)
Improve SortedSet.BreadthFirstTreeWalk
SortedSet's BreadthFirstTreeWalk (which is used by several members, such as RemoveWhere) is currently using a List as a queue, doing a RemoveAt(0) to dequeue, which will incur the cost of shifting all elements down. Using a Queue is both cleaner and better performing.
Remove unnecessary empty array static fields
As I was touching Stack, I noticed that it and Queue had an empty array static field, which was just being initialized to Array.Empty(). The field is unnecessary, and the call site can just use Array.Empty directly.  This commit just cleans that up and some unnecessary field setting in the ctors.
@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Jun 9, 2015

Member

Thanks. Updated.

Member

stephentoub commented Jun 9, 2015

Thanks. Updated.

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Jun 9, 2015

Member

@dotnet-bot test this please

Member

stephentoub commented Jun 9, 2015

@dotnet-bot test this please

stephentoub added a commit that referenced this pull request Jun 9, 2015

@stephentoub stephentoub merged commit c8155b6 into dotnet:master Jun 9, 2015

1 check passed

default Build finished.
Details

@mmitche mmitche removed the 2 - In Progress label Jun 9, 2015

@stephentoub stephentoub deleted the stephentoub:sortedset_perf branch Jun 10, 2015

@karelz karelz modified the milestone: 1.0.0-rtm Dec 3, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment