Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix quadratic worst case in quicksort for Vectors #101

Closed
wants to merge 2 commits into from
Closed

Fix quadratic worst case in quicksort for Vectors #101

wants to merge 2 commits into from

Conversation

nomad010
Copy link
Contributor

@nomad010 nomad010 commented Sep 5, 2019

Fixes a quadratic worst case scenario by choosing a random pivot instead of just the last entry in the Vector.

Some benchmarks can be seen here: https://gist.github.com/nomad010/b83cc9b50193212532f02fbb309fd568

In most cases, this won't actually affect performance very much, but in cases where the Vector is sorted/mostly sorted, it will affect it dramatically for the better.

The downsides to this fix are that it adds another dependency (rand), however I think that the improvement that comes with it more than make up for that.

@bodil
Copy link
Owner

bodil commented Sep 19, 2019

The benchmarks speak for themselves, but I'm feeling extremely cautious about adding a relatively heavy dependency to im - rand pulls in more subdependencies than I'm entirely happy with, and a design goal of im is to keep those at an absolute minimum. I wonder if there's a way we could get the same results without pulling in all of rand?

@bodil
Copy link
Owner

bodil commented Sep 19, 2019

I've got a patch using just rand_core and rand_xoshiro which seems to do the job. Will merge both.

@bodil bodil closed this Sep 19, 2019
@bodil
Copy link
Owner

bodil commented Sep 19, 2019

OK, merged. You might want to verify that I haven't done anything stupid to your fix: 2f7d19a

@nomad010
Copy link
Contributor Author

Thank you for looking at the PR. I'm on holiday at the moment and without my laptop, but the commit you linked looks fine.

I agree completely with the rand and rand_core and have looked at ways to mitigate that when I filed the PR. I had a look at what the std library uses for its sort and while it does use PRNGs implements with a fixed seed and without using rand or rand_core.

Another idea I had but haven't had time to test out is a sort of modified merge sort. Sort each chunk individually using whatever sort is appropriate (hopefully the std libraries) and then use structure of the tree to merge chunks until they are sorted. The main complexity with this would be writing the merger as they would have to efficiently merge large numbers of chunks. However it might be possible to write this such that random access on the tree is minimised. I'm not really sure how the various tradeoffs will pan out so it might not be fruitful but if it does pan out I'll open another PR with the findings.

@bodil
Copy link
Owner

bodil commented Sep 19, 2019

That would be extremely welcome. I haven't had much time or energy to spare to put into sort algo efficiency, it's an area that could definitely be improved a lot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants