
New Quicksort #5081

Merged
merged 1 commit into from Feb 12, 2014

5 participants

@illerucis

This is a recommended change to Julia's Base Quicksort (related issue: #4576). It implements a standard median-of-three Quicksort.
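For readers new to the technique, here is a minimal, self-contained sketch of a median-of-three quicksort in current Julia. It is not the code from this pull request. The helper names (`insertion_sort!`, `select_pivot!`, `partition!`, `quicksort!`) and the small-range cutoff of 20 are assumptions. Every comparison goes through `Base.Order.lt`, so any `Ordering` works. The sketch also relies on the median-of-three step leaving a value no smaller than the pivot at `v[hi]`, so the inner scans need no explicit bounds tests.

```julia
using Base.Order: Ordering, Forward, Reverse, lt

const SMALL_CUTOFF = 20  # assumed cutoff for switching to insertion sort

# Insertion sort on v[lo:hi], used for short ranges.
function insertion_sort!(v::AbstractVector, lo::Int, hi::Int, o::Ordering)
    for i in lo+1:hi
        x = v[i]
        j = i
        while j > lo && lt(o, x, v[j-1])
            v[j] = v[j-1]
            j -= 1
        end
        v[j] = x
    end
    return v
end

# Order v[lo], v[mi], v[hi] with three compare-exchanges, then move the
# median to v[lo] to serve as the pivot. Afterwards v[hi] is not less than
# the pivot, which acts as a sentinel for the left-to-right scan.
function select_pivot!(v::AbstractVector, lo::Int, hi::Int, o::Ordering)
    mi = (lo + hi) >>> 1
    if lt(o, v[mi], v[lo])
        v[lo], v[mi] = v[mi], v[lo]
    end
    if lt(o, v[hi], v[mi])
        v[mi], v[hi] = v[hi], v[mi]
    end
    if lt(o, v[mi], v[lo])
        v[lo], v[mi] = v[mi], v[lo]
    end
    v[lo], v[mi] = v[mi], v[lo]
    return v[lo]
end

# Hoare-style partition around the pivot held in v[lo]. Returns the pivot's
# final index j, with v[lo:j-1] <= pivot <= v[j+1:hi].
function partition!(v::AbstractVector, lo::Int, hi::Int, o::Ordering)
    pivot = select_pivot!(v, lo, hi, o)
    i, j = lo, hi
    while true
        i += 1; j -= 1
        # No bounds tests: v[hi] (never moved here) stops i, v[lo] (the pivot) stops j.
        while lt(o, v[i], pivot); i += 1; end
        while lt(o, pivot, v[j]); j -= 1; end
        i >= j && break
        v[i], v[j] = v[j], v[i]
    end
    v[j], v[lo] = pivot, v[j]   # move the pivot into its final slot
    return j
end

function quicksort!(v::AbstractVector, lo::Int, hi::Int, o::Ordering)
    while lo < hi
        if hi - lo <= SMALL_CUTOFF
            return insertion_sort!(v, lo, hi, o)
        end
        j = partition!(v, lo, hi, o)
        # Recurse into the smaller side and loop on the larger one,
        # which keeps the recursion depth at O(log N).
        if j - lo < hi - j
            quicksort!(v, lo, j - 1, o)
            lo = j + 1
        else
            quicksort!(v, j + 1, hi, o)
            hi = j - 1
        end
    end
    return v
end

quicksort!(v::AbstractVector; order::Ordering = Forward) =
    quicksort!(v, firstindex(v), lastindex(v), order)
```

Usage: `quicksort!(rand(10^5))` sorts ascending, and `quicksort!(collect(1:1000); order = Reverse)` returns `collect(1000:-1:1)`.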

[ViralBShah: Added WIP to the title]

@StefanKarpinski
The Julia Language member

Cool. Thanks for submitting. cc: @kmsquire

@kmsquire
The Julia Language member
kmsquire commented Dec 9, 2013

Sorry not to get back to you sooner. Check out my comments in #4576 (just posted).

@kmsquire kmsquire commented on an outdated diff Dec 9, 2013
base/sort.jl
@@ -275,18 +293,15 @@ end
function sort!(v::AbstractVector, lo::Int, hi::Int, a::MergeSortAlg, o::Ordering, t=similar(v))
@inbounds if lo < hi
hi-lo <= SMALL_THRESHOLD && return sort!(v, lo, hi, SMALL_ALGORITHM, o)
-
@kmsquire
The Julia Language member
kmsquire added a note Dec 9, 2013

Generally it's better to leave the whitespace alone, unless it clearly makes things easier to read. This seems to do the opposite.

@kmsquire kmsquire commented on an outdated diff Dec 9, 2013
base/sort.jl
hi-lo <= SMALL_THRESHOLD && return sort!(v, lo, hi, SMALL_ALGORITHM, o)
- pivot = v[(lo+hi)>>>1]
- i, j = lo, hi
+ mi = (lo+hi)>>>1
+ if v[lo] > v[mi]
@kmsquire
The Julia Language member
kmsquire added a note Dec 9, 2013

You'll want to use lt(o, ...) for all of these tests as well.
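To illustrate why every test should go through `lt(o, ...)`: a hard-coded `>` only encodes ascending order, while `lt` consults whatever `Ordering` the caller passed. Below is a small sketch using `Base.Order.lt` with the `Forward` and `Reverse` orderings. The names `hardcoded` and `needs_swap` are hypothetical.

```julia
using Base.Order: Forward, Reverse, lt

v = [3.0, 1.0, 2.0]

# Hard-coded test in the style of the first diff: only correct for ascending order.
hardcoded = v[1] > v[2]

# Ordering-aware version: "does v[2] come before v[1] under ordering o?"
needs_swap(o) = lt(o, v[2], v[1])

@assert needs_swap(Forward) == hardcoded   # both true: 1.0 precedes 3.0 ascending
@assert !needs_swap(Reverse)               # 3.0 already precedes 1.0 when sorting descending
```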

@illerucis

Working on changes mentioned here #4576 (comment). Sorry about the whitespace edit - that was an accident. Should have been more careful in reviewing.

@kmsquire
The Julia Language member
kmsquire commented Dec 9, 2013

Great! No worries on the whitespace edit--that's what code review is for. ;-)

@StefanKarpinski
The Julia Language member

No worries, that's what pull requests are for :-)

@kmsquire
The Julia Language member
@illerucis

On it. Will report back soon.

@illerucis

@kmsquire I did some testing with the version you mentioned: one that simply calculates the median and uses it as the pivot, without placing the pivot in the first location and without explicitly moving it into its final position after a partitioning pass. The code can be found here. Let me know if I misinterpreted anything. That file also contains a version of this pull request modified according to your earlier comments here.

I found that this version is slower than the median-of-three pivot algorithm that uses the suggestions in your earlier comments (going to modify the pull request shortly to reflect these changes, for re-review).

@kmsquire
The Julia Language member

That's what I was looking for--thanks for checking. Did you compare my suggestions with your original? Either way, feel free to update this pull request with your changes.

Cheers, Kevin

@illerucis

So, one way to eliminate this test is to also put the pivot in the high position, and restore the real value outside the loop. For Rob's median version, we can do this, because he ensured that v[hi] >= v[pivot]. (kmsquire/julia_qsortbenchmarks@820eb81). Nice!

I implemented this and saw a big improvement in speed, but as currently written it won't work for arrays sorted in reverse order (correct me if I'm wrong). I'm working on changing the median calculation, and the comparisons in the inner loop, to use an ordering-aware comparator like lt().

@kmsquire
The Julia Language member

Absolutely correct! It should work fine if you use lt() everywhere.

@illerucis

Great, make test passed, but I just want to be doubly sure that I'm still seeing performance improvements using lt().

@StefanKarpinski
The Julia Language member

There shouldn't be any overhead from calling lt relative to calling <, but it's always good to check.

@illerucis

I checked and there was no (noticeable) hit in performance.

@kmsquire kmsquire and 1 other commented on an outdated diff Dec 10, 2013
base/sort.jl
@@ -257,17 +257,33 @@ end
function sort!(v::AbstractVector, lo::Int, hi::Int, a::QuickSortAlg, o::Ordering)
@inbounds while lo < hi
hi-lo <= SMALL_THRESHOLD && return sort!(v, lo, hi, SMALL_ALGORITHM, o)
- pivot = v[(lo+hi)>>>1]
+ mi = (lo+hi)>>>1
+ if lt(o, v[mi], v[lo])
+ v[lo], v[mi] = v[mi], v[lo]
+ end
+ if lt(o, v[hi], v[lo])
+ v[lo], v[hi] = v[hi], v[lo]
+ end
+ if lt(o, v[lo], v[mi])
@kmsquire
The Julia Language member
kmsquire added a note Dec 10, 2013

Looks like the test here is backwards. If v[lo] < v[mi], we don't want to exchange, right?

@kmsquire
The Julia Language member
kmsquire added a note Dec 10, 2013

Hmmm... I'm not following the logic of the 3-way sort. It might have been right previously (comparing v[mi] with v[hi]--sorry if I misled), but please look at it carefully. I would do

        b < a => swap(a,b)
        c < b => swap(b,c)
        b < a => swap(a,b)

But what you had before might be equivalent. This currently isn't.
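The three compare-exchanges suggested here form a sorting network for three elements. Below is a minimal sketch in current Julia, assuming `Base.Order.lt` for all comparisons. `sort3!` is a hypothetical name. The loop at the end checks every ordering of three values, plus two inputs with ties, under both `Forward` and `Reverse`.

```julia
using Base.Order: Ordering, Forward, Reverse, lt

# Sort v[lo], v[mi], v[hi] in place with three compare-exchanges,
# writing a = v[lo], b = v[mi], c = v[hi] and "<" for "comes before under o":
#   b < a => swap(a,b);  c < b => swap(b,c);  b < a => swap(a,b)
function sort3!(v::AbstractVector, lo::Int, mi::Int, hi::Int, o::Ordering)
    if lt(o, v[mi], v[lo])
        v[lo], v[mi] = v[mi], v[lo]
    end
    if lt(o, v[hi], v[mi])
        v[mi], v[hi] = v[hi], v[mi]
    end
    if lt(o, v[mi], v[lo])
        v[lo], v[mi] = v[mi], v[lo]
    end
    return v
end

for p in ([1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1], [2, 2, 1], [1, 2, 1])
    for o in (Forward, Reverse)
        w = sort3!(copy(p), 1, 2, 3, o)
        @assert issorted(w; order = o)
    end
end
```

After the call the median sits at `v[mi]`, ready to be used as the pivot.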

@kmsquire
The Julia Language member
kmsquire added a note Dec 10, 2013

Actually, you might consider rewriting these to minimize the number of writes.

@StefanKarpinski
The Julia Language member

Minimizing writes is probably a good metric, although who knows what the CPU does (can one write catch another?).

@kmsquire
The Julia Language member

A couple of comments:

1) After thinking about it a little more, I realized that placing the pivot at v[hi] is totally unnecessary. After the median sort, v[hi] is already guaranteed to be greater than or equal to the pivot, which means that the scan on i will stop there at the latest, and i will always be >= j if it does. Your original check is unnecessary, and so is my "fix".

2) I'm seeing a really strange regression, where sorting an already sorted array with your current version is exceedingly slow. I haven't been able to explain it. (I changed your median sort to match my comment.)

@kmsquire
The Julia Language member

Never mind the slowness. Went away on a julia rebuild.

@illerucis

@kmsquire I fixed the median calculation, and tested it by putting med = median([v[lo], v[mi], v[hi]]) before the swapping, and @assert v[mi] == med afterwards (code is here).

I also made changes to not place the pivot at v[hi]. The algorithm is noticeably faster without, and like you said it's not necessary.

Regarding #5081 (comment), I'll think about how to reduce the number of swaps (a solution without introducing more comparisons is not immediately obvious to me). For now I pushed a v3 with the swapping mechanics as is.

Thanks!

@kmsquire
The Julia Language member

(a solution without introducing more comparisons is not immediately obvious to me)

There is a way with more total code, but with any path only having at most 3 comparisons. I can post the code, but only if you want--I don't want to rob you of the challenge of figuring it out for yourself. ;-)

@illerucis

If the solution is allowed to have more total code, then challenge accepted.

@illerucis

At most 3 comparisons, at most two swaps?

@kmsquire
The Julia Language member

At most 3 comparisons, at most 4 assignments.

A swap should involve 3 assignments, unless there exist special-purpose instructions to swap the contents of registers. (I don't think the x86 has these.)

@illerucis

I have at most 5 assignments (and it is slower than the current version). Looking at it more carefully now ...

@illerucis

At most 3 comparisons, and at most 4 assignments here. I'm finding that it is still slower than the current version (marginally).

@kmsquire
The Julia Language member

Okay. Nice job getting the logic!

Stylistically, I would still write the swaps as swaps, so it's more understandable (with the understanding that the compiler will likely change that to something close to what you did by hand).

Just for reference, here's what I came up with (which is functionally identical to yours):

        if lt(o, v[mi], v[lo])
            if lt(o, v[hi], v[mi])
                v[lo], v[hi] = v[hi], v[lo]
            elseif lt(o, v[hi], v[lo])
                v[lo], v[mi], v[hi] = v[mi], v[hi], v[lo]
            else
                v[lo], v[mi] = v[mi], v[lo]
            end
        elseif lt(o, v[hi], v[mi])
            if lt(o, v[hi], v[lo])
                v[lo], v[mi], v[hi] = v[hi], v[lo], v[mi]
            else
                v[mi], v[hi] = v[hi], v[mi]
            end
        end

Not pretty, but you can see the swaps at least. (The 3-way assignments might be sketchy, performance-wise--not sure.)

Since this isn't faster than the current version, leave the current version--it's much easier to understand.
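One way to double-check that the branching version above really sorts all three positions is to wrap it in a function and run it over every ordering of three values. This is a sketch in current Julia. The wrapper name `sort3_branching!` is hypothetical, and its body is the snippet above unchanged.

```julia
using Base.Order: Ordering, Forward, Reverse, lt

# The branching median-of-three from the comment above (at most 3 comparisons),
# wrapped in a hypothetical function so it can be tested in isolation.
function sort3_branching!(v::AbstractVector, lo::Int, mi::Int, hi::Int, o::Ordering)
    if lt(o, v[mi], v[lo])
        if lt(o, v[hi], v[mi])
            v[lo], v[hi] = v[hi], v[lo]
        elseif lt(o, v[hi], v[lo])
            v[lo], v[mi], v[hi] = v[mi], v[hi], v[lo]
        else
            v[lo], v[mi] = v[mi], v[lo]
        end
    elseif lt(o, v[hi], v[mi])
        if lt(o, v[hi], v[lo])
            v[lo], v[mi], v[hi] = v[hi], v[lo], v[mi]
        else
            v[mi], v[hi] = v[hi], v[mi]
        end
    end
    return v
end

# Every ordering of three distinct values, plus some inputs with ties,
# under both ascending and descending orderings.
for p in ([1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1],
          [2, 2, 1], [1, 2, 1], [2, 1, 1]), o in (Forward, Reverse)
    w = sort3_branching!(copy(p), 1, 2, 3, o)
    @assert issorted(w; order = o)
end
```

Each path performs at most three `lt` calls, and the two three-way rotations are the only cases that move all three elements.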

@kmsquire
The Julia Language member

Unfortunately, here's a possible regression. Can you run this and tell me what you get?

Using the version from base:

julia> d = rand(2^20); sort!(d); @time sort!(d); @time sort!(d, rev=true);
elapsed time: 0.048643169 seconds (96 bytes allocated)
elapsed time: 0.046859805 seconds (176 bytes allocated)

Using your pull request:

julia> d = rand(2^20); sort!(d); @time sort!(d); @time sort!(d, rev=true)
elapsed time: 0.055544636 seconds (48 bytes allocated)
ERROR: stack overflow
 in sort! at sort.jl:262
@kmsquire
The Julia Language member

To clarify: this version of quicksort seems to have issues when the data is already sorted in reverse. E.g.,

julia> d = rand(2^20); @time sort!(d, rev=true); @time sort!(d)
elapsed time: 0.116975951 seconds (128 bytes allocated)
ERROR: stack overflow
 in sort! at sort.jl:262

julia> d = rand(2^20); @time sort!(d, rev=true); @time sort!(d, rev=true)
elapsed time: 0.15468553 seconds (128 bytes allocated)
ERROR: stack overflow
 in sort! at sort.jl:262

It's unclear to me why at this point.

@illerucis

Major edit to this comment: I am now only getting a stack overflow error when sorting an array already sorted in reverse. The forward direction seems fine.

The below still stands ...

For the sorted-array test, the Base version conveniently always picks the middle element as the pivot. This is nice for sorted arrays, because each recursive call looks at a subarray of size N / 2 (~N log N overall). However, any naive pivot selection rule has equally pathological inputs that would cause ~N^2.

For any implementation, I think the only way to guarantee ~N log N is to do a Fisher-Yates shuffle (~N) prior to sorting.
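For reference, here is a minimal sketch of the Fisher-Yates shuffle mentioned above. Julia's `Random.shuffle!` already provides this, and `fisher_yates!` is a hypothetical name. As the following reply points out, shuffling first makes the quadratic case vanishingly unlikely rather than impossible.

```julia
using Random

# Fisher-Yates shuffle: walk from the last index down to the second and swap
# each element with one chosen uniformly from the positions at or before it.
# O(N) time, O(1) extra space; every permutation is equally likely.
function fisher_yates!(rng::AbstractRNG, v::AbstractVector)
    lo = firstindex(v)
    for i in lastindex(v):-1:lo+1
        k = rand(rng, lo:i)
        v[i], v[k] = v[k], v[i]
    end
    return v
end

w = fisher_yates!(MersenneTwister(2014), collect(1:10))
sort(w) == 1:10   # true: the result is a permutation of the input
```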

@StefanKarpinski
The Julia Language member

Am I missing something here? Shouldn't the median of three end up picking the middle element in both the forward and reverse sorted cases? Shuffling the array beforehand technically doesn't prevent O(n^2) worst-case run-time – it just makes it vanishingly unlikely and impossible to cause maliciously. It's just as effective to choose your pivots at random. The best-known way to actually guarantee that an O(n^2) worst case cannot occur is the median of medians algorithm.
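Following up on the random-pivot suggestion, here is a hedged sketch of a quicksort that picks its pivot uniformly at random. This is a swapped-in technique for illustration, not what the pull request implements. `random_quicksort!` is a hypothetical name. The three-way (Dutch national flag) partition is an additional assumption, chosen to keep inputs with many duplicate keys fast.

```julia
using Random
using Base.Order: Ordering, Forward, Reverse, lt

# Randomized quicksort: the pivot is a uniformly random element, so no fixed
# input can reliably trigger the O(N^2) case. The three-way partition groups
# keys that tie with the pivot, so inputs with many duplicates stay cheap.
function random_quicksort!(rng::AbstractRNG, v::AbstractVector, lo::Int, hi::Int, o::Ordering)
    while lo < hi
        pivot = v[rand(rng, lo:hi)]
        # Invariant: v[lo:l-1] before pivot, v[l:i-1] tie with pivot, v[g+1:hi] after pivot
        l, i, g = lo, lo, hi
        while i <= g
            if lt(o, v[i], pivot)
                v[l], v[i] = v[i], v[l]
                l += 1; i += 1
            elseif lt(o, pivot, v[i])
                v[i], v[g] = v[g], v[i]
                g -= 1
            else
                i += 1
            end
        end
        # Recurse into the smaller outer part and loop on the larger one.
        if l - lo < hi - g
            random_quicksort!(rng, v, lo, l - 1, o)
            lo = g + 1
        else
            random_quicksort!(rng, v, g + 1, hi, o)
            hi = l - 1
        end
    end
    return v
end

random_quicksort!(v::AbstractVector; order::Ordering = Forward,
                  rng::AbstractRNG = Random.default_rng()) =
    random_quicksort!(rng, v, firstindex(v), lastindex(v), order)
```

Usage: `random_quicksort!(rand(1:5, 1_000))` sorts an input dominated by duplicates, and `random_quicksort!(x; order = Reverse)` sorts descending.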

@illerucis

Am I missing something here?

No. I was thinking of the basic / naive Quicksort you first see in most algorithm textbooks (first element pivot), and edited my comment shortly after posting. My apologies.

Shouldn't the median of three end up picking the middle element in both the forward and reverse sorted cases?

Yes. This should prevent O(N^2) in both sorted cases. The first thing I tested after reading Kevin's latest issue was the median selection in the reverse sorted case.

Still looking into the stack overflow error.

@illerucis

I explicitly wrote the reverse direction, and am not getting a stack overflow error on an array of size 2^20 when the array is sorted in reverse order. Reverse code here

julia> require("candidates.jl")
julia> d = rand(2^20); @time qsort_c_mp!(d); @time qsort_c_mp!(d); @assert issorted(d, rev=true); 
elapsed time: 0.257379639 seconds (1011208 bytes allocated)
elapsed time: 0.101460163 seconds (48 bytes allocated)
@kmsquire
The Julia Language member
@illerucis
julia> d = rand(2^20); @time sort!(d); @time qsort_c_mp!(d); @assert issorted(d, rev=true);
elapsed time: 0.145157068 seconds (48 bytes allocated)
elapsed time: 0.101269386 seconds (48 bytes allocated)

Like this?

@kmsquire
The Julia Language member
@illerucis

Absolutely - I got sidetracked with something yesterday but I'll take a careful look today

@illerucis

@kmsquire I worked on this today with @StefanKarpinski and he found that the issue was caused by nans2end(). Fixed here 0542ba8. I'm no longer getting a stack overflow error (it sorts the same speed on forward and reverse sorted arrays). I'm going to re-benchmark with this change to make sure we still have speed improvements with the new Quicksort.

@kmsquire
The Julia Language member
@StefanKarpinski
The Julia Language member

Ok, folks, what do we have to do at this point to get this merged? It would be nice to verify that there isn't a performance regression, and ideally that there is a performance improvement. To that end, let's a) get this code into what we think its final state should be (maybe it's already there?), and b) run the full suite of sorting benchmarks and see how the new quicksort does, perhaps in comparison to the old quicksort too. The cleanest way to do the comparison may be to have a new SortingAlgorithm type that implements the old one.

@illerucis

I did some simple benchmarking on Friday (using the script I wrote) and I'm getting about an 8% improvement in speed. I'll do the full suite of sorting benchmarks, with the suggestion of a new SortingAlgorithm type holding the current Base sort.

Regarding a), unless @kmsquire has any other suggestions, I think the only thing missing is the @inbounds.

@kmsquire kmsquire and 2 others commented on an outdated diff Dec 16, 2013
@@ -255,19 +255,31 @@ function sort!(v::AbstractVector, lo::Int, hi::Int, ::InsertionSortAlg, o::Order
end
function sort!(v::AbstractVector, lo::Int, hi::Int, a::QuickSortAlg, o::Ordering)
- @inbounds while lo < hi
@kmsquire
The Julia Language member
kmsquire added a note Dec 16, 2013

Was @inbounds not useful?

@illerucis
illerucis added a note Dec 16, 2013

No, it was useful; it's missing for some reason. It should go back. Also in the inner loop.

@StefanKarpinski
The Julia Language member

It's only needed on the outer loop since that covers everything in the scope.
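A small illustration of that scoping rule, as a hedged sketch. The function `total` is a hypothetical example. One `@inbounds` on the outer loop also covers the nested loop, because the macro applies to the entire expression it wraps. As we understand it, it does not automatically extend into functions called from inside the block.

```julia
# One @inbounds on the outer loop: the nested inner loop is part of the
# wrapped expression, so its indexing also runs without bounds checks.
function total(A::AbstractMatrix)
    s = zero(eltype(A))
    @inbounds for j in axes(A, 2)
        for i in axes(A, 1)   # still inside the @inbounds expression
            s += A[i, j]
        end
    end
    return s
end

total([1 2; 3 4])   # 10
```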

@StefanKarpinski
The Julia Language member

This @inbounds is still missing.

@illerucis
illerucis added a note Feb 11, 2014
@StefanKarpinski
The Julia Language member

Well, you need to at least add the @inbounds back in. Presumably you did that in the version you're benchmarking with since otherwise I doubt you'd be seeing any kind of performance increase, let alone an 8% one.

@illerucis

Right - I was benchmarking with an isolated version of the algorithm, not the one in my branch's Base. I'm going to put the old quicksort into Base with a new sorting algorithm type, put @inbounds back, and benchmark from Base.

@StefanKarpinski
The Julia Language member

sounds good.

@kmsquire
The Julia Language member

Modulo the @inbounds question, it looks good to me. Minimal testing with SortPerf.jl suggested that it gives useful speedups over the original quicksort with most inputs. For some inputs (e.g., appending values to a large sorted list and then resorting), it might be slightly slower, but the speedups in most situations outweigh those, IMO.

Rob, you should check out and try to run SortPerf.jl (https://github.com/kmsquire/SortPerf.jl). Let me know if you have any issues (or just submit them), and feel free to suggest updates, etc.

@illerucis

@kmsquire will do!

@ViralBShah
The Julia Language member

Bump. Seems like we are almost there on this one.

@StefanKarpinski
The Julia Language member

@illerucis has assured me he's still working on this. I'll make sure it gets done :-)

@illerucis

Getting sick during the holidays has been pretty disruptive, but I'm back on this now.

@illerucis

@kmsquire @StefanKarpinski The latest PDF / TSV reports are here, as a result of running SortPerf.jl with these parameters.

Should I try it with more repetitions? 10,000 was actually taking quite some time when I tried (several hours before finishing the suite for one sorting algorithm)

@StefanKarpinski
The Julia Language member

I don't really think we need more repetitions. It's really interesting that the old quicksort seems to be asymptotically better in a bunch of cases, but the new quicksort is a bit more resilient – especially to the quicksort killer.

@ViralBShah
The Julia Language member

@illerucis Good to hear you are better!

@StefanKarpinski
The Julia Language member

In fact, this behavior strikes me as backwards from what I would expect. I would expect the median-of-three algorithm to have better asymptotic behavior but a higher constant factor of overhead, since each median selection is more expensive. Thus, I would expect middle-element pivot selection to do better for small arrays, where the asymptotic behavior doesn't get a chance to compensate for the constant factor, and median-of-three to do better for large arrays.

@kmsquire
The Julia Language member
@illerucis

Thanks for the feedback, guys!

I'm having a hard time understanding why the OldQuickSort does better on Float sorted/reverse arrays, but this version does better on Int sorted/reverse arrays. Maybe there is still something going on with fpsort or nans2end? Or is there some other explanation I'm missing?

@illerucis

I'm starting to check it out

@ViralBShah
The Julia Language member

Do code_llvm or code_native reveal anything?

@illerucis

I don't see anything immediately, but I'm going to look into that later tonight. Didn't get a chance to look at fpsort or nans2end yet but I hope to write some tests for that code later tonight to try and reveal something.

@gitfoxi gitfoxi referenced this pull request Jan 5, 2014
@JeffBezanson JeffBezanson fix isless and cmp/lexcmp for floating point
for now cmp() uses the total ordering, but we might change it to give a
DomainError for NaN arguments
2343ba0
@kmsquire
The Julia Language member

Bump. Any progress on this, @illerucis?

@illerucis

Getting back into this now. I was going to explore fpsort and nans2end before accepting this int/float performance discrepancy in the benchmarks.

@illerucis

@kmsquire - I don't see anything (fpsort and nans2end seem fine after looking at them pretty carefully)

@kmsquire
The Julia Language member

Hi @illerucis, no worries. As @StefanKarpinski suggested in the other email thread, please go ahead and merge this. Thanks for all of your work (so far)!

If you're interested in exploring the dual pivot quicksort, I linked to a good paper here: #1691 (comment)

Cheers!

@kmsquire
The Julia Language member

Wait, I just realized that you probably can't merge this, right?

@kmsquire
The Julia Language member

(In which case, I or Stefan or someone else will.) It would be good (but not necessary) to remove the WIP from the title.

@illerucis

@kmsquire thanks! I added the @inbounds macro (I thought I already took care of that). I'm definitely interested in exploring the dual pivot quicksort. I'll look at the paper. Thanks!

@kmsquire
The Julia Language member

LGTM. Sorry not to notice earlier, but can you squash the commits? (https://help.github.com/articles/interactive-rebase)

@illerucis

absolutely

@kmsquire
The Julia Language member

@illerucis, Thanks for squashing.

Sorry to be a little picky, but all of your intermediate notes probably don't need to go in the final commit message, since that will become a (relatively) permanent record of the commit in the Julia tree. Can you do another interactive rebase and change the commit message to just include relevant information about the final commit? After that, I'll merge it.

@pao
The Julia Language member
pao commented Feb 12, 2014

You don't need an interactive rebase for that. git commit --amend -c HEAD works too. (Depending on your configuration, you can omit the -c HEAD.)

@kmsquire
The Julia Language member

Thanks, Patrick!

@illerucis illerucis A new median of three quicksort that explicitly places the pivot in the right place after the inner loop. Additional performance gains were achieved after removing unnecessary bounds checking on the inner loop, as suggested by Kevin Squire.
6de7025
@illerucis
@StefanKarpinski
The Julia Language member

Looks good to me. Let's do this!

@StefanKarpinski StefanKarpinski merged commit 272c26c into JuliaLang:master Feb 12, 2014

1 check passed: the Travis CI build passed.
@kmsquire
The Julia Language member

:+1:

@ViralBShah
The Julia Language member

It's great to see this merged!

@illerucis