New Quicksort #5081

Merged
merged 1 commit into from Feb 12, 2014

Conversation

Projects
None yet
5 participants
@illerucis
Contributor

illerucis commented Dec 9, 2013

This is recommended change to Julia's Base Quicksort. Related issue: #4576. Implements a standard median-of-three Quicksort.

[ViralBShah: Added WIP to the title]

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Dec 9, 2013

Member

Cool. Thanks for submitting. cc: @kmsquire

Member

StefanKarpinski commented Dec 9, 2013

Cool. Thanks for submitting. cc: @kmsquire

@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Dec 9, 2013

Member

Sorry not to get back to you sooner. Check out my comments in #4576 (just posted).

Member

kmsquire commented Dec 9, 2013

Sorry not to get back to you sooner. Check out my comments in #4576 (just posted).

@kmsquire

View changes

base/sort.jl
@@ -275,18 +293,15 @@ end
function sort!(v::AbstractVector, lo::Int, hi::Int, a::MergeSortAlg, o::Ordering, t=similar(v))
@inbounds if lo < hi
hi-lo <= SMALL_THRESHOLD && return sort!(v, lo, hi, SMALL_ALGORITHM, o)
-

This comment has been minimized.

@kmsquire

kmsquire Dec 9, 2013

Member

Generally it's better to leave the whitespace alone, unless it clearly makes things easier to read. This seems to do the opposite.

@kmsquire

kmsquire Dec 9, 2013

Member

Generally it's better to leave the whitespace alone, unless it clearly makes things easier to read. This seems to do the opposite.

@kmsquire

View changes

base/sort.jl
- pivot = v[(lo+hi)>>>1]
- i, j = lo, hi
+ mi = (lo+hi)>>>1
+ if v[lo] > v[mi]

This comment has been minimized.

@kmsquire

kmsquire Dec 9, 2013

Member

You'll want to use lt(o, ...) for all of these tests as well.

@kmsquire

kmsquire Dec 9, 2013

Member

You'll want to use lt(o, ...) for all of these tests as well.

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 9, 2013

Contributor

Working on changes mentioned here #4576 (comment). Sorry about the whitespace edit - that was an accident. Should have been more careful in reviewing.

Contributor

illerucis commented Dec 9, 2013

Working on changes mentioned here #4576 (comment). Sorry about the whitespace edit - that was an accident. Should have been more careful in reviewing.

@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Dec 9, 2013

Member

Great! No worries on the whitespace edit--that's what code review is for. ;-)

Member

kmsquire commented Dec 9, 2013

Great! No worries on the whitespace edit--that's what code review is for. ;-)

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Dec 9, 2013

Member

No worries, that's what pull requests are for :-)

Member

StefanKarpinski commented Dec 9, 2013

No worries, that's what pull requests are for :-)

@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Dec 10, 2013

Member

Hey Rob, can you also test a version which does not store the pivot in the
first location, but simply calculates and uses the median of the lo, mid,
and high points? Because the current QuickSort already puts the pivot in
the right place (and ignores it thereafter) most of the time, just using
the median might be faster.

Thanks, Kevin

On Mon, Dec 9, 2013 at 3:52 PM, Stefan Karpinski
notifications@github.comwrote:

No worries, that's what pull requests are for :-)


Reply to this email directly or view it on GitHubhttps://github.com/JuliaLang/julia/pull/5081#issuecomment-30186825
.

Member

kmsquire commented Dec 10, 2013

Hey Rob, can you also test a version which does not store the pivot in the
first location, but simply calculates and uses the median of the lo, mid,
and high points? Because the current QuickSort already puts the pivot in
the right place (and ignores it thereafter) most of the time, just using
the median might be faster.

Thanks, Kevin

On Mon, Dec 9, 2013 at 3:52 PM, Stefan Karpinski
notifications@github.comwrote:

No worries, that's what pull requests are for :-)


Reply to this email directly or view it on GitHubhttps://github.com/JuliaLang/julia/pull/5081#issuecomment-30186825
.

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 10, 2013

Contributor

On it. Will report back soon.

Contributor

illerucis commented Dec 10, 2013

On it. Will report back soon.

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 10, 2013

Contributor

@kmsquire I did some testing with the version you mentioned - one that simply calculates the median and uses that as a pivot without placing the pivot in the first location, and without explicitly placing it in the right place after a passing. The code can be found here. Let me know if I misinterpreted anything. That file also contains a modified version of this pull request according to your earlier comments here

I found that this version is slower than the median-of-three pivot algorithm that uses the suggestions in your earlier comments (going to modify the pull request shortly to reflect these changes, for re-review).

Contributor

illerucis commented Dec 10, 2013

@kmsquire I did some testing with the version you mentioned - one that simply calculates the median and uses that as a pivot without placing the pivot in the first location, and without explicitly placing it in the right place after a passing. The code can be found here. Let me know if I misinterpreted anything. That file also contains a modified version of this pull request according to your earlier comments here

I found that this version is slower than the median-of-three pivot algorithm that uses the suggestions in your earlier comments (going to modify the pull request shortly to reflect these changes, for re-review).

@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Dec 10, 2013

Member

That's what I was looking for--thanks for checking. Did you compare my suggestions with your original? Either way, feel free to update this pull request with your changes.

Cheers, Kevin

Member

kmsquire commented Dec 10, 2013

That's what I was looking for--thanks for checking. Did you compare my suggestions with your original? Either way, feel free to update this pull request with your changes.

Cheers, Kevin

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 10, 2013

Contributor

So, one way to eliminate this test is to also put the pivot in the high position, and restore the real value outside the loop. For Rob's median version, we can do this, because he ensured that v[hi] >= v[pivot]. (kmsquire/julia_qsortbenchmarks@820eb81). Nice!

I implemented this and saw a big improvement in speed, but the way it's currently written won't work for arrays sorted in reverse order (correct me if I'm wrong). I'm working on changing the median calculation to use a comparator like lt() in the inner loop with ordering.

Contributor

illerucis commented Dec 10, 2013

So, one way to eliminate this test is to also put the pivot in the high position, and restore the real value outside the loop. For Rob's median version, we can do this, because he ensured that v[hi] >= v[pivot]. (kmsquire/julia_qsortbenchmarks@820eb81). Nice!

I implemented this and saw a big improvement in speed, but the way it's currently written won't work for arrays sorted in reverse order (correct me if I'm wrong). I'm working on changing the median calculation to use a comparator like lt() in the inner loop with ordering.

@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Dec 10, 2013

Member

Absolutely correct! It should work fine if you use lt() everywhere.

Member

kmsquire commented Dec 10, 2013

Absolutely correct! It should work fine if you use lt() everywhere.

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 10, 2013

Contributor

Great - make test passed, but I just want to be doubly sure that I'm still seeing performance improvements using lt()

Contributor

illerucis commented Dec 10, 2013

Great - make test passed, but I just want to be doubly sure that I'm still seeing performance improvements using lt()

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Dec 10, 2013

Member

There shouldn't be any overhead from calling lt relative to calling < but it's always good to check.

Member

StefanKarpinski commented Dec 10, 2013

There shouldn't be any overhead from calling lt relative to calling < but it's always good to check.

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 10, 2013

Contributor

I checked and there was no (noticeable) hit in performance.

Contributor

illerucis commented Dec 10, 2013

I checked and there was no (noticeable) hit in performance.

@kmsquire

View changes

base/sort.jl
+ if lt(o, v[hi], v[lo])
+ v[lo], v[hi] = v[hi], v[lo]
+ end
+ if lt(o, v[lo], v[mi])

This comment has been minimized.

@kmsquire

kmsquire Dec 10, 2013

Member

Looks like the test here is backwards. If v[lo] < v[mi], we don't want to exchange, right?

@kmsquire

kmsquire Dec 10, 2013

Member

Looks like the test here is backwards. If v[lo] < v[mi], we don't want to exchange, right?

This comment has been minimized.

@kmsquire

kmsquire Dec 10, 2013

Member

Hmmm... I'm not following the logic of the 3 way sort. It might have been right previously (comparing v[mi] with v[hi]--sorry if I misled), but please look at it carefully. I would do

b < a  => swap(a,b)
c < b => swap(b,c)
b < a => swap(a,b)

But what you had before might be equivalent. This currently isn't.

@kmsquire

kmsquire Dec 10, 2013

Member

Hmmm... I'm not following the logic of the 3 way sort. It might have been right previously (comparing v[mi] with v[hi]--sorry if I misled), but please look at it carefully. I would do

b < a  => swap(a,b)
c < b => swap(b,c)
b < a => swap(a,b)

But what you had before might be equivalent. This currently isn't.

This comment has been minimized.

@kmsquire

kmsquire Dec 10, 2013

Member

Actually, you might consider rewriting these to minimize the number of writes.

@kmsquire

kmsquire Dec 10, 2013

Member

Actually, you might consider rewriting these to minimize the number of writes.

This comment has been minimized.

@StefanKarpinski

StefanKarpinski Dec 10, 2013

Member

Minimizing writes is probably a good metric, although who knows what the CPU does (can one write catch another?).

@StefanKarpinski

StefanKarpinski Dec 10, 2013

Member

Minimizing writes is probably a good metric, although who knows what the CPU does (can one write catch another?).

@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Dec 10, 2013

Member

Couple of comments

  1. After thinking about it a little more, I realized that placing the pivot at v[hi] is totally unnecessary. After the median sort, that value is already guaranteed to be greater than or equal to the pivot, which means that i will stop there, and will always be >=j if it does. Your original check is unnecessary, and so is my "fix"

  2. I'm seeing a really strange regression, where sorting an already sorted array with your current version is exceedingly slow. I haven't been able to explain it. (I changed your median sort to match my comment.)

Member

kmsquire commented Dec 10, 2013

Couple of comments

  1. After thinking about it a little more, I realized that placing the pivot at v[hi] is totally unnecessary. After the median sort, that value is already guaranteed to be greater than or equal to the pivot, which means that i will stop there, and will always be >=j if it does. Your original check is unnecessary, and so is my "fix"

  2. I'm seeing a really strange regression, where sorting an already sorted array with your current version is exceedingly slow. I haven't been able to explain it. (I changed your median sort to match my comment.)

@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Dec 10, 2013

Member

Never mind the slowness. Went away on a julia rebuild.

Member

kmsquire commented Dec 10, 2013

Never mind the slowness. Went away on a julia rebuild.

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 10, 2013

Contributor

@kmsquire I fixed the median calculation, and tested it by putting med = median([v[lo], v[mi], v[hi]]) before the swapping, and @assert v[mi] == med afterwards (code is here).

I also made changes to not place the pivot at v[hi]. The algorithm is noticeably faster without, and like you said it's not necessary.

Regarding #5081 (comment), I'll think about how to reduce the number of swaps (a solution without introducing more comparisons is not immediately obvious to me). For now I pushed a v3 with the swapping mechanics as is.

Thanks!

Contributor

illerucis commented Dec 10, 2013

@kmsquire I fixed the median calculation, and tested it by putting med = median([v[lo], v[mi], v[hi]]) before the swapping, and @assert v[mi] == med afterwards (code is here).

I also made changes to not place the pivot at v[hi]. The algorithm is noticeably faster without, and like you said it's not necessary.

Regarding #5081 (comment), I'll think about how to reduce the number of swaps (a solution without introducing more comparisons is not immediately obvious to me). For now I pushed a v3 with the swapping mechanics as is.

Thanks!

@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Dec 10, 2013

Member

(a solution without introducing more comparisons is not immediately obvious to me)

There is a way with more total code, but with any path only having at most 3 comparisons. I can post the code, but only if you want--I don't want to rob you of the challenge of figuring it out for yourself. ;-)

Member

kmsquire commented Dec 10, 2013

(a solution without introducing more comparisons is not immediately obvious to me)

There is a way with more total code, but with any path only having at most 3 comparisons. I can post the code, but only if you want--I don't want to rob you of the challenge of figuring it out for yourself. ;-)

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 10, 2013

Contributor

If the solution is allowed to have more total code, then challenge accepted.

Contributor

illerucis commented Dec 10, 2013

If the solution is allowed to have more total code, then challenge accepted.

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 10, 2013

Contributor

At most 3 comparisons, at most two swaps?

Contributor

illerucis commented Dec 10, 2013

At most 3 comparisons, at most two swaps?

@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Dec 10, 2013

Member

At most 3 comparisons, at most 4 assignments.

Swap should involve 3 assignments, unless there exists special purpose instructions to swap the contents of registers. (I don't think the x86 has these.)

Member

kmsquire commented Dec 10, 2013

At most 3 comparisons, at most 4 assignments.

Swap should involve 3 assignments, unless there exists special purpose instructions to swap the contents of registers. (I don't think the x86 has these.)

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 10, 2013

Contributor

I have at most 5 assignments (and it is slower than the current version). Looking at it more carefully now ...

Contributor

illerucis commented Dec 10, 2013

I have at most 5 assignments (and it is slower than the current version). Looking at it more carefully now ...

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 10, 2013

Contributor

At most 3 comparisons, and most 4 assignments here. I'm finding that it is still slower than the current version (marginally).

Contributor

illerucis commented Dec 10, 2013

At most 3 comparisons, and most 4 assignments here. I'm finding that it is still slower than the current version (marginally).

@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Dec 11, 2013

Member

Okay. Nice job getting the logic!

Stylistically, I would still write the swaps as swaps, so it's more understandable (with the understanding that the compiler will likely change that to something close to what you did by hand).

Just for reference, here's what I came up with (which is functionally identical to yours):

        if lt(o, v[mi], v[lo])
            if lt(o, v[hi], v[mi])
                v[lo], v[hi] = v[hi], v[lo]
            elseif lt(o, v[hi], v[lo])
                v[lo], v[mi], v[hi] = v[mi], v[hi], v[lo]
            else
                v[lo], v[mi] = v[mi], v[lo]
            end
        elseif lt(o, v[hi], v[mi])
            if lt(o, v[hi], v[lo])
                v[lo], v[mi], v[hi] = v[hi], v[lo], v[mi]
            else
                v[mi], v[hi] = v[hi], v[mi]
            end
        end

Not pretty, but you can see the swaps at least. (The 3-way assignments might be sketchy, performance-wise--not sure.)

Since this isn't faster than the current version, leave the current version--it's much easier to understand.

Member

kmsquire commented Dec 11, 2013

Okay. Nice job getting the logic!

Stylistically, I would still write the swaps as swaps, so it's more understandable (with the understanding that the compiler will likely change that to something close to what you did by hand).

Just for reference, here's what I came up with (which is functionally identical to yours):

        if lt(o, v[mi], v[lo])
            if lt(o, v[hi], v[mi])
                v[lo], v[hi] = v[hi], v[lo]
            elseif lt(o, v[hi], v[lo])
                v[lo], v[mi], v[hi] = v[mi], v[hi], v[lo]
            else
                v[lo], v[mi] = v[mi], v[lo]
            end
        elseif lt(o, v[hi], v[mi])
            if lt(o, v[hi], v[lo])
                v[lo], v[mi], v[hi] = v[hi], v[lo], v[mi]
            else
                v[mi], v[hi] = v[hi], v[mi]
            end
        end

Not pretty, but you can see the swaps at least. (The 3-way assignments might be sketchy, performance-wise--not sure.)

Since this isn't faster than the current version, leave the current version--it's much easier to understand.

@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Dec 11, 2013

Member

Unfortunately, here's a possible regression. Can you run this and tell me what you get?

Using the version from base:

julia> d = rand(2^20); sort!(d); @time sort!(d); @time sort!(d, rev=true);
elapsed time: 0.048643169 seconds (96 bytes allocated)
elapsed time: 0.046859805 seconds (176 bytes allocated)

Using your pull request:

julia> d = rand(2^20); sort!(d); @time sort!(d); @time sort!(d, rev=true)
elapsed time: 0.055544636 seconds (48 bytes allocated)
ERROR: stack overflow
 in sort! at sort.jl:262
Member

kmsquire commented Dec 11, 2013

Unfortunately, here's a possible regression. Can you run this and tell me what you get?

Using the version from base:

julia> d = rand(2^20); sort!(d); @time sort!(d); @time sort!(d, rev=true);
elapsed time: 0.048643169 seconds (96 bytes allocated)
elapsed time: 0.046859805 seconds (176 bytes allocated)

Using your pull request:

julia> d = rand(2^20); sort!(d); @time sort!(d); @time sort!(d, rev=true)
elapsed time: 0.055544636 seconds (48 bytes allocated)
ERROR: stack overflow
 in sort! at sort.jl:262
@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Dec 11, 2013

Member

To clarify: this version of quicksort seems to have issues when the data is already sorted in reverse. E.g.,

julia> d = rand(2^20); @time sort!(d, rev=true); @time sort!(d)
elapsed time: 0.116975951 seconds (128 bytes allocated)
ERROR: stack overflow
 in sort! at sort.jl:262

julia> d = rand(2^20); @time sort!(d, rev=true); @time sort!(d, rev=true)
elapsed time: 0.15468553 seconds (128 bytes allocated)
ERROR: stack overflow
 in sort! at sort.jl:262

It's unclear to me why at this point.

Member

kmsquire commented Dec 11, 2013

To clarify: this version of quicksort seems to have issues when the data is already sorted in reverse. E.g.,

julia> d = rand(2^20); @time sort!(d, rev=true); @time sort!(d)
elapsed time: 0.116975951 seconds (128 bytes allocated)
ERROR: stack overflow
 in sort! at sort.jl:262

julia> d = rand(2^20); @time sort!(d, rev=true); @time sort!(d, rev=true)
elapsed time: 0.15468553 seconds (128 bytes allocated)
ERROR: stack overflow
 in sort! at sort.jl:262

It's unclear to me why at this point.

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 11, 2013

Contributor

Major edit to this comment: I now am only getting a stack overflow error when sorting an array already sorted in reverse. The forward direction seems fine.

The below still stands ...

For the sorted array test, the Base version conveniently picks the middle element as the pivot, always. This is nice for sorted arrays, because each recursive iteration looks at an array of size N / 2 (~NlogN). However any naive pivot selection should have equally pathological cases that would cause ~N^2.

For any implementation I think the only way to guarantee ~NlogN is to do a Fisher-Yates shuffle (~N) prior to sorting.

Contributor

illerucis commented Dec 11, 2013

Major edit to this comment: I now am only getting a stack overflow error when sorting an array already sorted in reverse. The forward direction seems fine.

The below still stands ...

For the sorted array test, the Base version conveniently picks the middle element as the pivot, always. This is nice for sorted arrays, because each recursive iteration looks at an array of size N / 2 (~NlogN). However any naive pivot selection should have equally pathological cases that would cause ~N^2.

For any implementation I think the only way to guarantee ~NlogN is to do a Fisher-Yates shuffle (~N) prior to sorting.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Dec 11, 2013

Member

Am I missing something here? Shouldn't the median of three and end up picking the middle element in both the forward and reverse sorted cases? Shuffling the array beforehand technically doesn't prevent O(n^2) worst-case run-time – it just makes it vanishingly unlikely and impossible to cause maliciously. It's just as effective to choose your pivots at random. The best-known way to actually certainly prevent the possibility of an O(n^2) worst case is the median of medians algorithm.

Member

StefanKarpinski commented Dec 11, 2013

Am I missing something here? Shouldn't the median of three and end up picking the middle element in both the forward and reverse sorted cases? Shuffling the array beforehand technically doesn't prevent O(n^2) worst-case run-time – it just makes it vanishingly unlikely and impossible to cause maliciously. It's just as effective to choose your pivots at random. The best-known way to actually certainly prevent the possibility of an O(n^2) worst case is the median of medians algorithm.

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 11, 2013

Contributor

Am I missing something here?

No. I was thinking of the basic / naive Quicksort you first see in most algorithm textbooks (first element pivot), and edited my comment shortly after posting. My apologies.

Shouldn't the median of three and end up picking the middle element in both the forward and reverse sorted cases?

Yes. This should prevent O(N^2) in both sorted cases. The first thing I tested after reading Kevin's latest issue was the median selection in the reverse sorted case.

Still looking into the stack overflow error.

Contributor

illerucis commented Dec 11, 2013

Am I missing something here?

No. I was thinking of the basic / naive Quicksort you first see in most algorithm textbooks (first element pivot), and edited my comment shortly after posting. My apologies.

Shouldn't the median of three and end up picking the middle element in both the forward and reverse sorted cases?

Yes. This should prevent O(N^2) in both sorted cases. The first thing I tested after reading Kevin's latest issue was the median selection in the reverse sorted case.

Still looking into the stack overflow error.

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 11, 2013

Contributor

I explicitly wrote the reverse direction, and am not getting a stack overflow error on an array of size 2^20 when the array is sorted in reverse order. Reverse code here

julia> require("candidates.jl")
julia> d = rand(2^20); @time qsort_c_mp!(d); @time qsort_c_mp!(d); @assert issorted(d, rev=true); 
elapsed time: 0.257379639 seconds (1011208 bytes allocated)
elapsed time: 0.101460163 seconds (48 bytes allocated)
Contributor

illerucis commented Dec 11, 2013

I explicitly wrote the reverse direction, and am not getting a stack overflow error on an array of size 2^20 when the array is sorted in reverse order. Reverse code here

julia> require("candidates.jl")
julia> d = rand(2^20); @time qsort_c_mp!(d); @time qsort_c_mp!(d); @assert issorted(d, rev=true); 
elapsed time: 0.257379639 seconds (1011208 bytes allocated)
elapsed time: 0.101460163 seconds (48 bytes allocated)
@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Dec 11, 2013

Member

For that code, can you try sorting it first in the forward direction (say,
with the built-in sort),, and then in reverse using your
reverse-specialized quicksort?

On Wednesday, December 11, 2013, Rob wrote:

I explicitly wrote the reverse direction, and am not getting a stack
overflow error on an array of size 2^20 when the array is sorted in
reverse order.

julia> d = rand(2^20); @time qsort_c_mp!(d); @time qsort_c_mp!(d); @Assert issorted(d, rev=true);
elapsed time: 0.257379639 seconds (1011208 bytes allocated)
elapsed time: 0.101460163 seconds (48 bytes allocated)


Reply to this email directly or view it on GitHubhttps://github.com/JuliaLang/julia/pull/5081#issuecomment-30336904
.

Member

kmsquire commented Dec 11, 2013

For that code, can you try sorting it first in the forward direction (say,
with the built-in sort),, and then in reverse using your
reverse-specialized quicksort?

On Wednesday, December 11, 2013, Rob wrote:

I explicitly wrote the reverse direction, and am not getting a stack
overflow error on an array of size 2^20 when the array is sorted in
reverse order.

julia> d = rand(2^20); @time qsort_c_mp!(d); @time qsort_c_mp!(d); @Assert issorted(d, rev=true);
elapsed time: 0.257379639 seconds (1011208 bytes allocated)
elapsed time: 0.101460163 seconds (48 bytes allocated)


Reply to this email directly or view it on GitHubhttps://github.com/JuliaLang/julia/pull/5081#issuecomment-30336904
.

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 11, 2013

Contributor
julia> d = rand(2^20); @time sort!(d); @time qsort_c_mp!(d); @assert issorted(d, rev=true);
elapsed time: 0.145157068 seconds (48 bytes allocated)
elapsed time: 0.101269386 seconds (48 bytes allocated)

Like this?

Contributor

illerucis commented Dec 11, 2013

julia> d = rand(2^20); @time sort!(d); @time qsort_c_mp!(d); @assert issorted(d, rev=true);
elapsed time: 0.145157068 seconds (48 bytes allocated)
elapsed time: 0.101269386 seconds (48 bytes allocated)

Like this?

@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Dec 12, 2013

Member

Okay, so it seems to be limited to the version in base. At this point, it
would be best to try to dissect that one (instrument and/or profile). I
won't be able to look for a few days, so if you have time and can take a
closer look at that version, it would be great, Rob.

Cheers, Kevin

On Wednesday, December 11, 2013, Rob wrote:

julia> d = rand(2^20); @time sort!(d); @time qsort_c_mp!(d); @Assert issorted(d, rev=true);
elapsed time: 0.145157068 seconds (48 bytes allocated)
elapsed time: 0.101269386 seconds (48 bytes allocated)

Like this?


Reply to this email directly or view it on GitHubhttps://github.com/JuliaLang/julia/pull/5081#issuecomment-30354146
.

Member

kmsquire commented Dec 12, 2013

Okay, so it seems to be limited to the version in base. At this point, it
would be best to try to dissect that one (instrument and/or profile). I
won't be able to look for a few days, so if you have time and can take a
closer look at that version, it would be great, Rob.

Cheers, Kevin

On Wednesday, December 11, 2013, Rob wrote:

julia> d = rand(2^20); @time sort!(d); @time qsort_c_mp!(d); @Assert issorted(d, rev=true);
elapsed time: 0.145157068 seconds (48 bytes allocated)
elapsed time: 0.101269386 seconds (48 bytes allocated)

Like this?


Reply to this email directly or view it on GitHubhttps://github.com/JuliaLang/julia/pull/5081#issuecomment-30354146
.

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 12, 2013

Contributor

Absolutely - I got sidetracked with something yesterday but I'll take a careful look today

Contributor

illerucis commented Dec 12, 2013

Absolutely - I got sidetracked with something yesterday but I'll take a careful look today

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 12, 2013

Contributor

@kmsquire I worked on this today with @StefanKarpinski and he found that the issue was caused by nans2end(). Fixed here 0542ba8. I'm no longer getting a stack overflow error (it sorts the same speed on forward and reverse sorted arrays). I'm going to re-benchmark with this change to make sure we still have speed improvements with the new Quicksort.

Contributor

illerucis commented Dec 12, 2013

@kmsquire I worked on this today with @StefanKarpinski and he found that the issue was caused by nans2end(). Fixed here 0542ba8. I'm no longer getting a stack overflow error (it sorts the same speed on forward and reverse sorted arrays). I'm going to re-benchmark with this change to make sure we still have speed improvements with the new Quicksort.

@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Dec 12, 2013

Member

Nice work to both!

On Thursday, December 12, 2013, Rob wrote:

@kmsquire https://github.com/kmsquire I worked on this today with
@StefanKarpinski https://github.com/StefanKarpinski and he found that
the issue was caused by nans2end(). Fixed here 0542ba80542ba8.
I'm no longer getting a stack overflow error (it sorts the same speed on
forward and reverse sorted arrays). I'm going to re-benchmark with this
change to make sure we still have speed improvements with the new
Quicksort.


Reply to this email directly or view it on GitHubhttps://github.com/JuliaLang/julia/pull/5081#issuecomment-30469566
.

Member

kmsquire commented Dec 12, 2013

Nice work to both!

On Thursday, December 12, 2013, Rob wrote:

@kmsquire https://github.com/kmsquire I worked on this today with
@StefanKarpinski https://github.com/StefanKarpinski and he found that
the issue was caused by nans2end(). Fixed here 0542ba80542ba8.
I'm no longer getting a stack overflow error (it sorts the same speed on
forward and reverse sorted arrays). I'm going to re-benchmark with this
change to make sure we still have speed improvements with the new
Quicksort.


Reply to this email directly or view it on GitHubhttps://github.com/JuliaLang/julia/pull/5081#issuecomment-30469566
.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Dec 16, 2013

Member

Ok, folks, what do we have to do at this point to get this merged? It would be nice to verify that there isn't a performance regression – and ideally a performance enhancement. To that end, let's try to a) get this code in what we think it's final state should be (maybe it's already there?), and b) run the full suite of sorting benchmarks and see how the new quicksort does, perhaps in comparison to the old quicksort too. The cleanest way to do the comparison may be to have a new SortingAlgorithm type that does the old one.

Member

StefanKarpinski commented Dec 16, 2013

Ok, folks, what do we have to do at this point to get this merged? It would be nice to verify that there isn't a performance regression – and ideally a performance enhancement. To that end, let's try to a) get this code in what we think it's final state should be (maybe it's already there?), and b) run the full suite of sorting benchmarks and see how the new quicksort does, perhaps in comparison to the old quicksort too. The cleanest way to do the comparison may be to have a new SortingAlgorithm type that does the old one.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Dec 16, 2013

Member

Well, you need to at least add the @inbounds back in. Presumably you did that in the version you're benchmarking with since otherwise I doubt you'd be seeing any kind of performance increase, let alone an 8% one.

Member

StefanKarpinski commented Dec 16, 2013

Well, you need to at least add the @inbounds back in. Presumably you did that in the version you're benchmarking with since otherwise I doubt you'd be seeing any kind of performance increase, let alone an 8% one.

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 16, 2013

Contributor

Right - I was benchmarking with an isolated version of the algorithm not in my branch's Base. I'm going to put the old quicksort into Base with a new sorting algorithm type, put @inbounds back, and benchmark from Base

Contributor

illerucis commented Dec 16, 2013

Right - I was benchmarking with an isolated version of the algorithm not in my branch's Base. I'm going to put the old quicksort into Base with a new sorting algorithm type, put @inbounds back, and benchmark from Base

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Dec 16, 2013

Member

sounds good.

Member

StefanKarpinski commented Dec 16, 2013

sounds good.

@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Dec 16, 2013

Member

Modulo the @inbounds question, it looks good to me. Minimal testing with SortPerf.jl suggested that it gives useful speedups over the original quicksort with most inputs. For some inputs (e.g., appending values to a large sorted list and then resorting), it might be slightly slower, but the speedups in most situations outweigh those, IMO.

Rob, you should check out and try to run SortPerf.jl (https://github.com/kmsquire/SortPerf.jl). Let me know if you have any issues (or just submit them), and feel free to suggest updates, etc.

Member

kmsquire commented Dec 16, 2013

Modulo the @inbounds question, it looks good to me. Minimal testing with SortPerf.jl suggested that it gives useful speedups over the original quicksort with most inputs. For some inputs (e.g., appending values to a large sorted list and then resorting), it might be slightly slower, but the speedups in most situations outweigh those, IMO.

Rob, you should check out and try to run SortPerf.jl (https://github.com/kmsquire/SortPerf.jl). Let me know if you have any issues (or just submit them), and feel free to suggest updates, etc.

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 16, 2013

Contributor

@kmsquire will do!

Contributor

illerucis commented Dec 16, 2013

@kmsquire will do!

@ViralBShah

This comment has been minimized.

Show comment
Hide comment
@ViralBShah

ViralBShah Dec 25, 2013

Member

Bump. Seems like we are almost there on this one.

Member

ViralBShah commented Dec 25, 2013

Bump. Seems like we are almost there on this one.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Dec 25, 2013

Member

@illerucis has assured me he's still working on this. I'll make sure it gets done :-)

Member

StefanKarpinski commented Dec 25, 2013

@illerucis has assured me he's still working on this. I'll make sure it gets done :-)

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 27, 2013

Contributor

getting sick during the holidays has been pretty disruptive, but I'm back on this now.

Contributor

illerucis commented Dec 27, 2013

getting sick during the holidays has been pretty disruptive, but I'm back on this now.

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 28, 2013

Contributor

@kmsquire @StefanKarpinski The latest PDF / TSV reports are here, as a result of running SortPerf.jl with these parameters.

Should I try it with more repetitions? 10,000 was actually taking quite some time when I tried (several hours before finishing the suite for one sorting algorithm)

Contributor

illerucis commented Dec 28, 2013

@kmsquire @StefanKarpinski The latest PDF / TSV reports are here, as a result of running SortPerf.jl with these parameters.

Should I try it with more repetitions? 10,000 was actually taking quite some time when I tried (several hours before finishing the suite for one sorting algorithm)

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Dec 28, 2013

Member

I don't really think we need more repetitions. It's really interesting that the old quicksort seems to be asymptotically better in a bunch of cases, but the new quicksort is a bit more resilient – especially to the quicksort killer.

Member

StefanKarpinski commented Dec 28, 2013

I don't really think we need more repetitions. It's really interesting that the old quicksort seems to be asymptotically better in a bunch of cases, but the new quicksort is a bit more resilient – especially to the quicksort killer.

@ViralBShah

This comment has been minimized.

Show comment
Hide comment
@ViralBShah

ViralBShah Dec 28, 2013

Member

@illerucis Good to hear you are better!

Member

ViralBShah commented Dec 28, 2013

@illerucis Good to hear you are better!

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Dec 28, 2013

Member

In fact, this behavior strikes me as backwards from what I would expect. I would expect the median-of-three algorithm to have better asymptotic behavior but a higher constant factor of overhead since each median selection is more expensive. Thus, I would expect pick the middle pivot selection to do better for small arrays where asymptotic behavior doesn't get a chance to compensate for the constant factor, and the median-of-three to do better for large arrays.

Member

StefanKarpinski commented Dec 28, 2013

In fact, this behavior strikes me as backwards from what I would expect. I would expect the median-of-three algorithm to have better asymptotic behavior but a higher constant factor of overhead since each median selection is more expensive. Thus, I would expect pick the middle pivot selection to do better for small arrays where asymptotic behavior doesn't get a chance to compensate for the constant factor, and the median-of-three to do better for large arrays.

@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Dec 28, 2013

Member

Nice! I say go for it.

In @illericis's implementation, there is very little overhead, actually.
The pivot choice takes 3 comparisons, but replaces the first high and low
comparison, so there's only one extra comparison, plus the pivot swap. (In
fact, there's no reason to even store the pivot in the first position, but
in my tests, changing this didn't make any difference).

Anyway, nice job Rob!

Kevin

On Fri, Dec 27, 2013 at 10:01 PM, Stefan Karpinski <notifications@github.com

wrote:

In fact, this behavior strikes me as backwards from what I would expect. I
would expect the median-of-three algorithm to have better asymptotic
behavior but a higher constant factor of overhead since each median
selection is more expensive. Thus, I would expect pick the middle pivot
selection to do better for small arrays where asymptotic behavior doesn't
get a chance to compensate for the constant factor, and the median-of-three
to do better for large arrays.


Reply to this email directly or view it on GitHubhttps://github.com/JuliaLang/julia/pull/5081#issuecomment-31291248
.

Member

kmsquire commented Dec 28, 2013

Nice! I say go for it.

In @illericis's implementation, there is very little overhead, actually.
The pivot choice takes 3 comparisons, but replaces the first high and low
comparison, so there's only one extra comparison, plus the pivot swap. (In
fact, there's no reason to even store the pivot in the first position, but
in my tests, changing this didn't make any difference).

Anyway, nice job Rob!

Kevin

On Fri, Dec 27, 2013 at 10:01 PM, Stefan Karpinski <notifications@github.com

wrote:

In fact, this behavior strikes me as backwards from what I would expect. I
would expect the median-of-three algorithm to have better asymptotic
behavior but a higher constant factor of overhead since each median
selection is more expensive. Thus, I would expect pick the middle pivot
selection to do better for small arrays where asymptotic behavior doesn't
get a chance to compensate for the constant factor, and the median-of-three
to do better for large arrays.


Reply to this email directly or view it on GitHubhttps://github.com/JuliaLang/julia/pull/5081#issuecomment-31291248
.

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 28, 2013

Contributor

Thanks for the feedback, guys!

I'm having a hard time understanding why the OldQuickSort does better on Float sorted/reverse arrays, but this version does better on Int sorted/reverse arrays. Maybe there is still something going on with fpsort or nans2end? Or is there some other explanation I'm missing?

Contributor

illerucis commented Dec 28, 2013

Thanks for the feedback, guys!

I'm having a hard time understanding why the OldQuickSort does better on Float sorted/reverse arrays, but this version does better on Int sorted/reverse arrays. Maybe there is still something going on with fpsort or nans2end? Or is there some other explanation I'm missing?

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Dec 28, 2013

Contributor

I'm starting to check it out

Contributor

illerucis commented Dec 28, 2013

I'm starting to check it out

@ViralBShah

This comment has been minimized.

Show comment
Hide comment
@ViralBShah

ViralBShah Dec 29, 2013

Member

Does code_llvm, code_native reveal anything?

Member

ViralBShah commented Dec 29, 2013

Does code_llvm, code_native reveal anything?

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Jan 2, 2014

Contributor

I don't see anything immediately, but I'm going to look into that later tonight. Didn't get a chance to look at fpsort or nans2end yet but I hope to write some tests for that code later tonight to try and reveal something.

Contributor

illerucis commented Jan 2, 2014

I don't see anything immediately, but I'm going to look into that later tonight. Didn't get a chance to look at fpsort or nans2end yet but I hope to write some tests for that code later tonight to try and reveal something.

gitfoxi referenced this pull request Jan 5, 2014

fix isless and cmp/lexcmp for floating point
for now cmp() uses the total ordering, but we might change it to give a
DomainError for NaN arguments
@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Jan 31, 2014

Member

Bump. Any progress on this, @illerucis?

Member

kmsquire commented Jan 31, 2014

Bump. Any progress on this, @illerucis?

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Feb 1, 2014

Contributor

getting back into this now. I was going to head down the route of exploring fpsort and nans2end before I accept this int/float performance discrepancy in benchmarking

Contributor

illerucis commented Feb 1, 2014

getting back into this now. I was going to head down the route of exploring fpsort and nans2end before I accept this int/float performance discrepancy in benchmarking

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Feb 6, 2014

Contributor

@kmsquire - I don't see anything (fpsort and nans2end seem fine after looking at them pretty carefully)

Contributor

illerucis commented Feb 6, 2014

@kmsquire - I don't see anything (fpsort and nans2end seem fine after looking at them pretty carefully)

@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Feb 10, 2014

Member

Hi @illerucis, no worries. As @StefanKarpinski suggested in the other email thread, please go ahead and merge this. Thanks for all of your work (so far)!

If you're interested in exploring the dual pivot quicksort, I linked to a good paper here: #1691 (comment)

Cheers!

Member

kmsquire commented Feb 10, 2014

Hi @illerucis, no worries. As @StefanKarpinski suggested in the other email thread, please go ahead and merge this. Thanks for all of your work (so far)!

If you're interested in exploring the dual pivot quicksort, I linked to a good paper here: #1691 (comment)

Cheers!

@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Feb 10, 2014

Member

Wait, I just realized that you probably can't merge this, right?

Member

kmsquire commented Feb 10, 2014

Wait, I just realized that you probably can't merge this, right?

@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Feb 10, 2014

Member

(In which case, I or Stefan or someone else will.) It would be good (but not necessary) to remove the WIP from the title.

Member

kmsquire commented Feb 10, 2014

(In which case, I or Stefan or someone else will.) It would be good (but not necessary) to remove the WIP from the title.

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Feb 11, 2014

Contributor

@kmsquire thanks! I added the @inbounds macro (I thought I already took care of that). I'm definitely interested in exploring the dual pivot quicksort. I'll look at the paper. Thanks!

Contributor

illerucis commented Feb 11, 2014

@kmsquire thanks! I added the @inbounds macro (I thought I already took care of that). I'm definitely interested in exploring the dual pivot quicksort. I'll look at the paper. Thanks!

@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Feb 11, 2014

Member

LGTM. Sorry not to notice earlier, but can you squash the commits? (https://help.github.com/articles/interactive-rebase)

Member

kmsquire commented Feb 11, 2014

LGTM. Sorry not to notice earlier, but can you squash the commits? (https://help.github.com/articles/interactive-rebase)

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Feb 11, 2014

Contributor

absolutely

Contributor

illerucis commented Feb 11, 2014

absolutely

@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Feb 12, 2014

Member

@illerucis, Thanks for squashing.

Sorry to be a little picky, but all of your intermediate notes probably don't need to go in the final commit message, since that will become a (relatively) permanent record of the commit in the Julia tree. Can you do another interactive rebase and change the commit message to just include relevant information about the final commit? After that, I'll merge it.

Member

kmsquire commented Feb 12, 2014

@illerucis, Thanks for squashing.

Sorry to be a little picky, but all of your intermediate notes probably don't need to go in the final commit message, since that will become a (relatively) permanent record of the commit in the Julia tree. Can you do another interactive rebase and change the commit message to just include relevant information about the final commit? After that, I'll merge it.

@pao

This comment has been minimized.

Show comment
Hide comment
@pao

pao Feb 12, 2014

Member

You don't need an interactive rebase for that. git commit --amend -c HEAD works too. (Depending on your configuration, you can omit the -c HEAD.)

Member

pao commented Feb 12, 2014

You don't need an interactive rebase for that. git commit --amend -c HEAD works too. (Depending on your configuration, you can omit the -c HEAD.)

@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Feb 12, 2014

Member

Thanks, Patrick!

Member

kmsquire commented Feb 12, 2014

Thanks, Patrick!

A new median of three quicksort that explicitly places the pivot in t…
…he right place after the inner loop. Additional performance gains were achieved after removing unnecessary bounds checking on the inner loop, as suggested by Kevin Squire.
@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Feb 12, 2014

Contributor

Cool. I did git commit --amend -m "Message". Let me know if
it's good, Kevin.

On Tue, Feb 11, 2014 at 7:56 PM, Kevin Squire notifications@github.comwrote:

Thanks, Patrick!

Reply to this email directly or view it on GitHubhttps://github.com/JuliaLang/julia/pull/5081#issuecomment-34827490
.

Contributor

illerucis commented Feb 12, 2014

Cool. I did git commit --amend -m "Message". Let me know if
it's good, Kevin.

On Tue, Feb 11, 2014 at 7:56 PM, Kevin Squire notifications@github.comwrote:

Thanks, Patrick!

Reply to this email directly or view it on GitHubhttps://github.com/JuliaLang/julia/pull/5081#issuecomment-34827490
.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Feb 12, 2014

Member

Looks good to me. Let's do this!

Member

StefanKarpinski commented Feb 12, 2014

Looks good to me. Let's do this!

StefanKarpinski added a commit that referenced this pull request Feb 12, 2014

@StefanKarpinski StefanKarpinski merged commit 272c26c into JuliaLang:master Feb 12, 2014

1 check passed

default The Travis CI build passed
Details
@kmsquire

This comment has been minimized.

Show comment
Hide comment
@kmsquire

kmsquire Feb 12, 2014

Member

👍

Member

kmsquire commented Feb 12, 2014

👍

@ViralBShah

This comment has been minimized.

Show comment
Hide comment
@ViralBShah

ViralBShah Feb 14, 2014

Member

It's great to see this merged!

Member

ViralBShah commented Feb 14, 2014

It's great to see this merged!

@illerucis

This comment has been minimized.

Show comment
Hide comment
@illerucis

illerucis Feb 15, 2014

Contributor

Absolutely! And sorry for the extremely slow progress. I was super busy
over the last month

On Fri, Feb 14, 2014 at 3:32 AM, Viral B. Shah notifications@github.comwrote:

It's great to see this merged!

Reply to this email directly or view it on GitHubhttps://github.com/JuliaLang/julia/pull/5081#issuecomment-35065055
.

Contributor

illerucis commented Feb 15, 2014

Absolutely! And sorry for the extremely slow progress. I was super busy
over the last month

On Fri, Feb 14, 2014 at 3:32 AM, Viral B. Shah notifications@github.comwrote:

It's great to see this merged!

Reply to this email directly or view it on GitHubhttps://github.com/JuliaLang/julia/pull/5081#issuecomment-35065055
.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment