Skip to content

Conversation

@kmsquire
Copy link
Member

@kmsquire kmsquire commented Aug 6, 2013

This probably belongs in a package with #3825, but I wanted to put it here to get some feedback, and to see if there's a reason for it in base.

This is a fairly fast implementation of radix sort. Many of the ideas were inspired by http://www.stereopsis.com/radix.html:

  • histogramming all radixes in the same loop
  • 11-bit radixes, rather than 8-bit, to reduce the number of passes through the data
  • bit twiddling for signed integers and floating point numbers mapping them to unsigned integers, while maintaining sort order (see the uint_mapping function)

Additionally, one optimization occurs during the sort phase, where one extra comparison is done at the beginning of each outer loop see if any sorting is needed for the current radix; if none is needed, that iteration is skipped. This is useful for sorting, e.g., 64-bit integers with small-ish positive values/ranges, as many of the radixes will be zero for all numbers involved.

After some work, the results is quite performant, with a few caveats/limitations. Benefits include

  1. it is especially fast at sorting smaller Int sizes (32-bit and smaller), often 2-4x faster than QuickSort (which is already a pretty fast implementation), and generally faster even for 64-bit numbers when the data is random
  2. it is a stable sort
  3. it's integrated with the current sorting framework, and can work with all standard Orderings except Lt.

Limitations:

  1. it's limited to bits types: all sizes of Int, Uint, and Float32/Float64; no direct sorting of Strings
  2. it will work with functions of non-bit types (eg., sort!(a::Array{String}, by=length) as long as the function produces a bits type; however the function call overhead is currently pretty high
  3. in the same vein, it works with sortperm, with some overhead; this can still be faster than other algorithms for smaller data types
  4. it has an up-front overhead which makes it slower than the other sorts for lists with less than ~2^13 (8192) values, give or take, depending on the machine. Larger than that, it is generally faster than all other sorts (in Julia) for straight lists of Ints/Floats. (Edit: Actually, for 64-bit numbers, this depends on the range of values)
  5. For 128-bit numbers, it is quite slow (assuming the high order bits are not empty). This is not surprising, since the run time is proportional to the size of the items being sorted, and 128-bit numbers are big (O(kn), where k is the word size).
  6. It uses O(N) memory (like MergeSort), which affects performance at some point depending on list and machine being used

You can see a comparison of all sorts in the SortPerf.jl package. The README has a table with a summary suggesting when each algorithm should be used, and sortperf.pdf shows a comparison of all of the different sort algorithms run on various tests. (Note that most of the tests were originally written to show off TimSort, and have some structure that other sorts (but not radix sort) can take advantage of. RadixSort does come out on top sorting random lists, as well, as large lists with only a few unique values.)

Finally, I had to work around some type inference limitations for the uint_mapping function (look for "manually inlined" comments). I'll file those issues separately.

Comments/suggestions on the implementation or what to do with it are welcome.

CC: @StefanKarpinski

@StefanKarpinski
Copy link
Member

Also, I could have sworn that I opened an issue about moving all but the essential sorts out of base, but now I can't find it. How does this relate?

@ViralBShah
Copy link
Member

I think it would be great to have a good radix sort in base, especially if it works fast for sorting integers.

@StefanKarpinski
Copy link
Member

Since we trickily use integer comparisons for sorting floating-point numbers, we could also use this for floats.

@kmsquire
Copy link
Member Author

kmsquire commented Aug 6, 2013

#3825, mentioned at the top of the page. This sort should probably be put in the same package.

I have a nice table and summary in the README of the SortPerf.jl package (link above). Basically, for small arrays QuickSort or MergeSort are all you need. For larger arrays, it's worth considering the type and structure of the data.

@kmsquire
Copy link
Member Author

kmsquire commented Aug 6, 2013

Actually, I just discovered that I made a false claim: this doesn't actually work for floating point numbers. :-(

@kmsquire
Copy link
Member Author

kmsquire commented Aug 6, 2013

Okay, it does, but not through the normal sort mechanism:

julia> a = [-nan(Float64), nan(Float64), -inf(Float64), inf(Float64), -1e308, 1e308, 1.0e-323, -1e-323, -1, 0, 1]
11-element Float64 Array:
  NaN           
  NaN           
 -Inf           
  Inf           
   -1.0e308     
    1.0e308     
    9.88131e-324
   -9.88131e-324
   -1.0         
    0.0         
    1.0         

julia> sort!(a, 1, length(a), Base.Sort.RadixSort, Order.Forward)
11-element Float64 Array:
  NaN           
 -Inf           
   -1.0e308     
   -1.0         
   -9.88131e-324
    0.0         
    9.88131e-324
    1.0         
    1.0e308     
  Inf           
  NaN           

julia> a = [-nan(Float64), nan(Float64), -inf(Float64), inf(Float64), -1e308, 1e308, 1.0e-323, -1e-323, -1, 0, 1];

julia> sort(a, alg=RadixSort)
11-element Float64 Array:
   -9.88131e-324
   -1.0         
   -1.0e308     
 -Inf           
    0.0         
    9.88131e-324
    1.0         
    1.0e308     
  Inf           
  NaN           
  NaN

I need to "uninterpret" Left and Right ordering. Will fix.

What I was trying to show when I started this was that, if we're okay with sorting -NaN and Nan to different ends of the array, then the extra work you do moving NaNs isn't necessary (but you probably knew that).

@kmsquire
Copy link
Member Author

kmsquire commented Aug 6, 2013

Okay, I missed all of the conversation while I was typing the last couple of notes.

As of 8704f08, it does work properly for Floats (now) as is. More over, as I alluded to in my last note, it's possible to dispense with the current NaN shifting if you don't mind -NaN at the top of the array:

julia> a = [-nan(Float64), nan(Float64), -inf(Float64), inf(Float64), -1e308, 1e308, 1.0e-323, -1e-323, -1, 0, 1];

julia> sort!(a, 1, length(a), Base.Sort.RadixSort, Order.Forward)
11-element Float64 Array:
  NaN           
 -Inf           
   -1.0e308     
   -1.0         
   -9.88131e-324
    0.0         
    9.88131e-324
    1.0         
    1.0e308     
  Inf           
  NaN           

(That first NaN is actually -NaN.) This is actually rather faster:

julia> a = rand(10_000_000); b = copy(a); c = copy(a);

julia> @time sort!(a); issorted(a)
elapsed time: 1.433912066 seconds (96 bytes allocated)
true

julia> @time sort!(b, alg=RadixSort); issorted(b)
elapsed time: 1.324949715 seconds (160148960 bytes allocated)
true

julia> @time sort!(c, 1, 10_000_000, RadixSort, Order.Forward); issorted(c)
elapsed time: 1.092704451 seconds (80148800 bytes allocated)
true

Thoughts?

@kmsquire
Copy link
Member Author

kmsquire commented Aug 6, 2013

Note that this holds for arrays down to about 10,000, but not for smaller (although it may not matter much):

julia> a = rand(10_000); b = copy(a); c = copy(a);

julia> @time sort!(a); issorted(a)
elapsed time: 0.001132448 seconds (96 bytes allocated)
true

julia> @time sort!(b, alg=RadixSort); issorted(b)
elapsed time: 0.001046899 seconds (308960 bytes allocated)
true

julia> @time sort!(c, 1, 10_000, RadixSort, Order.Forward); issorted(c)
elapsed time: 0.000888866 seconds (228800 bytes allocated)
true

julia> a = rand(1000); b = copy(a); c = copy(a);

julia> @time sort!(a); issorted(a)
elapsed time: 9.7284e-5 seconds (96 bytes allocated)
true

julia> @time sort!(b, alg=RadixSort); issorted(b)
elapsed time: 0.000212502 seconds (164992 bytes allocated)
true

julia> @time sort!(c, 1, 1000, RadixSort, Order.Forward); issorted(c)
elapsed time: 0.000179684 seconds (156816 bytes allocated)
true

@JeffBezanson
Copy link
Member

Nice. Yes, let's put it in a package with the other sorting algs for now, and maybe decide to pull this into Base in the future.

@kmsquire
Copy link
Member Author

kmsquire commented Aug 7, 2013

Okay.

@StefanKarpinski
Copy link
Member

The easiest course at this point is to merge it into Base and then move TimSort, HeapSort and RadixSort out into a package all at once.

StefanKarpinski added a commit that referenced this pull request Aug 7, 2013
@StefanKarpinski StefanKarpinski merged commit 6fbac9a into JuliaLang:master Aug 7, 2013
@quinnj
Copy link
Member

quinnj commented Aug 13, 2013

Just as a point of interest, R's data.table package (author @mdowle) uses radix sorting by default and is known for it's speed. FAQ here and search 'radix'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants