partial radix sort with early exit #474

richardstartin · 2021-04-08T23:28:28Z

Since we're beating up on my 3-year-old radix sort, here's another variant which has two benefits:

Reduce the number of passes over the data to build the histograms by populating both at once
If the maximum element in the data has an empty high byte, skip one level of the sort
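
A minimal sketch of these two ideas, not the code in this PR: the class and method names are hypothetical, and it assumes the sort orders ints by their upper 16 bits (treated as unsigned) using two 8-bit levels. Both histograms are filled in a single pass, and the second level is skipped when every value lands in high bucket 0, which is the same condition as the unsigned maximum having an empty high byte.

```java
final class DualHistogramSketch {

    // Stable sort by the upper 16 bits (treated as unsigned); the low 16 bits are ignored.
    static void partialSort(int[] data) {
        int[] low = new int[257];  // counts for bits 16-23, shifted one slot for prefix sums
        int[] high = new int[257]; // counts for bits 24-31, shifted one slot for prefix sums
        // One pass over the data populates both histograms.
        for (int value : data) {
            low[((value >>> 16) & 0xFF) + 1]++;
            high[(value >>> 24) + 1]++;
        }
        // If every value falls in high bucket 0, the high byte is empty everywhere.
        boolean highByteEmpty = high[1] == data.length;
        // Turn counts into starting offsets for each bucket.
        for (int i = 1; i < 257; i++) {
            low[i] += low[i - 1];
            high[i] += high[i - 1];
        }
        int[] copy = new int[data.length]; // the linear extra space
        for (int value : data) {
            copy[low[(value >>> 16) & 0xFF]++] = value;
        }
        if (highByteEmpty) {
            // Skip the second level: the data is already ordered by bits 16-23.
            System.arraycopy(copy, 0, data, 0, data.length);
            return;
        }
        for (int value : copy) {
            data[high[value >>> 24]++] = value;
        }
    }
}
```

In this sketch the skipped level still pays for one `System.arraycopy` to move the result back into `data`.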

I added a benchmark which varies the number of bits in the input data. On my branch (Skylake, 2.6GHz) I get

Benchmark              (bits)  (seed)     (size)  Mode  Cnt  Score   Error  Units
RadixSort.partialSort      23       0  100000000  avgt    5  0.507 ± 0.011   s/op
RadixSort.partialSort      25       0  100000000  avgt    5  0.600 ± 0.016   s/op

On master I get

Benchmark              (bits)  (seed)     (size)  Mode  Cnt  Score   Error  Units
RadixSort.partialSort      23       0  100000000  avgt    5  0.921 ± 0.165   s/op
RadixSort.partialSort      25       0  100000000  avgt    5  0.749 ± 0.003   s/op

I haven't dug into why the existing 2-pass algorithm is sensitive to there being 9 leading zeros (the 23-bit input), but I suspect there is a dependency on the histogram's singularly populated bucket on the last pass.

NB I still worry that this code isn't production worthy because of the linear space requirement!

richardstartin · 2021-04-09T00:03:11Z

For 1M elements:

branch

Benchmark              (bits)  (seed)   (size)  Mode  Cnt     Score     Error  Units
RadixSort.partialSort      23       0  1000000  avgt    5  4507.471 ±  80.487  us/op
RadixSort.partialSort      25       0  1000000  avgt    5  5615.708 ± 111.422  us/op

master

Benchmark              (bits)  (seed)   (size)  Mode  Cnt     Score    Error  Units
RadixSort.partialSort      23       0  1000000  avgt    5  8580.655 ± 91.773  us/op
RadixSort.partialSort      25       0  1000000  avgt    5  7077.933 ± 61.236  us/op

jmh/src/jmh/java/org/roaringbitmap/RadixSort.java

Ignition · 2021-04-09T10:01:56Z

LGTM. Ran it locally, I see the same results. For the cost of an extra small histogram buffer, this is a nice gain.

lemire · 2021-04-09T12:37:36Z

Saving a whole pass over the data (in all cases) is a really nice optimization. Of course, it slightly increases the size of the buffer memory, but spending another kilobyte is probably worth it for large inputs.

richardstartin · 2021-04-09T20:38:44Z

I added some more test cases and in doing so noticed a couple more cases we can optimise for just by looking at the histograms after the first pass:

all of the values have the same bits in positions 16-24, so we don't need to do the first sort
all of the values have the same bits in positions 16-32, in which case we don't allocate the copy or do any sorting at all
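
These two early exits can be sketched as follows. This is an illustration under assumptions, not the merged code: the names `EarlyExitSketch`, `singleBucket` and `prefixSum` are hypothetical, and "positions 16-24" and "16-32" are read as bits 16-23 and bits 16-31. A level is skipped when its histogram has exactly one populated bucket, and when both levels can be skipped the method returns before allocating the copy.

```java
final class EarlyExitSketch {

    // True if all `total` values fall into one bucket (counts are stored from index 1).
    static boolean singleBucket(int[] histogram, int total) {
        for (int i = 1; i < histogram.length; i++) {
            if (histogram[i] != 0) {
                return histogram[i] == total;
            }
        }
        return true; // empty input
    }

    // Converts counts into starting offsets for each bucket.
    static void prefixSum(int[] histogram) {
        for (int i = 1; i < histogram.length; i++) {
            histogram[i] += histogram[i - 1];
        }
    }

    // Stable sort by the upper 16 bits (treated as unsigned).
    static void partialSort(int[] data) {
        int[] low = new int[257];
        int[] high = new int[257];
        for (int value : data) {
            low[((value >>> 16) & 0xFF) + 1]++;
            high[(value >>> 24) + 1]++;
        }
        boolean sortLow = !singleBucket(low, data.length);
        boolean sortHigh = !singleBucket(high, data.length);
        if (!sortLow && !sortHigh) {
            return; // bits 16-31 are identical everywhere: no copy, no sorting
        }
        int[] copy = new int[data.length];
        if (sortLow) {
            prefixSum(low);
            for (int value : data) {
                copy[low[(value >>> 16) & 0xFF]++] = value;
            }
            if (!sortHigh) {
                System.arraycopy(copy, 0, data, 0, data.length);
                return;
            }
            prefixSum(high);
            for (int value : copy) {
                data[high[value >>> 24]++] = value;
            }
        } else {
            // bits 16-23 are identical everywhere: only the high byte needs sorting
            prefixSum(high);
            for (int value : data) {
                copy[high[value >>> 24]++] = value;
            }
            System.arraycopy(copy, 0, data, 0, data.length);
        }
    }
}
```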

When we can skip the copy and the sort, we get out a lot quicker:

Benchmark                                           (bits)  (seed)   (size)  Mode  Cnt     Score       Error   Units
RadixSort.partialSort                                   16       0  1000000  avgt    5  2197.522 ±   107.813   us/op
RadixSort.partialSort:·gc.alloc.rate                    16       0  1000000  avgt    5     0.494 ±     0.052  MB/sec
RadixSort.partialSort:·gc.alloc.rate.norm               16       0  1000000  avgt    5  2097.281 ±     1.588    B/op
RadixSort.partialSort:·gc.churn.G1_Eden_Space           16       0  1000000  avgt    5     1.830 ±    15.757  MB/sec
RadixSort.partialSort:·gc.churn.G1_Eden_Space.norm      16       0  1000000  avgt    5  8000.035 ± 68882.713    B/op
RadixSort.partialSort:·gc.count                         16       0  1000000  avgt    5     1.000              counts
RadixSort.partialSort:·gc.time                          16       0  1000000  avgt    5     2.000                  ms

Compared to e.g.

Benchmark                                           (bits)  (seed)   (size)  Mode  Cnt        Score         Error   Units
RadixSort.partialSort                                   24       0  1000000  avgt    5     4883.339 ±     117.287   us/op
RadixSort.partialSort:·gc.alloc.rate                    24       0  1000000  avgt    5      455.707 ±       6.660  MB/sec
RadixSort.partialSort:·gc.alloc.rate.norm               24       0  1000000  avgt    5  4002114.282 ±       0.293    B/op
RadixSort.partialSort:·gc.churn.G1_Eden_Space           24       0  1000000  avgt    5        4.996 ±       2.864  MB/sec
RadixSort.partialSort:·gc.churn.G1_Eden_Space.norm      24       0  1000000  avgt    5    43882.183 ±   25380.760    B/op
RadixSort.partialSort:·gc.churn.G1_Old_Gen              24       0  1000000  avgt    5      468.996 ±     137.111  MB/sec
RadixSort.partialSort:·gc.churn.G1_Old_Gen.norm         24       0  1000000  avgt    5  4118724.911 ± 1197359.334    B/op
RadixSort.partialSort:·gc.count                         24       0  1000000  avgt    5       29.000                counts
RadixSort.partialSort:·gc.time                          24       0  1000000  avgt    5       16.000                    ms

I also applied the suggestion to mask out rather than bound the random numbers in the benchmark for better comparability.
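
For illustration, the difference between masking and bounding might look like the sketch below. The class name, method names and the choice of `SplittableRandom` are assumptions; the benchmark's actual generator is not shown in this thread. One property of the masked version in this sketch is that, for a fixed seed, inputs generated for different bit widths share the same low-order bits.

```java
import java.util.SplittableRandom;

final class BenchmarkInputSketch {

    // Masking: draw full 32-bit values and clear everything above `bits`.
    static int[] masked(int size, int bits, long seed) {
        SplittableRandom random = new SplittableRandom(seed);
        int mask = bits >= 32 ? -1 : (1 << bits) - 1;
        int[] data = new int[size];
        for (int i = 0; i < size; i++) {
            data[i] = random.nextInt() & mask;
        }
        return data;
    }

    // Bounding: draw values in [0, 2^bits); only valid for bits <= 30 here.
    static int[] bounded(int size, int bits, long seed) {
        SplittableRandom random = new SplittableRandom(seed);
        int[] data = new int[size];
        for (int i = 0; i < size; i++) {
            data[i] = random.nextInt(1 << bits);
        }
        return data;
    }
}
```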

richardstartin requested a review from lemire April 8, 2021 23:28

richardstartin force-pushed the early-exit-radix-sort branch from 5782fba to 2c07934 Compare April 8, 2021 23:43

richardstartin force-pushed the early-exit-radix-sort branch from 2c07934 to ba8ee8e Compare April 9, 2021 00:25

richardstartin mentioned this pull request Apr 9, 2021

partialRadixSort optimisation #472

Closed

Ignition reviewed Apr 9, 2021

lemire approved these changes Apr 9, 2021

richardstartin force-pushed the early-exit-radix-sort branch from ba8ee8e to 6e195f5 Compare April 9, 2021 20:32

partial radix sort with early exit

5f24b72

richardstartin force-pushed the early-exit-radix-sort branch from 6e195f5 to 5f24b72 Compare April 9, 2021 20:50

richardstartin merged commit 13b1376 into RoaringBitmap:master Apr 9, 2021

richardstartin deleted the early-exit-radix-sort branch April 9, 2021 21:59
