partial radix sort with early exit #474
Branch updated from 5782fba to 2c07934.
[Benchmark table: 1M elements, branch vs. master]
Branch updated from 2c07934 to ba8ee8e.
LGTM. I ran it locally and see the same results. For the cost of an extra small histogram buffer, this is a nice gain.
Saving a whole pass over the data (in all cases) is a really nice optimization. Of course, it slightly increases the size of the buffer memory, but spending another kilobyte is probably worth it for large inputs.
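One common way to get this saving is to count every digit position in a single pass over the input, instead of running a separate counting pass before each scatter pass, at the price of keeping one histogram per digit. A minimal sketch, assuming non-negative integer keys and 8-bit digits (256 buckets, so with 32-bit counters each extra histogram is 256 × 4 = 1024 bytes, roughly the extra kilobyte mentioned above); `all_histograms` and the constants are hypothetical names, not the PR's code:

```python
RADIX_BITS = 8                 # assumed digit width: 256 buckets per histogram
RADIX = 1 << RADIX_BITS
MASK = RADIX - 1


def all_histograms(keys, num_digits):
    """Count every digit position in one pass over `keys`.

    Returns `num_digits` histograms; hists[d][b] is the number of keys
    whose d-th digit (least significant first) equals b.
    """
    hists = [[0] * RADIX for _ in range(num_digits)]
    for k in keys:
        for d in range(num_digits):
            hists[d][(k >> (d * RADIX_BITS)) & MASK] += 1
    return hists
```

With two digits, a classic count-then-scatter loop reads the data four times, while counting up front reads it three times (one counting pass plus two scatter passes).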
Branch updated from ba8ee8e to 6e195f5.
I added some more test cases, and in doing so noticed a couple more cases we can optimise just by looking at the histograms after the first pass:
When we can skip the copy and the sort, we get out a lot quicker:
Compared to e.g.
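One way to act on the histograms, sketched here under the same assumptions (8-bit digits, non-negative keys below `2**key_bits`): a digit whose keys all fall into a single bucket can be skipped, because a stable scatter on it would leave the order unchanged. If every digit is like that, all keys are equal and neither the copy into the scratch buffer nor any scatter pass is needed. `radix_sort_early_exit` is a hypothetical name, and this is not necessarily the exact set of cases the PR optimises:

```python
RADIX_BITS = 8
RADIX = 1 << RADIX_BITS
MASK = RADIX - 1


def radix_sort_early_exit(keys, key_bits=32):
    """LSD radix sort of non-negative ints < 2**key_bits, skipping trivial passes.

    Returns (sorted_list, number_of_scatter_passes_run).
    """
    num_digits = (key_bits + RADIX_BITS - 1) // RADIX_BITS
    n = len(keys)

    # One counting pass builds every histogram up front.
    hists = [[0] * RADIX for _ in range(num_digits)]
    for k in keys:
        for d in range(num_digits):
            hists[d][(k >> (d * RADIX_BITS)) & MASK] += 1

    # A digit with a single populated bucket cannot reorder anything.
    live = [d for d in range(num_digits) if max(hists[d]) != n]
    if not live:
        # Every digit is constant, so all keys are equal: no scratch buffer,
        # no scatter passes (a new list is returned only for a uniform API).
        return list(keys), 0

    src = list(keys)
    dst = [0] * n  # scratch buffer: the linear extra space
    for d in live:
        shift = d * RADIX_BITS
        # Exclusive prefix sum turns counts into bucket start offsets.
        offsets, total = [0] * RADIX, 0
        for b in range(RADIX):
            offsets[b] = total
            total += hists[d][b]
        for k in src:  # stable scatter into dst
            b = (k >> shift) & MASK
            dst[offsets[b]] = k
            offsets[b] += 1
        src, dst = dst, src
    return src, len(live)
```

For example, keys that only use their low 8 bits need a single scatter pass, and a list of identical keys returns with zero passes.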
I also applied the suggestion to mask out rather than bound the random numbers in the benchmark for better comparability.
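Masking keeps every generated key within a fixed number of significant bits, with each of those bits equally likely to be set, which lines up cleanly with digit boundaries when the bit count is varied. Bounding with `%` by an arbitrary limit instead leaves the top digit only partly populated (and slightly biased toward small values when the bound does not divide the generator's range). A small sketch of both generators, with hypothetical helper names and Python's `random` module standing in for whatever RNG the benchmark actually uses:

```python
import random


def masked_keys(n, bits, seed=0):
    """n keys with at most `bits` significant bits, uniform over [0, 2**bits)."""
    rng = random.Random(seed)
    mask = (1 << bits) - 1          # keep only the low `bits` bits
    return [rng.getrandbits(32) & mask for _ in range(n)]


def bounded_keys(n, bound, seed=0):
    """n keys in [0, bound); a non-power-of-two bound skews the top bits."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) % bound for _ in range(n)]
```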
Branch updated from 6e195f5 to 5f24b72.
Since we're beating up on my 3-year-old radix sort, here's another variant, which has two benefits:
I added a benchmark that varies the number of bits in the input data. On my branch (Skylake, 2.6 GHz) I get:
On master I get:
I haven't dug into why the existing two-pass algorithm is sensitive to there being 9 leading zeros, but I suspect it depends on the histogram having a single populated bucket on the last pass.
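In an LSD sort the last pass handles the most significant digit, so leading zeros shared by every key translate directly into final passes whose histogram has only bucket 0 populated. As an illustration only, assuming 32-bit keys and 8-bit digits (which may not match the digit width of the existing two-pass code), the number of such passes can be read off the bitwise OR of the keys; `constant_top_digits` is a hypothetical helper:

```python
from functools import reduce
from operator import or_


def constant_top_digits(keys, key_bits=32, digit_bits=8):
    """Count the most-significant digits that are zero in every key.

    Assumes key_bits is a multiple of digit_bits. In an LSD radix sort these
    are the final passes, and each of them populates only bucket 0.
    """
    combined = reduce(or_, keys, 0)  # a bit is set here if any key sets it
    leading_zeros = key_bits - combined.bit_length()
    return leading_zeros // digit_bits
```

Under these assumptions, keys below `2**23` (9 leading zeros) give `constant_top_digits(...) == 1`, i.e. the last pass sees a single populated bucket.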
NB: I still worry that this code isn't production-worthy because of the linear space requirement!