[WIP] Add MSD radix sort #5129
@@ -136,7 +136,7 @@ class QuantileTDigest
{
if (unmerged > 0)
{
- RadixSort<RadixSortTraits>::execute(summary.data(), summary.size());
+ RadixSort<RadixSortTraits>::executeLsd(summary.data(), summary.size());
Abbreviations are always written in all caps, for example: HTML, not Html; JSON, not Json.
dbms/src/Common/RadixSort.h
Outdated
static KeyBits forward(KeyBits x) { return x; }
static KeyBits backward(KeyBits x) { return x; }
static bool compare(TElement x, TElement y)
It is named `less`, not `compare`.
Why do we compare Elements? Shouldn't we compare Keys instead?
Yes, that's a mistake.
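As a rough illustration of the point agreed on above, here is a minimal sketch of a traits struct whose comparison is named `less` and is defined on keys rather than on whole elements. `ExamplePairTraits`, `extractKey` and `elementLess` are hypothetical names chosen for this sketch; they are not taken from the PR.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical traits: sort (key, payload) pairs by their 32-bit key only.
struct ExamplePairTraits
{
    using Element = std::pair<uint32_t, uint32_t>;
    using Key = uint32_t;

    // Returns the part of the element that participates in sorting.
    static Key extractKey(const Element & elem) { return elem.first; }

    // Named `less` and defined on keys, so a comparison-based fallback
    // orders elements by the same criterion as the radix passes.
    static bool less(Key x, Key y) { return x < y; }
};

// Compares two elements strictly through their keys; the payload is ignored.
inline bool elementLess(const ExamplePairTraits::Element & a, const ExamplePairTraits::Element & b)
{
    using Traits = ExamplePairTraits;
    return Traits::less(Traits::extractKey(a), Traits::extractKey(b));
}
```

With this shape, two elements that share a key but differ in payload compare as equivalent, which is what a key-based radix sort assumes.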
dbms/src/Common/RadixSort.h
Outdated
@@ -163,6 +180,8 @@ struct RadixSort
using CountType = typename Traits::CountType;
using KeyBits = typename Traits::KeyBits;

static constexpr size_t INSERT_SORT_THRESHOLD = 64;
Missing comment (what it is; the motivation for the constant; how it was derived).
PS. It is named "insertion sort", not "insert sort".
Added a comment.
This constant is used in kxsort. I did not attempt to optimize it.
dbms/src/Common/RadixSort.h
Outdated
@@ -179,8 +198,101 @@ struct RadixSort
static KeyBits keyToBits(Key x) { return ext::bit_cast<KeyBits>(x); }
static Key bitsToKey(KeyBits x) { return ext::bit_cast<Key>(x); }

static inline void insertSortInternal(Element * arr, size_t size)
{
for (Element * i = arr + 1; i < arr + size; ++i)
Need to check if `arr + size` is calculated in every loop iteration (it is possible due to aliasing).
We must check if `std::sort` or `pdqsort` will be better.
Created a pre-computed variable to ensure this never happens.
dbms/src/Common/RadixSort.h
Outdated
@@ -179,8 +198,101 @@ struct RadixSort
static KeyBits keyToBits(Key x) { return ext::bit_cast<KeyBits>(x); }
static Key bitsToKey(KeyBits x) { return ext::bit_cast<Key>(x); }

static inline void insertSortInternal(Element * arr, size_t size)
No motivation for `inline`.
Removed the specifier. It does not affect performance.
dbms/src/Common/RadixSort.h
Outdated
template <int PASS>
static inline void msdRadixSortInternal(Element * arr, size_t size, size_t limit)
{
Element *last_[HISTOGRAM_SIZE + 1];
Style.
dbms/src/Common/RadixSort.h
Outdated
last_[0] = last_[1] = arr;

size_t bucketsForRecursion = HISTOGRAM_SIZE;
Style.
dbms/src/Common/RadixSort.h
Outdated
static inline void msdRadixSortInternal(Element * arr, size_t size, size_t limit)
{
Element *last_[HISTOGRAM_SIZE + 1];
Element ** last = last_ + 1;
Missing comment. What is going on here?
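One plausible reading of the `last_` / `last` pair, based on how kxsort uses the same pattern: the array holds one extra slot in front so that `last[-1]` is a valid lookup, which lets bucket `b` always be described as the range from `last[b - 1]` to `last[b]`, including bucket 0. The sketch below illustrates that idea for `uint32_t` values with more descriptive names. `partitionByByte`, `bucket_storage` and `bucket_end` are hypothetical, and `HISTOGRAM_SIZE` is declared as `int` here only to keep the signed index arithmetic simple.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

static constexpr int HISTOGRAM_SIZE = 256;

// Sketch: partitions arr[0..size) in place by the byte selected with `shift`.
// On return, bucket b occupies [bucket_storage[b], bucket_storage[b + 1]).
inline void partitionByByte(uint32_t * arr, size_t size, unsigned shift,
                            uint32_t * (&bucket_storage)[HISTOGRAM_SIZE + 1])
{
    // Offset view: bucket_end[-1] aliases bucket_storage[0], so the start of
    // bucket 0 can be read the same way as the start of any other bucket.
    uint32_t ** bucket_end = bucket_storage + 1;
    uint32_t * const end_of_range = arr + size;

    size_t count[HISTOGRAM_SIZE] = {};
    for (size_t i = 0; i < size; ++i)
        ++count[(arr[i] >> shift) & 0xFF];

    // Initially each bucket_end[b] is the *start* of bucket b (a write cursor).
    bucket_end[-1] = bucket_end[0] = arr;
    for (int b = 1; b < HISTOGRAM_SIZE; ++b)
        bucket_end[b] = bucket_end[b - 1] + count[b - 1];

    for (int b = 0; b < HISTOGRAM_SIZE; ++b)
    {
        uint32_t * end = bucket_end[b - 1] + count[b];
        if (end == end_of_range)
        {
            // Everything left already belongs to bucket b.
            bucket_end[b] = end_of_range;
            break;
        }
        while (bucket_end[b] != end)
        {
            // Pass the displaced value along until one that belongs to bucket b turns up.
            uint32_t value = *bucket_end[b];
            int digit = (value >> shift) & 0xFF;
            while (digit != b)
            {
                std::swap(value, *bucket_end[digit]++);
                digit = (value >> shift) & 0xFF;
            }
            *bucket_end[b]++ = value;
        }
    }
}
```

After the call, each cursor has advanced to the end of its bucket, so a recursive MSD step could sort bucket `b` as the range between two neighbouring entries without special-casing the first bucket.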
dbms/src/Common/RadixSort.h
Outdated
}
}

template <int PASS>
int?
dbms/src/Common/RadixSort.h
Outdated
}

template <int PASS>
static inline void msdRadixSortInternal(Element * arr, size_t size, size_t limit)
Missing comment.
dbms/src/Common/RadixSort.h
Outdated
static inline void msdRadixSortInternal(Element * arr, size_t size, size_t limit)
{
Element *last_[HISTOGRAM_SIZE + 1];
Element ** last = last_ + 1;
Terrible naming.
dbms/src/Common/RadixSort.h
Outdated
public:
static void execute(Element * arr, size_t size)
/* Least significant digit radix sort
* The most efficient stable general-purpose sorting algorithm
This statement is unfounded.
dbms/src/Common/RadixSort.h
Outdated
/* Most significant digit radix sort
* Usually slower than LSD and is not stable, but allows partial sorting
* Based on https://github.com/voutcn/kxsort
You have partially copied some implementation details, which requires mentioning the copyright and license.
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
For changelog. Remove if this is non-significant change.
Category (leave one):
Short description (up to few sentences):
Implemented MSD radix sort (based on kxsort), and partial sorting
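
To make the "partial sorting" part of the description concrete, here is a simplified sketch of an MSD radix sort for `uint32_t` that only guarantees the order of the first `limit` positions. It is not the PR's implementation: it distributes through a scratch buffer instead of permuting in place as kxsort does, it falls back to `std::partial_sort` for small ranges, and the name `msdRadixSortPartial` is hypothetical. The threshold of 64 is the value the thread says was taken from kxsort.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Ranges at or below this size go to a comparison sort (value from kxsort, untuned).
static constexpr size_t INSERTION_SORT_THRESHOLD = 64;

// Sketch: after the call, arr[0..min(limit, size)) holds the smallest elements in order.
// `shift` selects the current byte; 24 is the most significant byte of uint32_t.
inline void msdRadixSortPartial(uint32_t * arr, size_t size, size_t limit, unsigned shift = 24)
{
    if (size < 2 || limit == 0)
        return;

    if (size <= INSERTION_SORT_THRESHOLD)
    {
        std::partial_sort(arr, arr + std::min(limit, size), arr + size);
        return;
    }

    // offsets[b] becomes the start index of bucket b; offsets[256] equals size.
    std::array<size_t, 257> offsets{};
    for (size_t i = 0; i < size; ++i)
        ++offsets[((arr[i] >> shift) & 0xFF) + 1];
    for (size_t b = 0; b < 256; ++b)
        offsets[b + 1] += offsets[b];

    // Distribute elements into their buckets via a copy of the input.
    std::vector<uint32_t> scratch(arr, arr + size);
    std::array<size_t, 257> cursor = offsets;
    for (uint32_t x : scratch)
        arr[cursor[(x >> shift) & 0xFF]++] = x;

    if (shift == 0)
        return; // last byte: elements within a bucket are equal

    for (size_t b = 0; b < 256; ++b)
    {
        size_t begin = offsets[b];
        size_t end = offsets[b + 1];
        if (begin >= limit)
            break; // this bucket and all later ones lie past the requested prefix
        msdRadixSortPartial(arr + begin, end - begin, limit - begin, shift - 8);
    }
}
```

Calling `msdRadixSortPartial(data, n, k)` leaves the `k` smallest values sorted at the front; the early `break` is where MSD order saves work compared with a full sort, since buckets beyond the prefix are never recursed into.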