Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add MSD radix sort #5129

Merged
merged 3 commits into from Apr 29, 2019
Merged

[WIP] Add MSD radix sort #5129

merged 3 commits into from Apr 29, 2019

Conversation

kvinty
Copy link
Contributor

@kvinty kvinty commented Apr 26, 2019

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

For changelog. Remove if this is non-significant change.

Category (leave one):

  • New Feature

Short description (up to few sentences):
Implemented MSD radix sort (based on kxsort), and partial sorting

@@ -136,7 +136,7 @@ class QuantileTDigest
{
if (unmerged > 0)
{
RadixSort<RadixSortTraits>::execute(summary.data(), summary.size());
RadixSort<RadixSortTraits>::executeLsd(summary.data(), summary.size());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abbreviations are always in all caps, example: HTML, not Html, JSON, not Json.


static KeyBits forward(KeyBits x) { return x; }
static KeyBits backward(KeyBits x) { return x; }
static bool compare(TElement x, TElement y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is named less, not compare.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we compare Elements, should we compare Keys instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's a mistake.

@@ -163,6 +180,8 @@ struct RadixSort
using CountType = typename Traits::CountType;
using KeyBits = typename Traits::KeyBits;

static constexpr size_t INSERT_SORT_THRESHOLD = 64;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing comment (what it is; the motivation for constant: how it was derived)
PS. It is names "insertion sort", not "insert sort".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment.
This constant is used in kxsort. I did not attempt to optimize it.

@@ -179,8 +198,101 @@ struct RadixSort
static KeyBits keyToBits(Key x) { return ext::bit_cast<KeyBits>(x); }
static Key bitsToKey(KeyBits x) { return ext::bit_cast<Key>(x); }

static inline void insertSortInternal(Element * arr, size_t size)
{
for (Element * i = arr + 1; i < arr + size; ++i)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check if arr + size is calculated in every loop iteration (it is possible due to aliasing).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must check if std::sort or pdqsort will be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a pre-computed variable to ensure this never happens.

@@ -179,8 +198,101 @@ struct RadixSort
static KeyBits keyToBits(Key x) { return ext::bit_cast<KeyBits>(x); }
static Key bitsToKey(KeyBits x) { return ext::bit_cast<Key>(x); }

static inline void insertSortInternal(Element * arr, size_t size)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No motivation for inline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the specifier. It does not affect performance.

template <int PASS>
static inline void msdRadixSortInternal(Element * arr, size_t size, size_t limit)
{
Element *last_[HISTOGRAM_SIZE + 1];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style.


last_[0] = last_[1] = arr;

size_t bucketsForRecursion = HISTOGRAM_SIZE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style.

static inline void msdRadixSortInternal(Element * arr, size_t size, size_t limit)
{
Element *last_[HISTOGRAM_SIZE + 1];
Element ** last = last_ + 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing comment. What is going on here?

}
}

template <int PASS>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int?

}

template <int PASS>
static inline void msdRadixSortInternal(Element * arr, size_t size, size_t limit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing comment.

static inline void msdRadixSortInternal(Element * arr, size_t size, size_t limit)
{
Element *last_[HISTOGRAM_SIZE + 1];
Element ** last = last_ + 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Terrible naming.

public:
static void execute(Element * arr, size_t size)
/* Least significant digit radix sort
* The most efficient stable general-purpose sorting algorithm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement is unfounded.


/* Most significant digit radix sort
* Usually slower than LSD and is not stable, but allows partial sorting
* Based on https://github.com/voutcn/kxsort
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have partially copied some implementation details, but it requires to mention copyright and license.

@alexey-milovidov alexey-milovidov merged commit 0a88bcb into ClickHouse:master Apr 29, 2019
@abyss7 abyss7 added the pr-feature Pull request with new product feature label Apr 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-feature Pull request with new product feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants