New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve ColumnDecimal, ColumnVector getPermutation performance using pdqsort with RadixSort #35961
Improve ColumnDecimal, ColumnVector getPermutation performance using pdqsort with RadixSort #35961
Conversation
Insertion into MergeTree. Probably not common case. CREATE TABLE test_table_insert (id Decimal64(8), value1 UInt64, value2 UInt64, value3 UInt64) ENGINE=MergeTree ORDER BY id; Before:
After:
|
Sort by decimal column. CREATE TABLE test_table_sort (id UInt64, value Decimal64(4)) ENGINE=MergeTree ORDER BY id;
INSERT INTO test_table_sort SELECT number, rand64() % 50000 FROM system.numbers LIMIT 50000000; Before: SELECT * FROM test_table_sort ORDER BY value FORMAT Null;
0 rows in set. Elapsed: 1.404 sec. Processed 50.00 million rows, 800.00 MB (35.61 million rows/s., 569.68 MB/s.) After:
|
src/Columns/ColumnDecimal.cpp
Outdated
if (!limit) | ||
{ | ||
/// A case for radix sort | ||
/// LSD RadixSort is stable | ||
if constexpr (is_arithmetic_v<NativeT> && !is_big_int_v<NativeT>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can reverse the 2 ifs and do the if (!limit)
check only for arithmetic types.
I assume compiler is smart enough to remove the branch with no code, but I think it's good to be explicit in any case.
src/Columns/ColumnVector.cpp
Outdated
::partial_sort(res.begin(), res.begin() + limit, res.end(), greater_stable(*this, nan_direction_hint)); | ||
} | ||
else | ||
if (!limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same things as above, if you reverse the checks, you can eliminate a branch for arithmetic type.
b687c9e
to
aa19007
Compare
But the numbers in #35961 (comment) look good. |
@kitaisreal May I ask what happened to this PR for not merging? |
@zhanglistar it can provide degradation in a couple of cases, for example when data is completely sorted. |
Continued in #46559. |
This is an automated comment for commit f7494a5 with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page Successful checks
|
2b1d414
to
228957a
Compare
eee5cd2
to
1b71e7f
Compare
@kitaisreal it does not build. |
0ffcb39
to
6df6155
Compare
6df6155
to
6513892
Compare
6513892
to
720b717
Compare
c9753d7
to
64f5032
Compare
64f5032
to
0057f29
Compare
0057f29
to
f7494a5
Compare
Insert into MergeTree improvements. Setup table:
Before:
After:
Before:
After:
|
Sorting already sorted data improvements: Setup table:
Before:
After:
|
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Improve performance of sorting for decimal columns. Improve performance of insertion into MergeTree if ORDER BY contains Decimal column. Improve performance of sorting when data is already sorted or almost sorted.