Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Floyd-Rivest selection algorithm instead of std::partial_sort #16825

Merged
merged 16 commits into from Nov 10, 2020
Merged

Use Floyd-Rivest selection algorithm instead of std::partial_sort #16825

merged 16 commits into from Nov 10, 2020

Conversation

@danlark1
Copy link
Contributor

@danlark1 danlark1 commented Nov 9, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Use Floyd-Rivest algorithm, it should be the best for the ClickHouse use case of partial sorting. Bechmarks are in https://github.com/danlark1/miniselect and here

@@ -3,7 +3,7 @@
5
1 1
2 1
3 4
3 3

This comment has been minimized.

@alexey-milovidov

alexey-milovidov Nov 9, 2020
Member

Is it because of unspecified "ties" handling?

This comment has been minimized.

@danlark1

danlark1 Nov 9, 2020
Author Contributor

Yep, likely some other tests will also have this issue

Copy link
Member

@alexey-milovidov alexey-milovidov left a comment

LGTM

@alexey-milovidov alexey-milovidov self-assigned this Nov 9, 2020
@amosbird
Copy link
Member

@amosbird amosbird commented Nov 10, 2020

Can we construct any performance tests to validate this method?

@danlark1
Copy link
Contributor Author

@danlark1 danlark1 commented Nov 10, 2020

Can we construct any performance tests to validate this method?

I am happy to add more, however I am pretty sure there are enough queries with ORDER BY LIMIT N in the performance tests. I want to see them first

@danlark1
Copy link
Contributor Author

@danlark1 danlark1 commented Nov 10, 2020

@alexey-milovidov any chances ci can go through due to Yandex checks? UPD: Solved by force tests marker

@danlark1
Copy link
Contributor Author

@danlark1 danlark1 commented Nov 10, 2020

Perf tests are here. 2.5% win overall for all tests, for desc data the speedup in 1400%, some more representative benchmarks as string_sort showed 15% boost

0.350 | 0.301 | -1.163x | -0.141 | 0.140 | string_sort | 7 | SELECT Referer FROM hits_100m_single ORDER BY Referer LIMIT 300 format Null
0.435 | 0.376 | -1.154x | -0.134 | 0.133 | string_sort | 8 | SELECT Title FROM hits_100m_single ORDER BY Title LIMIT 300 format Null
0.488 | 0.423 | -1.154x | -0.134 | 0.133 | string_sort | 50 | SELECT Title FROM hits_100m_single ORDER BY Title, CounterID LIMIT 300 format Null
@danlark1
Copy link
Contributor Author

@danlark1 danlark1 commented Nov 10, 2020

One query became 5-10% slower, 15-20 became significantly faster, up to 20x

SELECT MobilePhoneModel, MobilePhoneModel FROM hits_100m_single ORDER BY MobilePhoneModel, MobilePhoneModel LIMIT 10 FORMAT Null

It is expected as Floyd-Rivest is not performing way faster when there are many equal elements in the array, it might be several percent worse. Other than that everything is really good. I believe somewhen in the future I can fix this issue and we will have even better partial sorting.

@danlark1
Copy link
Contributor Author

@danlark1 danlark1 commented Nov 10, 2020

@alexey-milovidov you can merge, the tests from 435f410 are good and after that I fixed the performance test which I checked locally, it should be good

@alexey-milovidov alexey-milovidov merged commit 41dc55c into ClickHouse:master Nov 10, 2020
3 of 9 checks passed
3 of 9 checks passed
@robot-clickhouse
ClickHouse special build check 0/5 builds finished
Details
@robot-ch-test-poll4
Docs check Started
Details
@robot-ch-test-poll4
Fast test Started
Details
@robot-ch-test-poll4
Marker check Starting checks (Labeled 'force tests')
Details
@robot-ch-test-poll2
PVS check Started
Details
@robot-ch-test-poll4
Push to dockerhub Started
Details
@robot-ch-test-poll4
Description check Description OK
Details
@robot-ch-test-poll4
Integration tests flaky check (asan) Not found changed integration tests
@robot-clickhouse
Style check Style check passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants