Use Floyd-Rivest selection algorithm instead of std::partial_sort #16825
```
@@ -3,7 +3,7 @@
5
1 1
2 1
3 4
3 3
```
alexey-milovidov (Member), Nov 9, 2020
Is it because of unspecified "ties" handling?
danlark1 (Author, Contributor), Nov 9, 2020
Yep, some other tests will likely have the same issue.
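On the ties question: `std::partial_sort` and a Floyd-Rivest based partial sort are both unstable, so rows whose sort keys compare equal can come out in a different order when the algorithm changes. The following is a minimal sketch with a hypothetical `Row` type and hypothetical helper names (not ClickHouse code). It shows that comparing by key alone leaves the order of tied rows unspecified, while adding a tie-breaker makes the expected output unique.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical row: {sort key, payload column that is not part of ORDER BY}.
using Row = std::pair<int, int>;

// Like ORDER BY key LIMIT n: only keys are compared, so the relative order of
// rows with equal keys is unspecified and may differ between algorithms.
std::vector<Row> top_n_by_key(std::vector<Row> rows, std::size_t n) {
    n = std::min(n, rows.size());
    std::partial_sort(rows.begin(), rows.begin() + static_cast<std::ptrdiff_t>(n), rows.end(),
                      [](const Row& a, const Row& b) { return a.first < b.first; });
    rows.resize(n);
    return rows;
}

// Like ORDER BY key, payload LIMIT n: std::pair's operator< compares key, then
// payload, which is a total order, so the result does not depend on the algorithm.
std::vector<Row> top_n_total_order(std::vector<Row> rows, std::size_t n) {
    n = std::min(n, rows.size());
    std::partial_sort(rows.begin(), rows.begin() + static_cast<std::ptrdiff_t>(n), rows.end());
    rows.resize(n);
    return rows;
}
```

With illustrative rows `{3, 4}, {1, 1}, {3, 3}, {2, 1}, {5, 0}`, `top_n_by_key(rows, 3)` always yields keys 1, 2, 3, but the payload of the key-3 row may be either 3 or 4. `top_n_total_order(rows, 4)` always yields `{1, 1}, {2, 1}, {3, 3}, {3, 4}`. One option for tests whose reference output depends on tie order is to extend their ORDER BY until it is a total order.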
LGTM
Can we construct any performance tests to validate this method?
I am happy to add more; however, I am pretty sure the performance tests already contain enough queries with ORDER BY ... LIMIT N. I want to see their results first.
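ORDER BY ... LIMIT N does not need a full sort: the N smallest rows can be found by a selection step, and then only that N-element prefix is sorted. A minimal sketch, assuming a hypothetical `order_by_limit` helper and using `std::nth_element` as a stand-in for the selection step (this is not ClickHouse's code path). For comparison, common standard-library implementations of `std::partial_sort` are heap-based instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the n smallest values in ascending order, i.e. ORDER BY v LIMIT n.
// Selection is linear on average; only the n-element prefix is then sorted.
template <typename T>
std::vector<T> order_by_limit(std::vector<T> v, std::size_t n) {
    if (n == 0) return {};
    if (n >= v.size()) {
        std::sort(v.begin(), v.end());
        return v;
    }
    const auto nth = v.begin() + static_cast<std::ptrdiff_t>(n - 1);
    // Afterwards *nth is the n-th smallest value and every element before it
    // compares <= *nth. A Floyd-Rivest select could fill this role instead.
    std::nth_element(v.begin(), nth, v.end());
    std::sort(v.begin(), nth + 1);  // sort only the selected prefix
    v.erase(nth + 1, v.end());      // drop everything past LIMIT n
    return v;
}
```

The split into "select, then sort the prefix" is the structure that lets a faster selection algorithm speed up these queries without touching the final sort.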
@alexey-milovidov is there any chance CI can get through the Yandex checks? UPD: solved with the force tests marker.
Perf tests are here. Overall there is a 2.5% win across all tests; for descending data the speedup is 1400%, and more representative benchmarks such as string_sort showed a 15% boost.
One query became 5-10% slower, while 15-20 became significantly faster, up to 20x. This is expected: Floyd-Rivest is not much faster when the array contains many equal elements, and it can be several percent slower. Other than that, everything looks really good. I believe I can fix this issue at some point in the future, which would make partial sorting even better.
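Floyd-Rivest selection first recurses on a small sample window around the target position, so the pivot it then uses is very likely close to the k-th element, and each partition pass discards most of the range. A minimal sketch following the commonly published form of the algorithm (cutoff 600, sample size about n^(2/3)); the name `floyd_rivest_select` is hypothetical, and this is not the miniselect or ClickHouse implementation. Both inner scans stop on elements equal to the pivot, so long runs of duplicates cost extra swaps. This is one plausible, unmeasured contributor to the behaviour on arrays with many equal elements mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Rearranges a[left..right] (inclusive) so that a[k] is the element that would be
// at index k after sorting that range; a[left..k-1] <= a[k] <= a[k+1..right].
template <typename T>
void floyd_rivest_select(std::vector<T>& a, std::ptrdiff_t left, std::ptrdiff_t right,
                         std::ptrdiff_t k) {
    assert(0 <= left && left <= k && k <= right &&
           right < static_cast<std::ptrdiff_t>(a.size()));
    while (right > left) {
        if (right - left > 600) {
            // Sampling step: select k inside a smaller window that very likely
            // contains the k-th element, which leaves a good pivot at a[k].
            const double n = static_cast<double>(right - left + 1);
            const double pos = static_cast<double>(k - left + 1);
            const double z = std::log(n);
            const double s = 0.5 * std::exp(2.0 * z / 3.0);  // sample size ~ n^(2/3)
            const double sd = 0.5 * std::sqrt(z * s * (n - s) / n) * (pos < n / 2 ? -1.0 : 1.0);
            const std::ptrdiff_t new_left =
                std::max(left, static_cast<std::ptrdiff_t>(k - pos * s / n + sd));
            const std::ptrdiff_t new_right =
                std::min(right, static_cast<std::ptrdiff_t>(k + (n - pos) * s / n + sd));
            floyd_rivest_select(a, new_left, new_right, k);
        }
        // Hoare-style partition of [left, right] around t = a[k]; after the first
        // swap, a[left] <= t and a[right] >= t act as sentinels for the scans.
        const T t = a[k];
        std::ptrdiff_t i = left;
        std::ptrdiff_t j = right;
        std::swap(a[left], a[k]);
        if (t < a[right]) std::swap(a[left], a[right]);
        while (i < j) {
            std::swap(a[i], a[j]);
            ++i;
            --j;
            while (a[i] < t) ++i;
            while (t < a[j]) --j;
        }
        // Put a pivot-equivalent value at its final position j.
        if (!(a[left] < t) && !(t < a[left])) {
            std::swap(a[left], a[j]);
        } else {
            ++j;
            std::swap(a[j], a[right]);
        }
        // Keep only the side that still contains index k.
        if (j <= k) left = j + 1;
        if (k <= j) right = j - 1;
    }
}
```

Calling `floyd_rivest_select(v, 0, size - 1, n - 1)` and then sorting `v[0..n-1]` gives the same select-then-sort shape as a partial sort.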
@alexey-milovidov you can merge: the tests from 435f410 are good, and after that commit I fixed the performance test, which I checked locally, so it should be good.
41dc55c into ClickHouse:master
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Use the Floyd-Rivest algorithm for partial sorting; it should be the best fit for the ClickHouse use case. Benchmarks are in https://github.com/danlark1/miniselect and here.