GH-39700: [C++] Feature: use inplace_merge to replace merge. #39701
I didn't review it carefully but generally this looks ok to me, I think this is because |
Did you run some benchmarks? You can't claim something improves performance without measuring it.
@ursabot please benchmark lang=C++ |
Benchmark runs are scheduled for commit 393e429. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete. |
Thanks for your patience. Conbench analyzed the 3 benchmarking runs that have been run so far on PR commit 393e429. There were 7 benchmark results indicating a performance regression:
The full Conbench report has more details. |
That's true, but there doesn't seem to be any improvement either. I ran the benchmarks locally and didn't see any improvement there. We can try to reason about the code changes here:
So, at least theoretically, the code in this PR is less efficient. Unless you can exhibit benchmark improvements on some configuration, I would recommend rejecting this. |
Also, if you filter for chunked array sorts on https://conbench.ursa.dev/compare/runs/6961d70de8424138aaf0b77dc6cba908...d3ea371166c146d4845ac4625d70d2ad/ and https://conbench.ursa.dev/compare/runs/d0e8a5b5cde24106b3a2c60699933ea1...37da92dc9fca4e159568f2563f562a1d/, you'll see that most benchmarks show a slight performance decrease (between 0 and 10%). |
Indeed, my local testing also shows a slight slowdown.
A possible experiment would be to use three-way merging instead of two-way merging. This might increase performance as indexing a chunked array is not trivial. |
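To make the three-way idea concrete, here is a minimal sketch of merging three sorted runs in a single pass. It is not Arrow code: `ThreeWayMerge` is a hypothetical helper, and it works on plain `std::vector` inputs rather than on chunked-array indices. Ties are resolved in favor of the earlier run, so the merge stays stable.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Arrow): merge three sorted runs in one pass.
// On ties, the element from the earlier run wins, which keeps the merge stable.
template <typename T, typename Compare>
std::vector<T> ThreeWayMerge(const std::vector<T>& a, const std::vector<T>& b,
                             const std::vector<T>& c, Compare comp) {
  std::vector<T> out;
  out.reserve(a.size() + b.size() + c.size());
  std::size_t i = 0, j = 0, k = 0;
  while (i < a.size() || j < b.size() || k < c.size()) {
    const T* best = nullptr;
    int pick = -1;
    if (i < a.size()) {
      best = &a[i];
      pick = 0;
    }
    // A later run only wins if its head is strictly smaller than the current best.
    if (j < b.size() && (best == nullptr || comp(b[j], *best))) {
      best = &b[j];
      pick = 1;
    }
    if (k < c.size() && (best == nullptr || comp(c[k], *best))) {
      best = &c[k];
      pick = 2;
    }
    out.push_back(*best);
    if (pick == 0) {
      ++i;
    } else if (pick == 1) {
      ++j;
    } else {
      ++k;
    }
  }
  return out;
}
```

Each output element costs at most two comparisons here, but the data is traversed in one pass instead of two rounds of two-way merging. Whether that helps for chunked arrays would have to be measured.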
std::inplace_merge can allocate a temporary buffer, and the conditions for falling back to the non-allocating O(N log N) algorithm are not specified (IIUC, libcxx unconditionally allocates a buffer). Therefore, I think this PR should implement the non-allocating merge algorithm directly rather than use std::inplace_merge. That will give us a more reliable performance comparison between the two approaches, and might suggest our own explicit heuristic for choosing between them. See libstdc++ for an example.
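For reference, a non-allocating merge can be sketched with a rotation-based divide and conquer, similar in spirit to the bufferless fallback in libstdc++. This is an illustrative sketch, not the libstdc++ source and not Arrow code; `MergeWithoutBuffer` is a hypothetical name.

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical sketch: stable merge of the adjacent sorted ranges
// [first, middle) and [middle, last) without any temporary buffer.
template <typename It, typename Compare>
void MergeWithoutBuffer(It first, It middle, It last, Compare comp) {
  auto len1 = std::distance(first, middle);
  auto len2 = std::distance(middle, last);
  if (len1 == 0 || len2 == 0) return;
  if (len1 + len2 == 2) {
    // One element on each side: swap only if strictly out of order.
    if (comp(*middle, *first)) std::iter_swap(first, middle);
    return;
  }
  It first_cut;
  It second_cut;
  if (len1 > len2) {
    // Split the longer (left) run in half and find the matching cut on the right.
    first_cut = std::next(first, len1 / 2);
    second_cut = std::lower_bound(middle, last, *first_cut, comp);
  } else {
    // Split the right run in half and find the matching cut on the left.
    second_cut = std::next(middle, len2 / 2);
    first_cut = std::upper_bound(first, middle, *second_cut, comp);
  }
  // Bring the small part of the right run in front of the large part of the left run.
  It new_middle = std::rotate(first_cut, middle, second_cut);
  MergeWithoutBuffer(first, first_cut, new_middle, comp);
  MergeWithoutBuffer(new_middle, second_cut, last, comp);
}
```

Having both this variant and the buffered merge in the codebase would let the benchmarks compare them directly, independent of what a given standard library decides to do inside std::inplace_merge.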
Rationale for this change
Just like StarRocks/starrocks#14609, we can use std::inplace_merge to replace std::merge.
Since our indices already live in one contiguous block of memory, using std::inplace_merge lets us drop the temp_indices allocation and the std::copy calls that follow each merge. That is a natural advantage for us, so I think std::inplace_merge is the better fit here.
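The difference between the two approaches can be sketched as follows. This is a simplified illustration, not the actual code in vector_sort.cc: `MergeWithTemp`, `MergeInPlace`, and the `values` vector are hypothetical stand-ins, and only the name `temp_indices` is taken from the description above. Both functions merge the sorted runs `[0, mid)` and `[mid, size)` of an index array, ordering indices by the values they point to.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the current approach: merge into a scratch
// buffer, then copy the merged result back over the original indices.
void MergeWithTemp(std::vector<std::uint64_t>& indices, std::size_t mid,
                   const std::vector<int>& values,
                   std::vector<std::uint64_t>& temp_indices) {
  auto less = [&](std::uint64_t l, std::uint64_t r) { return values[l] < values[r]; };
  auto middle = indices.begin() + static_cast<std::ptrdiff_t>(mid);
  temp_indices.resize(indices.size());
  std::merge(indices.begin(), middle, middle, indices.end(),
             temp_indices.begin(), less);
  std::copy(temp_indices.begin(), temp_indices.end(), indices.begin());
}

// Hypothetical stand-in for the proposed approach: no caller-managed scratch
// buffer and no explicit copy. Note that std::inplace_merge may still
// allocate a temporary buffer internally.
void MergeInPlace(std::vector<std::uint64_t>& indices, std::size_t mid,
                  const std::vector<int>& values) {
  auto less = [&](std::uint64_t l, std::uint64_t r) { return values[l] < values[r]; };
  auto middle = indices.begin() + static_cast<std::ptrdiff_t>(mid);
  std::inplace_merge(indices.begin(), middle, indices.end(), less);
}
```

Both functions produce the same ordering, so any difference shows up only in allocation and copy cost, which is what the benchmarks need to measure.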
What changes are included in this PR?
The sort operator in vector_sort.cc.
Are these changes tested?
Yes, by running vector_sort_test.cc.
Are there any user-facing changes?
No.