Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-39700: [C++] Feature: use inplace_merge to replace merge. #39701

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Light-City
Copy link
Contributor

@Light-City Light-City commented Jan 19, 2024

Rationale for this change

just like StarRocks/starrocks#14609

we can use std::inplace_merge to replace merge.

Since our indices are also a whole block of memory, after using std::inplace_merge, we can reduce the memory allocation of temp_indices and reduce std::copy operations, which has a natural advantage for us, so here I think std::inplace_merge is more suitable

What changes are included in this PR?

sort operator, vector_sort.cc

Are these changes tested?

yes, run vector_sort_test.cc

Are there any user-facing changes?

no.

Copy link

⚠️ GitHub issue #39700 has been automatically assigned in GitHub to PR creator.

@mapleFU
Copy link
Member

mapleFU commented Jan 19, 2024

I didn't review it carefully but generally this looks ok to me, I think this is because inplace_merge is introduced in C++17, so previous code don't use it.
Would you mind fix the lint first?

@Light-City
Copy link
Contributor Author

Details
The formatting problem has been solved, thank you

@pitrou
Copy link
Member

pitrou commented Jan 19, 2024

Did you run some benchmarks? You can't claim something improves performance without measuring it.

std::inplace_merge does not avoid a memory allocation, it just allocates the memory implicitly. Quoting the documentation:

This function attempts to allocate a temporary buffer. If the allocation fails, the less efficient algorithm is chosen.

@pitrou
Copy link
Member

pitrou commented Jan 19, 2024

@ursabot please benchmark lang=C++

@ursabot
Copy link

ursabot commented Jan 19, 2024

Benchmark runs are scheduled for commit 393e429. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

Copy link

Thanks for your patience. Conbench analyzed the 3 benchmarking runs that have been run so far on PR commit 393e429.

There were 7 benchmark results indicating a performance regression:

The full Conbench report has more details.

@Light-City
Copy link
Contributor Author

https://conbench.ursa.dev
It seems that the regression results of the benchmark have nothing to do with my changes.

@pitrou
Copy link
Member

pitrou commented Jan 19, 2024

It seems that the regression results of the benchmark have nothing to do with my changes.

That's true, but there doesn't seem to be any improvement either.

I've run the benchmarks locally and neither do I see any improvement.

We can try to reason on the code changes here:

  • the original code allocates the temporary buffer once at the beginning of the sort operation, and reuses it for all std::merge calls
  • the changed code lets std::inplace_merge allocate a new temporary buffer at each invocation

So, at least theoretically, the code in this PR is less efficient. Unless you can exhibit benchmark improvements on some configuration, I would recommend rejecting this.

@pitrou
Copy link
Member

pitrou commented Jan 19, 2024

Also, if you filter for chunked array sorts on https://conbench.ursa.dev/compare/runs/6961d70de8424138aaf0b77dc6cba908...d3ea371166c146d4845ac4625d70d2ad/ and https://conbench.ursa.dev/compare/runs/d0e8a5b5cde24106b3a2c60699933ea1...37da92dc9fca4e159568f2563f562a1d/, you'll see that most benchmarks show a slight performance decrease (between 0 and 10%).

@Light-City
Copy link
Contributor Author

Also, if you filter for chunked array sorts on https://conbench.ursa.dev/compare/runs/6961d70de8424138aaf0b77dc6cba908...d3ea371166c146d4845ac4625d70d2ad/ and https://conbench.ursa.dev/compare/runs/d0e8a5b5cde24106b3a2c60699933ea1...37da92dc9fca4e159568f2563f562a1d/, you'll see that most benchmarks show a slight performance decrease (between 0 and 10%).

Indeed....local testing is down a bit

@pitrou
Copy link
Member

pitrou commented Jan 22, 2024

A possible experiment would be to use three-way merging instead of two-way merging. This might increase performance as indexing a chunked array is not trivial.

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::inplace_merge can allocate a temporary buffer and the conditions for falling back to the non-allocating Nlog(N) algorithm are not specified (IIUC, libcxx unconditionally allocates a buffer). Therefore, I think this PR should instead rewrite the non-allocating merge algorithm rather than use std::inplace_merge. That will give us a more reliable performance comparison between the two approaches, and might suggest our own explicit heuristic for choosing between them. See libstdc++ for an example

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++] Feature: use inplace_merge to replace merge.
5 participants