Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++][Compute] Refine compare sorting kernel #24336

Closed
asfimport opened this issue Mar 16, 2020 · 3 comments
Closed

[C++][Compute] Refine compare sorting kernel #24336

asfimport opened this issue Mar 16, 2020 · 3 comments

Comments

@asfimport
Copy link

Sorting kernel implements two comparison functions, CompareValues use array.Value() for numeric data and CompareViews uses array.GetView() for non-numeric ones. It can be simplified by using GetView() only as all data types support GetView().

To my surprise, benchmark shows about 40% performance improvement after the change.

After some digging, I find in current code, the comparison callback is not inlined (check disassembled code), it leads to a function call. It's very bad for this hot loop. Using only GetView() fixes this issue, code inlined okay.

Reporter: Yibo Cai / @cyb70289
Assignee: Yibo Cai / @cyb70289

PRs and other links:

Note: This issue was originally created as ARROW-8129. Please see the migration documentation for further details.

@asfimport
Copy link
Author

Pierre Belzile:
Yibo,

Given that you are looking at this, I wonder if the following wouldn't be faster for CompareValues:

auto values_begin = values.raw_values(); // this also exists everywhere
std::stable_sort(indices_begin, nulls_begin,
 [values_begin](uint64_t left, uint64_t right) {
  return *(values_begin + left) < *(values_begin + right);
 });

It avoids dereferencing the shared_ptr data_ and adding the offset twice for each comparisons.

@asfimport
Copy link
Author

Yibo Cai / @cyb70289:
Pierre,

Another 10% performance improvement is observed after applying your change. The only problem is binary/string types don't have raw_values() member function.

Disassembled code shows the difference (may be not very accurate as it's hard to locate the original code after compiler optimization)

*(values_begin + left) < *(values_begin + right);

7b6d29:       49 8b 7c fd 00          mov    0x0(%r13,%rdi,8),%rdi
7b6d2e:       4b 39 7c cd 00          cmp    %rdi,0x0(%r13,%r9,8)
values.GetView(left) < values.GetView(right);

7ba98f:       4c 8b 49 08             mov    0x8(%rcx),%r9
7ba993:       4c 8b 51 20             mov    0x20(%rcx),%r10
7ba997:       4d 8b 49 20             mov    0x20(%r9),%r9
7ba99b:       49 8d 1c da             lea    (%r10,%rbx,8),%rbx
7ba99f:       49 8d 3c fa             lea    (%r10,%rdi,8),%rdi
7ba9a3:       4a 8b 3c cf             mov    (%rdi,%r9,8),%rdi
7ba9a7:       4a 39 3c cb             cmp    %rdi,(%rbx,%r9,8)

@asfimport
Copy link
Author

Antoine Pitrou / @pitrou:
Issue resolved by pull request 6640
#6640

@asfimport asfimport added this to the 0.17.0 milestone Jan 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants