Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-8843: [C++] Compare bitmaps in words #7285

Closed
wants to merge 2 commits into from

Conversation

cyb70289
Copy link
Contributor

Unaligned bitmap comparision are currently processed bit-by-bit.
Comparing word-by-word in uint64 improves performance significantly.
Bechmark(comparing two identical bitmaps with 64K bits) jumps from
86M/s to 5.7G/s on my test machine.

NOTE: This patch may hurt performance if two bitmaps differ at the very
begining bits, as it always loads and compares 64 bits if possible. Bit
by bit comparison will notice the difference and return false earlier.
This should not be a problem in practice.

@github-actions
Copy link

@cyb70289 cyb70289 force-pushed the bitmap-equals branch 4 times, most recently from b6b716b to 9c6b82c Compare May 27, 2020 06:32
Unaligned bitmap comparision are currently processed bit-by-bit.
Comparing word-by-word in uint64 improves performance significantly.
Bechmark(comparing two identical bitmaps with 64K bits) jumps from
86M/s to 5.7G/s on my test machine.

NOTE: This patch may hurt performance if two bitmaps differ at the very
begining bits, as it always loads and compares 64 bits if possible. Bit
by bit comparison will notice the difference and return false earlier.
This should not be a problem in practice.
@cyb70289
Copy link
Contributor Author

Before (apply only benchmark code)

BitmapEqualsWithoutOffset/8192                195 ns          195 ns      3618366 bytes_per_second=39.1055G/s
BitmapEqualsWithOffset/8192                 91272 ns        91235 ns         7673 bytes_per_second=85.6307M/s

After

BitmapEqualsWithoutOffset/8192                193 ns          193 ns      3635971 bytes_per_second=39.6261G/s
BitmapEqualsWithOffset/8192                  1332 ns         1331 ns       510343 bytes_per_second=5.73058G/s

@fsaintjacques fsaintjacques self-requested a review May 27, 2020 16:17
@fsaintjacques
Copy link
Contributor

NOTE: This patch may hurt performance if two bitmaps differ at the very
begining bits, as it always loads and compares 64 bits if possible. Bit
by bit comparison will notice the difference and return false earlier.
This should not be a problem in practice.

This should be negligible in practice. The processor still load in unit of words.

@wesm wesm requested a review from pitrou May 29, 2020 16:33
Copy link
Contributor

@fsaintjacques fsaintjacques left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code can be improved a bit for readability.

right_offset %= 8;

// process in 64 bits
int64_t nwords = bit_length / 64;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use snake_case for variables, e.g. n_words.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Will change.
A quick question, is there formal doc for arrow C++ coding style?
I see class names are camel case.
Most variables are snake case, with some exceptions.
Most function names are camel case, with some exceptions(simple getter should be snake case?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://google.github.io/styleguide/cppguide.html

We indeed use snake_case for member accessors (this used to be the guidance in the Google C++ style guide but they might have made some changes).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I think nwords is fine, we use nbytes in plenty of places.

if (nwords > 1) {
bit_length -= (nwords - 1) * 64;

uint64_t left_word0 = BitUtil::ToLittleEndian(util::SafeLoadAs<uint64_t>(left));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more readable if you wrap load into a lambda. The compiler will inline it.

auto load_word = [](uint8_t* bytes) { return BitUtil::ToLittleEndian(util::SafeLoadAs<uint64_t>(bytes)); }

auto left_word0 = load_word(left);
auto right_word0 = load_word(right);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

uint64_t left_word0 = BitUtil::ToLittleEndian(util::SafeLoadAs<uint64_t>(left));
uint64_t right_word0 = BitUtil::ToLittleEndian(util::SafeLoadAs<uint64_t>(right));

do {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make a shift lambda:

auto shift = [](uint64_t word, uint64_t next, uint8_t shift) -> uint64_t {
  if (shift == 0) return word;
  return (word >> shift) | (next << (64 - shift);
}; 

auto next = load_word(left);
auto left_word = shift(left_word0, next, left_offset);
left_word0 = next;

I'd also think it would be worth transforming the loop into a fixed for-loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Much cleaner code after refinement. Thanks.

@fsaintjacques
Copy link
Contributor

fsaintjacques commented Jun 1, 2020

@cyb70289 Thank for this. For future reference, if you add a new benchmark, do it first as a seperate commit, and then add the improvement in a following (chronologically) second commit. You'll be able to check the difference with ursabot, e.g. @ursabot benchmark [filter] <sha_of_first_commit>.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants