New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-8843: [C++] Compare bitmaps in words #7285
Conversation
b6b716b
to
9c6b82c
Compare
Unaligned bitmap comparision are currently processed bit-by-bit. Comparing word-by-word in uint64 improves performance significantly. Bechmark(comparing two identical bitmaps with 64K bits) jumps from 86M/s to 5.7G/s on my test machine. NOTE: This patch may hurt performance if two bitmaps differ at the very begining bits, as it always loads and compares 64 bits if possible. Bit by bit comparison will notice the difference and return false earlier. This should not be a problem in practice.
Before (apply only benchmark code)
After
|
This should be negligible in practice. The processor still load in unit of words. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the code can be improved a bit for readability.
cpp/src/arrow/util/bit_util.cc
Outdated
right_offset %= 8; | ||
|
||
// process in 64 bits | ||
int64_t nwords = bit_length / 64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use snake_case for variables, e.g. n_words
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. Will change.
A quick question, is there formal doc for arrow C++ coding style?
I see class names are camel case.
Most variables are snake case, with some exceptions.
Most function names are camel case, with some exceptions(simple getter should be snake case?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://google.github.io/styleguide/cppguide.html
We indeed use snake_case for member accessors (this used to be the guidance in the Google C++ style guide but they might have made some changes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I think nwords
is fine, we use nbytes
in plenty of places.
cpp/src/arrow/util/bit_util.cc
Outdated
if (nwords > 1) { | ||
bit_length -= (nwords - 1) * 64; | ||
|
||
uint64_t left_word0 = BitUtil::ToLittleEndian(util::SafeLoadAs<uint64_t>(left)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be more readable if you wrap load into a lambda. The compiler will inline it.
auto load_word = [](uint8_t* bytes) { return BitUtil::ToLittleEndian(util::SafeLoadAs<uint64_t>(bytes)); }
auto left_word0 = load_word(left);
auto right_word0 = load_word(right);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
cpp/src/arrow/util/bit_util.cc
Outdated
uint64_t left_word0 = BitUtil::ToLittleEndian(util::SafeLoadAs<uint64_t>(left)); | ||
uint64_t right_word0 = BitUtil::ToLittleEndian(util::SafeLoadAs<uint64_t>(right)); | ||
|
||
do { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make a shift lambda:
auto shift = [](uint64_t word, uint64_t next, uint8_t shift) -> uint64_t {
if (shift == 0) return word;
return (word >> shift) | (next << (64 - shift);
};
auto next = load_word(left);
auto left_word = shift(left_word0, next, left_offset);
left_word0 = next;
I'd also think it would be worth transforming the loop into a fixed for-loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Much cleaner code after refinement. Thanks.
@cyb70289 Thank for this. For future reference, if you add a new benchmark, do it first as a seperate commit, and then add the improvement in a following (chronologically) second commit. You'll be able to check the difference with ursabot, e.g. |
Unaligned bitmap comparision are currently processed bit-by-bit.
Comparing word-by-word in uint64 improves performance significantly.
Bechmark(comparing two identical bitmaps with 64K bits) jumps from
86M/s to 5.7G/s on my test machine.
NOTE: This patch may hurt performance if two bitmaps differ at the very
begining bits, as it always loads and compares 64 bits if possible. Bit
by bit comparison will notice the difference and return false earlier.
This should not be a problem in practice.