-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-10598: [C++] Separate out bit-packing in internal::GenerateBitsUnrolled for better performance #8671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cpp/src/arrow/util/bitmap_generate.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This presumes that g() either returns a bool or 0/1 -- might want to enforce this (at compile time) but wasn't sure the best way to do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe out_results[0] = !!g(); ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm adding a compile-time assertion
|
Tested on gcc-7.5 has big improvement. clang-9 no benefit. |
|
The PR looks reasonable to me. |
c0b08ee to
50221b3
Compare
|
+1, thanks all |
This issue was pointed out to me by some folks at UWisc and just getting around to trying out a fix. This produces a substantial performance benefit in numerical comparisons with gcc8 (with sse4.2 only)
The difference is less pronounced with clang-8, but still present
I don't know why this seems to cause performance to be worse in the string comparison cases.
There might be some other things to try here like bit-packing in larger batches.