-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Speed up string view comparison (up to 3x) #9250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up string view comparison (up to 3x) #9250
Conversation
|
run benchmark comparison_kernels |
|
🤖 |
|
🤖: Benchmark completed Details
|
Seems a good direction |
|
@zhuqi-lucas as you have spent time optimzing string view comparison in other PRs, do you have time to help review this PR as well? |
|
run benchmark comparison_kernels |
|
🤖 |
Nice improvement, i plan to review this PR in few days. |
|
🤖: Benchmark completed Details
|
|
run benchmark comparison_kernels |
|
🤖 |
|
run benchmark comparison_kernels |
|
run benchmark arrow_reader_clickbench |
I found some more time for improvement 🚀 |
|
🤖: Benchmark completed Details
|
|
🤖 |
|
On my machine (Apple M2) it makes more of a difference, but still overall improvements (also I think the prefix change might make more of a difference.in real life due to avoiding random access). |
|
🤖: Benchmark completed Details
|
|
🤖 |
|
|
🤖: Benchmark completed Details
|
|
Improvements on clickbench seem to correlate well with |
|
Previous optimization attempt, mostly for the empty string case: #7767 |
|
run benchmark arrow_reader_clickbench |
|
🤖 |
|
🤖: Benchmark completed Details
|
zhuqi-lucas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, great work @Dandandan , the benchmark result is promising, and the code is more clean.
| /// - The combination ensures that a single `u128` comparison correctly orders by inline data | ||
| /// first (lexicographically), then by length (numerically). | ||
| #[inline(always)] | ||
| pub fn inline_key_fast(raw: u128) -> u128 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The datafusion seems also use this function, we also need to run datafusion based this change to make sure it's not breaking something.
| if l_len == 0 && r_len == 0 { | ||
|
|
||
| // Both are empty | ||
| if l_len == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this will compile to same code finally, but it's more clean here, i agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah...
|
I have a few more tests I would like to propose: |
# Which issue does this PR close? - Follow on to #9250 # Rationale for this change While (posthumously) reviewing #9250 from @Dandandan and @zhuqi-lucas I noticed that some of the special case branches are not covered. # What changes are included in this PR? Add some more tests to cover all the special cases # Are these changes tested? Yes, only tests # Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. If there are any breaking changes to public APIs, please call them out. -->
Which issue does this PR close?
Rationale for this change
I saw this was a bit hot in profiles.
Details
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?