LUCENE-9636: Extract the & operation to get a SIMD optimization #2139
Conversation
This is excellent, thank you!
This is great. I'm curious if you tested other numbers of bits per value than 15?
Co-authored-by: 郭峰 <guofeng.my@bytedance.com>
According to this result, the methods get a bit slower when bits per value <= 12, but when I removed their optimization, the end-to-end benchmark results became slower. So I chose to pay more attention to the end-to-end result and kept the optimization for all of them :)
Description
In decode6(), decode7(), decode12(), decode14(), decode15(), and decode24(), the longs are always &-ed with the same mask and then shifted. By printing the generated assembly, I found that the JIT did not optimize these methods with SIMD instructions. But when we extract all the & operations and perform them first, the JIT does use SIMD to optimize them.
Tests
Java Version:
Method Benchmark
Using decode15() as an example, here is a microbenchmark based on JMH.
Code:
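The benchmark snippet itself did not survive the page extraction. As a rough sketch of the two loop shapes being compared — a hypothetical, simplified stand-in with invented names (MaskExtractSketch, decodeFused, decodeExtracted), not the actual patch and without the JMH harness — the fused and extracted forms might look like:

```java
import java.util.Arrays;
import java.util.Random;

// Simplified stand-in for the decode-style loops being compared.
// This is NOT Lucene's actual ForUtil#decode15; it only reproduces
// the code shape the pull request describes.
public class MaskExtractSketch {
  private static final long MASK = 0x7FFFL; // low 15 bits

  // Fused shape: each long is shifted and '&'-masked inline, per element.
  // Per the description above, the JIT fails to auto-vectorize this form.
  static void decodeFused(long[] in, long[] out) {
    for (int i = 0; i < in.length; ++i) {
      out[2 * i] = (in[i] >>> 15) & MASK;
      out[2 * i + 1] = in[i] & MASK;
    }
  }

  // Extracted shape: all '&' operations are hoisted into their own
  // simple loops first (which the JIT can vectorize); shifts come after.
  static void decodeExtracted(long[] in, long[] out) {
    long[] hi = new long[in.length];
    long[] lo = new long[in.length];
    for (int i = 0; i < in.length; ++i) {
      hi[i] = in[i] & (MASK << 15);
    }
    for (int i = 0; i < in.length; ++i) {
      lo[i] = in[i] & MASK;
    }
    for (int i = 0; i < in.length; ++i) {
      out[2 * i] = hi[i] >>> 15;
      out[2 * i + 1] = lo[i];
    }
  }

  public static void main(String[] args) {
    Random r = new Random(42);
    long[] in = new long[64];
    for (int i = 0; i < in.length; ++i) {
      in[i] = r.nextLong();
    }
    long[] a = new long[128];
    long[] b = new long[128];
    decodeFused(in, a);
    decodeExtracted(in, b);
    // Both shapes must decode identical values.
    System.out.println(Arrays.equals(a, b));
  }
}
```

In a real JMH run each variant would be a @Benchmark method with its output consumed by a Blackhole; the sketch only shows the difference in loop shape that the JIT sees.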
Result:
End-to-end Benchmark
An end-to-end benchmark based on wikimedium1m also looks positive overall: