Use max BPV encoding in postings if doc buffer size less than ForUtil.BLOCK_SIZE #12717
Comments
Does this actually matter for performance? My gut feeling is that either a value has a long postings list, in which case the vast majority of blocks will be encoded with PFOR and should be fast, or it has a short postings list, in which case queries should be fast anyway.
@jpountz Thanks for your explanation. I got some flame graphs that show the …
Since we're changing the postings format anyway in #12741, I wonder if it would be worth looking into different encodings for these tail postings. Maybe we could use group-varint and only fall back to regular vints for the last 1, 2, or 3 postings? (Group-varint is supposedly faster to decode than regular vints.)
Ohhhhh, group-varint is an interesting encoder. I'd love to try it later in the week.
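For reference, group-varint encodes four integers per group: a single tag byte stores the byte length (1-4) of each value in two bits, followed by the payload bytes. Below is a minimal sketch, assuming little-endian payload order and non-negative deltas; the class and method names are hypothetical illustrations, not Lucene's actual API.

```java
import java.io.ByteArrayOutputStream;

public class GroupVarIntSketch {

  /** Returns the number of bytes (1-4) needed to store a non-negative int. */
  static int byteLength(int v) {
    if (v < (1 << 8)) return 1;
    if (v < (1 << 16)) return 2;
    if (v < (1 << 24)) return 3;
    return 4;
  }

  /** Encodes a group of four values: one tag byte, then 4-16 payload bytes. */
  static void encodeGroup(int[] values, ByteArrayOutputStream out) {
    int tag = 0;
    for (int i = 0; i < 4; i++) {
      tag |= (byteLength(values[i]) - 1) << (i * 2); // 2 bits per value
    }
    out.write(tag);
    for (int i = 0; i < 4; i++) {
      int len = byteLength(values[i]);
      for (int b = 0; b < len; b++) {
        out.write((values[i] >>> (b * 8)) & 0xFF); // little-endian payload bytes
      }
    }
  }
}
```

A decoder can read the tag byte once and derive all four lengths up front (often via a lookup table), which is what makes group-varint faster to decode than plain vints, where each value's length is discovered one continuation byte at a time.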
Description
Currently we use vint encoding for the doc IDs if the doc buffer holds fewer than 128 documents, then decode them in Lucene90PostingsReader#readVIntBlock. For a high-cardinality field this can be slow to decode. Could we instead encode the block using the max BPV, like DocIdsWriter#writeDocIds or DirectWriter?