Remove lastPosBlockOffset from term metadata for Lucene90PostingsFormat #12536
Comments
I quickly tried this out and realized it doesn't work: when we need to skip using the skip lists, we lose track of how many positions were skipped.
Sorry, what went wrong when you tried to remove [...]? Hmm, I think I see: there are in general more positions blocks than doc/freq blocks (since there can of course be many more positions than doc/freq), and so as [...] Maybe make a PR adding a nice comment explaining why we really do need this.
Today, when it skips, the skipper can tell us 1) the offset of the positions block we should seek to and 2) how many positions we need to skip within that block. This is because, after skipping, the first doc's positions do not always begin at the start of a positions block (in fact, starting mid-block is the most common case). Additionally, advancing to a specific doc that is not at the start of a block means we need to skip the preceding docs as well as their positions. In theory, it would work if the skipper could tell us how many positions it has skipped, but that would require storing more information in the skip data than the current scheme does.
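The two ways of recognising the last positions block after a skip can be contrasted in a small sketch. This is not Lucene's code: the class, method, and parameter names are hypothetical, and the block size of 128 is an assumption about the format. The first method models the current scheme (compare the seek target with the stored offset), the second models the alternative discussed above, where the skip data would also have to carry a running count of skipped positions.

```java
// Hypothetical model, not the Lucene90PostingsFormat implementation.
final class SkipTargetSketch {
    // Assumed number of positions per packed block.
    static final int BLOCK_SIZE = 128;

    /** Current scheme: the target block is the tail iff its file pointer equals the stored offset. */
    static boolean isTailByOffset(long posBlockFP, long lastPosBlockFP) {
        return posBlockFP == lastPosBlockFP;
    }

    /**
     * Alternative scheme: no stored offset, but each skip entry would need
     * positionsSkipped, the number of the term's positions that precede the target doc.
     * posToSkipInBlock is how many of those lie inside the target block.
     */
    static boolean isTailByCount(long positionsSkipped, int posToSkipInBlock, long totalTermFreq) {
        // Index of the first position stored in the target block.
        long blockStart = positionsSkipped - posToSkipInBlock;
        // Positions held by full packed blocks; anything beyond is in the tail.
        long packedPositions = (totalTermFreq / BLOCK_SIZE) * BLOCK_SIZE;
        return blockStart >= packedPositions;
    }
}
```

With a term of 300 positions, a skip that lands 4 positions into the block starting at position 256 would be classified as the tail by the second method, but only because the skip entry supplied the total of 260 skipped positions.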
#12541 adds more comments for [...]
OK, and it seems like that would not be a good tradeoff? Every skip entry would need to record how many total positions were skipped, whereas the [...]
#12541 is merged, so I'll close this one.
Description
lastPosBlockOffset is used to identify the last non-packed-encoded positions block. The same information can be derived using totalTermFreq. This could save space taken by the FSTs for positions-enabled fields.
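A minimal sketch of the derivation proposed here, assuming positions are written in packed blocks of 128 followed by one non-packed tail block holding the remainder. The block size and every name below are illustrative assumptions, not the Lucene API. Note that the check depends on knowing how many positions have already been consumed, which is the count that the comments report losing when skip lists are used.

```java
// Hypothetical helper, not part of Lucene.
final class PosBlockLayoutSketch {
    // Assumed number of positions per packed block.
    static final int BLOCK_SIZE = 128;

    /** Number of full packed blocks for a term with the given totalTermFreq. */
    static long packedBlockCount(long totalTermFreq) {
        return totalTermFreq / BLOCK_SIZE;
    }

    /** Number of positions left over for the non-packed tail block. */
    static int tailLength(long totalTermFreq) {
        return (int) (totalTermFreq % BLOCK_SIZE);
    }

    /**
     * True if the next block to decode is the tail, given how many positions
     * were already consumed. Expects positionsConsumed to be block-aligned
     * and smaller than totalTermFreq.
     */
    static boolean nextBlockIsTail(long positionsConsumed, long totalTermFreq) {
        return positionsConsumed >= packedBlockCount(totalTermFreq) * BLOCK_SIZE;
    }
}
```

For a term with totalTermFreq of 300, this gives two packed blocks and a tail of 44 positions, and the reader would switch to the tail decoder once 256 positions have been consumed.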