Optimize FST suffix sharing for block tree index #12702
Comments
But won't the leading bytes of
This is impressive! I would have expected a worse impact. This is likely because
I'm curious: are there any

On the "limit how much RAM FST Compiler is allowed to use to share suffixes" PR I also tested fully disabling

Similarly, if we explore experimental codecs that hold all terms in an FST, now possible / reasonable since the FST is off-heap, sharing the suffixes will be important.
Maybe we should just disable suffix sharing when building the BlockTree terms index FST? It would sidestep all of the extra RAM that NodeHash would otherwise have to allocate.
+1. I'll confirm the index/search performance later.
on Queries (Nothing changed obviously):
index (slightly faster):
Description
Today our block tree index encodes `floordata` as an FST output. The floor data is guaranteed to be stored within a single arc (it is never prefix-shared) in the FST, because fp is encoded before it. As a suffix, floor data can rarely be shared either. I wonder if we can avoid adding floor data outputs into `NodeHash` in some way?

Out of curiosity, I tried completely disabling suffix sharing in the block tree index. The total .tip size increased by only 1.47% for `wikimediumall`.
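To illustrate why per-entry outputs tend to defeat suffix sharing, here is a minimal toy sketch in Python. It is not Lucene's `FSTCompiler` or `NodeHash`: `Node`, `count_nodes`, and the `registry` dict are hypothetical names, and outputs are modeled as final-state outputs, which is a simplification of how a real FST places outputs on arcs. The only point it makes is that two suffix nodes can be merged only when their outputs match as well as their labels.

```python
from typing import Dict, Optional


class Node:
    """Toy trie node; `output` is emitted when a key ends here."""

    def __init__(self) -> None:
        self.arcs: Dict[str, "Node"] = {}
        self.final = False
        self.output: Optional[bytes] = None


def count_nodes(entries: Dict[str, Optional[bytes]], share_suffixes: bool) -> int:
    """Count nodes of a toy transducer, with or without suffix sharing."""
    # Build a plain trie: prefixes are always shared, suffixes are not yet.
    root = Node()
    for key, output in entries.items():
        node = root
        for label in key:
            if label not in node.arcs:
                node.arcs[label] = Node()
            node = node.arcs[label]
        node.final = True
        node.output = output

    registry: dict = {}  # node signature -> canonical id (stand-in for a node hash)
    total = 0            # number of trie nodes visited

    def freeze(node: Node) -> int:
        nonlocal total
        # Children first, so a node's signature refers to canonical child ids.
        children = tuple(
            sorted((label, freeze(child)) for label, child in node.arcs.items())
        )
        total += 1
        if not share_suffixes:
            return total  # every node keeps its own identity
        signature = (node.final, node.output, children)
        # Identical signatures collapse into one shared node.
        return registry.setdefault(signature, len(registry))

    freeze(root)
    return len(registry) if share_suffixes else total


keys = ["walking", "talking", "baking"]
no_outputs = {k: None for k in keys}
unique_outputs = {k: f"fp{i}+floor{i}".encode() for i, k in enumerate(keys)}

print(count_nodes(no_outputs, share_suffixes=False))
print(count_nodes(no_outputs, share_suffixes=True))
print(count_nodes(unique_outputs, share_suffixes=True))
```

In this toy model, the keys without outputs collapse their common `king` tail into shared nodes, while giving each key a distinct output leaves nothing to merge, so the hash table is populated without saving any nodes. That is the intuition behind skipping the node hash for entries that are unlikely to be shared.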