Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make FSTPostingsFormat load FSTs off-heap #12552

Merged
merged 4 commits into from
Sep 19, 2023
Merged

Conversation

Tony-X
Copy link
Contributor

@Tony-X Tony-X commented Sep 12, 2023

Description

FSTs supports to load offheap for a while. As we were trying to use FSTPostingsFormat for some fields we realized heap usage bumped.

Upon further investigation we realized the FSTPostingsFormat does not load FSTs offheap. This PR addresses that.

this.dict = new FST<>(in, in, new FSTTermOutputs(fieldInfo));
OffHeapFSTStore offHeapFSTStore = new OffHeapFSTStore();
this.dict = new FST<>(in, in, new FSTTermOutputs(fieldInfo), offHeapFSTStore);
in.skipBytes(offHeapFSTStore.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious - why do we need to skip to the end?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OffHeapFSTStore doesn't advance the input (but the on-heap one does), we need to seek it manually since the input contains multiple FSTs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was worried at first that we are not cloning this IndexInput anywhere and that this would cause concurrency bugs when two queries pull a TermsEnum here, but we are OK because OffHeapFSTStore does this cloning when it pulls a random access slice from the IndexInput.

@mikemccand
Copy link
Member

@Tony-X have you tried passing all Lucene unit tests using this Codec? I think you can add -Dtests.codec=... to force all tests to use it.

@Tony-X
Copy link
Contributor Author

Tony-X commented Sep 14, 2023

@mikemccand hey Mike, I did not make a new Codec for this. IIRC, FSTPostingsFormat will be exercised by the RandomCodec. Also there is TestFSTPostingsFormat extends BasePostingsFormatTestCase that tested this PostingsFormat.

Oh wait, I see that the test target support -Dtests.postingsformat to override posting format. So I ran ./gradlew test -Dtests.postingsformat=FST50 and all tests passed.

Since the successful output doesn't provide randomization info, I also ran with an non-existent postings format ./gradlew test -Dtests.postingsformat=FST5x and it failed, which means the override is happening.
`

@msokolov
Copy link
Contributor

Thanks for the CHANGES entry - I'll push shortly

@msokolov msokolov merged commit ca69ae6 into apache:main Sep 19, 2023
4 checks passed
@Tony-X
Copy link
Contributor Author

Tony-X commented Sep 19, 2023

Thanks @msokolov !

@Tony-X Tony-X deleted the off-heap-fst branch September 19, 2023 23:19
@mikemccand
Copy link
Member

I think this was mistakingly not backported to 9.x? (I only caught this because I was seeing merge conflicts trying to backport #12803 and saw this. I'll backport shortly -- I think this is low risk for the pending 9.9.0 release: this has baked in main for a couple months, and this is a very rarely used experimental postings format.

mikemccand pushed a commit that referenced this pull request Nov 28, 2023
* Make FSTPostingsFormat load FSTs off-heap
@mikemccand mikemccand added this to the 9.9.0 milestone Nov 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants