Where should we stream FST to disk directly? #12902
Comments
For reference, Tantivy writes the metadata at the end of the file: a reader first jumps to the end to learn the size and the starting node. We can't do the same, because one file may contain multiple FSTs.
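To make the trailing-metadata layout concrete, here is a minimal sketch using only `java.io` and `java.nio`. The class name, the 16-byte footer, and its two fields (body length, start node) are assumptions for illustration; this is not Tantivy's or Lucene's actual format. It shows that when two FST-like blobs share one file, seeking to the end of the file directly locates only the last footer.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

public class FooterLayoutSketch {
    // Hypothetical footer: body length (long) followed by start node (long).
    static final int FOOTER_BYTES = 2 * Long.BYTES;

    // Appends one body followed by its footer.
    static void append(DataOutputStream out, byte[] body, long startNode) throws IOException {
        out.write(body);
        out.writeLong(body.length);
        out.writeLong(startNode);
    }

    // Reads {bodyLength, startNode} from the last FOOTER_BYTES of the file.
    static long[] readFooter(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file, file.length - FOOTER_BYTES, FOOTER_BYTES);
        return new long[] {buf.getLong(), buf.getLong()};
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        append(out, new byte[] {1, 2, 3}, 2L);       // first blob
        append(out, new byte[] {4, 5, 6, 7, 8}, 4L); // second blob
        // Seeking to the end finds the footer of the second blob only.
        long[] footer = readFooter(bytes.toByteArray());
        System.out.println(footer[0] + " " + footer[1]); // prints "5 4"
    }
}
```

Finding the first blob's footer would need its end offset stored somewhere else, which is the same problem as storing the metadata separately.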
A candidate could be the |
I just briefly looked at the code, but it seems As we can't write the FST main body into the same IndexOutput as the FST metadata, there are 2 possibilities:
I realized If we change |
Put the first PR for |
Description
Most of the use cases of FST seem to end up writing the FST to a DataOutput (is it always an IndexOutput?). In those cases we currently write the FST to an on-heap DataOutput (previously BytesStore, now ReadWriteDataOutput) and then save it to disk.
With #12624 it's possible to write the FST directly to an on-disk DataOutput. So maybe let's first compile a list of places that can be migrated to the new way?
Note: the new way has a catch: we can't write the metadata to the same DataOutput as the main FST body; it has to be written separately.
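The difference between the two approaches can be sketched with plain `java.io` streams. Everything here is a stand-in: `compile`, the class name, and the two-field metadata (start node, body length) are hypothetical, and none of this is the Lucene API. The sketch assumes the metadata is only known once the body is complete, which would explain why a streamed body cannot be preceded by its metadata in the same output.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class StreamingFstSketch {
    // Hypothetical stand-in for FST compilation: writes one int per node
    // and returns the offset of the last node as the "start node".
    static long compile(DataOutputStream body, int nodes) throws IOException {
        long startNode = -1;
        for (int i = 0; i < nodes; i++) {
            startNode = body.size();
            body.writeInt(i);
        }
        return startNode;
    }

    // Current way: build on heap, then write metadata followed by the body to one output.
    static byte[] bufferedThenCopied(int nodes) throws IOException {
        ByteArrayOutputStream heap = new ByteArrayOutputStream();
        long startNode = compile(new DataOutputStream(heap), nodes);
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(file);
        out.writeLong(startNode);   // metadata first...
        out.writeLong(heap.size());
        heap.writeTo(out);          // ...then the whole body is copied
        return file.toByteArray();
    }

    // New way: the body goes straight to its destination (a byte array stands in
    // for the on-disk output); metadata is written afterwards to a separate output.
    static byte[][] streamed(int nodes) throws IOException {
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        long startNode = compile(new DataOutputStream(data), nodes);
        ByteArrayOutputStream meta = new ByteArrayOutputStream();
        DataOutputStream metaOut = new DataOutputStream(meta);
        metaOut.writeLong(startNode);
        metaOut.writeLong(data.size());
        return new byte[][] {meta.toByteArray(), data.toByteArray()};
    }

    public static void main(String[] args) throws IOException {
        byte[][] parts = streamed(3);
        System.out.println(bufferedThenCopied(3).length + " " + parts[0].length + " " + parts[1].length);
        // prints "28 16 12"
    }
}
```

In this sketch the streamed variant never holds the body on heap, at the cost of needing a second output for the metadata. Call sites that already keep a separate metadata file would presumably be the easiest to migrate.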