New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-9907: Remove packedInts#getReaderNoHeader dependency on TermsVectorFieldsFormat #72
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel great about how we're guessing the number of bytes it takes to store the data. In every other place where we use DirectWriter/DirectReader we treat it as an implementation detail and make sure to record how many bytes were required in the file format. Could you update this PR to make the DirectWriter
write into a temporary ByteBuffersDataOutput
first so that we can know how many bytes were needed, and write this number as a vInt
to the file format? This will use a few more bytes per block than the current file format, but I don't think that it would matter in practice.
On the read-side, I'd like to find a way to avoid creating a new slice for every block. Maybe we should create an anonymous RandomAccessInput that wraps the IndexInput like the default implementation of IndexInput#randomAccessSlice
does? (minus the cloning)
I have added the length needed to store the int array so we can retrieve it before reading it. In the read side, I found that wrapping the IndexInput is tricky as you might be changing the current position of the index at any point. I took a different approach and I am reading the data into a byte[] which is equivalent to what we were doing before by reading it into a long[]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
Replaces the usages of PackedInts#getReaderNoHeader with DirecReader#getInstance.