Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-9907: Remove packedInts#getReaderNoHeader dependency on TermsVectorFieldsFormat #72

Merged
merged 6 commits into from Apr 15, 2021

Conversation

iverase
Copy link
Contributor

@iverase iverase commented Apr 8, 2021

Replaces the usages of PackedInts#getReaderNoHeader with DirecReader#getInstance.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel great about how we're guessing the number of bytes it takes to store the data. In every other place where we use DirectWriter/DirectReader we treat it as an implementation detail and make sure to record how many bytes were required in the file format. Could you update this PR to make the DirectWriter write into a temporary ByteBuffersDataOutput first so that we can know how many bytes were needed, and write this number as a vInt to the file format? This will use a few more bytes per block than the current file format, but I don't think that it would matter in practice.

On the read-side, I'd like to find a way to avoid creating a new slice for every block. Maybe we should create an anonymous RandomAccessInput that wraps the IndexInput like the default implementation of IndexInput#randomAccessSlice does? (minus the cloning)

@iverase
Copy link
Contributor Author

iverase commented Apr 15, 2021

I have added the length needed to store the int array so we can retrieve it before reading it.

In the read side, I found that wrapping the IndexInput is tricky as you might be changing the current position of the index at any point. I took a different approach and I am reading the data into a byte[] which is equivalent to what we were doing before by reading it into a long[].

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants