LUCENE-9907: Remove packedInts#getReaderNoHeader dependency on TermsVectorFieldsFormat #72

iverase · 2021-04-08T09:49:11Z

Replaces the usages of PackedInts#getReaderNoHeader with DirectReader#getInstance.


jpountz

I don't feel great about how we're guessing the number of bytes it takes to store the data. In every other place where we use DirectWriter/DirectReader we treat it as an implementation detail and make sure to record how many bytes were required in the file format. Could you update this PR to make the DirectWriter write into a temporary ByteBuffersDataOutput first so that we can know how many bytes were needed, and write this number as a vInt to the file format? This will use a few more bytes per block than the current file format, but I don't think that it would matter in practice.

On the read-side, I'd like to find a way to avoid creating a new slice for every block. Maybe we should create an anonymous RandomAccessInput that wraps the IndexInput like the default implementation of IndexInput#randomAccessSlice does? (minus the cloning)
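To make the write-side suggestion concrete, here is a minimal stdlib-only sketch of "pack into a temporary buffer first, then record the byte count as a variable-length int". It is not the code from this PR and does not use Lucene: `ByteArrayOutputStream` stands in for `ByteBuffersDataOutput`, the hypothetical `pack` helper stands in for `DirectWriter`, and the bit layout is illustrative only, so the real on-disk format will differ.

```java
import java.io.ByteArrayOutputStream;

/** Illustrative stand-in only; class and method names are hypothetical, not Lucene APIs. */
final class LengthPrefixedBlockWriterSketch {

  /** Writes an int as 7-bit groups, low bits first, with a continuation bit (the vInt idea). */
  static void writeVInt(ByteArrayOutputStream out, int value) {
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
  }

  /** Packs each value into bitsPerValue bits (1..32), MSB first, padding the last byte with zeros. */
  static byte[] pack(int[] values, int bitsPerValue) {
    ByteArrayOutputStream tmp = new ByteArrayOutputStream();
    long mask = (1L << bitsPerValue) - 1;
    long acc = 0;    // bit accumulator; only its low accBits bits are pending
    int accBits = 0; // number of pending bits not yet flushed
    for (int v : values) {
      acc = (acc << bitsPerValue) | (v & mask);
      accBits += bitsPerValue;
      while (accBits >= 8) {
        accBits -= 8;
        tmp.write((int) (acc >>> accBits) & 0xFF);
      }
    }
    if (accBits > 0) {
      tmp.write((int) (acc << (8 - accBits)) & 0xFF);
    }
    return tmp.toByteArray();
  }

  /** Packs into a temporary buffer first so the exact byte count is known, then writes it as a prefix. */
  static void writeBlock(ByteArrayOutputStream file, int[] values, int bitsPerValue) {
    byte[] packed = pack(values, bitsPerValue);
    writeVInt(file, packed.length);       // recorded length: the reader never has to guess
    file.write(packed, 0, packed.length); // packed payload
  }

  public static void main(String[] args) {
    ByteArrayOutputStream file = new ByteArrayOutputStream();
    writeBlock(file, new int[] {1, 2, 3, 4, 5}, 3);
    System.out.println("block bytes: " + file.size());
  }
}
```

The cost is the length prefix itself, which matches the "few more bytes per block" trade-off mentioned above; in exchange the reader does not need to know how the packer sizes its output.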

iverase · 2021-04-15T12:12:10Z

I now write the number of bytes needed to store the int array, so the reader can retrieve that length before reading the data.

On the read side, I found that wrapping the IndexInput is tricky, as its current position may change at any point. I took a different approach: I read the data into a byte[], which is equivalent to what we were doing before when reading it into a long[].
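As a rough illustration of that read path (read the recorded length, copy the block into a `byte[]`, then decode values from the array), here is a stdlib-only sketch. It is not the PR's code: `ByteBuffer` stands in for the `IndexInput`, the hypothetical `get` helper stands in for the `DirectReader` lookup, and the MSB-first bit layout is an assumption made for the example.

```java
import java.nio.ByteBuffer;

/** Illustrative stand-in only; class and method names are hypothetical, not Lucene APIs. */
final class LengthPrefixedBlockReaderSketch {

  /** Reads an int stored as 7-bit groups, low bits first, with a continuation bit. */
  static int readVInt(ByteBuffer in) {
    int value = 0;
    for (int shift = 0; shift < 35; shift += 7) {
      byte b = in.get();
      value |= (b & 0x7F) << shift;
      if (b >= 0) { // high bit clear: last byte of the vInt
        return value;
      }
    }
    throw new IllegalStateException("malformed vInt");
  }

  /** Reads the recorded byte count, then copies exactly that many bytes into a fresh array. */
  static byte[] readBlock(ByteBuffer in) {
    int numBytes = readVInt(in);
    byte[] block = new byte[numBytes];
    in.get(block); // advances the input once; later lookups only touch the array
    return block;
  }

  /** Decodes the value at index from a block packed MSB first with bitsPerValue bits per value. */
  static long get(byte[] block, int bitsPerValue, int index) {
    long bitPos = (long) index * bitsPerValue;
    long value = 0;
    for (int i = 0; i < bitsPerValue; i++, bitPos++) {
      int bit = (block[(int) (bitPos >>> 3)] >>> (7 - (int) (bitPos & 7))) & 1;
      value = (value << 1) | bit;
    }
    return value;
  }

  public static void main(String[] args) {
    // Length prefix 2, two payload bytes holding {1, 2, 3, 4, 5} at 3 bits each, then unrelated data.
    ByteBuffer in = ByteBuffer.wrap(new byte[] {0x02, 0x29, (byte) 0xCA, 0x7F});
    byte[] block = readBlock(in);
    for (int i = 0; i < 5; i++) {
      System.out.println(get(block, 3, i));
    }
  }
}
```

Because lookups run against the in-memory array, they never move the position of the underlying input, which sidesteps the wrapping problem described above at the cost of one copy per block.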

jpountz

Great!

...java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingTermVectorsWriter.java

...java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingTermVectorsReader.java

iverase added 4 commits April 8, 2021 11:46

LUCENE-9907: Remove packedInts#getReader dependency on StoredFieldsFo…

c5d196c

…rmat

add expected size

939a98b

Merge branch 'main' into termsVectorReader

ceceb3d

fixes after refreshing

e4e7623

jpountz reviewed Apr 15, 2021


iter

f54e723

jpountz approved these changes Apr 15, 2021



address review comments

fc4ff1c

jpountz approved these changes Apr 15, 2021


iverase merged commit 873ac5f into apache:main Apr 15, 2021

iverase deleted the termsVectorReader branch April 15, 2021 14:04

asfimport mentioned this pull request Dec 8, 2021

Remove dependency on PackedInts#getReader() in all current codecs [LUCENE-9907] #10946

Closed
