New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster bulk numeric reads from BufferedIndexInput #12453
Faster bulk numeric reads from BufferedIndexInput #12453
Conversation
Reading ints/floats/longs one-by-one from a heap-byte-buffer, including doing our own bounds checks is not very efficient. We can use the ability to translate the buffer and read in bulk while taking turns with one-off reading/refilling instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some nit-picky comments and some questions, but overall this looks like a good and useful change.
@@ -159,6 +159,63 @@ public final long readLong() throws IOException { | |||
} | |||
} | |||
|
|||
@Override | |||
public void readFloats(float[] floats, int offset, int len) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Could we change the argument names to be consistent (dst
, len
)? I see the parent class, DataInput
, has the same inconsistency.
@@ -159,6 +159,63 @@ public final long readLong() throws IOException { | |||
} | |||
} | |||
|
|||
@Override | |||
public void readFloats(float[] floats, int offset, int len) throws IOException { | |||
int remaining = len; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I found remaining
to be a little confusing just because the buffer also calls to remaining()
. What if we called it remainingDst
to better differentiate from the buffer's remaining
?
} | ||
|
||
@Override | ||
public void readLongs(long[] dst, int offset, int length) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering why we're following a different pattern with these methods than with readBytes
. If we rewrote readBytes
using this model, would it just work?
@Override
public final void readBytes(byte[] dst, int offset, int len) throws IOException {
int remaining = len;
while (remaining > 0) {
int cnt = Math.min(buffer.remaining(), remaining);
buffer.get(dst, offset + len - remaining, cnt);
remaining -= cnt;
if (remaining > 0) {
if (buffer.hasRemaining()) {
dst[offset + len - remaining] = readByte();
--remaining;
} else {
refill();
}
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would come out to about that code if we inlined readBytes
with useBuffer=true
except that
if (buffer.hasRemaining()) {
dst[offset + len - remaining] = readByte();
--remaining;
}
is never entered, hasRemaining
is just always false because we don't have anything like 1 one of 4 int bytes available.
-> the difference I think is just in that we have to deal with the alignment at the buffer boundary
} | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should TestBufferedIndexInput
have some basic tests for these new methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ good call, already had one bug that I only found when running benchmarks. Added a test for each of the 3 that should cover all the boundary/alignment/offset cases we can expect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. Can you add an entry to CHANGES.txt?
.put(byten(offset + 1)) | ||
.put(byten(offset + 2)) | ||
.put(byten(offset + 3)); | ||
assertEquals(bb.getFloat(0), floatBuffer[idx], 0f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would it make more sense to compare the int bits rather than the float itself, to make sure +/-0 or the different representations of NaN are considered different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right that's technically more exact I think :) Adjusted that and added changes.txt entry
lucene/CHANGES.txt
Outdated
@@ -84,6 +84,8 @@ Optimizations | |||
|
|||
* GITHUB#12372: Reduce allocation during HNSW construction (Jonathan Ellis) | |||
|
|||
* GITHUB#12453: Faster bulk numeric reads from BufferedIndexInput (Armin Braun) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the section for the upcoming Lucene major (10.0), your change looks safe for backport, let's move it to section 9.8?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Adrien!
Reading ints/floats/longs one-by-one from a heap-byte-buffer, including doing our own bounds checks is not very efficient. We can use the ability to translate the buffer and read in bulk while taking turns with one-off reading/refilling instead.
Reading ints/floats/longs one-by-one from a heap-byte-buffer, including doing our own bounds checks is not very efficient. We can use the ability to translate the buffer and read in bulk while taking turns with one-off reading/refilling instead.