-
Notifications
You must be signed in to change notification settings - Fork 995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up ByteBlockPool #12506
Clean up ByteBlockPool #12506
Conversation
* 0</code> bytes before they reused or passed to {@link Allocator#recycleByteBlocks(byte[][], | ||
* int, int)}. Calling {@link ByteBlockPool#nextBuffer()} is not needed after reset. | ||
* Resets the pool to its initial state, reusing the first buffer and filling all buffers with | ||
* {@code 0} bytes before they are reused or passed to {@link |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we really need to zero out the buffers that will be recycled. Maybe it makes more sense to have the recycling method do it if necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to make this cleaner (who zeros). Why does byte slicing even require pre-zero'd buffers?
Maybe open a follow-on issue for this? This change is already great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, there's enough going on in this PR already. I opened an issue to look into this separately: #12734
I noticed the failing checks on this PR, but I haven't been able to reproduce them. They appear related to the nested javadoc tags I had introduced. I've removed them now. Hopefully that satisfies the checks. |
1. Move slice functionality to a separate class, ByteSlicePool. 2. Add exception case if the requested slice size is larger than the block size. 3. Make pool buffers private instead of the comment asking not to modify it. 4. Consolidate setBytesRef methods with int offsets. 5. Simplify ramBytesUsed. 6. Update and expand comments and javadoc.
2a6272a
to
a86b4de
Compare
The last commit is a large rebase + conflict resolution after #12625 got merged. What this PR does hasn't really changed. |
Thanks @stefanvodita -- I'll try to have a look soon! And thank you for gracefully handling the "two people made very similar changes" situation :) This happens often in open source, but it's actually a good thing since you get two very different perspectives and the final solution is best of both. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, thanks @stefanvodita! I left a few minor comments...
I wonder if this will impact indexing performance ... I don't think we need to test before pushing, but let's watch the nightly run after this is merged to see if there was any impact?
@@ -46,6 +65,7 @@ protected Allocator(int blockSize) { | |||
|
|||
public abstract void recycleByteBlocks(byte[][] blocks, int start, int end); | |||
|
|||
// TODO: This is not used. Can we remove? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to remove now! This is an internal API -- we are free to suddenly change it.
*/ | ||
void setBytesRef(BytesRefBuilder builder, BytesRef result, long offset, int length) { | ||
void setBytesRef(BytesRefBuilder builder, BytesRef result, int offset, int length) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm why the change from long
-> int
? Previously we were able to address 32 + BYTE_BLOCK_SHIFT
bits of address space using long
? Or did that fail to work somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed that the purpose of the long was to increase the address space. I'll change this back.
result.length = length; | ||
|
||
int bufferIndex = (int) (offset >> BYTE_BLOCK_SHIFT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we stick with long
we should change this to Math.toIntExact
to catch overflow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea!
size += RamUsageEstimator.shallowSizeOf(buffers); | ||
for (byte[] buf : buffers) { | ||
if (buf == buffer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did the previous code special case buffer
? Isn't buffer
a full sized block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't figured out why buffer
was a special case. Even if it were not a full block, would that make a difference if we're calling RamUsageEstimator#sizeOfObject
on it in both cases?
|
||
/** | ||
* Class that Posting and PostingVector use to write byte streams into shared fixed-size byte[] | ||
* arrays. The idea is to allocate slices of increasing lengths For example, the first slice is 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Period after lengths -- of increasing lengths. For example,
package org.apache.lucene.util; | ||
|
||
/** | ||
* Class that Posting and PostingVector use to write byte streams into shared fixed-size byte[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe say write interleaved byte streams
?
* | ||
* @lucene.internal | ||
*/ | ||
public class ByteSlicePool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe move this class to org.apache.lucene.index
and be package private?
public class ByteSlicePool { | ||
/** | ||
* The underlying structure consists of fixed-size blocks. We overlay variable-length slices on | ||
* top. Each slice is contiguous in memory, i.e. it does not strddle multiple blocks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strddle
-> straddle
import org.apache.lucene.tests.util.TestUtil; | ||
|
||
public class TestByteSlicePool extends LuceneTestCase { | ||
public void testAllocKnowSizeSlice() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Know
-> Known
?
} | ||
} | ||
|
||
public void testAllocLargeSlice() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we maybe add a randomized test that writes a random number of random length interleaves streams, then reads them back and confirms the streams match?
I like the introduction of In the case we keep this class, let's remove the static method I introduced. |
Thanks for the review @iverase! I’m putting together a new revision. Do you have a name suggestion for |
I’ve integrated most of the suggestions. There’s just the matter of the name for the slice pool class and the right package for it. I don’t have a strong opinion on this. Maybe we move it to |
+1 to move to |
Thanks @mikemccand! I just pushed a commit that does that move. |
@mikemccand @iverase - what do you think, is this PR ready? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thank you @stefanvodita for giving some love to this long neglected quite hairy class.
I'll merge today.
* Clean up ByteBlockPool 1. Move slice functionality to a separate class, ByteSlicePool. 2. Add exception case if the requested slice size is larger than the block size. 3. Make pool buffers private instead of the comment asking not to modify it. 4. Consolidate setBytesRef methods with int offsets. 5. Simplify ramBytesUsed. 6. Update and expand comments and javadoc. * Revert to long offsets in ByteBlockPool; Clean-up * Remove slice functionality from TermsHashPerField * First working test * [Broken] Test interleaving slices * Fixed randomized test * Remove redundant tests * Tidy * Add CHANGES * Move ByteSlicePool to oal.index
…backport where Java 11 doesn't have this API
There are several changes here. The ones that change public API:
ByteBlockPool
to its own class. There's probably more we can do here, but separating the slices from the block pool seems like a step in the right direction.ByteBlockPool
. I've made it private instead and readable through a getter to enforce that.The varioussetBytesRef
are consolidated. The offsets they worked with were all ints, so I've changed those that were declared as longs.Functionality changes:
newSlice
with a size larger than a block will throw an exception.Decoding a length larger than the remaining buffer space insetBytesRef
will trip an assertion.ramBytesUsage
has become simpler, but produces the same result.ByteBlockPool.Allocator.recycleByteBlocks
is removed.ByteBlockPool.setBytesRef
overflows.Other than that, comments and javadocs are updated or expanded.
Closes #6675.