Clean up ByteBlockPool #12506

stefanvodita · 2023-08-12T22:24:43Z

There are several changes here. The ones that change public API:

Slice functionality is moved out from ByteBlockPool to its own class. There's probably more we can do here, but separating the slices from the block pool seems like a step in the right direction.
There was a comment asking not to modify the buffers array outside ByteBlockPool. I've made it private instead and readable through a getter to enforce that.
~~The various setBytesRef are consolidated. The offsets they worked with were all ints, so I've changed those that were declared as longs.~~

Functionality changes:

Calling newSlice with a size larger than a block will throw an exception.
~~Decoding a length larger than the remaining buffer space in setBytesRef will trip an assertion.~~
ramBytesUsage has become simpler, but produces the same result.
ByteBlockPool.Allocator.recycleByteBlocks is removed.
Throw an exception if the buffer index in ByteBlockPool.setBytesRef overflows.

Other than that, comments and javadocs are updated or expanded.

Closes #6675.

stefanvodita · 2023-08-14T18:16:55Z

lucene/core/src/java/org/apache/lucene/util/ByteBlockPool.java

-   * 0</code> bytes before they reused or passed to {@link Allocator#recycleByteBlocks(byte[][],
-   * int, int)}. Calling {@link ByteBlockPool#nextBuffer()} is not needed after reset.
+   * Resets the pool to its initial state, reusing the first buffer and filling all buffers with
+   * {@code 0} bytes before they are reused or passed to {@link


I wonder if we really need to zero out the buffers that will be recycled. Maybe it makes more sense to have the recycling method do it if necessary?

+1 to make this cleaner (who zeros). Why does byte slicing even require pre-zero'd buffers?

Maybe open a follow-on issue for this? This change is already great.

Right, there's enough going on in this PR already. I opened an issue to look into this separately: #12734

stefanvodita · 2023-09-30T10:51:16Z

I noticed the failing checks on this PR, but I haven't been able to reproduce them. They appear related to the nested javadoc tags I had introduced. I've removed them now. Hopefully that satisfies the checks.

1. Move slice functionality to a separate class, ByteSlicePool. 2. Add exception case if the requested slice size is larger than the block size. 3. Make pool buffers private instead of the comment asking not to modify it. 4. Consolidate setBytesRef methods with int offsets. 5. Simplify ramBytesUsed. 6. Update and expand comments and javadoc.

stefanvodita · 2023-10-21T13:21:47Z

The last commit is a large rebase + conflict resolution after #12625 got merged. What this PR does hasn't really changed.

mikemccand · 2023-10-21T13:50:24Z

Thanks @stefanvodita -- I'll try to have a look soon! And thank you for gracefully handling the "two people made very similar changes" situation :)

This happens often in open source, but it's actually a good thing since you get two very different perspectives and the final solution is best of both.

mikemccand

This looks great, thanks @stefanvodita! I left a few minor comments...

I wonder if this will impact indexing performance ... I don't think we need to test before pushing, but let's watch the nightly run after this is merged to see if there was any impact?

mikemccand · 2023-10-23T16:49:08Z

lucene/core/src/java/org/apache/lucene/util/ByteBlockPool.java

@@ -46,6 +65,7 @@ protected Allocator(int blockSize) {

    public abstract void recycleByteBlocks(byte[][] blocks, int start, int end);

+    // TODO: This is not used. Can we remove?


+1 to remove now! This is an internal API -- we are free to suddenly change it.

mikemccand · 2023-10-23T16:54:54Z

lucene/core/src/java/org/apache/lucene/util/ByteBlockPool.java

   */
-  void setBytesRef(BytesRefBuilder builder, BytesRef result, long offset, int length) {
+  void setBytesRef(BytesRefBuilder builder, BytesRef result, int offset, int length) {


Hmm why the change from long -> int? Previously we were able to address 32 + BYTE_BLOCK_SHIFT bits of address space using long? Or did that fail to work somewhere?

I missed that the purpose of the long was to increase the address space. I'll change this back.

mikemccand · 2023-10-23T16:55:12Z

lucene/core/src/java/org/apache/lucene/util/ByteBlockPool.java

    result.length = length;

-    int bufferIndex = (int) (offset >> BYTE_BLOCK_SHIFT);


If we stick with long we should change this to Math.toIntExact to catch overflow?

mikemccand · 2023-10-23T16:56:21Z

lucene/core/src/java/org/apache/lucene/util/ByteBlockPool.java

    size += RamUsageEstimator.shallowSizeOf(buffers);
    for (byte[] buf : buffers) {
-      if (buf == buffer) {


Why did the previous code special case buffer? Isn't buffer a full sized block?

I haven't figured out why buffer was a special case. Even if it were not a full block, would that make a difference if we're calling RamUsageEstimator#sizeOfObject on it in both cases?

mikemccand · 2023-10-23T16:57:22Z

lucene/core/src/java/org/apache/lucene/util/ByteSlicePool.java

+
+/**
+ * Class that Posting and PostingVector use to write byte streams into shared fixed-size byte[]
+ * arrays. The idea is to allocate slices of increasing lengths For example, the first slice is 5


Period after lengths -- of increasing lengths. For example,

mikemccand · 2023-10-23T16:57:48Z

lucene/core/src/java/org/apache/lucene/util/ByteSlicePool.java

+package org.apache.lucene.util;
+
+/**
+ * Class that Posting and PostingVector use to write byte streams into shared fixed-size byte[]


Maybe say write interleaved byte streams?

mikemccand · 2023-10-23T16:58:13Z

lucene/core/src/java/org/apache/lucene/util/ByteSlicePool.java

+ *
+ * @lucene.internal
+ */
+public class ByteSlicePool {


maybe move this class to org.apache.lucene.index and be package private?

mikemccand · 2023-10-23T16:58:56Z

lucene/core/src/java/org/apache/lucene/util/ByteSlicePool.java

+public class ByteSlicePool {
+  /**
+   * The underlying structure consists of fixed-size blocks. We overlay variable-length slices on
+   * top. Each slice is contiguous in memory, i.e. it does not strddle multiple blocks.


strddle -> straddle

mikemccand · 2023-10-23T17:00:15Z

lucene/core/src/test/org/apache/lucene/util/TestByteSlicePool.java

+import org.apache.lucene.tests.util.TestUtil;
+
+public class TestByteSlicePool extends LuceneTestCase {
+  public void testAllocKnowSizeSlice() {


Know -> Known?

mikemccand · 2023-10-23T17:01:07Z

lucene/core/src/test/org/apache/lucene/util/TestByteSlicePool.java

+    }
+  }
+
+  public void testAllocLargeSlice() {


Could we maybe add a randomized test that writes a random number of random length interleaves streams, then reads them back and confirms the streams match?

iverase · 2023-10-24T10:48:54Z

I like the introduction of ByteSlicePool but I wonder if the naming is correct as it does not feel a generic slicer class but very tied to the format used by TermsHashPerField. Just wondering if we can have a more descriptive name ,(That is one of the reason I kept it inside TermsHashPerField).

In the case we keep this class, let's remove the static method I introduced.

stefanvodita · 2023-10-26T21:05:24Z

Thanks for the review @iverase! I’m putting together a new revision. Do you have a name suggestion for ByteSlicePool? I don’t really have a better idea. By putting it in a separate class, I was also trying to account for the possibility that it would find other uses in the future, but maybe that's not likely.

stefanvodita · 2023-10-27T21:28:00Z

I’ve integrated most of the suggestions. There’s just the matter of the name for the slice pool class and the right package for it. I don’t have a strong opinion on this. Maybe we move it to org.apache.lucene.index and keep the name. It goes well with ByteSliceReader.

mikemccand · 2023-10-29T16:04:51Z

+1 to move to oal.index, and make it package private if possible? ByteSlicePool name sounds good to me :) Naming is the hardest part!

stefanvodita · 2023-10-29T22:07:18Z

Thanks @mikemccand! I just pushed a commit that does that move.

stefanvodita · 2023-11-06T09:29:50Z

@mikemccand @iverase - what do you think, is this PR ready?

iverase

changes LGTM

mikemccand

Looks great, thank you @stefanvodita for giving some love to this long neglected quite hairy class.

I'll merge today.

* Clean up ByteBlockPool 1. Move slice functionality to a separate class, ByteSlicePool. 2. Add exception case if the requested slice size is larger than the block size. 3. Make pool buffers private instead of the comment asking not to modify it. 4. Consolidate setBytesRef methods with int offsets. 5. Simplify ramBytesUsed. 6. Update and expand comments and javadoc. * Revert to long offsets in ByteBlockPool; Clean-up * Remove slice functionality from TermsHashPerField * First working test * [Broken] Test interleaving slices * Fixed randomized test * Remove redundant tests * Tidy * Add CHANGES * Move ByteSlicePool to oal.index

…backport where Java 11 doesn't have this API

stefanvodita commented Aug 14, 2023

View reviewed changes

stefanvodita mentioned this pull request Oct 11, 2023

Refactor ByteBlockPool so it is just a "shift/mask big array" #12625

Merged

stefanvodita force-pushed the simplify-alloc branch from 2a6272a to a86b4de Compare October 21, 2023 13:20

mikemccand reviewed Oct 23, 2023

View reviewed changes

stefanvodita added 4 commits October 23, 2023 19:10

Revert to long offsets in ByteBlockPool; Clean-up

f397a94

Remove slice functionality from TermsHashPerField

a644ba4

First working test

eaa59ca

[Broken] Test interleaving slices

3e264b0

stefanvodita added 4 commits October 27, 2023 20:12

Fixed randomized test

b5449d8

Remove redundant tests

27c1647

Tidy

a502f14

Add CHANGES

ae28f7c

Move ByteSlicePool to oal.index

5039339

Merge branch 'main' into simplify-alloc

375d117

iverase approved these changes Nov 6, 2023

View reviewed changes

mikemccand approved these changes Nov 6, 2023

View reviewed changes

mikemccand merged commit 6aed68f into apache:main Nov 6, 2023
4 checks passed

mikemccand mentioned this pull request Nov 6, 2023

Should we ban Random#nextInt(int, int)? #12771

Closed

mikemccand added a commit that referenced this pull request Nov 6, 2023

#12506: switch to TestUtil for random ints to be consistent with 9.x …

2a31f28

…backport where Java 11 doesn't have this API

mikemccand added a commit that referenced this pull request Nov 6, 2023

#12506: escape > in javadocs

9b82aa3

mikemccand added a commit that referenced this pull request Nov 6, 2023

#12506: escape > in javadocs

3acc3c8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clean up ByteBlockPool #12506

Clean up ByteBlockPool #12506

stefanvodita commented Aug 12, 2023 •

edited

Loading

stefanvodita Aug 14, 2023

mikemccand Oct 29, 2023

stefanvodita Oct 29, 2023

stefanvodita commented Sep 30, 2023

stefanvodita commented Oct 21, 2023

mikemccand commented Oct 21, 2023

mikemccand left a comment

mikemccand Oct 23, 2023

mikemccand Oct 23, 2023

stefanvodita Oct 27, 2023

mikemccand Oct 23, 2023

stefanvodita Oct 27, 2023

mikemccand Oct 23, 2023

stefanvodita Oct 27, 2023

mikemccand Oct 23, 2023

mikemccand Oct 23, 2023

mikemccand Oct 23, 2023

iverase Oct 24, 2023

mikemccand Oct 23, 2023

mikemccand Oct 23, 2023

mikemccand Oct 23, 2023

iverase commented Oct 24, 2023

stefanvodita commented Oct 26, 2023

stefanvodita commented Oct 27, 2023

mikemccand commented Oct 29, 2023

stefanvodita commented Oct 29, 2023

stefanvodita commented Nov 6, 2023

iverase left a comment

mikemccand left a comment

		@@ -46,6 +65,7 @@ protected Allocator(int blockSize) {

		public abstract void recycleByteBlocks(byte[][] blocks, int start, int end);

		// TODO: This is not used. Can we remove?

		result.length = length;

		int bufferIndex = (int) (offset >> BYTE_BLOCK_SHIFT);

Clean up ByteBlockPool #12506

Clean up ByteBlockPool #12506

Conversation

stefanvodita commented Aug 12, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stefanvodita commented Sep 30, 2023

stefanvodita commented Oct 21, 2023

mikemccand commented Oct 21, 2023

mikemccand left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

iverase commented Oct 24, 2023

stefanvodita commented Oct 26, 2023

stefanvodita commented Oct 27, 2023

mikemccand commented Oct 29, 2023

stefanvodita commented Oct 29, 2023

stefanvodita commented Nov 6, 2023

iverase left a comment

Choose a reason for hiding this comment

mikemccand left a comment

Choose a reason for hiding this comment

stefanvodita commented Aug 12, 2023 •

edited

Loading