Implement method to add all stream elements into a PriorityQueue. by vsop-479 · Pull Request #15823 · apache/lucene

vsop-479 · 2026-03-14T03:19:07Z

In DisjunctionMaxBulkScorer's constructor all values to compare is 0, so we can skip the shift.

…s constructor.

Copilot

Pull request overview

This PR introduces a specialized PriorityQueue insertion method that skips heap adjustment and uses it to optimize initialization of DisjunctionMaxBulkScorer, where all initial queue keys are equal (next == 0).

Changes:

Add PriorityQueue#addNoShift(T) to append elements without upHeap.
Use addNoShift when building the scorer queue in DisjunctionMaxBulkScorer’s constructor.
Minor local refactor in DisjunctionMaxBulkScorer.collect to reuse the computed delta.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File	Description
lucene/core/src/java/org/apache/lucene/util/PriorityQueue.java	Adds `addNoShift` as a new insertion API that bypasses heap restoration.
lucene/core/src/java/org/apache/lucene/search/DisjunctionMaxBulkScorer.java	Uses `addNoShift` during queue initialization; small `delta` reuse cleanup.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  }

+  /**
+   * Adds an Object to a PriorityQueue without upHeap. Note: only use it when all elements have a


+   * Adds an Object to a PriorityQueue without upHeap. Note: only use it when all elements have a
+   * same value.


+    // TODO: Check element equalsTo lastElement, if we can get the comparator.
+
+    int index = size + 1;


+   * same value.
+   */
+  public final void addNoShift(T element) {
+    // TODO: Check element equalsTo lastElement, if we can get the comparator.


+   * Adds an Object to a PriorityQueue without upHeap. Note: only use it when all elements have a
+   * same value.
+   */
+  public final void addNoShift(T element) {


gsmiller · 2026-03-18T16:26:30Z

I'm a bit uncomfortable with adding a "no shift" API to PQ. It has potential to be pretty trappy. Is there a better way to get at what you're trying to do? Maybe we could leverage #addAll from the DisjunctionMaxBulkScorer ctor? I get that we don't need the downHeap operation if everything is actually equal, but it doesn't seem particularly expensive?

vsop-479 · 2026-03-19T03:10:48Z

Thanks for your reply @gsmiller !

but it doesn't seem particularly expensive?

No, it doesn't.

Maybe we could leverage #addAll from the DisjunctionMaxBulkScorer ctor?

I will try.

…rityQueue, and leverage it in DisjunctionMaxBulkScorer's constructor.

vsop-479 · 2026-03-19T05:55:32Z

Maybe we could leverage #addAll from the DisjunctionMaxBulkScorer ctor?

I used a variant: add an ElementSupplier to avoid duplicate loop.

gsmiller · 2026-03-19T14:58:45Z

I used an variant: add an ElementSupplier to avoid duplicate loop.

Oh, that works too. That's a creative way to avoid the comparison check on initialization :)

gsmiller

In general, I'm supportive of adding this but I'm a bit on-the-fence with whether-or-not it's worth expanding the API of PriorityQueue for this case. I'm leaning towards adding it though. It does seem like a nice-to-have feature for cases where calling code needs to do some "element conversion" when populating a PQ instance.

Added a few small comments. Thanks!

gsmiller · 2026-03-19T15:27:09Z

  }

+  /**
+   * Similar to {@link #addAll(Collection)}, but supply an {@link ElementSupplier} to convert origin


Maybe we could be more explicit in the documentation that the value of this version of addAll is for cases where you need to convert between element types since it doesn't require the caller to do the conversion at the call site?

I added more documentation, not sure it is explicit enough.

gsmiller · 2026-03-19T15:27:35Z

+   * Similar to {@link #addAll(Collection)}, but supply an {@link ElementSupplier} to convert origin
+   * element to target element. This is useful when source elements does not fit target elements.
+   */
+  public <S> void addAll(Collection<S> elements, ElementSupplier<T, S> elementSupplier) {


Can you add test coverage for this method?

dweiss · 2026-03-19T17:48:42Z

+   * Similar to {@link #addAll(Collection)}, but supply an {@link ElementSupplier} to convert origin
+   * element to target element. This is useful when source elements does not fit target elements.
+   */
+  public <S> void addAll(Collection<S> elements, ElementSupplier<T, S> elementSupplier) {


Wouldn't it be more natural for addAll to accept Iterable, followed by an utility method Iterables.transforming(Iterable, Function<Y,X>). Many libraries do this kind of view thing - we can't reuse them but it should be trivial to implement. Just a thought.

I use Guava's Iterables implemented it, it works too (transform in iterating). Here is the code(uncommitted):

public <S> void addAll( Collection<S> elements, com.google.common.base.Function<S, T> elementConverter) { if (this.size + elements.size() > this.maxSize) { throw new ArrayIndexOutOfBoundsException( "Cannot add " + elements.size() + " elements to a queue with remaining capacity: " + (maxSize - size)); } Iterable<T> transform = Iterables.transform(elements, elementConverter); // Heap with size S always takes first S elements of the array, // and thus it's safe to fill array further - no actual non-sentinel value will be overwritten. for (T element : transform) { this.heap[size + 1] = element; this.size++; } // The loop goes down to 1 as heap is 1-based not 0-based. for (int i = (size >>> 1); i >= 1; i--) { downHeap(i); } }

Many libraries do this kind of view thing - we can't reuse them but it should be trivial to implement.

If we prefer this way(I think current committed way is ok too:), do you mean we should implement lucene's Iterables, Iterators, etc. Ranther than use these libraries directly?

I suspect the idea here would be:

Not use a 3rd party library directly (we don't want that dependency in core)

Have an addAll method that simply takes a java.lang.Iterable as a single input but create our own implementation of Iterable that applies the transformation internally. Much like what Guava's Iterables#transform method does. This is probably something we'd put in core (maybe in o.a.l.util)? We could probably deprecate the current addAll method that takes a Collection in favor of this new API as well.

This approach makes a lot of sense to me, and I don't think it would be too much work to do. (Hopefully I'm understanding the suggestion correctly).

Not use a 3rd party library directly (we don't want that dependency in core).

Thanks for explicit that.

create our own implementation of Iterable that applies the transformation internally.

OK, I will try this approach.

… case.

vsop-479 · 2026-03-23T06:37:58Z

I implemented it with Iterables/Iterators (they only have transform function for now).

Please take a look when you get a chance @dweiss, @gsmiller .

I like the design of Iterables/Iterators. But, there are some notes i think we may care in Guava's Iterables:

Java 8+ users: several common uses for this class are now more comprehensively addressed by the new Stream library. Read the method documentation below for comparisons. This class is not being deprecated, but we gently encourage you to migrate to streams.

Note on Iterables#transform:

Stream equivalent: Stream.map.

So, I test it with Stream, it works too. The code with stream likes:

// Use stream to transform and iterate.
    elements.stream().map(elementTransformer).forEach(e -> {
      this.heap[size + 1] = e;
      this.size++;
    });

gsmiller · 2026-03-25T00:14:49Z

+   * param elements is different from the type of this {@link PriorityQueue}, since it doesn't
+   * require the caller to do the conversion at the call site.
+   */
+  public <S> void addAll(Collection<S> elements, Function<S, T> elementTransformer) {


So I think the proposed idea is a little different actually. The idea would be for this method signature to simply be:
public void addAll(Iterable<T> elements).

Then, from your calling code in DisjunctionMaxBulkScorer, you'd call like this:
this.scorers.addAll(Iterables.transform(scorers, BulkScorerAndNext::new));

The goal is that PriorityQueue doesn't need to know anything about the translation logic. That all gets encapsulated behind the Iterator that is setup by the calling code. WDYT?

The goal is that PriorityQueue doesn't need to know anything about the translation logic. That all gets encapsulated behind the Iterator that is setup by the calling code. WDYT?

+1. There is a minor issue with public void addAll(Iterable<T> elements), the original addAll validate the size with Collection#size firstly. If we want keep this validation, maybe the caller need input the size of elements? like this:public void addAll(Iterable<T> elements, int size)?

I wonder how important that validation is. It's an exceptional case (the caller really shouldn't be overflowing their PQ capacity). We'll hit the same AIOOB exception anyway in the for-each loop that's populating the array, so the outcome is really the same if you overflow the capacity, so I'm not sure that early check adds a ton of value. It's nice to do up-front, but I don't think I'd design the API around it. So I guess I don't think it's a problem to remove this up-front sizing check, but maybe you're considering something I'm not? Do you think it's critical? Thanks again for iterating!

We'll hit the same AIOOB exception anyway in the for-each loop that's populating the array

Indeed. That is what PriorityQueue#add do (when caller iterate their elements to call ). So i removed the validation and implmented public void addAll(Iterable<T> elements).

…able.

uschindler · 2026-03-26T07:51:29Z

Hi,
I don't think we should add those self made Spliterators and all this internal collection wrapper stuff.

What I generally do when I want a filtered Iterator: get a stream from the collection/iterable (for iterables there's a trick to get it, but I'd prefer a collection instead of Iterables). The use all stream filtering or mapping and finally wrap the thing into an iterator.

uschindler · 2026-03-26T07:52:45Z

So, -1 to add all this collection code that already has native support in jdk.

vsop-479 · 2026-03-26T08:03:44Z

Hi @uschindler , you prefer approach like this?

So, I test it with Stream, it works too. The code with stream likes:
// Use stream to transform and iterate.
elements.stream().map(elementTransformer).forEach(e -> {
this.heap[size + 1] = e;
this.size++;
});

uschindler · 2026-03-26T08:23:04Z

Yes, or instead of foreach call iterator()

vsop-479 · 2026-03-28T04:08:50Z

This is how it's used in the disj scorer, so only allow to initialize. Later calling addAll would lead to trouble.
So basically a factory method like: produce a PQ with that size and take the elements from that stream.
AddAll should not be allowed when heap is already populated because code may misuse it and fail suddenly.

@uschindler - I don't completely follow your comment about addAll only working on a newly-constructed/empty heap. Can you elaborate a bit? I believe both of the addAll methods (existing and this proposed one) should work correctly on a heap that already contains some elements, but I might be missing something? Or maybe I'm misunderstanding your point? Oh wait, after re-reading, maybe you're describing a different option that would appropriately size a new heap for a provided stream? One that would handle the overflow case by essentially allocating a bigger PQ?

I am trying to understand this too :)

uschindler · 2026-03-29T12:38:25Z

Hi,
Sorry. Add is also without overflow. So naming is fine.
In that case it's fine. Ignore my previous comment.
You can also revert the exception change. It's consistent with add, which also throws AIOOBE.

vsop-479 · 2026-03-29T13:29:30Z

I think there is a difference between add and addAll, with add the caller can use the element to compare to PQ’s top and maybe pop the top and reAdd the element after got an AIOOBE(maybe the normal usage is judge wether the queue is full firstly, and call updateTop?), but if we want addAll behave the same after AIOOBE, we need keep remaining elements in the stream, so the caller can add them intersect(maybe stream already works like this? I will verify it)

uschindler · 2026-03-29T16:52:58Z

That's not working with streams. If it works it is an implementation detail and could break with different types of streams.
After the exception the stream is by definition no longer useable. I have to lookup the spec. The correct reuse case would be if addAll would return the number of added entries.

Then one could redo the add with a new stream using stream.skip(return value). In that case it should not throw exception. It is all not easy to decide. For now I think the current impl is fine if we add enough documentation.

I figured out that add() also throws AIOOBE. Maybe we should keep the exception; I would remove the catch block and only keep the finally intact.

Uwe

uschindler · 2026-03-29T16:56:19Z

Quote from stream Java docs:

A stream should be operated on (invoking an intermediate or terminal stream operation) only once. This rules out, for example, "forked" streams, where the same source feeds two or more pipelines, or multiple traversals of the same stream. A stream implementation may throw IllegalStateException if it detects that the stream is being reused. However, since some stream operations may return their receiver rather than a new stream object, it may not be possible to detect reuse in all cases.

vsop-479 · 2026-03-30T03:23:17Z

Oh, I also noticed you're targeting 11.0 to release this. Is there any reason not to target 10.5 instead?

I figured out that add() also throws AIOOBE. Maybe we should keep the exception; I would remove the catch block and only keep the finally intact.

I moved the change entity to 10.5.0, and removed the catch block.

uschindler · 2026-03-30T08:19:34Z

+    // Heap with size S always takes first S elements of the array,
+    // and thus it's safe to fill array further - no actual non-sentinel value will be overwritten.
+    try {
+      elements.forEach(


Nitpick: Better to use forEachOrdered(). This makes practically no difference unless stream is parallel, but we should be correct here. Especially when we have the skip(long) mentioned in the javadcos.

gsmiller · 2026-03-30T15:07:03Z

This looks mostly good to me, thanks!

I left an earlier comment regarding consistency differences between this new addAll method and the existing addAll(Collection) method that I'd like to resolve (let's either, (a) decide the inconsistency is OK, (b) change the current addAll(Collection) behavior, or (c) open a follow-up issue to address it). The inconsistent behavior relates to how we handle an "overflow" case where more elements are provided than PQ capacity:

Current addAll(Collection) method: Detects the issue up-front and will not add any elements to the PQ.
New addAll(Stream) method: Because it cannot detect the issue up-front, it partially completes the operation by adding elements until capacity.

I'd propose: Modify the current addAll(Collection) method to be consistent with the newly added addAll(Stream) method (this is the only option for consistency). (I think it's OK to do this in a follow-up issue if we want to separate the changes).

I recognize this is fairly nitpicky, but I think it's helpful to maintain as much consistency as we can here. WDYT?

gsmiller · 2026-03-30T15:11:44Z

+      int left = i << 1;
+      int right = left + 1;
+      if (right < heapArray.length) {
+        assert (Integer) heapArray[i] <= (Integer) heapArray[right];


nit: This test will silently not work if asserts are disabled in the jvm. Let's use the idiomatic juint assert methods instead. I'd have this assertHeap method return a boolean then use assertTrue from the calling method.

gsmiller · 2026-03-30T15:13:32Z

+        () -> pq.addAll(elements.stream().map(String::hashCode)));
+    // Partly added.
+    assertEquals(10, pq.size());
+    assertHeap(pq);


This approach works, but you could also pop elements off the heap in a loop and assert the ordered nature of what comes out. It's nice that you avoid mutating the heap here with your check, but since it's the last thing we're doing in a unit test, it might be simpler to just pop stuff off the heap and check rather than writing custom heap traversal logic? I don't have strong opinions I suppose. Take-it-or-leave it :)

but you could also pop elements off the heap in a loop and assert the ordered nature of what comes out. It's nice that you avoid mutating the heap here with your check

Yes, pop and assert the order came to my mind firstly. And in order to avoid mutate the heap, I picked this apporach.

but since it's the last thing we're doing in a unit test, it might be simpler to just pop stuff off the heap and check rather than writing custom heap traversal logic?

But, you are right.

Yeah, totally fine by me either way :) Thanks!

uschindler

Making the two methods consistent if great! Thanks.

gsmiller · 2026-03-31T15:39:19Z

Thanks for making the two addAll methods consistent (and for the other minor revisions)! I'll go ahead and merge this since I don't see any other outstanding feedback. Thanks again!

…5823)

gsmiller · 2026-03-31T16:23:52Z

Oh and @vsop-479, just to respond to your earlier comment ("I will move it to 10.5. To be honest, I am not sure where it should be."), I would generally release a change in the next minor version unless it has to wait for the next major version because of something like a backwards compatibility issue. There more here if you're interested: https://cwiki.apache.org/confluence/display/LUCENE/BackwardsCompatibility

vsop-479 · 2026-04-01T02:14:02Z

Oh and @vsop-479, just to respond to your earlier comment ("I will move it to 10.5. To be honest, I am not sure where it should be."), I would generally release a change in the next minor version unless it has to wait for the next major version because of something like a backwards compatibility issue. There more here if you're interested: https://cwiki.apache.org/confluence/display/LUCENE/BackwardsCompatibility

Thanks, it is helpful to me. And thanks everyone for the review!

uschindler · 2026-04-01T06:56:01Z

I think there is a bit of backwards compatibility issue because the already existing Collection based version now behaves a bit different on exception. But I don't know if this is really an issue (does anybody adds element with a collection to an already full PQ)?

uschindler · 2026-04-01T06:59:29Z

I had a similar issue with changing an Exception type for the caller checks #15877, but I decided then: It's not an issue, because nobody is allowed the method from externally so the thrown exception won't matter, because there is no way to recover from that issue.

Here it is similar: If you hit an exception the difference is now that heap is sane afterwards, before it was not sane. So it is better afterwards. Anybody who tried to cleanup from it in the past would have made the heap sane, but doing that twice is not an issue.

vsop-479 · 2026-04-01T09:04:50Z

I think there is a bit of backwards compatibility issue because the already existing Collection based version now behaves a bit different on exception.

Oh, I missed this.

Here it is similar: If you hit an exception the difference is now that heap is sane afterwards, before it was not sane. So it is better afterwards. Anybody who tried to cleanup from it in the past would have made the heap sane, but doing that twice is not an issue.

+1.

gsmiller · 2026-04-01T19:21:34Z

@uschindler do you know what our policy generally is on @lucene.internal tagged classes w.r.t. backwards compatibility? My understanding was that we can be a little more flexibility within reason on these tagged classes. Either way, I still think we should bring this to 10.5 personally.

Add PriorityQueue#addNoShift, and use it in DisjunctionMaxBulkScorer'…

06d6c71

…s constructor.

github-actions Bot added the module:core/search label Mar 14, 2026

vsop-479 requested review from Copilot and jpountz March 14, 2026 03:27

Copilot started reviewing on behalf of vsop-479 March 14, 2026 03:27 View session

Copilot AI reviewed Mar 14, 2026

View reviewed changes

Implement method to convert and add all collection elements to a Prio…

19bd655

…rityQueue, and leverage it in DisjunctionMaxBulkScorer's constructor.

github-actions Bot added this to the 11.0.0 milestone Mar 19, 2026

Tidy.

41a436a

vsop-479 changed the title ~~Add PriorityQueue#addNoShift, and use it in DisjunctionMaxBulkScorer's constructor.~~ Implement method to convert and add all collection elements to a PriorityQueue, and leverage it in DisjunctionMaxBulkScorer's constructor. Mar 19, 2026

gsmiller reviewed Mar 19, 2026

View reviewed changes

dweiss reviewed Mar 19, 2026

View reviewed changes

vsop-479 added 4 commits March 20, 2026 11:23

Use Function instead of ElementSupplier; More documentation; Add test…

acc7e7e

… case.

Revert TestDisjunctionMaxQuery.

419778f

Init Iterables.

ad3564d

Add javadocs.

bf94767

Remove useless method.

0810374

gsmiller reviewed Mar 25, 2026

View reviewed changes

vsop-479 added 2 commits March 26, 2026 14:08

Change the input type of PriorityQueue#addAll from Collection to Iter…

e00b885

…able.

Update changes entity.

66f2269

vsop-479 added 3 commits March 30, 2026 10:33

Remove catch block.

2f9b0b5

More documentation.

5e137d5

Remove duplicate changes entity in 11.0.0.

84bf719

uschindler requested changes Mar 30, 2026

View reviewed changes

Use Stream#forEachOrdered.

5ff2fa7

uschindler approved these changes Mar 30, 2026

View reviewed changes

gsmiller reviewed Mar 30, 2026

View reviewed changes

vsop-479 added 2 commits March 31, 2026 09:56

Use junit assert.

560f4bb

Make two addAll methods behave consistent.

2054aa0

uschindler approved these changes Mar 31, 2026

View reviewed changes

gsmiller merged commit a68055a into apache:main Mar 31, 2026
13 checks passed

gsmiller modified the milestones: 11.0.0, 10.5.0 Mar 31, 2026

gsmiller pushed a commit that referenced this pull request Mar 31, 2026

Implement method to add all stream elements into a PriorityQueue. (#1…

8a23cee

…5823)

vsop-479 deleted the add_PriorityQueue#addNoShift branch April 1, 2026 01:38

txwei mentioned this pull request May 18, 2026

sync upstream 05 18 26 mongodb-forks/lucene-mongot#13

Closed

		* Adds an Object to a PriorityQueue without upHeap. Note: only use it when all elements have a
		* same value.

		// TODO: Check element equalsTo lastElement, if we can get the comparator.

		int index = size + 1;

Conversation

vsop-479 commented Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

gsmiller commented Mar 18, 2026

Uh oh!

vsop-479 commented Mar 19, 2026

Uh oh!

vsop-479 commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gsmiller commented Mar 19, 2026

Uh oh!

gsmiller left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vsop-479 Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vsop-479 commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

uschindler commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

uschindler commented Mar 26, 2026

Uh oh!

vsop-479 commented Mar 26, 2026

Uh oh!

uschindler commented Mar 26, 2026

Uh oh!

vsop-479 commented Mar 28, 2026

Uh oh!

uschindler commented Mar 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vsop-479 commented Mar 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

uschindler commented Mar 29, 2026

Uh oh!

uschindler commented Mar 29, 2026

Uh oh!

vsop-479 commented Mar 30, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gsmiller commented Mar 30, 2026

Uh oh!

vsop-479 commented Mar 14, 2026 •

edited

Loading

vsop-479 commented Mar 19, 2026 •

edited

Loading

vsop-479 Mar 20, 2026 •

edited

Loading

vsop-479 commented Mar 23, 2026 •

edited

Loading

uschindler commented Mar 26, 2026 •

edited

Loading

uschindler commented Mar 29, 2026 •

edited

Loading

vsop-479 commented Mar 29, 2026 •

edited

Loading