Improve cost estimation in SortedSetDocValuesRangeQuery when using DocValuesSkipper by iverase · Pull Request #16061 · apache/lucene

iverase · 2026-05-13T12:14:05Z

I noticed that the current implementation always returns values.cost() because we delay the construction of the iterator until the last moment. We do that because resolving the ordinals of the query requires some lookups.

Still, when there is a skipper and the field is dense and used as the primary sort, we have a very fast way to compute the iterator and its exact cost, so let's move those conditions into the construction of the scorer supplier. This is in line with what we are doing in SortedNumericDocValuesRangeQuery.

romseygeek · 2026-05-13T12:22:29Z

Might be me not grokking things properly but I don't see where this is changing the cost() implementation?

iverase · 2026-05-13T12:27:59Z

          final DocIdSetIterator psIterator =
              getDocIdSetIteratorForDensePrimarySort(
                  context.reader(), singleton, skipper, minOrd, maxOrd);
          return ConstantScoreScorerSupplier.fromIterator(
              psIterator, score(), scoreMode, context.reader().maxDoc());

This supplier will use the cost of the iterator which will be maxDocId - minDocId.
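
To illustrate why the cost is exact in this case, here is a minimal, self-contained sketch. It does not use Lucene classes: the `ords` array is a hypothetical stand-in for a dense, single-valued field that is the primary index sort, so ordinals are non-decreasing in doc ID and the matching docs form one contiguous range.

```java
// Hypothetical stand-in for a segment whose single-valued field is dense
// (every doc has a value) and is the primary index sort, so ords[docId]
// is non-decreasing in docId. This is not Lucene's actual implementation.
final class DensePrimarySortCost {

  /** Returns the first doc ID whose ordinal is >= target (ords.length if none). */
  static int lowerBound(long[] ords, long target) {
    int lo = 0;
    int hi = ords.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (ords[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** Exact number of docs with minOrd <= ord <= maxOrd. */
  static long cost(long[] ords, long minOrd, long maxOrd) {
    if (minOrd > maxOrd) {
      return 0;
    }
    // First matching doc.
    int minDocId = lowerBound(ords, minOrd);
    // One past the last matching doc; guard against overflow of maxOrd + 1.
    int maxDocId = maxOrd == Long.MAX_VALUE ? ords.length : lowerBound(ords, maxOrd + 1);
    return maxDocId - minDocId;
  }

  public static void main(String[] args) {
    long[] ords = {0, 0, 1, 2, 2, 2, 5, 7};
    // Docs 2..5 carry ordinals in [1, 2], so the cost is maxDocId - minDocId = 6 - 2.
    System.out.println(cost(ords, 1, 2));
  }
}
```

The two binary searches here play the role of the lookups that the real code performs against the skipper and the doc values; they are what makes building the iterator non-free.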

romseygeek · 2026-05-13T12:45:58Z

+                score(), scoreMode, context.reader().maxDoc());
+          }
+          final DocIdSetIterator psIterator =
+              getDocIdSetIteratorForDensePrimarySort(


This is the bit that is called out as expensive below so I'm not sure if we should be doing it in the scorerSupplier call directly? A couple of weeks back we were discussing ways of improving cost estimates using skippers, so I wonder if that's a better way to go?

I think that comment predates skippers and is there because of the resolution of the ordinals. But sure, if we don't want to do any IO during the construction of the supplier, we shouldn't do this.

My question is, should we then modify SortedNumericDocValuesRangeQuery?

I have been thinking about how to estimate the cost inside the ScorerSupplier without having to create the full iterator. The idea is that in cost() we only visit the skipper and estimate the cost, similar to what we do in PointValues. When building the iterator, we visit the doc values if necessary.

We need to keep a bit of state around but it is not too bad, let me know what you think.
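
A rough sketch of that idea, under stated assumptions: `Block` and `LazyRangeSupplier` are hypothetical names, not Lucene classes, and the blocks stand in for the skipper's per-block metadata on a field that is the primary sort. The cost is an upper bound derived from block metadata only, and the resolved block indices are the bit of state kept for later iterator construction.

```java
import java.util.Arrays;
import java.util.List;

final class SkipperCostSketch {

  /** Hypothetical summary of one skip block: value bounds and inclusive doc ID bounds. */
  static final class Block {
    final long minValue;
    final long maxValue;
    final int minDocId;
    final int maxDocId;

    Block(long minValue, long maxValue, int minDocId, int maxDocId) {
      this.minValue = minValue;
      this.maxValue = maxValue;
      this.minDocId = minDocId;
      this.maxDocId = maxDocId;
    }
  }

  /** Hypothetical supplier: cost() reads block metadata only and never touches doc values. */
  static final class LazyRangeSupplier {
    private final List<Block> blocks;
    private final long minOrd;
    private final long maxOrd;
    // State kept between cost() and a later iterator build.
    private boolean resolved;
    private int firstBlock = -1;
    private int lastBlock = -1;

    LazyRangeSupplier(List<Block> blocks, long minOrd, long maxOrd) {
      this.blocks = blocks;
      this.minOrd = minOrd;
      this.maxOrd = maxOrd;
    }

    /** Upper bound on matching docs: the doc ID span of all blocks that intersect the range. */
    long cost() {
      resolve();
      if (firstBlock < 0) {
        return 0;
      }
      return blocks.get(lastBlock).maxDocId - blocks.get(firstBlock).minDocId + 1L;
    }

    private void resolve() {
      if (resolved) {
        return;
      }
      resolved = true;
      for (int i = 0; i < blocks.size(); i++) {
        Block b = blocks.get(i);
        if (b.maxValue < minOrd || b.minValue > maxOrd) {
          continue; // block cannot contain a match
        }
        if (firstBlock < 0) {
          firstBlock = i;
        }
        lastBlock = i;
      }
    }
  }

  public static void main(String[] args) {
    List<Block> blocks =
        Arrays.asList(new Block(0, 1, 0, 3), new Block(1, 4, 4, 7), new Block(5, 9, 8, 11));
    // Blocks 1 and 2 intersect [2, 6], so the candidate docs are 4..11.
    System.out.println(new LazyRangeSupplier(blocks, 2, 6).cost());
  }
}
```

An iterator built afterwards would only need to refine the two boundary blocks against the doc values, which is the IO the cost estimate avoids.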

I like this a lot! I wonder if it's possible to use SkipBlockRangeIterator to simplify the code that finds bounds a bit?

I think we want to use the skipper directly here. I also noticed we are being a bit inefficient. For example:

    skipper.advance(minOrd, Long.MAX_VALUE);
    skipperMinDocId = skipper.minDocID(0);
    skipperMinDocIdExact = false;

Here we always visit the doc values, but that can be changed to:

    skipper.advance(minOrd, Long.MAX_VALUE);
    skipperMinDocId = skipper.minDocID(0);
    skipperMinDocIdExact = skipper.minValue(0) == minOrd;

So we only visit the doc values if the block does not start at minOrd. This should be true very often because of the way we build the blocks. There are other tricks for maxOrd too!
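
The effect of that check can be sketched without Lucene. In the hypothetical helper below, `ords` stands in for the doc values of a sorted, dense field, and the block parameters stand in for what the skipper reports. The sketch assumes, as in the snippet above, that the block is the first one whose maximum value is at least minOrd, and that it contains at least one doc with an ordinal at or above minOrd.

```java
final class ExactBoundSketch {

  /** Counts simulated doc-values reads so the saving is observable. */
  static int docValueLookups = 0;

  /**
   * Returns the first doc ID in [blockMinDocId, blockMaxDocId] whose ordinal is >= minOrd.
   * Assumes ords is non-decreasing and that such a doc exists in the block.
   */
  static int firstMatchingDoc(
      long[] ords, long blockMinValue, int blockMinDocId, int blockMaxDocId, long minOrd) {
    if (blockMinValue == minOrd) {
      // The block starts exactly at minOrd: its first doc is the answer,
      // and no doc values need to be read.
      return blockMinDocId;
    }
    // Otherwise binary-search the block, reading doc values along the way.
    int lo = blockMinDocId;
    int hi = blockMaxDocId;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      docValueLookups++;
      if (ords[mid] < minOrd) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  public static void main(String[] args) {
    long[] ords = {3, 3, 4, 4, 6, 6, 7, 9};
    System.out.println(firstMatchingDoc(ords, 3, 0, 7, 3) + " lookups=" + docValueLookups);
    System.out.println(firstMatchingDoc(ords, 3, 0, 7, 6) + " lookups=" + docValueLookups);
  }
}
```

When the block's minimum value equals minOrd, the lookup counter stays at zero; only the second call, where the range starts in the middle of the block, has to read doc values.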

I pushed the optimizations I was thinking about in 2d89064

romseygeek

LGTM

Improve cost estimation in SortedSetDocValuesRangeQuery when using Do…

3a9ea6d

…cValuesSkipper

iverase added this to the 10.5.0 milestone May 13, 2026

iverase requested a review from romseygeek May 13, 2026 12:14

github-actions Bot added the module:core/other label May 13, 2026

iverase mentioned this pull request May 13, 2026

BooleanQuery: narrow bulk scoring when FILTER matches primary index sort #15991

Closed

romseygeek reviewed May 13, 2026

iverase added 4 commits May 14, 2026 08:08

Approach to compute cost inside the ScorerSupplier

e86e8a2

iter

beabac2

rename variable

f1802d1

optimize

2d89064

github-actions Bot added module:core/search and removed module:core/other labels May 14, 2026

iter

dcc6a15

romseygeek reviewed May 14, 2026

Comment thread lucene/core/src/test/org/apache/lucene/search/TestDocValuesQueries.java

iverase added 3 commits May 14, 2026 12:31

test cost too

c189216

doh

32d02ab

bit better

48f8369

romseygeek approved these changes May 14, 2026

Add CHANGES.txt

e5d7836

iverase merged commit 091b35a into apache:main May 14, 2026
13 checks passed

iverase deleted the SortedSetDocValuesRangeQuery branch May 14, 2026 15:13

iverase added a commit that referenced this pull request May 14, 2026

Improve cost estimation in SortedSetDocValuesRangeQuery when using Do…

0206215

…cValuesSkipper (#16061)

iverase merged 10 commits into apache:main from iverase:SortedSetDocValuesRangeQuery


2 participants