Skip to content

Conversation

@martijnvg
Copy link
Contributor

@martijnvg martijnvg commented Nov 3, 2025

When construction a new CompetitiveDISIBuilder, then check whether global min/max points or global min/max doc values skipper are comparative with the bottom. If so, then update competitiveIterator with an empty iterator, because no documents will have a value that is competitive with the current recorded bottom in the current segment.

Doing this at CompetitiveDISIBuilder construction is cheap and allows to immediately prune, instead of waiting until doUpdateCompetitiveIterator(...) is invoked.

I didn't add tests in this PR, because I think that TestSortOptimization provides sufficient coverage. But I'm happy to add more tests.

… with the recorded bottom.

When construction a new CompetitiveDISIBuilder, then check whether global min/max points or global min/max doc values skipper are comparative with the bottom.
If so, then update competitiveIterator with an empty iterator, because no documents will have a value that is competitive with the current recorded bottom in the current segment.

Doing this at CompetitiveDISIBuilder construction is cheap and allows to immediately prune, instead of waiting until doUpdateCompetitiveIterator(...) is invoked.
@martijnvg martijnvg added this to the 10.4.0 milestone Nov 3, 2025
@martijnvg martijnvg changed the title NumericComparator: immediately check whether a segment is comparative with the recorded bottom NumericComparator: immediately check whether a segment is competitive with the recorded bottom Nov 3, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Nov 3, 2025

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Comment on lines 376 to 378
if (queueFull) {
long bottom = leafComparator.bottomAsComparableLong();
long minValue = sortableBytesToLong(pointValues.getMinPackedValue());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a nice optimization. I am wondering if we still need to check for missing point values?

// if some documents have missing points, check that missing values prohibits optimization
if (docCount() < maxDoc && isMissingValueCompetitive()) {
    return;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, we still need it. Otherwise we would incorrectly ignore the docs with missing values is missing sort value is competitive. I pushed this commit: 2df7908

I also added a test for this case. The TestSortOptimization test suite didn't catch this problem also after running it for thousands of times . I think this is because in the test, a document without a field is encountered before the queue is full and then bottom is not competitive with missing value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to reach the hitsThreshold before setting the empty iterator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, this hitsThreshold also needs to be taken into account: 63fe755

Copy link
Contributor

@jainankitk jainankitk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@martijnvg
Copy link
Contributor Author

Thanks for the review @jainankitk!

@martijnvg martijnvg merged commit efa5204 into apache:main Nov 6, 2025
12 checks passed
martijnvg added a commit to martijnvg/lucene that referenced this pull request Nov 6, 2025
…competitive with the recorded bottom

Backporting apache#15397 to branch_10x branch.

When construction a new CompetitiveDISIBuilder, then check whether global min/max points or global min/max doc values skipper are comparative with the bottom.
If so, then update competitiveIterator with an empty iterator, because no documents will have a value that is competitive with the current recorded bottom in the current segment.

Doing this at CompetitiveDISIBuilder construction is cheap and allows to immediately prune, instead of waiting until doUpdateCompetitiveIterator(...) is invoked.
martijnvg added a commit that referenced this pull request Nov 6, 2025
…competitive with the recorded bottom (#15410)

Backporting #15397 to branch_10x branch.

When construction a new CompetitiveDISIBuilder, then check whether global min/max points or global min/max doc values skipper are comparative with the bottom.
If so, then update competitiveIterator with an empty iterator, because no documents will have a value that is competitive with the current recorded bottom in the current segment.

Doing this at CompetitiveDISIBuilder construction is cheap and allows to immediately prune, instead of waiting until doUpdateCompetitiveIterator(...) is invoked.
@dweiss
Copy link
Contributor

dweiss commented Nov 6, 2025

Not sure if it's related but jenkins started failing after this commit.

@dweiss
Copy link
Contributor

dweiss commented Nov 6, 2025

Ok, I don't think it's related, may be just a coincidence. I checked before and after and this fails in both cases:

./gradlew :lucene:join:test --tests "org.apache.lucene.search.join.TestParentsChildrenBlockJoinQuery.testAdvance" -Ptests.asserts=true -Ptests.file.encoding=UTF-8 -Ptests.gui=true -Ptests.haltonfailure=false -Ptests.jvmargs= -Ptests.jvms=6 -Ptests.multiplier=3 -Ptests.seed=540C685E79D7974A -Ptests.vectorsize=512

@martijnvg
Copy link
Contributor Author

Quickly looking at the test and I don't think it is doing any sorting or using a numeric comparator. So I also don't think it is related to this change.

romseygeek added a commit to elastic/elasticsearch that referenced this pull request Nov 14, 2025
…ator (#138087)

We seem to get the pruning performance back on the stock lucene iterator
with apache/lucene#15397, which is also present in
our forked version of the code.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants