Introduce support for pruning and skipping to FirstPassGroupingCollector #15210

alexmm-amzn · 2025-09-22T12:52:19Z

Description

Extends the FirstPassGroupingCollector to support pruning (for numeric sort fields using competitiveIterator) and skipping of non-competitive documents (for relevance score sorting using Scorable#setMinCompetitiveScore).

Both optimizations are enabled automatically and reduce the number of documents the collector has to collect whenever the sort allows it.
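To make the skipping half concrete, here is a minimal, self-contained sketch in plain Java with no Lucene dependency. `GroupSkipSketch`, `Doc` and `Result` are hypothetical names. The `continue` stands in for a scorer that honors `Scorable#setMinCompetitiveScore`. The sketch assumes docs arrive in doc-id order and that a doc whose score ties the bottom group's score loses, which is why the threshold is `Math.nextUp` of that score. The numeric-sort pruning through `competitiveIterator` is not modeled.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not Lucene's FirstPassGroupingCollector: keeps the top N groups by
 * best relevance score and, once N groups are tracked, raises a minimum competitive score
 * so that later low-scoring documents are skipped instead of collected.
 */
final class GroupSkipSketch {
  record Doc(int id, String group, float score) {}

  record Result(Map<String, Float> topGroups, int collected) {}

  static Result collect(List<Doc> docsInIdOrder, int topN) {
    Map<String, Float> best = new HashMap<>(); // group value -> best score seen so far
    float minCompetitive = Float.NEGATIVE_INFINITY;
    int collected = 0;
    for (Doc d : docsInIdOrder) {
      // Stand-in for a scorer honoring setMinCompetitiveScore: docs below it are never collected.
      if (d.score() < minCompetitive) {
        continue;
      }
      collected++;
      Float prev = best.get(d.group());
      if (prev != null) {
        if (d.score() > prev) {
          best.put(d.group(), d.score()); // improves a group that is already in the top N
        }
      } else if (best.size() < topN) {
        best.put(d.group(), d.score()); // queue not full yet
      } else {
        Map.Entry<String, Float> bottom =
            Collections.min(best.entrySet(), Map.Entry.comparingByValue());
        if (d.score() > bottom.getValue()) {
          best.remove(bottom.getKey()); // evict the weakest group
          best.put(d.group(), d.score());
        }
      }
      if (best.size() == topN) {
        // Assumed tie rule: a later doc with a score equal to the bottom cannot compete.
        float bottomScore = Collections.min(best.values());
        minCompetitive = Math.nextUp(bottomScore);
      }
    }
    return new Result(best, collected);
  }

  public static void main(String[] args) {
    List<Doc> docs =
        List.of(
            new Doc(0, "a", 1.0f), new Doc(1, "b", 2.0f), new Doc(2, "c", 0.5f),
            new Doc(3, "a", 3.0f), new Doc(4, "d", 0.8f), new Doc(5, "c", 2.5f));
    System.out.println(collect(docs, 2));
  }
}
```

The skip is safe in this model because a doc below the threshold can neither start a new top group nor improve an existing one: every tracked group already scores at least the bottom score.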

@jainankitk Are we fine with enabling this by default, or does it need to be configurable (e.g. via a configurable hit threshold)?

Benchmark results from luceneutil for the TermBGroup1M task (which combines first- and second-pass grouping), using a modified wikimedium.10M.nostopwords.tasks file. This task sorts by relevance score.

> grep TermBGroup1M tasks/wikimedium500.tasks > tasks/wikimedium.10M.nostopwords.tasks
> python src/python/localrun.py -source wikimediumall

Running on m6a.2xlarge using Corretto 24:

Task           QPS baseline   StdDev      QPS my_modified_version   StdDev      Pct diff              p-value
PKLookup             161.09   (12.8%)                      156.56   (11.2%)     -2.8% (-23% - 24%)    0.460
TermBGroup1M          11.47   (14.8%)                       13.44   (13.7%)     17.1% ( -9% - 53%)    0.000

=> ~17% QPS improvement for TermBGroup1M (first + second pass combined).

@jpountz I'm getting some rare test failures for TestGrouping caused by the assert canSetMinCompetitiveScore assertion in AssertingScorer#setMinCompetitiveScore, even though the FirstPassGroupingCollector uses ScoreMode.TOP_SCORES in all configurations when it calls Scorable#setMinCompetitiveScore. Is this a known issue?

Reproduce with: gradlew test --tests TestGrouping.testRandom -Dtests.seed=EC2EC279F564DD82 -Dtests.locale=de-AT -Dtests.timezone=America/St_Thomas -Dtests.asserts=true -Dtests.file.encoding=UTF-8

Edit: This seems to be caused by the Weight that the unit tests instantiate with either ScoreMode.COMPLETE or ScoreMode.COMPLETE_NO_SCORES, regardless of the actual collectors. I updated the test code to instantiate a new Weight for every collector, with a ScoreMode that matches that collector's ScoreMode.
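A toy model of that failure and of the fix, assuming (as the assertion name suggests) that the asserting scorer only accepts `setMinCompetitiveScore` when its Weight was created with `TOP_SCORES`. `WeightModeSketch`, `CheckedScorer` and `collectorRuns` are hypothetical; the enum only mirrors the names of Lucene's `ScoreMode` constants and is not the real type.

```java
/**
 * Hypothetical model of the test failure, not Lucene's test framework: a scorer only
 * accepts setMinCompetitiveScore if the Weight it came from was built with TOP_SCORES.
 */
final class WeightModeSketch {
  // Names mirror Lucene's ScoreMode constants; this enum is a simplified stand-in.
  enum Mode { COMPLETE, COMPLETE_NO_SCORES, TOP_DOCS, TOP_DOCS_WITH_SCORES, TOP_SCORES }

  // Stand-in for a scorer obtained from a Weight created with weightMode.
  static final class CheckedScorer {
    private final boolean canSetMinCompetitiveScore;
    float minCompetitiveScore;

    CheckedScorer(Mode weightMode) {
      // Assumed rule, modeled on the assertion discussed above.
      this.canSetMinCompetitiveScore = weightMode == Mode.TOP_SCORES;
    }

    void setMinCompetitiveScore(float min) {
      if (!canSetMinCompetitiveScore) {
        throw new IllegalStateException("setMinCompetitiveScore on a non-TOP_SCORES weight");
      }
      minCompetitiveScore = min;
    }
  }

  /** Returns true if a collector using collectorMode runs cleanly on a weight built with weightMode. */
  static boolean collectorRuns(Mode weightMode, Mode collectorMode) {
    CheckedScorer scorer = new CheckedScorer(weightMode);
    try {
      if (collectorMode == Mode.TOP_SCORES) {
        // What a skipping collector does once its queue is full.
        scorer.setMinCompetitiveScore(1.0f);
      }
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Shared COMPLETE weight (old test setup) vs. a weight built from the collector's own mode.
    System.out.println(collectorRuns(Mode.COMPLETE, Mode.TOP_SCORES));
    System.out.println(collectorRuns(Mode.TOP_SCORES, Mode.TOP_SCORES));
  }
}
```

Building the Weight from each collector's own score mode, i.e. `collectorRuns(m, m)` for every mode `m`, never trips the check in this model.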

github-actions · 2025-09-22T12:53:11Z

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.


github-actions · 2025-09-23T16:46:30Z

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.


github-actions · 2025-10-15T00:25:39Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

jainankitk

Mostly looks correct to me. I'll wait a couple of days in case someone has concerns with this change!

jainankitk · 2025-10-27T23:47:01Z

lucene/grouping/src/java/org/apache/lucene/search/grouping/FirstPassGroupingCollector.java


+      final Pruning pruning;
+      if (i == 0) {
+        pruning = compIDXEnd >= 0 ? Pruning.GREATER_THAN : Pruning.GREATER_THAN_OR_EQUAL_TO;


Isn't pruning always equal to Pruning.GREATER_THAN since compIDXEnd = sortFields.length - 1?
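A quick check of the observation: the expression from the diff, evaluated for any non-empty sort. The enum mirrors the names of Lucene's `Pruning` constants but is a stand-in. `singleKeyVariant` is a hypothetical alternative, not something the thread proposes: it allows the more aggressive `GREATER_THAN_OR_EQUAL_TO` only when there is a single sort key, on the reasoning that equal values can then only lose on doc id, whereas with more sort keys a tie on the first key may still win on a later one.

```java
/**
 * Illustrates the review question: with compIDXEnd = sortFields.length - 1, the condition
 * compIDXEnd >= 0 holds for every non-empty sort, so the first comparator always gets GREATER_THAN.
 */
final class PruningChoiceSketch {
  // Names mirror Lucene's Pruning constants; simplified stand-in.
  enum Pruning { NONE, GREATER_THAN, GREATER_THAN_OR_EQUAL_TO }

  // The expression from the diff, for the first comparator (i == 0).
  static Pruning asWritten(int numSortFields) {
    int compIDXEnd = numSortFields - 1;
    return compIDXEnd >= 0 ? Pruning.GREATER_THAN : Pruning.GREATER_THAN_OR_EQUAL_TO;
  }

  // Hypothetical alternative: prune ties only when the first field is the sole sort key.
  static Pruning singleKeyVariant(int numSortFields) {
    return numSortFields > 1 ? Pruning.GREATER_THAN : Pruning.GREATER_THAN_OR_EQUAL_TO;
  }

  public static void main(String[] args) {
    for (int n = 1; n <= 3; n++) {
      System.out.println(n + " sort field(s): " + asWritten(n) + " vs " + singleKeyVariant(n));
    }
  }
}
```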

jainankitk · 2025-10-27T23:52:33Z

lucene/grouping/src/java/org/apache/lucene/search/grouping/FirstPassGroupingCollector.java

+      scoreMode = groupSort.needsScores() ? ScoreMode.TOP_DOCS_WITH_SCORES : ScoreMode.TOP_DOCS;
+      canSetMinScore = false;


We don't need ScoreMode.COMPLETE / ScoreMode.COMPLETE_NO_SCORES here, since we don't have a totalHitsThreshold-like parameter?
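For reference, a sketch of the score-mode choice using a simplified `ScoreMode` stand-in that carries the two properties that matter here (`needsScores`, `isExhaustive`); the flag values follow my understanding of Lucene's constants. The `else` path matches the diff; the `TOP_SCORES` path for relevance-first sorts is an assumption based on the PR description. `ScoreModeChoiceSketch`, `Choice` and `choose` are hypothetical names. None of the chosen modes is exhaustive; `COMPLETE` / `COMPLETE_NO_SCORES` would only be required if the collector had to visit every hit, for example to honor a total-hits threshold.

```java
/**
 * Hypothetical sketch of how the first pass could pick a score mode. The else-branch
 * matches the diff; the TOP_SCORES branch is an assumption based on the PR description.
 */
final class ScoreModeChoiceSketch {
  // Simplified stand-in for Lucene's ScoreMode: (needsScores, isExhaustive).
  enum Mode {
    COMPLETE(true, true),
    COMPLETE_NO_SCORES(false, true),
    TOP_SCORES(true, false),
    TOP_DOCS(false, false),
    TOP_DOCS_WITH_SCORES(true, false);

    final boolean needsScores;
    final boolean isExhaustive;

    Mode(boolean needsScores, boolean isExhaustive) {
      this.needsScores = needsScores;
      this.isExhaustive = isExhaustive;
    }
  }

  record Choice(Mode scoreMode, boolean canSetMinScore) {}

  static Choice choose(boolean sortsByScoreFirst, boolean sortNeedsScores) {
    if (sortsByScoreFirst) {
      // Assumed: relevance-first sorts may skip non-competitive docs via setMinCompetitiveScore.
      return new Choice(Mode.TOP_SCORES, true);
    }
    // As in the diff: request scores only if the sort needs them, and never raise a min score.
    return new Choice(sortNeedsScores ? Mode.TOP_DOCS_WITH_SCORES : Mode.TOP_DOCS, false);
  }

  public static void main(String[] args) {
    System.out.println(choose(true, true));
    System.out.println(choose(false, false));
  }
}
```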

github-actions · 2025-11-11T00:27:31Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

github-actions bot added the module:grouping label Sep 22, 2025

alexmm-amzn added 2 commits September 23, 2025 18:41

Add pruning support for FirstPassGroupingCollector (apache#15136)

3cfd7aa

Add support for skipping over non-competitive documents in FirstPassGroupingCollector (apache#15136)

e618ab3

Also adjust CachingCollector behavior to disable minimum competitive scores to allow caching the exhaustive list of hits.
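One way to picture the CachingCollector adjustment, as a hypothetical sketch rather than the actual change: hand the cached collector a scorer view whose `setMinCompetitiveScore` is a no-op, so the underlying scorer never starts skipping and every hit can be cached for replay. The nested interface is a stand-in for the relevant part of `Scorable` (the real method also declares `IOException`); all names below are hypothetical.

```java
/**
 * Hypothetical sketch, not Lucene's CachingCollector: a scorer view that ignores
 * setMinCompetitiveScore so that no hits are skipped while the cache is being filled.
 */
final class CachingNoSkipSketch {
  // Minimal stand-in for the part of Scorable relevant here.
  interface MinScoreScorable {
    float score();

    void setMinCompetitiveScore(float minScore);
  }

  // Stand-in scorer that remembers the threshold it was given.
  static final class RecordingScorer implements MinScoreScorable {
    float score;
    float minCompetitiveScore = 0f;

    public float score() {
      return score;
    }

    public void setMinCompetitiveScore(float minScore) {
      minCompetitiveScore = minScore;
    }
  }

  // Wraps a scorer: scores are delegated, min-score requests are dropped.
  static MinScoreScorable ignoringMinScore(MinScoreScorable in) {
    return new MinScoreScorable() {
      @Override
      public float score() {
        return in.score();
      }

      @Override
      public void setMinCompetitiveScore(float minScore) {
        // Intentionally ignored: skipping would leave hits out of the cache.
      }
    };
  }
}
```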

alexmm-amzn closed this Sep 23, 2025

alexmm-amzn deleted the pr/15136-2 branch September 23, 2025 16:41

alexmm-amzn reopened this Sep 23, 2025

alexmm-amzn force-pushed the pr/15136-2 branch from 70a5f38 to ef23cb1 September 23, 2025 16:45

github-actions bot added the module:core/search label Sep 23, 2025

github-actions bot added this to the 11.0.0 milestone Sep 23, 2025

Fix TestGrouping unit tests to use ScoreMode that is aligned with the collectors for the Weight (apache#15136)

6c29e6f

alexmm-amzn force-pushed the pr/15136-2 branch from ef23cb1 to db5d3dd September 23, 2025 16:51

Update CHANGES.txt (apache#15136)

74417a2

alexmm-amzn force-pushed the pr/15136-2 branch from db5d3dd to 74417a2 September 23, 2025 17:16

github-actions bot added the Stale label Oct 15, 2025

jainankitk approved these changes Oct 27, 2025


github-actions bot removed the Stale label Oct 28, 2025

github-actions bot added the Stale label Nov 11, 2025
