-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Introduce support for pruning and skipping to FirstPassGroupingCollector #15210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
…roupingCollector (apache#15136) Also adjust CachingCollector behavior to disable minimum competitive scores to allow caching the exhaustive list of hits.
70a5f38 to
ef23cb1
Compare
|
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
… collectors for the Weight (apache#15136)
ef23cb1 to
db5d3dd
Compare
db5d3dd to
74417a2
Compare
|
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
jainankitk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks correct to me. Will wait for couple of days, in case, someone has concerns with this change!
|
|
||
| final Pruning pruning; | ||
| if (i == 0) { | ||
| pruning = compIDXEnd >= 0 ? Pruning.GREATER_THAN : Pruning.GREATER_THAN_OR_EQUAL_TO; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't pruning always equal to Pruning.GREATER_THAN since compIDXEnd = sortFields.length - 1?
| scoreMode = groupSort.needsScores() ? ScoreMode.TOP_DOCS_WITH_SCORES : ScoreMode.TOP_DOCS; | ||
| canSetMinScore = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need ScoreMode.COMPLETE / ScoreMode.COMPLETE_NO_SCORES here, since we don't have totalHitsThreshold like parameter?
|
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
Description
Extends the
FirstPassGroupingCollectorto support pruning (for numeric sort fields usingcompetitiveIterator) and skipping of non-competitive documents (for relevance score sorting usingScorable#setMinCompetitiveScore).Both optimizations are enabled automatically, thereby reducing the hit count of the collector if circumstances allow.
@jainankitk Are we fine with enabling this by default, or do we need this configurable (e.g. configurable hit threshold)?
Benchmark results using
luceneutilsfor theTermBGroup1Mscenario (combines first and second pass grouping) using a modifiedwikimedium.10M.nostopwords.tasksjob. This scenario uses sort by relevance score.Running on
m6a.2xlargeusing Corretto 24:=> ~17% overall performance improvement (first+second pass).
@jpountz I'm getting some rare test failures for
TestGroupingcaused by theassert canSetMinCompetitiveScoreassertion inAssertingScorer#setMinCompetitiveScore, even though theFirstPassGroupingCollectorusesScoreMode.TOP_SCORESin all configurations when it callsScorable#setMinCompetitiveScore. Is this a known issue?Reproduce with:
gradlew test --tests TestGrouping.testRandom -Dtests.seed=EC2EC279F564DD82 -Dtests.locale=de-AT -Dtests.timezone=America/St_Thomas -Dtests.asserts=true -Dtests.file.encoding=UTF-8edit//Seems to be caused by the
Weightthat gets instantiated by the unit tests with eitherScoreMode.COMPLETEorScoreMode.COMPLETE_NO_SCORESregardless of the actual collectors. I updated the code to instantiate a newWeightinstance for every collector that is in line with the collectorScoreMode.