LUCENE-10555: avoid NumericLeafComparator#iteratorCost repeated initialization when NumericLeafComparator#setScorer is called #864

Merged
merged 4 commits into from
May 10, 2022


@wjp719 wjp719 commented May 3, 2022

Elasticsearch uses CancellableBulkScorer to cancel long-running queries quickly by splitting the docs of one segment into many smaller doc sets. When a user runs a topN query, TopFieldLeafCollector#setScorer is called for every one of these doc sets, which in turn calls NumericLeafComparator#setScorer. As a result, NumericLeafComparator#setScorer is called many times for a single segment.

Every time NumericLeafComparator#setScorer is called, NumericLeafComparator#iteratorCost is reset to the Scorer's cost, which triggers many unnecessary pointValues#intersect calls to collect competitive docIds. This results in performance degradation.

This PR adds a check so that NumericLeafComparator#setScorer does this initialization only once per segment.
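The idea can be illustrated with a minimal, self-contained sketch. It is not the Lucene patch: `LeafComparatorSketch`, `narrowTo`, and `initCount` are hypothetical names, the `long` argument stands in for the scorer's cost, and the `-1` sentinel is an assumption made for this illustration.

```java
public class IteratorCostGuardSketch {

  /** Hypothetical stand-in for a per-segment numeric comparator; not Lucene's real class. */
  public static final class LeafComparatorSketch {
    // -1 is used here as an assumed "not yet initialized for this segment" sentinel.
    private long iteratorCost = -1;
    private int initCount = 0;

    /** Called once per doc-set window; scorerCost stands in for the scorer's cost. */
    public void setScorer(long scorerCost) {
      if (iteratorCost == -1) {
        // Only the first call for this segment seeds the cost.
        iteratorCost = scorerCost;
        initCount++;
      }
      // Later calls leave the already-narrowed cost untouched.
    }

    /** Simulates the competitive iterator shrinking; assumes setScorer was called first. */
    public void narrowTo(long newCost) {
      iteratorCost = Math.min(iteratorCost, newCost);
    }

    public long iteratorCost() {
      return iteratorCost;
    }

    public int initCount() {
      return initCount;
    }
  }

  public static void main(String[] args) {
    LeafComparatorSketch comparator = new LeafComparatorSketch();
    long scorerCost = 1_000_000L;
    // One segment scored in four windows, as a windowed bulk scorer would do.
    for (int window = 0; window < 4; window++) {
      comparator.setScorer(scorerCost);
      if (window == 0) {
        comparator.narrowTo(10); // pretend the first window found a tight competitive range
      }
    }
    System.out.println(
        "iteratorCost=" + comparator.iteratorCost()
            + ", initializations=" + comparator.initCount());
  }
}
```

Without the guard, each window would reset the cost to the scorer's cost and discard the narrowing done in earlier windows, which is the repeated work described above.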

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have given Lucene maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.

@wjp719 wjp719 changed the title LUCENE-10555: avoid repeated NumericLeafComparator setScorer calls LUCENE-10555: avoid repeated NumericLeafComparator#setScorer calls May 3, 2022

wjp719 commented May 6, 2022

@jpountz Hi, could you help review this PR? Thanks.


@jpountz jpountz left a comment


FWIW this is not specific to this Elasticsearch bulk scorer, BooleanScorer does the same thing. Your change makes sense to me, I left a minor nitpick. Can you add a CHANGES entry under 9.2?

* main:
  LUCENE-10532: remove @Slow annotation (apache#832)
  LUCENE-10312: Add PersianStemmer (apache#540)
  LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries (main branch) (apache#871)
  LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues (apache#869)
  Disable liftbot, we have our own tools
  LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. (apache#860)
  Make CONTRIBUTING.md a bit more succinct (apache#866)
  LUCENE-10504: KnnGraphTester to use KnnVectorQuery (apache#796)
  Add change line for LUCENE-9848
  LUCENE-9848 Sort HNSW graph neighbors for construction (apache#862)
@wjp719 wjp719 changed the title LUCENE-10555: avoid repeated NumericLeafComparator#setScorer calls LUCENE-10555: avoid NumericLeafComparator#iteratorCost repeated initialization when NumericLeafComparator#setScorer is called May 10, 2022

wjp719 commented May 10, 2022

@jpountz Hi, I have added the CHANGES entry. Please review again, thanks.

@wjp719 wjp719 requested a review from jpountz May 10, 2022 08:29
@jpountz jpountz merged commit f431511 into apache:main May 10, 2022
@wjp719 wjp719 deleted the feature/avoid_repeated_set_scorer branch May 10, 2022 11:45
jpountz pushed a commit that referenced this pull request May 10, 2022
…alization when NumericLeafComparator#setScorer is called (#864)
wjp719 added a commit to wjp719/lucene that referenced this pull request May 11, 2022
* main:
  Fix rare test failures in TestSortOptimization.
  fix bkd test logic error and doc error (apache#863)
  LUCENE-10555: avoid NumericLeafComparator#iteratorCost repeated initialization   when NumericLeafComparator#setScorer is called (apache#864)