LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. #860

jpountz · 2022-05-02T13:35:25Z

The computation of the scaling factor has special cases for these two values,
but the current logic is backwards.

The computation of the scaling factor has special cases for these two values, but the current logic is backwards.

romseygeek · 2022-05-03T07:41:04Z

lucene/core/src/java/org/apache/lucene/search/WANDScorer.java

@@ -86,7 +86,6 @@ static int scalingFactor(float f) {
   * sure we do not miss any matches.
   */
  static long scaleMaxScore(float maxScore, int scalingFactor) {
-    assert scalingFactor(maxScore) >= scalingFactor;


Does this assertion no longer hold? I'm a bit nervous about removing it, as it tripping was what alerted us to the bug in the first place.

Good question, I'm a bit torn about it too and it's related to the comment that I updated. This assertion verifies that the max score of a sub clause over a range of doc IDs cannot be greater than the sum of the max scores of sub clauses over the entire doc ID range. However it is currently not illegal for a clause to return a higher score on a subset of doc IDs than on the entire doc ID space, so I was a bit nervous about keeping this assertion.

However it is currently not illegal for a clause to return a higher score on a subset of doc IDs than on the entire doc ID space

OK. Maybe this is something we can think about restricting more in the future?

Yeah. We wouldn't want to do this check at search time because these max scores are computed lazily based on lists of (freq,norm) pairs, but maybe we could add the restriction on the Scorer#getMaxScore API that max scores must be non decreasing as upTo increases.

The computation of the scaling factor has special cases for these two values, but the current logic is backwards.

* main: LUCENE-10532: remove @Slow annotation (apache#832) LUCENE-10312: Add PersianStemmer (apache#540) LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries (main branch) (apache#871) LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues (apache#869) Disable liftbot, we have our own tools LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. (apache#860) Make CONTRIBUTING.md a bit more succinct (apache#866) LUCENE-10504: KnnGraphTester to use KnnVectorQuery (apache#796) Add change line for LUCENE-9848 LUCENE-9848 Sort HNSW graph neighbors for construction (apache#862)

* main: (24 commits) LUCENE-10532: remove @Slow annotation (apache#832) LUCENE-10312: Add PersianStemmer (apache#540) LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries (main branch) (apache#871) LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues (apache#869) Disable liftbot, we have our own tools LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. (apache#860) Make CONTRIBUTING.md a bit more succinct (apache#866) LUCENE-10504: KnnGraphTester to use KnnVectorQuery (apache#796) Add change line for LUCENE-9848 LUCENE-9848 Sort HNSW graph neighbors for construction (apache#862) LUCENE-10524 Add benchmark suite details to CONTRIBUTING.md (apache#853) LUCENE-10552: KnnVectorQuery has incorrect equals/ hashCode (apache#859) LUCENE-10534: MinFloatFunction / MaxFloatFunction calls exists twice (apache#837) LUCENE-10188: Give SortedSetDocValues a docValueCount() (apache#663) Allow to link to github PR from changes (apache#854) LUCENE-10551: improve testing of LowercaseAsciiCompression (apache#858) LUCENE-10542: FieldSource exists implementations can avoid value retrieval (apache#847) LUCENE-10539: Return a stream of completions from FSTCompletion. (apache#844) gradle 7.3.3 quick upgrade (apache#856) LUCENE-10530: Avoid floating point precision bug in TestTaxonomyFacetAssociations (apache#848) ...

LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty.

d5b49ff

The computation of the scaling factor has special cases for these two values, but the current logic is backwards.

romseygeek reviewed May 3, 2022

View reviewed changes

romseygeek approved these changes May 4, 2022

View reviewed changes

romseygeek merged commit 2630189 into apache:main May 5, 2022

asfgit pushed a commit that referenced this pull request May 5, 2022

LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. (#860)

efa5d6f

The computation of the scaling factor has special cases for these two values, but the current logic is backwards.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. #860

LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. #860

jpountz commented May 2, 2022

romseygeek May 3, 2022

jpountz May 4, 2022

romseygeek May 4, 2022

jpountz May 4, 2022

LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. #860

LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. #860

Conversation

jpountz commented May 2, 2022

romseygeek May 3, 2022

Choose a reason for hiding this comment

jpountz May 4, 2022

Choose a reason for hiding this comment

romseygeek May 4, 2022

Choose a reason for hiding this comment

jpountz May 4, 2022

Choose a reason for hiding this comment