Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. #860

Merged
merged 1 commit into from
May 5, 2022

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented May 2, 2022

The computation of the scaling factor has special cases for these two values,
but the current logic is backwards.

The computation of the scaling factor has special cases for these two values,
but the current logic is backwards.
@@ -86,7 +86,6 @@ static int scalingFactor(float f) {
* sure we do not miss any matches.
*/
static long scaleMaxScore(float maxScore, int scalingFactor) {
assert scalingFactor(maxScore) >= scalingFactor;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this assertion no longer hold? I'm a bit nervous about removing it, as it tripping was what alerted us to the bug in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, I'm a bit torn about it too and it's related to the comment that I updated. This assertion verifies that the max score of a sub clause over a range of doc IDs cannot be greater than the sum of the max scores of sub clauses over the entire doc ID range. However it is currently not illegal for a clause to return a higher score on a subset of doc IDs than on the entire doc ID space, so I was a bit nervous about keeping this assertion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However it is currently not illegal for a clause to return a higher score on a subset of doc IDs than on the entire doc ID space

OK. Maybe this is something we can think about restricting more in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. We wouldn't want to do this check at search time because these max scores are computed lazily based on lists of (freq,norm) pairs, but maybe we could add the restriction on the Scorer#getMaxScore API that max scores must be non decreasing as upTo increases.

@romseygeek romseygeek merged commit 2630189 into apache:main May 5, 2022
asfgit pushed a commit that referenced this pull request May 5, 2022
The computation of the scaling factor has special cases for these two values,
but the current logic is backwards.
wjp719 added a commit to wjp719/lucene that referenced this pull request May 10, 2022
* main:
  LUCENE-10532: remove @Slow annotation (apache#832)
  LUCENE-10312: Add PersianStemmer (apache#540)
  LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries (main branch) (apache#871)
  LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues (apache#869)
  Disable liftbot, we have our own tools
  LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. (apache#860)
  Make CONTRIBUTING.md a bit more succinct (apache#866)
  LUCENE-10504: KnnGraphTester to use KnnVectorQuery (apache#796)
  Add change line for LUCENE-9848
  LUCENE-9848 Sort HNSW graph neighbors for construction (apache#862)
wjp719 added a commit to wjp719/lucene that referenced this pull request May 10, 2022
* main: (24 commits)
  LUCENE-10532: remove @Slow annotation (apache#832)
  LUCENE-10312: Add PersianStemmer (apache#540)
  LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries (main branch) (apache#871)
  LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues (apache#869)
  Disable liftbot, we have our own tools
  LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. (apache#860)
  Make CONTRIBUTING.md a bit more succinct (apache#866)
  LUCENE-10504: KnnGraphTester to use KnnVectorQuery (apache#796)
  Add change line for LUCENE-9848
  LUCENE-9848 Sort HNSW graph neighbors for construction (apache#862)
  LUCENE-10524 Add benchmark suite details to CONTRIBUTING.md (apache#853)
  LUCENE-10552: KnnVectorQuery has incorrect equals/ hashCode (apache#859)
  LUCENE-10534: MinFloatFunction / MaxFloatFunction calls exists twice (apache#837)
  LUCENE-10188: Give SortedSetDocValues a docValueCount() (apache#663)
  Allow to link to github PR from changes (apache#854)
  LUCENE-10551: improve testing of LowercaseAsciiCompression (apache#858)
  LUCENE-10542: FieldSource exists implementations can avoid value retrieval (apache#847)
  LUCENE-10539: Return a stream of completions from FSTCompletion. (apache#844)
  gradle 7.3.3 quick upgrade (apache#856)
  LUCENE-10530: Avoid floating point precision bug in TestTaxonomyFacetAssociations (apache#848)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants