-
Notifications
You must be signed in to change notification settings - Fork 983
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. #860
Conversation
The computation of the scaling factor has special cases for these two values, but the current logic is backwards.
@@ -86,7 +86,6 @@ static int scalingFactor(float f) { | |||
* sure we do not miss any matches. | |||
*/ | |||
static long scaleMaxScore(float maxScore, int scalingFactor) { | |||
assert scalingFactor(maxScore) >= scalingFactor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this assertion no longer hold? I'm a bit nervous about removing it, as it tripping was what alerted us to the bug in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, I'm a bit torn about it too and it's related to the comment that I updated. This assertion verifies that the max score of a sub clause over a range of doc IDs cannot be greater than the sum of the max scores of sub clauses over the entire doc ID range. However it is currently not illegal for a clause to return a higher score on a subset of doc IDs than on the entire doc ID space, so I was a bit nervous about keeping this assertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However it is currently not illegal for a clause to return a higher score on a subset of doc IDs than on the entire doc ID space
OK. Maybe this is something we can think about restricting more in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. We wouldn't want to do this check at search time because these max scores are computed lazily based on lists of (freq,norm) pairs, but maybe we could add the restriction on the Scorer#getMaxScore API that max scores must be non decreasing as upTo increases.
The computation of the scaling factor has special cases for these two values, but the current logic is backwards.
* main: LUCENE-10532: remove @Slow annotation (apache#832) LUCENE-10312: Add PersianStemmer (apache#540) LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries (main branch) (apache#871) LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues (apache#869) Disable liftbot, we have our own tools LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. (apache#860) Make CONTRIBUTING.md a bit more succinct (apache#866) LUCENE-10504: KnnGraphTester to use KnnVectorQuery (apache#796) Add change line for LUCENE-9848 LUCENE-9848 Sort HNSW graph neighbors for construction (apache#862)
* main: (24 commits) LUCENE-10532: remove @Slow annotation (apache#832) LUCENE-10312: Add PersianStemmer (apache#540) LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries (main branch) (apache#871) LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues (apache#869) Disable liftbot, we have our own tools LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. (apache#860) Make CONTRIBUTING.md a bit more succinct (apache#866) LUCENE-10504: KnnGraphTester to use KnnVectorQuery (apache#796) Add change line for LUCENE-9848 LUCENE-9848 Sort HNSW graph neighbors for construction (apache#862) LUCENE-10524 Add benchmark suite details to CONTRIBUTING.md (apache#853) LUCENE-10552: KnnVectorQuery has incorrect equals/ hashCode (apache#859) LUCENE-10534: MinFloatFunction / MaxFloatFunction calls exists twice (apache#837) LUCENE-10188: Give SortedSetDocValues a docValueCount() (apache#663) Allow to link to github PR from changes (apache#854) LUCENE-10551: improve testing of LowercaseAsciiCompression (apache#858) LUCENE-10542: FieldSource exists implementations can avoid value retrieval (apache#847) LUCENE-10539: Return a stream of completions from FSTCompletion. (apache#844) gradle 7.3.3 quick upgrade (apache#856) LUCENE-10530: Avoid floating point precision bug in TestTaxonomyFacetAssociations (apache#848) ...
The computation of the scaling factor has special cases for these two values,
but the current logic is backwards.