Scorer should sum up scores into a double #12682

shubhamvishu · 2023-10-15T05:38:55Z

Description

Addresses #12675. Along with `MultiSimilarity.MultiSimScorer`, I found some other candidate scorer implementations for this fix.

jpountz

Thanks for looking into it! I suggested not making two of the changes you proposed, but the other two look good to me.

jpountz · 2023-10-17T07:43:08Z

lucene/core/src/java/org/apache/lucene/search/ReqOptSumScorer.java

@@ -266,7 +265,7 @@ public float score() throws IOException {
      score += optScorer.score();
    }

-    return score;
+    return (float) score;


Actually, your change doesn't help here, since this sums up at most two floats, and summing two floats is already guaranteed to be as accurate as a float can be. Let's revert the changes to this file?
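The point about two-float sums can be spot-checked with a minimal sketch. The class name `TwoFloatSum`, its methods, and the random sampling are assumptions made for illustration and are not part of the PR or of Lucene. It compares adding two floats directly against adding them as doubles and casting back, over a sample of inputs rather than as a proof.

```java
import java.util.Random;

// Hypothetical demo class, not part of Lucene.
final class TwoFloatSum {

  // Sum computed directly in float arithmetic.
  static float sumAsFloat(float a, float b) {
    return a + b;
  }

  // Sum computed in double arithmetic, then narrowed to float.
  static float sumViaDouble(float a, float b) {
    return (float) ((double) a + (double) b);
  }

  // Counts the sampled pairs for which the two approaches disagree bit for bit.
  static int countMismatches(int samples, long seed) {
    Random random = new Random(seed);
    int mismatches = 0;
    for (int i = 0; i < samples; i++) {
      // Non-negative values spread over several orders of magnitude.
      float a = random.nextFloat() * (1 << random.nextInt(20));
      float b = random.nextFloat() * (1 << random.nextInt(20));
      if (Float.floatToIntBits(sumAsFloat(a, b))
          != Float.floatToIntBits(sumViaDouble(a, b))) {
        mismatches++;
      }
    }
    return mismatches;
  }

  public static void main(String[] args) {
    System.out.println("mismatches: " + countMismatches(1_000_000, 42L));
  }
}
```

If the claim holds, the reported mismatch count should be zero, which is why widening to a double would not change the result when there are only two operands.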

Makes sense to me! I think in that case we should remove the TODO as well?

I think that the TODO still makes sense: it refers to BS1 being able to handle a mix of MUST and SHOULD clauses. If that happened, it could have more than 2 clauses, so casting into a double would make sense.

Sure, I'll keep it then

jpountz · 2023-10-17T07:50:37Z

lucene/core/src/java/org/apache/lucene/search/similarities/TFIDFSimilarity.java

      float normValue = normTable[(int) (norm & 0xFF)];
-      return raw * normValue; // normalize for field
+      return (float) (raw * normValue); // normalize for field


Likewise here, float multiplication is already guaranteed to give a result that is as accurate as a float can give.

One could argue that we could get more accuracy by casting to a double before the first multiplication, i.e. `final double raw = (double) tf(freq) * queryWeight;`. But I don't think we should do it, as similarity scores are a bit fuzzy by nature and this would be very unlikely to improve ranking effectiveness. The main reason why we cast to doubles when summing up scores is not really to get better accuracy, but rather so that the order in which clauses are evaluated doesn't have an impact on the final score.
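The order-dependence point can be illustrated with a minimal sketch. The class `ScoreSumOrder`, its method names, and the sample values are hypothetical, chosen so that float rounding becomes visible; this is not Lucene code.

```java
// Hypothetical demo class, not part of Lucene.
final class ScoreSumOrder {

  // Accumulates in a float, rounding after every addition.
  static float sumInFloat(float[] scores) {
    float sum = 0f;
    for (float score : scores) {
      sum += score;
    }
    return sum;
  }

  // Accumulates in a double and rounds to float once at the end.
  static float sumInDouble(float[] scores) {
    double sum = 0d;
    for (float score : scores) {
      sum += score;
    }
    return (float) sum;
  }

  public static void main(String[] args) {
    // The same three clause scores, evaluated in two different orders.
    float[] largeFirst = {16777216f, 1f, 1f}; // 2^24 followed by two small scores
    float[] smallFirst = {1f, 1f, 16777216f};

    System.out.println("float,  large first: " + sumInFloat(largeFirst));
    System.out.println("float,  small first: " + sumInFloat(smallFirst));
    System.out.println("double, large first: " + sumInDouble(largeFirst));
    System.out.println("double, small first: " + sumInDouble(smallFirst));
  }
}
```

With these inputs the float accumulator produces different totals for the two orders, because each `1f` added to 2^24 is rounded away, while the double accumulator returns the same float for both. A double accumulator is still rounded, so for arbitrary inputs this reduces order sensitivity rather than removing it entirely.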

Let's revert changes on this file as well?

I see. Thanks for clarifying! I'll revert changes to this file too.

shubhamvishu · 2023-10-17T09:02:18Z

Thanks @jpountz for the review! I have addressed the comments in the new revision.

jpountz

LGTM

shubhamvishu · 2023-10-20T15:04:12Z

Thanks for the approval @jpountz !

shubhamvishu mentioned this pull request Oct 15, 2023

MultiSimilarity.MultiSimScorer should sum up scores into a double #12675

Closed

shubhamvishu changed the title ~~Scorer's should sum up scores into a double~~ Scorer should sum up scores into a double Oct 16, 2023

jpountz requested changes Oct 17, 2023

shubhamvishu force-pushed the fix-score-sum branch from e471943 to c2f090f Compare October 17, 2023 09:01

shubhamvishu requested a review from jpountz October 17, 2023 09:02

Scorer should sum up scores into a double

36f446a

shubhamvishu force-pushed the fix-score-sum branch from c2f090f to 36f446a Compare October 17, 2023 09:05

jpountz approved these changes Oct 20, 2023

benwtrent merged commit 5461d1a into apache:main Oct 23, 2023
4 checks passed

benwtrent pushed a commit that referenced this pull request Oct 23, 2023

Scorer should sum up scores into a double (#12682)

1865ffe
