Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scorer should sum up scores into a double #12682

Merged
merged 1 commit into from
Oct 23, 2023

Conversation

shubhamvishu
Copy link
Contributor

Description

Addresses #12675 . Along with MultiSimilarity.MultiSimScorer found some others candidate scorer implementations for this fix.

@shubhamvishu shubhamvishu changed the title Scorer's should sum up scores into a double Scorer should sum up scores into a double Oct 16, 2023
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into it! I suggested not doing 2 changes that you suggested, but the 2 other ones look good to me.

@@ -266,7 +265,7 @@ public float score() throws IOException {
score += optScorer.score();
}

return score;
return (float) score;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually your change doesn't help here since this sums up two floats at most and summing up two floats is already guaranteed to be as accurate as a float can be. Let's revert changes on this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me! I think in that case we should remove the TODO as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the TODO still makes sense, it refers to BS1 being able to handle a mix of MUST and SHOULD clauses. If it happened, then it could have more than 2 clauses so casting into a double would make sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll keep it then

float normValue = normTable[(int) (norm & 0xFF)];
return raw * normValue; // normalize for field
return (float) (raw * normValue); // normalize for field
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise here, float multiplication is already guaranteed to give a result that is as accurate as a float can give.

One could argue that we could get more accuracy by casting into a double before multiplying in the first multiplication, ie. final double raw = (double) tf(freq) * queryWeight;. But I don't think we should do it as similarity scores are a bit fuzzy by nature, and this would very unlikely improve ranking effectiveness. The main reason why we cast into doubles when summing up scores in not really to get better accuracy, but more so that the other in which clauses are evaluated doesn't have an impact on the final score.

Let's revert changes on this file as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thanks for clarifying! I'll revert changes to this file too.

@shubhamvishu
Copy link
Contributor Author

Thanks @jpountz for the review! I have addressed the comments in the new revision.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@shubhamvishu
Copy link
Contributor Author

Thanks for the approval @jpountz !

@benwtrent benwtrent merged commit 5461d1a into apache:main Oct 23, 2023
4 checks passed
benwtrent pushed a commit that referenced this pull request Oct 23, 2023
### Description

Addresses #12675 . Along with `MultiSimilarity.MultiSimScorer` found some others candidate scorer implementations for this fix.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants