Distinguish labels for calculating similarity scores #1124

julian-risch · 2021-06-02T07:59:00Z

Proposed changes:
The ranker now distinguishes between predictions with label "0" (dissimilar) and label "1" (similar) when extracting the prediction probability that serves as the similarity score.
Before this change, re-ranking gave wrong results whenever a document was predicted to be dissimilar with a probability larger than 0.5: that probability was wrongly taken to be the similarity score.
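
To illustrate the idea, here is a minimal sketch rather than the code in this PR. It assumes each prediction is a dict with a string `label` ("0" or "1") and a `probability` for that predicted label, and that the similarity for a "0" prediction can be taken as `1 - probability`. The names `similarity_score` and `rerank` are hypothetical.

```python
from typing import Dict, List, Optional


def similarity_score(prediction: Dict) -> float:
    """Map a binary text-pair prediction to a similarity score in [0, 1].

    Assumption: "probability" is the confidence of the *predicted* label,
    so for label "0" (dissimilar) the similarity is the complement.
    """
    prob = prediction["probability"]
    return prob if prediction["label"] == "1" else 1.0 - prob


def rerank(documents: List[str], predictions: List[Dict],
           top_k: Optional[int] = None) -> List[str]:
    """Sort documents by descending similarity score."""
    ranked = sorted(
        zip(documents, predictions),
        key=lambda pair: similarity_score(pair[1]),
        reverse=True,
    )
    docs = [doc for doc, _ in ranked]
    return docs if top_k is None else docs[:top_k]


docs = ["doc_a", "doc_b", "doc_c"]
preds = [
    {"label": "0", "probability": 0.9},  # confidently dissimilar
    {"label": "1", "probability": 0.6},
    {"label": "1", "probability": 0.8},
]
reranked = rerank(docs, preds)
```

Sorting on the raw `probability` alone would put `doc_a` first even though it is the one most confidently predicted to be dissimilar; with the label taken into account it ends up last.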

tholor

Have we documented somewhere the requirement that the re-ranker model must be trained on samples with label=1 for a similar query + doc pair? If not, it may be worth adding to the docstring or the usage page.

Distinguish labels for calculating similarity scores

494e236

julian-risch requested a review from tholor June 2, 2021 09:33

tholor approved these changes Jun 2, 2021

Explain label "0" and "1" of TextPairClassifier in Ranker

3988d83

julian-risch merged commit 8e3d0d1 into master Jun 2, 2021

julian-risch deleted the fix-ranker-similarity-scores branch June 2, 2021 15:33
