Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removing probability field from answers in favor of score field #1340

Merged
merged 8 commits into from
Aug 17, 2021

Conversation

julian-risch
Copy link
Member

@julian-risch julian-risch commented Aug 12, 2021

Proposed changes:
This PR drops the field probability from answers and documents in favor of the field score. The default behavior is now that the field score contains the value that was previously stored in probability. However, the FARMReader also allows switching back to the old scores by setting use_confidence_scores to False (default is True).
closes #1220

Main changes:

  • no_answer prediction now uses score ranging between 0 and 1:
    "score": float(expit(np.asarray(no_ans_score) / 8)), # just a pseudo prob for now
  • FARMReader allows switching between new scores ranging between 0 and 1 (confidence scores) and old scores:
    use_confidence_scores: bool = True
  • FARMReader extracts new or old scores from QAPreds depending on this switch:
    "score": ans.confidence if self.use_confidence_scores else ans.score,
  • I relaxed an assertion in one of the test cases. Previously, it checked that the correct answer is the top prediction. It had the best score but only the second best probability so it worked in the past. Now, I changed the test so that it checks whether the correct predictions is within the top 2 predictions as the order of the ranked predictions slightly changed.

@tholor I came across some points in the code where a plain dictionary could be replaced with an Answer object:

answers.append({


no_ans_prediction = {"answer": None,

"answer": ans.answer,

cur_answer = {

Status (please check what you already did):

  • First draft (up for discussions & feedback)
  • Final code
  • Added tests -> I added one test case that checks the behavior of the newly introduced switch use_confidence_scores in FARMReader. If use_confidence_scores is set to False, the old scores are used, which can be larger than 1.0
  • Updated documentation -> I removed mentions of "probability" from docs

@julian-risch julian-risch changed the title WIP: Removing probability field from answers in favor of score field Removing probability field from answers in favor of score field Aug 16, 2021
@julian-risch julian-risch marked this pull request as ready for review August 16, 2021 12:38
Copy link
Member

@tholor tholor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks all good to me. Two questions:

  • I assume Summarizer, Re-Ranker and classifier did not use any probabilities before?
  • Should we scale score always between 0-1 now? I think this would simplify communication to users a lot. I assume this scaling is currently not the case for Elasticsearch (see comment).

haystack/document_store/elasticsearch.py Show resolved Hide resolved
docs/_src/api/api/reader.md Outdated Show resolved Hide resolved
@julian-risch
Copy link
Member Author

Looks all good to me. Two questions:

  • I assume Summarizer, Re-Ranker and classifier did not use any probabilities before?
  • Should we scale score always between 0-1 now? I think this would simplify communication to users a lot. I assume this scaling is currently not the case for Elasticsearch (see comment).

Thanks for the feedback! Summarizer, Re-Ranker and Classifier don't have a probability field in the returned Document, correct.

The requested change in the elasticsearch document store is included now and I reverted the manual changes to the reader.md file. Further, I implemented the normalization of weights in the JoinDocuments node (as discussed).

@julian-risch julian-risch merged commit eb990c9 into master Aug 17, 2021
@julian-risch julian-risch deleted the drop-probability-field-from-answers branch August 17, 2021 08:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Drop score field from answers to prevent confusion with probability/confidence
2 participants