Re-ranking component for document search without QA #1025
Conversation
Looking already pretty good on the code side!
Regarding the performance check: we definitely need to verify that training and inference work as expected. How about verifying inference with another pre-trained model? Is it possible to load one of the re-rankers from sentence-transformers into the current class (https://www.sbert.net/docs/pretrained-models/ce-msmarco.html)? That way we could verify at least the inference part by comparing performance between Haystack and sentence-transformers, or by evaluating on the MS MARCO dev set. If this takes longer than expected, I am also fine with merging this PR first and tackling the verification (especially training performance) in a follow-up PR.
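As a concrete starting point for that check, here is a minimal sketch that scores a few query/passage pairs with one of the pre-trained MS MARCO cross-encoders directly through sentence-transformers. The same checkpoint and inputs could then be run through the new Ranker node to confirm it reproduces the ordering and scores. The model name and example texts are placeholders, not anything taken from this PR.

```python
from sentence_transformers import CrossEncoder

# Pre-trained MS MARCO re-ranker from the sentence-transformers model zoo.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What is the capital of Germany?"
passages = [
    "Berlin is the capital and largest city of Germany.",
    "Munich is the capital of Bavaria.",
    "The Rhine is a major European river.",
]

# One relevance score per (query, passage) pair; higher means more relevant.
scores = model.predict([(query, p) for p in passages])
for passage, score in sorted(zip(passages, scores), key=lambda x: x[1], reverse=True):
    print(f"{score:.4f}  {passage}")

# Verification idea: load the same checkpoint into the new Ranker node,
# re-rank the same passages for the same query, and check that the ordering
# (and ideally the scores) matches what sentence-transformers produces here.
```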
Before merging we also need to update the documentation:
- Small code snippet that shows how to use a (pretrained) re-ranker for inference (see the sketch after this list)
- Small "Usage" page in the docs similar to the others (e.g. https://haystack.deepset.ai/docs/latest/summarizermd)
- Readme
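For the first item, a rough draft of such a snippet is sketched below. The class name `FARMRanker`, its import path, and the `predict(query=..., documents=..., top_k=...)` signature are assumptions modelled on the existing Reader/Retriever nodes rather than taken from this PR, so they would need to be adjusted to the actual implementation.

```python
from haystack import Document
from haystack.ranker import FARMRanker  # assumed import path and class name

# Load a pre-trained re-ranker, e.g. an MS MARCO cross-encoder checkpoint.
ranker = FARMRanker(model_name_or_path="cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "Who wrote Faust?"
docs = [
    Document(text="Johann Wolfgang von Goethe wrote the tragic play Faust."),
    Document(text="Berlin is the capital and largest city of Germany."),
]

# Re-rank the candidate documents by relevance to the query.
reranked_docs = ranker.predict(query=query, documents=docs, top_k=2)
for doc in reranked_docs:
    print(doc.text)
```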
I compared the Retriever and Ranker nodes on the GermanDPR dataset with the full German Wikipedia (2,797,725 documents).
The comparison in Haystack uses the GermanDPR test set as labels. The Retriever retrieves the top 10 documents and the Ranker re-ranks these top 10 documents; evaluation is on the top 3 documents.
Ok, let's merge now, as #933 and @brandenchan's further work depend on the renaming of the eval nodes.
Still to tackle after this PR:
- Verification of model performance (inference + training)
- Uploading / linking a "sane" default, remote model for English
- Including the new ranker usage page in the docs (@PiffPaffM can you please take care of it?)
Ranker component that re-ranks results of a retriever. The eval() method of the Ranker is not implemented. Instead, the EvalRetriever pipeline node is renamed to EvalDocuments and works for both Retriever and Ranker nodes. For consistency, I also renamed the EvalReader node to EvalAnswers.
The train() method of the Ranker is implemented.
The EvalDocuments node got an additional metric: mean reciprocal rank. Recall and mean reciprocal rank can be measured for top k results (recall@k, mrr@k).
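For reference, the sketch below shows one common way to compute these per query and average over the evaluation set; recall@k is given in its "at least one relevant document within the top k" form, which is how retriever recall is usually reported. This is a generic illustration, not the actual EvalDocuments code.

```python
def recall_at_k(ranked_doc_ids, relevant_doc_ids, k):
    """1.0 if at least one relevant document appears in the top k, else 0.0 (per query)."""
    return 1.0 if any(doc_id in relevant_doc_ids for doc_id in ranked_doc_ids[:k]) else 0.0

def mrr_at_k(ranked_doc_ids, relevant_doc_ids, k):
    """Reciprocal rank of the first relevant document within the top k, else 0.0 (per query)."""
    for rank, doc_id in enumerate(ranked_doc_ids[:k], start=1):
        if doc_id in relevant_doc_ids:
            return 1.0 / rank
    return 0.0

# Example: average over queries to get recall@3 and mrr@3 for a whole eval set.
queries = [
    (["d7", "d2", "d9"], {"d2"}),  # relevant doc at rank 2 -> recall 1.0, RR 0.5
    (["d1", "d4", "d8"], {"d5"}),  # no relevant doc in top 3 -> recall 0.0, RR 0.0
]
recall3 = sum(recall_at_k(r, rel, 3) for r, rel in queries) / len(queries)
mrr3 = sum(mrr_at_k(r, rel, 3) for r, rel in queries) / len(queries)
print(f"recall@3={recall3:.4f}  mrr@3={mrr3:.4f}")
```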
closes #423
Limitations
I tried the functionality with Tutorial 5 and also trained a TextPairClassification model within FARM and used it in Haystack.
However, neither with msmarco nor with asnq_binary as training data could the Ranker improve the results of the BM25 retriever in terms of recall@3 or mean reciprocal rank@3. For example, has_answer mean_reciprocal_rank@3 dropped from 0.6533 (without ranker) to 0.4333 (with ranker).
I don't know why that is the case. Three possible reasons: the task is too easy and BM25 already performs very well, the trained ranker models are bad, or there is a bug in the ranker code. I ran the experiment with 6,657 queries, so a small dataset is not the cause.
I came across a quick fix that I am unsure is still needed; that's why I raised issue #1093.