Re-ranking component for document search without QA #1025

Merged
julian-risch merged 11 commits into master from document-ranking on May 31, 2021

Conversation

@julian-risch (Member) commented on May 3, 2021

This PR adds a Ranker component that re-ranks the results of a Retriever. The eval() method of the Ranker is not implemented. Instead, the EvalRetriever pipeline node is renamed to EvalDocuments and works for both Retriever and Ranker nodes. For consistency, I also renamed the EvalReader node to EvalAnswers.
The train() method of the Ranker is implemented.
The EvalDocuments node gains an additional metric: mean reciprocal rank. Recall and mean reciprocal rank can be measured for the top k results (recall@k, mrr@k).

closes #423 (Add re-ranking for pure document search)
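To make the intended usage concrete, here is a minimal sketch of a query pipeline with the new node. Class, module, and parameter names (FARMRanker, haystack.ranker, EvalDocuments, top_k_eval_documents, the run() keyword arguments, and the placeholder model and labels) are assumptions based on this PR's description, not the merged code:

```python
from haystack.document_store import ElasticsearchDocumentStore
from haystack.retriever.sparse import ElasticsearchRetriever
from haystack.ranker import FARMRanker      # module added in this PR; class name assumed
from haystack.eval import EvalDocuments     # renamed from EvalRetriever in this PR
from haystack.pipeline import Pipeline

document_store = ElasticsearchDocumentStore()
retriever = ElasticsearchRetriever(document_store=document_store)
ranker = FARMRanker(model_name_or_path="my-text-pair-classification-model")  # hypothetical model
eval_docs = EvalDocuments(top_k_eval_documents=3)  # reports recall@3 and mrr@3

p = Pipeline()
p.add_node(component=retriever, name="Retriever", inputs=["Query"])
p.add_node(component=ranker, name="Ranker", inputs=["Retriever"])
p.add_node(component=eval_docs, name="EvalDocuments", inputs=["Ranker"])

# Repeating this over a labeled evaluation set accumulates recall@k and mrr@k
# inside the EvalDocuments node; `gold_labels` is a hypothetical placeholder.
p.run(query="who wrote the novel Dune?", labels=gold_labels, top_k_retriever=10)
eval_docs.print()
```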

Limitations
I tested the functionality with Tutorial 5 and also trained a TextPairClassification model within FARM and used it in Haystack.
However, neither with msmarco nor with asnq_binary as training data could the Ranker improve the results of the BM25 retriever in terms of recall@3 or mean reciprocal rank@3. For example, has_answer mean_reciprocal_rank@3 dropped from 0.6533 (without ranker) to 0.4333 (with ranker).
I don't know why that is the case. There are three possible reasons: the task is too easy and BM25 already performs very well, the trained ranker models are weak, or there is a bug in the ranker code. I ran an experiment with 6657 queries, so a small dataset is not the explanation.

I came across a quick fix that I am unsure is still needed, which is why I raised issue #1093.

@julian-risch requested a review from tholor on May 10, 2021 16:11
@julian-risch marked this pull request as ready for review on May 10, 2021 16:11
@julian-risch changed the title from "WIP: re-ranking component for document search without QA" to "Re-ranking component for document search without QA" on May 10, 2021
@tholor (Member) left a comment

Already looking pretty good on the code side!

Regarding the performance check: We definitely need to verify that training and inference work as expected. How about verifying inference with another pre-trained model? Is it possible to load one of the re-rankers from sentence-transformers into the current class (https://www.sbert.net/docs/pretrained-models/ce-msmarco.html)? That way we could at least verify the inference part, either by comparing performance between Haystack and sentence-transformers or by evaluating on the MS MARCO dev set. If this takes longer than expected, I am also fine with merging this PR first and tackling the verification (especially training performance) in another PR.
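A standalone sketch of that cross-check, scoring a few query/passage pairs with one of the pre-trained MS MARCO cross-encoders directly via the sentence-transformers CrossEncoder class; the model name and example pairs are illustrative, and the resulting order would then be compared against the Haystack Ranker's output for the same inputs:

```python
from sentence_transformers import CrossEncoder

# One of the pre-trained MS MARCO re-rankers from the linked page
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "how many people live in berlin"
passages = [
    "Berlin has a population of around 3.7 million inhabitants.",
    "The Berlin marathon takes place every September.",
]

# Higher score means more relevant; comparing this ordering with the Ranker's
# output for the same query/passage pairs would verify the inference path.
scores = model.predict([(query, passage) for passage in passages])
for passage, score in sorted(zip(passages, scores), key=lambda x: x[1], reverse=True):
    print(f"{score:.4f}  {passage}")
```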

Before merging we also need to update the documentation.

haystack/eval.py (review thread, resolved)
haystack/ranker/farm.py (review thread, resolved)
@julian-risch requested a review from tholor on May 31, 2021 11:05
@julian-risch (Member, Author) commented

I compared the Retriever and Ranker nodes on the GermanDPR dataset and the full German Wikipedia (2,797,725 documents).
To this end, I trained a text pair classification model in FARM on the GermanDPR training dataset.
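For context, a compressed sketch of FARM's usual text pair classification training setup (processor → data silo → adaptive model → trainer). The base model, file layout, label list, and hyperparameters are placeholders, and the exact signatures may differ from the setup actually used here:

```python
import torch
from farm.data_handler.data_silo import DataSilo
from farm.data_handler.processor import TextPairClassificationProcessor
from farm.modeling.adaptive_model import AdaptiveModel
from farm.modeling.language_model import LanguageModel
from farm.modeling.optimization import initialize_optimizer
from farm.modeling.prediction_head import TextClassificationHead
from farm.modeling.tokenization import Tokenizer
from farm.train import Trainer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = Tokenizer.load("deepset/gbert-base")  # placeholder German base model
processor = TextPairClassificationProcessor(
    tokenizer=tokenizer,
    max_seq_len=256,
    data_dir="data/germandpr_pairs",   # hypothetical directory with train/dev TSVs
    label_list=["0", "1"],             # irrelevant / relevant passage
    metric="f1_macro",
)
data_silo = DataSilo(processor=processor, batch_size=16)

model = AdaptiveModel(
    language_model=LanguageModel.load("deepset/gbert-base"),
    prediction_heads=[TextClassificationHead(num_labels=2)],
    embeds_dropout_prob=0.1,
    lm_output_types=["per_sequence"],
    device=device,
)

model, optimizer, lr_schedule = initialize_optimizer(
    model=model,
    learning_rate=2e-5,
    device=device,
    n_batches=len(data_silo.loaders["train"]),
    n_epochs=2,
)
trainer = Trainer(
    model=model,
    optimizer=optimizer,
    data_silo=data_silo,
    epochs=2,
    n_gpu=1,
    lr_schedule=lr_schedule,
    device=device,
)
trainer.train()
model.save("saved_models/germandpr-pair-classifier")
processor.save("saved_models/germandpr-pair-classifier")
```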

 _________ text_classification _________
05/31/2021 11:20:38 - INFO - farm.eval -   loss: 0.3178765927659484
05/31/2021 11:20:38 - INFO - farm.eval -   task_name: text_classification
05/31/2021 11:20:38 - INFO - farm.eval -   f1_macro: 0.9223982338366556
05/31/2021 11:20:38 - INFO - farm.eval -   report: 
               precision    recall  f1-score   support

           0     0.9039    0.9454    0.9242      1025
           1     0.9427    0.8995    0.9206      1025

    accuracy                         0.9224      2050
   macro avg     0.9233    0.9224    0.9224      2050
weighted avg     0.9233    0.9224    0.9224      2050

The comparison in Haystack uses the GermanDPR test dataset as labels. The Retriever retrieves the top 10 documents and the Ranker re-ranks these 10 documents. Evaluation is on the top 3 documents.

EvalRetriever
-----------------
recall@3: 0.4088 (419 / 1025)
mean_reciprocal_rank@3: 0.3322

Retriever (Speed)
---------------
No indexing performed via Retriever.run()
Queries Performed: 1025
Query time: 59.91217167999912s
0.05845089919999914 seconds per query

EvalRanker
-----------------
recall@3: 0.5102 (523 / 1025)
mean_reciprocal_rank@3: 0.4262

Ranker (Speed)
---------------
Queries Performed: 1025
Query time: 53.910859655999275s
0.052595960639999294 seconds per query
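For reference, a small sketch of how the two per-query metrics reported above can be computed; this mirrors the metric descriptions in the PR, not necessarily the exact implementation in haystack/eval.py:

```python
def recall_at_k(ranked_doc_ids, relevant_ids, k=3):
    """1.0 if any relevant document appears in the top k results, else 0.0."""
    return float(any(doc_id in relevant_ids for doc_id in ranked_doc_ids[:k]))

def reciprocal_rank_at_k(ranked_doc_ids, relevant_ids, k=3):
    """1/rank of the first relevant document within the top k results, else 0.0."""
    for rank, doc_id in enumerate(ranked_doc_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

# recall@3 and mean_reciprocal_rank@3 above are these per-query values
# averaged over all 1025 test queries.
```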

@tholor (Member) left a comment

Ok, let's merge now, as #933 and @brandenchan's further work depend on the renaming of the eval nodes.

Still to tackle after this PR:

  • Verification of model performance (inference + training)
  • Uploading / linking a "sane" default, remote model for English
  • Including the new ranker usage page in the docs (@PiffPaffM can you please take care of it?)

@julian-risch merged commit 84c3429 into master on May 31, 2021
@julian-risch deleted the document-ranking branch on May 31, 2021 13:31