Re-ranking component for document search without QA #1025
Conversation
Looking already pretty good on the code side!
Regarding the performance check: we definitely need to verify that training and inference work as expected. How about verifying inference with another pre-trained model? Is it possible to load one of the re-rankers from sentence-transformers into the current class (https://www.sbert.net/docs/pretrained-models/ce-msmarco.html)? That way we could verify at least the inference part by comparing performance between Haystack and sentence-transformers, or by evaluating on the MS MARCO dev set. If this takes longer than expected, I am also fine with merging this PR first and tackling the verification (especially training performance) in a follow-up PR.
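As a concrete starting point for that check, here is a minimal sketch that scores a few query/passage pairs with one of the pre-trained MS MARCO cross-encoders directly through sentence-transformers. The same checkpoint and inputs could then be run through the new Ranker node to confirm it reproduces the ordering and scores. The model name and example texts are placeholders, not anything taken from this PR.

```python
from sentence_transformers import CrossEncoder

# Pre-trained MS MARCO re-ranker from the sentence-transformers model zoo.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What is the capital of Germany?"
passages = [
    "Berlin is the capital and largest city of Germany.",
    "Munich is the capital of Bavaria.",
    "The Rhine is a major European river.",
]

# One relevance score per (query, passage) pair; higher means more relevant.
scores = model.predict([(query, p) for p in passages])
for passage, score in sorted(zip(passages, scores), key=lambda x: x[1], reverse=True):
    print(f"{score:.4f}  {passage}")

# Verification idea: load the same checkpoint into the new Ranker node,
# re-rank the same passages for the same query, and check that the ordering
# (and ideally the scores) matches what sentence-transformers produces here.
```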
Before merging we also need to update the documentation:
- Small code snippet that shows how to use a (pretrained) re-ranker for inference (see the sketch after this list)
- Small "Usage" page in the docs similar to the others (e.g. https://haystack.deepset.ai/docs/latest/summarizermd)
- Readme
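For the first item, a rough draft of such a snippet is sketched below. The class name `FARMRanker`, its import path, and the `predict(query=..., documents=..., top_k=...)` signature are assumptions modelled on the existing Reader/Retriever nodes rather than taken from this PR, so they would need to be adjusted to the actual implementation.

```python
from haystack import Document
from haystack.ranker import FARMRanker  # assumed import path and class name

# Load a pre-trained re-ranker, e.g. an MS MARCO cross-encoder checkpoint.
ranker = FARMRanker(model_name_or_path="cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "Who wrote Faust?"
docs = [
    Document(text="Johann Wolfgang von Goethe wrote the tragic play Faust."),
    Document(text="Berlin is the capital and largest city of Germany."),
]

# Re-rank the candidate documents by relevance to the query.
reranked_docs = ranker.predict(query=query, documents=docs, top_k=2)
for doc in reranked_docs:
    print(doc.text)
```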
I compared the Retriever and Ranker nodes on the GermanDPR dataset with the full German Wikipedia (2,797,725 documents).
The comparison in Haystack uses the GermanDPR test set as labels. The Retriever retrieves the top 10 documents and the Ranker re-ranks these top 10 documents; evaluation is on the top 3 documents.
Ok, let's merge now, as #933 and @brandenchan's further work depend on the renaming of the eval nodes.
Still to tackle after this PR:
- Verification of model performance (inference + training)
- Uploading / linking a "sane" default, remote model for English
- Including the new ranker usage page in the docs (@PiffPaffM can you please take care of it?)
Ranker component that re-ranks results of a retriever. The eval() method of the Ranker is not implemented. Instead, the EvalRetriever pipeline node is renamed to EvalDocuments and works for both Retriever and Ranker nodes. For consistency, I also renamed the EvalReader node to EvalAnswers.
The train() method of the Ranker is implemented.
The EvalDocuments node got an additional metric: mean reciprocal rank. Recall and mean reciprocal rank can be measured for top k results (recall@k, mrr@k).
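For reference, the sketch below shows one common way to compute these per query and average over the evaluation set; recall@k is given in its "at least one relevant document within the top k" form, which is how retriever recall is usually reported. This is a generic illustration, not the actual EvalDocuments code.

```python
def recall_at_k(ranked_doc_ids, relevant_doc_ids, k):
    """1.0 if at least one relevant document appears in the top k, else 0.0 (per query)."""
    return 1.0 if any(doc_id in relevant_doc_ids for doc_id in ranked_doc_ids[:k]) else 0.0

def mrr_at_k(ranked_doc_ids, relevant_doc_ids, k):
    """Reciprocal rank of the first relevant document within the top k, else 0.0 (per query)."""
    for rank, doc_id in enumerate(ranked_doc_ids[:k], start=1):
        if doc_id in relevant_doc_ids:
            return 1.0 / rank
    return 0.0

# Example: average over queries to get recall@3 and mrr@3 for a whole eval set.
queries = [
    (["d7", "d2", "d9"], {"d2"}),  # relevant doc at rank 2 -> recall 1.0, RR 0.5
    (["d1", "d4", "d8"], {"d5"}),  # no relevant doc in top 3 -> recall 0.0, RR 0.0
]
recall3 = sum(recall_at_k(r, rel, 3) for r, rel in queries) / len(queries)
mrr3 = sum(mrr_at_k(r, rel, 3) for r, rel in queries) / len(queries)
print(f"recall@3={recall3:.4f}  mrr@3={mrr3:.4f}")
```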
closes #423
Limitations
I tried the functionality with Tutorial 5 and also trained a TextPairClassification model within FARM and used it in Haystack.
However, neither with msmarco nor with asnq_binary as training data could the Ranker improve the results of the BM25 retriever in terms of recall@3 or mean reciprocal rank@3. For example, has_answer mean_reciprocal_rank@3 dropped from 0.6533 (without ranker) to 0.4333 (with ranker).
I don't know why that is the case. Three possible reasons: the task is too easy and BM25 already performs very well, the trained ranker models are bad, or there is a bug in the ranker code. I ran the experiment with 6,657 queries, so a small dataset is not the cause.
I came across a quick fix that I am unsure is still needed; that's why I raised issue #1093.