speed up slow retriever evaluation #1557
Comments
Hey @mathislucka, this is a good observation and would indeed greatly increase the speed of the evaluation. We already added batch indexing in #1231 and have an open issue for batch querying in #1239. Btw, do you have any other feedback regarding the evaluation? E.g. is it clear what closed domain vs. open domain means and how it is implemented? I am asking because we realized the evaluation is rather complex and we want to simplify it. Getting your feedback there would help.
Open vs. closed domain was clear to me. I did think that adding the eval data was slightly annoying, because the docs say: "Adds a SQuAD-style formatted file to the document store". You actually have this description in multiple places in your docs, and I think the concept is hard to grasp because now I have to go and find out how the original SQuAD dataset is formatted, then transform my data and write it to a file before loading it into the document store. I'd like to have other options for adding eval data. Maybe accept either a file or a list of dicts, or something like that? And actually giving an example of the requested structure in your docs (something like the sketch below) would be great. I'll play around with evaluation a bit more and see if I have any more feedback.
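For readers hitting the same question: the structure meant by "SQuAD-style" is roughly the following. The field names follow the public SQuAD format; the snippet shows it as a Python dict that is dumped to a JSON file before loading it via the document store's add_eval_data method (a minimal sketch for orientation, not official documentation):

```python
import json

# Minimal SQuAD-style eval data: one document ("paragraph" context)
# with one annotated question/answer pair.
eval_data = {
    "data": [
        {
            "title": "Sample document",
            "paragraphs": [
                {
                    "context": "Berlin is the capital of Germany.",
                    "qas": [
                        {
                            "id": "0",
                            "question": "What is the capital of Germany?",
                            # answer_start is the character offset of the
                            # answer text within the context.
                            "answers": [{"text": "Berlin", "answer_start": 0}],
                        }
                    ],
                }
            ],
        }
    ]
}

# Write the file that add_eval_data expects.
with open("eval_data.json", "w") as f:
    json.dump(eval_data, f)
```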
We have created a dedicated issue for eval improvements, and your suggestion of adding a SQuAD example also found its way in there: #1561
Is your feature request related to a problem? Please describe.
Recently, I played around with Haystack and its evaluation features. I followed your blog post https://www.deepset.ai/blog/how-to-evaluate-a-question-answering-system and experimented with different evaluation scenarios. I then realized that the evaluation method is quite slow; for me, it also caused issues in Colab notebooks because tqdm progress bars were printed for each query. Another thing I didn't like is that you can only measure recall at one k at a time. If you want a chart of recall at different k's, you have to run the evaluation multiple times, even though they could all be derived from a single run (see the sketch below).
To quantify it a bit more:
Evaluating a sentence-transformers model on the German DPR test set took about 8 minutes on a P100 GPU, whereas evaluating the same model in the sentence-transformers library took about 20 seconds.
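As a side note on the multiple-k point: once results are retrieved with the largest k of interest, recall at every smaller k can be read off the same ranked list without re-running retrieval. A minimal sketch of that idea (a hypothetical helper, not part of Haystack):

```python
from typing import Dict, List, Set

def recall_at_ks(ranked_doc_ids: List[str], relevant_ids: Set[str],
                 ks: List[int]) -> Dict[int, float]:
    """Compute recall@k for several k values from one ranked result list."""
    recalls = {}
    for k in ks:
        # Count how many relevant documents appear in the top k results.
        hits = len(set(ranked_doc_ids[:k]) & relevant_ids)
        recalls[k] = hits / len(relevant_ids) if relevant_ids else 0.0
    return recalls

# Retrieve once with k = max(ks), then evaluate all smaller k's for free:
# recall_at_ks(["d3", "d7", "d1"], {"d1", "d9"}, ks=[1, 3, 5])
# -> {1: 0.0, 3: 0.5, 5: 0.5}
```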
Describe the solution you'd like
I know that there is much more overhead compared to a library like sentence-transformers, because Haystack has to use a real retrieval pipeline including a document store such as Elasticsearch. What I would like is for the eval method in the retriever (at least for dense retrievers; see haystack/haystack/retriever/base.py, line 49 at 3539e6b) to call a bulk_retrieve or batch_retrieve
method on the retriever. This bulk retrieval method could at least embed all the queries at once instead of embedding them one by one; a sketch of the idea follows below. Ideally, the document store would also be queried in batches, although I realize that this might add too much complexity when accounting for different document stores. Still, embedding all queries at once (or in batches of queries, to avoid OOM errors) would make the evaluation much faster.
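To make the proposal concrete, here is a minimal sketch of what such a method could look like. This is not Haystack's actual API: batch_retrieve is the hypothetical method suggested above, and the sketch assumes the retriever exposes an embed_queries(texts) method returning one embedding per query, plus a document store with a query_by_embedding method:

```python
from typing import List, Tuple

def batch_retrieve(retriever, queries: List[str], top_k: int = 10,
                   batch_size: int = 32) -> List[Tuple[str, list]]:
    """Hypothetical batch method: embed queries in batches, then query the store."""
    all_results = []
    for start in range(0, len(queries), batch_size):
        batch = queries[start:start + batch_size]
        # One forward pass per batch instead of one per query; batching
        # (rather than embedding everything at once) also avoids OOM errors.
        embeddings = retriever.embed_queries(texts=batch)
        for query, emb in zip(batch, embeddings):
            docs = retriever.document_store.query_by_embedding(query_emb=emb,
                                                               top_k=top_k)
            all_results.append((query, docs))
    return all_results
```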
Describe alternatives you've considered
None
Additional context
This is just a feature that I think would be nice to have. It's not really urgent or anything like that.