
speed up slow retriever evaluation #1557

Closed · mathislucka opened this issue Oct 4, 2021 · 4 comments
Labels: topic:eval, type:feature (New feature or request)

Comments

@mathislucka (Member) commented Oct 4, 2021

Is your feature request related to a problem? Please describe.
Recently, I played around with Haystack and its evaluation features. I followed along with your blog post https://www.deepset.ai/blog/how-to-evaluate-a-question-answering-system and tried out different evaluation scenarios. I then realized that the evaluation method is quite slow, and for me it also had issues in Colab notebooks because a tqdm progress bar was printed for each query. Another thing I didn't like is that you can only measure recall at one k at a time: if you want a chart of recall at different k values, you have to run the evaluation multiple times.
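For illustration, recall at several k values can be computed from a single ranked result list, so one retrieval pass with the largest k would be enough. A minimal sketch (recall_at_ks is a hypothetical helper, not existing Haystack API):

```python
from typing import Dict, List, Set

def recall_at_ks(retrieved_ids: List[str], relevant_ids: Set[str],
                 ks: List[int]) -> Dict[int, float]:
    """Compute recall@k for several cutoffs from one ranked result list.

    Retrieve once with top_k = max(ks); smaller cutoffs are just prefixes
    of the same ranking, so no extra retrieval runs are needed.
    """
    return {
        k: len(set(retrieved_ids[:k]) & relevant_ids) / len(relevant_ids)
        for k in ks
    }

# One ranked list, recall at k = 1, 2, 3
print(recall_at_ks(["d3", "d7", "d1"], {"d1", "d9"}, ks=[1, 2, 3]))
# -> {1: 0.0, 2: 0.0, 3: 0.5}
```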

To quantify it a bit more:

Evaluating a sentence-transformers model on the German DPR test set took about 8 minutes on a P100 GPU, whereas evaluating the same model with the sentence-transformers library took about 20 seconds.

Describe the solution you'd like
I know there is much more overhead compared to a library like sentence-transformers, because Haystack has to run a real retrieval pipeline including a document store such as Elasticsearch. What I would like is for the retriever's eval method (at least for dense retrievers) to call a bulk_retrieve or batch_retrieve method on the retriever. This bulk retrieval method could at least embed all queries at once instead of embedding them one by one. Ideally, the document store would also be queried in batches, although I realize this might add too much complexity when accounting for different document stores. Even so, embedding all queries at once (or in batches, to avoid OOM errors) would already make the evaluation much faster.
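Something like this is what I have in mind; a rough sketch, where batch_retrieve is hypothetical and embed_queries / query_by_embedding stand for the existing dense-retriever and document-store methods it would build on:

```python
from typing import List

def batch_retrieve(retriever, queries: List[str], top_k: int = 10,
                   batch_size: int = 32) -> List[list]:
    """Hypothetical batch retrieval: embed queries batch-wise, then query
    the document store per query with the precomputed embedding."""
    results = []
    for start in range(0, len(queries), batch_size):
        batch = queries[start:start + batch_size]
        # One forward pass embeds the whole batch instead of one query at a time
        embeddings = retriever.embed_queries(batch)
        for emb in embeddings:
            docs = retriever.document_store.query_by_embedding(emb, top_k=top_k)
            results.append(docs)
    return results
```

The document store is still queried once per query here; batching those calls as well would depend on each store's API.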

Describe alternatives you've considered
None

Additional context
This is just a feature that I think would be nice to have. It's not really urgent or anything like that.

@Timoeller (Contributor)
Hey @mathislucka, this is a good observation, and batching would indeed greatly speed up the evaluation.

We already added batch indexing in #1231 and have an open issue for batch querying in #1239. Feel free to discuss the design we laid out in that issue.

Btw, do you have any other feedback regarding the evaluation? E.g., is it clear what closed domain vs. open domain means and how it is implemented? I am asking because we realized the evaluation is rather complex and we want to simplify it, so your feedback there would help.

@Timoeller added the topic:eval and type:feature labels on Oct 5, 2021
@mathislucka (Member, Author)

Open vs. closed domain was clear to me. I did find adding the eval data slightly annoying, because the docs just say "Adds a SQuAD-style formatted file to the document store". You have this description in multiple places in your docs, and the concept is hard to understand because I first have to find out how the original SQuAD dataset is formatted, then transform my data and write it to a file before loading it into the document store. I'd like to have other options for adding eval data. Maybe accept either a file or a list of dicts or something like that? And actually giving an example of the expected structure in your docs would be great. I'll play around with the evaluation a bit more and see if I have any more feedback.
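For reference, my understanding of the SQuAD-style structure is roughly the following (a minimal sketch; all values are made up for illustration):

```python
# Minimal SQuAD-style eval data (illustrative values only)
eval_data = {
    "data": [
        {
            "title": "Berlin",
            "paragraphs": [
                {
                    "context": "Berlin is the capital of Germany.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "What is the capital of Germany?",
                            # answer_start is the character offset in context
                            "answers": [{"text": "Berlin", "answer_start": 0}],
                        }
                    ],
                }
            ],
        }
    ]
}
```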

@Timoeller (Contributor)

We have created a dedicated issue for eval improvements, and your suggestion of adding a SQuAD example has also found its way in there: #1561

@julian-risch (Member)

Closing, because we already have two issues about a batch processing mode in pipelines: #1239 and #1589.
