Feature Request: Add index parameter to TFiDF retriever #1634
Comments
Related? #1637 |
Nice one! Yes it is. I stumbled upon this exact same error by adding the data in an InMemoryStore at a custom index in combination with TFiDFRetriever. |
What's the status on this? I'm actually facing the same problem 🥲 |
While thinking about #3447, I decided to revamp this issue.
Document store and retrievers
From what I understood, in Haystack:
This holds true for most of the cases, with some exceptions:
How to make the TF-IDF retriever support different indexes (accept the
|
Hey @anakin87! Great analysis, as usual! This is going to be useful even for me to explain the situation to others 🚀
Mandatory premise
So, first things first: I believe the current abstraction of Retrievers is fundamentally wrong. As you noticed, some Retrievers rely fully on the docstore for the retrieval steps, while others keep their own internal representation, (usually) in memory. At some point the whole thing will need to be re-evaluated and clarified (a topic already raised, as you noticed, in #2403) by assigning consistent and distinct responsibilities to document stores, retrievers and embedders. However, we need to get stuff done with the current architecture for now 😅
How to make the TF-IDF retriever support different indexes (accept the index parameter)?
Technically all docstores should support indices already, so this should not pose too many challenges. However, I have not verified this in practice.
In case the first solution proves too tough, this is a viable alternative. It might make TFIDFRetriever slow as a snail on large collections, but honestly it's already slow in such conditions, so I don't see it as a big issue 😅
Support for BM25Retriever in InMemoryDocumentStore (#3447)
As odd as it sounds, I think the main challenge of adding BM25 support to
I hope this is helpful, let me know if I forgot to address something! |
Problem
When using the in-memory docstore on a non-standard index (e.g. for evaluation), we cannot use the TFiDF Retriever, because there is no way to set an index on it.
Solution
Let's add the index option to the TFiDF retriever, please.
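To make the request concrete, here is a minimal, self-contained sketch of a retriever whose `fit` and `retrieve` accept an `index` argument. It does not use Haystack at all: `ToyInMemoryStore` and `ToyTfidfRetriever` are hypothetical stand-ins written only for illustration, and the TF-IDF weighting (smoothed IDF, cosine similarity, whitespace tokenization) is an assumption, not what the real retriever does.

```python
import math
from collections import Counter, defaultdict


class ToyInMemoryStore:
    """Hypothetical stand-in for an in-memory store with named indexes."""

    def __init__(self, default_index="document"):
        self.default_index = default_index
        self.indexes = defaultdict(list)

    def write_documents(self, docs, index=None):
        self.indexes[index or self.default_index].extend(docs)

    def get_all_documents(self, index=None):
        return list(self.indexes[index or self.default_index])


class ToyTfidfRetriever:
    """Hypothetical retriever that keeps one fitted TF-IDF model per index."""

    def __init__(self, document_store):
        self.document_store = document_store
        self._fitted = {}  # index name -> (docs, idf table, document vectors)

    @staticmethod
    def _vectorize(tokens, idf):
        # Build an L2-normalized TF-IDF vector, ignoring unknown terms.
        counts = Counter(t for t in tokens if t in idf)
        vec = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    def fit(self, index=None):
        index = index or self.document_store.default_index
        docs = self.document_store.get_all_documents(index=index)
        tokenized = [doc.lower().split() for doc in docs]
        doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
        idf = {t: math.log((1 + len(docs)) / (1 + df)) + 1
               for t, df in doc_freq.items()}
        vectors = [self._vectorize(tokens, idf) for tokens in tokenized]
        self._fitted[index] = (docs, idf, vectors)

    def retrieve(self, query, top_k=3, index=None):
        index = index or self.document_store.default_index
        if index not in self._fitted:  # fit lazily, once per index
            self.fit(index=index)
        docs, idf, vectors = self._fitted[index]
        q = self._vectorize(query.lower().split(), idf)
        scored = [(sum(w * v.get(t, 0.0) for t, w in q.items()), doc)
                  for doc, v in zip(docs, vectors)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for score, doc in scored[:top_k] if score > 0]


store = ToyInMemoryStore()
store.write_documents(["Berlin is the capital of Germany",
                       "Paris is the capital of France"])
store.write_documents(["Haystack is a framework for search pipelines"],
                      index="eval_document")

retriever = ToyTfidfRetriever(store)
default_hits = retriever.retrieve("capital of France", top_k=1)
eval_hits = retriever.retrieve("search framework", index="eval_document")
```

The point of the sketch is only that the fitted state is keyed by index name, so the default index and an evaluation index can be queried through the same retriever. In this toy version, documents written after an index has been fitted are not picked up until `fit(index=...)` is called again.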
Background
I would like the in-memory store + TFiDF to be used for fast Haystack examples (without a "complicated" ES or FAISS setup).