Adding support for update_existing_documents to sql and faiss document stores #584
@tholor and @tanaysoni, please review.
Thanks for working on this @lalitpagaria !
It seems to me that we are missing one case: when `update_existing_documents = True` and we call `write_documents()` with an existing `document_id`, we will update the data in SQL. What about the associated embedding in FAISS? From what I can tell, we will add the new vector in FAISS, add the new vector_id to SQL and leave the "old vector" in FAISS untouched.
I believe deleting single vectors from FAISS is not possible for all index types (https://github.com/facebookresearch/faiss/wiki/Special-operations-on-indexes#removing-elements-from-an-index), but in that case, we should at least log a warning when setting `update_existing_documents = True` and include it in the docstrings.
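To make the case above concrete, here is a minimal sketch in plain Python. It does not use the real FAISS or SQL document store classes: `AppendOnlyIndex` and `ToyDocumentStore` are hypothetical stand-ins (a list for an index that cannot remove single vectors, a dict for the SQL table), and the warning text is only an illustration, not the wording used in this PR.

```python
import logging

logger = logging.getLogger(__name__)


class AppendOnlyIndex:
    """Toy stand-in for a vector index that cannot delete single vectors."""

    def __init__(self):
        self.vectors = []

    def add(self, vector):
        self.vectors.append(vector)
        return len(self.vectors) - 1  # list position doubles as the vector_id


class ToyDocumentStore:
    """Hypothetical store: a dict plays the SQL table, AppendOnlyIndex plays FAISS."""

    def __init__(self, update_existing_documents=False):
        self.update_existing_documents = update_existing_documents
        self.rows = {}  # document_id -> {"text": ..., "vector_id": ...}
        self.index = AppendOnlyIndex()
        if update_existing_documents:
            # Illustrative wording only; not the warning added in the PR.
            logger.warning(
                "update_existing_documents=True: vectors of updated documents "
                "are not removed from the index and remain as orphans."
            )

    def write_documents(self, documents):
        for doc in documents:
            if doc["id"] in self.rows and not self.update_existing_documents:
                raise ValueError(f"Document {doc['id']} already exists")
            # A new vector is always appended; the old one stays in the index.
            vector_id = self.index.add(doc["embedding"])
            self.rows[doc["id"]] = {"text": doc["text"], "vector_id": vector_id}

    def orphaned_vector_ids(self):
        """Return index positions that no row points to any more."""
        live = {row["vector_id"] for row in self.rows.values()}
        return [i for i in range(len(self.index.vectors)) if i not in live]


store = ToyDocumentStore(update_existing_documents=True)
store.write_documents([{"id": "doc-1", "text": "old", "embedding": [0.1, 0.2]}])
store.write_documents([{"id": "doc-1", "text": "new", "embedding": [0.3, 0.4]}])
print(store.orphaned_vector_ids())
```

After the second write, the row for `doc-1` points at the new vector while the first vector is still in the index, which is the situation a query could trip over if orphaned vectors are returned as nearest neighbours.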
My second concern would be around speed if `update_existing_documents = True`, as SQL was often the bottleneck before, but I haven't measured it.
What do you think?
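One rough way to get a first number for the speed concern is sketched below. It uses the standard-library `sqlite3` module with an in-memory table, not the SQLAlchemy code path of the document store, so the table layout, the `time_writes` helper and the `INSERT OR REPLACE` statement are assumptions for illustration only. It compares plain inserts into an empty table with replace-style writes over ids that already exist.

```python
import sqlite3
import time


def time_writes(sql, rows, preload=False):
    """Run `sql` for every row on a fresh in-memory table; return (seconds, row_count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE document (id TEXT PRIMARY KEY, text TEXT)")
    if preload:
        # Fill the table first so the timed pass hits existing ids.
        conn.executemany("INSERT INTO document VALUES (?, ?)", rows)
        conn.commit()
    start = time.perf_counter()
    conn.executemany(sql, rows)
    conn.commit()
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM document").fetchone()[0]
    conn.close()
    return elapsed, count


rows = [(f"doc-{i}", f"text {i}") for i in range(10_000)]
insert_time, insert_count = time_writes(
    "INSERT INTO document VALUES (?, ?)", rows
)
upsert_time, upsert_count = time_writes(
    "INSERT OR REPLACE INTO document VALUES (?, ?)", rows, preload=True
)
print(f"insert: {insert_time:.4f}s, replace existing: {upsert_time:.4f}s")
```

The absolute timings depend on the machine and on the database backend, so this only indicates whether the update path is noticeably slower; a measurement against the actual document store would be needed to settle the question.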
@tholor Yes, you are correct. I have added the following warning -
We can add this info in the doc_string of
Could you please update the doc_string, as I am away from my system?
Thank you for working on this, @lalitpagaria. It looks good-to-go!
Resolve #562