Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix update_embeddings() for FAISSDocumentStore #978

Merged
merged 5 commits into from Apr 21, 2021

Conversation

oryx1729
Copy link
Contributor

@oryx1729 oryx1729 commented Apr 19, 2021

There are three main use cases for update_embeddings():

  1. Creating embeddings in a new document store after writing the documents. In this case, embeddings are created for all the documents in the document store. The default parameters for update_embeddings() are sufficient for this case.

  2. Incremental update after adding more documents in an existing index. In this case, update_embeddings() can be called with update_existing_embeddings set to False to prevent recomputation of existing embeddings.

  3. Create/Update embeddings for a subset of documents in a document store using the filters parameter. For FAISSDocumentStore, filters cannot be used in conjunction with the update_existing_embeddings set to True.

This PR resolves #885 where update_embeddings() could result in more embeddings than the number of documents and adds a new method get_embedding_count() for all document stores.

@oryx1729 oryx1729 requested a review from tholor April 19, 2021 14:07
@oryx1729 oryx1729 changed the title WIP: Fix update_embeddings for FAISS Document Store Fix update_embeddings() for FAISS Document Store Apr 19, 2021
@oryx1729 oryx1729 changed the title Fix update_embeddings() for FAISS Document Store Fix update_embeddings() for FAISSDocumentStore Apr 19, 2021
Copy link
Member

@tholor tholor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Only minor comment. Have you verified that update_embeddings() is working correctly for milvus?

haystack/document_store/faiss.py Outdated Show resolved Hide resolved
@oryx1729
Copy link
Contributor Author

Looking good. Only minor comment. Have you verified that update_embeddings() is working correctly for milvus?

Yes, it works correctly for Milvus as unlike FAISS, we can delete vectors using their IDs from the index.

@oryx1729 oryx1729 merged commit 8c1e411 into master Apr 21, 2021
@oryx1729 oryx1729 deleted the fix-update-embeddings branch April 21, 2021 07:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update Embeddings in FAISS
2 participants