Introduce incremental updates for embeddings in document stores #812

Merged on Feb 9, 2021 (9 commits)

@tanaysoni tanaysoni commented Feb 8, 2021

The document stores provide an update_embeddings() method that takes a retriever instance as an argument and creates or updates the embeddings of all indexed documents. This works well when setting up a new document store. It is inefficient, however, when a large document collection is already indexed and you only want to embed newly added documents that don't have embeddings yet.

This PR introduces a new parameter update_existing_embeddings for update_embeddings(). When it is set to False, embeddings are generated only for documents that don't have one yet. The PR also adds filters, so that embeddings can be updated for just a subset of documents.
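A minimal sketch of the intended semantics, not the actual Haystack code: `ToyDocumentStore`, `toy_embed`, the default of `True` for `update_existing_embeddings`, and the `{"field": [allowed values]}` shape of `filters` are all assumptions made for illustration, and a plain function stands in for the retriever instance.

```python
from typing import Callable, Dict, List, Optional


class ToyDocumentStore:
    """Hypothetical in-memory store used only to illustrate the behaviour."""

    def __init__(self) -> None:
        self.docs: List[dict] = []

    def write_documents(self, docs: List[dict]) -> None:
        # Newly written documents start without an embedding.
        for doc in docs:
            self.docs.append(
                {"text": doc["text"], "meta": doc.get("meta", {}), "embedding": None}
            )

    def update_embeddings(
        self,
        embed_fn: Callable[[List[str]], List[List[float]]],
        update_existing_embeddings: bool = True,  # assumed default
        filters: Optional[Dict[str, List[str]]] = None,  # assumed shape
    ) -> int:
        """Embed the selected documents and return how many were processed."""
        selected = []
        for doc in self.docs:
            # Skip documents whose metadata does not match every filter.
            if filters and any(
                doc["meta"].get(key) not in allowed
                for key, allowed in filters.items()
            ):
                continue
            # Incremental mode: leave documents that already have an embedding.
            if not update_existing_embeddings and doc["embedding"] is not None:
                continue
            selected.append(doc)

        if selected:
            vectors = embed_fn([doc["text"] for doc in selected])
            for doc, vector in zip(selected, vectors):
                doc["embedding"] = vector
        return len(selected)


def toy_embed(texts: List[str]) -> List[List[float]]:
    # Stand-in for a retriever: one-dimensional "embedding" = text length.
    return [[float(len(text))] for text in texts]


store = ToyDocumentStore()
store.write_documents(
    [
        {"text": "first", "meta": {"lang": "en"}},
        {"text": "second", "meta": {"lang": "de"}},
    ]
)
first_pass = store.update_embeddings(toy_embed)  # embeds both documents

store.write_documents([{"text": "third", "meta": {"lang": "en"}}])
# Only the newly added document lacks an embedding, so only it is processed.
incremental = store.update_embeddings(toy_embed, update_existing_embeddings=False)

# Re-embed just the English documents, whether or not they have embeddings.
filtered = store.update_embeddings(toy_embed, filters={"lang": ["en"]})
```

With `update_existing_embeddings=False`, the cost of a run scales with the number of documents that still lack an embedding rather than with the size of the whole collection, which is the case described above.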

Resolves #806

@tanaysoni tanaysoni changed the title WIP: Introduce incremental updates for embeddings in document stores Introduce incremental updates for embeddings in document stores Feb 8, 2021
@tanaysoni tanaysoni requested a review from tholor February 8, 2021 17:52
@tanaysoni tanaysoni changed the title Introduce incremental updates for embeddings in document stores WIP: Introduce incremental updates for embeddings in document stores Feb 8, 2021
@tholor tholor left a comment

Looking good to me. Left a few comments. Ready to merge once those are resolved + CI is passing.

haystack/document_store/elasticsearch.py (outdated, resolved)
haystack/document_store/faiss.py (resolved)
haystack/document_store/faiss.py (outdated, resolved)
haystack/document_store/memory.py (resolved)
@tanaysoni tanaysoni changed the title WIP: Introduce incremental updates for embeddings in document stores Introduce incremental updates for embeddings in document stores Feb 9, 2021
@tanaysoni tanaysoni merged commit fd5c5dd into master Feb 9, 2021
@tanaysoni tanaysoni deleted the refactor-update-embedding branch February 9, 2021 20:25
@brandenchan brandenchan mentioned this pull request Feb 19, 2021
Linked issue: document_store.update_embeddings can only update the new add docs