Fix update_embeddings function in FAISSDocumentStore #481

Merged (3 commits) on Oct 14, 2020
lalitpagaria
Contributor

Changes in this PR (addressing the issue raised in 1cebcb7#r43171379):

  1. Prevent the update_embeddings function in FAISSDocumentStore from setting faiss_index to None when the document store does not contain any documents.
  2. Clean up the tests by adding a fixture for the retriever.
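
The first change can be illustrated with a minimal sketch. This is not the actual FAISSDocumentStore code: `TinyVectorStore`, `toy_embed`, and the list used in place of a real FAISS index are hypothetical stand-ins, chosen so the example runs without any dependencies. It only shows the general idea of returning early when there are no documents, so the existing index is left in place rather than being reset to None.

```python
from typing import Callable, List, Optional


class TinyVectorStore:
    """Hypothetical stand-in for a FAISS-backed document store (not Haystack's class)."""

    def __init__(self) -> None:
        self.documents: List[str] = []
        # Stand-in for the FAISS index: a plain list of vectors, created up front.
        self.faiss_index: Optional[List[List[float]]] = []

    def update_embeddings(self, embed: Callable[[str], List[float]]) -> None:
        # Guard: with no documents there is nothing to re-embed, so return
        # early and keep the existing (empty) index instead of setting it to None.
        if not self.documents:
            return
        # Rebuild the index from the current documents.
        self.faiss_index = [embed(text) for text in self.documents]


def toy_embed(text: str) -> List[float]:
    # Toy embedding for illustration only: text length and vowel count.
    return [float(len(text)), float(sum(ch in "aeiou" for ch in text))]


store = TinyVectorStore()
store.update_embeddings(toy_embed)  # empty store: the index must stay usable
index_after_empty_update = store.faiss_index

store.documents.append("hello")
store.update_embeddings(toy_embed)  # now the index is rebuilt from one document
```

With a guard like this, calling update_embeddings on an empty store is a no-op, and documents written afterwards can still be added to the same index.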

lalitpagaria referenced this pull request Oct 12, 2020
…#339)

* add time and perf benchmark for es

* Add retriever benchmarking

* Add Reader benchmarking

* add nq to squad conversion

* add conversion stats

* clean benchmarks

* Add link to dataset

* Update imports

* add first support for neg psgs

* Refactor test

* set max_seq_len

* cleanup benchmark

* begin retriever speed benchmarking

* Add support for retriever query index benchmarking

* improve reader eval, retriever speed benchmarking

* improve retriever speed benchmarking

* Add retriever accuracy benchmark

* Add neg doc shuffling

* Add top_n

* 3x speedup of SQL. add postgres docker run. make shuffle neg a param. add more logging

* Add models to sweep

* add option for faiss index type

* remove unneeded line

* change faiss to faiss_flat

* begin automatic benchmark script

* remove existing postgres docker for benchmarking

* Add data processing scripts

* Remove shuffle in script bc data already shuffled

* switch hnsw setup from 256 to 128

* change es similarity to dot product by default

* Error includes stack trace

* Change ES default timeout

* remove delete_docs() from timing for indexing

* Add support for website export

* update website on push to benchmarks

* add complete benchmarks results

* new json format

* removed NaN as is not a valid json token

* fix benchmarking for faiss hnsw queries. do sql calls in update_embeddings() as batches

* update benchmarks for hnsw 128,20,80

* don't delete full index in delete_all_documents()

* update texts for charts

* update recall column for retriever

* change scale and add units to desc

* add units to legend

* add axis titles. update desc

* add html tags

Co-authored-by: deepset <deepset@Crenolape.localdomain>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
…on as it calls the fit() function in the constructor, so fix it by checking self.paragraphs for None
Two review threads on test/conftest.py (outdated, resolved)
Member

tholor left a comment:

Looking good. Thx for adding this @lalitpagaria!

tholor merged commit 2e9f3c1 into deepset-ai:master on Oct 14, 2020
lalitpagaria deleted the fix_faiss_update_embeddings branch on October 14, 2020 14:16