Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for BM25 with the Weaviate document store #2860

Merged
merged 11 commits into from Jul 27, 2022

Conversation

zoltan-fedor
Copy link
Contributor

@zoltan-fedor zoltan-fedor commented Jul 20, 2022

Related Issue(s): Closes #2859

Proposed changes:

  • Adding support for the BM25 (sparse) retrieval when using the Weaviate document store
  • As BM25 support is still in experimental model in Weaviate, when used a warning is displayed
  • Filtering with BM25 is currently not supported in Weaviate - error is raised if somebody tries that
  • Upgrading the Weaviate document store version dependency from v1.11.0 (which doesn't yet support BM25) to v1.14.1 (the latest)

Pre-flight checklist

zoltan-fedor and others added 5 commits July 19, 2022 15:29
This has also brought up an issue with one of the test filtering for value "a". This test has started to fail, as "a" is a default stopword in Weaviate, so I have changed this test to look for value "c" instead of value "a" to get around the stopword issue.
From v3.3.3 to v3.6.0
Weaviate now supports BM25 retrieval in experiment mode and with some limitations (like it cannot be combined with filters).
This commit adds support for inverted index (BM25) querying against Weaviate.
@CLAassistant
Copy link

CLAassistant commented Jul 20, 2022

CLA assistant check
All committers have signed the CLA.

@ZanSara
Copy link
Contributor

ZanSara commented Jul 21, 2022

Hello @zoltan-fedor, thank you for this tidy PR! One of the maintainers is going to review it soon 😊

Copy link
Member

@masci masci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution, code looks very good and I don't really have much to say, only left a couple of comments (let me know if those make sense).

I tested your branch locally with the tutorial 1 and works like a charm.

test/document_stores/test_document_store.py Show resolved Hide resolved
haystack/document_stores/weaviate.py Outdated Show resolved Hide resolved
haystack/document_stores/weaviate.py Outdated Show resolved Hide resolved
@zoltan-fedor zoltan-fedor requested a review from masci July 26, 2022 15:43
Copy link
Member

@masci masci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I'll wait for the tests to finish and merge.

@zoltan-fedor
Copy link
Contributor Author

@masci
The tests got stuck with the "Expected — Waiting for status to be reported" message.
This might be this issue https://stackoverflow.com/questions/52200096/github-pull-request-waiting-for-status-to-be-reported

I have tried the empty commit to re-trigger all the checks, but as the PR already approved, it seems it doesn't help.
Any ideas?

@ZanSara
Copy link
Contributor

ZanSara commented Jul 26, 2022

Hey @zoltan-fedor ! Would you mind merging master into this branch? We recently had substantial changes in the CI, so that could help triggering the tests.

@zoltan-fedor
Copy link
Contributor Author

zoltan-fedor commented Jul 26, 2022

Thanks @ZanSara
As requested, I have merged the master into the branch, but no change.
I suspect that we are experiencing issues with GitHub Actions, as explained here: https://stackoverflow.com/questions/52200096/github-pull-request-waiting-for-status-to-be-reported

UPDATE: I spoke to soon, at least some of the tests are running now!

@ZanSara
Copy link
Contributor

ZanSara commented Jul 26, 2022

Fortunately not 🙂 Given that you're a first time contributor, GH asks maintainers to approve every CI run. So after the master merge I had to press a button for your CI to start. What you see there was simply the branch protection rules requesting all tests to pass before allowing us to merge.

I see the tests are running now, so if all goes well tomorrow morning @masci should have all the green checks he needs to merge 👍

@zoltan-fedor
Copy link
Contributor Author

Excellent, thank you!

@masci masci merged commit adb2b2c into deepset-ai:master Jul 27, 2022
@zoltan-fedor zoltan-fedor deleted the feature-weaviate-bm25 branch July 27, 2022 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for BM25 with the Weaviate document store
4 participants