refactor: Weaviate query with filters #3628

Merged: 2 commits merged on Nov 28, 2022


@ZanSara (Contributor) commented Nov 25, 2022

Related Issues

Proposed Changes:

  • Uncomment a block of code that should enable filters with BM25 queries
  • Enable the related tests

How did you test it?

  • CI

Notes for the reviewer

n/a

Checklist

@ZanSara added the type:feature (New feature or request), type:refactor (Not necessarily visible to the users), journey:first steps, and topic:weaviate labels, then removed the type:refactor label, Nov 25, 2022
@ZanSara marked this pull request as ready for review November 25, 2022 09:45
@ZanSara requested a review from a team as a code owner November 25, 2022 09:45
@ZanSara requested review from vblagoje and removed request for a team November 25, 2022 09:45

@vblagoje (Member) left a comment:


LGTM

@vblagoje merged commit eb7b945 into main Nov 28, 2022
@vblagoje deleted the weaviate-query-with-filters branch November 28, 2022 11:26

@bobvanluijt (Contributor) commented:

Thanks @ZanSara 🙏

@zoltan-fedor (Contributor) commented Nov 29, 2022

Hi @ZanSara, @bobvanluijt, @masci,

I wrote the Haystack-Weaviate code that enabled BM25 on Weaviate and that throws an exception when BM25 is used with filters, because Weaviate does not support that combination (https://github.com/deepset-ai/haystack/pull/3628/files#diff-d4e4f2566db6c2fbe852debffcf7830447ecb969d4cdcce6d796fd773eba379bR986). In parallel, back in July, I raised an issue for it with Weaviate: weaviate/weaviate#2061

Just today I was bugging @etiennedi from Weaviate about filter support with BM25, so I was surprised to see this PR in Haystack start using that filter support.

My Weaviate issue hasn't been closed, and @etiennedi has confirmed that Weaviate still has no filter support with BM25 (see weaviate/weaviate#2134 (comment)).

Are we absolutely sure that Weaviate already supports filters with BM25?

The odd thing is that @ZanSara correctly changed the unit test in Haystack to remove the error catch (https://github.com/deepset-ai/haystack/pull/3628/files#diff-bc2015fb1dc5e70c32cf9c8bba6a2497388e0293d385cb4e614cd9cea534b679L158), and I assume the tests passed.

Something doesn't add up.

Either the unit test on the Haystack side was not running, or Weaviate is no longer throwing that error properly, or @etiennedi is wrong and filtered BM25 is in fact now supported in Weaviate.

Let me investigate.

UPDATE1: I have confirmed @etiennedi's finding. Weaviate still throws the expected error when filters are used with BM25 (tested with Weaviate v1.16.1 and v1.16.5).

UPDATE2: I have also confirmed that the Weaviate error is still properly raised through Haystack when BM25 is used with filters. The only problem is a coding error in this PR at https://github.com/deepset-ai/haystack/pull/3628/files#diff-d4e4f2566db6c2fbe852debffcf7830447ecb969d4cdcce6d796fd773eba379bR999: that line should be on the else: side of the if, because as written it overwrites the filtered search with the unfiltered one.
So the correct version should have looked like this:

            # Retrieval with BM25 AND filtering
            if filters:
                # raise NotImplementedError(
                #     "Weaviate currently (v1.14.1) does not support filters WITH inverted index text query (eg BM25)!"
                # )

                # Once Weaviate starts supporting filters with BM25:
                filter_dict = LogicalFilterClause.parse(filters).convert_to_weaviate()
                gql_query = (
                    gql.get.GetBuilder(class_name=index, properties=properties, connection=self.weaviate_client)
                    .with_near_vector({"vector": [0, 0]})
                    .with_where(filter_dict)
                    .with_limit(top_k)
                    .build()
                )

            else:  # <=== THIS IS MISSING FROM THE PR
                # BM25 retrieval without filtering
                gql_query = (
                    gql.get.GetBuilder(class_name=index, properties=properties, connection=self.weaviate_client)
                    .with_near_vector({"vector": [0, 0]})
                    .with_limit(top_k)
                    .build()
                )

Once that else: is in place, the filtered query is actually sent to Weaviate, and the response includes Weaviate's error that filtering is not supported with BM25.
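The overwrite described in UPDATE2 can be reproduced without a Weaviate instance. The sketch below is a hypothetical, simplified stand-in: build_query_buggy and build_query_fixed are illustrative names, not Haystack functions, and a plain dict takes the place of the GraphQL query that GetBuilder would build. It models only the if/else control flow.

```python
from typing import Optional


def build_query_buggy(top_k: int, filters: Optional[dict] = None) -> dict:
    """Hypothetical model of the merged PR: the unfiltered query is built unconditionally."""
    if filters:
        # Filtered query (the one that would trigger Weaviate's BM25 + filter error)
        query = {"limit": top_k, "where": filters}
    # No else: this assignment always runs and discards the filtered query
    query = {"limit": top_k}
    return query


def build_query_fixed(top_k: int, filters: Optional[dict] = None) -> dict:
    """Hypothetical model of the suggested fix: the unfiltered query sits in the else branch."""
    if filters:
        query = {"limit": top_k, "where": filters}
    else:
        query = {"limit": top_k}
    return query


example_filters = {"name": ["filename1"]}  # illustrative filter, not from the PR
print(build_query_buggy(10, example_filters))  # {'limit': 10}: filters silently dropped
print(build_query_fixed(10, example_filters))  # {'limit': 10, 'where': {'name': ['filename1']}}
```

With the buggy version, a call that passes filters returns a query without them and raises nothing, so a test that only checks that no exception is raised cannot tell "filters supported" apart from "filters silently dropped". Asserting on the built query (or, against a real Weaviate instance, on the expected error) does.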

IN SUMMARY:
1. This PR introduces a bug: Weaviate does NOT support filters with BM25 yet, so this PR should be reverted.
2. There is a Weaviate issue requesting filter support with BM25, which replaces the issue I raised in July (see ).
Please vote for this issue on Weaviate so the feature can be added to Weaviate as soon as possible and this PR can be reimplemented in Haystack (without the "missing else:" bug highlighted above).
Unfortunately, this is currently NOT slated for the upcoming Weaviate 1.17 release, but @etiennedi has noted that it will be considered after that if it receives enough votes, so please vote for it.

@bobvanluijt (Contributor) commented:

Side note: thanks for the detailed write-up @zoltan-fedor 👍

Labels: topic:weaviate, type:feature (New feature or request)
Development: successfully merging this pull request may close the issue "Weaviate: enable BM25 queries with filters".
4 participants