Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redundant parameter search_field in weaviate hybrid search #1504

Closed
violenil opened this issue May 8, 2023 · 13 comments · Fixed by #1516
Closed

Redundant parameter search_field in weaviate hybrid search #1504

violenil opened this issue May 8, 2023 · 13 comments · Fixed by #1516
Assignees

Comments

@violenil
Copy link

violenil commented May 8, 2023

The search_field parameter seems to be redundant in the hybrid search query on a Weaviate document index, since the text search will search all fields of the object and no specific text field may be supplied. However, when not supplied, the following error occurs:

Screenshot 2023-05-08 at 11 45 20

Although the documentation also explains that this parameter is necessary but has no effect, since all text fields will be searched by the hybrid query, it seems there should be no requirement to pass a None value here.

Screenshot 2023-05-08 at 11 48 24

@hsm207
Copy link
Collaborator

hsm207 commented May 8, 2023

weaviate can now support searching in specific field(s) (the BM25 part) when doing hybrid search (see this issue). We should update the implementation in docarray accordingly.

@JoanFM
Copy link
Member

JoanFM commented May 8, 2023

Hey @hsm207 , maybe u can create a ticket for this change?

@AnneYang720
Copy link
Contributor

Hi @hsm207 , I checked the lastest code of weaviate python client. It seems class Hybrid still doesn't have properties here. Like in weaviate go, I guess this should be changed to:

@dataclass
class Hybrid:
    query: str
    alpha: float
    vector: List[float]
    properties: Optional[List[str]]

    def __str__(self) -> str:
        ret = f'query: "{util.strip_newlines(self.query)}"'
        if self.vector is not None:
            ret += f", vector: {self.vector}"
        if self.alpha is not None:
            ret += f", alpha: {self.alpha}"
        if self.properties is not None and len(self.properties) > 0:
            props = '","'.join(self.properties)
            ret += f', properties: ["{props}"]'

        return "hybrid:{" + ret + "}"

@JohannesMessner
Copy link
Member

I think what we should do is simply make it such that the user doesn't hate to pass search_field=None since it doesn't do anything. So the argument should be made optional

@AnneYang720
Copy link
Contributor

I think what we should do is simply make it such that the user doesn't hate to pass search_field=None since it doesn't do anything. So the argument should be made optional

I think the search_field in _text_search() can be optional as well.

@AnneYang720
Copy link
Contributor

I found a possible issue that the search_field of _text_search() in weaviate doesn't have any effect. For example:

from pydantic import Field

from docarray import BaseDoc
from docarray.index.backends.weaviate import WeaviateDocumentIndex

class Document(BaseDoc):
    text: str = Field()
    text2: str = Field()

texts = ["lorem ipsum", "dolor sit amet", "consectetur adipiscing elit"]
texts2 = ["dolor sit amet", "lorem ipsum", "consectetur adipiscing elit"]

docs = [
    Document(id=str(i), text=text, text2=text2)
    for i, (text, text2) in enumerate(zip(texts, texts2))
]

index = WeaviateDocumentIndex[Document]()
results = index.text_search(query='ipsum', search_field='text')
print(len(results)) # 2

The len(result) should be 1 in my understanding. Is this the expected behavior? @hsm207

@JohannesMessner
Copy link
Member

Afaik the search_field doesn't have any effect in any of the weaviate methods since it is already declared at schema creation time what field to search on. Can you confirm @hsm207?

@hsm207
Copy link
Collaborator

hsm207 commented May 9, 2023

Hi @hsm207 , I checked the lastest code of weaviate python client. It seems class Hybrid still doesn't have properties here

@AnneYang720 I'm sorry, I forgot to verify if clients have implemented this feature. Support for adding properties in the hybrid search is coming soon: weaviate/weaviate-python-client#319

The len(result) should be 1 in my understanding. Is this the expected behavior? @hsm207

@AnneYang720 I agree, it should be 1. The implementation really does look at the supplied search_field (see here). Did the docarray team made some changes to the interface since it was release?

Your code snippet is very similar to the test case I wrote (see here and here). In that test, I asked for 3 documents but only got 1, as expected. This test still passes in the CI, doesn't it?

Afaik the search_field doesn't have any effect in any of the weaviate methods since it is already declared at schema creation time what field to search on. Can you confirm @hsm207?

@JohannesMessner That's incorrect. When we built this initially, the search_field is meant to have effect on text_search only and no effect on hybrid search.

@AnneYang720
Copy link
Contributor

AnneYang720 commented May 9, 2023

Your code snippet is very similar to the test case I wrote (see here and here). In that test, I asked for 3 documents but only got 1, as expected. This test still passes in the CI, doesn't it?

Yes I noticed this test. It passes because only one doc contains the word "lorem". But in my code example above, the two results are doc['0'] (text) and doc['1'] (text2). But we only expect doc['0'] because search_field='text'.

@hsm207
Copy link
Collaborator

hsm207 commented May 9, 2023

@AnneYang720 i found the reason:

instead of :

results = (
            self._client.query.get(index_name, self.properties)
            .with_bm25(bm25)
            .with_limit(limit)
            .with_additional(["score", "vector"])
            .do()
        )

it should be:

results = (
            self._client.query.get(index_name, self.properties)
            .with_bm25(**bm25)
            .with_limit(limit)
            .with_additional(["score", "vector"])
            .do()
        )

i.e. we need to unpack the dict that gets passed to with_bm25.

Can you do the fix?

@hsm207
Copy link
Collaborator

hsm207 commented May 9, 2023

the weaviate python client now supports search field in hybrid search (see https://github.com/weaviate/weaviate-python-client/releases/tag/v3.18.0)

@AnneYang720
Copy link
Contributor

AnneYang720 commented May 10, 2023

the weaviate python client now supports search field in hybrid search (see https://github.com/weaviate/weaviate-python-client/releases/tag/v3.18.0)

We can open a new PR for this.

@JoanFM
Copy link
Member

JoanFM commented May 10, 2023

the weaviate python client now supports search field in hybrid search (see https://github.com/weaviate/weaviate-python-client/releases/tag/v3.18.0)

@hsm207 could you open a PR to add this in a new PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants