Redundant parameter search_field in weaviate hybrid search #1504
Comments
weaviate can now support searching in specific field(s) (the BM25 part) when doing hybrid search (see this issue). We should update the implementation in docarray accordingly. |
Hey @hsm207, maybe you can create a ticket for this change? |
Hi @hsm207, I checked the latest code of the weaviate python client. It seems class
@dataclass
class Hybrid:
    query: str
    alpha: float
    vector: List[float]
    properties: Optional[List[str]]

    def __str__(self) -> str:
        ret = f'query: "{util.strip_newlines(self.query)}"'
        if self.vector is not None:
            ret += f", vector: {self.vector}"
        if self.alpha is not None:
            ret += f", alpha: {self.alpha}"
        if self.properties is not None and len(self.properties) > 0:
            props = '","'.join(self.properties)
            ret += f', properties: ["{props}"]'
        return "hybrid:{" + ret + "}" |
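To see what string this class produces, here is a minimal, self-contained sketch that mirrors the dataclass quoted above. The `strip_newlines` function is a hypothetical stand-in for the client's `util.strip_newlines` (assumed here to replace newlines with spaces), the `None` defaults are added only for convenience, and the field name `text` is illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


def strip_newlines(s: str) -> str:
    # Hypothetical stand-in for the client's util.strip_newlines:
    # assumed to replace newlines with spaces.
    return s.replace("\n", " ")


@dataclass
class Hybrid:
    query: str
    # Defaults are an assumption of this sketch, added for convenience.
    alpha: Optional[float] = None
    vector: Optional[List[float]] = None
    properties: Optional[List[str]] = None

    def __str__(self) -> str:
        ret = f'query: "{strip_newlines(self.query)}"'
        if self.vector is not None:
            ret += f", vector: {self.vector}"
        if self.alpha is not None:
            ret += f", alpha: {self.alpha}"
        # The properties clause is emitted only for a non-empty list.
        if self.properties is not None and len(self.properties) > 0:
            props = '","'.join(self.properties)
            ret += f', properties: ["{props}"]'
        return "hybrid:{" + ret + "}"


clause = str(Hybrid(query="ipsum", alpha=0.5, properties=["text"]))
print(clause)
```

When `properties` is omitted or empty, the clause contains no `properties` entry at all, which is why restricting the BM25 part to a field depends on the caller passing it through.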
I think what we should do is simply make it such that the user doesn't have to pass |
I think the |
I found a possible issue that the
from pydantic import Field
from docarray import BaseDoc
from docarray.index.backends.weaviate import WeaviateDocumentIndex

class Document(BaseDoc):
    text: str = Field()
    text2: str = Field()

texts = ["lorem ipsum", "dolor sit amet", "consectetur adipiscing elit"]
texts2 = ["dolor sit amet", "lorem ipsum", "consectetur adipiscing elit"]
docs = [
    Document(id=str(i), text=text, text2=text2)
    for i, (text, text2) in enumerate(zip(texts, texts2))
]
index = WeaviateDocumentIndex[Document]()
index.index(docs)  # add the documents to the index before searching
results = index.text_search(query='ipsum', search_field='text')
print(len(results))  # 2
The |
Afaik the search_field doesn't have any effect in any of the weaviate methods since it is already declared at schema creation time what field to search on. Can you confirm @hsm207? |
@AnneYang720 I'm sorry, I forgot to verify if clients have implemented this feature. Support for adding properties in the hybrid search is coming soon: weaviate/weaviate-python-client#319
@AnneYang720 I agree, it should be 1. The implementation really does look at the supplied search_field (see here). Did the docarray team make some changes to the interface since it was released? Your code snippet is very similar to the test case I wrote (see here and here). In that test, I asked for 3 documents but only got 1, as expected. This test still passes in the CI, doesn't it?
@JohannesMessner That's incorrect. When we built this initially, search_field was meant to have an effect on text_search only and no effect on hybrid search. |
Yes I noticed this test. It passes because only one doc contains the word "lorem". But in my code example above, the two results are doc['0'] (text) and doc['1'] (text2). But we only expect doc['0'] because |
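The expectation described here can be illustrated without a running Weaviate instance. The following is a plain-Python sketch of a keyword match restricted to one field; `keyword_search` is a hypothetical helper, not docarray's or Weaviate's implementation, and it uses naive substring matching rather than BM25.

```python
from typing import Dict, List, Optional

# Same data as the example in this thread: doc "0" has "ipsum" in `text`,
# doc "1" has it in `text2`.
docs: List[Dict[str, str]] = [
    {"id": "0", "text": "lorem ipsum", "text2": "dolor sit amet"},
    {"id": "1", "text": "dolor sit amet", "text2": "lorem ipsum"},
    {
        "id": "2",
        "text": "consectetur adipiscing elit",
        "text2": "consectetur adipiscing elit",
    },
]


def keyword_search(query: str, search_field: Optional[str] = None) -> List[str]:
    """Hypothetical helper: return ids of docs whose field(s) contain the query."""
    hits = []
    for doc in docs:
        # Restrict to one field when given, otherwise look at every text field.
        fields = [search_field] if search_field else ["text", "text2"]
        if any(query in doc[f] for f in fields):
            hits.append(doc["id"])
    return hits


restricted = keyword_search("ipsum", search_field="text")
unrestricted = keyword_search("ipsum")
print(restricted, unrestricted)
```

Getting two hits for a query restricted to `text` therefore suggests the field restriction is being dropped somewhere before the query reaches the backend.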
@AnneYang720 I found the reason: instead of:
it should be:
i.e. we need to unpack the dict that gets passed to Can you do the fix? |
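Since the exact call is not reproduced in this thread, the following is only a generic sketch of the difference between passing a dict as a single positional argument and unpacking it into keyword arguments. `with_bm25` here is a hypothetical stand-in defined locally, not the Weaviate client's method.

```python
from typing import Any, Dict, List, Optional


def with_bm25(query: Any, properties: Optional[List[str]] = None) -> Dict[str, Any]:
    # Hypothetical stand-in for a query-builder method; it just records its arguments.
    return {"query": query, "properties": properties}


bm25_args = {"query": "ipsum", "properties": ["text"]}

# Passing the dict positionally: the whole dict becomes `query`,
# and `properties` silently stays None.
wrong = with_bm25(bm25_args)

# Unpacking the dict: each key is matched to the parameter of the same name.
right = with_bm25(**bm25_args)

print(wrong["properties"], right["properties"])
```

A mistake of this shape raises no error, which would explain why the field restriction could be lost silently.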
The weaviate python client now supports a search field in hybrid search (see https://github.com/weaviate/weaviate-python-client/releases/tag/v3.18.0) |
We can open a new PR for this. |
@hsm207 could you open a new PR to add this? |
The search_field parameter seems to be redundant in the hybrid search query on a Weaviate document index, since the text search will search all fields of the object and no specific text field may be supplied. However, when not supplied, the following error occurs:

Although the documentation also explains that this parameter is "necessary but has no effect", since all text fields will be searched by the hybrid query, it seems there should be no requirement to pass a None value here.
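One way to remove the requirement, sketched under assumptions: give `search_field` a default of `None` so callers can omit it, and forward it only when it is set. The function below is a hypothetical illustration, not docarray's actual signature; it only returns the arguments it would pass on.

```python
from typing import Any, Dict, Optional


def build_hybrid_args(
    query: str,
    alpha: float = 0.5,
    search_field: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect hybrid-search arguments, adding
    `properties` only when a search field was actually supplied."""
    args: Dict[str, Any] = {"query": query, "alpha": alpha}
    if search_field is not None:
        # Restrict the keyword part of the search to this one field.
        args["properties"] = [search_field]
    return args


print(build_hybrid_args("ipsum"))
print(build_hybrid_args("ipsum", search_field="text"))
```

With a default in place, existing callers that pass `search_field=None` keep working, while new callers can leave the argument out.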