-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Add request metadata to query vectors and retry with version mismatch #2069
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. Join @Ishiihara and the rest of your teammates on Graphite |
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
34d989e
to
16e396d
Compare
65206f1
to
46b770f
Compare
46b770f
to
fcc1c79
Compare
fcc1c79
to
02318ca
Compare
02318ca
to
e5772b3
Compare
69f28f8
to
ddf4fef
Compare
01f346d
to
f75bea2
Compare
f75bea2
to
12849f8
Compare
@@ -70,6 +71,7 @@ def get_metadata( | |||
limit: Optional[int] = None, | |||
offset: Optional[int] = None, | |||
include_metadata: bool = True, | |||
request_metadata: Optional[RequestMetadata] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this only optional for single node compat ? I feel like in distributed we always want it no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. For compatibility with single node.
) | ||
) | ||
except grpc.RpcError as e: | ||
if e.details() == "Collection version mismatch": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we rely on the code?
) | ||
except grpc.RpcError as e: | ||
if e.details() == "Collection version mismatch": | ||
raise ValueError("Collection version mismatch") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think i'd prefer a new error type so we don't string match in above python code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Found errors.py.
) | ||
self._collection_cache[collection_id] = collections[0] | ||
return self._collection_cache[collection_id] | ||
"""Get a collection database.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm, i think removing this cache in single node is worth a thought?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should be fine. Essentially, Sqlite will cache?
@retry( | ||
retry=retry_if_exception( | ||
lambda e: isinstance(e, ValueError) | ||
and str(e) == "Collection version mismatch" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer custom error type to string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I agree.
let collection_version = match request.request_metadata { | ||
Some(request_metadata) => Some(request_metadata.collection_version), | ||
None => { | ||
tracing::error!("No query metadata found"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this should just be an error and we should make it mandatory
@@ -143,6 +146,8 @@ pub(crate) struct HnswQueryOrchestrator { | |||
result_channel: Option< | |||
tokio::sync::oneshot::Sender<Result<Vec<Vec<VectorQueryResult>>, Box<dyn ChromaError>>>, | |||
>, | |||
// information from the frontend | |||
collection_version: Option<i32>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also pass the log offset and use that to pull logs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should do this. Need to address in a separate PR.
@@ -568,6 +575,17 @@ impl Component for HnswQueryOrchestrator { | |||
} | |||
}; | |||
|
|||
if self.collection_version.is_some() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should not be optional - is there a good reason to allow it?
@@ -507,6 +514,17 @@ impl MetadataQueryOrchestrator { | |||
} | |||
}; | |||
|
|||
if self.collection_version.is_some() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: prefer_ match to is_some + unwrap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my primary comments are around
- Optionality of RequestMetadata and associated data
- We should use log offset to avoid inconsistent queries between vector and metadat
Closing this in favor of the stack #2843 This implementation was improved in several ways:
|
Description of changes
Summarize the changes made by this PR.
Test plan
How are these changes tested?
pytest
for python,yarn test
for js,cargo test
for rustDocumentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs repository?