[ENH] Add request metadata to query vectors and retry with version mismatch #2069

Closed
Ishiihara wants to merge 4 commits from liquan_cache_invalidation_proto

Contributor

@Ishiihara Ishiihara commented Apr 26, 2024

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • Change the proto definition to include collection_version and log_position as request metadata.
    • The query vectors API includes the request metadata in each request.
    • The query node compares the collection version in the frontend request with its own. Because the query node queries the SysDB on every query, it always has the most up-to-date collection version, so a mismatch means the version in the frontend request is stale and a refresh is needed. This is the best we can do for now, since the SysDB does not yet save previous collection versions. Once previous collection versions and file locations are saved, a query can be executed against a consistent previous version.
    • A new version-mismatch error code is added to the gRPC response.
    • On a version mismatch, the frontend retries after refetching the collection version from the SysDB.
    • Removed the collection segment query invocation in the frontend.
  • New functionality
    • ...
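
The version comparison described above can be sketched roughly as follows. This is an illustrative Python sketch, not the PR's Rust implementation: the field names come from the description, while the field types, the `VersionMismatch` exception, and the `check_version` helper are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RequestMetadata:
    # Field names follow the PR description; the integer types are assumed.
    collection_version: int
    log_position: int


class VersionMismatch(Exception):
    """Hypothetical stand-in for the new version-mismatch error code."""


def check_version(
    request_metadata: Optional[RequestMetadata], sysdb_version: int
) -> None:
    """Raise if the frontend-supplied version differs from the SysDB version.

    `sysdb_version` stands for the version the query node just read from
    the SysDB, which the description treats as the source of truth.
    """
    if request_metadata is None:
        # No metadata supplied (for example single-node mode): skip the check.
        return
    if request_metadata.collection_version != sysdb_version:
        raise VersionMismatch(
            f"frontend has version {request_metadata.collection_version}, "
            f"SysDB has version {sysdb_version}"
        )
```

The check only detects staleness. As the description notes, running the query against the older version is not possible until previous versions are stored.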

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs repository?

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (readability, modularity, intuitiveness)?

@Ishiihara Ishiihara marked this pull request as ready for review April 26, 2024 22:11
@Ishiihara Ishiihara force-pushed the liquan_cache_invalidation_proto branch 2 times, most recently from 34d989e to 16e396d Compare April 26, 2024 22:16
@Ishiihara Ishiihara force-pushed the liquan_cache_invalidation_proto branch from 65206f1 to 46b770f Compare April 26, 2024 23:20
@Ishiihara Ishiihara force-pushed the liquan_cache_invalidation_proto branch from 46b770f to fcc1c79 Compare July 23, 2024 17:30
@Ishiihara Ishiihara force-pushed the liquan_cache_invalidation_proto branch from fcc1c79 to 02318ca Compare August 12, 2024 12:59
@Ishiihara Ishiihara force-pushed the liquan_cache_invalidation_proto branch from 02318ca to e5772b3 Compare August 23, 2024 07:35
@Ishiihara Ishiihara force-pushed the liquan_cache_invalidation_proto branch 2 times, most recently from 69f28f8 to ddf4fef Compare August 28, 2024 17:03
@Ishiihara Ishiihara self-assigned this Aug 28, 2024
@Ishiihara Ishiihara force-pushed the liquan_cache_invalidation_proto branch 6 times, most recently from 01f346d to f75bea2 Compare August 30, 2024 21:28
@Ishiihara Ishiihara force-pushed the liquan_cache_invalidation_proto branch from f75bea2 to 12849f8 Compare August 30, 2024 21:59
@Ishiihara Ishiihara changed the title from "[ENH] Add reqeust metadata to query and get vectors" to "[ENH] Add request metadata to query vectors and retry with version mismatch" Sep 10, 2024
rust/error/src/lib.rs Outdated
@@ -70,6 +71,7 @@ def get_metadata(
    limit: Optional[int] = None,
    offset: Optional[int] = None,
    include_metadata: bool = True,
    request_metadata: Optional[RequestMetadata] = None,
Collaborator

Is this only optional for single-node compat? I feel like in distributed we always want it, no?

Contributor Author

Yes. For compatibility with single node.

)
)
except grpc.RpcError as e:
    if e.details() == "Collection version mismatch":
Collaborator

Should we rely on the status code instead?

)
except grpc.RpcError as e:
    if e.details() == "Collection version mismatch":
        raise ValueError("Collection version mismatch")
Collaborator

@HammadB HammadB Sep 10, 2024

I think I'd prefer a new error type so we don't string match in the Python code above.

Contributor Author

Sounds good. Found errors.py.
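
A minimal sketch of what the two suggestions in this thread could look like together: branch on the status code rather than the details string, and raise a dedicated error type. Everything named here is hypothetical. `StatusCode`, `FakeRpcError`, `VersionMismatchError`, and `translate_rpc_error` are stand-ins so the example runs without gRPC installed, and the PR does not say which status code carries the mismatch. With the real library, the `grpc.RpcError` raised by a client call typically also exposes `code()` and `details()`.

```python
from enum import Enum


class StatusCode(Enum):
    # Stand-in for a gRPC status enum. VERSION_MISMATCH is hypothetical;
    # the PR only says a new error code is added to the response.
    OK = 0
    VERSION_MISMATCH = 1


class VersionMismatchError(Exception):
    """Hypothetical typed error, the kind that could live in errors.py."""


class FakeRpcError(Exception):
    """Test double exposing the code()/details() accessors used below."""

    def __init__(self, code: StatusCode, details: str) -> None:
        super().__init__(details)
        self._code = code
        self._details = details

    def code(self) -> StatusCode:
        return self._code

    def details(self) -> str:
        return self._details


def translate_rpc_error(
    err: FakeRpcError, mismatch_code: StatusCode = StatusCode.VERSION_MISMATCH
) -> Exception:
    """Map an RPC error to a typed error by status code, never by message."""
    if err.code() == mismatch_code:
        return VersionMismatchError(err.details())
    # Anything else is returned unchanged for the caller to re-raise.
    return err
```

Because the decision depends only on the code, rewording the server's message cannot silently disable the mismatch handling.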

)
self._collection_cache[collection_id] = collections[0]
return self._collection_cache[collection_id]
"""Get a collection database."""
Collaborator

Hmm, I think removing this cache in single node is worth a thought?

Contributor Author

That should be fine. Essentially, SQLite will cache?

@retry(
    retry=retry_if_exception(
        lambda e: isinstance(e, ValueError)
        and str(e) == "Collection version mismatch"
Collaborator

Prefer a custom error type to string matching.

Contributor Author

Yes. I agree.
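
For illustration, retrying on an exception type rather than on a message string might look like the sketch below. It uses a small hand-rolled decorator so it runs with the standard library only. With tenacity, which the diff above appears to use, the equivalent predicate would be `retry_if_exception_type`. The names `VersionMismatchError`, `retry_on`, `refresh_cache`, and the two dictionaries are assumptions made for the example, not code from the PR.

```python
import functools
from typing import Any, Callable, Dict, Type


class VersionMismatchError(Exception):
    """Hypothetical typed error raised when the cached version is stale."""


def retry_on(
    exc_type: Type[Exception], attempts: int, before_retry: Callable[[], None]
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Retry the wrapped call when exc_type is raised, up to `attempts` tries."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except exc_type:
                    if attempt == attempts - 1:
                        raise  # out of attempts: surface the typed error
                    before_retry()  # e.g. refetch the version before retrying

        return wrapper

    return decorator


# Toy stand-ins: the SysDB is the source of truth, the frontend cache lags.
sysdb: Dict[str, int] = {"version": 2}
cache: Dict[str, int] = {"version": 1}


def refresh_cache() -> None:
    cache["version"] = sysdb["version"]


@retry_on(VersionMismatchError, attempts=3, before_retry=refresh_cache)
def query() -> str:
    if cache["version"] != sysdb["version"]:
        raise VersionMismatchError("cached collection version is stale")
    return f"results at version {cache['version']}"
```

Here the first call to `query()` fails on the stale cache, the cache is refreshed, and the second attempt succeeds. Unrelated exceptions are not caught, so they propagate immediately.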

let collection_version = match request.request_metadata {
    Some(request_metadata) => Some(request_metadata.collection_version),
    None => {
        tracing::error!("No query metadata found");
Collaborator

I feel like this should just be an error, and we should make the metadata mandatory.

@@ -143,6 +146,8 @@ pub(crate) struct HnswQueryOrchestrator {
    result_channel: Option<
        tokio::sync::oneshot::Sender<Result<Vec<Vec<VectorQueryResult>>, Box<dyn ChromaError>>>,
    >,
    // information from the frontend
    collection_version: Option<i32>,
Collaborator

Should we also pass the log offset and use that to pull logs?

Contributor Author

We should do this. Need to address in a separate PR.

@@ -568,6 +575,17 @@ impl Component for HnswQueryOrchestrator {
}
};

if self.collection_version.is_some() {
Collaborator

I think this should not be optional - is there a good reason to allow it?

@@ -507,6 +514,17 @@ impl MetadataQueryOrchestrator {
}
};

if self.collection_version.is_some() {
Collaborator

nit: prefer match to is_some + unwrap

Collaborator

@HammadB HammadB left a comment

I think my primary comments are around

  1. Optionality of RequestMetadata and associated data
  2. We should use log offset to avoid inconsistent queries between vector and metadata

Collaborator

HammadB commented Sep 25, 2024

Closing this in favor of the stack

#2843
#2842
#2839
#2831
#2827
#2826

This implementation was improved in several ways:

  • Added testing
  • Added custom error type for maintainability
  • Put the version context everywhere it was needed (ALL segment queries need this)
  • Implemented deferred critical work (log offset passed in is used to pull)
  • Don't make non-optional arguments optional.

@HammadB HammadB closed this Sep 25, 2024