Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-9908: Move VectorValues#search to LeafReader #104

Merged
merged 6 commits into from Apr 26, 2021

Conversation

jtibshirani
Copy link
Member

This PR removes VectorValues#search in favor of exposing NN search through
VectorReader#search and LeafReader#searchNearestVectors. It also marks the
vector methods on LeafReader as experimental.

@@ -199,40 +229,53 @@ public void checkIntegrity() throws IOException {

@Override
public VectorValues getVectorValues(String field) throws IOException {
FieldInfo info = fieldInfos.fieldInfo(field);
if (info == null) {
FieldEntry fieldEntry = fields.get(field);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Across the different reader implementations, we aren't that consistent when we return null vs. empty. For now I just matched the current pattern in each reader.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we'd be consistent with what we do with doc values:

  • The low-level API on VectorReader only supports getting data for fields that exist and have vectors enabled on their FieldInfo, the behavior is undefined otherwise.
  • The high-level API on LeafReader checks the FieldInfo and returns null for fields that do not exist or that don't have vectors enabled, and otherwise delegates to the VectorReader.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to fix this before merging (for both vector-related methods).

@@ -1357,6 +1359,11 @@ public VectorValues getVectorValues(String fieldName) {
return VectorValues.EMPTY;
}

@Override
public TopDocs searchNearestVectors(String field, float[] target, int k, int fanout) {
return TopDocsCollector.EMPTY_TOPDOCS;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be null? My understanding based on the rest of the PR is that we use null when a field doesn't have vectors and EMPTY_TOP_DOCS when a field has vectors but not on the current segment. Since MemoryIndex doesn't support indexing vectors should it return null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me. I'll update this and getVectorValues above to return null.

@@ -199,40 +229,53 @@ public void checkIntegrity() throws IOException {

@Override
public VectorValues getVectorValues(String field) throws IOException {
FieldInfo info = fieldInfos.fieldInfo(field);
if (info == null) {
FieldEntry fieldEntry = fields.get(field);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we'd be consistent with what we do with doc values:

  • The low-level API on VectorReader only supports getting data for fields that exist and have vectors enabled on their FieldInfo, the behavior is undefined otherwise.
  • The high-level API on LeafReader checks the FieldInfo and returns null for fields that do not exist or that don't have vectors enabled, and otherwise delegates to the VectorReader.

@jtibshirani jtibshirani merged commit 3115f85 into apache:main Apr 26, 2021
@jtibshirani jtibshirani deleted the vector-search branch April 26, 2021 18:26
janhoy pushed a commit to cominvent/lucene that referenced this pull request May 12, 2021
janhoy pushed a commit to cominvent/lucene that referenced this pull request May 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants