LUCENE-9855: Rename nn search vector format #207
Most changes were made with the IDE's refactoring function; in addition, I manually modified some comments/javadocs that were figured out by
@@ -247,12 +246,12 @@ public TopDocs search(String field, float[] target, int k, int fanout) throws IO
 return null;
 }

-OffHeapVectorValues vectorValues = getOffHeapVectorValues(fieldEntry);
+OffHeapNnVectors nnVectors = getOffHeapNnVectors(fieldEntry);

 // use a seed that is fixed for the index so we get reproducible results for the same query
 final Random random = new Random(checksumSeed);
PREDICTABLE_RANDOM: This random generator (java.util.Random) is predictable (details)
(at-me in a reply with help or ignore)
true, but annoying. I think we can safely ignore this. The intent is predictable-randomness. @sonatype-lift
I've recorded this as ignored for this pull request. If you change your mind, just comment @sonatype-lift unignore.
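For readers following the PREDICTABLE_RANDOM thread: the point of seeding with a fixed value is that the same seed always yields the same pseudo-random sequence, which is what makes repeated runs of the same query reproducible. A minimal sketch of that property, assuming nothing about the actual reader code (the class name `SeededRandomDemo`, the helper `sample`, and the seed value are hypothetical; only `java.util.Random` is real):

```java
import java.util.Arrays;
import java.util.Random;

public class SeededRandomDemo {
  // Draws n pseudo-random ints in [0, bound) from a generator seeded with `seed`.
  // Hypothetical helper for illustration; not part of Lucene.
  static int[] sample(long seed, int n, int bound) {
    Random random = new Random(seed);
    int[] out = new int[n];
    for (int i = 0; i < n; i++) {
      out[i] = random.nextInt(bound);
    }
    return out;
  }

  public static void main(String[] args) {
    // Assumed stand-in for a seed that is fixed per index (e.g. derived from a checksum).
    long checksumSeed = 0x5EEDL;
    int[] first = sample(checksumSeed, 5, 1000);
    int[] second = sample(checksumSeed, 5, 1000);
    // Two generators built from the same seed produce identical sequences.
    System.out.println(Arrays.equals(first, second));
  }
}
```

This is also why the static-analysis warning is accurate but harmless here: the predictability it flags matters for security-sensitive uses, whereas in this case it is the intended behavior.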
LGTM. I found a few error/status messages that we might want to update. I also scanned Lucene90HnswVectorWriter, but didn't find any. I think there may be some messages in the old VectorWriter/NnVectorsWriter that need updating?
@@ -2284,28 +2284,28 @@ static void checkImpacts(Impacts impacts, int lastTarget) {
 *
 * @lucene.experimental
 */
-public static Status.VectorValuesStatus testVectors(
+public static Status.NnVectorsStatus testVectors(
 CodecReader reader, PrintStream infoStream, boolean failFast) throws IOException {
 if (infoStream != null) {
 infoStream.print(" test: vectors..............");
should we change the text here? "nn vectors"?
Fixed in 4d301c5
-if (fieldInfo.hasVectorValues()) {
-int dimension = fieldInfo.getVectorDimension();
+if (fieldInfo.hasNnVectors()) {
+int dimension = fieldInfo.getNnVectorDimension();
 if (dimension <= 0) {
 throw new RuntimeException(
 "Field \""
 + fieldInfo.name
 + "\" has vector values but dimension is "
"vector values" -> "nearest-neighbor vector values"?
Fixed in 4d301c5
@@ -34,7 +34,7 @@
 *
 * @lucene.experimental
 */
-class VectorValuesWriter {
+class NnVectorsConsumer {
+1
@@ -7,7 +7,7 @@ http://s.apache.org/luceneversions

 New Features

-* LUCENE-9322: Vector-valued fields, Lucene90 Codec (Mike Sokolov, Julie Tibshirani, Tomoko Uchida)
+* LUCENE-9322 LUCENE-9855: Vector-valued fields, Lucene90 Codec (Mike Sokolov, Julie Tibshirani, Tomoko Uchida)
I think this can be grouped with LUCENE-9322?
Thanks @msokolov for reviewing. I'll wait a few more days to let others comment on this, then merge it upstream if there is no other feedback.
Maybe this one should not be renamed, since it isn't related to nearest-neighbor search; it only allows iterating over vectors in doc ID order?
+1 to keeping the name.

One thought: I actually find the "Nn" prefix hard to read/type; it almost looks like a typo. For me, either "NN" or "Knn" (which is @msokolov's preferred option) would be more elegant. I really don't mean to nitpick, and I understand it was difficult to reach a decision, so feel free to ignore this comment if you'd like :)
Re: |
|
Good point. I think having the similarity function on VectorValues is a bit awkward too, since VectorValues is only about retrieving vectors. Maybe we should remove it from VectorValues?
I opened #213 to remove |
Sounds good @mocobeta I'll merge shortly. A new question is whether |
I opened #218, since I found that it's much easier than reverting the previous commits and resolving the conflicts with the latest main here. Can you have a look at it?

"Knn" vs. "Nn": while both are fine with me, I noticed there are already several nearest-neighbor related classes/variables that have "knn" in their name (e.g. KnnGraphValues). For consistency and visibility, I used "Knn" this time.
I'm fine with |
This consists of three parts (commits).
0ae2804 refactors o.a.l.codec package.
6e7e60d refactors o.a.l.index package.
8824a04 refactors o.a.l.document package.
--
See https://issues.apache.org/jira/browse/LUCENE-9855 for details.