Add per-field knn vector format info in SegmentInfo #13367

Closed
tteofili wants to merge 4 commits into apache:main from tteofili:knn_format_segment_commit_info

tteofili commented May 14, 2024

When indexing vectors, different fields can use different vector formats. It is also possible (although not currently implemented) for a Codec to provide different vector formats "dynamically", even for the same field.
To make such situations easier to debug, it would be helpful to have per-field vector format information within SegmentCommitInfo (e.g. within the attributes).

This trivial PR adds KnnVectorFormat#name for each field to SegmentInfo#attributes in PerFieldKnnVectorsFormat.
If a doc with field1 is indexed with Lucene99HnswVectorsFormat and a doc with field2 is indexed with Lucene99HnswScalarQuantizedVectorsFormat within the same segment, the corresponding SegmentInfo#attributes will have the following entries:

  • "KnnVectorFormat.field1" -> "Lucene99HnswVectorsFormat"
  • "KnnVectorFormat.field2" -> "Lucene99HnswScalarQuantizedVectorsFormat"
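
As a rough illustration of the key scheme described above (not the PR's actual code), the following sketch builds such an attribute map with plain Java collections. The class name `KnnFormatAttributes`, the `KEY_PREFIX` constant, and the `describe` helper are hypothetical; only the shape of the keys and values comes from the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: models the SegmentInfo attributes as a plain map.
final class KnnFormatAttributes {
  // Assumed prefix, taken from the example entries in the description.
  static final String KEY_PREFIX = "KnnVectorFormat.";

  // fieldToFormat maps a field name to the name of the vector format used for it.
  static Map<String, String> describe(Map<String, String> fieldToFormat) {
    Map<String, String> attributes = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : fieldToFormat.entrySet()) {
      // One entry per field: "KnnVectorFormat.<field>" -> "<format name>"
      attributes.put(KEY_PREFIX + e.getKey(), e.getValue());
    }
    return attributes;
  }
}
```

Calling `describe` with `field1` and `field2` mapped to the two format names above would yield the two entries listed.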

ChrisHegarty left a comment

LGTM

jpountz commented May 14, 2024

I'm a bit confused: what is the benefit of having it on segment infos in addition to field infos?

tteofili commented

You're right @jpountz, we can probably get away with fieldInfo.getAttribute(PerFieldKnnVectorFormat.PER_FIELD_FORMAT_KEY). I didn't notice that, thanks!
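
A minimal sketch of the lookup suggested here, assuming each field's attributes are available as a plain `Map<String, String>`. The class `PerFieldFormatLookup`, the method `formatsByField`, and the literal key string are assumptions made for illustration and should be checked against the Lucene version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for reading per-field attributes; not Lucene code.
final class PerFieldFormatLookup {
  // Assumed value of the per-field format attribute key.
  static final String PER_FIELD_FORMAT_KEY = "PerFieldKnnVectorsFormat.format";

  // fieldAttributes maps a field name to that field's attribute map.
  static Map<String, String> formatsByField(Map<String, Map<String, String>> fieldAttributes) {
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, Map<String, String>> e : fieldAttributes.entrySet()) {
      String format = e.getValue().get(PER_FIELD_FORMAT_KEY);
      if (format != null) {
        // Fields without the key (e.g. non-vector fields) are skipped.
        result.put(e.getKey(), format);
      }
    }
    return result;
  }
}
```

Since the format name is already recorded per field, the same debugging information can be gathered without adding anything to the segment-level attributes.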
