Lucene94HnswVectorsFormat validation fails with large segments #11858

Closed
jtibshirani opened this issue Oct 18, 2022 · 1 comment · Fixed by #11861

Comments

@jtibshirani
Member

I ran a test on Lucene 9.4 where I tried to force merge 2 million vectors of
dimension 768. It failed with:

java.lang.IllegalStateException: Vector data length 3070061568 not matching
size=999369 * dim=768 * byteSize=4 = -1224905728

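The problem is that we use an int to represent the expected data length, which
is too small to hold it: size * dim * byteSize here is 3070061568, above
Integer.MAX_VALUE (2147483647), so the product wraps around to a negative
value. The bug snuck in during the work to enable int8 values, which switched a
long to an int: #1054. This error doesn't occur before version 9.4.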
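
The wraparound in the error message can be reproduced in isolation. The following is a minimal sketch, not Lucene's code: the class and the helpers `expectedLengthInt` and `expectedLengthLong` are hypothetical names made up for illustration, and only the three numbers come from the error above.

```java
public class VectorLengthOverflow {

    // Hypothetical helper: multiplies in 32-bit int arithmetic,
    // so a product above Integer.MAX_VALUE silently wraps around.
    static int expectedLengthInt(int size, int dim, int byteSize) {
        return size * dim * byteSize;
    }

    // Hypothetical helper: widens the first operand to long, so the
    // whole product is computed in 64-bit arithmetic.
    static long expectedLengthLong(int size, int dim, int byteSize) {
        return (long) size * dim * byteSize;
    }

    public static void main(String[] args) {
        // Values taken from the exception message in this issue.
        int size = 999369;
        int dim = 768;
        int byteSize = 4;

        System.out.println(expectedLengthInt(size, dim, byteSize));  // -1224905728
        System.out.println(expectedLengthLong(size, dim, byteSize)); // 3070061568
    }
}
```

Widening to long before multiplying is one way to avoid the wraparound. Whether the fix in #11861 takes this exact form is not shown in this issue.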

@jtibshirani
Member Author

Here's the stack trace:

java.lang.IllegalStateException: Vector data length 3070061568 not matching size=999369 * dim=768 * byteSize=4 = -1224905728
	at org.apache.lucene.core@9.4.0/org.apache.lucene.codecs.lucene94.Lucene94HnswVectorsReader.validateFieldEntry(Lucene94HnswVectorsReader.java:185)
	at org.apache.lucene.core@9.4.0/org.apache.lucene.codecs.lucene94.Lucene94HnswVectorsReader.readFields(Lucene94HnswVectorsReader.java:156)
	at org.apache.lucene.core@9.4.0/org.apache.lucene.codecs.lucene94.Lucene94HnswVectorsReader.readMetadata(Lucene94HnswVectorsReader.java:103)
	at org.apache.lucene.core@9.4.0/org.apache.lucene.codecs.lucene94.Lucene94HnswVectorsReader.<init>(Lucene94HnswVectorsReader.java:64)
	at org.apache.lucene.core@9.4.0/org.apache.lucene.codecs.lucene94.Lucene94HnswVectorsFormat.fieldsReader(Lucene94HnswVectorsFormat.java:157)

Thanks to @ebadyano and @benwtrent for helping me track this down so quickly.

@jtibshirani added this to the 9.4.1 milestone Oct 19, 2022
@jtibshirani changed the title from "Lucene94HnswVectorsFormat validation fails with large datasets" to "Lucene94HnswVectorsFormat validation fails with large segments" Nov 2, 2022