ArrayIndexOutOfBoundsException in ByteBlockPool [LUCENE-8614] #9660

asfimport · 2018-12-19T00:18:38Z

A field with a very large number of small tokens can cause an ArrayIndexOutOfBoundsException in ByteBlockPool due to an arithmetic overflow.

The issue was originally reported in elastic/elasticsearch#23670 where, because of the indexing settings, a geo_shape field generated a very large number of tokens and caused the indexing operation to fail with the following exception:

Caused by: java.lang.ArrayIndexOutOfBoundsException: -65531
	at org.apache.lucene.util.ByteBlockPool.setBytesRef(ByteBlockPool.java:308) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
	at org.apache.lucene.util.BytesRefHash.equals(BytesRefHash.java:183) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
	at org.apache.lucene.util.BytesRefHash.findHash(BytesRefHash.java:337) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
	at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:255) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
	at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:149) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
	at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:766) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
	at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:417) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
	at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:373) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:231) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:478) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1575) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1320) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]

I was able to reproduce the issue and somewhat reduce the test that reproduces it (see enclosed patch) but unfortunately it still requires 12G of heap to run.

The issue seems to be caused by an arithmetic overflow in the byteOffset calculation when ByteBlockPool advances to the next buffer, on the last line of the nextBuffer() method. It doesn't manifest itself until much later, when this offset is used to calculate the bytesStart in BytesRefHash. That in turn causes the ArrayIndexOutOfBoundsException back in ByteBlockPool.setBytesRef(), where the offset is used to find the term's buffer.
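To make the arithmetic concrete, here is a minimal standalone sketch of the offset bookkeeping described above. It does not use Lucene; the class and method names are hypothetical, and the constants assume the 32 KiB block size (a shift of 15) that ByteBlockPool is believed to use. It assumes the offset starts one block below zero and grows by one block per buffer.

```java
public class ByteOffsetOverflowDemo {
    // Assumption: mirrors ByteBlockPool's block constants (32 KiB blocks).
    static final int BYTE_BLOCK_SHIFT = 15;
    static final int BYTE_BLOCK_SIZE = 1 << BYTE_BLOCK_SHIFT;

    /** Offset of the current buffer after advancing `buffers` times with unchecked int addition. */
    static int offsetAfter(int buffers) {
        int byteOffset = -BYTE_BLOCK_SIZE; // assumed state before the first buffer exists
        for (int i = 0; i < buffers; i++) {
            byteOffset += BYTE_BLOCK_SIZE; // wraps silently past Integer.MAX_VALUE
        }
        return byteOffset;
    }

    /** Buffer index derived from a global offset by shifting out the in-block bits. */
    static int bufferIndex(int offset) {
        return offset >> BYTE_BLOCK_SHIFT;
    }

    public static void main(String[] args) {
        int last = offsetAfter(65536);    // 2147450880, the last block start that fits in an int
        int wrapped = offsetAfter(65537); // wraps to Integer.MIN_VALUE
        System.out.println(last + " -> buffer index " + bufferIndex(last));
        System.out.println(wrapped + " -> buffer index " + bufferIndex(wrapped));
        // Five blocks past the wrap, the derived index is -65531.
        System.out.println(bufferIndex(offsetAfter(65542)));
    }
}
```

Under these assumptions the wrap happens at the 65537th buffer (about 2 GiB of pooled bytes), and a negative index such as -65531 is consistent with an offset a few blocks past the wrap, which matches the value in the stack trace.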

I realize that it's unreasonable to expect Lucene to index such fields, but I wonder if an overflow check should be added to ByteBlockPool.nextBuffer() in order to handle such a condition more gracefully.
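One possible shape for such a check, shown as a standalone sketch rather than as a patch to Lucene: the class and the advance() helper are hypothetical, and the block size is the assumed 32 KiB. It relies only on java.lang.Math.addExact, which throws ArithmeticException when an int sum overflows.

```java
public class CheckedOffsetAdvance {
    // Assumption: same 32 KiB block size as ByteBlockPool.
    static final int BYTE_BLOCK_SIZE = 1 << 15;

    /** Advances the offset by one block, failing fast instead of wrapping. */
    static int advance(int byteOffset) {
        // Math.addExact throws ArithmeticException on int overflow.
        return Math.addExact(byteOffset, BYTE_BLOCK_SIZE);
    }

    public static void main(String[] args) {
        int offset = -BYTE_BLOCK_SIZE;
        int buffers = 0;
        try {
            while (true) {
                offset = advance(offset);
                buffers++;
            }
        } catch (ArithmeticException e) {
            // The failure surfaces at the point of overflow, not later at lookup time.
            System.out.println("overflow detected after " + buffers + " buffers: " + e.getMessage());
        }
    }
}
```

The point of failing at the addition is that the error appears where the bad state is created, instead of as a negative array index several calls later.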

Migrated from LUCENE-8614 by Igor Motov (@imotov), updated Mar 03 2022
Attachments: LUCENE-8614.patch
Linked issues:

ArrayIndexOutOfBoundsException during indexing [LUCENE-10441] #11477

asfimport · 2019-02-28T09:48:00Z

Adrien Grand (@jpountz) (migrated from JIRA)

+1 to check for overflows and raise a better error

Maybe we can write a test that uses reasonable amounts of memory by using a dummy allocator that always returns the same byte[].
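A rough sketch of that idea, using a toy pool instead of Lucene's classes: BlockAllocator, SharedBlockAllocator and ToyPool are hypothetical names invented for illustration, and the toy pool tracks only the block list and the running offset. Because every "new" block is the same array, the loop reaches the overflow point while holding roughly one block plus a list of references.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedBlockAllocatorSketch {
    static final int BLOCK_SIZE = 1 << 15; // assumed block size

    /** Hypothetical stand-in for a pool allocator; not Lucene's actual interface. */
    interface BlockAllocator {
        byte[] getBlock();
    }

    /** Always hands back the same array, so heap use stays near one block. */
    static final class SharedBlockAllocator implements BlockAllocator {
        private final byte[] shared = new byte[BLOCK_SIZE];

        @Override
        public byte[] getBlock() {
            return shared;
        }
    }

    /** Toy pool: keeps the block references and the running offset, nothing else. */
    static final class ToyPool {
        private final BlockAllocator allocator;
        final List<byte[]> blocks = new ArrayList<>();
        int byteOffset = -BLOCK_SIZE;

        ToyPool(BlockAllocator allocator) {
            this.allocator = allocator;
        }

        void nextBuffer() {
            blocks.add(allocator.getBlock());
            // Checked addition: throws ArithmeticException instead of wrapping.
            byteOffset = Math.addExact(byteOffset, BLOCK_SIZE);
        }
    }

    /** Counts how many buffers can be added before the offset overflows. */
    static int buffersBeforeOverflow() {
        ToyPool pool = new ToyPool(new SharedBlockAllocator());
        int count = 0;
        try {
            while (true) {
                pool.nextBuffer();
                count++;
            }
        } catch (ArithmeticException e) {
            return count;
        }
    }

    public static void main(String[] args) {
        System.out.println(buffersBeforeOverflow());
    }
}
```

A real test would plug an equivalent allocator into ByteBlockPool itself; whether the data written through a shared block stays meaningful is beside the point here, since only the offset arithmetic is being exercised.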

stefanvodita · 2023-06-24T20:23:26Z

I opened a PR that addresses the overflow. I couldn’t reproduce the error using the test in the patch file, but the overflow is easy to reproduce if we test the ByteBlockPool directly.

zhaih · 2023-09-20T06:32:18Z

@stefanvodita Seems the issue is resolved? I closed the issue, feel free to reopen it

stefanvodita · 2023-09-20T08:11:13Z

Yes, it's resolved. Thanks, Patrick!

asfimport mentioned this issue Aug 24, 2022

ArrayIndexOutOfBoundsException during indexing [LUCENE-10441] #11477

Open

stefanvodita added a commit to stefanvodita/lucene that referenced this issue Jun 24, 2023

Catch offset overflows in byte pool (apache#9660)

f374bad

jpountz pushed a commit that referenced this issue Jun 28, 2023

Catch offset overflows in byte pool (#9660) (#12392)

b88d3e1

jpountz pushed a commit that referenced this issue Jun 28, 2023

Catch offset overflows in byte pool (#9660) (#12392)

445b30d

zhaih added this to the 9.8.0 milestone Sep 20, 2023

zhaih closed this as completed Sep 20, 2023
