It lasted for several days to throw the exception: failed to load bitset #103840

Open
kkewwei opened this issue Jan 3, 2024 · 1 comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team

Comments

Contributor

kkewwei commented Jan 3, 2024

Elasticsearch Version

7.10.1

Java Version

jdk11

OS Version

4.14.81.bm.29-amd64 #1 SMP Debian 4.14.81.bm.29

Problem Description

In our production environment, we have seen two cases that point to file-system corruption, yet the shard stays green, which seems abnormal.

Case1
Server log: the following exception was thrown repeatedly for several days:

[2023-12-30T10:00:09,017][WARN ][o.e.i.w.ShardIndexWarmerService] [data0] [index][114] failed to load bitset for [DocValuesFieldExistsQuery [field=_primary_term]]
java.util.concurrent.ExecutionException: java.io.EOFException: read past EOF: MMapIndexInput(path="/data/nodes/0/indices/PFK_L3OGRBicJkxAcdDw1g/114/index/_x8jv.cfs") [slice=_x8jv_Lucene80_0.dvd] [slice=docs]
        at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:436) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.index.cache.bitset.BitsetFilterCache.getAndLoadIfNotPresent(BitsetFilterCache.java:148) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.index.cache.bitset.BitsetFilterCache.access$000(BitsetFilterCache.java:74) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.index.cache.bitset.BitsetFilterCache$BitSetProducerWarmer.lambda$warmReader$1(BitsetFilterCache.java:265) [elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:684) [elasticsearch-7.10.2.jar:7.10.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: java.io.EOFException: read past EOF: MMapIndexInput(path="/data/nodes/0/indices/PFK_L3OGRBicJkxAcdDw1g/114/index/_x8jv.cfs") [slice=_x8jv_Lucene80_0.dvd] [slice=docs]
        at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:85) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
        at org.apache.lucene.store.DataInput.readShort(DataInput.java:95) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
        at org.apache.lucene.store.ByteBufferIndexInput.readShort(ByteBufferIndexInput.java:163) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
        at org.apache.lucene.codecs.lucene80.IndexedDISI$Method$1.advanceWithinBlock(IndexedDISI.java:478) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
        at org.apache.lucene.codecs.lucene80.IndexedDISI.advance(IndexedDISI.java:389) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
        at org.apache.lucene.codecs.lucene80.IndexedDISI.nextDoc(IndexedDISI.java:459) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
        at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$SparseNumericDocValues.nextDoc(Lucene80DocValuesProducer.java:444) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
        at org.apache.lucene.util.BitSet.or(BitSet.java:95) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
        at org.apache.lucene.util.FixedBitSet.or(FixedBitSet.java:271) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
        at org.apache.lucene.util.BitSet.of(BitSet.java:41) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
        at org.elasticsearch.index.cache.bitset.BitsetFilterCache.bitsetFromQuery(BitsetFilterCache.java:103) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.index.cache.bitset.BitsetFilterCache.lambda$getAndLoadIfNotPresent$1(BitsetFilterCache.java:149) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:433) ~[elasticsearch-7.10.2.jar:7.10.2]
        ... 7 more

The warmer only logs a warning, no matter what the exception is:
https://github.com/elastic/elasticsearch/blob/1c34507e66d7db1211f66f3513706fdf548736aa/server/src/main/java/org/elasticsearch/index/cache/bitset/BitsetFilterCache.java#L270C5-L270C5

Should we distinguish between exception types here? For example, if the cause is an IOException, the shard should be failed.
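
To make the proposal concrete, here is a minimal sketch of how the warmer's failure path could tell an I/O failure apart from other errors. It is not the Elasticsearch implementation: the class `WarmFailureClassifier` and its methods are hypothetical names, and the rule "any IOException in the cause chain means the shard should be failed" is an assumption taken from the suggestion above. It relies only on the fact that the `EOFException` in the log is wrapped in an `ExecutionException`, so the cause chain has to be walked.

```java
import java.io.IOException;
import java.util.concurrent.ExecutionException;

// Hypothetical helper, not part of Elasticsearch.
final class WarmFailureClassifier {
    private WarmFailureClassifier() {}

    // Upper bound on how many causes are inspected, to guard against cyclic chains.
    private static final int MAX_DEPTH = 32;

    /** Returns the first IOException found in the cause chain, or null if there is none. */
    static IOException findIOException(Throwable t) {
        int depth = 0;
        for (Throwable cur = t; cur != null && depth < MAX_DEPTH; cur = cur.getCause(), depth++) {
            if (cur instanceof IOException) {
                return (IOException) cur;
            }
        }
        return null;
    }

    /** Assumed policy: an I/O failure while warming means the shard should be failed. */
    static boolean shouldFailShard(Throwable t) {
        return findIOException(t) != null;
    }

    public static void main(String[] args) {
        // Mirrors the shape of the logged failure: ExecutionException wrapping EOFException.
        Throwable logged = new ExecutionException(new java.io.EOFException("read past EOF"));
        System.out.println(shouldFailShard(logged));
        // A non-I/O failure would still only be logged under this policy.
        System.out.println(shouldFailShard(new ExecutionException(new IllegalArgumentException("bad query"))));
    }
}
```

With a check like this, the existing catch block could keep logging the warning for other failures and escalate only when `shouldFailShard` returns true. How the escalation itself would be wired into the shard is left open here.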

Case2
Client log:

Caused by: [index0/data0][[index0][91]] ElasticsearchException[Elasticsearch exception [type=engine_exception, reason=Couldn't resolve version]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_state_exception, reason=document [0] does not have docValues for [_primary_term]]];
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
        at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
        at org.elasticsearch.action.bulk.BulkItemResponse.fromXContent(BulkItemResponse.java:139)
        at org.elasticsearch.action.bulk.BulkResponse.fromXContent(BulkResponse.java:188)
        at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1911)
        at org.elasticsearch.client.RestHighLevelClient.lambda$performRequestAsyncAndParseEntity$10(RestHighLevelClient.java:1699)
        at org.elasticsearch.client.RestHighLevelClient$1.onSuccess(RestHighLevelClient.java:1781)
        at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onSuccess(RestClient.java:636)
        at org.elasticsearch.client.RestClient$1.completed(RestClient.java:376)
        at org.elasticsearch.client.RestClient$1.completed(RestClient.java:370)

The EngineException is thrown internally, but the caller does not handle it:

return getFromSearcher(get, searcherFactory, scope);

Should we fail the shard here when an EngineException is thrown?
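
As a rough illustration of that idea, the sketch below wraps a read in a handler that notifies a shard-failure callback before rethrowing. Everything in it is an assumption made for illustration: `FailOnEngineError`, `StandInEngineException` (a stand-in for Elasticsearch's `EngineException`, so the example compiles on its own), and the `failShard` callback are hypothetical names. So is the choice to escalate only when the cause is an `IllegalStateException`, as in the client log above.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch, not Elasticsearch code.
final class FailOnEngineError {
    private FailOnEngineError() {}

    /** Stand-in for Elasticsearch's EngineException so the example is self-contained. */
    static class StandInEngineException extends RuntimeException {
        StandInEngineException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs the read. If it throws the stand-in engine exception with an
     * IllegalStateException cause, the callback is notified first, then the
     * exception is rethrown so the caller still sees the failure.
     */
    static <T> T getOrFailShard(Supplier<T> read, Consumer<RuntimeException> failShard) {
        try {
            return read.get();
        } catch (StandInEngineException e) {
            if (e.getCause() instanceof IllegalStateException) {
                failShard.accept(e);
            }
            throw e;
        }
    }
}
```

A caller would pass the equivalent of `getFromSearcher(get, searcherFactory, scope)` as the `read` supplier. Whether failing the shard is the right reaction to every such exception is exactly the question this issue raises.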

@kkewwei kkewwei added >bug needs:triage Requires assignment of a team area label labels Jan 3, 2024
@kingherc kingherc added the :Search/Search Search-related issues that do not fall into other categories label Jan 11, 2024
@elasticsearchmachine elasticsearchmachine added Team:Search Meta label for search team and removed needs:triage Requires assignment of a team area label labels Jan 11, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)
