
Index Corruption #7238

Closed
ghost opened this issue Aug 12, 2014 · 22 comments

@ghost

ghost commented Aug 12, 2014

We have two nodes running in our Elasticsearch setup, with the cluster regularly going into a yellow state. As far as we have been able to determine, this is usually caused by a shard of one particular index repeatedly becoming unassigned. Restarting the Elasticsearch instance helps for a short while (green state), but the cluster usually turns yellow again within half an hour or so.

Log File Excerpt:

[2014-08-12 13:16:15,183][INFO ][node ] [Natalie Portman] started
[2014-08-12 13:16:18,742][INFO ][gateway ] [Natalie Portman] recovered [19] indices into cluster_state
[2014-08-12 13:16:20,982][INFO ][cluster.service ] [Natalie Portman] added {[Jennifer Lawrence][cTovvVXzSgmKBSgi9AoYaA][ip-XXXXX][inet[/XXXX]],}, reason: zen-disco-receive(join from node[[Jennifer Lawrence][cTovvVXzSgmKBSgi9AoYaA][XXXXX][inet[/XXXXX]]])
[2014-08-12 14:13:50,511][WARN ][index.merge.scheduler ] [Natalie Portman] [fuelup-accounts-revisions][2] failed to merge
java.lang.ArrayIndexOutOfBoundsException: 219
at org.apache.lucene.util.FixedBitSet.set(FixedBitSet.java:256)
at org.apache.lucene.codecs.PostingsConsumer.merge(PostingsConsumer.java:87)
at org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:110)
at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:399)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:112)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4163)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3759)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:106)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
[2014-08-12 14:13:50,515][WARN ][index.engine.internal ] [Natalie Portman] [fuelup-accounts-revisions][2] failed engine [merge exception]
[2014-08-12 14:13:50,784][WARN ][cluster.action.shard ] [Natalie Portman] [fuelup-accounts-revisions][2] sending failed shard for [fuelup-accounts-revisions][2], node[wqRW6lbSQ6yk1B3bxbCF6w], [R], s[STARTED], indexUUID [PMsFEnTWQ1e_uqGFx8yUXw], reason [engine failure, message [merge exception][MergeException[java.lang.ArrayIndexOutOfBoundsException: 219]; nested: ArrayIndexOutOfBoundsException[219]; ]]
[2014-08-12 14:13:50,785][WARN ][cluster.action.shard ] [Natalie Portman] [fuelup-accounts-revisions][2] received shard failed for [fuelup-accounts-revisions][2], node[wqRW6lbSQ6yk1B3bxbCF6w], [R], s[STARTED], indexUUID [PMsFEnTWQ1e_uqGFx8yUXw], reason [engine failure, message [merge exception][MergeException[java.lang.ArrayIndexOutOfBoundsException: 219]; nested: ArrayIndexOutOfBoundsException[219]; ]]
[2014-08-12 14:13:50,888][WARN ][cluster.action.shard ] [Natalie Portman] [fuelup-accounts-revisions][2] received shard failed for [fuelup-accounts-revisions][2], node[cTovvVXzSgmKBSgi9AoYaA], [P], s[STARTED], indexUUID [PMsFEnTWQ1e_uqGFx8yUXw], reason [engine failure, message [merge exception][MergeException[java.lang.ArrayIndexOutOfBoundsException: 219]; nested: ArrayIndexOutOfBoundsException[219]; ]]
[2014-08-12 14:13:50,890][WARN ][cluster.action.shard ] [Natalie Portman] [fuelup-accounts-revisions][2] received shard failed for [fuelup-accounts-revisions][2], node[cTovvVXzSgmKBSgi9AoYaA], [P], s[STARTED], indexUUID [PMsFEnTWQ1e_uqGFx8yUXw], reason [master [Natalie Portman][wqRW6lbSQ6yk1B3bxbCF6w][ip-XXXXX][inet[/XXXXX]] marked shard as started, but shard has not been created, mark shard as failed]
[2014-08-12 14:13:52,371][WARN ][index.merge.scheduler ] [Natalie Portman] [fuelup-accounts-revisions][2] failed to merge

We have verified that the machines have both enough storage and memory, so we are wondering what is going wrong and how we could further debug this. Suggestions are appreciated.
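(For anyone triaging similar symptoms: the cluster health and cat-shards APIs are the usual first stop for seeing which shards are unassigned. A minimal sketch, assuming ES is listening on localhost:9200:)

# Overall cluster state, including the count of unassigned shards
curl -s 'localhost:9200/_cluster/health?pretty'

# One line per shard with its state; problem shards show up as UNASSIGNED
curl -s 'localhost:9200/_cat/shards?v' | grep UNASSIGNED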

@mikemccand
Contributor

This looks like either index corruption or a Lucene issue. What ES version are you using? Are you able to reproduce this issue from a clean slate / minimal configuration?

@ghost
Author

ghost commented Aug 14, 2014

We are using ES version 1.3.1 (upgraded from 1.2.1, which we were running before). I have investigated further, and indeed it seems two of our indices were somehow corrupted. I still have the corrupted data, so I could analyze it further to see what caused this. The disks containing the indices were not too large (about 8x larger than the data itself and filled to about 60-80%). I am hypothesizing that running short on disk space while synchronizing shards across machines could have been the cause, but I am not sure whether that is a realistic assessment. If you have other suggestions I could look into, those would be welcome. I want to rule out that this is an ES (or Lucene) issue.

@mikemccand
Contributor

A full disk should not cause index corruption...

Are you able to reproduce the creation of the corrupted index? Did you run CheckIndex on it (if so, can you post the output)?
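(For reference, CheckIndex can be run straight from the Lucene jars that ship with ES; the paths below are illustrative placeholders. Run it against a copy of the shard, or with the node stopped, so the files aren't being written to while it runs:)

# Hypothetical install path; the wildcard classpath pulls in lucene-core plus the
# elasticsearch jar, which is needed for non-default postings formats.
java -cp '/usr/share/elasticsearch/lib/*' \
  org.apache.lucene.index.CheckIndex \
  /path/to/data/nodes/0/indices/your-index/0/index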

Also, I'd recommend upgrading to 1.3.2: there was a rare bug in the compression library that ES uses when moving shards around that could introduce corruption.

@ghost
Author

ghost commented Sep 1, 2014

Mike, we:

  1. Upgraded our machines to 1.3.2
  2. Put our indices on separate large volumes

We are still getting regular index corruption, particularly on indices that see a lot of write operations; see the examples below.

Do you have any suggestions as to the best approach to debugging this? We'd be happy to dig in, but we're having some trouble figuring out the best place to start.

Some examples:

[2014-09-01 15:09:44,799][WARN ][cluster.action.shard ] [Jennifer Lawrence] [fuelup-accounts-revisions][2] received shard failed for [fuelup-accounts-revisions][2], node[eFW-40fxQiye18qoDPiJ_Q], [P], s[INITIALIZING], indexUUID [pEYC0ucUSIGun5KgVOoC0A], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[fuelup-accounts-revisions][2] failed to fetch index version after copying it over]; nested: CorruptIndexException[[fuelup-accounts-revisions][2] Corrupted index [corrupted_U1FKXejPRfGBoSRivrDxHQ] caused by: CorruptIndexException[codec footer mismatch: actual footer=1952805376 vs expected footer=-1071082520 (resource: NIOFSIndexInput(path="/esdata/fuelup-testing/nodes/0/indices/fuelup-accounts-revisions/2/index/_a_XBloomFilter_0.smy"))]]; ]]

[2014-09-01 15:04:19,366][WARN ][indices.cluster ] [Jennifer Lawrence] [fuelup-manifests][4] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [fuelup-manifests][4] failed to fetch index version after copying it over
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:152)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.index.CorruptIndexException: [fuelup-manifests][4] Corrupted index [corrupted_jh_x1y9MS1SWKZefSFlVfQ] caused by: CorruptIndexException[codec footer mismatch: actual footer=1852273495 vs expected footer=-1071082520 (resource: NIOFSIndexInput(path="/esdata/fuelup-testing/nodes/0/indices/fuelup-manifests/4/index/_mi_XBloomFilter_0.smy"))]
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:343)
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:328)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:119)

... 4 more

@mikemccand
Contributor

"Do you have any suggestions as to the best approach to debugging this?"

It's important to get to the bottom of this, in case you're hitting a new bug here ...

Does this corruption reproduce on a brand new index created with 1.3.2? I can't tell from your responses so far whether you are carrying forward an index from before...

Can you attach the server logs from all nodes that led up to the above exception? It looks like a shard was migrated to another node, but then when that node tried to start the shard, it detected corruption. We need to isolate where that corruption happened, either during indexing or during replication.

Can you zip up the data directory from the node that this shard was replicated from and post it somewhere, if you are allowed to share it? I can have a look and see whether the corruption is present there.

Are you running any plugins, any external processes, etc.?

@mikemccand
Contributor

Also, which CPU, OS, JVM version are you using?

@almer-t

almer-t commented Sep 1, 2014

Mike, thanks for your response. As for your questions:
A) It's a brand new index (actually multiple indices) on 1.3.2; the indices have custom mappings.
B) We're running on AWS (Ubuntu 14.04 LTS) with a shared Intel Xeon E5-2650 CPU.
C) Java reports (Oracle Java 7):
java version "1.7.0_65"
Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
D) Plugins are "cloud-aws" (2.3.0) and "head"

As for your other questions: I will look into what logs and index data I can share with you tomorrow. We will likely set up a separate environment for reproducing the problem. Naturally, I'll share any "recipe" I can come up with for reproducing it.

@mikemccand
Contributor

Thanks @almer-t, it's great that you can reproduce this from a clean start; hopefully this means we can iterate until we find the root cause. If you can narrow it down to a dedicated environment / particular set of documents / recipe, that would be very helpful.

As far as I know that JVM version is safe.

Have you changed any settings from defaults (e.g. via settings API or in config/elasticsearch.yml)?
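(If it helps to double-check, the live settings can be dumped and diffed against a stock install; a minimal sketch, using one of the affected index names from the logs:)

# Cluster-level persistent/transient overrides (empty on a stock install)
curl -s 'localhost:9200/_cluster/settings?pretty'

# Per-index settings for an index that keeps failing
curl -s 'localhost:9200/fuelup-accounts-revisions/_settings?pretty'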

@ghost
Author

ghost commented Sep 2, 2014

Mike: I haven't changed the defaults (apart from the number of master nodes, from 1 to 2, but the problem existed before that change, so it's probably not the root cause).

Here's some log information that shows the first signs of trouble:

[2014-09-02 09:07:51,137][WARN ][index.engine.internal ] [Natalie Portman] [fuelup-manifests][3] failed engine [corrupt file detected source: [recovery phase 1]]
[2014-09-02 09:07:51,154][WARN ][cluster.action.shard ] [Natalie Portman] [fuelup-manifests][3] sending failed shard for [fuelup-manifests][3], node[3R-3HR1jQzOWsct8tIlO8w], [P], s[STARTED], indexUUID [xlRTKIXsSq6HO_BHWIA4Fg], reason [engine failure, message [corrupt file detected source: [recovery phase 1]][RecoverFilesRecoveryException[[fuelup-manifests][3] Failed to transfer [0] files with total size of [0b]]; nested: CorruptIndexException[[fuelup-manifests][3] Corrupted index [corrupted_dXHcg1AkQtGuplAHa87jqw] caused by: CorruptIndexException[codec footer mismatch: actual footer=1852273495 vs expected footer=-1071082520 (resource: NIOFSIndexInput(path="/esdata/fuelup-testing/nodes/0/indices/fuelup-manifests/3/index/_b_XBloomFilter_0.smy"))]]; ]]
...
[2014-09-02 09:07:51,201][WARN ][indices.cluster ] [Natalie Portman] [fuelup-manifests][3] master [[Natalie Portman][3R-3HR1jQzOWsct8tIlO8w][ip-10-0-1-20][inet[/10.0.1.20:9300]]] marked shard as started, but shard has not been created, mark shard as failed
...
[2014-09-02 09:07:51,352][WARN ][indices.cluster ] [Natalie Portman] [fuelup-manifests][3] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [fuelup-manifests][3] failed to fetch index version after copying it over
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:152)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.index.CorruptIndexException: [fuelup-manifests][3] Corrupted index [corrupted_dXHcg1AkQtGuplAHa87jqw] caused by: CorruptIndexException[codec footer mismatch: actual footer=1852273495 vs expected footer=-1071082520 (resource: NIOFSIndexInput(path="/esdata/fuelup-testing/nodes/0/indices/fuelup-manifests/3/index/_b_XBloomFilter_0.smy"))]
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:343)
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:328)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:119)

... 4 more

All but the first message then repeat (particularly the second and the last).

@ghost ghost changed the title from "Failed to Merge" to "Index Corruption" on Sep 2, 2014
@ghost
Author

ghost commented Sep 2, 2014

We are currently using three machines. I split this into a 1/2 situation (by renaming the cluster and moving some files around), so I ended up with cluster A with one machine and cluster B with two machines. I have tried our read/write operations on both A and B and cannot trivially reproduce the errors above. I have to say, though, that they usually occur "after a while", so we'll have to wait a bit.

I have found that when corruption does occur, its detection can be triggered by any of the following (1 and 3 are sketched after the list):

  1. creating a backup snapshot.
  2. restarting a machine in the cluster.
  3. changing the replication settings of a corrupted index.
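(A minimal sketch of triggers 1 and 3, assuming a snapshot repository named backup_repo is already registered; the index name is one of the affected ones from the logs:)

# 1. Create a snapshot; reading the segment files for the snapshot can surface corruption
curl -XPUT 'localhost:9200/_snapshot/backup_repo/debug-snapshot-1?wait_for_completion=true'

# 3. Bump the replica count on a suspect index; the resulting recovery re-reads the
#    files and trips the corruption check
curl -XPUT 'localhost:9200/fuelup-manifests/_settings' -d '{
  "index": { "number_of_replicas": 2 }
}'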

Examining the log files from when corruption did occur (yesterday, when it was still a 3-machine cluster), it seems that this message repeats often, with exactly the same values:
"CorruptIndexException[codec footer mismatch: actual footer=1852273495 vs expected footer=-1071082520"

Also, this one looks a bit odd:
"[RecoverFilesRecoveryException[[fuelup-manifests][3] Failed to transfer [0] files with total size of [0b]]"

A wild guess: could it be that the volume where the Elasticsearch data is stored is briefly disconnected and then reconnected, and that this causes the corruption?

Mike: I am also sending you the complete log files (privately).

@mikemccand
Contributor

Thanks @almer-fuel

Can you enable CheckIndex on startup by adding "index.shard.check_on_startup: true" to your config/elasticsearch.yml? Then restart a node and it should run CheckIndex on all its shards; it runs for a while and then suddenly prints lots of output (from Lucene's CheckIndex tool)... I'm hoping we can isolate when the corruption happens.
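(Roughly like this; the config path and restart command depend on your install, so treat both as placeholders:)

# Append the setting (path shown is the Debian/Ubuntu package default)
echo 'index.shard.check_on_startup: true' | sudo tee -a /etc/elasticsearch/elasticsearch.yml

# Restart the node however you normally do, e.g.
sudo service elasticsearch restart

# CheckIndex output then appears in the node's log as each shard is opened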

Are you able to get the corruption to happen if you use 1 shard and 0 replicas? Just trying to simplify the recipe as much as possible.
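(For example, with a throwaway index created like this; the index name is made up:)

curl -XPUT 'localhost:9200/corruption-repro-test' -d '{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}'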

I would definitely not expect this kind of corruption if a volume disconnects/reconnects; only if the volume is buggy (flips/loses/adds bits) should you see this kind of corruption.

Does the corruption always happen in a *.smy file?

@rmuir
Contributor

rmuir commented Sep 2, 2014

This is the .smy file from the pulsing codec. It doesn't have a checksum... it's basically incompatible with Elasticsearch (and was just an experiment). The pulsing codec has actually been removed from Lucene, since this optimization has been in the default codec for years. I recommend turning it off.

@ghost
Author

ghost commented Sep 2, 2014

Mike: CheckIndex is set; as of now, restarting any of the machines does not yield a corrupted index (yet).

Mike & muir:
Hmmm, yes the corruption always happens in an smy file. All the corrupted indexes have at least a mapping where "postings_format" : "bloom_pulsing" is used. I hypothesized this could have been the problem in a mail to Mike.

rmuir:
Would you recommend leaving postings_format alone in general? (I can't seem to find it in the current Elasticsearch documentation anymore.) Also: if this can cause index corruption, it should be disabled.
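(For context, this is roughly what such a per-field postings format override looked like in an ES 1.x mapping; the index, type, and field names here are made up. Omitting postings_format entirely falls back to the default codec:)

curl -XPUT 'localhost:9200/bloom-pulsing-example' -d '{
  "mappings": {
    "doc": {
      "properties": {
        "account_id": {
          "type": "string",
          "index": "not_analyzed",
          "postings_format": "bloom_pulsing"
        }
      }
    }
  }
}'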

@mikemccand
Contributor

Hi @almer-fuel hmm I can't find your private email to me ...

The pulsing postings format is being removed from Lucene; what it does (folding terms with just one doc into the terms dict) is already being done by the default postings format. I would recommend just using the default ...

But, I don't think this can explain the original AIOOBE you hit; are you able to reproduce that exception?

@ghost
Author

ghost commented Sep 2, 2014

Check your spam :) (if you can't find it, I'll resend).

No, it can't explain the AIOOBE, but we have not yet seen that problem recur. (Perhaps we should split this bug into two: one for the still-unexplained AIOOBE, the other for the postings_format issue. I am changing our index definitions, so I should be able to confirm within several days whether the problem recurs.)

@budnik

budnik commented Sep 4, 2014

We encountered the same issue multiple times. We were using "postings_format": "bloom_pulsing" for one of the fields. I hope switching back to the default will help, and thanks for helping me figure it out.

@mikemccand
Contributor

Thanks @almer-fuel I found the email ...

Let's leave this issue open to dig into the AIOOBE?

@ghost
Author

ghost commented Sep 4, 2014

@budnik @mikemccand Removing bloom_pulsing helped restore our cluster to a stable state, so it's confirmed to have been the cause of our (non-AIOOBE) problems.

@mikemccand Yes, we will keep an eye on whether the AIOOBE occurs again (and/or whether we can reproduce it in a separate testing environment); if so, we will post updates here. It could also be that the issues are somehow linked (the AIOOBE being a side effect of the bloom_pulsing problems), but I can't say that for sure.

Thanks for the help! 🙏

@deverton
Contributor

deverton commented Oct 1, 2014

We just rolled out a new cluster on 1.3.2 and got this after a restart.

[2014-10-01 11:24:19,381][WARN ][cluster.action.shard     ] [node1.example.com] [index-20130125][1] sending failed shard for [index-20130125][1], node[2LBNSJCMQByFpj0LgKZjJw], relocating [JT1xYUsPQkyEgLzDmBAvhA], [P], s[INITIALIZING], indexUUID [uNfol8JNQ_GZCMEzT07uvQ], reason [engine failure, message [corrupted preexisting index][CorruptIndexException[[reviews-20130125][1] Corrupted index [corrupted_yHFUTG8eQMWqLbRwRHtPaA] caused by: CorruptIndexException[codec footer mismatch: actual footer=1230204780 vs expected footer=-1071082520 (resource: MMapIndexInput(path="/opt/elasticsearch/data/elasticsearch_1_3/nodes/0/indices/index-20130125/1/index/_am.fdt"))]]]]

This was on a two-node cluster and the index was created from scratch. There are no other indices in this cluster. The index mapping is fairly simple and doesn't involve any codec changes, so I don't think this is bloom related.

@s1monw
Contributor

s1monw commented Oct 1, 2014

Hmm, the index name says reviews-20130125, so it seems like it's a pretty old index? You said it was newly created, so I am a bit confused. Can you check whether there is a file called corrupted_yHFUTG8eQMWqLbRwRHtPaA inside the index/shard directory and gist its content?

@deverton
Contributor

deverton commented Oct 1, 2014

Hah, sorry, my attempt at masking data didn't quite work out. While the index name indicates it's old, it's actually just the version of the mapping. It was replicated from a CouchDB instance as part of the deployment of 1.3.2.

I'll see if we still have corrupted_yHFUTG8eQMWqLbRwRHtPaA in the data directory. It might have been destroyed as we recreated the ES 1.3.2 instances just to be sure.

@s1monw
Contributor

s1monw commented Oct 1, 2014

I am closing this issue for now...

@s1monw s1monw closed this as completed Oct 1, 2014