Several recoveries cause IndexShardGatewayRecoveryException #8340
A permission problem? I don't think so, since it succeeds most of the time.
@asafc64 could you tell us how your test is structured? What sort of cleanup are you doing before/after the test? In particular, do you close the index or delete it? How do you perform these operations? Do you wait for their completion? What operations are performed in the test preceding the test that fails?
When the environment is loaded, it creates a snapshot, and before every test the snapshot is restored.
By the way,
@asafc64 at step 3, when shards are reported done, there are still some additional operations that take place before the restore process is really over. So it's not a very reliable method of checking restore status in tests. Opening the index after restore is not needed, because the index will be opened automatically once the restore is complete. I would recommend the following approach
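imotov's actual recommendation is truncated above. Purely as an illustration, one common way to wait for a restore in tests is to combine `wait_for_completion=true` with a cluster health check that waits until the restored shards are allocated. In this sketch, `my_backup` and `snapshot_1` are placeholder repository and snapshot names, not names from this issue:

```
POST /_snapshot/my_backup/snapshot_1/_restore?wait_for_completion=true
{
  "indices": "qs_rm_3"
}

GET /_cluster/health/qs_rm_3?wait_for_status=green&timeout=30s
```

On a single-node test cluster, `wait_for_status=yellow` may be the appropriate target, since replica shards cannot be allocated.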
@imotov is there some additional status that we should report in recovery status to indicate that the process isn't quite complete?
@clintongormley, I am not sure what we can/should do here, since recoveries are reported at the shard level. I think there are a couple of concurrency-related things going on. First, there is a slight delay in shard status propagation through the cluster between the moment a shard is done recovering and the moment the cluster knows it has started. There is also a global restore cleanup stage that is performed once all shards are restored. So, when all shards are restored, the restore process is still technically running. I think the recovery API is good for monitoring the recovery/restore progress of individual shards, but it's not a good method for checking whether the restore is done globally.
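The shard-level view described above comes from the standard indices recovery API. A sketch of such a check, using the index name from the report in this issue:

```
GET /qs_rm_3/_recovery?human
```

Shards being restored from a snapshot show `"type": "SNAPSHOT"` in the response, and a `stage` of `DONE` means that shard copy has finished; as noted, though, a shard-level DONE does not mean the restore as a whole is complete.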
@imotov fair enough. Plus the docs already tell you how to monitor a restore: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html#_monitoring_snapshot_restore_progress
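In a test harness, the general pattern behind all of this is to poll until the cluster reports the desired state, with a deadline, rather than trusting the first shard-level "done" report. A minimal sketch of such a polling loop; the `condition` callable is a stand-in for whatever status check your client exposes (it is not an Elasticsearch API):

```python
import time

def wait_until(condition, timeout=30.0, interval=0.5):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False on timeout. This is the
    kind of loop a test harness can use instead of trusting a single
    per-shard "done" report, since cluster state propagation lags behind
    shard-level recovery completion.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False
```

A test would pass a closure that queries cluster health and returns True once the status is green (or yellow on a single-node cluster), then fail the test if `wait_until` returns False.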
…or successfully restored shards to get started
This commit ensures that a restore operation with wait_for_completion=true doesn't return until all successfully restored shards are started. Before, it returned as soon as the restore operation was over, which caused some shards to be unavailable immediately after restore completion. Fixes elastic#8340
I have a test environment that restores the index from a snapshot before every test.
After a few successful restores, it fails with:
[2014-11-03 16:03:54,957][WARN ][indices.cluster ] [Baron Von Blitzschlag] [qs_rm_3][0] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [qs_rm_3][0] failed recovery
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:185)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [qs_rm_3][0] restore failed
at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:130)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:127)
... 3 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [qs_rm_3][0] failed to restore snapshot [qs_rm_alias]
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:159)
at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:124)
... 4 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [qs_rm_3][0] Failed to recover index
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:840)
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:156)
... 5 more
Caused by: java.io.FileNotFoundException: C:\TestResults\QuickSearch\data\elasticsearch\nodes\0\indices\qs_rm_3\0\index_8.si (Access is denied)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(Unknown Source)
at java.io.FileOutputStream.<init>(Unknown Source)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:389)
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:282)
at org.apache.lucene.store.RateLimitedFSDirectory.createOutput(RateLimitedFSDirectory.java:40)
at org.elasticsearch.index.store.DistributorDirectory.createOutput(DistributorDirectory.java:118)
at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:69)
at org.elasticsearch.index.store.Store.createVerifyingOutput(Store.java:298)
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restoreFile(BlobStoreIndexShardRepository.java:887)
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:830)
... 6 more