Fail recovery if resetRecovery fails during retry recovery #11149

Merged: 1 commit into elastic:master from fail_recovery_on_retry_error, May 13, 2015

s1monw (Contributor) commented May 13, 2015

resetRecovery might fail, for instance if the shard is closed. When it does,
the shard lock leaks and the shard stays locked on this node forever.

  1> [2015-05-13 16:08:52,118][DEBUG][indices.recovery         ] [node_s1] unexpected error during recovery, but recovery id [420] is finished
  1> [test_index2][0] CurrentState[CLOSED] Shard not in recovering state
  1>    at org.elasticsearch.index.shard.IndexShard.performRecoveryRestart(IndexShard.java:870)
  1>    at org.elasticsearch.indices.recovery.RecoveryStatus.resetRecovery(RecoveryStatus.java:233)
  1>    at org.elasticsearch.indices.recovery.RecoveryTarget.retryRecovery(RecoveryTarget.java:151)
  1>    at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:237)
  1>    at org.elasticsearch.indices.recovery.RecoveryTarget.access$700(RecoveryTarget.java:72)
  1>    at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:462)
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  1>    at java.lang.Thread.run(Thread.java:745)

followed by

  1> [2015-05-13 16:08:52,123][DEBUG][indices                  ] [node_s1] [test_index2] failed to delete index store - at least one shards is still locked
  1> org.apache.lucene.store.LockObtainFailedException: Can't lock shard [test_index2][0], timed out after 0ms
  1>    at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:520)
  1>    at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:448)
  1>    at org.elasticsearch.env.NodeEnvironment.lockAllForIndex(NodeEnvironment.java:392)
  1>    at org.elasticsearch.env.NodeEnvironment.deleteIndexDirectorySafe(NodeEnvironment.java:342)
  1>    at org.elasticsearch.indices.IndicesService.deleteIndexStore(IndicesService.java:496)
  1>    at org.elasticsearch.indices.IndicesService.removeIndex(IndicesService.java:403)
  1>    at org.elasticsearch.indices.IndicesService.deleteIndex(IndicesService.java:445)
  1>    at org.elasticsearch.indices.cluster.IndicesClusterStateService.deleteIndex(IndicesClusterStateService.java:844)
  1>    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyDeletedIndices(IndicesClusterStateService.java:243)
  1>    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:180)
  1>    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:489)
  1>    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:188)
  1>    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:158)
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  1>    at java.lang.Thread.run(Thread.java:745)
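
The failure mode and the intended behaviour can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Elasticsearch code: `ShardLock`, `Recovery`, and this `retryRecovery` are hypothetical stand-ins, and the real `RecoveryStatus` / `RecoveryTarget` classes have different signatures. The only point is that an exception thrown while resetting the recovery should fail the recovery (and so release the lock) instead of propagating and leaving the lock held.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class RetryRecoverySketch {

    /** Hypothetical stand-in for the per-shard lock held while a recovery is in flight. */
    static final class ShardLock {
        private final AtomicBoolean held = new AtomicBoolean(true);

        void release() { held.set(false); }

        boolean isHeld() { return held.get(); }
    }

    /** Hypothetical stand-in for the recovery status object (not the real RecoveryStatus). */
    static final class Recovery {
        final ShardLock lock = new ShardLock();
        final boolean shardClosed;
        String failureReason; // stays null unless the recovery was failed

        Recovery(boolean shardClosed) { this.shardClosed = shardClosed; }

        /** Throws if the shard is no longer in a recovering state, e.g. because it was closed. */
        void resetRecovery() {
            if (shardClosed) {
                throw new IllegalStateException("CurrentState[CLOSED] Shard not in recovering state");
            }
        }

        /** Failing the recovery records the reason and releases the shard lock. */
        void fail(String reason) {
            failureReason = reason;
            lock.release();
        }
    }

    /** Returns true if a retry can be scheduled, false if the recovery was failed instead. */
    static boolean retryRecovery(Recovery recovery) {
        try {
            recovery.resetRecovery();
        } catch (RuntimeException e) {
            // Without this catch the exception escapes and nothing releases the lock.
            recovery.fail("failed to prepare shard for recovery: " + e.getMessage());
            return false;
        }
        // A real implementation would reschedule the recovery here.
        return true;
    }

    public static void main(String[] args) {
        Recovery closed = new Recovery(true);
        System.out.println(retryRecovery(closed) + " lockHeld=" + closed.lock.isHeld());

        Recovery open = new Recovery(false);
        System.out.println(retryRecovery(open) + " lockHeld=" + open.lock.isHeld());
    }
}
```

In this sketch, the closed-shard case ends with the lock released, so a later index deletion on the same node would not hit the `LockObtainFailedException` shown in the second trace.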

dakrone (Member) commented May 13, 2015

LGTM

s1monw merged commit 6a43fe3 into elastic:master May 13, 2015
s1monw deleted the fail_recovery_on_retry_error branch May 13, 2015 14:57
clintongormley added the :Distributed/Recovery (Anything around constructing a new shard, either from a local or a remote source.) label May 29, 2015
Labels: >bug, :Distributed/Recovery (Anything around constructing a new shard, either from a local or a remote source.), v1.5.3, v1.6.0, v2.0.0-beta1

3 participants