Resiliency: Cancelling a recovery may leave temporary files behind #7893

Closed
bleskes opened this issue Sep 26, 2014 · 1 comment

@bleskes
Contributor

bleskes commented Sep 26, 2014

We currently cancel a recovery when the shard is no longer assigned to the target node, or when the primary shard (the source of the copy) is moved to another node; there are other scenarios as well. That cancel logic doesn't clean up the temporary files created during the recovery.

Normally that's not a problem, as the files are cleaned up once the shard is safely recovered somewhere else (or locally). However, if a node runs into continuous failure cycles, the leftover files can fill up the disk, causing bigger problems such as corrupting other shards on the node.
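
To illustrate the idea (this is not the Elasticsearch implementation), here is a minimal Python sketch of a recovery that tracks its own temporary files and deletes them on cancel. The `RecoverySession` class, its methods, and the `recovery.` file prefix are all hypothetical names chosen for the example.

```python
import tempfile
from pathlib import Path

# Hypothetical naming convention for in-flight recovery files (an assumption).
TEMP_PREFIX = "recovery."


class RecoverySession:
    """Tracks the temporary files written while copying a shard to this node."""

    def __init__(self, shard_dir):
        self.shard_dir = Path(shard_dir)
        self.temp_files = []

    def write_temp(self, name, data):
        """Write one temporary file and remember it for later cleanup."""
        path = self.shard_dir / (TEMP_PREFIX + name)
        path.write_bytes(data)
        self.temp_files.append(path)
        return path

    def cancel(self):
        """Abort the recovery and delete every temporary file it created."""
        for path in self.temp_files:
            # missing_ok avoids failing if a file was already removed.
            path.unlink(missing_ok=True)
        self.temp_files.clear()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as shard_dir:
        session = RecoverySession(shard_dir)
        session.write_temp("segments_1", b"partial data")
        session.cancel()
        # After cancel, nothing is left behind in the shard directory.
        print(list(Path(shard_dir).iterdir()))
```

Tying cleanup to the cancel path means repeated start/cancel cycles do not accumulate files, which is the failure mode described above.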

@clintongormley clintongormley changed the title Recovery: cancelling a recovery may leave temporary files behind Resiliency: Cancelling a recovery may leave temporary files behind Sep 26, 2014
bleskes added a commit to bleskes/elasticsearch that referenced this issue Oct 7, 2014
At the moment, we leave temporary files around if a peer (replica) recovery is canceled. Those files will normally be cleaned up once the shard is started elsewhere, but in case of errors this can lead to trouble. If recoveries are started and canceled often, we may cause nodes to run out of disk space.

Closes elastic#7893
bleskes added a commit to bleskes/elasticsearch that referenced this issue Oct 14, 2014
At the moment, we leave temporary files around if a peer (replica) recovery is canceled. Those files will normally be cleaned up once the shard is started elsewhere, but in case of errors this can lead to trouble. If recoveries are started and canceled often, we may cause nodes to run out of disk space.

Closes elastic#7893
@bleskes bleskes added v1.5.0 and removed v1.4.0 labels Nov 3, 2014
@bleskes
Contributor Author

bleskes commented Nov 3, 2014

fixed with #8092
