Delete shard content under lock #9083

s1monw · 2014-12-29T15:17:19Z

Once we delete the index on a node, we close all resources
and subsequently need to delete all shard contents from disk. Yet
this happens today under a lock (the shard lock) that needs to be
acquired in order to execute any operation on the shard's data
path. We try to delete all the index metadata once we have acquired
all the shard locks, but this operation can run into a timeout, which causes
the index to remain on disk. Further, all shard data will be left on
disk if the timeout is reached.

This commit removes all of the shard's data just before the shard lock
is released, as the last operation on a shard that belongs to a deleted
index.
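
As a rough illustration of the ordering described above (not the code in this PR), here is a minimal sketch in plain Java. `ShardHandleSketch`, `markIndexDeleted` and the use of a `Semaphore` as the shard lock are assumptions made for the example; the real change works with `ShardLock` and `NodeEnvironment`.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.Semaphore;

// Hypothetical stand-in for an open shard that holds its shard lock until close().
final class ShardHandleSketch implements AutoCloseable {
    private final Semaphore shardLock;   // assumption: one permit == the shard lock
    private final Path dataPath;
    private volatile boolean indexDeleted;

    ShardHandleSketch(Semaphore shardLock, Path dataPath) {
        if (!shardLock.tryAcquire()) {
            throw new IllegalStateException("shard lock is already held");
        }
        this.shardLock = shardLock;
        this.dataPath = dataPath;
    }

    // Called when the owning index is deleted (hypothetical hook).
    void markIndexDeleted() {
        indexDeleted = true;
    }

    @Override
    public void close() throws IOException {
        try {
            // ... engine, translog and store would be closed here ...
            if (indexDeleted) {
                // Last operation on the shard: wipe its data while the lock is still held.
                deleteRecursively(dataPath);
            }
        } finally {
            // The lock is released only after the delete attempt.
            shardLock.release();
        }
    }

    private static void deleteRecursively(Path root) throws IOException {
        if (Files.notExists(root)) {
            return;
        }
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

Because the delete runs inside `close()` before the permit is returned, no other holder can touch the shard's data path in between, and no separate lock acquisition (with its timeout) is needed for the shard data.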

supersedes #8608 & relates to #9009

rjernst · 2014-12-30T01:32:00Z

src/main/java/org/elasticsearch/env/NodeEnvironment.java

        }
+        logger.trace("deleted shard {} directory, paths: [{}]", shardId, paths);
+        assert FileSystemUtils.exists(paths) == false;
+        IOUtils.rm(paths);


Isn't this redundant since the exists check just returned false?

rjernst · 2014-12-30T04:08:41Z

LGTM, but my review was cursory as I am not very familiar with this code.

dakrone · 2015-01-02T10:29:36Z

src/main/java/org/elasticsearch/index/IndexService.java

            } catch (Throwable e) {
-                logger.debug("[{}] failed to close translog", e, shardId);
-                // ignore
+                logger.warn("[{}] failed to close store on shard deletion", e, shardId);


Closing the store doesn't necessarily mean the shard is being deleted. I tested this, and this code path can also be hit when the index is closed, so I think this should be "failed to close store on shard closing"

or "failed to close store on shard removal".

dakrone · 2015-01-02T10:51:02Z

Left some comments, also with this change the CloseableComponent class can be removed entirely

s1monw · 2015-01-02T13:40:25Z

> Left some comments, also with this change the CloseableComponent class can be removed entirely

yeah I removed it now :) I actually missed that!

s1monw · 2015-01-02T13:42:45Z

@dakrone I pushed new commits

dakrone · 2015-01-02T14:45:39Z

LGTM

bleskes · 2015-01-05T13:28:23Z

src/main/java/org/elasticsearch/env/NodeEnvironment.java

+     * @throws IOException if an IOException occurs
+     */
+    public void deleteShardDirectorySafe(ShardLock lock, @IndexSettings Settings indexSettings) throws IOException {
+        assert indexSettings != ImmutableSettings.EMPTY;


Can we check and throw an exception if the lock is not acquired? Maybe also rename the method to deleteShardDirectoryUnderLock; not sure. We should also document the fact that the shard is expected to be locked.
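
A minimal sketch of the kind of guard suggested here, in plain Java. It assumes a thread-owned `ReentrantLock` as a stand-in for the shard lock; `LockCheckSketch` and the `deleteAction` parameter are hypothetical, and the real `ShardLock` may expose its state differently.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper; not the NodeEnvironment method from the diff.
final class LockCheckSketch {
    /**
     * Runs deleteAction only if the caller already holds shardLock.
     *
     * @throws IllegalStateException if the lock is not held by the calling thread
     */
    static void deleteShardDirectoryUnderLock(ReentrantLock shardLock, Runnable deleteAction) {
        if (!shardLock.isHeldByCurrentThread()) {
            throw new IllegalStateException("shard lock must be held before deleting shard data");
        }
        deleteAction.run();
    }
}
```

Failing fast like this turns a silent misuse (deleting without the lock) into an immediate error, which is the documented precondition the comment asks for.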

bleskes · 2015-01-05T14:31:45Z

I like it. Much simpler. I think we can remove the timeout on delete introduced temporarily in #9009, but if we do so, we need to try to delete all the shard folders that are not locked, instead of trying to acquire all locks and then delete them all together (or not). This is needed for shards that were just relocated away (and their locks / in memory registration released) but not yet deleted from disk.
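
The per-shard approach described above could look roughly like the following sketch. It is not the follow-up change itself: `UnlockedShardCleanupSketch`, the `Semaphore`-per-shard map and the `deleteShard` callback are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Semaphore;
import java.util.function.IntConsumer;

// Hypothetical helper: delete every shard whose lock is free, skip the rest.
final class UnlockedShardCleanupSketch {
    /** Returns the ids of the shards that were deleted. */
    static List<Integer> deleteUnlockedShards(Map<Integer, Semaphore> shardLocks, IntConsumer deleteShard) {
        List<Integer> deleted = new ArrayList<>();
        for (Map.Entry<Integer, Semaphore> entry : shardLocks.entrySet()) {
            Semaphore lock = entry.getValue();
            // Non-blocking attempt: a shard that is still in use is skipped, not waited for.
            if (!lock.tryAcquire()) {
                continue;
            }
            try {
                deleteShard.accept(entry.getKey());
                deleted.add(entry.getKey());
            } finally {
                lock.release();
            }
        }
        return deleted;
    }
}
```

Unlike acquiring all locks first and deleting everything together, this makes progress on each free shard independently, so a single busy shard does not leave the others on disk.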

s1monw · 2015-01-06T16:51:47Z

I applied changes addressing your comments. Can you take another look?

> I like it. Much simpler. I think we can remove the timeout on delete introduced temporarily in #9009, but if we do so, we need to try to delete all the shard folders that are not locked, instead of trying to acquire all locks and then delete them all together (or not). This is needed for shards that were just relocated away (and their locks / in memory registration released) but not yet deleted from disk.

can we do this in a different change?

bleskes · 2015-01-06T20:22:21Z

LGTM. thx.

> can we do this in a different change?

I'll do it. OK.

s1monw added v2.0.0-beta1 v1.5.0 >enhancement labels Dec 29, 2014

s1monw mentioned this pull request Dec 29, 2014

[CORE] Wait for pending shard removal on IndexService close #8608

Closed

s1monw added the review label Dec 29, 2014

rjernst reviewed Dec 30, 2014

s1monw force-pushed the remove_shard_data_on_close branch from b540fa3 to 40a1c28 Compare December 30, 2014 15:48

dakrone reviewed Jan 2, 2015

s1monw force-pushed the remove_shard_data_on_close branch from 40a1c28 to 20bdfa3 Compare January 2, 2015 13:39

bleskes reviewed Jan 5, 2015

s1monw force-pushed the remove_shard_data_on_close branch from b290aad to 7ec8973 Compare January 6, 2015 21:15

s1monw merged commit 7ec8973 into elastic:master Jan 6, 2015

clintongormley added the resiliency label Jan 22, 2015

clintongormley added :Core/Infra/Core Core issues without another label and removed review labels Mar 19, 2015

clintongormley changed the title ~~[CORE] Delete shard content under lock~~ Delete shard content under lock Jun 6, 2015
