New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete shard content under lock #9083
Conversation
} | ||
logger.trace("deleted shard {} directory, paths: [{}]", shardId, paths); | ||
assert FileSystemUtils.exists(paths) == false; | ||
IOUtils.rm(paths); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this redundant since the exists check just returned false?
LGTM, but my review was cursory as I am not very familiar with this code. |
b540fa3
to
40a1c28
Compare
} catch (Throwable e) { | ||
logger.debug("[{}] failed to close translog", e, shardId); | ||
// ignore | ||
logger.warn("[{}] failed to close store on shard deletion", e, shardId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Closing the store doesn't necessarily mean the shard is being deleted, I tested this and this codepath can happen when the index is closed, so I think this should be "failed to close store on shard closing"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or "failed to close store on shard removal"
Left some comments, also with this change the |
40a1c28
to
20bdfa3
Compare
yeah I removed it now :) I actually missed that! |
@dakrone I pushed a new commits |
LGTM |
* @throws IOException if an IOException occurs | ||
*/ | ||
public void deleteShardDirectorySafe(ShardLock lock, @IndexSettings Settings indexSettings) throws IOException { | ||
assert indexSettings != ImmutableSettings.EMPTY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we check and except if the lock is not acquired? Maybe also rename the method name to deleteShardDirectoryUnderLock, not sure. We should also document the fact that the shard is expected to be locked.
I like it. Much simpler. I think we can remove the timeout on delete introduced temporarily in #9009, but if we do so, we need to try to delete all the shard folders that are not locked, instead of trying to acquire all locks and then delete them all together (or not). This is needed for shards that were just relocated away (and their locks / in memory registration released) but not yet deleted from disk. |
I applied changes to your comments. Can you take another look?
can we do this in a different change? |
LGTM. thx.
I'll do it. OK. |
Once we delete the the index on a node we are closing all resources and subsequently need to delete all shards contents from disk. Yet this happens today under a lock (the shard lock) that needs to be acquried in order to execute any operation on the shards data path. We try to delete all the index meta-data once we acquired all the shard lock but this operation can run into a timeout which causes the index to remain on disk. Further, all shard data will be left on disk if the timeout is reached. This commit removes all the shards data just before the shard lock is release as the last operation on a shard that belongs to a deleted index.
b290aad
to
7ec8973
Compare
Once we delete the the index on a node we are closing all resources
and subsequently need to delete all shards contents from disk. Yet
this happens today under a lock (the shard lock) that needs to be
acquried in order to execute any operation on the shards data
path. We try to delete all the index meta-data once we acquired
all the shard lock but this operation can run into a timeout which causes
the index to remain on disk. Further, all shard data will be left on
disk if the timeout is reached.
This commit removes all the shards data just before the shard lock
is release as the last operation on a shard that belongs to a deleted
index.
supersedes #8608 & relates to #9009