Introduce shard level locks to prevent concurrent shard modifications #8436
Conversation
* @param index the index to delete
* @throws Exception if any of the shards data directories can't be locked or deleted
*/
public void deleteIndexDirecotrySafe(Index index) throws Exception {
"Directory" is misspelled as "Direcotry" here
pushed more changes @dakrone
@s1monw quick question regarding the path of the lock file: does it make sense to have it encapsulated in the shard location itself (similar to write.lock in Lucene)? I think it's nicer since then there is a single place that holds the shard data. Update: I think I get why it's a different directory: the ability to delete while holding the lock. If that's the case, then it makes sense. On crappy internet, will try to complete the review later tonight....
@@ -246,7 +245,7 @@ public void clusterChanged(ClusterChangedEvent event) {
logger.debug("[{}] deleting index that is no longer part of the metadata (indices: [{}])", current.index(), newMetaData.indices().keys());
if (nodeEnv.hasNodeFile()) {
try {
IOUtils.rm(FileSystemUtils.toPaths(nodeEnv.indexLocations(new Index(current.index()))));
nodeEnv.deleteIndexDirectorySafe(new Index(current.index()));
should we warn on this failure now? Because if we have a bug and we can't obtain the lock, we won't delete anything, which is different semantics from deleting the files and letting the OS handle dangling open file handles. In theory this should not happen because a lock is there.
yeah I made it a warn... btw. I think it's a pre-existing bug, i.e. on Windows you might have this behaviour already since somebody can hold a ref to the files, preventing the deletion. I wonder if we should actually schedule a deletion task, or use a list of pending deletions that are executed before we apply new cluster states and before we import dangling indices?
@Override
public void onClose(ShardId shardId) {
assertFalse(called.get());
called.set(true);
This can be `assertTrue(called.compareAndSet(false, true))` to avoid the race condition between the assert and the set.
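To illustrate the reviewer's point, here is a minimal sketch of the difference between the racy check-then-set and the atomic compare-and-set (class and method names are illustrative, not the actual test code):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class OnCloseOnce {
    private final AtomicBoolean called = new AtomicBoolean(false);

    // Racy version: between `called.get()` returning false and `called.set(true)`
    // running, another thread can slip in, so two callers may both pass the assert.
    //
    //     assertFalse(called.get());
    //     called.set(true);
    //
    // Atomic version: compareAndSet flips false -> true exactly once; any second
    // invocation sees `true` and the CAS fails.
    public boolean onCloseOnce() {
        return called.compareAndSet(false, true);
    }

    public static void main(String[] args) {
        OnCloseOnce listener = new OnCloseOnce();
        System.out.println(listener.onCloseOnce()); // true: first call wins
        System.out.println(listener.onCloseOnce()); // false: already called
    }
}
```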
Left lots of comments, @s1monw does it make sense to have
thanks for the comments - I think I addressed them all... Regarding
synchronized (shardLocks) {
assert waitCount > 0 : "waitCount is " + waitCount + " but should be > 0";
if (--waitCount == 0) {
shardLocks.remove(shardId);
Do we want to assert that `.remove` didn't return null here?
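For context, the release path under discussion can be sketched roughly like this: each shard maps to a lock object carrying a waitCount of interested threads, and the map entry is removed only when the last waiter releases it. This is a simplified, hypothetical model, not the actual NodeEnvironment implementation (names and the String-keyed map are stand-ins):

```java
import java.util.HashMap;
import java.util.Map;

public class ShardLockRegistry {
    private final Map<String, InternalShardLock> shardLocks = new HashMap<>();

    static class InternalShardLock {
        int waitCount = 1; // the thread that created the entry counts as a waiter
    }

    // Register interest in the shard's lock, creating the entry on first use.
    InternalShardLock acquire(String shardId) {
        synchronized (shardLocks) {
            InternalShardLock lock = shardLocks.get(shardId);
            if (lock == null) {
                lock = new InternalShardLock();
                shardLocks.put(shardId, lock);
            } else {
                lock.waitCount++;
            }
            return lock;
        }
    }

    // Drop one waiter; the last one out removes the map entry.
    void decWaitCount(String shardId) {
        synchronized (shardLocks) {
            InternalShardLock lock = shardLocks.get(shardId);
            assert lock != null && lock.waitCount > 0 : "waitCount must be > 0";
            if (--lock.waitCount == 0) {
                InternalShardLock removed = shardLocks.remove(shardId);
                // the reviewer's suggestion: assert the entry was actually present
                assert removed != null : "lock for " + shardId + " was already removed";
            }
        }
    }

    int size() {
        synchronized (shardLocks) {
            return shardLocks.size();
        }
    }

    public static void main(String[] args) {
        ShardLockRegistry registry = new ShardLockRegistry();
        registry.acquire("[index][0]");
        registry.acquire("[index][0]"); // second waiter on the same shard
        registry.decWaitCount("[index][0]");
        System.out.println(registry.size()); // 1: one waiter still holds a reference
        registry.decWaitCount("[index][0]");
        System.out.println(registry.size()); // 0: last release removes the entry
    }
}
```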
I fixed all comments - I think it's ready... refactorings can happen later; this really goes out of date quickly.
} catch (Exception ex) {
logger.debug("[{}] failed to delete index", ex, current.index());
logger.warn("[{}] failed to delete index", ex, current.index());
I have second thoughts about this warn - it might be "normal" in the case that an ongoing recovery is holding a reference to the store while the index is deleted. With the new recovery refcount and locking this is expected behavior. Should we log debug if LockObtainFailedException is caught (expected) and only warn otherwise?
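The split proposed here boils down to choosing a log level by exception type: expected lock contention (an ongoing recovery still holds the shard lock) at debug, everything else at warn. A minimal, self-contained sketch, assuming a stand-in exception class rather than the real Lucene/elasticsearch types:

```java
public class DeleteLogging {
    // stand-in for org.apache.lucene.store.LockObtainFailedException
    static class LockObtainFailedException extends Exception {}

    // Expected contention on the shard lock -> debug; real failures -> warn.
    static String logLevelFor(Exception ex) {
        return ex instanceof LockObtainFailedException ? "debug" : "warn";
    }

    public static void main(String[] args) {
        System.out.println(logLevelFor(new LockObtainFailedException())); // debug
        System.out.println(logLevelFor(new RuntimeException()));          // warn
    }
}
```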
Left two comments about log levels. LGTM otherwise. Agreed we need to push it and continue from here.
LGTM!
LGTM, I love the move to in memory locks.
…ications Today it's possible that the data directory for a single shard is used by more than one IndexShard->Store instance. When one shard is already closed but still has a concurrent recovery running while a new shard is being created, engine files can conflict and data can potentially be lost. We also remove shard data without checking if there are still users of the files or if files are still open, which can cause pending writes / flushes or the delete operation to fail. If the latter is the case the index might be treated as a dangling index and is brought back to life at a later point in time. This commit introduces a shard level lock that prevents modifications to the shard data while it's still in use. Locks are created per shard and maintained in NodeEnvironment.java. In contrast to most Java concurrency primitives those locks are not reentrant. This commit also adds infrastructure that checks if all shard locks are released after tests.
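The non-reentrancy the commit message calls out can be demonstrated with a `Semaphore(1)`, which (unlike `ReentrantLock`) refuses a second acquire even from the owning thread. This is a minimal sketch of that property, not the actual shard lock in NodeEnvironment (class and method names are illustrative):

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class NonReentrantShardLock {
    // A single-permit semaphore behaves like a non-reentrant mutex: once the
    // permit is taken, every further tryAcquire fails until release(), even
    // when it is the same thread asking.
    private final Semaphore mutex = new Semaphore(1);

    boolean tryLock(long timeoutMillis) throws InterruptedException {
        return mutex.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    void unlock() {
        mutex.release();
    }

    public static void main(String[] args) throws InterruptedException {
        NonReentrantShardLock lock = new NonReentrantShardLock();
        System.out.println(lock.tryLock(0));  // true: lock acquired
        System.out.println(lock.tryLock(10)); // false: not reentrant, same thread is refused
        lock.unlock();
    }
}
```

The practical consequence for callers is that a code path must never try to re-acquire a shard lock it already holds, or it will deadlock (or time out).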
thanks everybody for the intense & time consuming reviews; it's a pretty low level change and lots of places are involved.
elastic#8436 introduced shard level locks in order to prevent directory reuse during fast deletion & creation of indices. As part of the change, close listeners were introduced to delete the folders once all outstanding references were released. The change created race conditions causing shard folders not to be deleted (causing test failures due to left over corruption markers). This commit removes the listeners in favour of a simple timeout based solution to be used until a better listener based solution is ready (elastic#8608). Closes elastic#9009