[FLINK-11850][zk] Tolerate concurrent child deletions when deleting owned zNode #7928
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community track the review progress.
Please see the Pull Request Review Guide for a full explanation of the review process. The bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
Bot commands: the @flinkbot bot supports the following commands:
Looks good to me :-)
Do you have a reference for the supposed behavior (failing if a child is deleted)? The only related issue I could find is https://issues.apache.org/jira/browse/CURATOR-430.
I think you are right @zentol. Our curator version
@tillrohrmann isn't it an option that we bump our shaded curator version?
@tisonkun I wouldn't do this so close to the actual release. Rather, I prefer to do this after the 1.8 release to give it a bit more exposure.
1 minor comment.
@flinkbot approve all
client.delete().deletingChildrenIfNeeded().forPath("/");
zNodeDeleted = true;
} catch (KeeperException.NoNodeException ignored) {
    // concurrent delete operation. Try again.
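The snippet under review retries the delete whenever a `NoNodeException` signals that another party removed a child concurrently. A minimal self-contained sketch of that retry pattern (plain Java; `NoNodeException` and `DeleteOp` are stand-ins for the ZooKeeper exception and the Curator delete call, which need a live ensemble):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RetryDelete {

    /** Stand-in for ZooKeeper's KeeperException.NoNodeException. */
    static class NoNodeException extends Exception {}

    /** Stand-in for the Curator delete call. */
    interface DeleteOp {
        void delete() throws NoNodeException;
    }

    /**
     * Retries the delete until it succeeds, treating a NoNodeException as a
     * concurrent delete of a child and simply trying again. Returns the
     * number of attempts that were needed.
     */
    static int deleteWithRetry(DeleteOp op) {
        int attempts = 0;
        boolean zNodeDeleted = false;
        while (!zNodeDeleted) {
            attempts++;
            try {
                op.delete();
                zNodeDeleted = true;
            } catch (NoNodeException ignored) {
                // concurrent delete operation; try again
                // (the PR adds a debug-level log statement here)
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Simulate a delete that fails twice due to concurrent child deletions.
        AtomicInteger failures = new AtomicInteger(2);
        int attempts = deleteWithRetry(() -> {
            if (failures.getAndDecrement() > 0) {
                throw new NoNodeException();
            }
        });
        System.out.println("attempts=" + attempts); // prints attempts=3
    }
}
```

The loop terminates because each `NoNodeException` means progress was made by some deleter; eventually the path (or its children) no longer exists and the delete succeeds.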
We could log this on debug level, just in case.
True, will add it.
…ission

This commit changes the cleanup logic of the Dispatcher to only clean up job HA files if the job is not a duplicate (meaning that it is either running or has already been executed by the same JobMaster). This closes apache#7918.

…nd MiniCluster

The io executor is responsible for running io operations like discarding checkpoints. By using the io executor, we don't risk that the RpcService is blocked by blocking io operations. This closes apache#7924.

…wned zNode

When calling ZooKeeperHaServices#closeAndCleanupAllData it can happen that a child of the owned zNode of the ZooKeeperHaServices is concurrently deleted (e.g. a LeaderElectionService has been shut down). In order to tolerate concurrent deletions, we now use ZKPaths#deleteChildren. This closes apache#7928.
Force-pushed from f255af3 to b464df2
What is the purpose of the change
When calling ZooKeeperHaServices#closeAndCleanupAllData it can happen that a child of the owned
zNode of the ZooKeeperHaServices is concurrently deleted (e.g. a LeaderElectionService has
been shut down). In order to tolerate concurrent deletions, we now use ZKPaths#deleteChildren.
Verifying this change
ZooKeeperHaServicesTest
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (no)
Documentation