-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-30765][runtime] Aligns the LeaderElectionService.stop() contract #21742
Conversation
...time/src/main/java/org/apache/flink/runtime/leaderelection/DefaultLeaderElectionService.java
Outdated
Show resolved
Hide resolved
6266fdf
to
1768d4f
Compare
...time/src/main/java/org/apache/flink/runtime/leaderelection/DefaultLeaderElectionService.java
Outdated
Show resolved
Hide resolved
// Clear the old leader information on the external storage | ||
leaderElectionDriver.writeLeaderInformation(LeaderInformation.empty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it fine to remove this? The test states that this is done because leadership was already lost.
Does that imply that from the perspective of a TM there is always a leader (after leadership was acquired at least once), because this information is never cleared but only over-written?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
onRevokeLeadership()
is called after the leadership is lost. All LeaderElectionDriver
implementations do a check before writing data to the backend which prevents any writes if the leadership is lost:
- ZooKeeperLeaderElectionDriver:198 (but we even ignore empty leader information in ZooKeeperLeaderElectionDriver:190). Same code is available for the ZooKeeperMultipleComponentLeaderElectionDriver.java:154
- KubernetesLeaderElectionDriver:139
- KubernetesMultipleComponentLeaderElectionDriver:169
One could argue that the leadership could be regained while the operation isn't omitted, yet. Then, we would miss writing the empty leader information. But in that case, leadership will be renegotiated anyway which will trigger a write operation with the updated leader information soon after.
…ice. Signed-off-by: Matthias Pohl <matthias.pohl@aiven.io>
…stop DefaultLeaderElectionService.stop will now behave in the same way as if the leadership loss was triggered by the HA backend (iff the LeaderElectionService instance has the leadership acquired prior to the stop() call).
Thanks for the review: I squashed the commits and rebased the branch (which I realized I shouldn't do to keep the diff intact in the PR -.-). |
PR order:
What is the purpose of the change
This PR is about hardening the
LeaderElectionService.stop()
contract.The current implementations of LeaderElectionService do not implement the stop() call consistently. Some (e.g. StandaloneLeaderElectionService call revoke on the LeaderContender) whereas others don't (e.g. DefaultLeaderElectionService). The MultipleComponentLeaderElectionService does call revoke on the LeaderContender instances, though.
This PR makes the contract of the
LeaderElectionService.stop
call more consistent, i.e. aligning it more with what's happening if the leadership is revoked by the HA backend. The contract change reduces the assumptions that are made by theLeaderElectionService
implementations (e.g. that theLeaderContender
owns theLeaderElectionService
and is, therefore, responsible for its lifecycle) which losens the coupling between the two components.This should reduce noise when going ahead with refactoring the interfaces (FLIP-285).
Brief change log
LeaderElectionService.stop()
to specify the contractLeaderElectionService.stop()
implementations for the case where the individual instance still have the leadership acquiredVerifying this change
The
LeaderContender.revokeLeadership()
call was also added toTestingLeaderElectionService
to make each test rely on this contract.Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: noDocumentation