IGNITE-17056 Design for rebalance cancellation #1676
04bbc45 to
aa82437
Compare
aa82437 to
a367daf
Compare
| - On every new PD leader election, the leader must check the direct value (not the locally cached one) of the `zoneId.assignment.pending` keys, send RebalanceRequest to the needed PrimaryReplicas, and then listen for updates starting from the last revision. | ||
| - On every PrimaryReplica reelection by PD, it must send a new RebalanceRequest to the PrimaryReplica if the pending key is not empty. | ||
| - On every leader reelection (for leader-oriented protocols) inside the replication group, the leader sends a leaderElected event to the PrimaryReplica, which forces the PrimaryReplica to send RebalanceRequest to the replication group leader again. | ||
| - On every new PD leader election, the leader must check the direct value (not the locally cached one) of the `zoneId.assignment.pending`/`zoneId.assignment.cancel` keys (the last one always wins, if it exists), send `RebalanceRequest`/`CancelRebalanceRequest` to the needed PrimaryReplicas, and then listen for updates starting from the last revision of this key. |
Do these requests contain a revision? Only the old and new topologies are mentioned, as far as I can see.
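To make the "last one always wins" rule in the hunk above concrete, here is a minimal sketch of the choice a newly elected PD leader would make. It reads "the last one" as the cancel key, which is an interpretation rather than something the diff states outright. The class, enum, and method names are hypothetical, and the key values are modelled as plain `Optional<String>` rather than real meta storage entries.

```java
import java.util.Optional;

// Hypothetical sketch, not the actual Ignite API.
final class PdLeaderElectionSketch {
    // Which request the new PD leader sends to the PrimaryReplica.
    enum RequestKind { REBALANCE, CANCEL_REBALANCE, NONE }

    // pendingValue / cancelValue stand for the *direct* (non-cached) values of
    // zoneId.assignment.pending and zoneId.assignment.cancel.
    static RequestKind chooseRequest(Optional<String> pendingValue, Optional<String> cancelValue) {
        // Assumption: the cancel key always wins when it exists.
        if (cancelValue.isPresent()) {
            return RequestKind.CANCEL_REBALANCE;
        }
        // Otherwise a non-empty pending key triggers a regular rebalance request.
        if (pendingValue.isPresent()) {
            return RequestKind.REBALANCE;
        }
        // Neither key is set: nothing to send, only start listening for updates.
        return RequestKind.NONE;
    }

    public static void main(String[] args) {
        System.out.println(chooseRequest(Optional.of("pending"), Optional.of("cancel")));
        System.out.println(chooseRequest(Optional.of("pending"), Optional.empty()));
        System.out.println(chooseRequest(Optional.empty(), Optional.empty()));
    }
}
```

The sketch deliberately leaves out the revision, which is exactly what the question above asks about.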
| When PrimaryReplica sends `CancelRebalanceRequest(oldTopology, newTopology)` to the ReplicationGroup, the following cases are possible: | ||
| - The replication group has an ongoing rebalance oldTopology->newTopology. It must be cancelled, and the configuration state of the replication group must be cleaned up back to oldTopology. | ||
| - The replication group has no ongoing rebalance and currentTopology==oldTopology. There is nothing to cancel, so return a success response. | ||
| - The replication group has no ongoing rebalance and currentTopology==newTopology. The cancel request cannot be executed, so return a response saying so. The final recipient of this response (the placement driver) must log this fact and run the same routine as for a usual rebalanceDone. |
What if, after sending CancelRebalanceRequest, the placement driver finds out that some of the replication groups have finished the rebalance (currentTopology==newTopology) and some have not?
How is that possible, if any rebalance touches only one distribution zone and therefore only one replication group?
It seems I missed this point, my fault. Question withdrawn.
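For reference, the three `CancelRebalanceRequest(oldTopology, newTopology)` cases from the hunk discussed in this thread can be sketched as a single decision function on the replication group side. This is an illustration under stated assumptions, not the proposed implementation: the type and method names are hypothetical, topologies are modelled as sets of node names, and the behaviour for a topology that matches neither argument is not covered by the design text, so the sketch simply throws.

```java
import java.util.Set;

// Hypothetical sketch, not the actual Ignite API.
final class CancelRebalanceSketch {
    enum CancelOutcome {
        CANCELLED,          // ongoing rebalance cancelled, configuration rolled back to oldTopology
        NOTHING_TO_CANCEL,  // no rebalance, group already at oldTopology: success response
        ALREADY_REBALANCED  // no rebalance, group already at newTopology: cancel cannot be executed
    }

    static CancelOutcome handleCancel(
            boolean rebalanceInProgress,
            Set<String> currentTopology,
            Set<String> oldTopology,
            Set<String> newTopology) {
        if (rebalanceInProgress) {
            // Case 1: cancel the oldTopology->newTopology rebalance and clean up.
            return CancelOutcome.CANCELLED;
        }
        if (currentTopology.equals(oldTopology)) {
            // Case 2: nothing to cancel.
            return CancelOutcome.NOTHING_TO_CANCEL;
        }
        if (currentTopology.equals(newTopology)) {
            // Case 3: the placement driver should log this and run its usual rebalanceDone routine.
            return CancelOutcome.ALREADY_REBALANCED;
        }
        // Assumption: this situation is not described in the design text.
        throw new IllegalStateException("Current topology matches neither old nor new topology");
    }

    public static void main(String[] args) {
        Set<String> oldT = Set.of("A", "B");
        Set<String> newT = Set.of("A", "C");
        System.out.println(handleCancel(true, oldT, oldT, newT));
        System.out.println(handleCancel(false, oldT, oldT, newT));
        System.out.println(handleCancel(false, newT, oldT, newT));
    }
}
```

Because a rebalance touches a single replication group, as noted above, the placement driver receives exactly one of these outcomes per cancel request.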
Co-authored-by: Denis Chudov <moonglloom@gmail.com>
…pache#1676 Signed-off-by: Slava Koptilin <slava.koptilin@gmail.com>
https://issues.apache.org/jira/browse/IGNITE-17056