IGNITE-17056 Design for rebalance cancellation #1676

kgusakov · 2023-02-15T13:37:26Z

https://issues.apache.org/jira/browse/IGNITE-17056

denis-chudov · 2023-02-20T09:58:29Z

modules/distribution-zones/tech-notes/rebalance.md

- On every new PD leader election, the new leader must check the direct value (not the locally cached one) of the `zoneId.assignment.pending` keys, send RebalanceRequest to the required PrimaryReplicas, and then listen for updates starting from the last revision.
- On every PrimaryReplica reelection, PD must send a new RebalanceRequest to the PrimaryReplica if the pending key is not empty.
- On every leader reelection inside the replication group (for leader-oriented protocols), the leader sends a leaderElected event to the PrimaryReplica, which forces the PrimaryReplica to send the RebalanceRequest to the replication group leader again.
+- On every new PD leader election, the new leader must check the direct value (not the locally cached one) of the `zoneId.assignment.pending`/`zoneId.assignment.cancel` keys (the latter always wins, if it exists), send `RebalanceRequest`/`CancelRebalanceRequest` to the required PrimaryReplicas, and then listen for updates starting from the last revision of this key.


Do these requests contain a revision? As far as I can see, only the old and new topologies are mentioned.
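
To make the "cancel wins" rule from the quoted line concrete, here is a minimal sketch. All names (`LeaderElectionSketch`, `chooseRequest`, the `Request` enum) are hypothetical and not taken from the Ignite code base; the key values are modelled as plain `Optional<String>`. Whether the real requests carry a revision is exactly the question raised in this comment, so the sketch leaves revisions out.

```java
import java.util.Optional;

// Hypothetical sketch: which request a newly elected PD leader would send
// after reading the direct (non-cached) values of the two keys.
final class LeaderElectionSketch {
    enum Request { REBALANCE, CANCEL_REBALANCE, NONE }

    static Request chooseRequest(Optional<String> pending, Optional<String> cancel) {
        // The cancel key always wins if it exists.
        if (cancel.isPresent()) {
            return Request.CANCEL_REBALANCE;
        }
        // Otherwise a non-empty pending key triggers a normal rebalance request.
        if (pending.isPresent()) {
            return Request.REBALANCE;
        }
        // Neither key is set: nothing to send.
        return Request.NONE;
    }
}
```

For example, when both keys are present the sketch selects `CANCEL_REBALANCE`, which is the precedence the quoted line describes.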

denis-chudov · 2023-02-20T10:04:33Z

modules/distribution-zones/tech-notes/rebalance.md

+When the PrimaryReplica sends `CancelRebalanceRequest(oldTopology, newTopology)` to the ReplicationGroup, the following cases are possible:
+- The replication group has an ongoing rebalance oldTopology->newTopology. It must be cancelled, and the configuration state of the replication group must be cleaned up back to oldTopology.
+- The replication group has no ongoing rebalance and currentTopology==oldTopology. There is nothing to cancel, so a success response is returned.
+- The replication group has no ongoing rebalance and currentTopology==newTopology. The cancel request can't be executed, so a response saying so is returned. The final recipient of this response (the placement driver) must log this fact and run the same routine as for a usual rebalanceDone.


What if, after sending CancelRebalanceRequest, the placement driver finds out that some of the replication groups have finished the rebalance (currentTopology==newTopology) and some have not?

kgusakov · 2023-02-21

How is that possible if any rebalance touches only one distribution zone and, therefore, only one replication group?

denis-chudov · 2023-02-21

It seems I missed this point, my mistake. Question withdrawn.
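
The three cases from the quoted diff in this thread can be read as a small decision function. The sketch below is only an illustration under stated assumptions: the names (`CancelRebalanceSketch`, `onCancel`, `Outcome`) are hypothetical, topologies are modelled as sets of node names, and the ongoing rebalance is reduced to a boolean flag. The diff does not say what happens when the current topology matches neither argument, so the sketch treats that as an error.

```java
import java.util.Set;

// Hypothetical sketch of how a replication group could classify a
// CancelRebalanceRequest(oldTopology, newTopology).
final class CancelRebalanceSketch {
    enum Outcome { CANCELLED, NOTHING_TO_CANCEL, ALREADY_REBALANCED }

    static Outcome onCancel(boolean rebalanceInProgress,
                            Set<String> currentTopology,
                            Set<String> oldTopology,
                            Set<String> newTopology) {
        if (rebalanceInProgress) {
            // Ongoing oldTopology->newTopology rebalance: cancel it and
            // clean the configuration state up back to oldTopology.
            return Outcome.CANCELLED;
        }
        if (currentTopology.equals(oldTopology)) {
            // No rebalance is running and nothing has changed: success.
            return Outcome.NOTHING_TO_CANCEL;
        }
        if (currentTopology.equals(newTopology)) {
            // The rebalance has already finished: the placement driver logs
            // this and follows the usual rebalanceDone routine.
            return Outcome.ALREADY_REBALANCED;
        }
        // Assumption: this state is not covered by the design note.
        throw new IllegalStateException("Current topology matches neither old nor new topology");
    }
}
```

Since a rebalance touches only one replication group, as clarified in this thread, a single `Outcome` per request is enough and no aggregation across groups is needed.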

modules/distribution-zones/tech-notes/src/cancelRebalance.puml

Co-authored-by: Denis Chudov <moonglloom@gmail.com>

modules/distribution-zones/tech-notes/rebalance.md

Co-authored-by: Denis Chudov <moonglloom@gmail.com>

kgusakov force-pushed the ignite-17056 branch from 04bbc45 to aa82437 on February 16, 2023 13:48

IGNITE-17056 Design rebalance cancel mechanism

a367daf

kgusakov force-pushed the ignite-17056 branch from aa82437 to a367daf on February 16, 2023 22:53

kgusakov marked this pull request as ready for review February 16, 2023 22:54

denis-chudov suggested changes Feb 20, 2023

kgusakov and others added 5 commits February 21, 2023 14:22

Apply suggestions from code review

043c790

Co-authored-by: Denis Chudov <moonglloom@gmail.com>

fix PR comments

a2631d8

Fix comments

249787f

Merge branch 'main' into ignite-17056

765ad2e

fix comments

6e292c3

denis-chudov suggested changes Feb 22, 2023


Apply suggestions from code review

daf06ad

Co-authored-by: Denis Chudov <moonglloom@gmail.com>

denis-chudov approved these changes Feb 22, 2023

asfgit closed this in c276a33 Feb 22, 2023

lowka pushed a commit to gridgain/apache-ignite-3 that referenced this pull request Mar 18, 2023

IGNITE-17056 Added design documents for rebalance cancellation. Fixes a…

2692d20

…pache#1676 Signed-off-by: Slava Koptilin <slava.koptilin@gmail.com>

lowka pushed a commit to gridgain/apache-ignite-3 that referenced this pull request Apr 19, 2023

IGNITE-17056 Added design documents for rebalance cancellation. Fixes a…

1cc2402

…pache#1676 Signed-off-by: Slava Koptilin <slava.koptilin@gmail.com>
