Voldemort Rebalancing
The Voldemort Rebalancing project aims to provide the capability to add or delete nodes and to move data around in a running Voldemort cluster. It is meant to handle the following scenarios without downtime and with minimal impact on online cluster performance:
- Dynamic addition of new nodes to the cluster.
- Deletion of nodes from cluster.
- Load balancing of data inside a cluster.
These are the requirements for the Voldemort rebalancing project:
- No downtime
- No functional impact on client
- Maintain data consistency guarantees while rebalancing.
- Minimal and tunable performance impact on the cluster.
- Push-button user interface.
These are the main assumptions and design choices:
- Only one rebalancing process should be running on the cluster at any one time.
- The key-to-partition mapping is not changed during rebalancing. Instead, partition ownership (the node-to-partition mapping) is changed, and entire partitions are migrated across nodes.
- The user needs to create and specify the targetCluster.xml (the final cluster configuration they want).
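The difference between the two mappings can be shown with a small sketch. The hash function, the partition count, and the layouts below are illustrative assumptions, not Voldemort's actual routing code. The point is only that a key always lands in the same partition, while the node that owns that partition can change.

```python
import zlib

NUM_PARTITIONS = 8  # assumed value; fixed for the lifetime of the cluster


def partition_for_key(key: bytes) -> int:
    """Key-to-partition mapping: depends only on the key, never on the nodes."""
    return zlib.crc32(key) % NUM_PARTITIONS


def owner_of(partition: int, ownership: dict) -> int:
    """Node-to-partition mapping: the only thing rebalancing rewrites."""
    for node_id, partitions in ownership.items():
        if partition in partitions:
            return node_id
    raise KeyError(f"partition {partition} has no owner")


# Hypothetical layouts: node 2 is new and takes over partitions 3 and 7.
current = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
target = {0: [0, 1, 2], 1: [4, 5, 6], 2: [3, 7]}

key = b"user:42"
partition = partition_for_key(key)  # identical before and after rebalancing
print(partition, owner_of(partition, current), owner_of(partition, target))
```

Because whole partitions move, a client only needs the new node-to-partition table to find a key after rebalancing; no key has to be rehashed.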
The rebalancing process has three participants:
- Rebalance Controller: Initiates rebalancing on the cluster by providing a target cluster.xml. It can be started on any node that can make socket calls to all nodes in the cluster.
- Stealer Nodes: The nodes that receive new partitions.
  - When adding new nodes, the new nodes will be the stealer nodes.
  - When doing data load balancing, some nodes from the cluster itself will act as stealer nodes.
- Donor Nodes: The nodes from which the data is copied.
- Step 0: Make sure the cluster is not already rebalancing. The user is responsible for making sure only one rebalancing controller is trying to rebalance at a given time.
- Step 1: Create a target cluster.xml.
  - If adding new nodes, add the new nodes to cluster.xml with the intended partition ownerships.
  - If doing data load balancing, just change the intended partition ownerships.
- Step 2: Check that all the nodes in the cluster are up and running at the ports specified in the target cluster.xml.
  - If adding new nodes to the cluster, start them with a cluster.xml with an empty partition list.
- Step 3: Start a Rebalance Controller.
  - Specify a bootstrap URL. This should point to a node already in the cluster, not to a new node.
  - Specify a maxParallelRebalancing number. This specifies how many stealerNodes will attempt rebalancing and partition migration in parallel. Note that one stealerNode steals from only one donor node at a time. This number can help control the performance impact when adding a large number of nodes to the cluster.
  - Specify whether the stealerNodes should actually delete data from the donorNode after rebalancing succeeds. The default behavior is not to delete data, as a precaution.
- Step 4: Monitor the output on the RebalanceController shell, which should show the rebalancing steps and success/failure information.
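Before starting the controller, it can help to sanity-check the target layout. The sketch below is an assumption about what such a check might look like, not a tool shipped with Voldemort. Layouts are modeled as plain `{node_id: [partitions]}` dictionaries rather than parsed cluster.xml files, and the function names are hypothetical.

```python
def all_partitions(layout: dict) -> list:
    """Flatten a {node_id: [partition, ...]} layout into one list."""
    return [p for parts in layout.values() for p in parts]


def validate_target(current: dict, target: dict) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    target_parts = all_partitions(target)
    if len(target_parts) != len(set(target_parts)):
        problems.append("a partition is assigned to more than one node")
    if set(target_parts) != set(all_partitions(current)):
        problems.append("target does not cover exactly the current partitions")
    return problems


def startup_layout(current: dict, target: dict) -> dict:
    """Layout before any data moves: new nodes join owning no partitions."""
    layout = {node: list(parts) for node, parts in current.items()}
    for node in target:
        layout.setdefault(node, [])  # only adds nodes missing from current
    return layout


# Hypothetical example: node 2 is being added.
current = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
target = {0: [0, 1, 2], 1: [4, 5, 6], 2: [3, 7]}

print(validate_target(current, target))
print(startup_layout(current, target))
```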
The Rebalance Controller's job is to compute what changes need to be made in the cluster, initiate the rebalancing, and then wait for its completion.
- Step 0: Get the current cluster state from the cluster using the bootstrap URL.
- Step 1: Compare it with the provided targetCluster.xml and add all new nodes to currentCluster.xml.
- Step 2: Propagate the currentCluster.xml to all nodes.
- Step 3: Create a rebalancing cluster plan based on the desired target cluster.xml and the current state of the cluster.
- Step 4: In parallel (up to maxParallelRebalancing number):
  - Poll the queue (the RebalancingNodePlan task queue) from the rebalancing cluster plan and get the rebalancing info for one stealer node.
  - Pick one donorNode from the RebalancingNodePlan list.
  - Get the latest cluster.xml from the cluster.
  - Create a new, updated cluster.xml with the partition ownership change from donorNode to stealerNode for the desired partitions.
  - Send a rebalance request to the stealer node and save the returned rebalance async Id.
  - Commit the new cluster.xml changes on the cluster. If the commit fails, revert all nodes back to the current cluster.xml.
  - Wait for completion of the rebalance process using the rebalance async Id.
  - Mark success or failure accordingly.
  - Poll the queue again for the next stealer node and repeat.
- The RebalancingController is done when the entire plan has been attempted.
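The controller steps above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual controller code. Layouts are plain `{node_id: [partitions]}` dictionaries, `build_plan` and `run_controller` are hypothetical names, and the `rebalance` callback stands in for the whole "send request, commit cluster.xml, wait on the async Id" sequence.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def build_plan(current: dict, target: dict) -> dict:
    """Map each stealer node to {donor node: [partitions to move]}."""
    owner = {p: node for node, parts in current.items() for p in parts}
    plan = {}
    for stealer, parts in target.items():
        for p in parts:
            donor = owner[p]
            if donor != stealer:
                plan.setdefault(stealer, {}).setdefault(donor, []).append(p)
    return plan


def run_controller(current, target, max_parallel, rebalance):
    """Drain the per-stealer task queue with at most max_parallel workers."""
    tasks = queue.Queue()
    for stealer, donors in build_plan(current, target).items():
        tasks.put((stealer, donors))

    results = {}

    def worker():
        while True:
            try:
                stealer, donors = tasks.get_nowait()
            except queue.Empty:
                return  # the entire plan has been attempted
            # One stealer steals from only one donor at a time.
            for donor, partitions in donors.items():
                try:
                    rebalance(stealer, donor, partitions)
                    results[(stealer, donor)] = "success"
                except Exception as exc:
                    results[(stealer, donor)] = f"failed: {exc}"

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        for _ in range(max_parallel):
            pool.submit(worker)
    return results


# Hypothetical example: node 2 steals partition 3 from node 0 and 7 from node 1.
current = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
target = {0: [0, 1, 2], 1: [4, 5, 6], 2: [3, 7]}
moves = []
results = run_controller(current, target, max_parallel=2,
                         rebalance=lambda s, d, parts: moves.append((s, d, parts)))
print(build_plan(current, target))
print(results)
```

Because the plan is always derived from the cluster's current state, computing it against a layout that already matches the target yields an empty plan, so rerunning the controller does nothing.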
The stealerNode is responsible for doing the actual rebalancing and handling all failure scenarios.
- Step 1: Receive a rebalanceNode request through the AdminClient.
- Step 2: Check the local state to confirm that the node is not already rebalancing another partition set.
- Step 3: If it is already rebalancing, fail with an error message.
- Step 4: Start the rebalancing process as an async operation and return the async Id to the caller.
- Step 5: Monitor the async fetch operation internally and keep updating progress, success, or failure.
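The stealer-side behavior can be modeled as a small state machine. The class below is a sketch under assumptions, not Voldemort's server code. `StealerNode`, `rebalance_node`, `status`, and `wait` are hypothetical names, and `fetch` is a placeholder for whatever copies partition data from the donor.

```python
import itertools
import threading


class StealerNode:
    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable: fetch(donor, partitions)
        self._lock = threading.Lock()
        self._busy = False
        self._ids = itertools.count(1)
        self._status = {}
        self._threads = {}

    def rebalance_node(self, donor, partitions):
        """Start an async rebalance and return its id, or fail if one is running."""
        with self._lock:
            if self._busy:
                raise RuntimeError("already rebalancing another partition set")
            self._busy = True
            async_id = next(self._ids)
            self._status[async_id] = "running"
        thread = threading.Thread(target=self._run,
                                  args=(async_id, donor, partitions))
        self._threads[async_id] = thread
        thread.start()
        return async_id

    def _run(self, async_id, donor, partitions):
        try:
            self._fetch(donor, partitions)
            outcome = "success"
        except Exception as exc:
            outcome = f"failed: {exc}"
        with self._lock:
            self._status[async_id] = outcome
            self._busy = False

    def status(self, async_id):
        """What a caller polling with the async id would see."""
        with self._lock:
            return self._status[async_id]

    def wait(self, async_id):
        """Block until the operation finishes, then return its final status."""
        self._threads[async_id].join()
        return self.status(async_id)
```

A caller would typically keep the returned id and poll `status` (or block on `wait`) rather than hold a connection open for the whole transfer.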
The donorNode has no knowledge of rebalancing at all and keeps behaving normally.
If the rebalance client log shows no exceptions or warnings, everything went smoothly and you have a newly rebalanced cluster. The trouble spots are when bad things happen and how to correct them if needed. These are the failure scenarios we have considered in the design:
- RebalancingController dies: The rebalancing controller can time out or die while rebalancing is happening. Since the actual process is still running as an async operation on the stealer node, it is not affected.
  - Restarting the RebalanceController with the same targetCluster.xml will restart the remaining rebalancing transfers.
  - The currentCluster configuration is read from the cluster each time before starting a new rebalancing, so supplying an already rebalanced cluster.xml is a no-op.
- StealerNode/DonorNode dies: The stealer node persists its rebalancing state before starting the actual data transfers. If the donor node fails, or the stealer node itself fails during rebalancing, it will wait for some time and try again until it succeeds or the maximum number of attempts is reached (the tunable settings are in VoldemortConfig).
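The persist-then-retry behavior described for the stealer node might look like the sketch below. Everything here is an assumption for illustration: the JSON state file, the `max_attempts` and `wait_seconds` parameters, and the `fetch` callback are hypothetical and do not correspond to named VoldemortConfig settings.

```python
import json
import os
import tempfile
import time


def rebalance_with_retry(state_path, donor, partitions, fetch,
                         max_attempts=3, wait_seconds=0.0):
    """Persist the intent first, then retry the transfer until it succeeds."""
    state = {"donor": donor, "partitions": partitions, "attempts": 0}
    while state["attempts"] < max_attempts:
        state["attempts"] += 1
        # Written before any data moves, so a restart can see unfinished work.
        with open(state_path, "w") as f:
            json.dump(state, f)
        try:
            fetch(donor, partitions)
        except Exception:
            time.sleep(wait_seconds)  # back off, then try again
            continue
        os.remove(state_path)  # nothing left to resume
        return True
    return False  # state file is kept for inspection or a later resume


# Hypothetical donor that is unreachable for the first two attempts.
calls = []


def flaky_fetch(donor, partitions):
    calls.append(donor)
    if len(calls) < 3:
        raise IOError("donor unreachable")


state_file = os.path.join(tempfile.mkdtemp(), "rebalance_state.json")
ok = rebalance_with_retry(state_file, donor=0, partitions=[3], fetch=flaky_fetch)
print(ok, len(calls))
```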