forked from voldemort/voldemort
-
Notifications
You must be signed in to change notification settings - Fork 2
Voldemort Rebalancing
bbansal edited this page Aug 17, 2010
·
32 revisions
Voldemort rebalancing is the numero uno request for Project Voldemort users for a long time, here is the current plan.
Voldemort Rebalancing need to handle few things
- Load balancing of data
- New server should get equal share from all other servers in cluster.
- Should not impact online serving
- rebalancing will run in background and should be throttled.
- Rebalancing logic need to handle get()/put()/delete() while doing rebalancing, We are thinking of a “proxy server” model to solve this problem
- for both get()/put() external requests
- New server makes a background redirectGet() call
- does a local put() ignoring any ObsoleteVersionExceptions
- Serve the original get()/put() request as normal.
- for delete() request
- we need to delete it from local store and make sure we dont add it back due to rebalancing.
- we should delete it locally and send back an exception
- for both get()/put() external requests
- Restrictions
- Allow only one rebalancing at one time to keep things simple for now.

- Get the rebalancing permit from the cluster.
- Attempt to set CLUSTER_STATE_KEY to REBALANCING_CLUSTER on all nodes.
- Fail if not successful on atleast Majority nodes.
- metadataStore need to throw exception if CLUSTER_STATE is already REBALANCING_CLUSTER.
- If succeed in getting permit
- Set State of new/rebalancing node as REBALANCING_MASTER_SERVER
- Set the list of all other nodes as REBALANCING_SLAVES_LIST_KEY
- While REBALANCING_SLAVES_LIST_KEY is not empty
- Choose a node at Random from REBALANCING_SLAVES_LIST_KEY
- set state on node as REBALANCING_SLAVE_SERVER
- set partition list to be stolen/donated to the remote node as REBALANCING_PARTITIONS_LIST_KEY.
- start fetching/updating
- if success
- remove node from REBALANCING_SLAVES_LIST_KEY
- if failed
- Increment failed count for node and continue.
- if failed more than ‘x’ times throw error in log and remove node from list.
- repeat
- Set the state of REBALANCING_MASTER_SERVER back to Normal State
- Set the cluster State back to NORMAL_CLUSTER.
| Task | Details | Issue/Ticket no. | primary contact | Status |
| Proxy serving | * Add a new RedirectStore() layer to handle proxy serving ** get()/put()/delete() calls |
Bhupesh Bansal | Done | |
| InvalidMetadataRequest | All servers at all times throws an InvalidMetadataException() if they are requested with a key not belonging to them based on current partitioning and routing strategy |
Done | ||
| Gossip protocol | Servers gossip with each other to keep in sync on * cluster.xml * stores.xml * CLUSTER_STATE |
Not started | ||
| Streaming protocol | For migrating partitions we need to support streaming based protocol. * A new AdminClient and AdminServer was added to Voldemort to support streaming and other operations needed by Rebalancing. |
Done | ||
| Rebalancing Process Design | Implement the full rebalancing logic | Partially done | ||
| Rebalancing user interface | Not Started |