-
Notifications
You must be signed in to change notification settings - Fork 2
Hinted Handoff
The purpose of Hinted Handoff is to provide additional consistency guarantees, allowing consistency to be reached before (or without) read-repair taking place on the key. The idea is to allow handle transient failures and partitions in situations where perform read repair at the end of the get pipeline is expensive e.g., across data centers or where quorums are otherwise unwanted.
An experimentation implementation now exists in my private branch:
http://github.com/afeinberg/voldemort/compare/hinted-handoff-rebase
- Any single put or delete to a node (whether a part of a quorum or not) fails due to a non-application exception i.e., timeout, machine unreacheable
- At the end of the put (or delete) pipeline, a list of nodes to which requests have failed is kept.
- For each failed node in the list of failed nodes, randomly choose a live node elsewhere in the pipeline and perform a synchronous put to the slop store on that node, retaining the original vector clock. A slop store is a separate store that is not used for live quorums, but merely stores these hints. Each hint consist of: key, version (vector clock), original node, time handoff occured, operation (put or a delete) and, if the operation is a put, the value. The hints are serialized using protocol buffers.
- If ultimately the put or delete is a failure (required-writes not achieved), still return an exception to the client but if previous step suceeded specify, within the exception explanation string that a handoff has occurred
- Periodically, the nodes holding the hints should attempt to write the original key and value to the original node that failed in the first place. One a put or delete is confirmed as having occured, we can delete the hint
Note, that this is a deviation from the original Dynamo paper. There are no sloppy quorums for reads, if required-writes aren’t met by a strict quorum the request is still considered failed (even if hinted handoff succeeds) and hints are written to random nodes rather than to neighbours to avoid cascading failures.
There are several things to test out for Hinted Handoff:
- Correct semantics, tested through unit tests. This has been implemented and passes
- Transient (a few hours) failures on a real cluster, under realistic load are correctly handed off and restored. Start a cluster, start doing a mixed read/write workload (with known values written to known keys), kill a node in the cluster. Keep it down for an hour, bring the node back.
- Verify that the values written can now be read (semantic correctness)
- There should be no significant performance impact on the client
- There is no significant impact on performance of the live nodes (ones being handed off to) in the cluster
- When a node is returned and a push of the hints occurs, the pushes shouldn’t cause a disrupt to ongoing activity (puts, gets)
- Longer term (a whole day) failures should not result in unexpected disk fill up or disruptive hand off push operations upon the return of the node i.e., a returning node shouldn’t get knocked out by the load (might throttling of the push jobs be needed).
- Performance of the node handing off the hints should also not be disrupted
- Failures of multiple nodes (for hours) should also not cause a disruption in normal activity (scalability of handoff)
- Hints should only be handed off to nodes within the same zone. This is simple logic change and should be implemented once the existing code has been verified to be correct and safe.
- Hinted handoff should be able to handle multi-hour (and if possible, multi-day) network partitions
- There should be some operational visibility into the slop store e.g., ability to see how many hints remain, ability to clear out hints