Hinted Handoff

What is it

The purpose of Hinted Handoff is to provide additional an consistency mechanism, allowing consistency to be reached before (or without) read-repair taking place on the key. The idea is to handle transient failures and partitions in situations where performing read repair at the end of the get pipeline is expensive (e.g., across data centers) or where quorums are otherwise unwanted.

An implementation based on the pipelined routed store (recently merged to voldemort/master) now exists in a separate branch:

http://github.com/afeinberg/voldemort/compare/hinted-handoff-rebase

How it works

Any single put or delete to a node (whether a part of a quorum or not) fails due to a non-application exception i.e., timeout, machine unreacheable
At the end of the put (or delete) pipeline (pipeline is kept per request to a key), a list of nodes to which requests have failed is kept.
For each failed node in the list of failed nodes, randomly choose a live node elsewhere in the cluster and perform a synchronous put to the slop store on that node, retaining the original vector clock. A slop store is a separate store that is not used for live quorums, but merely stores these hints. Each hint consist of: key, version (vector clock), original node, time handoff occurred, operation (put or a delete) and, if the operation is a put, the value. The hints are serialized using protocol buffers.
If ultimately the put or delete is a failure (required-writes not achieved), we still return an exception to the client; if previous step succeeded, specify within the exception explanation string that a handoff has occurred
Periodically the nodes holding the hints should attempt to write the original key and value (or a request to delete the specified version) to the original (once failed) node. Once a put or delete is confirmed as having succeeded to the original node, we can delete the hint

Note that this is a deviation from the original Dynamo paper: there are no sloppy quorums for reads; if required-writes aren’t met by a strict quorum, the request is still considered failed (even if hinted handoff succeeds) and hints are written to random nodes rather than to neighbours in the ring to avoid cascading failures.

Test plan

There are several things to test out for Hinted Handoff:

Correct semantics, tested through unit tests. This has been implemented and passes.
1. Unmocked end-to-end, tests, perhaps part of a regression. They could run on EC2
Transient (a few hours) failures on a real cluster, under realistic load are correctly handed off and restored. Start a cluster, start doing a mixed read/write workload (with known values written to known keys), kill a node in the cluster. Keep it down for an hour, bring the node back.
1. The values written can should now be read (semantic correctness)
2. There should be no significant performance impact on the client
3. There is no significant impact on performance of the live nodes (ones being handed off to) in the cluster
4. When a node is returned and a push of the hints occurs, the pushes shouldn’t cause a disruption to normal activity (puts, gets)
Longer term (e.g., whole day) failures should not result in disruptive hint push operations upon the return of the node i.e., a returning node shouldn’t get knocked out by the load (might require throttling of the push jobs be needed).
1. Performance of the node pushing the hints should also not be disrupted
2. As hints accumulate on the nodes and aren’t flushed, performance of the nodes holding the hints shouldn’t degrade
Failures of multiple nodes (for hours) should also not cause a disruption in normal activity (scalability of handoff)

Additional implementation work

Hinted Handoff is extremely useful when dealing with a multiple datacenter environment. However, work remains to make this feasible.

Hints should only be handed off to nodes within the same zone. This is a simple logic change and should be implemented once the existing code has been verified to be correct and safe.
Hinted handoff should be able to handle multi-hour (and if possible, multi-day) network partitions
There should be some operational visibility into the slop store e.g., ability to see how many hints remain, ability to clear out hints, ability to schedule a push of hints early, set the throttling rate, observe the pushes, etc…

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hinted Handoff

What is it

How it works

Test plan

Additional implementation work

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally