Implementation available at https://github.com/yahoo/bftkv
Reliable data storage is one of the fundamental problems.
BFTKV uses b-masking quorum based
write operations to
ensure Byzantine fault-tolerance and GPG's Web of Trust
mechanism to build trust relationships between entities. Trust relationships are used to build quorums.
Moreover, BFTKV provides the following guarantees:
- Value corresponding to the key is up to date and not forged
- Entities trying to deceive users will be revoked immediately
- Entities can join and leave the system dynamically
- Communications between entities are encrypted using public keys
BFTKV leverages integration of three concepts to provide a Byzantine fault tolerant distributed key-value storage:
- Byzantine Quorum Systems
- Web of Trust
- Quorum Certificate
In this document, we will first describe PGP's Web of Trust mechanism and then build the other concepts on top of it.
Web of Trust
Web of Trust is a way of building trust between entities without a central authority, unlike Public Key Infrastructure (PKI). Trust is established by signing public keys (implies the signer trusts the owner of the signed public key). A Web of Trust is created by exchanging signed keys between entities.
Trust relationships can be represented with a graph, such as this:
The graph can be transcribed as:
Alice trusts Bob and Erin Bob trusts Erin and Dave Carol trusts Bob and Erin Erin trusts Alice, Carol and Dave
Web of Trust mechanism plays a huge role in BFTKV's quorum selection mechanism.
Byzantine Quorum Systems
In a network system, servers might be inaccessible or return wrong/not up to date data. Byzantine failure refers to both of these failures and the naming is based on The Byzantine General's Problem.
b-masking quorums to tolerate Byzantine failures where
b is the number of failure nodes.
b-masking quorums are due to Malkhi and Reiter (Paper).
Castro and Liskov (Paper) introduced
a Byzantine fault tolerant replication mechanism that expects
from the servers to verify the data, where
f is the number of faulty nodes. We use the Web of Trust mechanism to specify
the nodes that their responses will be accepted by a quorum member.
The following parts of this document deals with how previously discussed concepts are used in BFTKV.
Quorum selection is based on the trust graph built using the Web of Trust mechanism. BFTKV, usually, chooses the maximal cliques that are
L hops away from the clients. For example,
Client1 has two cliques: Clique 1 and Clique 2 where
Client2 has two cliques: Clique 2 and Clique 3 where
Write procedure saves a value associated with a key in the system. A high-level pseudocode for the procedure:
- Choose a quorum.
- Get times for the key from quorum members.
- Pick a new time that is higher than the maximum time returned by the quorum members.
- Request and gather signatures from quorum members for new value for the key with the new timestamp.
- Choose another quorum that includes the first quorum.
- Write the key, value and signature set to the new quorum members.
read procedure reads a value associated with a key in the system. A high level pseudocode for the procedure:
- Choose a quorum.
- Collect values associated with the key
- Revoke signers who signed different values with the same timestamp.
- Return value having signatures more than the number of faulty nodes and has the maximum timestamp.
Design Decisions and Security Analysis
In this section, we will go over somewhat unclear points in the chapters we discussed
- The quorum
Qchosen here should have the property
|Q| >= 3b + 1where
bis the number of faulty nodes. This is required for Byzantine fault tolerance since
fnodes may be inaccessible and
fnodes may be returning a previous value for the key. The remaining honest
f + 1nodes will keep the system in a safe condition. As
Q, BFTKV uses a maximal clique that a client is connected to in the trust graph.
- Number of timestamps should be greater than or equal to
2b + 1since we will tolerate
- The number of signatures
mshould be greater than
b + (n - b) / 2. Please see the security analysis for details.
- All nodes may be chosen which is BFTKV's current strategy.
- Before writing, each server verifies the signature, checks the number of valid signatures gathered from quorum members and accepts write if the number
is greater than
b + (n - b) / 2. Moreover, every server makes sure that they haven't signed this key with the same timestamp before.
writeoperation succeeds if the received acks from server is greater than
2f + 1.
- BFTKV chooses a random quorum
Qthat has the property
|Q| > b + (n - b) / 2.
- Collect pairs up to
2f + 1. The reason is the same with
writeoperation second explanation.
- A server should not sign the same key, same timestamp and a different value. This is equivocation attack. It is very important that servers revoke these nodes at this phase for the system to survive.
- To return a value for the key, the value should have at least
b + 1signatures, which guarantees that the value is valid, and have the maximum timestamp.
Equivocation Attack: An adversary can try to create two different views of a quorum by trying to store different values for a key in half of the quorum and
another value in the other half (Half is the best option from an attacker's perspective if he wants to succeed). With the help of the
b faulty nodes, the basic check for
b + 1 signatures will succeed. However, if the quorum size is chosen carefully, this can be prevented.
Consider the node states below (
n = the number of nodes in the quorum,
b = faulty nodes in the quorum):
Maximum number of signatures an attacker can get is
b + (n - b) / 2. To make sure that the majority has the correct value
n - b > b + (n - b) / 2 should hold. Therefore
should be greater than
Detecting Equivocation on Read
Fp define the failure probability of an adversary (i.e., he won't be detected);
H2 honest node sets,
F the set of faulty nodes and
N = H1 U H2 U F.
Then the nodes chosen from the quorum
Q should be either from
H1 U F or
H2 U F to prevent detection. This probability is
Fp ~= 1 - ((|F| + |N|) / 2|N|)^|Q|
In a reqular quorum system after if the number of faulty nodes exceed
n/3 trust to data drops down to 0. BFTKV can keep the adversary's failure probability
close to 1 for more than
f failing nodes. Below two graphs represent this: