aip | title | author | discussions-to (*optional) | Status | last-call-end-date (*optional) | type | created |
---|---|---|---|---|---|---|---|
89 |
Consensus Latency Reduction using Order Votes |
Satya Vusirikala, Daniel Xiang, Zekun Li |
Draft |
06/25/2024 |
Core |
06/13/2024 |
We define the consensus latency to be the time taken for validators to order a block proposal under the common case. This AIP proposes a solution to reduce the consensus latency from 4 hops to 3 hops by introducing a novel protocol change that involves a new type of votes called “order votes”. We note that 3 hops is the theoretically optimal limit on a BFT based consensus protocol (see Section Optimal Consensus Latency).
We define the consensus latency to be the time taken for a proposed block to be ordered under the common case (where network is synchronous and the leader is non-faulty). In Jolteon consensus protocol, we consider a block is ordered when we obtain a 2-chain quorum certificate on the block. At the moment, Jolteon consensus protocol takes 4 hops of latency. In a nutshell, we are implementing the PBFT-style [1] ordering mechanism for the latency optimization. This is the timeline of the current consensus protocol for proposing and ordering a block r
.
-
Time t
: The leader of blockr
creates and broadcasts the blockr
. -
Time t+1
: All the other validators receive the blockr
and broadcasts a vote onr
. -
Time t+2
: All the validators create a quorum certificate on the blockr
by aggregating the received votes. The next leader creates and broadcasts the blockr+1
. The blockr+1
includes the quorum certificate on the blockr
. -
Time t+3
: All the validators receive the blockr+1
and broadcasts a vote onr+1
. -
Time t+4
: All the validators create a quorum certificate on the blockr+1
by aggregating the received votes. The validators now obtained a 2-chain quorum certificate on the blockr
. The validators now consider the blockr
as “ordered” and passes the block to the execution pipeline.
This AIP aims to reduce the consensus latency from 4 hops to 3 hops. We start with the observation that the QC (Quorum Certificate) on the block r+1
serves a dual purpose.
-
The validators certify the block
r+1
. -
The validators create a 2-chain QC for the block
r
and order the blockr
.
All the validators have the QC on the block r
at time t+2
, but the validators effectively vote on this QC when they vote on the block r+1
at time t+3
. We aim to optimize the latency by letting the validators vote on the “QC of block r
" at time t+2
without waiting to receive block r+1
. We call these new type of votes as the “order votes”. The new timeline for ordering a block r
looks as follows.
-
Time t
: The leader of blockr
creates and broadcasts the blockr
. -
Time t+1
: All the other validators receive the blockr
and broadcasts a vote onr
. -
Time t+2
: All the validators create a quorum certificate on the blockr
by aggregating the received votes. All the validators broadcasts on “order vote” on the created quorum certificate on the blockr
. The next leader creates and broadcasts the blockr+1
. The blockr+1
includes the quorum certificate on the blockr
. -
Time t+3
: All the validators receive the order votes and creates a quorum certificate on the order votes. The validators now obtained a 2-chain quorum certificate on the blockr
. The validators now consider the blockr
as “ordered” and passes the block to the execution pipeline. All the validators receive the blockr+1
and broadcasts a vote onr+1
.
The structure of the order vote to order the block r
contains the following fields.
- The identity of the voter.
- The ledger info of block
r
. The ledger info consists of the block info of blockr
, along with a consensus data hash which is set to a dummy value (zero). - The signature on the above ledger info.
pub struct OrderVote {
/// The identity of the voter.
author: Author,
/// LedgerInfo of a block that is going to be ordered in case this vote gathers QC.
ledger_info: LedgerInfo,
/// Signature of the LedgerInfo
signature: bls12381::Signature,
}
In most cases, a validator first receives the quorum certificate on the block r
before receiving order votes on the block. But when a validator’s network connection is faulty, the validator could potentially fail to receive the quorum certificate on block r
but receive the order votes on the block r
. In this case, without having the quorum certificate on block r
, even if the validator receives enough order votes on block r
, the validator will still have to wait for block r+1
to execute the block r
. To simplify this, the validators additionally append the quorum certificate on block r
when broadcasting the order vote.
pub struct OrderVoteMsg {
order_vote: OrderVote
// The quorum certificate on the block that is going to be ordered
quorum_cert: QuorumCert
}
When a validator obtains the QC on a block r
, the validator uses the following algorithm to create the corresponding order vote.
Procedure process_vote {
// same as before
...
if a QC is aggregated {
order_vote <- make_order_vote(QC)
order_vote_msg <- OrderVoteMsg {order_vote, QC}
broadcast order_vote_msg
}
}
Procedure make_order_vote(qc) {
// Safety need to update the highest qc here
observe_qc(qc)
if safe_to_order_vote(qc) {
ledger_info <- LedgerInfo { qc.vote_data.proposed, HashValue::dummy() }
signature <- sign(ledger_info)
order_vote <- OrderVote { self.author, ledger_info, signature }
return order_vote
}
return none
}
Procedure make_timeout(round, high_qc, last_tc) {
...
if valid_signature && safe_to_timeout {
increase_highest_vote_round(round)
increase_highest_timeout_round(round)
return TimeoutInfo
}
...
}
Procedure safe_to_order_vote(qc) {
// no order vote for this round if already timeout
return qc.vote_data.round > highest_timeout_round
}
Procedure increase_highest_timeout_round(round) {
highest_timeout_round <- max(highest_timeout_round, round)
}
When a validator receives an order vote, it uses the following algorithm to verify and process the order vote.
Procedure process_order_vote_msg(order_vote_msg) {
// the signature of order_vote should already be verified upon receiving
commit_id <- order_vote_msg.order_vote.ledger_info.id
pending_order_votes[commit_id].add(order_vote.signature)
if |pending_order_votes[commit_id]| >= 2f+1 {
// Add order_vote_msg.qc to the block store, if not already present
// Send the block with commit_id to execution
// Update highest_ordered_cert
finalize(order_vote.ledger_info)
}
}
The order certificate and commit certificate were both of the type QuorumCert
which contains VoteData of the block being certified, and the LedgerInfo of the block being ordered/committed. This means, the order certificate to order the block r
would contain VoteData of block r+1
and LedgerInfo of block r
. With our new protocol, when a validator creates an order certificate by aggregating enough order votes on the block r
, the validator may not have access to the block r+1
. As the commit certificate is derived from the order certificate, we face the same issue in creating the commit certificate to commit the block r
.
We observe that the VoteData in the order and commit certificates for the correctness, safety and liveness of the protocol. So, we let the validators use dummy vote data when creating order certificate. But when a validator receives a QuorumCert
with dummy vote data, the message will fail verification. To ensure backward compatibility, we let order and commit certs be of new struct type called WrappedLedgerInfo
. This struct has the same fields as the QuorumCert
. However, the verify function of WrappedLedgerInfo
doesn’t verify the VoteData field.
This feature will reduce the end-to-end latency of Aptos by 100ms (1 hop).
As each validator need to broadcast the order vote to all the validators, the communication complexity of the protocol increases.
Each validator need to verify the order votes from all the other validators, increasing the computational complexity.
For each consensus block, 130 mainnet validators sign and validate an order vote message. Assuming 5 blocks per second, each validator will sign 5 order votes per second, will receive and verify 650 order votes per second. Earlier, each validator broadcasts a proposal vote for each block. With the introduction of order votes, the number of consensus messages roughly doubles. The verification of an order vote message takes about 2.3 ms. As the regular proposal votes and order votes are verified in parallel, it increases the number of cores used for processing consensus messages. With 130 nodes, each order vote message is 1408 bytes long in-memory, and 699 bytes on the network. The order votes feature will thereby increase the network bandwidth by 445 KB/second.
In a load test with 100 nodes at 5k TPS workload, the average total number of inbound network messages increased from 2165 to 2625 messages per second with order votes. The average total inbound traffic increased 1.31 MB/s to 1.54 MB/s. The average total number of outbound network messages increased from to 2590 to 3080 messages per second. The average CPU usage increased from 1450% to 1640%. There is no visible change in the memory usage. The average I/O rate increased from 345 IO ops/sec to 375 IO ops/sec. The average end-to-end latency reduced from 1.5 seconds to 1.4 seconds.
As we introduce a new type of consensus message, this change is not backward compatible due to 2 reasons.
- If an upgraded validator broadcasts an “order vote” message, a non-upgraded validator will not know how to process this message and will fail. To avoid this, we will follow a 2-step release plan.
- If an upgraded validator broadcasts an order certificate or commit certificate with dummy VoteData, a non-upgraded validator will not be able to ver
In v1.14, we released the necessary code for the validators to send and receive the order votes. But this code is behind a new order_vote_enabled
flag added to the onchain consensus config. After all the validators are upgraded with this binary, when this AIP is accepted, we enable the order_vote_enabled
flag.
- PR for reference implementation of order votes.
We formally define the problem of consensus and review the existing lower bound on the latency for achieving consensus. Our findings demonstrate that the proposed modifications result in a protocol that attains optimal consensus latency.
System Model.
We consider a set of
Definition 1 (Consensus).
Given a system as defined above, the consensus exposes two interfaces to the parties, namely$Input(val)$ to input a value$val$$Output() \rightarrow val’$ to output a value.
Consensus is required to satisfy the following properties:
- Agreement. If any two honest parties output, they output the same value.
- Termination. Every honest party eventually outputs.
To exclude a trivial solution that always has the parties output a predetermined value, consensus typically specifies which output values are valid given the input values of all parties. In the context of the blockchain, the consensus needs to satisfy the following two validity requirements.
- Weak Validity. If all parties are honest and input the same value
$val$ , then any honest party only outputs$val$ . - External Validity. If an honest party outputs
$val'$ , then$f(val')=1$ where$f(\cdot)$ is a globally verifiable predicate that output$1$ or$0$ .
In blockchain consensus, the honest party always inputs
We define consensus latency as follows. A more rigorous definition can be found in [2].
An initial configuration includes the set of corrupted parties and the input values of the honest ones.
Definition 2 (Consensus Latency).
A consensus protocol has a consensus latency of X message delays if, for every initial configuration and adversary strategies, there exists an execution where every honest party outputs no later than X message delays.
In the context of blockchain consensus, consensus latency refers to the time required for validators to agree on a block under the common case. For example, after this AIP, the Aptos consensus protocol achieves a consensus latency of three message delays. This involves one round of block proposal by the leader followed by two rounds of all-to-all voting among the validators to order the block, assuming the leader is honest and the network is synchronous.
Kuznetsov et al. (Theorem 4.5, [2]) proves any protocol that solves consensus defined above that satisfies Weak Validity must have a consensus latency of 3 message delays, under
This implies under the optimal fault tolerance
Therefore, after the modification proposed in this AIP, the Aptos consensus protocol is optimal in terms of consensus latency.
[1] Miguel Castro, and Barbara Liskov. "Practical byzantine fault tolerance." In OSDI, vol. 99, no. 1999, pp. 173-186. 1999.
[2] Petr Kuznetsov, Andrei Tonkikh, and Yan X. Zhang. "Revisiting optimal resilience of fast byzantine consensus." In Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing, pp. 343-353. 2021.