Skip to content
Permalink
Branch: master
Find file Copy path
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
310 lines (228 sloc) 12.9 KB

UIP-11: Snapshot

Author:   Kostiantyn Stepaniuk <kostia@thirdhash.com>
Status:   Proposed
Created:  2018-10-09

Abstract

Download UTXO set instead of the whole blockchain during initial sync to reduce the bandwidth and the sync time.

Motivation

Currently, to start a full node it requires to download the whole chain. At the time of writing, Bitcoin's chain weights 200GB, and they generate the block every 10 minutes. In our case, we create the block every 16 seconds which will lead to significantly larger chain size. Besides consuming a lot of space and a long time to sync the node, downloading the whole chain causes also high bandwidth usage. According to statoshi.info, on average block message consumes 252 KB/s of outbound traffic which is ~12x more than the second tx message, 20 KB/s. The reason is that many nodes are being added to the network but have a short lifetime. We can see this pattern on bitnodes

To reduce bandwidth, storage and time to sync, we introduce the snapshot which is the set of UTXOs. Effectively, to validate/propose new transactions/blocks the node needs to know only UTXOs.

Specification

Snapshot generation

Snapshot generation is optional however is highly recommended as it helps the network. New nodes can quicker join the network and nodes that serve the snapshot can save the bandwidth as they will be asked for the UTXO set only instead of the whole blockchain.

Ideally, we would like to compute the snapshot for every block the node sees. However, computation of the snapshot is expensive. It takes roughly 20 minutes to dump all the UTXO set from the chainstate DB and compute the ECMH of it. To overcome this, we will generate the snapshot only for the checkpoint that can be potentially finalized epoch with the step of 150 epochs. The reason of doing it every 150 epochs is to generate the snapshot ~1/day. Current block rate is: 1 block/16 secs or 1/16*60*60*24 = 5400 blocks = 108 epochs

Once the node receives the block which has (height + 2) % (50 * 150) == 01, after processing the block (and making it the tip), the node takes the levelDB snapshot (iterator of chainstate) and tries to generate the actual snapshot in the separate thread. Taking the levelDB iterator, it guarantees that we have the same view of chainstate while we are keeping it.

If the snapshot was successfully generated, we compute its hash and store it in chainstate DB for later retrieval. The node SHOULD keep last 5 snapshots. Validators MUST always create snapshots. Proposers can disable snapshot creation by starting the node with flag -createsnapshot=0.

Snapshot schema

Snapshot is the collection of UTXOSubset. The schema of one UTXOSubset is the following:

UTXOSubset 2

field type bytes description
tx_id uint256 32 TX hash
height uint32 4 block height the TX was included
tx_type uint8_t 1 type of the transaction
output count VarInt 1-9 number of outputs this TX contains
index uint32 4 output index
output CTxOut the actual output

CTxOut

field type bytes description
amount int64 8 amount the output has
script size VarInt 1-9 size of the script data
script data vector<unsigned char> script data

Snapshot Hash calculation

To compute the snapshot hash we don't need to generate the snapshot itself. We use a rolling update Elliptic Curve Multiset Hash to compute the hash once the new block arrives by subtracting inputs and adding outputs. Snapshot hash is the concatenation of all UTXOs + stake modifier and chain_work.

The formula is the following: ECMH(UTXO03 4 + UTXO1 + ... + UTXOn + stake_modifier5 + chain_work6)

UTXO

field type bytes description
out_point COutPoint 36 identifier of the output
height uint32 4 block height the TX was included
tx_type uint8_t 1 type of the transaction
output CTxOut the actual output

COutPoint

field type bytes description
hash uint256 32 tx hash
n uint32 4 index of the output

Sample code of computing the snapshot hash:

secp256k1_context *ctx = secp256k1_context_create(SECP256K1_CONTEXT_NONE);
secp256k1_multiset multiset;
secp256k1_multiset_init(ctx, &multiset);

secp256k1_multiset_add(ctx, &multiset, utxo_0.data(), utxo_0.size());
secp256k1_multiset_add(ctx, &multiset, utxo_1.data(), utxo_1.size());
secp256k1_multiset_add(ctx, &multiset, utxo_2.data(), utxo_2.size());

secp256k1_multiset_add(ctx, &multiset, stake_modifier, 32);
secp256k1_multiset_add(ctx, &multiset, chain_work, 32);

uint256 hash;
secp256k1_multiset_finalize(ctx, in.data(), &multiset);
return hash;

Notice: there is no count byte in front of the message 7.

Sample code of updating the hash when a new block arrives:

secp256k1_multiset_remove(ctx, &multiset, input_0.data(), input_0.size());
secp256k1_multiset_remove(ctx, &multiset, input_1.data(), input_1.size());

secp256k1_multiset_add(ctx, &multiset, output_0.data(), output_0.size());
secp256k1_multiset_add(ctx, &multiset, output_1.data(), output_1.size());

Snapshot verification

To guarantee that the snapshot is a valid one and other nodes can trust it, we add the snapshot hash to the chain. Every proposer MUST include the snapshot hash inside the CoinBase transaction as part of the script of the first input 8. The schema of the input is the following:

CTxIn(
    script_sig = CScript() << CScriptNum::serialize(height) << snapshot_hash;
)

Everyone who receives the block SHOULD validate that it has the correct snapshot hash. Snapshot hash points to the UTXOs of all blocks until the current one. To visualize it:

H - height
S - snapshot

H=0 (S=null)
*-----
blocks

* genesis block points to the empty snapshot hash


H=0        H=1 (S=0)
*----------*-------
snapshot 0 | blocks

* at hight 1 coinbase TX points to the snapshot that has only genesis UTXOs


H=0        H=1 (S=0)   H=2 (S=1)
*----------*-----------*---
snapshot 1             | blocks

* at height 2 coinbase TX points to the snapshot that has UTXOs from H=0 and H=1

Initial Snapshot Download (ISD)

To start the node in ISD mode, it's required to set two parameters:

  1. -prune=n (n >= 1)
  2. -isd=1

If pruning mode is not enabled, the regular IBD will be used. This UIP proposes to make ISD as an extension to the pruning. Later we might revise it and make ISD enabled by default.

When a node starts in ISD mode, it has the following parameters set:

  1. snapshot_chunk_timeout_sec=30 if the peer is not able to provide a valid chunk within this timeout this peer is marked as it doesn't have a snapshot.
  2. discovery_timeout_sec=120 time during which the node will discover available snapshots from peers. Peers joined after this timeout won't be asked for the snapshot.

These settings are important to not let the node remain stuck if it's surrounded by peers that can't provide the snapshot.

Once the node is started in ISD mode, it will perform the following:

  1. the node sends the discovery getsnaphead message to its peers to find the best snapshot
  2. the node downloads headers
  3. the node starts to download the snapshot from peers that have the best (highest) snapshot
  4. when the last snapshot chunk is downloaded, the node verifies the hash of the whole snapshot
  5. the node requests the block which is the parent block of the snapshot
  6. if the parent block has the correct snapshot hash, the node applies the snapshot and leaves ISD

If any error happens at any step, the node discards the snapshot (if it downloaded one) and switches to the 2nd best one. If the node runs out of peers that can serve the snapshot, it will switch to the IBD.

P2P messages

There are four new P2P messages:

  1. getsnaphead9 to request the snapshot header
  2. snaphead to reply with the snapshot header
  3. getsnapshot to request the snapshot chunk
  4. snapshot to reply with the snapshot chunk

getsnaphead

This command doesn't have parameters.

snaphead

field type bytes description
snapshot_hash uint256 32 snapshot hash
block_hash uint256 32 at which block the snapshot was created
stake_modifier uint256 32 stake modifier of this block
chain_work uint256 32 chain work including this block
total_utxo_subsets uint64 8 total UTXO subsets in snapshot

getsnapshot

field type bytes description
snapshot_hash uint256 32 which snapshot to request
utxo_subset_index uint64 8 index of the first UTXO subset in the snapshot
utxo_subset_count uint16 2 number of UTXO subsets to return

snapshot

field type bytes description
snapshot_hash uint256 32 snapshot hash
utxo_subset_index uint64 8 index of the first UTXO subset in the snapshot
utxo_subsets_count VarInt 1-9 number of UTXO subsets in the message
utxo_subsets []UTXOSubset actual UTXO subsets and their outputs

Once the snapshot message has the condition total_utxo_subsets == utxo_subset_index + utxo_subsets_count it is considered the last chunk and no more getsnapshot messages SHOULD be requested.

Rationale

  1. Why (height + 2) % (50 * 150) == 0? Height counter starts from 0 so (height + 1) % 50 == 0 represents the last checkpoint in the epoch. (height + 2) % 50 == 0 is the block before the last one. We don't generate the snapshot for the last finalized block because we need to also download the full parent block which contains the snapshot hash to validate that the snapshot we have is a correct one, and this parent block MUST be from the finalized epoch.
  2. Why introduce UTXOSubset? The reason for introducing UTXOSubset instead of using Coin class is to reduce the storage as we don't need to serialize outPoint, height and tx_type for every output.
  3. Why use UTXO instead of UTXOSubset? The reason we introduce UTXO instead of using UTXOSubset for computing the hash is that UTXO represents one output. It means that we can add/subtract outputs even without looking into the disk.
  4. Why use UTXO instead of Coin? The reason for not using the Coin class is that it doesn't use the canonical P2P serialization as all other P2P messages do.
  5. Why stake_modifier is used in snapshot hash calculation? We include stake_modifier in the snapshot because this field is used to compute the next kernel hash. To guarantee that snapshot carries the right value, we must include it in the snapshot hash too.
  6. Why chain_work is used in snapshot hash calculation? We include chain_work into the snapshot because this field is used in the fork-choice rule to decide which fork to follow. To guarantee that snapshot carries the right value, we must include it in the snapshot hash too.
  7. Why count byte is not prepended to UTXO set before computing the hash? Count byte(s) is often prepended to the list that during unserialization it is used to determine how big std::vector to initialize and how many records to read. For the case of the snapshot, the list of UTXOs can reach couple of gigabytes, so you'll never download it in one go instead, we download it in chunks. In this case, this extra byte is redundant as it won't be used.
  8. Why snapshot hash is in the input? The reason to use the input instead of the output is to reduce the size of the transaction as creating an extra output has an overhead.
  9. Why call it getsnaphead? The command name can be maximum 12 characters long; full version getsnapshotheader doesn't fit, so we shortened it.

Backwards compatibility

  1. snapshot hash becomes part of the consensus rule. Everyone needs to perform extra work to compute the hash
  2. block size increases by 32 bytes as we add snapshot hash to the script of coinbase input
  3. every node that generates the snapshot, has extra work (~20 min to produce the snapshot on the current bitcoin UTXOs)
  4. extra disk space usage for full nodes, as they keep the whole chain + 5 snapshots.

Reference implementation

Implemented in Unit-e

Changelog

  • 2019-04-16 Add chain_work and replace is_coin_base with tx_type
  • 2019-04-16 Changed status to proposed.
  • 2019-01-09 Extract meta fields from snapshot message to snaphead
  • 2018-12-12 Moved to UIP repository as UIP-11
  • 2018-11-06 Accepted as ADR-11

Copyright

This document and all its auxiliary files are dual-licensed under CC0 and MIT.

You can’t perform that action at this time.