BEP-18: State sync enhancement
This BEP describes state sync enhancement on the Binance Chain.
State sync is a way to help newly joined users catch up with the latest status of Binance Chain. It syncs the latest sync-able status from peers, so that a full node user who wants to catch up with the chain as quickly as possible (at the cost of having no historical blocks locally) does not need to sync from block height 0.
This proposal describes an enhancement of the existing state sync implementation to improve user experience. The status of the blockchain that can be synced is represented as a "snapshot", which consists of a manifest file and a set of snapshot chunk files. The manifest file summarizes the version, height, and checksums of the snapshot chunk files of this snapshot. The snapshot chunk files contain the encoded state data essential to recover a full node.
This BEP introduces the following details:
- The procedure for taking a snapshot
- The procedure for syncing a snapshot from other peers
- The snapshot format (manifest and snapshot chunks)
This BEP is already implemented.
We propose this BEP to improve the full node user experience with state sync (and ease users' pain), which currently suffers from the following implementation limitations.
Users' most common complaint is that state syncing on testnet is very slow and often gets stuck on some requests.
With this enhancement, we want peers to serve data more evenly, so that syncing makes continuous progress and the overall sync time drops from 30-45 minutes to around 5 minutes.
An interruption during state sync (for example, the node process is killed because the computer reboots or the user runs out of patience) wastes the data already synced, because the current full node does not persist the synced parts on disk. Worse, it mistakenly writes a lock file that prevents the user from running state sync again.
With this enhancement, we want to support resumable (break-resume) downloading and keep a consistent status across arbitrary restarts.
State sync downloads the manifest and snapshot chunks from other peers.
5.1 Take snapshot
There are two ways to take snapshots on a full node: automatically or manually. Snapshots are stored under $HOME/data/snapshot/<height>. All types involved in the snapshot are encoded with go-amino and compressed with snappy; more details are given in sections 5.3 and 5.4.
- To make the full node take snapshots automatically, make sure state_sync_reactor in $HOME/config.toml is set to true. With automatic snapshots enabled, the full node takes a snapshot of the block whose block time is 00:00 UTC each day. No snapshot is taken for any other block during the day.
- To take a snapshot manually, stop the node if it is running, then run ./bnbchaind snapshot --home <home> --height <height>.
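As a minimal sketch of the automatic rule, assuming it means "snapshot the first block whose block time falls on or after 00:00 UTC of a new day", the Go helper below compares the UTC day of a block with that of its predecessor. The function name shouldTakeDailySnapshot and its Unix-seconds inputs are hypothetical and not taken from bnbchaind.

```go
package main

import "fmt"

// secondsPerDay is the length of a UTC day in Unix time (which has no leap seconds),
// so every UTC midnight falls on a multiple of 86400.
const secondsPerDay = 86400

// shouldTakeDailySnapshot is a hypothetical helper, not the bnbchaind code.
// Given the block times (Unix seconds, UTC) of the previous and current block,
// it reports whether the current block is the first one at or after 00:00 UTC
// of a new day, which is when an automatic snapshot would be taken.
func shouldTakeDailySnapshot(prevUnix, curUnix int64) bool {
	return curUnix/secondsPerDay > prevUnix/secondsPerDay
}

func main() {
	// 2019-06-01 23:59:58 UTC -> 2019-06-02 00:00:01 UTC crosses midnight.
	fmt.Println(shouldTakeDailySnapshot(1559433598, 1559433601)) // true
	// Two blocks around 01:00 UTC on 2019-06-02 stay within the same day.
	fmt.Println(shouldTakeDailySnapshot(1559437200, 1559437201)) // false
}
```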
If taking a snapshot is interrupted, the node remains in a good state, but it cannot serve the snapshot at the interrupted height to other peers.
Note: automatic snapshot files keep occupying disk space. The full node does not delete them automatically, so users who want to save disk space should periodically delete unneeded snapshots manually.
5.2 Sync snapshot
Syncing a snapshot is designed to run only once, when a full node starts for the first time. To enable state sync from other peers, state_sync_reactor should be set to true and state_sync_height should be set to a non-negative value (the default, -1, disables syncing from others).
If a user wants to sync from the latest sync-able height of the (majority of) peers, they should set state_sync_height to 0.
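The interplay of the two settings can be sketched in Go as follows. SyncMode and resolveSyncMode are hypothetical names, and treating a positive state_sync_height as "sync the snapshot at that height" is an assumption drawn from the non-negative rule above.

```go
package main

import "fmt"

// SyncMode is a hypothetical classification of the state sync settings.
type SyncMode int

const (
	SyncDisabled       SyncMode = iota // no state sync from peers
	SyncLatestHeight                   // state_sync_height == 0: peers' latest sync-able height
	SyncSpecificHeight                 // state_sync_height > 0 (assumed: sync that height)
)

// resolveSyncMode combines state_sync_reactor and state_sync_height:
// syncing from peers requires the reactor to be enabled and a non-negative
// height; the default height -1 disables it.
func resolveSyncMode(stateSyncReactor bool, stateSyncHeight int64) SyncMode {
	switch {
	case !stateSyncReactor || stateSyncHeight < 0:
		return SyncDisabled
	case stateSyncHeight == 0:
		return SyncLatestHeight
	default:
		return SyncSpecificHeight
	}
}

func main() {
	fmt.Println(resolveSyncMode(true, -1) == SyncDisabled)    // true
	fmt.Println(resolveSyncMode(true, 0) == SyncLatestHeight) // true
	fmt.Println(resolveSyncMode(false, 0) == SyncDisabled)    // true
}
```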
Stopping and restarting the full node during state sync is allowed. The next time the full node starts, it resumes by loading the manifest and the already downloaded snapshot chunks, then downloads the missing snapshot chunks.
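A minimal sketch of the resume step, assuming chunks are identified by their position in the manifest and that the node can tell which positions are already persisted on disk; missingChunks is a hypothetical helper, not the node's actual code.

```go
package main

import "fmt"

// missingChunks returns, in manifest order, the positions of chunks that
// still need downloading, given the total chunk count from the manifest and
// the set of positions already persisted on disk before the restart.
func missingChunks(total int, downloaded map[int]bool) []int {
	var missing []int
	for i := 0; i < total; i++ {
		if !downloaded[i] {
			missing = append(missing, i)
		}
	}
	return missing
}

func main() {
	// Chunks 0, 2 and 3 survived the restart; only 1 and 4 are re-requested.
	fmt.Println(missingChunks(5, map[int]bool{0: true, 2: true, 3: true})) // [1 4]
}
```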
Once state sync succeeds, a STATESYNC.LOCK file is created under $HOME/data to prevent state sync the next time the node starts.
5.3 Manifest format
The manifest serves as a summary of the snapshot chunks to be synced. It also maintains the order and types of the snapshot chunks. At the beginning of state sync, the full node first asks peers for the manifest file and trusts the manifest returned by the majority of peers.
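One way to implement "trust the manifest shared by the majority of peers" is sketched below. It assumes majority means strictly more than half of the peers that answered, and compares manifests by the SHA256 of their encoded bytes; pickMajorityManifest and the peer-ID-to-bytes map are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// pickMajorityManifest returns the encoded manifest returned by strictly more
// than half of the responding peers, or nil and false if there is no majority.
// responses maps a peer ID to the manifest bytes that peer sent.
func pickMajorityManifest(responses map[string][]byte) ([]byte, bool) {
	counts := make(map[[sha256.Size]byte]int)
	encodings := make(map[[sha256.Size]byte][]byte)
	for _, raw := range responses {
		sum := sha256.Sum256(raw)
		counts[sum]++
		encodings[sum] = raw
	}
	// At most one manifest can exceed half, so iteration order does not matter.
	for sum, n := range counts {
		if 2*n > len(responses) {
			return encodings[sum], true
		}
	}
	return nil, false
}

func main() {
	responses := map[string][]byte{
		"peerA": []byte("manifest@1000"),
		"peerB": []byte("manifest@1000"),
		"peerC": []byte("manifest@999"),
	}
	m, ok := pickMajorityManifest(responses)
	fmt.Println(string(m), ok) // manifest@1000 true
}
```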
The SHA256 hash of each synced chunk is checked against the hash declared in the manifest file.
| Field | Type | Description |
|---|---|---|
| Height | int64 | height of this snapshot |
| StateHashes | SHA256Sum | hashes of Tendermint state chunks |
| AppStateHashes | SHA256Sum | hashes of app state chunks |
| BlockHashes | SHA256Sum | hashes of the blocks in this snapshot; currently only the block at the requested height is synced. This block is needed mainly to make sure the local databases are consistent with each other after state sync. It also provides block metadata, such as the timestamp, for the Tendermint ABCI application. |
| NumKeys | int64 | number of keys for each sub-store |
5.4 Snapshot chunk format
5.4.1 App state chunk
An app state chunk contains IAVL tree nodes. Usually, each app state chunk holds up to 4MB of serialized IAVL tree nodes (before snappy compression).
An IAVL tree node bigger than 4MB is split into several incomplete chunks; this is where the Completeness field takes effect.
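The splitting rule can be illustrated with the Go sketch below, which cuts an oversized node into pieces and tags them with the Completeness values listed in the App state chunk table. The piece type, splitNode, and joinPieces are hypothetical; in the real format each piece would travel in its own app state chunk sharing one StartIdx.

```go
package main

import (
	"errors"
	"fmt"
)

// Completeness flag values (Complete, InComplete_First, InComplete_Mid, InComplete_Last).
const (
	Complete        uint8 = 0
	IncompleteFirst uint8 = 1
	IncompleteMid   uint8 = 2
	IncompleteLast  uint8 = 3
)

// maxChunkBytes is the ~4MB budget per app state chunk (before snappy).
const maxChunkBytes = 4 << 20

// piece is a hypothetical (data, flag) pair; each piece would go into its own chunk.
type piece struct {
	Data         []byte
	Completeness uint8
}

// splitNode splits one serialized node into pieces of at most limit bytes
// (limit must be positive). A node that fits is one Complete piece; a larger
// node becomes a First, Mid..., Last run.
func splitNode(node []byte, limit int) []piece {
	if len(node) <= limit {
		return []piece{{Data: node, Completeness: Complete}}
	}
	var pieces []piece
	for start := 0; start < len(node); start += limit {
		end := start + limit
		if end > len(node) {
			end = len(node)
		}
		flag := IncompleteMid
		if start == 0 {
			flag = IncompleteFirst
		} else if end == len(node) {
			flag = IncompleteLast
		}
		pieces = append(pieces, piece{Data: node[start:end], Completeness: flag})
	}
	return pieces
}

// joinPieces reassembles one node, checking that the flags form either a
// single Complete piece or a First, Mid..., Last run.
func joinPieces(pieces []piece) ([]byte, error) {
	if len(pieces) == 1 && pieces[0].Completeness == Complete {
		return pieces[0].Data, nil
	}
	if len(pieces) < 2 {
		return nil, errors.New("invalid piece sequence")
	}
	var out []byte
	for i, p := range pieces {
		want := IncompleteMid
		if i == 0 {
			want = IncompleteFirst
		} else if i == len(pieces)-1 {
			want = IncompleteLast
		}
		if p.Completeness != want {
			return nil, fmt.Errorf("piece %d: got flag %d, want %d", i, p.Completeness, want)
		}
		out = append(out, p.Data...)
	}
	return out, nil
}

func main() {
	// A tiny limit stands in for maxChunkBytes so the output stays readable.
	pieces := splitNode([]byte("0123456789"), 4)
	var flags []uint8
	for _, p := range pieces {
		flags = append(flags, p.Completeness)
	}
	fmt.Println(flags) // [1 2 3]
	node, err := joinPieces(pieces)
	fmt.Println(string(node), err) // 0123456789 <nil>
}
```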
| Field | Type | Description |
|---|---|---|
| StartIdx | int64 | By comparing StartIdx and the number of complete nodes against Manifest.NumKeys, we know which application DB sub-store each node should be persisted to. An app state chunk whose … After the above chunk, there might be 4 app chunks whose … |
| Completeness | uint8 | Completeness flag of this chunk; it is not an enum because go-amino doesn't support enum encoding. Possible values: 0 (Complete), 1 (InComplete_First), 2 (InComplete_Mid), 3 (InComplete_Last). The InComplete flags are used to identify the boundaries of consecutive chunks that hold one large node. |
| Nodes | byte | Serialized IAVL tree nodes. One big node (e.g. active orders and order book) might be split into different chunks, which share the same StartIdx with different completeness flags; the order is ensured in the manifest file. |
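The StartIdx and NumKeys comparison can be sketched as a prefix-sum lookup. The sketch assumes nodes are numbered consecutively across sub-stores in the order of Manifest.NumKeys; storeIndexFor is a hypothetical name.

```go
package main

import "fmt"

// storeIndexFor maps a node's global index (a chunk's StartIdx plus the
// node's offset among the chunk's complete nodes) to the index of the
// sub-store it belongs to. It returns false if the index is out of range.
func storeIndexFor(nodeIdx int64, numKeys []int64) (int, bool) {
	if nodeIdx < 0 {
		return -1, false
	}
	var upper int64 // exclusive upper bound of node indices seen so far
	for i, n := range numKeys {
		upper += n
		if nodeIdx < upper {
			return i, true
		}
	}
	return -1, false
}

func main() {
	numKeys := []int64{3, 0, 5} // hypothetical key counts for three sub-stores
	// A chunk with StartIdx 2 holding three complete nodes.
	for off := int64(0); off < 3; off++ {
		store, _ := storeIndexFor(2+off, numKeys)
		fmt.Printf("node %d -> sub-store %d\n", 2+off, store)
	}
}
```

Empty sub-stores are skipped naturally, because their range of indices is empty.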
5.4.2 Tendermint state chunk
| Field | Type | Description |
|---|---|---|
| Statepart | byte | current Tendermint state |
5.4.3 Block chunk
| Field | Type | Description |
|---|---|---|
| Block | byte | amino-encoded block |
| SeenCommit | byte | amino-encoded Commit. We need this because a Block only keeps the commit for the last (previous) block; to save this block, we need to load and pass its seen commit in the same way it was saved. |
5.5 Operation Suggestion
As mentioned in section 5.1 Take snapshot, the full node does not delete snapshot directories ($HOME/data/snapshot/<height>) automatically. Full node users who enabled state_sync_reactor need to be aware of this and should consider either running a script that periodically deletes the snapshots or turning off state_sync_reactor (if they want to be selfish!).
Once state sync succeeds, later full node restarts will not run state sync again (to avoid leaving the local blocks non-continuous).
But if users do want to state sync again (and don't mind that blocks between the last stop and the latest state sync snapshot height will be missing) and want to keep the already synced blocks, they should delete
The content is licensed under CC0.