Store only essential Merkle tree snapshots #2043

Closed · Tracked by #2007
yito88 opened this issue Oct 24, 2023 · 6 comments · Fixed by #2050

@yito88 (Member) commented Oct 24, 2023

Currently, we store a lot of Merkle tree snapshots. In the current testnet, a single snapshot was over 2 GB, and the 60 snapshots took over 100 GB.

The Merkle tree has 4 subtrees under the Base tree: Account, PoS, Ibc, and BridgePool. The Base tree holds the roots of the subtrees, and each subtree holds hashed key-value pairs. We store a Base tree snapshot at every height (block) for the root hash, and the 4 subtree snapshots are stored every epoch for the readable period (60 epochs in the testnet).

// Merkle tree
{
    let prefix_key = prefix_key
        .push(&"tree".to_owned())
        .map_err(Error::KeyError)?;
    for st in StoreType::iter() {
        if *st == StoreType::Base || is_full_commit {
            let prefix_key = prefix_key
                .push(&st.to_string())
                .map_err(Error::KeyError)?;
            let root_key = prefix_key
                .push(&"root".to_owned())
                .map_err(Error::KeyError)?;
            batch.0.put_cf(
                block_cf,
                root_key.to_string(),
                types::encode(merkle_tree_stores.root(st)),
            );
            let store_key = prefix_key
                .push(&"store".to_owned())
                .map_err(Error::KeyError)?;
            batch.0.put_cf(
                block_cf,
                store_key.to_string(),
                merkle_tree_stores.store(st).encode(),
            );
        }
    }
}

However, the Merkle tree snapshots are only used for proof generation, and only IBC and EthBridge require Merkle proofs. The Account and PoS snapshots are never read (except for node restart).

So, we should store the Ibc and BridgePool tree snapshots for the whole readable period, and the Account and PoS tree snapshots only for the last epoch (needed for restarting).
The changes would roughly be the following (see the sketch after this list):

  • Check whether each subtree snapshot should be stored at every block commit
  • Restore the Merkle tree
    • When all snapshots are loaded
    • When not all snapshots are loaded
      • Load the last snapshots and restore the tree from the diffs
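
As a rough sketch of the first point, the per-commit decision could look something like this (illustration only, not the actual Namada storage code; `should_store_at` and `prune_older_epochs` are made-up names):

// Hypothetical sketch: which subtree snapshots get written at a commit and
// which older ones can be pruned. NOT the actual Namada storage code.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum StoreType {
    Base,
    Account,
    PoS,
    Ibc,
    BridgePool,
}

/// Whether a snapshot of this subtree should be written at this commit.
fn should_store_at(st: StoreType, is_new_epoch: bool) -> bool {
    match st {
        // Base is stored at every block for the root hash.
        StoreType::Base => true,
        // Subtrees are stored only at epoch boundaries.
        _ => is_new_epoch,
    }
}

/// Whether snapshots of this subtree from older epochs can be pruned,
/// keeping only the latest one (enough for a node restart).
fn prune_older_epochs(st: StoreType) -> bool {
    // Ibc and BridgePool snapshots must stay for the whole readable period
    // because they are needed for Merkle proofs.
    matches!(st, StoreType::Account | StoreType::PoS)
}

fn main() {
    assert!(should_store_at(StoreType::Base, false));
    assert!(!should_store_at(StoreType::PoS, false));
    assert!(prune_older_epochs(StoreType::PoS));
    assert!(!prune_older_epochs(StoreType::Ibc));
}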

This would drastically reduce the storage size, from about 100 GB to roughly 4 GB.

@cwgoes (Contributor) commented Oct 25, 2023

Thanks! We should definitely do this. Can you also estimate the storage size as an equation, e.g. dependent on what numbers of particular objects (accounts, proof-of-stake bonds, etc.) we're storing at any particular time? I would like to understand the asymptotic bounds.

@yito88 (Member, Author) commented Oct 25, 2023

The size of a Merkle tree (our sparse Merkle tree) store is
$N \times 379$ bytes,
where $N$ is the number of key-value pairs in the tree.
$N$ depends on the transactions and PoS.

According to an investigation of v0.22.0, at height 96319 (epoch 1605):

  • The number of PoS-related pairs is 3.46M pairs => 1.22 GB
  • That of the other 3 subtrees (including Ibc and BridgePool) is 40K pairs => 15 MB

The size of each store:

  • PoS: 1.22 GB
  • Other 3 subtrees: 15 MB

The size of the snapshots over the 60-epoch period (with the proposed change):

  • PoS: 1.22 GB (only the last snapshot is kept)
  • Others: 900 MB (actually less, because Account shouldn't be included)

The total size of the Merkle tree snapshots would be about 2.1 GB. (In the testnet with v0.22.0, it was 54 GB.)
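
As a sanity check, a small standalone snippet that reproduces the arithmetic above (all inputs are the estimates quoted in this comment):

// Back-of-the-envelope check of the figures above, assuming ~379 bytes per
// key-value pair in the sparse Merkle tree store. All inputs are estimates.
fn main() {
    const BYTES_PER_PAIR: f64 = 379.0;
    const GIB: f64 = 1024.0 * 1024.0 * 1024.0;
    const MIB: f64 = 1024.0 * 1024.0;

    let pos_pairs = 3.46e6; // PoS-related pairs at height 96319 / epoch 1605
    let other_pairs = 4.0e4; // the other 3 subtrees (incl. Ibc and BridgePool)
    let kept_epochs = 60.0; // readable period in the testnet

    let pos_store = pos_pairs * BYTES_PER_PAIR; // kept only for the last epoch
    let other_store = other_pairs * BYTES_PER_PAIR; // kept for every epoch

    println!("PoS store:   {:.2} GiB", pos_store / GIB); // ~1.22 GiB
    println!("Other store: {:.1} MiB", other_store / MIB); // ~14.5 MiB
    println!(
        "Total: {:.1} GiB",
        (pos_store + kept_epochs * other_store) / GIB // ~2.1 GiB
    );
}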

@cwgoes (Contributor) commented Oct 25, 2023

Thanks, this is helpful, but I'd like to see a more specific breakdown in terms of what data structures are actually stored by PoS, e.g.

# PoS pairs = a * num bonds + b * num validators ...

Could you put that together?

@tzemanovic (Member) commented:
Regarding the PoS data, it would be better to go with v0.24.0, as before that we weren't trimming old data properly, so many fields were growing unbounded. This was fixed in #1944.

Each epoched data type is now trimmed to a configured number of past epochs (the last type parameter of the Epoched/EpochedDelta types in https://github.com/anoma/namada/blob/v0.24.0/proof_of_stake/src/types.rs#L30). The various offset values can be found here: https://github.com/anoma/namada/blob/v0.24.0/proof_of_stake/src/epoched.rs#L1048

Bonds and unbonds are not being trimmed as we’re applying slashes lazily (on withdrawal) so we need to preserve their start epochs, but there can be only one record per unique (delegator, validator) pair per epoch.

We can provide a more detailed breakdown of the number of stored fields in terms of parameters if needed. This might be a good addition for updated specs.
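
For intuition only, a simplified sketch of that trimming behaviour (this is not the actual Epoched/EpochedDelta implementation from proof_of_stake, just an illustration of keeping a bounded number of past epochs):

use std::collections::BTreeMap;

// Simplified illustration of trimming epoched data to a configured number of
// past epochs; NOT the namada `Epoched`/`EpochedDelta` types.
struct EpochedSketch<T> {
    data: BTreeMap<u64, T>, // epoch -> value
    num_past_epochs: u64,   // how many past epochs to keep
}

impl<T> EpochedSketch<T> {
    fn set(&mut self, epoch: u64, value: T) {
        self.data.insert(epoch, value);
    }

    /// Drop values older than `current_epoch - num_past_epochs`.
    fn trim(&mut self, current_epoch: u64) {
        let oldest_kept = current_epoch.saturating_sub(self.num_past_epochs);
        self.data = self.data.split_off(&oldest_kept);
    }
}

fn main() {
    let mut voting_power = EpochedSketch { data: BTreeMap::new(), num_past_epochs: 2 };
    for e in 0..5u64 {
        voting_power.set(e, 100 + e);
    }
    voting_power.trim(4);
    // Only epochs 2, 3 and 4 remain.
    assert_eq!(voting_power.data.keys().copied().collect::<Vec<_>>(), vec![2, 3, 4]);
}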

@sug0 (Contributor) commented Oct 26, 2023

We might be able to forgo storing multiple Bridge pool Merkle trees as well. We only need to keep the trees from the latest root that has been signed with a quorum signature onward. So, suppose the latest Merkle tree nonce is $5$, but the latest signed root's nonce is $3$; then we can delete all the trees with a nonce of $0$ through $2$, while upholding the liveness of the bridge.
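
A minimal sketch of that pruning rule (hypothetical types, not the actual Bridge pool storage):

use std::collections::BTreeMap;

// Keep only the Bridge pool Merkle trees whose nonce is >= the nonce of the
// latest root signed by a validator quorum. Hypothetical types, for
// illustration only.
struct BridgePoolTrees {
    trees: BTreeMap<u64, Vec<u8>>, // nonce -> serialized tree (placeholder)
}

impl BridgePoolTrees {
    /// Remove trees older than the latest quorum-signed root.
    fn prune_below(&mut self, latest_signed_nonce: u64) {
        self.trees = self.trees.split_off(&latest_signed_nonce);
    }
}

fn main() {
    let mut bp = BridgePoolTrees {
        trees: (0..=5u64).map(|n| (n, vec![])).collect(),
    };
    // Latest tree nonce is 5, latest signed root nonce is 3:
    bp.prune_below(3);
    // Trees with nonces 0 through 2 are gone; 3, 4 and 5 remain.
    assert_eq!(bp.trees.keys().copied().collect::<Vec<_>>(), vec![3, 4, 5]);
}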

@yito88 (Member, Author) commented Oct 27, 2023

@sug0 Thanks for your suggestion. In this context, we are dealing with snapshots of each subtree. I think we should delete entries in each subtree and the storage subspace in a separate process.
