
agd does not support joining with state sync #3769

Closed
dckc opened this issue Aug 27, 2021 · 18 comments · Fixed by #7225
Labels: bug, cosmic-swingset, Epic, needs-design, performance, SwingSet, vaults_triage, xsnap

Comments

dckc (Member) commented Aug 27, 2021

Describe the bug

While there is a practice of sharing informal snapshots, the only in-protocol way to join an Agoric chain, currently, is to replay all transactions from genesis; this may take days or weeks. Contrast this with the norm in the Cosmos community:

With block sync a node is downloading all of the data of an application from genesis and verifying it. With state sync your node will download data related to the head or near the head of the chain and verify the data. This leads to drastically shorter times for joining a network.
-- State Sync | Tendermint Core

Other blockchain systems have similar features. In Bitcoin and Ethereum, software releases include a hash of a known-good state; this way, new nodes can download a state that is not more than a few months old and start verifying from there.

Design Notes

  • consensus on swingset kernel DB state: currently, the swingset DB state is not part of consensus; only the sequence of messages is. Mnemonic: "you can think whatever you want, as long as you say the same thing that everyone else says". Fast sync most likely requires including (most of) the DB state in consensus Merkle tree proofs.
  • consensus on xsnap snapshots: currently, XS snapshots are not part of consensus; we don't require that all validators deterministically get exactly the same bytes in their snapshots. (In particular, xsnap + SES boot not deterministic #2776 is an observed case of non-determinism in snapshots). Fast sync most likely requires that we include snapshots in consensus.
  • cosmos-sdk hooks to publish swingset state: baking our own system is undesirable; we just need some hooks to be able to leverage the Cosmos/Tendermint mechanisms for shipping states from select RPC nodes to the joining node (Add hooks to allow app modules to add things to state-sync cosmos/cosmos-sdk#7340 (comment))

cc @michaelfig @erights

@dckc dckc added bug Something isn't working SwingSet package: SwingSet xsnap the XS execution tool labels Aug 27, 2021
@dckc dckc added this to the Mainnet: Phase 1 - Treasury Launch milestone Aug 27, 2021
rowgraus commented:

> Conservatively, I am putting it on phase 1. Feel free to postpone it as you see fit.

Tend to agree unless there are good arguments for postponing this.

@dckc dckc removed their assignment Aug 30, 2021
@michaelfig michaelfig changed the title joining as a new validator is too slow without fast state sync joining as a new validator is too slow without "state sync" Sep 5, 2021
dckc (Member, Author) commented Oct 22, 2021

@dtribble suggests that as long as catching up is 3x to 5x faster than the running chain, we can postpone this to a later milestone.

possible optimization:

  • replay from snapshot in parallel

(I think we only replay on-line vats, of which there is a bounded number, so that optimization doesn't seem worthwhile)

Tartuffo (Contributor) commented Jan 27, 2022

First we need to measure how long catch-up currently takes, based on the estimated number of blocks, and/or calculate the blocks-per-unit-time recovery rate. Create a sub-ticket for this initial measurement.

dckc (Member, Author) commented Mar 11, 2022

Some mainnet0 data show that new nodes should eventually catch up, but it takes a long time.
#4106 (comment)

The validator community seems to prefer informal snapshot sharing.
Agoric/testnet-notes#42

mhofman (Member) commented Mar 14, 2022

A few quick thoughts:

  • Without state sync, catching up a new validator relies on the same execution paths as live execution. Whatever work we do to optimize catch-up from scratch, especially at the SwingSet / JS level, is optimization we'd want for normal execution anyway.
  • That means if we ever reach fairly high utilization* (regardless of parallelization), it won't be possible to catch up by mere replay in a meaningful amount of time (unless you throw a lot more compute power at the catch-up node to reduce its effective utilization).
  • During the catch-up time, the chain continues to make progress, which has to be caught up to as well. IOU a formula

*: My understanding is that the meaningful number to measure is the amount of time spent in SwingSet compared to the time elapsed since genesis; that ratio is SwingSet utilization. If you consider the cosmos processing to be comparatively negligible, you can then calculate the time it would take to rebuild all the JS state through catch-up; this also gives a lower bound.

erights (Member) commented Mar 14, 2022 via email

michaelfig (Member) commented:

> Can we replay each vat separately and in parallel?

We need "state sync" to jump to a snapshot of the kernel data close to the current block; otherwise we can only replay and verify a single block at a time since genesis. That's really slow, even if we do more in parallel.

dckc (Member, Author) commented Mar 15, 2022

@warner further to the discussion we just had about trade-offs between performance and integrity of snapshots, as I mentioned, our validator community is doing some informal snapshot sharing currently: Agoric/testnet-notes#42.

I looked around and found that it seems to take about 3.5 minutes of downtime to create a daily mainnet0 snapshot.
Some follow-up step to make the snapshot available takes significantly longer; I'm not sure what's going on in there...

---------------------------

|2022-03-14_01:00:01| LAST_BLOCK_HEIGHT 4103449
|2022-03-14_01:00:01| Stopping agoric.service
0
|2022-03-14_01:00:01| Creating new snapshot
|2022-03-14_01:03:33| Starting agoric.service
0
|2022-03-14_01:03:33| Moving new snapshot to /home/snapshots/data/agoric
155G	/home/snapshots/snaps/agoric_2022-03-14.tar
|2022-03-14_02:19:47| Done
---------------------------

|2022-03-15_01:00:01| LAST_BLOCK_HEIGHT 4116868
|2022-03-15_01:00:01| Stopping agoric.service
0
|2022-03-15_01:00:01| Creating new snapshot
|2022-03-15_01:03:27| Starting agoric.service
0
|2022-03-15_01:03:27| Moving new snapshot to /home/snapshots/data/agoric
156G	/home/snapshots/snaps/agoric_2022-03-15.tar
|2022-03-15_03:06:14| Done
---------------------------

-- https://snapshots.stake2.me/agoric/agoric_log.txt

@Tartuffo Tartuffo added this to the Mainnet 1 milestone Mar 23, 2022
@Tartuffo Tartuffo modified the milestones: Mainnet 1, RUN Protocol RC0 Apr 5, 2022
@dckc dckc changed the title joining as a new validator is too slow without "state sync" agd does not support joining with state sync Oct 15, 2022
@dckc dckc added performance Performance related issues cosmic-swingset package: cosmic-swingset labels Nov 2, 2022
dckc (Member, Author) commented Nov 2, 2022

One validator notes:

> the post-upgrade mainnet snapshot is already nearly 2 GB in size.

warner (Member) commented Nov 3, 2022

One data point: I watched a node crash today; it missed about 200s before getting restarted. The restart took 2m10s to replay vat transcripts far enough to begin processing blocks again, then took another 33s to replay the 95-ish (empty) missed blocks, after which it was caught up and following properly again.

The vat-transcript replay time is roughly bounded by the frequency of our heap snapshots: we take a heap snapshot every 2000 deliveries, so no single vat should ever need to replay more than 2000 deliveries at reboot time, so reboot time will be random but roughly constant (depends on deliveryNum % 2000 summed across all vats).

Note that this doesn't tell us anything about how long it takes to start up a whole new validator from scratch.

mhofman (Member) commented Nov 14, 2022

After discussing state-sync the other day, @arirubinstein mentioned that validators leverage state sync to work around a cosmos DB pruning issue: they start a new node, state-syncing from their existing node, in order to prune their DB.

In case, for some reason, we can't figure out state sync by the time the DBs grow too large, we should check whether the following rough hack may work:

  • Shut down the node at the height of a block N that won't be pruned from the cosmos DB.
  • Make a copy of the swingset state dir.
  • Restart the node.
  • Start a new node from the swingset state copy, using cosmos state-sync at the same block height N to re-populate the IAVL tree.

For consistency protection, Swingset saves the block height it last committed and checks that the next block it sees is either the next block N + 1 or the same block N (in which case it doesn't execute anything but simply replays the calls it previously made back to the go / cosmos side).

dckc (Member, Author) commented Nov 28, 2022

> ... as long as catching up is 3x to 5x faster than the running chain ...

A recent data point: 26 hours to catch up on 26 chain days, i.e. replay runs about 24x faster than the chain.
