
Chain recovery startup allows economic recovery first #4317

Open
rowgraus opened this issue Jan 18, 2022 · 8 comments
Labels: cosmic-swingset (package: cosmic-swingset), SwingSet (package: SwingSet)

Comments

@rowgraus

rowgraus commented Jan 18, 2022

What this means to me is:

  • when our chain halts (because of some bug, or maybe a planned upgrade), it may be offline for a while until we get a fix ready
  • the rest of the world moves on anyway, so things like asset prices can change considerably by the time our chain wakes back up
  • from the chain's point of view, no time passes, so it observes a sudden discontinuity in the asset price
    • this doesn't happen until the first price-oracle message arrives after chain restart
  • if I can submit a txn just as the chain is restarting, I might be able to take advantage of the chain's temporary ignorance of the new state of the world
  • everyone outside the chain knows when it is about to restart, and everybody wants to take that advantage, so we're likely to get a flood of messages all fighting with each other to claim the "prize"
  • those messages would compete with the actual price-oracle signal, making the problem worse

The fix we talked about was to do a "soft restart", in which the chain is told that it is restarting, and for the first minute or so, it does not accept any messages other than economy-critical price-oracle signals.

We'd implement this with the #5334 backpressure mechanism which controls ingress at the mempool/txn level to exclude non-oracle-signed transactions from blocks during the restart window, plus some code in the new version that knows when this window starts and ends. If the chain halted just after block 100, such that the next block executed will be 101, then our replacement/upgraded software should have something in cosmic-swingset that does:

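// first block executed after the chain halted following block 100
if (blockHeight === 101) {
  disableNonEconomicTxs();
} else if (blockHeight === 111) {
  // ten blocks later, reopen the chain to ordinary user transactions
  enableNonEconomicTxs();
}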

to give roughly 60 seconds (ten blocks, i.e. about six seconds per block) for the economic engine to get prepared for user requests. We'd also need to ensure that the oracle price signals and other economy-critical messages can still be delivered during that window, even if user requests are flooding the RPC servers.
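A minimal sketch of the height-based window, assuming the chain halted after block `haltHeight` and that the window lasts a fixed number of blocks (10 here, matching blocks 101 through 110). `makeSoftStartPolicy`, the `tx.kind` field, and the `'priceOracle'` kind are hypothetical names for illustration, not existing cosmic-swingset APIs.

```javascript
// Hypothetical sketch: decide which transactions are admissible while the
// chain is in its post-restart "soft start" window.
// Assumes each tx carries a `kind` field; only 'priceOracle' txs are
// treated as economy-critical here.
function makeSoftStartPolicy(haltHeight, windowBlocks) {
  const firstBlock = haltHeight + 1; // first block executed after restart
  const reopenBlock = firstBlock + windowBlocks; // first block open to everyone

  // True while non-economic transactions are still being excluded.
  function inSoftStart(blockHeight) {
    return blockHeight >= firstBlock && blockHeight < reopenBlock;
  }

  // Economy-critical txs are always admissible; others wait out the window.
  function isAdmissible(tx, blockHeight) {
    return tx.kind === 'priceOracle' || !inSoftStart(blockHeight);
  }

  return { inSoftStart, isAdmissible };
}

const policy = makeSoftStartPolicy(100, 10);
console.log(policy.inSoftStart(101)); // true
console.log(policy.inSoftStart(110)); // true
console.log(policy.inSoftStart(111)); // false
console.log(policy.isAdmissible({ kind: 'priceOracle' }, 101)); // true
console.log(policy.isAdmissible({ kind: 'userSwap' }, 105)); // false
```

One possible advantage of expressing the window as a predicate over block height, rather than as one-shot enable/disable calls at exact heights, is that it holds no state: if a node restarts again partway through the window, the same decision is simply recomputed.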

We might consider making this more explicit: let the vats that manage vaults give a signal when they believe they're up to date, and disable non-economic messages until that point. That might mean control over the admissibility of non-economic messages should be made available to userspace, which would be... exciting. It would also need a way for the cosmic-swingset layer to signal to those economy vats that we'd entered soft-start mode, and that the vats are responsible for exiting it when they're ready.

if (blockHeight === 101) {
  disableNonEconomicTxs();
  // tell the economy vats that the chain is now in soft-start mode
  controller.queueToVat(economy, 'economyPaused');
  // economy vats will re-enable the non-economic txs after getting a price update
}
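The variant above hands the exit decision to the economy vats. A minimal sketch of the bookkeeping, assuming each economy-critical component is named up front and later reports readiness; `makeSoftStartGate`, `enterSoftStart`, `reportReady`, and the component names `'vaultFactory'` and `'amm'` are hypothetical, and in SwingSet the readiness signal would travel as a message rather than a direct call.

```javascript
// Hypothetical sketch: non-economic txs stay disabled until every registered
// economy component has reported that it is up to date.
function makeSoftStartGate(onAllReady) {
  const pending = new Set(); // components that have not yet reported ready
  let open = true; // whether non-economic txs are admitted

  return {
    // Called by cosmic-swingset on restart: close the gate and record which
    // components must report before it reopens.
    enterSoftStart(componentNames) {
      componentNames.forEach(name => pending.add(name));
      open = pending.size === 0;
    },
    // Called (indirectly) by an economy vat once it considers itself current.
    // Unknown or duplicate reports are ignored.
    reportReady(name) {
      if (!pending.delete(name) || open) return;
      if (pending.size === 0) {
        open = true;
        onAllReady();
      }
    },
    isOpen: () => open,
  };
}

let reopened = 0;
const gate = makeSoftStartGate(() => {
  reopened += 1;
});
gate.enterSoftStart(['vaultFactory', 'amm']);
gate.reportReady('vaultFactory');
console.log(gate.isOpen()); // false
gate.reportReady('amm');
console.log(gate.isOpen(), reopened); // true 1
```

Requiring every registered component to report, rather than any one of them, means a single fast vat cannot reopen the chain while a slower one is still working from stale prices.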

@rowgraus points out that this feature could easily consume more effort than it warrants, and/or could expose more of an attack surface than it addresses, and I agree. I think we'll need to invoke our economist friends for advice too.

@rowgraus rowgraus added this to the Mainnet: Phase 1 - RUN Protocol milestone Jan 18, 2022
@warner warner changed the title from "Chain recovery bootstrap order is clear" to "Chain recovery startup allows economic recovery first" Jan 18, 2022
@warner warner added the MN-1 label Jan 27, 2022
@Tartuffo
Contributor

@dtribble Is this needed for Mainnet-1?

@warner
Member

warner commented Jan 27, 2022

@dtribble is this a MN-1 thing?

@Tartuffo
Copy link
Contributor

Tartuffo commented Feb 5, 2022

@warner For proper project planning and tracking, this needs an area label covered by one of our weekly planning meetings. Please pick the appropriate one from: agd, agoric-cli, agoric-cosmos, amm, core economy, cosmic-swingset, endo, ertp, getrun, governance, installation-bundling, metering, oracle, pegasus, run-protocol, ses, staking, swingset, swingset-runner, tc39, token economy, tooling, ui, wallet, xsnap, zoe, zoe contract

@warner warner added the cosmic-swingset, SwingSet, Epic, and needs-design labels Feb 7, 2022
@warner
Member

warner commented Feb 7, 2022

next step: have a meeting to figure out a design

@Tartuffo Tartuffo removed the MN-1 label Feb 7, 2022
@Tartuffo Tartuffo removed this from the Mainnet: Phase 1 - RUN Protocol milestone Feb 8, 2022
@warner warner assigned mhofman and unassigned warner Feb 22, 2022
@Tartuffo Tartuffo added this to the Mainnet 1 milestone Mar 23, 2022
@mhofman mhofman removed the Epic label Apr 6, 2022
@mhofman
Member

mhofman commented Apr 6, 2022

@rowgraus can we make this a regular issue? I don't believe I have the rights to do that.

@Tartuffo
Contributor

Tartuffo commented Apr 6, 2022

@mhofman I was able to convert it to a regular issue.

@mhofman
Member

mhofman commented Apr 7, 2022

We discussed this in the kernel meeting today. The summary of the discussion:

  • The high-priority economic elements should be given a lever to indicate they are ready to process low-priority requests
  • Once all high-priority elements are ready, the kernel will be instructed to resume processing of low-priority queues
    • until then, low-priority queues would be left unprocessed, even if all high-priority queues were empty
  • cosmic-swingset should be able to reject transactions from the mempool under certain conditions, probably through an admission handler
    • this could be controlled by an explicit lever, and/or automatically once the swingset queues reach a certain size
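The scheduling rule in the first two bullets can be sketched as a two-level run-queue, assuming the kernel keeps separate high- and low-priority queues. `makeTwoLevelRunQueue`, `setLowPriorityEnabled`, and `next` are hypothetical names for illustration, not the SwingSet kernel's actual run-queue API.

```javascript
// Hypothetical sketch: a two-level run-queue in which low-priority work is
// held back until explicitly re-enabled, even when the high-priority
// queue is empty.
function makeTwoLevelRunQueue({ lowPriorityEnabled = false } = {}) {
  const high = [];
  const low = [];
  let lowEnabled = lowPriorityEnabled;

  return {
    push(item, priority) {
      (priority === 'high' ? high : low).push(item);
    },
    setLowPriorityEnabled(flag) {
      lowEnabled = flag;
    },
    // Returns the next item to deliver, or undefined if nothing may run now.
    next() {
      if (high.length > 0) return high.shift();
      if (lowEnabled && low.length > 0) return low.shift();
      return undefined;
    },
    sizes: () => ({ high: high.length, low: low.length }),
  };
}

// After a halt, start with low-priority processing disabled.
const q = makeTwoLevelRunQueue({ lowPriorityEnabled: false });
q.push('oracle price update', 'high');
q.push('user vault request', 'low');
console.log(q.next()); // oracle price update
console.log(q.next()); // undefined (low-priority work is held back)
q.setLowPriorityEnabled(true);
console.log(q.next()); // user vault request
```

The key difference from an ordinary priority queue is the `undefined` in the middle: an empty high-priority queue does not by itself let low-priority work run.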

In a start-after-halt scenario, we'll need SwingSet's low-priority handling to start out disabled.

We'll also likely want any low-priority messages to be rejected at the Cosmos layer until the kernel is ready to process low-priority messages again, which argues for an explicit version of the lever for that mechanism.
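Combining the admission-handler idea with an explicit lever and a queue-size threshold might look like the following minimal sketch. It assumes the handler can read the current SwingSet queue length and that each tx can be classified as high or low priority; `makeAdmissionHandler`, `maxQueueLength`, `getQueueLength`, and the `priority` field are hypothetical, and the real check would presumably run on the Cosmos side (e.g. during mempool admission) rather than in JavaScript.

```javascript
// Hypothetical sketch: reject low-priority txs at admission time while the
// explicit lever is closed, or automatically while SwingSet's queues are
// too long. High-priority (e.g. oracle) txs are always admitted.
function makeAdmissionHandler({ maxQueueLength, getQueueLength }) {
  let lowPriorityAdmitted = false; // starts closed after a halt

  return {
    setLowPriorityAdmitted(flag) {
      lowPriorityAdmitted = flag;
    },
    // Returns { accept, reason } for a candidate transaction.
    check(tx) {
      if (tx.priority === 'high') return { accept: true, reason: 'high-priority' };
      if (!lowPriorityAdmitted) return { accept: false, reason: 'lever-closed' };
      if (getQueueLength() >= maxQueueLength) {
        return { accept: false, reason: 'queue-full' };
      }
      return { accept: true, reason: 'ok' };
    },
  };
}

let queueLength = 0;
const handler = makeAdmissionHandler({
  maxQueueLength: 100,
  getQueueLength: () => queueLength,
});
console.log(handler.check({ priority: 'low' }).reason); // lever-closed
handler.setLowPriorityAdmitted(true);
queueLength = 250;
console.log(handler.check({ priority: 'low' }).reason); // queue-full
console.log(handler.check({ priority: 'high' }).accept); // true
```

Keeping the lever separate from the size threshold lets the restart case be an explicit decision, while the threshold still provides ordinary backpressure once the lever is open.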

An economic contract that relies on price oracles could decide it's ready once it has received a second price update (after it has acknowledged the first update, which may have been stale, and the oracle has then sent an up-to-date quote).
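That readiness rule can be sketched as a small counter: the first post-restart update is accepted but treated as possibly stale, and the contract declares itself ready on the second. This simplification assumes that receiving an update is the acknowledgement and does not model the round-trip to the oracle; `makeOracleReadiness` and `onReady` are hypothetical, and the prices are illustrative.

```javascript
// Hypothetical sketch: a price-oracle consumer declares itself ready only
// after the second price update received since restart, since the first
// may reflect pre-halt state.
function makeOracleReadiness(onReady) {
  let updatesSinceRestart = 0;
  let ready = false;

  return {
    receivePriceUpdate(price) {
      updatesSinceRestart += 1;
      if (!ready && updatesSinceRestart >= 2) {
        ready = true;
        onReady(price); // fires once, with the first trusted price
      }
    },
    isReady: () => ready,
  };
}

const readiness = makeOracleReadiness(price =>
  console.log(`ready to serve low-priority requests at price ${price}`),
);
readiness.receivePriceUpdate(100); // possibly stale quote; not ready yet
readiness.receivePriceUpdate(73); // logs: ready to serve low-priority requests at price 73
```

In a full design, `onReady` would be where the contract pulls whatever readiness lever the kernel exposes.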

@Tartuffo
Contributor

Tartuffo commented Apr 8, 2022

@mhofman Can we remove the in-design label from this one?

4 participants