new state-management approach #175
Comments
Hrm, I need a different approach. The state vector is the real core here, as represented by the |
This now works, I'll be landing that PR in a minute. I didn't end up making the "initialize the state" function.. instead, if the kernel is built from a state object that lacks the There's a new |
In the old repo, this was SwingSet issue 58.
After we land Agoric/SwingSet#57, all the kernel state will be stored in a single key-value store, which is addressed through get/set messages so it can live in the outer "primal" realm (the get/set messages deal entirely with strings, which are safe to pass cross-realm).
The old state-management approach was to occasionally ask the controller for the entire kernel state, which it returned as a big JSON object that could then be serialized to disk. At startup time, if the config object contained state, the kernel would be told to loadState(config.state) before it did anything else: this would populate the various tables and also replay the vat transcripts (to bring the javascript object-graph state back up to date). If there was no saved state, the bootstrap function would be called instead.

The new approach I'm thinking of is:
- src/controller.js exports a function to build a kvstore object that wraps a file on disk, with a method that writes the full contents to disk. We also provide an empty state (in particular, the vat transcripts are empty) somehow.
- buildVatController is constructed with a mandatory kvstore object (either the disk-wrapper, or something that puts state into the cosmos-sdk durable/provable kvstore).
- Instead of buildVatController using state/no-state to switch between loadState and callBootstrap, it always just calls kernel.startup.
- Then kernel.startup processes the vat transcripts (which might be empty), then looks at the flag to see whether bootstrap has been run yet or not, and synthesizes/enqueues the bootstrap message if it has not.
- get at the same time

In a solo-machine environment, the startup code should either create an empty state object, or build one from the file on disk. Then, after it cycles the kernel each time, it should tell the kvstore to save itself to disk.
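To make the solo-machine flow concrete, here is a minimal sketch of a file-backed kvstore plus a startup routine. It is an illustration under assumptions, not the actual SwingSet code: the names buildDiskKVStore and startup, the keys 'transcripts' and 'bootstrapDone', and the callbacks replayVat and enqueueBootstrap are all hypothetical.

```javascript
const fs = require('fs');

// Hypothetical helper: a string-to-string kvstore backed by one JSON file.
function buildDiskKVStore(filename) {
  // Start from an empty state if the file does not exist yet.
  const data = fs.existsSync(filename)
    ? JSON.parse(fs.readFileSync(filename, 'utf8'))
    : {};
  return {
    get(key) {
      return Object.prototype.hasOwnProperty.call(data, key)
        ? data[key]
        : undefined;
    },
    set(key, value) {
      // Only strings cross the boundary, so they are safe to pass cross-realm.
      if (typeof key !== 'string' || typeof value !== 'string') {
        throw new TypeError('kvstore keys and values must be strings');
      }
      data[key] = value;
    },
    // Write the full contents to disk (called after each kernel cycle).
    save() {
      fs.writeFileSync(filename, JSON.stringify(data));
    },
  };
}

// Hypothetical stand-in for kernel.startup: replay the (possibly empty)
// transcripts, then enqueue bootstrap only if the flag says it never ran.
function startup(kvstore, replayVat, enqueueBootstrap) {
  const transcripts = JSON.parse(kvstore.get('transcripts') || '{}');
  for (const [vatName, entries] of Object.entries(transcripts)) {
    for (const entry of entries) {
      replayVat(vatName, entry);
    }
  }
  if (kvstore.get('bootstrapDone') !== 'true') {
    enqueueBootstrap();
    kvstore.set('bootstrapDone', 'true');
  }
}
```

With this shape, the same startup path serves both a fresh machine (empty transcripts, bootstrap enqueued) and a restarted one (transcripts replayed, bootstrap skipped), which is the point of dropping the loadState/callBootstrap switch.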
In a chain environment, all state is read directly from the chain's kvstore, and changes are processed immediately. We don't need any special post-kernel-cycle save call.

Lazy State Loading for Chain Machines
Our current plan for the cosmos-sdk integration is to defer creating the swingset environment until the first time the x/swingset/handler.go handler is invoked, which will occur some time after the chain node is launched, when a txn containing a swingset message is processed. (If most of the cosmos-sdk messages are to modules other than swingset, this could be a rather long time.) At that moment, the handler will deliver the deliverInbound message over to the node.js side, which will realize that it doesn't have a kernel/controller to deliver into, and it will construct them. During construction, it will run kernel.startup, which will rebuild the javascript environment by re-delivering all the vat messages that were recorded in the kvstore state. It can read this kvstore state because we're in the middle of a transaction: that state was unavailable during process startup, so we can't build the swingset module (or rather we can't inject its state) any earlier.

This replay step could take an unpredictable amount of time, since our orthogonal-persistence approach requires us to replay those transcripts, which grow with the age of the vat (rather than with the size/number of objects in those vats). This might interfere with the chain node's ability to validate/vote in a timely fashion: any node that has been restarted since the last swingset message will take a lot longer than the ones that still have that javascript state intact. In the worst case, this could result in slashing as penalty for not voting quickly enough.
It might be nice to reduce this unpredictability by pre-loading as much of the JS state as possible. Our thought is to start with adding a sequence number to the kernel state (maybe counting turns, maybe counting additions to the runqueue).
Then, we manage a separate copy of the kernel state in a file on disk, outside of the normal cosmos-sdk kvstore. We need this separate copy because the kvstore is only available to handlers during the processing of a transaction (the Keeper knows whether it is processing a CheckTx or a DeliverTx, and provides different state objects in the two cases). To load the kernel from an earlier state during node startup, that state must come from the disk. But that state might not match what the node is really using, since we don't get notified when blocks are finished.
To deal with that, we store the messages that provoke turns along with the sequence number of the turn that results, and we can replay these messages to roll forward from the disk-based state to whatever the actual kvstore contains.
Nominally, just before delivering each message to the node.js side, we pull the full kvstore state (including the sequence number) and write it to disk. We can get this state because we're inside a transaction. We don't really want to do this every single message, as it's a lot of data, so we decimate the data in two ways. First, we use the context object to find out what the current block height is, and we only consider writing a new snapshot when that value has changed. Second, we only write snapshots once out of every N times (perhaps 100 messages).

After delivering the message, we pull the seqnum from the kvstore (which should now be one larger than before), and append both the seqnum and the contents of the message we delivered to the on-disk file, as an array of (seqnum, message) tuples.

At startup, we read the latest snapshot state from disk, and build a swingset instance from it. This will take a while, since we're replaying every single vat message, but this all happens before the cosmos-sdk node is ready for validation, so it's the best time to do it. We need to manage a short-lived kvstore object with this saved state for a while, separate from the cosmos-sdk's real kvstore.
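The snapshot decimation and the (seqnum, message) log described above might look roughly like the sketch below. Everything here is hypothetical: makeSnapshotRecorder, everyN, writeSnapshot, appendLog, and readFullState are illustrative names, and the real handler lives on the Go side. The sketch also picks one reading of "once out of every N times", counting block-height changes rather than individual messages.

```javascript
// Hypothetical recorder; I/O is injected so the sketch stays side-effect free.
function makeSnapshotRecorder({ everyN, writeSnapshot, appendLog }) {
  let lastBlockHeight; // block height at which a snapshot was last considered
  let candidates = 0; // number of block-height changes seen so far

  return {
    // Call just before delivering a message, while the kvstore is readable.
    // readFullState is only invoked when a snapshot is actually written.
    beforeDeliver(blockHeight, readFullState) {
      if (blockHeight === lastBlockHeight) {
        return false; // same block as last time: not a candidate
      }
      lastBlockHeight = blockHeight;
      candidates += 1;
      if ((candidates - 1) % everyN !== 0) {
        return false; // a candidate, but decimated away
      }
      writeSnapshot(readFullState());
      return true;
    },
    // Call after delivery with the post-delivery seqnum from the kvstore.
    afterDeliver(seqnum, message) {
      appendLog([seqnum, message]);
    },
  };
}
```

Passing readFullState as a function rather than a value avoids pulling the full kvstore state for messages that will not produce a snapshot.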
Then, in the swingset handler, upon entry, we pull the seqnum from the real kvstore, and compare it to the one in the short-lived populated-from-disk kvstore. In general, the real one will have a newer seqnum, because our snapshot is somewhat old. At that point, we read (seqnum,message) pairs from the disk table and apply the messages until it results in the short-lived kvstore having the same seqnum as the real one. While these messages are being applied, only the short-lived kvstore should be modified.
When the seqnum catches up, both kvstores should have the same contents, and the JS state should be the same as it was when the cosmos-sdk node last processed a transaction. At that point, we should swap out the kvstores, leaving the real (cosmos-sdk) one in place, and discarding the short-lived one.
If, for some reason, the on-disk snapshot is too new, the handler can throw out the failed-speculation kernel and start up a new one, replaying the entire vat transcripts, and just take the latency hit.
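As a rough sketch of the catch-up step, assuming the log holds [seqnum, message] pairs where seqnum is the value recorded after that message was delivered: catchUp, shadow, getSeqnum, and deliver are hypothetical names, and deliver is assumed to touch only the short-lived kvstore.

```javascript
// Hypothetical catch-up routine: roll the snapshot-based kernel forward
// until its seqnum matches the one in the real (cosmos-sdk) kvstore.
function catchUp({ realSeqnum, shadow, log, deliver }) {
  let seq = shadow.getSeqnum();
  if (seq > realSeqnum) {
    // Snapshot is too new: the caller should discard this kernel and
    // rebuild from the full vat transcripts instead.
    return { ok: false, reason: 'snapshot-too-new' };
  }
  for (const [seqnum, message] of log) {
    if (seq === realSeqnum) {
      break; // caught up: later log entries are not part of the real state
    }
    if (seqnum <= seq) {
      continue; // already reflected in the snapshot
    }
    deliver(message); // must modify only the short-lived kvstore
    seq = shadow.getSeqnum();
    if (seq !== seqnum) {
      return { ok: false, reason: 'diverged' };
    }
  }
  if (seq !== realSeqnum) {
    return { ok: false, reason: 'log-incomplete' };
  }
  // Caller may now swap in the real kvstore and discard the shadow one.
  return { ok: true };
}
```

Any non-ok result is handled the same way as the too-new case: throw away the speculative kernel, replay the entire vat transcripts, and accept the latency.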