deterministic handling of adversarial code calling console.log with a Proxy #1852

Closed
warner opened this issue Oct 9, 2020 · 9 comments
Labels
enhancement New feature or request logging SwingSet package: SwingSet

@warner
Member

warner commented Oct 9, 2020

What is the Problem Being Solved?

@erights is close to landing a console wrapper into SES (endojs/endo#440 and endojs/endo#447), after which calling lockdown() will change the behavior of Error and console.log. In the new behavior, creating an Error will stash additional stack trace information (including deep stacks) in a private WeakMap where it cannot be seen by code that merely has the Error object. But the new console.log does get access to that stash, so when you console.log(error), your stdout will get the hidden details. This makes debug information (which is not necessarily deterministic, and probably exposes internal details of everyone on the call stack) unobservable to user-level code.
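
A minimal sketch of the stash idea, not the SES implementation: the names `makeTrackedError` and `privilegedLog` are hypothetical, and the "details" string stands in for real stack data. The point is only that the details are keyed by the error object in a module-private WeakMap, so holding the error does not grant access to them.

```js
// Hypothetical illustration only; this is not how SES is actually written.
const hiddenDetails = new WeakMap(); // private to this module

function makeTrackedError(message, details) {
  const err = new Error(message);
  // Associate the details with the error without adding a property to it.
  hiddenDetails.set(err, details);
  return err;
}

// Only code that closes over `hiddenDetails` can recover the stash.
function privilegedLog(err, write = console.error) {
  const details = hiddenDetails.get(err);
  write(`${err.message}${details ? ` [details: ${details}]` : ''}`);
}

const err = makeTrackedError('transfer failed', 'deep stack: vatA -> vatB');
const lines = [];
privilegedLog(err, line => lines.push(line));
```

User-level code that receives `err` sees an ordinary Error; only the privileged logger, which shares the WeakMap, can print the extra details.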

We generally treat console.log as a write-only channel: it is safe to whisper your internal secrets to the log, because only a system-level debugger tool can see them. User-level code should not be able to tell what console.log is doing: it might hold on to the objects to be rendered later, it might convert them into strings immediately, or it might just ignore them outright.

However, sneaky user-level code that submits a Proxy (or an object with accessor properties) to console.log can distinguish between these cases, by watching for their properties to be read.
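
As a rough illustration of that read channel, the sketch below passes a recording Proxy to two hypothetical console implementations (`ignoringLog` and `eagerLog` are made-up names, and `JSON.stringify` merely stands in for whatever rendering a real console does). Both return nothing, yet the caller can tell them apart.

```js
// Wrap a target so that every property read is recorded.
function makeSpy(target) {
  const reads = [];
  const spy = new Proxy(target, {
    get(obj, prop, receiver) {
      reads.push(String(prop));
      return Reflect.get(obj, prop, receiver);
    },
  });
  return { spy, reads };
}

// Two hypothetical console implementations with the same (void) return value.
const ignoringLog = (..._args) => {};
const eagerLog = (...args) => {
  // Renders immediately, which reads properties of each argument.
  args.map(arg => JSON.stringify(arg));
};

const a = makeSpy({ secret: 1 });
ignoringLog(a.spy);
const b = makeSpy({ secret: 1 });
eagerLog(b.spy);
// a.reads stays empty; b.reads now contains the properties that were read.
```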

This introduces a read channel by which user-level code can "read" data out of the kernel (or whatever is providing the supposedly-write-only console object). We need to prevent this from enabling non-deterministic behavior in the user-level code.

The way to think about this is that each layer of our platform is required to be a deterministic computation of some specified set of input data. Vat behavior must be a deterministic function of the transcript inputs, so that we can achieve orthogonal persistence by replaying the transcript. A swingset kernel that lives in a Cosmos-SDK blockchain (specifically the data and messages it publishes through Cosmos) must be a deterministic function of the transaction messages it receives from Cosmos, to enable replicated consensus validation.
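
To make "deterministic function of the transcript" concrete, here is a toy sketch. The counter vat and the delivery shape are invented for illustration and are far simpler than real SwingSet transcripts; the property being shown is that replaying the same inputs into a fresh instance reproduces the same results.

```js
// A toy "vat": its state depends only on the deliveries it is given.
function makeCounterVat() {
  let total = 0;
  return {
    deliver(delivery) {
      if (delivery.method === 'add') total += delivery.amount;
      return total;
    },
  };
}

// Replaying the same transcript into a fresh vat must reproduce every result.
function replay(transcript) {
  const vat = makeCounterVat();
  return transcript.map(entry => vat.deliver(entry));
}

const transcript = [
  { method: 'add', amount: 2 },
  { method: 'add', amount: 5 },
];
const firstRun = replay(transcript);
const secondRun = replay(transcript);
```

If `deliver` consulted anything outside the transcript (a clock, or whether a logger touched its arguments), the two runs could differ and replay would no longer reconstruct the vat.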

When one layer has access to data that is not part of the specified input of the next-(weaker-) layer up, that data represents a source of non-determinism for the upper layer. The lower layer can use that data for its own purposes just fine, but it must carefully defend against allowing this data to leak into the upper layer.

console.log() has no return value, but if the caller can sense something about the implementation, then the "data" being read is that knowledge about what the implementation chooses to do. And if those choices are based upon inputs that are not supposed to be available to the vat layer, this would enable the vat to behave in ways that are non-deterministic relative to the supposed vat inputs.

The pattern for defending against this is very similar to hiding secrets from objects outside some boundary, but more subtle. Any observable difference in behavior would leak the non-determinism. Proxy and accessor properties increase the upper layer's ability to observe behavior in the lower layer.

on-chain console.log should be an immediate NOP

@erights argued that when a SwingSet is running on-chain, the console object we give to vats should immediately return without touching its arguments. Anything else might reveal to the vat if/when the lower-layer code does something with those arguments, and commits us to doing exactly the same set of accesses in all consensus-critical situations. For example, a validator running their node with logging turned on would allow vats to behave differently than on a node with logging turned off, which would allow vats to violate consensus and cause slashing for some subset of the validators.

What, then, is the point of leaving console.log statements in vat code? His idea is clever: we use deterministic replay of the vats in a separate, non-consensus environment, in which console.log actually does something. I've been working (#1359) on a scheme to gather enough data from the running chain to let us retroactively replay one or more vats on a local machine (under a debugger). We made vats deterministically replayable to facilitate orthogonal persistence (replaying the transcript in a very similar environment to the original), but a lovely side-effect is that we can replay the transcript in a different environment too: under a different JS engine, with a debugger attached, or with more logging enabled.

So the idea is that our chain nodes (validators) route all console methods to a stub that returns immediately without ever examining the arguments (so no Proxy hooks get triggered). But we run an extra non-voting follower node, which does have console.log turned on. We record the console messages it emits (probably through the slog.makeConsole object, which adds vatID and deliveryNum so we can correlate them with which message is being processed), and publish the results in a block-explorer-style tool.
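
The two consoles could be sketched roughly as follows. This is not the slog.makeConsole API, whose signature is not shown in this thread: `makeNopConsole`, `makeFollowerConsole`, `getDeliveryNum`, and the line format are all assumptions made for illustration.

```js
const METHODS = ['debug', 'log', 'info', 'warn', 'error'];

// Consensus mode: every method returns at once and never touches its arguments.
function makeNopConsole() {
  const nop = () => {};
  return Object.freeze(Object.fromEntries(METHODS.map(name => [name, nop])));
}

// Follower mode: render the arguments and tag each line for correlation.
// `getDeliveryNum` is a hypothetical callback supplied by the host.
function makeFollowerConsole(vatID, getDeliveryNum, sink) {
  const make = level => (...args) => {
    const text = args.map(arg => String(arg)).join(' ');
    sink(`${vatID} d${getDeliveryNum()} ${level}: ${text}`);
  };
  return Object.freeze(Object.fromEntries(METHODS.map(n => [n, make(n)])));
}

const records = [];
const followerConsole = makeFollowerConsole('v3', () => 17, line => records.push(line));
followerConsole.log('balance', 42);
makeNopConsole().log('balance', 42); // no observable effect
```

Because the no-op methods never dereference their arguments, even a Proxy whose traps throw can be passed to them without any trap firing.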

The vat running in the follower node might be adversarial and use a Proxy to discover that it is running in this different environment, and could choose to behave differently than when on-chain. We can compare some amount of behavior (syscalls) against what happened on the chain, to limit the deviation. But the vats running in the validator nodes will all have access to the same data (nothing), so they'll all behave the same, maintaining consensus, even adversarial vats which are trying to sense the kind of machine they're running on.
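
One way to picture the syscall comparison is a function that finds the first point where the follower's syscalls stop matching the chain's. The array-shaped syscall records below are invented for the example; real transcripts have their own format.

```js
// Return the index of the first syscall that differs, or -1 if none does.
function firstDivergence(chainSyscalls, followerSyscalls) {
  const length = Math.max(chainSyscalls.length, followerSyscalls.length);
  for (let i = 0; i < length; i += 1) {
    // Syscalls are data-only, so comparing their JSON renderings is enough
    // for this sketch (a missing entry renders as undefined and mismatches).
    if (JSON.stringify(chainSyscalls[i]) !== JSON.stringify(followerSyscalls[i])) {
      return i;
    }
  }
  return -1;
}

const chain = [['send', 'ko1', 'deposit'], ['resolve', 'kp5']];
const honestFollower = [['send', 'ko1', 'deposit'], ['resolve', 'kp5']];
const deviantFollower = [['send', 'ko1', 'deposit'], ['send', 'ko9', 'exfiltrate']];

const honestResult = firstDivergence(chain, honestFollower); // -1
const deviantResult = firstDivergence(chain, deviantFollower); // 1
```

A follower could stop replaying (or flag the vat) as soon as the result is not -1, which bounds how far an adversarial vat can drift from its on-chain behavior.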

The upshot is that we'll give off-chain vats access to non-determinism via the observable treatment of objects passed into the console methods. Our programming style rule is "don't do that", but we only enforce this by withholding the non-determinism when running in the consensus-critical environment.

Other channels

We need a comprehensive analysis of all points of interaction between the vat and the host it runs in. Everything that crosses this boundary is a potential source of non-determinism, and we must specify exactly what can be observed from the vat and what cannot.

All vat syscalls are defined in terms of a data-only API, which ought to limit how sneaky the vat can be. Some vat workers live in separate (unix) processes entirely, and thus can only communicate with the kernel through a data pipe. But others share a process with the kernel, which means a non-data object might make it far enough into the kernel to sense how it gets accessed. For example, the vat might submit a Proxy as the capdata args argument to syscall.send, and monitor if/when the slots property is accessed.
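
One possible defense for the shared-process case, sketched under assumptions: `copyCapData` is a hypothetical helper (not the kernel's actual validation code) that reads each field once, in a fixed order, and passes only a frozen plain copy onward. The vat can still observe the copy itself, but that access pattern is identical on every node, and nothing the kernel does later is visible.

```js
// Hypothetical helper: read each field exactly once, in a fixed order, and
// hand the kernel plain frozen data instead of the caller's object.
function copyCapData(untrusted) {
  const body = untrusted.body;
  const slots = untrusted.slots;
  if (typeof body !== 'string') throw new TypeError('body must be a string');
  if (!Array.isArray(slots)) throw new TypeError('slots must be an array');
  const copiedSlots = Array.from(slots, slot => {
    if (typeof slot !== 'string') throw new TypeError('slot must be a string');
    return slot;
  });
  return Object.freeze({ body, slots: Object.freeze(copiedSlots) });
}

// A vat-supplied Proxy that records which properties get read.
const observed = [];
const sneaky = new Proxy({ body: '[]', slots: ['o+1'] }, {
  get(obj, prop, receiver) {
    observed.push(String(prop));
    return Reflect.get(obj, prop, receiver);
  },
});

const capdata = copyCapData(sneaky);
// Later kernel-side reads touch only the copy, so the Proxy sees nothing more.
const laterRead = capdata.slots.length;
```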

Even for vats in separate processes, they interact with local code (like liveslots, or the code that serializes syscalls into messages to send over the pipe) that is outside the vat boundary. We must make sure this local code does not leak nondeterminism into a vat.

Security Considerations

This is all about security. In particular, vat code must not be able to cause validator slashing or consensus faults (at least not when running on a chain). To achieve this, vat code (in a chain) must not be able to sense anything that isn't part of the execution model (which is what all the other validators are running and comparing against).

@warner warner added enhancement New feature or request SwingSet package: SwingSet labels Oct 9, 2020
@erights
Member

erights commented Oct 9, 2020

> When one layer has access to data that is not part of the specified input of the next-(weaker-) layer up, that data represents a source of non-determinism for the upper layer. The lower layer can use that data for its own purposes just fine, but it must carefully defend against allowing this data to leak into the upper layer.

OMG this is a good explanation of how "non-determinism" is relative to each layer of abstraction! I did not have words for this before.

@erights
Member

erights commented Oct 12, 2020

endojs/endo#487 is for the SES and console level mechanisms needed. With that separated out, this issue should be understood to be about the SwingSet changes needed to make use of these mechanisms for deterministic replay.

@warner
Member Author

warner commented Mar 7, 2021

#2519 is about adding a "consensus mode" flag to swingset (maybe as part of runtimeOptions). This will configure console.log to be a no-op, among perhaps other things.

@warner
Member Author

warner commented Jan 26, 2022

@michaelfig recently landed #4364, which makes the console object given to worker vats respect a consensusMode flag and become a complete no-op when running on the chain.

@michaelfig
Member

@warner, I'm reexamining this discussion, and I believe I have a counterpoint to the following:

> @erights argued that when a SwingSet is running on-chain, the console object we give to vats should immediately return without touching its arguments. Anything else might reveal to the vat if/when the lower-layer code does something with those arguments, and commits us to doing exactly the same set of accesses in all consensus-critical situations.

I would argue that debugging information can be useful, and making it available doesn't commit us to doing exactly the same things any more or less than returning immediately does. Indeed, the fact that a given vat worker does anything different from another type should be accommodated, and says nothing about what that vat worker should or should not do.

With do-nothing logging, debugging a chain node looks like this:

  1. Run your code on the chain node, keeping consensus with the other nodes
  2. Run a second process to follow that node with debugging enabled, and hope that it doesn't diverge too much
  3. Look at the resulting log output

With do-something logging, debugging a chain node looks like this:

  1. Run your code on a chain node with $DEBUG enabled, keeping consensus with the other nodes
  2. Look at the resulting log output

This do-something approach is the same one Cosmos uses everywhere: an archive node, a validator, a follower, and a seed node all keep consensus, so you don't have to worry about turning those features on or off. I would argue we should do the same.

I do agree that we should also work on improving the debugging experience, but I think this situation has made day-to-day debugging more difficult than it needs to be.

@erights
Member

erights commented Mar 8, 2022

Hazards to worry about:

GC decisions will almost certainly diverge

Snapshots are likely to diverge

Metering may diverge. If we're careful, perhaps metering should not diverge. But we have not yet tested it against divergent gc decisions, much less console changes.

Divergent metering will cause meter exhaustion to be outside consensus.

@mhofman
Member

mhofman commented Mar 8, 2022

I think @michaelfig's point is that if we always process the log call, there is no source of divergence anymore. We pay a cost upfront to remove the complexity for nodes which wish to print the vat logs.
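
A sketch of that "always process" idea, with assumptions stated: `makeAlwaysRenderConsole` is a hypothetical name, and `JSON.stringify` is only a stand-in for whatever serialization the real console uses. Every node renders the arguments in the same way; whether the rendered line is then written anywhere is the only thing that varies.

```js
// Render arguments on every node; only the final write is optional.
function makeAlwaysRenderConsole(write) {
  const render = arg => {
    try {
      return JSON.stringify(arg) ?? 'undefined';
    } catch {
      return '<unrenderable>'; // e.g. cycles; handled the same on every node
    }
  };
  return Object.freeze({
    log(...args) {
      // This step touches the arguments identically whether or not we print.
      const line = args.map(render).join(' ');
      if (write) write(line);
    },
  });
}

// A recording Proxy, to compare what each configuration reads.
function recordReads() {
  const reads = [];
  const spy = new Proxy({ n: 1 }, {
    get(obj, prop, receiver) {
      reads.push(String(prop));
      return Reflect.get(obj, prop, receiver);
    },
  });
  return { spy, reads };
}

const output = [];
const quiet = recordReads();
makeAlwaysRenderConsole(undefined).log(quiet.spy);
const verbose = recordReads();
makeAlwaysRenderConsole(line => output.push(line)).log(verbose.spy);
```

With this shape the Proxy sees the same reads with logging on or off, so the argument-access channel no longer distinguishes nodes. It does not by itself address the GC, snapshot, and metering hazards raised above.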

@mhofman
Member

mhofman commented Apr 12, 2022

@michaelfig, is this done with the new object-inspect based serialization?

@michaelfig
Member

> is this done with the new object-inspect based serialization?

Yes, actually it was done for the old stuff, too. I'll close this issue.

@Tartuffo Tartuffo added this to the Mainnet 1 milestone Apr 15, 2022