Skip to content
This repository has been archived by the owner on Jan 24, 2024. It is now read-only.

Cannon Overview

protolambda edited this page Apr 24, 2023 · 12 revisions

This page now lives on the Cannon wiki


Note: this describes Cannon v0.2. Cannon v0.3 is in active development and introduces breaking changes.

For a shorter & sweeter overview, see Cannon High-Level Overview


minigeth

Minigeth is stripped down version of geth that removes the database, and adds a preimage oracle instead.

The preimage oracle takes a hash and returns its preimage. This is most notably used to read account values, storage values, and contract code.

Obviously, preimages are not computable, so minigeth either needs to query another node for the preimage, or use pre-supplied images (more on this below).

Running minigeth takes us from an input hash to an output hash.

  • The input hash is the hash, for a given block, of the concatenation of:
    1. the transaction root
    2. the coinbase
    3. the uncle root
    4. the gas limit
    5. the timestamp
  • The output hash is the hash, for the same block, of the concatenation of:
    1. the state root
    2. the receipts root

Minigeth can be run in two modes: MIPS or prefetch (aka x64 or PC).

In MIPS mode:

  • The input hash is read from the 0x30000000 MIPS memory, where it needs to have been put beforehand (e.g. by the Challenge.sol contract, or by one of the various RunXXX methods in mipsevm (see later)).
  • Account values, storage values and contract code are read using preimage oracle.
  • The preimage oracle uses pre-supplied preimages. See details in the mipsevm section below.
  • The output hash is written at the 0x30000800 MIPS memory address.

In prefetch mode:

  • Minigeth connects to another geth node (URL in env variable NODE, defaulting to Infura).
  • Minigeth will store the "prefetched" data (details below) in a root directory.
    • Typically, this is going to be /tmp/cannon/<chainID>_<blockHash>
    • But it is configurable, and doesn't need to be per-chain or per-block.
    • Going forward, we'll refer to this directory as <root>.
  • A block number is supplied on the command line.
  • The input hash is computed from data fetched from the node, for the supplied block number, and stored in <root>/input.
  • The preimage oracle queries the node for the preimages, and saves the preimage in the <root>/<hash> file.
  • The output hash is stored in <root>/output.

You can consult this issue to get a better sense of the diff between upstream geth and minigeth.

minigeth: Gnarly Details

  • The preimage oracle queries are split into one of the different PrefetchXXX call and a Preimage call.

    • In MIPS mode, prefetching is a no-op, and Preimage reads from a special location in the MIPS memory (see details in mipsevm section) but caches the value in a Go map.
    • In Prefetch mode, the PrefetchXXX calls query the node at most once and cache in a Go map, and Preimage just reads from the map.
    • The reason for the split between PrefetchXXX and Preimage is that it's necessary to know the "type" of preimage (storage slot, account value, ...) in order to query the node.
  • Within minigeth/oracle:

    • embedded_mips implements the oracle for MIPS mode.
    • All other files implement the oracle for prefetch mode.
  • Prefetch caching: Besides caching the preimages in a Go map during a single minigeth prefetch run, the result of node queries are also cached on disk in /tmp/cannon/json_<json_hash>, where json_hash is the hash of the JSON-RPC query. This speeds up subsequent runs for the same block.

mipigo

Go supports compilation to MIPS. However, the generated executable is in ELF format.

We'd like to get a pure sequence of MIPS instructions instead. For this purpose, there is a simple startup routine located in mipigo/startup/startup.s (the binary version is mipigo/startup/startup.bin).

The file mipigo/startup/compile.py create the new binary by doing the following:

  • (??) Generally jiggle things around?
  • Add the startup routine at the start of the binary.
  • Replace Go garbage collector activation by no-op instructions.

TODO: confirm all of the above with George

mipsevm

Since we don't have a MIPS machine, we can't run minigeth in MIPS mode directly. Either we can run it using the Unicorn library (which uses the QEMU CPU emulator), or on the Solidity MIPS interpreter (MIPS.sol), which itself runs on the EVM.

The mipsevm program responsability is precisely to load and run the MIPS binary, either via Unicorn on the EVM.

Besides this, mipsevm also needs to intercept the Preimage calls in order to return the pre-supplied preimages.

This interception is based on the Preimage implementation in minigeth/oracle/embedded_mips.go, which copies the hash at address 0x30001000, then calls os.Getpid() (syscall 4020), then reads the preimage from address 0x31000000 (the first 4 bytes are the size, followed by the preimage proper).

For this to work, we have to intercept the preimage request — either the system call or the read at 0x31000000 — and write the (previously supplied) preimage at this address.

The mipsevm program runs in one of three configuration:

  • unicorn
    • This configuration runs the MIPS directly on Unicorn, not on the EVM.
    • It hooks Unicorn so that upon a the os.Getpid(), we read the preimage from disk (<root>/<hash> — these are generated when running minigeth in prefetch mode).
    • This configuration can be run via mipsevm/main.go and takes two optional arguments.
      • blockNumber — used to derive the <root> as /tmp/cannon/0_<blockNumber>
      • target — a step number in the execution (each step is the execution of a MIPS instruction)
    • It will write a checkpoint based on the command line parameters.
      • A checkpoint is a Merkle trie containing the MIPS memory (as an address → 32bit value mapping).
      • This gets serialized in a JSON file containing the Merkle root, the step number, and a hash → preimage mapping (since every node in the trie is a hash of its children).
      • If the block number isn't provided, it will write the checkpoint corresponding to the initial memory state in /tmp/cannon/golden.json (notice the <root> isn't used here as the initial state is block-independant).
      • If the block number and a step number is provided, it will write the checkpoint corresponding to the memory state before the specified execution step in <root>/checkpoint_<step>.json.
      • If the step number isn't provided, it will write the checkpoint corresponding to the memory state after the last execution step.
  • chain
    • This configuration runs the Solidity MIPS interpreter (MIPS.sol) on the EVM.
    • MIPS.sol reads the preimage from the MIPSMemory.sol contract.
    • The preimages must have been supplied to the contract beforehand.
      • There is an example of this in mipsevm/compare_evm_to_chain_test.go.
  • evm
    • This configuration also runs the Solidity MIPS interpreter on the EVM.
    • The memory address in MIPS.sol is set to 0 (instead of a deployment of MIPSMemory.sol), and the (MIPS) memory is read via SLOAD.
    • The state DB implementation is modified (in mipsevm/minievm.go) so that reading the contact state is able to detect the os.Getpid() syscall and read the preimage from disk (/tmp/cannon/<hash> — generated by minigeth in prefetch mode).
    • TODO: "doesn't compute the merkle tree"

The chain configuration is the one we care about (it's how the single-step execution will be run). However, it's also pretty slow, it's useful to have the two other configurations as alternative trade-offs between speed and closeness to production environment.

Challenge Game

The last step of the challenge game is a single-step (i.e. single MIPS instruction) execution, performed in the style of the chain mipsevm configuration.

To run, this single step needs to access the MIPS memory: it will need to access the MIPS code to read the instruction, then some registers and/or memory locations to execute it. If reading from the 0x31000000-0x32000000 memory, we'll also need to access chain preimages.

In order to enter the single-step execution phase, both players need to have reached agreement on the root of a Merkle trie mapping the whole MIPs memory. The MIPS memory is read out from this trie using the MIPSMemory.sol contract (which has a single on-chain deployment).

This Merkle trie read works in the regular way, assuming a backing hash -> preimage database: the trie is traversed node by node along a path determined by the bytes in the key (in this case, the MIPS memory address). Each node in the trie is a hash whose childen can be obtained by retrieving the preimage of the node hash.

For this to work during the single step execution, the hash -> preimage mappings necessary for the trie traversal need to have been supplied in advance. This is done via the MIPSMemory.AddTrieNode(data) function, where the data supplied is the preimage (from which the hash is recomputed). These preimages can be acquired from a checkpoint generated by running mipsevm in unicorn mode.

Similarly, chain preimages must have been supplied if required, via the MIPSMemory.AddPreimage(data, offset) function. (Remember these preimages are acquired by running minigeth in prefetch mode.) The reason why these are handled by different function (despite both being simple hash -> preimage mappings) is that for the chain preimages, it's only necessary to store the 4 bytes starting at the specified offset (as a MIPS instruction can only ever ever touch this much data at a time). On the other hand, the MIPS trie node preimages are needed in full to perform the trie traversal. Separating both cases enables storage savings.

NOTE: This could be changed if we replace the Merkle trie with a simpler tree-hash structure. We strongly feel we should do this, as the Merkle trie logic is complex and brittle. It is easily feasible because every leaf in the MIPS memory is a fixed size (4 bytes) memory slot.