## IV. Compilation Model {compilation-model}

Distributed programs need to be deployed to the network to be executed. A successful deployment requires that there's a common executable format that every runtime environment in the network understands.

In this chapter, we define the compilation model, the process of translating source code to an executable, distributed program. The end product of this process is an executable in bytecode format, ready for deployment.

### Translating Ambients programs

The Ambients protocol is programming language-agnostic. That means almost any programming language can be used to write distributed programs, as long as there's a compiler that can process the source language and turn it into Ambients bytecode. While most common programming languages can be used, the protocol's function- and type-based primitives make functional languages especially well-suited to writing distributed programs.

The compilation model requires all compilers to:

  1. compile the original source code to an intermediate abstract syntax structure (usually an Abstract Syntax Tree)
  2. translate the intermediate structure to the computation primitives, distribution primitives and computation abstractions of the Ambients protocol
  3. generate the bytecode executable from the primitives

How these requirements are met is up to the compiler implementation. The compilers are free to apply a variety of optimizations and internal logic at compile-time. Generated bytecode and the correctness of the primitive translation are validated upon execution, as described in the Execution model.
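As a minimal sketch, the three stages above could be composed as follows. All names here (`translate`, `encode`, the nested-tuple AST shape) are illustrative assumptions, not part of the protocol; only the opcode values, taken from the tables later in this chapter, come from the text.

```python
# Illustrative sketch of the compiler stages; names and AST shape are
# assumptions, only the opcode values come from this chapter.

# Opcodes for capabilities/co-capabilities and for the primitives,
# as defined later in this chapter.
OPCODES = {"create": 0, "deploy": 1, "in": 2, "in_": 3,
           "out": 4, "out_": 5, "open": 6, "open_": 7}
PRIMITIVES = {"func": 0, "call": 1, "arg": 2, "return": 3}

def translate(node):
    """Stage 2: walk the intermediate structure (here a nested
    (op, target, children) tuple produced by stage 1) and flatten it
    into (opcode-name, target) pairs in causal order."""
    op, target, children = node
    steps = [(op, target)]
    for child in children:
        steps.extend(translate(child))
    return steps

def encode(steps):
    """Stage 3: turn the primitive pairs into numeric bytecode tuples."""
    def encode_target(target):
        if target is None:                     # e.g. open_ takes no target
            return 0
        return PRIMITIVES.get(target, target)  # primitive opcode or ambient name
    return [(OPCODES[op], encode_target(target)) for op, target in steps]

# Stage 1 (parsing) is language-specific and omitted; assume it produced
# this structure for the program call[out a.in b|open_]:
tree = ("create", "call",
        [("out", "a", [("in", "b", [])]),
         ("open_", None, [])])

print(encode(translate(tree)))  # [(0, 1), (4, 'a'), (2, 'b'), (7, 0)]
```

Note that this sketch flattens the program into a list; a real compiler would also record the causal edges between instructions, as described under Instruction Order.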

### Program Bytecode

It is important to ensure that programs deployed to the network keep information hidden from the computation participants who don't need to access it. This is one of the key properties of the execution model, and one of its requirements is that programs can be sliced into their parallel sub-parts, so that only a minimal part of the program is exposed to the other participants. The compilation model, and its implementation, the compiler, satisfies this requirement by producing a bytecode representation of every unique ambient and its nested ambients as the compiler output.

The program instructions, each parallel sub-part of the program (a "slice"), and their call order are represented as a DAG and saved to content-addressed storage as a Merkle-DAG, giving each program and its sub-parts a unique hash. Using this hash, the program can be fetched from the network and referenced by other programs. Because the bytecode is stored as a Merkle-DAG, we can be sure that, upon fetching the program from the network, the bytecode hasn't been tampered with. By sharing the hash of a program's bytecode, the program can be discovered in the network and included in other programs as a dependency.
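To illustrate the content addressing, a Merkle-DAG node can hash an instruction together with the hashes of its causal predecessors, so any upstream tampering changes every downstream hash. The exact storage layout here is an assumption; only the Merkle-DAG idea comes from the text.

```python
import hashlib
import json

# Illustrative content addressing for bytecode instructions; the exact
# node layout is an assumption, not the protocol's storage format.

def node_hash(instruction, predecessor_hashes):
    """Hash one instruction together with the hashes of its causal
    predecessors, Merkle-style."""
    payload = json.dumps([list(instruction), sorted(predecessor_hashes)])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Nodes for the example program call[out a.in b|open_] used later in
# this chapter; edges point at the previous instruction.
root = node_hash((0, 1), [])            # ("create", "call")
out_a = node_hash((4, "a"), [root])     # ("out", "a")
in_b = node_hash((2, "b"), [out_a])     # ("in", "b")
open_co = node_hash((7, 0), [root])     # ("open_", 0)

# Tampering with any instruction changes the hashes derived from it:
assert node_hash((4, "x"), [root]) != out_a
assert node_hash((2, "b"), [node_hash((4, "x"), [root])]) != in_b
```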

### The Bytecode Format

As described in the previous section, the bytecode is a sequence of compact instructions to execute the program according to the Execution model discussed in later chapters.

The bytecode is a binary format which defines the ambients and their movement as "opcodes" that are executed on "targets". The bytecode expressions are, then, a sequence of instructions encoded as tuples of

```
(<opcode>, <target>)
```

The definitions of the individual elements are discussed in Opcodes and Targets, and the relation between the tuples, i.e. the call order of instructions, is discussed in Instruction Order. The exact bytecode format will be specified later in the detailed protocol specification. In this paper, we sketch the high-level structures and formats.
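Since the exact binary layout is deferred to the detailed protocol specification, the following is purely a hypothetical illustration of how one (<opcode>, <target>) tuple could be serialized; the tag bytes and length-prefixing scheme are assumptions.

```python
import struct

# Hypothetical serialization of a (<opcode>, <target>) tuple; the real
# binary layout is deferred to the detailed protocol specification.
def pack_instruction(opcode, target):
    if isinstance(target, int):
        # tag 0x00: numeric target (a primitive opcode)
        return struct.pack(">BBB", opcode, 0x00, target)
    data = target.encode("utf-8")
    # tag 0x01: length-prefixed ambient name
    return struct.pack(">BBB", opcode, 0x01, len(data)) + data

assert pack_instruction(7, 0) == b"\x07\x00\x00"    # open_ compiles to (7, 0)
assert pack_instruction(4, "a") == b"\x04\x01\x01a" # out a
```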

The purpose of the bytecode encoding is to:

  • keep the programs as compact, efficient, and distributable as possible
  • make the execution order unambiguous for easier verification, and capture the sequential and parallel instructions

In the future, there is potential to:

  • Write a compiler with the protocol, i.e. verifiable compilation
  • Embed a compiler in the VM (JIT-like compiler)

#### Opcodes

The opcodes capture the type of the instruction to be executed. We first define a set of opcodes for the events specific to the execution model and for the Robust Ambient calculus terms, i.e. the capabilities and co-capabilities:

  • 0: create
  • 1: deploy
  • 2: in
  • 3: in_
  • 4: out
  • 5: out_
  • 6: open
  • 7: open_

We then define opcodes for the computation and distribution primitives of the protocol:

  • 0: func
  • 1: call
  • 2: arg
  • 3: return

#### Targets

We continue by defining that the target in the (<opcode>, <target>) tuple is either the opcode of a primitive or the name of the target ambient. For the co-capability open_, the target is not used; instead, 0 is always used as the target opcode. That is, open_ compiles to (7, 0).
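Because capability opcodes and primitive opcodes live in separate namespaces, a numeric target such as 1 means the call primitive only in target position. The tables below are taken directly from this section; the `decode` helper is an illustrative sketch, not part of the protocol.

```python
# Opcodes for execution-model events and Robust Ambient calculus terms.
CAPABILITY_OPCODES = {
    0: "create", 1: "deploy",
    2: "in", 3: "in_",
    4: "out", 5: "out_",
    6: "open", 7: "open_",
}

# Separate opcode namespace for the computation and distribution
# primitives; these appear only in the target position of a tuple.
PRIMITIVE_OPCODES = {0: "func", 1: "call", 2: "arg", 3: "return"}

def decode(instruction):
    """Render one (opcode, target) tuple as human-readable text."""
    opcode, target = instruction
    name = CAPABILITY_OPCODES[opcode]
    if name == "open_":            # open_ never uses its target; always 0
        return name
    if isinstance(target, int):    # numeric target -> primitive ambient
        return f"{name} {PRIMITIVE_OPCODES[target]}"
    return f"{name} {target}"      # string target -> named ambient

assert decode((0, 1)) == "create call"
assert decode((7, 0)) == "open_"
```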

#### Instruction Order

We finish by defining how the expected execution order of the instructions is captured in the bytecode. As each step in the program is represented by the tuple (<opcode>, <target>), we construct an Execution DAG where the instruction tuples are the nodes of the DAG. Each instruction, a node in the Execution DAG, has an edge directed to the previous instruction of the program, which forms a causal order between them. As DAGs can branch and join, the Execution DAG captures both the sequential and parallel instructions.

For example, the program call[out a.in b|open_] would be represented as four execution steps:

```
("create", "call")
("out", "a")
("in", "b")
("open_", 0)
```

They would form the following Execution DAG:

```
      ("create", "call")
             |
            / \
           /   \
("out", "a")   ("open_", 0)
      |
("in", "b")
```

Using the previously defined opcodes, the instructions produced by the compiler are:

```
(0, 1)
(4, "a")
(2, "b")
(7, 0)
```

The first instruction (0, 1) can be read as "create an ambient called call". The second instruction (4, "a") maps to the capability out a followed by the third instruction (2, "b") for the capability in b. The open_ co-capability is captured in the fourth instruction (7, 0).

From the Execution DAG of the program, we can see how the DAG captures the sequential and parallel instructions, and divides the program into sub-parts:

```
      (0, 1)
         |
        / \
       /   \
(4, "a")   (7, 0)
   |
(2, "b")
```
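This DAG can be constructed mechanically: each node stores an edge to its causal predecessor, and the instructions branching from a shared predecessor are the program's parallel sub-parts. The mapping-based representation below is an illustrative sketch, not a prescribed data structure.

```python
# Execution DAG for call[out a.in b|open_]: each instruction maps to the
# instruction it causally follows (None for the root).
edges = {
    (0, 1): None,        # ("create", "call"), the root
    (4, "a"): (0, 1),    # ("out", "a") follows the create
    (7, 0): (0, 1),      # ("open_", 0) also follows the create, in parallel
    (2, "b"): (4, "a"),  # ("in", "b") follows ("out", "a") sequentially
}

def parallel_branches(edges, node):
    """Instructions directly following `node`; if there is more than one,
    they form parallel sub-parts (slices) of the program."""
    return [n for n, prev in edges.items() if prev == node]

# The two parallel sub-parts branching from the root:
assert set(parallel_branches(edges, (0, 1))) == {(4, "a"), (7, 0)}
```

Slicing the program along these branches is what lets the execution model expose only a minimal sub-part of the program to each participant.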