Skip to content

Latest commit

 

History

History
152 lines (102 loc) · 7.76 KB

background.md

File metadata and controls

152 lines (102 loc) · 7.76 KB

[up] [next]

Background

Binary and Complete Trees

We assume the reader is familiar with standard tree structures. Given a tree T, the height of T is the length of the longest path from the root of T of any leaf.

A tree is:

  1. a binary tree if every node that is not a leaf has at most two children and,
  2. a complete (or perfect) tree if all the branches of the tree from the root to the leaves have the same height h.

Incomplete/Complete Binary Trees

Figure 1: A complete binary tree of height 3. The tree has
15 = 23 + 1 - 1 nodes and 8 = 23 leaves.

A complete binary tree T of height h satisfies the following properties:

  1. the number of nodes in T is 2h + 1 - 1 and,
  2. the number of leaves in T is 2h.

Merkle Tree

The core component of the Deposit Smart Contract is the so-called Incremental Merkle Tree algorithm.

A Merkle tree is a binary and complete tree decorated with the Merkle (hash) attribute.

A node's decoration (or attribute in the sequel) is a value attached to a node. Using this definition, in a Merkle Tree, a node's attribute is the hash of the attributes of its children (the attributes on the leaves are given). Computing the attribute for all the nodes of a tree usually requires to walk the tree (up or down) and to explore all the nodes, for instance via a Depth First Traversal algorithm.

In the context of a Blockchain, a Merkle tree is used to compute a property of a collection of elements e.g. a list L. Given L, we first build a tree T(L) associated with L, such that the leaves of T(L) are the elements of L. The property of the list L is the value of the attribute (e.g. hash) at the root of T(L).

The value on the tree root can be used to test membership of an element in the list L. The is known as a Merkle Proof.

In this project we are not concerned with Merkle Proofs but rather in computing an attribute (e.g. hash) on the root of a tree that stores a list L.

Attributed trees

An attributed tree is a tree the nodes of which are decorated with a value of a given type, called the attribute. The hash attribute in Merkle trees belongs to the category of synthesised attributes:

A synthesised attribute on a (complete binary) tree is defined bottom-up: the values of the attribute are given on the leaves, and to compute the values on the internal nodes, the values of the left and right children are combined using a binary function.

For the sake of simplicity we assume that the attribute to compute on the tree T is a simple function diff, defined by: diff(left, right) == left - right - 1. Each leaf holds an integer value, and the value of an internal node is the diff function applied to the values of the left and right children.

The leaves of T contains the elements of a list L from left to right when traversed in-order (left, node, right). To represent a list L of n < 2h elements with a binary complete tree of height h, L can be right-padded with dummy or neutral elements (e.g. zeroes) to obtain L' of length 2h.

For instance, the tree below stores the list [3, 6, 2, -1, 4] in the leaves of the tree, and the value of diff at the root of the tree for this list is -11.

Storing a list in the tree leaves

Figure 2: The `diff` attribute computed on a tree. The value of an internal node (including the root) is computed by combining the values of the left and right children using `diff`.

The root value of T(L) is the value of the attribute diff at the root of T(L).

Incremental Computation of Synthesised Attributes on Trees

We assume a fixed given height h. Given a list L, we denote T(L) the decorated tree of height h that contains the list L.

The incremental Merkle tree problem is informally stated as:

Assume we have a collection that grows monotonically, e.g. successive lists L, L::e1, L::e_1::e_2, ... and so on (where :: is the append operator on lists). Can we incrementally (and efficiently) compute the successive properties of L, L::e1, L::e_1::e_2 i.e. the values of the root of the trees T(L), T(L::e1), T(L::e1::e2), ... ?

By efficiently, we mean without traversing the 2h nodes of the tree.

Inserting a value in a tree

Let L = [3, 6, 2, -1, 4]. Assume we want to compute the root value of the diff attribute for the list L::5. As depicted on Figure 3 below, this amounts to inserting the new value 5 at the next available leftmost leaf of the tree which is n6.

Impact of leaf change

Figure 3: Insertion of value `5` at leaf `n6`. The purple nodes' attribute values on the green path to leaf `n6` are impacted by the insertion. To compute the new root value, we only the inserted value, and the value of the nodes on the left (yellow) and right (orange) of the green path.

The green path above is the path from the root to the leaf where the new value is inserted.

Fact 1: the decorations on the trees T(L) and T(L::5) differ only on the green path.

Fact 2: to compute the root value of T(L::3) we only need the values on the nodes that are siblings of nodes on the green path. As depicted above, if we have the values of diff on the left (yellow) and on the right (orange) nodes, we can compute the root value of T(L::5).

Fact 3: The leaves (n7, n8) on the right of the green path all have the default value. As a consequence, the right (orange) nodes have a default value that correspond to their height. Height 0 is a leaf, and the value is the default value (e.g. 0 for our diff attribute). At height 1, the value is diff(0, 0) = -1, at height 2 it is diff(diff(0,0), diff(0,0)) = -1 and so on.

If we want to compute the new root value after inserting 5 at n6 we just need to walk up the green path and compute the diff attribute. This can be done in time O(h) (h is the height of the tree).

After inserting 5 at n6 in the tree we obtain the new tree depicted below in Figure 4:

Nodes needed to compute updated root hash

Figure 4: Insertion of another value at leaf `n7`.

The green path is also updated to be the path to the next available leaf (n7).

Incremental Computation of the Diff attribute

The result of the previous facts is that the computation of the new root value can be done using the values of the left and right siblings of the green path. They can be stored in two vectors the size of which is the height of the tree.

Now assume that when we insert a new value in the tree, we can also compute the new value of those vectors i.e. the values of the siblings on the path to the next available leaf.

In this case, we can iteratively compute the root value of the tree for a list e1::e2:: ...::en storing only the two vectors (of size h), and computing the new root value by walking up to current path in linear time (O(h)).

For the reader familiar with algorithm design, it is a typical example of dynamic programming.

[up] [next]