We assume the reader is familiar with standard tree structures. Given a tree T
, the height of T
is the length of the longest path
from the root of T
of any leaf.
Figure 1: A complete binary tree of height 3. The tree hasA tree is:
- a binary tree if every node that is not a leaf has at most two children and,
- a complete (or perfect) tree if all the branches of the tree from the root to the leaves have the same height
h
.
15 = 23 + 1 - 1 nodes and 8 = 23 leaves.
A complete binary tree
T
of heighth
satisfies the following properties:
- the number of nodes in
T
is 2h + 1 - 1 and,- the number of leaves in
T
is 2h.
The core component of the Deposit Smart Contract is the so-called Incremental Merkle Tree algorithm.
A Merkle tree is a binary and complete tree decorated with the Merkle (hash) attribute.
A node's decoration (or attribute in the sequel) is a value attached to a node. Using this definition, in a Merkle Tree, a node's attribute is the hash of the attributes of its children (the attributes on the leaves are given). Computing the attribute for all the nodes of a tree usually requires to walk the tree (up or down) and to explore all the nodes, for instance via a Depth First Traversal algorithm.
In the context of a Blockchain, a Merkle tree is used to compute a property of a collection of elements e.g. a list L
.
Given L
, we first build a tree T(L)
associated with L
, such that the leaves of T(L)
are the elements of L
.
The property of the list L
is the value of the attribute (e.g. hash) at the root of T(L)
.
The value on the tree root can be used to test membership of an element in the list L
. The is known as a Merkle Proof.
In this project we are not concerned with Merkle Proofs but rather in computing an attribute (e.g. hash) on the root of a tree that stores a list
L
.
An attributed tree is a tree the nodes of which are decorated with a value of a given type, called the attribute. The hash attribute in Merkle trees belongs to the category of synthesised attributes:
A synthesised attribute on a (complete binary) tree is defined bottom-up: the values of the attribute are given on the leaves, and to compute the values on the internal nodes, the values of the left and right children are combined using a binary function.
For the sake of simplicity we assume that the attribute to compute on the tree T
is a simple function diff
,
defined by: diff(left, right) == left - right - 1
.
Each leaf holds an integer value, and the value of an internal node is the diff function applied to the values
of the left
and right
children.
The leaves of T
contains the elements of a list L
from left to right when traversed in-order (left, node, right).
To represent a list L
of n < 2h elements with a binary complete tree of height h
,
L
can be right-padded with dummy or neutral elements (e.g. zeroes) to obtain L'
of length 2h.
For instance, the tree below stores the list [3, 6, 2, -1, 4]
in the leaves of the tree, and the value of diff
at the root of the tree for this list is -11
.
The root value of
T(L)
is the value of the attributediff
at the root ofT(L)
.
We assume a fixed given height h
.
Given a list L
, we denote T(L)
the decorated tree of height h
that contains the list L
.
The incremental Merkle tree problem is informally stated as:
Assume we have a collection that grows monotonically, e.g. successive lists
L
,L::e1
,L::e_1::e_2
, ... and so on (where :: is theappend
operator on lists). Can we incrementally (and efficiently) compute the successive properties ofL
,L::e1
,L::e_1::e_2
i.e. the values of the root of the treesT(L)
,T(L::e1)
,T(L::e1::e2)
, ... ?
By efficiently, we mean without traversing the 2h nodes of the tree.
Let L = [3, 6, 2, -1, 4]
.
Assume we want to compute the root value of the diff
attribute for the list
L::5
.
As depicted on Figure 3 below, this amounts to inserting the new value 5
at the next available leftmost leaf of the tree which is n6
.
The green path above is the path from the root to the leaf where the new value is inserted.
Fact 1: the decorations on the trees
T(L)
andT(L::5)
differ only on the green path.
Fact 2: to compute the root value of
T(L::3)
we only need the values on the nodes that are siblings of nodes on the green path. As depicted above, if we have the values ofdiff
on the left (yellow) and on the right (orange) nodes, we can compute the root value ofT(L::5)
.
Fact 3: The leaves (
n7
,n8
) on the right of the green path all have the default value. As a consequence, the right (orange) nodes have a default value that correspond to their height. Height0
is a leaf, and the value is thedefault
value (e.g.0
for ourdiff
attribute). At height1
, the value isdiff(0, 0) = -1
, at height2
it isdiff(diff(0,0), diff(0,0)) = -1
and so on.
If we want to compute the new root value after inserting 5
at n6
we just need to walk up
the green path and compute the diff
attribute. This can be done in time O(h)
(h
is the height of the tree).
After inserting 5
at n6
in the tree we obtain the new tree depicted below in Figure 4:
The green path is also updated to be the path to the next available leaf (n7
).
The result of the previous facts is that the computation of the new root value can be done using the values of the left and right siblings of the green path. They can be stored in two vectors the size of which is the height of the tree.
Now assume that when we insert a new value in the tree, we can also compute the new value of those vectors i.e. the values of the siblings on the path to the next available leaf.
In this case, we can iteratively compute the root value of the tree for a list e1::e2:: ...::en
storing only the two vectors (of size h
), and computing the new root value by walking up to current path
in linear time (O(h)
).
For the reader familiar with algorithm design, it is a typical example of dynamic programming.