## 16.1 Binary tree

In M269 we focus on a restricted form of rooted trees: binary trees.
They can be defined recursively:

> A **binary tree** is either empty or it consists of an item,
> called the **root**, and two binary trees,
> called the **left subtree** and the **right subtree**.

The next figure shows a binary tree that represents the expression (3+4)×5 − 6.
Rooted trees are usually depicted from the root downwards, unlike natural trees.

![This figure shows a diagram made of circles connected by lines.
Each circle surrounds an arithmetic operator or a number.
Each circle with an operator is connected by two lines to
two other circles below it, to the left and right.
Each circle with a number has no further circles below it.
The circle at the top has the subtraction operator.
Below it, the left circle has the multiplication operator
and the right circle has number 6.
Below the multiplication operator, the left circle has the addition operator
and the right circle has number 5.
Below the addition operator, the left circle has number 3
and the right circle has number 4.
](16_1_example.png)

The root is the subtraction operator.
The left subtree's root is the multiplication operator.
The right subtree consists of root 6 and two empty subtrees.
Why the expression is represented in this particular way will become clear
in the next section, when we evaluate such expression trees.

Many examples, like the folder hierarchy on a disk, can't be modelled as
binary trees, because a folder can have more than two subfolders.
But the concepts and techniques for binary trees
can be extended to rooted trees with any number of subtrees.

### 16.1.1 Terminology

A **node** consists of an item and
the references to the left and right subtrees. The **size** of a tree is
the number of its items (or nodes); the example tree has size 7.

A node A is the **parent** of node B, and B is the **left or right child** of A,
if B is the root of the left or right subtree of A, respectively.
For example, node 6 is the right child of the subtraction node,
which is also the parent of the multiplication node.

Every node has exactly one parent, except the root, which has no parent.
A **leaf** is a childless node, i.e. both its subtrees are empty.
In the example, the leaves are the integer literals.

A node A is an **ancestor** of node B, and B is a **descendant** of A,
if A is the parent of B or A is an ancestor of the parent of B.
For example, the ancestors of node 3 are
addition, multiplication and subtraction.
The subtree rooted at a node A consists of A and all its descendants.
For example, the subtree rooted at the addition node consists of it
and the descendants 3 and 4.

The **level** or **depth** of a node is the number of its ancestors.
The root has depth 0 because it has no ancestors.
In the example tree, node 5 and the addition node have depth 2;
nodes 3 and 4 are at level 3.

The **height** of a tree is the number of levels,
which is the largest depth plus one.
The height of the example tree is 4. The height of the empty tree is 0.

<div class="alert alert-info">
<strong>Info:</strong> Many authors define the height of the empty tree as −1 and the height of
a non-empty tree as the largest depth.
</div>

A binary tree is **perfect** if all its levels are full, i.e. all parents have
two children and all leaves are at the same depth.
An empty tree and a tree with just one node are perfect trees.
The left-hand tree in the next figure is also perfect.
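
The definitions of size, height and perfect tree translate almost word for word into recursive functions. The following is only a minimal sketch under an assumption of mine: it uses a throwaway representation (`None` for the empty tree, a tuple `(item, left, right)` otherwise) and hypothetical example trees `full` and `lopsided`, none of which is part of M269's code.

```python
# Assumption: a throwaway representation used only for this sketch.
# None is the empty tree; a non-empty tree is a tuple (item, left, right).

def size(tree) -> int:
    """Return the number of nodes in the tree."""
    if tree is None:
        return 0
    _, left, right = tree
    return 1 + size(left) + size(right)

def height(tree) -> int:
    """Return the number of levels; the empty tree has height 0."""
    if tree is None:
        return 0
    _, left, right = tree
    return 1 + max(height(left), height(right))

def is_perfect(tree) -> bool:
    """Return True if and only if every level of the tree is full."""
    # A tree with h full levels has 1 + 2 + ... + 2**(h-1) = 2**h - 1 nodes.
    return size(tree) == 2 ** height(tree) - 1

leaf_b = ('b', None, None)
leaf_c = ('c', None, None)
full = ('a', leaf_b, leaf_c)      # a root with two leaf children
lopsided = ('a', leaf_b, None)    # a root with a left child only

print(size(full), height(full), is_perfect(full))              # 3 2 True
print(size(lopsided), height(lopsided), is_perfect(lopsided))  # 2 2 False
print(size(None), height(None), is_perfect(None))              # 0 0 True
```

The check in `is_perfect` relies on the fact that a binary tree of height h can have at most 2^h − 1 nodes, and has exactly that many only when every level is full.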

#### Exercise 16.1.1

Consider the following binary trees, which represent, from left to right,
expressions (3+4)×(5−6), 3+((4×5)−6) and (3+(4×5))−6.

![This figure shows three separate circle-and-line diagrams,
with the same operators and numbers as the previous figure. Again,
only circles with operators are connected to a left and a right circle below.
Left diagram:
The top circle has the multiplication operator.
The circles below have the addition operator on the left
and the subtraction operator on the right.
Below the addition circle are the numbers 3 on the left and 4 on the right.
Below the subtraction circle are the numbers 5 on the left and 6 on the right.
Middle diagram:
The top circle has the addition operator.
The circles below have number 3 on the left
and the subtraction operator on the right.
Below the subtraction circle are the multiplication operator on the left
and number 6 on the right.
Below the multiplication circle are the numbers 4 on the left
and 5 on the right.
Right diagram:
The top circle has the subtraction operator.
The circles below have the addition operator on the left
and number 6 on the right.
Below the addition circle are number 3 on the left
and the multiplication operator on the right.
Below the multiplication circle are the numbers 4 on the left
and 5 on the right.
](16_1_exercise.png)

For each tree:

1. State its size and height.

_Write your answer here._

2. List the nodes at level 2.

_Write your answer here._

3. Explain whether the multiplication node is a descendant of the subtraction node.

_Write your answer here._

[Answer](../32_Answers/Answers_16_1_01.ipynb)

### 16.1.2 ADT and data structure

The binary tree ADT has the following operations:

Operation | Effect | Algorithm in English
-|-|-
new  | create a new empty binary tree | let _t_ be an empty binary tree
join | create a tree from item _i_ and trees _l_ and _r_ | join(_i_, _l_, _r_)
root | obtain the root item of non-empty tree _t_ | root(_t_)
left | obtain the left subtree of non-empty tree _t_ | left(_t_)
right | obtain the right subtree of non-empty tree _t_ | right(_t_)
is empty | check if a given tree is empty | _t_ is empty

The join operation puts together a tree while
the root, left and right operations take it apart. We have
root(join(_i_, _l_, _r_)) = _i_,
left(join(_i_, _l_, _r_)) = _l_ and right(join(_i_, _l_, _r_)) = _r_.
This is the same approach as for composing sequences with the prepend operation
and decomposing them with the head and tail operations.

<div class="alert alert-warning">
<strong>Note:</strong> When designing a data type made of several parts,
include operations to join and to separate the parts.
</div>

Instead of writing in one go a large class with many methods,
I define a data structure and add operations one by one,
as standalone functions. Introducing operations incrementally
makes it easier for me to explain (and for you to learn) how binary trees work.

The binary tree data structure follows the recursive definition:
a tree node has an item and points to two child nodes.
A binary tree is like a bifurcating [linked list](../06_Implementing/06_7_linked_list.ipynb#6.7-Linked-lists).
I represent an empty tree by a node without a root or subtrees.

In [1]:
# this code is also in m269_tree.py

class Tree:
    """A rooted binary tree"""
    def __init__(self):
        self.root = None
        self.left = None
        self.right = None

The class could be named `BinaryTree` or `TreeNode`
but since we're using only one kind of tree, I prefer a shorter name.

The analogue to a doubly linked list would be to have an
additional pointer to the parent node.
This would allow us to write operations that
can navigate up a tree, but we can live without the parent pointer in M269.
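
To make the idea concrete, here is a rough sketch of what such a variant might look like. It is not part of M269's code: the class `ParentTree` and the functions `join_with_parent` and `depth` are hypothetical names that I introduce only for this illustration.

```python
class ParentTree:
    """A binary tree node with an extra parent reference (hypothetical)."""
    def __init__(self):
        self.root = None
        self.left = None
        self.right = None
        self.parent = None  # stays None for the root of the whole tree

def join_with_parent(item: object, left: ParentTree, right: ParentTree) -> ParentTree:
    """Return a tree with the given root and subtrees, linking them upwards."""
    tree = ParentTree()
    tree.root = item
    tree.left = left
    tree.right = right
    left.parent = tree   # each subtree now knows the node above it
    right.parent = tree
    return tree

def depth(tree: ParentTree) -> int:
    """Return the number of ancestors of the given node."""
    count = 0
    while tree.parent is not None:  # walk up until the root is reached
        tree = tree.parent
        count += 1
    return count

# Every missing child gets its own fresh empty tree.
one = join_with_parent(1, ParentTree(), ParentTree())
two = join_with_parent(2, ParentTree(), ParentTree())
plus = join_with_parent('+', one, two)

print(depth(plus), depth(one))  # 0 1
print(two.parent is plus)       # True
```

A node can record only one parent, so this variant can't share one subtree object between several trees. That's why the sketch creates a new empty tree for each missing child.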

Let's implement the ADT operations.
The new operation is provided by the constructor.
To obtain the root item or the left or right subtree, we can simply access
the corresponding attribute, because we're using the class as the raw
data structure. The remaining operations are:

In [2]:
# this code is also in m269_tree.py

def is_empty(tree: Tree) -> bool:
    """Return True if and only if tree is empty."""
    return tree.root == tree.left == tree.right == None

def join(item: object, left: Tree, right: Tree) -> Tree:
    """Return a tree with the given root and subtrees."""
    tree = Tree()
    tree.root = item
    tree.left = left
    tree.right = right
    return tree
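
The identities root(join(_i_, _l_, _r_)) = _i_, left(join(_i_, _l_, _r_)) = _l_ and right(join(_i_, _l_, _r_)) = _r_ can be checked directly on this data structure. The following is a small sketch that repeats the class and `join` so that it runs on its own; the variable names `left_tree`, `right_tree` and `tree` are mine, chosen only for this check.

```python
class Tree:
    """A rooted binary tree (repeated from above so the sketch is self-contained)."""
    def __init__(self):
        self.root = None
        self.left = None
        self.right = None

def join(item: object, left: Tree, right: Tree) -> Tree:
    """Return a tree with the given root and subtrees."""
    tree = Tree()
    tree.root = item
    tree.left = left
    tree.right = right
    return tree

# Build two single-node trees and join them under a new root.
left_tree = join(1, Tree(), Tree())
right_tree = join(2, Tree(), Tree())
tree = join('+', left_tree, right_tree)

# Taking the tree apart gives back exactly what was put together.
print(tree.root == '+')          # True
print(tree.left is left_tree)    # True
print(tree.right is right_tree)  # True
```

The last two checks use `is` rather than `==` because `join` stores references to the given subtrees instead of copying them.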

We construct trees bottom-up, starting from the leaves and joining two subtrees.
First, the leaves for all the example trees.

In [3]:
# this code is also in m269_tree.py

EMPTY = Tree()
THREE = join(3, EMPTY, EMPTY)
FOUR = join(4, EMPTY, EMPTY)
FIVE = join(5, EMPTY, EMPTY)
SIX = join(6, EMPTY, EMPTY)

Next, the trees in the second figure, named after the order in which
the plus, minus and times operators appear in the `join` arguments.

In [4]:
# this code is also in m269_tree.py

TPM = join('*', join('+', THREE, FOUR), join('-', FIVE, SIX)) # (3+4)*(5-6)
PMT = join('+', THREE, join('-', join('*', FOUR, FIVE), SIX)) # 3+((4*5)-6)
MPT = join('-', join('+', THREE, join('*', FOUR, FIVE)), SIX) # (3+(4*5))-6

I can reuse the same leaf objects for different trees
because they won't be modified (hence the uppercase names).
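
Sharing is only safe as long as nobody changes the shared node. The sketch below shows what would happen otherwise. It repeats the class and `join` to be runnable on its own, and uses hypothetical names (`empty`, `leaf`, `sum_tree`, `diff_tree`) instead of the constants above, so that those remain untouched.

```python
class Tree:
    """A rooted binary tree (repeated from above so the sketch is self-contained)."""
    def __init__(self):
        self.root = None
        self.left = None
        self.right = None

def join(item: object, left: Tree, right: Tree) -> Tree:
    """Return a tree with the given root and subtrees."""
    tree = Tree()
    tree.root = item
    tree.left = left
    tree.right = right
    return tree

empty = Tree()
leaf = join(1, empty, empty)         # one leaf object ...
sum_tree = join('+', leaf, leaf)     # ... used twice in this tree
diff_tree = join('-', leaf, leaf)    # ... and twice in this one

# Both trees refer to the very same node object, not to copies of it.
print(sum_tree.left is diff_tree.right)  # True

# Modifying the shared leaf changes every tree that contains it.
leaf.root = 99
print(sum_tree.left.root, diff_tree.right.root)  # 99 99
```

Treating the leaves as constants avoids this kind of surprise.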

#### Exercise 16.1.2

Write the Python expression for the tree in the first figure,
for expression ((3+4)×5) − 6.

In [5]:
%run -i ../m269_tree

pass

[Answer](../32_Answers/Answers_16_1_02.ipynb)

Before we move on, here's one more operation to illustrate accessing the subtrees.

In [6]:
# this code is also in m269_tree.py

def is_leaf(tree: Tree) -> bool:
    """Return True if and only if the tree is a single leaf."""
    return not is_empty(tree) and is_empty(tree.left) and is_empty(tree.right)

In [7]:
is_leaf(THREE)

True

In [8]:
is_leaf(EMPTY)

False

In [9]:
is_leaf(TPM)

False
