## 16.2 Algorithms on trees

Due to the recursive definition of binary trees, a function f on them
is usually defined recursively like this:

1. if _tree_ is empty: f(_tree_) = ...
2. otherwise: f(_tree_) = an expression based on operations root, left, right and join.

To come up with such a definition you need to answer these questions:

1. What's the output for an empty tree?
2. If I know the outputs for the left and right subtrees, what's the output for the whole tree?

For example, the size of the empty tree is zero, and if I know the sizes of the
left and right subtrees, then the size of the tree is their sum plus one,
for the root.

- if _tree_ is empty: size(_tree_) = 0
- otherwise: size(_tree_) = size(left(_tree_)) + size(right(_tree_)) + 1

The recursive definition of the length of a sequence didn't refer to the head
of the sequence; similarly, the definition of the size of a tree doesn't refer
to the root.
As with sequences, recursive definitions on trees are straightforward to
translate to code. First we must 'import' the definition of `Tree`.

In [1]:
%run -i ../m269_tree

Now we can define a new operation on binary trees.

In [2]:
# this code is also in m269_tree.py

def size(tree: Tree) -> int:
    """Return the number of nodes in tree."""
    if is_empty(tree):
        return 0
    else:
        return size(tree.left) + size(tree.right) + 1

I test the function on one expression tree, as they all have the same size.

In [3]:
size(TPM)

7

Algorithms on binary trees usually follow a divide-and-conquer approach:
they process both subtrees and thereby all nodes.
This takes linear time in the size of the tree,
assuming that processing each node takes constant time.
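The linear growth can be checked by counting calls. The sketch below is self-contained, so it doesn't load `m269_tree`: its `Tree`, `EMPTY`, `join` and `is_empty` are minimal stand-ins written for this example, and `size_and_calls`, `chain` and `balanced` are hypothetical helpers. It suggests that the size algorithm makes 2n + 1 calls on a tree with n nodes, whatever the tree's shape.

```python
class Tree:
    """Minimal stand-in for the Tree class of m269_tree (an assumption)."""

    def __init__(self, root=None, left=None, right=None):
        self.root = root
        self.left = left
        self.right = right


EMPTY = Tree()


def join(item, left: Tree, right: Tree) -> Tree:
    """Return a tree with the given root item and subtrees."""
    return Tree(item, left, right)


def is_empty(tree: Tree) -> bool:
    """Return True if tree has no nodes (in this stand-in: no subtrees)."""
    return tree.left is None and tree.right is None


def size_and_calls(tree: Tree) -> tuple:
    """Return (size of tree, number of calls made to compute it)."""
    if is_empty(tree):
        return 0, 1  # one call, no nodes
    left_size, left_calls = size_and_calls(tree.left)
    right_size, right_calls = size_and_calls(tree.right)
    return left_size + right_size + 1, left_calls + right_calls + 1


def chain(nodes: int) -> Tree:
    """Return a tree of the given size in which no node has a right child."""
    tree = EMPTY
    for _ in range(nodes):
        tree = join('item', tree, EMPTY)
    return tree


def balanced(levels: int) -> Tree:
    """Return a tree with the given number of completely filled levels."""
    if levels == 0:
        return EMPTY
    return join('item', balanced(levels - 1), balanced(levels - 1))


print(size_and_calls(chain(7)))     # (7, 15)
print(size_and_calls(balanced(3)))  # (7, 15)
print(size_and_calls(chain(70)))    # (70, 141)
```

Each node triggers exactly two further calls, one per subtree, which is where the 2n comes from; the extra call is the initial one on the whole tree.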

#### Exercise 16.2.1

Recursively define the height of a tree.

- if _tree_ is empty: height(_tree_) = ...
- otherwise: height(_tree_) =

[Hint](../31_Hints/Hints_16_2_01.ipynb)
[Answer](../32_Answers/Answers_16_2_01.ipynb)

#### Exercise 16.2.2

Implement the height operation you defined in the previous exercise.

In [4]:
%run -i ../m269_tree
%run -i ../m269_util

def height(tree: Tree) -> int:
    """Return the height of the tree."""
    pass

height_tests = [
    # case,         tree,   height
    ('empty',       EMPTY,  0),
    ('(3+4)*(5-6)',   TPM,  3),
    ('3+((4*5)-6)',   PMT,  4),
    ('(3+(4*5))-6',   MPT,  4),
]

test(height, height_tests)

[Answer](../32_Answers/Answers_16_2_02.ipynb)

### 16.2.1 Arm's-length recursion

The size algorithm always does two recursive calls per node,
whether a node has 0, 1 or 2 children.
However, empty subtrees don't add anything to the size of the tree.
Making a recursive call to immediately return zero seems a bit pointless.

**Arm's-length recursion** checks for the base case _before_ making a recursive
call. For the size function, this means checking if a subtree is empty and
not making a recursive call if it is. Since one or both subtrees may be empty,
we must check three additional cases.
The base case must still be checked in case the whole tree is empty.

In [5]:
def size_arm(tree: Tree) -> int:
    """Return the size of the tree using arm's-length recursion."""
    if is_empty(tree):
        return 0
    elif is_leaf(tree):                 # both subtrees empty
        return 1
    elif is_empty(tree.left):           # left subtree empty
        return size_arm(tree.right) + 1
    elif is_empty(tree.right):          # right subtree empty
        return size_arm(tree.left) + 1
    else:
        return size_arm(tree.left) + size_arm(tree.right) + 1

The new algorithm is longer, inelegant, repetitive and thus
prone to typos and other errors. It only recurs on non-empty subtrees,
so it makes as many recursive calls as there are nodes, not about twice as many,
but each call makes more checks.
Let's compare this version to the first one, using a tall tree
with one child per node; essentially, a linked list.

In [6]:
tree = join('leaf', EMPTY, EMPTY)
for level in range(1000):
    tree = join('a node', tree, EMPTY)

%timeit -r 5 size(tree)
%timeit -r 5 size_arm(tree)

662 µs ± 51.3 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
1.21 ms ± 37.4 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)


In this example, arm's-length recursion takes longer, even though it makes fewer recursive calls.
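One way to see the trade-off is to count calls and emptiness checks instead of measuring time. The following sketch is self-contained: `Tree`, `EMPTY`, `join`, `is_empty` and `is_leaf` are stand-ins written for this example, not the code of `m269_tree`, and `counts` and `measure` are hypothetical helpers. The check counts depend on how `is_leaf` is implemented, which is assumed here, so read the numbers as an illustration rather than as an explanation of the timings above.

```python
class Tree:
    """Minimal stand-in for the Tree class of m269_tree (an assumption)."""

    def __init__(self, root=None, left=None, right=None):
        self.root = root
        self.left = left
        self.right = right


EMPTY = Tree()
counts = {'calls': 0, 'checks': 0}


def join(item, left, right):
    return Tree(item, left, right)


def is_empty(tree):
    counts['checks'] += 1      # count every emptiness check
    return tree.left is None   # in this stand-in only EMPTY lacks subtrees


def is_leaf(tree):
    # Assumed definition, only called here on non-empty trees.
    return is_empty(tree.left) and is_empty(tree.right)


def size(tree):
    counts['calls'] += 1
    if is_empty(tree):
        return 0
    return size(tree.left) + size(tree.right) + 1


def size_arm(tree):
    counts['calls'] += 1
    if is_empty(tree):
        return 0
    elif is_leaf(tree):
        return 1
    elif is_empty(tree.left):
        return size_arm(tree.right) + 1
    elif is_empty(tree.right):
        return size_arm(tree.left) + 1
    else:
        return size_arm(tree.left) + size_arm(tree.right) + 1


def measure(function, tree):
    """Return (result, calls, emptiness checks) for one run of function."""
    counts['calls'] = counts['checks'] = 0
    result = function(tree)
    return result, counts['calls'], counts['checks']


# A shorter version of the tall tree: one leaf plus 100 nodes above it.
chain = join('leaf', EMPTY, EMPTY)
for level in range(100):
    chain = join('a node', chain, EMPTY)

print(measure(size, chain))      # (101, 203, 203)
print(measure(size_arm, chain))  # (101, 101, 403)
```

Under these assumptions, the arm's-length version halves the number of calls but roughly doubles the number of emptiness checks on this tree.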

<div class="alert alert-warning">
<strong>Note:</strong> Avoid arm's-length recursion:
it complicates your code and usually slows it down.
</div>

If an operation isn't defined for the empty tree, then an algorithm must
check that a subtree isn't empty before making a recursive call on it.
Consider finding the largest item in a binary tree. The preconditions are that
the input tree isn't empty and its items are comparable.

- if _tree_ is a leaf: largest(_tree_) = root(_tree_)
- if left(_tree_) is empty and right(_tree_) isn't:
  largest(_tree_) = max(largest(right(_tree_)), root(_tree_))
- if right(_tree_) is empty and left(_tree_) isn't:
  largest(_tree_) = max(largest(left(_tree_)), root(_tree_))
- otherwise: largest(_tree_) = max(largest(left(_tree_)),
  largest(right(_tree_)), root(_tree_))

This is _not_ arm's-length recursion: the cases check for the empty tree,
which isn't the base case of this definition (the base case is a leaf).
The checks make sure that no recursive call violates the preconditions.
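The definition of largest can be translated case by case, as in the sketch below. It is self-contained: `Tree`, `EMPTY`, `join`, `is_empty` and `is_leaf` are minimal stand-ins written for this example, not the code of `m269_tree`, and the sample tree is made up.

```python
class Tree:
    """Minimal stand-in for the Tree class of m269_tree (an assumption)."""

    def __init__(self, root=None, left=None, right=None):
        self.root = root
        self.left = left
        self.right = right


EMPTY = Tree()


def join(item, left: Tree, right: Tree) -> Tree:
    return Tree(item, left, right)


def is_empty(tree: Tree) -> bool:
    return tree.left is None and tree.right is None


def is_leaf(tree: Tree) -> bool:
    # Assumed definition: a non-empty tree with two empty subtrees.
    return (not is_empty(tree)
            and is_empty(tree.left) and is_empty(tree.right))


def largest(tree: Tree):
    """Return the largest item in tree.

    Preconditions: tree isn't empty; its items are comparable.
    """
    if is_leaf(tree):
        return tree.root
    elif is_empty(tree.left):       # so the right subtree isn't empty
        return max(largest(tree.right), tree.root)
    elif is_empty(tree.right):      # so the left subtree isn't empty
        return max(largest(tree.left), tree.root)
    else:
        return max(largest(tree.left), largest(tree.right), tree.root)


example = join(5,
               join(9, EMPTY, EMPTY),
               join(2, join(7, EMPTY, EMPTY), EMPTY))
print(largest(example))     # 9
```

No branch ever calls `largest` on an empty tree, so every recursive call satisfies the precondition.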
