# COMS W2132 Intermediate Computing in Python
## Trees

**Date**: Feb 19th and Feb 24th 2025
**Instructor**: Daniel Bauer (original notes by Jan Janak)

**Reading**: Data Structures and Algorithms in Python, Chapter 8

---

### Introduction

* One of the most ubiquitous data structures
* File systems, databases, websites, GUI, Python modules and classes, and generally everything organized in a hierarchy
* Hierarchically organized elements with relationships such as above, below, parent, child, ancestor, descendant

<img src="https://janakj.org/w3132/images/exception_hierarchy.png"/>

## General Trees

* In computer science, a tree is an abstract model of a hierarchical structure
* Trees support faster-than-linear algorithms for tasks such as search (think binary search)
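
The "think binary search" remark can be made concrete with a minimal sketch of binary search over a sorted Python list. The function name `binary_search` is illustrative and not part of the course code. Each step halves the remaining range, so the search needs $O(\log n)$ comparisons instead of the $O(n)$ of a linear scan.

```python
def binary_search(items, target):
    """Return True if target occurs in the sorted list items.

    Each iteration halves the search range [lo, hi), so the loop
    runs O(log n) times.
    """
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return True
        elif items[mid] < target:
            lo = mid + 1      # target can only be in the right half
        else:
            hi = mid          # target can only be in the left half
    return False


words = ["aback", "abase", "abate", "abbey", "abbot"]
print(binary_search(words, "abate"))  # True
print(binary_search(words, "aaaaa"))  # False
```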

### Definitions and Properties

* Tree $T$ is a set of nodes storing elements that have a parent-child relationship
* If $T$ is not empty, it has a **root** with no parent
* A node $v$ other than the root has a **parent** node $w$
* A node with parent $w$ is said to be a **child** of $w$

<img src="https://janakj.org/w3132/images/tree.png" width=500/>

* The children of the same parent are **siblings**
* An **internal** node has at least one child
* Nodes that have no children are called **leaves** (also **external** nodes)
* Node $u$ is an **ancestor** of $v$ if $u=v$ or $u$ is an ancestor of $parent(v)$
* Node $u$ is a **descendant** of $v$ if $v$ is an ancestor of $u$
* The **subtree** of $T$ rooted at $v$ is the tree consisting of all the descendants of $v$ in $T$ (including $v$)
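
The recursive definition of **ancestor** translates directly into code. The sketch below assumes a tree stored as a hypothetical dictionary `parent` that maps each node to its parent, with the root mapped to `None`. It uses a few names from Python's exception hierarchy and is only an illustration. It is not the position-based interface defined later.

```python
# Hypothetical tree: each node maps to its parent; the root maps to None.
parent = {
    "Exception": None,
    "ArithmeticError": "Exception",
    "LookupError": "Exception",
    "ZeroDivisionError": "ArithmeticError",
    "KeyError": "LookupError",
    "IndexError": "LookupError",
}


def is_ancestor(u, v):
    'u is an ancestor of v if u == v or u is an ancestor of parent(v)'
    if u == v:
        return True
    if parent[v] is None:      # v is the root: no further ancestors to check
        return False
    return is_ancestor(u, parent[v])


def is_descendant(u, v):
    'u is a descendant of v if v is an ancestor of u'
    return is_ancestor(v, u)


def subtree(v):
    'All descendants of v, including v itself'
    return {u for u in parent if is_descendant(u, v)}


print(is_ancestor("Exception", "KeyError"))          # True
print(is_descendant("KeyError", "ArithmeticError"))  # False
print(sorted(subtree("LookupError")))  # ['IndexError', 'KeyError', 'LookupError']
```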

The order between a node's children may or may not be important. A tree is said to be **ordered** if the order among the children is important.

### Abstract Tree Definition

```python
class AbstractPosition:
    'An abstraction representing the location of a single element'

    def element(self):
        'Return the element stored at this position'
        raise NotImplementedError('Must be implemented by a subclass')

    def __eq__(self, other):
        'Return True if other Position represents the same location'
        raise NotImplementedError('Must be implemented by a subclass')

    def __ne__(self, other):
        return not (self == other)

class AbstractTree:

    def root(self):
        'Return the root of the tree or None if the tree is empty'
        raise NotImplementedError('Must be implemented in subclass')

    def parent(self, p):
        'Return the parent of the node at position p or None if p is root'
        raise NotImplementedError('Must be implemented in subclass')

    def children(self, p):
        'Return all children of the node at position p'
        raise NotImplementedError('Must be implemented in subclass')

    def num_children(self, p):
        'Return the number of children of the node at position p'
        raise NotImplementedError('Must be implemented in subclass')

    def __len__(self):
        'Return the number of elements in the tree'
        raise NotImplementedError('Must be implemented in subclass')

    # Query methods

    def is_root(self, p):
        return self.root() == p

    def is_leaf(self, p):
        return self.num_children(p) == 0

    def is_empty(self):
        return len(self) == 0

    # depth() and height() are defined below and attached to this class
```

### Depth and Height

* **depth** of a node: number of its ancestors, excluding the node itself (the root has depth 0)
* **height** of a tree: maximum depth of any node

```python
def depth_recursive(self, p):
    'Calculate the depth of position p in the tree. Recursive version.'
    if self.is_root(p):
        return 0
    else:
        return self.depth_recursive(self.parent(p)) + 1

# Add the newly defined function as methods of the AbstractTree class
# (the recursive call above looks it up under the name depth_recursive)
AbstractTree.depth_recursive = depth_recursive
AbstractTree.depth = depth_recursive
```

```python
def depth_iterative(self, p):
    'Calculate the depth of position p in the tree. Iterative version.'
    depth = 0
    parent = self.parent(p)
    while parent is not None:
        depth += 1
        parent = self.parent(parent)
    return depth

AbstractTree.depth_iterative = depth_iterative
```

Running time: $O(d_p + 1)$, where $d_p$ is the depth of $p$ in the tree. The algorithm performs a constant amount of work for each ancestor of $p$. In the worst case the tree forms a single branch and $d_p = n - 1$, so depth() runs in $O(n)$ time.

Note: $d_p$ is usually much smaller than $n$ (trees usually don't form a single branch)

The **height** of the subtree rooted at position $p$:
* If $p$ is a leaf, then $height(p)=0$
* Otherwise, the height of $p$ is one more than the maximum height of its children

```python
def _height(self, p):
    'Compute the height of the subtree rooted at p, recursively'
    if self.is_leaf(p):
        return 0
    else:
        return 1 + max(self._height(c) for c in self.children(p))

AbstractTree._height = _height
```

Computing the height of the entire tree:

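```python
def height(self, p=None):
    'Return the height of the subtree rooted at p (the whole tree by default)'
    if p is None:
        p = self.root()
    return self._height(p)

AbstractTree.height = height
```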

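Running time: $O(n)$. Each node is visited exactly once, and the work done at a node is proportional to its number of children. Summed over all nodes, this is $O(n)$.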
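
The height of a tree equals the maximum depth of any node, so one could also compute it by taking the depth of every leaf. The sketch below contrasts that approach with the recursive one. It uses a hypothetical dictionary-based tree: `children` maps each node to a list of its children, and `parent` is derived from it. The leaf-depth version walks up from every leaf, which costs $O(n^2)$ in the worst case. The recursive version visits each node once.

```python
# Hypothetical tree: each node maps to the list of its children.
children = {
    "r": ["a", "b"],
    "a": ["c", "d"],
    "b": [],
    "c": ["e"],
    "d": [],
    "e": [],
}
parent = {c: p for p, cs in children.items() for c in cs}
parent["r"] = None


def depth(v):
    'Number of ancestors of v, excluding v itself'
    d = 0
    while parent[v] is not None:
        v = parent[v]
        d += 1
    return d


def height_via_depths():
    'Maximum depth over all leaves: O(n^2) in the worst case'
    return max(depth(v) for v in children if not children[v])


def height_recursive(v="r"):
    'Height of the subtree rooted at v: each node is visited once, O(n)'
    if not children[v]:
        return 0
    return 1 + max(height_recursive(c) for c in children[v])


print(height_via_depths(), height_recursive())  # 3 3
```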

## Binary Trees

A binary tree is an ordered tree where:
* Every node has at most two children
* Children are labeled as **left** and **right**
* A left child precedes a right child (ordered pair)

A binary tree is called **proper** (or **full**) if each node has either zero or two children (no partial nodes)

<img src="https://janakj.org/w3132/images/binary-tree.png" width=500/>

Applications: arithmetic expressions, searching, decision processes (yes-no questions)

A binary tree representing the expression $(2 \times (a - 1) + (3 \times b))$:

<img src="https://janakj.org/w3132/images/arithmetic-tree.png" width=500/>

Internal nodes represent operators. External nodes represent operands.

### Abstract Binary Tree Definition

```python
class AbstractBinaryTree(AbstractTree):
    def left(self, p):
        'Return the left child of node at position p (or None)'
        raise NotImplementedError('Must be implemented by subclass')

    def right(self, p):
        'Return the right child of node at position p (or None)'
        raise NotImplementedError('Must be implemented by subclass')

    def sibling(self, p):
        'Return the sibling of position p, or None if p has no sibling'
        parent = self.parent(p)
        if parent is None:          # p is the root
            return None
        else:
            if p == self.left(parent):
                return self.right(parent)
            else:
                return self.left(parent)

    def children(self, p):
        'Return a list of the (at most two) children of p, left child first'
        children = []
        if self.left(p) is not None:
            children.append(self.left(p))

        if self.right(p) is not None:
            children.append(self.right(p))
        return children
```

Interesting relationships between the height and number of nodes in a binary tree:
* Each level $d$ has at most $2^d$ nodes.

* A binary tree of height $h$ has at least $h+1$ and at most $2^{h+1}-1$ nodes, i.e., $h+1 \leq n \leq 2^{h+1}-1$.

* Thus a binary tree with $n$ nodes has height $h \geq \lceil \log_2 (n + 1) \rceil - 1$ (and $h \leq n - 1$).

Furthermore, if the binary tree is full/proper:

* Relationship between leaves $l$ and internal nodes $i$:
  * $l = i + 1$, and thus
  * $n = 2i + 1 = 2l - 1$
  * $i = (n - 1)/2$
  * $l = (n + 1)/2$
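
These relationships are easy to check empirically. The sketch below grows random proper binary trees and asserts the height bounds and the leaf/internal-node identities listed above. The helper names are hypothetical. Trees are nested tuples `(left, right)` for internal nodes and `None` for leaves, because only the shape matters here.

```python
import random


def random_proper_tree(internal):
    'Random proper binary tree with the given number of internal nodes'
    if internal == 0:
        return None                          # a single leaf
    k = random.randint(0, internal - 1)      # internal nodes in the left subtree
    return (random_proper_tree(k), random_proper_tree(internal - 1 - k))


def count(t):
    'Return (number of leaves, number of internal nodes)'
    if t is None:
        return 1, 0
    l1, i1 = count(t[0])
    l2, i2 = count(t[1])
    return l1 + l2, i1 + i2 + 1


def height(t):
    return 0 if t is None else 1 + max(height(t[0]), height(t[1]))


random.seed(1)
for _ in range(1000):
    t = random_proper_tree(random.randint(0, 30))
    l, i = count(t)
    n, h = l + i, height(t)
    assert h + 1 <= n <= 2 ** (h + 1) - 1
    # n.bit_length() equals ceil(log2(n + 1)) for n >= 1
    assert n.bit_length() - 1 <= h <= n - 1
    assert l == i + 1
    assert n == 2 * i + 1 == 2 * l - 1
print("all checks passed")
```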

## Implementing Trees

The classes defined above are abstract. We now provide concrete implementations of the remaining abstract methods, namely root, parent, left, right, num_children, and \_\_len\_\_ (children is already implemented in AbstractBinaryTree).

### Linked Trees

We will start with a **linked structure** where nodes maintain references to children and the parent.

<img src="https://janakj.org/w3132/images/tree-linked.png" width=700/>

Worst-case constant-time update methods:
* add_root(e): Create a root for an empty tree
* add_left(p, e): Link the node as left child of p
* add_right(p, e): Link the node as right child of p
* replace(p, e): Replace the element stored at p with e
* delete(p): Remove the node at p, replacing it with its child (if any), and return the element stored at p. Report an error if p has two children
* attach(p, t1, t2): Attach the internal structures of t1 and t2 as left and right subtrees of leaf p

```python
class LinkedBinaryTree(AbstractBinaryTree):
    class _Node:
        def __init__(self, element, parent=None, left=None, right=None):
            self._element = element
            self._parent = parent
            self._left = left
            self._right = right

    class Position(AbstractPosition):
        def __init__(self, container, node):
            self._container = container
            self._node = node

        def element(self):
            return self._node._element

        def __eq__(self, other):
            return type(other) is type(self) and other._node is self._node

    def _make_position(self, node):
        "Converts a node to the node's position in the tree"
        if node is None:
            return None
        else:
            return self.Position(self, node)

    def _validate(self, p): # retrieve the node object in position p
        if not isinstance(p, self.Position):
            raise TypeError('p must be proper Position type')
        if p._container is not self:
            raise ValueError('p does not belong to this container')
        if p._node._parent is p._node: # _delete marks removed nodes this way
            raise ValueError('p is no longer valid')
        return p._node

    def __init__(self):
        'Create an empty linked binary tree'
        self._root = None
        self._size = 0

    def __len__(self):
        return self._size

    def root(self):
        return self._make_position(self._root)

    def parent(self, p):
        'Return the parent of the node at position p or None if p is root'
        node = self._validate(p)
        return self._make_position(node._parent)

    def left(self, p):
        node = self._validate(p)
        return self._make_position(node._left)

    def right(self, p):
        node = self._validate(p)
        return self._make_position(node._right)

    def num_children(self, p):
        node = self._validate(p)
        count = 0
        if node._left is not None:
            count += 1
        if node._right is not None:
            count += 1
        return count

    # Constant-time operation
    def _add_root(self, e):
        if self._root is not None:
            raise ValueError('Root exists')
        self._size = 1
        self._root = self._Node(e)
        return self._make_position(self._root)

    # Constant-time operation
    def _add_left(self, p, e):
        node = self._validate(p)
        if node._left is not None:
            raise ValueError('Left child exists')
        self._size += 1
        node._left = self._Node(e, node)
        return self._make_position(node._left)

    # Constant-time operation
    def _add_right(self, p, e):
        node = self._validate(p)
        if node._right is not None:
            raise ValueError('Right child exists')
        self._size += 1
        node._right = self._Node(e, node)
        return self._make_position(node._right)

    def _replace(self, p, e):
        'Replace the element at position p with e and return the old element'
        node = self._validate(p)
        old = node._element
        node._element = e
        return old

    def _delete(self, p):
        node = self._validate(p)

        # A node with at most one child can be removed by linking its child
        # (if any) directly to its parent. With two children, there is no
        # single place to reattach the second child, so we refuse.
        if self.num_children(p) == 2:
            raise ValueError('Position has two children')

        child = node._left if node._left is not None else node._right
        if child is not None:
            child._parent = node._parent

        if node is self._root:
            self._root = child
        else:
            parent = node._parent
            if node is parent._left:
                parent._left = child
            else:
                parent._right = child

        self._size -= 1
        node._parent = node # mark the removed node as invalid
        return node._element

    def _attach(self, p, t1, t2):
        'Attach trees t1 and t2 as the left and right subtrees of leaf p'
        node = self._validate(p)
        if not self.is_leaf(p):
            raise ValueError('position must be leaf')

        if not type(self) is type(t1) is type(t2):
            raise TypeError('Tree types must match')

        self._size += len(t1) + len(t2)
        if not t1.is_empty():
            t1._root._parent = node
            node._left = t1._root
            t1._root = None
            t1._size = 0
        if not t2.is_empty():
            t2._root._parent = node
            node._right = t2._root
            t2._root = None
            t2._size = 0
```

#### Running Times

* The len method takes $O(1)$
* Method is_empty (inherited) calls len() and is thus $O(1)$
* Methods root, left, right, parent, and num_children are all $O(1)$
* Methods sibling and children (inherited from AbstractBinaryTree) use a constant number of accessors and are $O(1)$
* is_root and is_leaf are both $O(1)$
* depth is $O(d_p + 1)$
* height is $O(n)$
* The various update methods are all $O(1)$

### Array Binary Trees

Binary trees can also be stored efficiently in arrays. We simply derive an array index $rank(p)$ for every position $p$ of a binary tree as follows:
* $rank(p) = 1$ if $p$ is the root of the tree
* $rank(p) = 2\cdot rank(q)$ if $p$ is the left child of $q$
* $rank(p) = 2\cdot rank(q) + 1$ if $p$ is the right child of $q$

<img src="https://janakj.org/w3132/images/array-binary-trees.png" width=700/>

**Important**: The elements are not necessarily stored in consecutive cells of the array!

Going back to the arithmetic tree example, let's see how it would be stored in an array.

<img src="https://janakj.org/w3132/images/arithmetic-tree.png" width=500/>

| 0       | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --------| -------|---|---|---|---|---|---|---|---|
|         | +      |   |   |   |   |   |   |   |   |
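
The rank rules can be applied mechanically. The sketch below assumes the arithmetic tree in the figure has the shape of $(2 \times (a - 1) + (3 \times b))$: a `+` root with two `×` children, and so on. It encodes the tree as hypothetical nested tuples `(element, left, right)`, with ASCII `*` and `-` standing in for × and −. Each node is placed at index $rank(p)$ of a list whose index 0 is unused, and the code then navigates the tree with plain arithmetic on ranks.

```python
# Assumed shape of the arithmetic tree, encoded as (element, left, right);
# None marks a missing child. ASCII * and - stand for the operators.
expr = ("+",
        ("*", ("2", None, None), ("-", ("a", None, None), ("1", None, None))),
        ("*", ("3", None, None), ("b", None, None)))


def to_array(tree):
    'Store every node at index rank(p); unused cells hold None'
    cells = {}

    def place(node, rank):
        if node is None:
            return
        element, left, right = node
        cells[rank] = element
        place(left, 2 * rank)                # left child of q: 2 * rank(q)
        place(right, 2 * rank + 1)           # right child of q: 2 * rank(q) + 1

    place(tree, 1)                           # the root has rank 1
    array = [None] * (max(cells, default=0) + 1)  # index 0 stays unused
    for rank, element in cells.items():
        array[rank] = element
    return array


array = to_array(expr)
print(array)
# [None, '+', '*', '*', '2', '-', '3', 'b', None, None, 'a', '1']

r = array.index("a")
print(r, array[r // 2])                      # rank of a and its parent: 10 -
print(array[2 * 2], array[2 * 2 + 1])        # children of rank 2: 2 -
```

Cells 8 and 9 stay empty because the node `2` at rank 4 has no children. Even this small tree leaves gaps in the array.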

Characteristics of an array-based binary tree implementation:
* A position can be represented by a single integer (rank(p))
* Methods such as root, parent, left, and right are simple arithmetic operations on rank(p)
* Space usage:
  * Depends on the shape of the tree
  * The array may contain many empty cells: in the worst case (a single branch of right children), the largest rank is $N = 2^n - 1$, which is **prohibitive**
* Some update operations cannot be efficiently supported, e.g., deleting a node takes $O(n)$.
 
Note: There is a class of binary trees called heaps that can be stored efficiently in an array of $n$ elements, where $n$ is the number of nodes, i.e., heaps leave no empty cells.

## Tree Traversal Algorithms

So far, we have developed a Python class hierarchy for representing trees that looks as follows:

<img src="https://janakj.org/w3132/images/tree-hierarchy.png"/>

Now it is time to see how to use them. In this section, we will discuss selected tree traversal algorithms, i.e., algorithms that systematically access (or visit) all positions in a tree. The algorithm will perform some action for each visited position as part of the visit. These actions could involve simple or complex computations, depending on the application.

There are several systematic ways of visiting all positions of a tree:
  * **Preorder**: The position is visited first, followed by recursive visits to all its children
  * **Postorder**: The children of a node are visited before the node itself is visited
  * **Inorder**: A subset of children is visited first, then the position itself, and then the rest of the children. Inorder traversal is mostly applicable to binary trees.
  * **Breadth-First**: The positions are visited according to their depth (root first, then all nodes at depth 1, then all nodes at depth 2, and so on)

We will introduce these traversal techniques in the context of their applications. For a more formal treatment of the topic, please consult Chapter 8 in the textbook.
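
To compare the four orders side by side before examining each one, here is a compact sketch on a small hypothetical binary tree encoded as nested tuples `(element, left, right)`. It uses plain recursive generators and `collections.deque`. It is not the position-based implementation developed in the following sections.

```python
from collections import deque


def leaf(e):
    'A node without children'
    return (e, None, None)


# Hypothetical tree:       b
#                        /   \
#                       a     c
#                      / \   / \
#                     a1 a2 c1 c2
tree = ("b", ("a", leaf("a1"), leaf("a2")), ("c", leaf("c1"), leaf("c2")))


def preorder(t):
    if t is not None:
        yield t[0]                     # visit the node before its subtrees
        yield from preorder(t[1])
        yield from preorder(t[2])


def inorder(t):
    if t is not None:
        yield from inorder(t[1])
        yield t[0]                     # visit the node between its subtrees
        yield from inorder(t[2])


def postorder(t):
    if t is not None:
        yield from postorder(t[1])
        yield from postorder(t[2])
        yield t[0]                     # visit the node after its subtrees


def breadth_first(t):
    queue = deque([t] if t is not None else [])
    while queue:
        e, left, right = queue.popleft()
        yield e                        # nodes leave the queue level by level
        queue.extend(c for c in (left, right) if c is not None)


for order in (preorder, inorder, postorder, breadth_first):
    print(f"{order.__name__}: {' '.join(order(tree))}")
# preorder: b a a1 a2 c c1 c2
# inorder: a1 a a2 b c1 c c2
# postorder: a1 a2 a c1 c2 c b
# breadth_first: b a c a1 a2 c1 c2
```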

### Binary Search (Inorder Traversal)

Homework 1 asked you to implement the Wordle game, including a check of whether the guessed word is in the Wordle dictionary. Most of you probably implemented a simple check like this:

```python
def load_dictionary(filename):
    dictionary = []
    with open(filename, 'r') as file:
        for line in file:
            word = line.strip().lower()
            dictionary.append(word)
    return dictionary

wordle = load_dictionary('wordle.txt')
```

We can then show the length of the list and use the in operator to test if a word is in the list:

```python
print(len(wordle))
print("adobe" in wordle)
print("aaaaa" in wordle)
```

```
2315
True
False
```


The above check takes $O(n)$ in the worst case, i.e., when the word is not in the Wordle dictionary.

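We can do better by storing the word list in a binary search tree. For every position $p$, the elements in the left subtree of $p$ are smaller than $p$'s element, and the elements in its right subtree are greater than or equal to it. Let's define functions to add the words to such a tree: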

```python
# Define _insert_below and insert, then add them to the AbstractBinaryTree class

def _insert_below(self, p, e):
    'Insert e into the subtree rooted at p, keeping the binary search tree order'
    if e < p.element():
        if self.left(p) is not None:
            self._insert_below(self.left(p), e)
        else:
            self._add_left(p, e)
    else:
        if self.right(p) is not None:
            self._insert_below(self.right(p), e)
        else:
            self._add_right(p, e)

def insert(self, e):
    if self.is_empty():
        self._add_root(e)
    else:
        self._insert_below(self.root(), e)

AbstractBinaryTree._insert_below = _insert_below
AbstractBinaryTree.insert = insert
```

And convert the list to a tree as follows:

```python
tree = LinkedBinaryTree()

for word in wordle:
    tree.insert(word)
```

When you run the above cell, you will notice that it takes a long time. When you try to print the height of the resulting tree, you will probably get a RecursionError ("maximum recursion depth exceeded"):

```python
print(tree.height())
```

```
RecursionError: maximum recursion depth exceeded
```

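It turns out that the Wordle list is sorted alphabetically. Each new word is therefore at least as large as every word already in the tree, so it always becomes the right child of the most recently inserted node. The result is a tree with a single branch of height $n - 1$, which is effectively a linked list. Each insertion walks the entire branch, so building the tree takes $O(n^2)$ time, and the recursive height() needs a recursion depth proportional to $n$.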

<img src="https://janakj.org/w3132/images/wordle-tree-bad.png"/>
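
The effect of insertion order can be reproduced without the Wordle file. The sketch below uses integer keys and a minimal, hypothetical binary search tree in which each node is a list `[key, left, right]`. Both insertion and the height computation are iterative, so the degenerate case does not run into Python's recursion limit.

```python
import random


def insert(root, key):
    'Insert key into the BST rooted at root (iteratively) and return the root'
    node = [key, None, None]             # [key, left child, right child]
    if root is None:
        return node
    cur = root
    while True:
        side = 1 if key < cur[0] else 2  # smaller keys go left, others go right
        if cur[side] is None:
            cur[side] = node
            return root
        cur = cur[side]


def height(root):
    'Height computed level by level without recursion (-1 for an empty tree)'
    level = [root] if root is not None else []
    h = -1
    while level:
        h += 1
        level = [c for n in level for c in (n[1], n[2]) if c is not None]
    return h


keys = list(range(1500))

root = None
for k in keys:                           # sorted input: every key goes right
    root = insert(root, k)
print(height(root))                      # 1499: a single branch

random.seed(0)
random.shuffle(keys)
root = None
for k in keys:                           # the same keys, shuffled
    root = insert(root, k)
print(height(root))                      # much smaller; exact value depends on the shuffle
```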

We can fix this by shuffling the input sequence:

```python
import random
random.shuffle(wordle)

tree = LinkedBinaryTree()

for word in wordle:
    tree.insert(word)
```

```python
print(tree.right(tree.right(tree.root())).element())
```

```
waver
```


That was much faster! The tree will look similar to this example:

<img src="https://janakj.org/w3132/images/wordle-tree-good.png"/>

Also, we can display the tree's height now:

```python
print(tree.height())
```

```
26
```


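To find an element in the tree, we define a new AbstractBinaryTree method called find. It returns the element if it is in the tree, and None otherwise. At each position it compares the element with the one stored there and continues into only one subtree, so it runs in $O(h)$ time, where $h$ is the height of the tree.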

```python
def find(self, e, p=None):
    'Return e if it is stored in the subtree rooted at p (default: root), else None'
    if p is None:
        if self.is_empty():
            return None
        p = self.root()
    cur = p.element()

    if e < cur:
        if self.left(p) is not None:
            return self.find(e, self.left(p))
    elif e == cur:
        return e
    else:
        if self.right(p) is not None:
            return self.find(e, self.right(p))
    return None

AbstractBinaryTree.find = find
```

```python
print(tree.find("adobe"))
```

```
adobe
```

```python
print(tree.find("aaaaa"))
```

```
None
```


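What if we wanted to print the Wordle list in the original sorted order? We can use inorder traversal, which visits the left subtree of a position, then the position itself, and then its right subtree. In a binary search tree, this visits the elements in sorted order: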

```python
def inorder(self):
    if not self.is_empty():
        for p in self._subtree_inorder(self.root()):
            yield p

def _subtree_inorder(self, p):
    if self.left(p) is not None:
        for other in self._subtree_inorder(self.left(p)):
            yield other
    yield p
    if self.right(p) is not None:
        for other in self._subtree_inorder(self.right(p)):
            yield other

AbstractBinaryTree._subtree_inorder = _subtree_inorder
AbstractBinaryTree.inorder = inorder
```

<img src="https://janakj.org/w3132/images/inorder.png"/>

Print the beginning of the Wordle dictionary (first 20 words):

```python
i = 0
for p in tree.inorder():
    print(p.element())

    i = i + 1
    if i == 20:
        break
```

```
aback
abase
abate
abbey
abbot
abhor
abide
abled
abode
abort
about
above
abuse
abyss
acorn
acrid
actor
acute
adage
adapt
```
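
Printing the words works because the shuffled tree is shallow. On a degenerate tree such as the one built from the sorted list, the recursive inorder generator nests one generator per level and can exceed Python's recursion limit, just like height(). One alternative is an iterative inorder traversal with an explicit stack, which is a different technique from the recursive version above. To keep the sketch self-contained, it uses hypothetical nodes `[key, left, right]` instead of positions. The same loop could be written with left(), right(), and element().

```python
def inorder_iterative(root):
    'Yield keys in inorder using an explicit stack instead of recursion'
    stack, node = [], root
    while stack or node is not None:
        while node is not None:              # walk as far left as possible
            stack.append(node)
            node = node[1]
        node = stack.pop()                   # leftmost node not yet visited
        yield node[0]
        node = node[2]                       # continue with its right subtree


# Hypothetical degenerate tree with 5,000 nodes, built without recursion:
# each node is [key, left, right], and every key's successor is its right child.
root = None
for k in reversed(range(5000)):
    root = [k, None, root]

print(list(inorder_iterative(root))[:5])     # [0, 1, 2, 3, 4]
```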


### General Tree Iteration Methods

We can generalize the generator approach used in the inorder traversal methods and define a general method positions to iterate over all positions in a tree in *some* order. We will also define the special method \_\_iter\_\_ to iterate over the elements in the tree:

```python
def positions(self):
    'Generate an iteration of all positions of the tree'
    raise NotImplementedError('To be implemented')

def __iter__(self):
    'Generate an iteration of all elements stored within the tree'
    for p in self.positions():
        yield p.element()

AbstractTree.positions = positions
AbstractTree.__iter__ = __iter__
```

With the above two methods defined, one can traverse a tree as follows:

```python
# for p in tree.positions():
#     # perform some action with position p
#     pass
```

Next, we need to define the positions method to traverse the tree in some systematic way. We have seen one such method before: inorder traversal. Since inorder traversal is most useful on binary trees, we can make it the default positions method for binary trees:

```python
AbstractBinaryTree.positions = AbstractBinaryTree.inorder
```

```python
i = 0
for e in tree:
    print(e)

    i = i + 1
    if i == 20:
        break
```

```
aback
abase
abate
abbey
abbot
abhor
abide
abled
abode
abort
about
above
abuse
abyss
acorn
acrid
actor
acute
adage
adapt
```


### Preorder Traversal

```python
us = LinkedBinaryTree()
us._add_root("United States")

ny = LinkedBinaryTree()
ny._add_root("New York")

nj = LinkedBinaryTree()
nj._add_root("New Jersey")

newark = LinkedBinaryTree()
newark._add_root("Newark")

hoboken = LinkedBinaryTree()
hoboken._add_root("Hoboken")

nyc = LinkedBinaryTree()
nyc._add_root("New York City")

albany = LinkedBinaryTree()
albany._add_root("Albany")

nj._attach(nj.root(), newark, hoboken)
ny._attach(ny.root(), nyc, albany)
us._attach(us.root(), ny, nj)
```

```python
def preorder(self):
    if not self.is_empty():
        for p in self._subtree_preorder(self.root()):
            yield p

def _subtree_preorder(self, p):
    yield p
    for c in self.children(p):
        for other in self._subtree_preorder(c):
            yield other

AbstractTree.preorder = preorder
AbstractTree._subtree_preorder = _subtree_preorder
```

<img src="https://janakj.org/w3132/images/preorder.png"/>

```python
section_number = [0]
for p in us.preorder():
    i = us.depth(p)
    if i >= len(section_number):
        section_number.append(0)
    else:
        section_number = section_number[:i+1]
    section_number[i] += 1

    print(f'{".".join(map(str, section_number))} {p.element()}')
```

```
1 United States
1.1 New York
1.1.1 New York City
1.1.2 Albany
1.2 New Jersey
1.2.1 Newark
1.2.2 Hoboken
```
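
The loop above calls depth() for every position, and each call walks up to the root. A variant sketch passes the depth down during the preorder traversal itself, so each node costs only $O(1)$ extra work. It uses a hypothetical helper, `preorder_with_depth`, which is not part of the course code, and encodes the same hierarchy as nested tuples `(element, [children])`.

```python
us = ("United States", [
    ("New York", [("New York City", []), ("Albany", [])]),
    ("New Jersey", [("Newark", []), ("Hoboken", [])]),
])


def preorder_with_depth(node, depth=0):
    'Yield (depth, element) pairs in preorder'
    element, children = node
    yield depth, element
    for child in children:
        yield from preorder_with_depth(child, depth + 1)


numbers = []
for depth, element in preorder_with_depth(us):
    del numbers[depth + 1:]            # drop the counters of deeper levels
    if depth == len(numbers):
        numbers.append(0)              # first node seen at this depth
    numbers[depth] += 1
    print(".".join(map(str, numbers)), element)
```

It prints the same numbering as the loop above, from `1 United States` through `1.2.2 Hoboken`.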


### Postorder Traversal

```python
# Build the expression tree for ((3+1)*4)/((9-5)+2)

exp = LinkedBinaryTree()
exp._add_root("/")

t1 = LinkedBinaryTree()
t1._add_root("+")
t1._add_left(t1.root(), 3)
t1._add_right(t1.root(), 1)

t2 = LinkedBinaryTree()
t2._add_root(4)

t3 = LinkedBinaryTree()
t3._add_root("*")
t3._attach(t3.root(), t1, t2)

t4 = LinkedBinaryTree()
t4._add_root("-")
t4._add_left(t4.root(), 9)
t4._add_right(t4.root(), 5)

t5 = LinkedBinaryTree()
t5._add_root(2)

t6 = LinkedBinaryTree()
t6._add_root("+")
t6._attach(t6.root(), t4, t5)

exp._attach(exp.root(), t3, t6)
```

```python
def evaluate(self):
    return self._evaluate_subtree(self.root())

def _evaluate_subtree(self, p):
    if self.is_leaf(p):
        return int(p.element())
    else:
        op = p.element()
        left_val = self._evaluate_subtree(self.left(p))
        right_val = self._evaluate_subtree(self.right(p))
        if op == '+':
            return left_val + right_val
        elif op == '-':
            return left_val - right_val
        elif op == '/':
            return left_val / right_val
        elif op == '*':
            return left_val * right_val
        else:
            raise Exception(f'Unsupported operation {op}')

AbstractBinaryTree.evaluate = evaluate
AbstractBinaryTree._evaluate_subtree = _evaluate_subtree
```

```python
print(exp.evaluate())
```

```
2.6666666666666665
```


```python
((3+1)*4)/((9-5)+2)
```

```
2.6666666666666665
```

Generalizing to postorder tree traversal:

```python
def postorder(self):
    if not self.is_empty():
        for p in self._subtree_postorder(self.root()):
            yield p

def _subtree_postorder(self, p):
    for c in self.children(p):
        for other in self._subtree_postorder(c):
            yield other
    yield p

AbstractTree.postorder = postorder
AbstractTree._subtree_postorder = _subtree_postorder
```

<img src="https://janakj.org/w3132/images/postorder.png"/>
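
A postorder traversal of an expression tree lists the expression in postfix (reverse Polish) notation, in which every operator appears after both of its operands. A stack-based evaluator can process that token sequence from left to right. It is an alternative to the recursive evaluate() above, not the course's method. The token list below is written out by hand as the postorder of the tree for ((3+1)*4)/((9-5)+2), and `eval_postfix` is a hypothetical name.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def eval_postfix(tokens):
    'Evaluate a postfix token sequence with an explicit stack'
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()        # the right operand was pushed last
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(int(tok))     # operands (leaves) are pushed
    (result,) = stack                  # a well-formed expression leaves one value
    return result


# Postorder of the tree for ((3+1)*4)/((9-5)+2)
tokens = ["3", "1", "+", "4", "*", "9", "5", "-", "2", "+", "/"]
print(eval_postfix(tokens))            # 2.6666666666666665
```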

### Breadth-First Traversal

```python
# This is a copy of the ArrayQueue class from one of the previous lectures

class Empty(Exception):
    'Raised when accessing an element of an empty queue'
    pass

class ArrayQueue:
    'A queue backed with a built-in Python list'

    def __init__(self):
        'Creates an empty ArrayQueue instance'
        self._data = list()

    def enqueue(self, e):
        'Enqueue element at the back of the queue'
        self._data.append(e)

    def dequeue(self):
        '''Dequeue the element from the front of the queue

        Raise an exception if the queue is empty
        '''
        if len(self._data) == 0:
            raise Empty('The queue is empty')

        return self._data.pop(0) # pop(0) shifts all remaining elements: O(n)

    def front(self):
        '''Return a reference to the element at the front

        Raise an exception if the queue is empty
        '''
        if len(self._data) == 0:
            raise Empty('The queue is empty')

        return self._data[0]

    def is_empty(self):
        'Return true if the queue is empty'
        return len(self._data) == 0

    def __len__(self):
        'Return the length of the queue'
        return len(self._data)
```

```python
def breadth_first(self):
    if not self.is_empty():
        fringe = ArrayQueue()
        fringe.enqueue(self.root())
        while not fringe.is_empty():
            p = fringe.dequeue()
            yield p
            for c in self.children(p):
                fringe.enqueue(c)

AbstractTree.breadth_first = breadth_first
```

<img src="https://janakj.org/w3132/images/breadthfirst.png"/>

```python
tree = LinkedBinaryTree()
b = tree._add_root("b")
a = tree._add_left(tree.root(), "a")
c = tree._add_right(tree.root(), "c")

tree._add_left(a, "a1")
tree._add_right(a, "a2")

tree._add_left(c, "c1")
tree._add_right(c, "c2")

last = 0
for p in tree.breadth_first():
    if tree.depth(p) != last:
        last = tree.depth(p)
        print()
    print(f'{p.element()} ', end='')
```

```
b 
a c 
a1 a2 c1 c2 
```
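
There are two possible refinements to the code above. First, ArrayQueue.dequeue() calls `pop(0)`, which shifts every remaining element and costs $O(n)$, whereas `collections.deque.popleft()` runs in $O(1)$. Second, the printing loop calls depth() for every position. The sketch below addresses both by processing the tree one level at a time with a deque. It is a variant of the method above, written on a hypothetical nested-tuple tree `(element, left, right)` so that it is self-contained.

```python
from collections import deque


def leaf(e):
    'A node without children'
    return (e, None, None)


tree = ("b", ("a", leaf("a1"), leaf("a2")), ("c", leaf("c1"), leaf("c2")))


def levels(root):
    'Yield the elements of each level as a list, root level first'
    if root is None:
        return
    queue = deque([root])
    while queue:
        level = []
        for _ in range(len(queue)):    # exactly the nodes of the current level
            element, left, right = queue.popleft()
            level.append(element)
            for child in (left, right):
                if child is not None:
                    queue.append(child)
        yield level


for level in levels(tree):
    print(" ".join(level))
# b
# a c
# a1 a2 c1 c2
```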