## Today's Agenda
- Trees

## Objectives
- To understand the structure of trees
- To understand what a tree data structure is and how it is used.

## Tree

- Lists, stacks, and queues represent linear relationships
- Information often contains hierarchical relationships
- For example, file directories or folders
     
     <img src="images/week-04/tree_directory.png"  width="500">
     
- Hierarchies in organizations
    
    <img src="images/week-04/treeorgancharthier.png"  width="500">
    
- Build a tree to support fast searching

## Structure of Trees

- Nodes
    - Special node at top: root
    - Nodes can store information
- Links
    - Connect nodes
    - Zero or more nodes connected to a node

<img src="images/week-04/tree_structure.png"  width="300" height="300">

## Tree Jargon - Terminology

- **Node:** A node is a fundamental part of a tree. It can have a name, which we call the "key".
- **Edge:** An edge connects two nodes to show that there is a relationship between them (incoming/outgoing).
- **Root:** The root of the tree is the only node in the tree that has no incoming edges.
- **Path:** A path is an ordered list of nodes that are connected by edges.
- **Children:** The set of nodes that have incoming edges from the same node are said to be the children of that node.
- **Parent:** A node is the parent of all the nodes it connects to with outgoing edges.
- **Sibling:** Nodes in the tree that are children of the same parent are said to be siblings.
- **Internal node:** An internal node is a node with at least one child.
- **Leaf Node:** A leaf node is a node that has no children.


<img src="images/week-04/tree_details.png" width="500" height="500">

### More on Terminology
- **Level:** The level of a node $n$ is the number of edges on the path from the root node to $n$.
- **Height:** The height of a tree is equal to the maximum level of any node in the tree.
- **Subtree:** A subtree is a set of nodes and edges comprised of a parent and all the descendants of that parent.

- **Ancestors of a node:** parent, grandparent, great-grandparent, etc.
    - If there is a path from node $u$ down to node $v$, $u$ is an ancestor of $v$
    - Stated recursively: $u$ is an ancestor of $v$ if $u=v$ or $u$ is an ancestor of the parent of $v$
- **Descendants of a node:** child, grandchild, great-grandchild, etc.

## More on terminology
- Length of a path = number of edges
- Depth of a node N = length of path from root to N
- Height of node N = length of longest path from N to a leaf
- Depth of tree = depth of deepest node
- Height of tree = height of root
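The depth and height definitions translate directly into code. A minimal sketch, assuming a hypothetical `TreeNode` class (not part of these notes) that stores a key, a parent link, and a list of children:

```python
class TreeNode:
    """Hypothetical general-tree node: a key, a parent link, and a list of children."""
    def __init__(self, key, parent=None):
        self.key = key
        self.parent = parent
        self.children = []

    def add_child(self, key):
        child = TreeNode(key, parent=self)
        self.children.append(child)
        return child

def depth(node):
    """Depth = number of edges on the path from the root down to node."""
    d = 0
    while node.parent is not None:   # walk up until we reach the root
        node = node.parent
        d += 1
    return d

def height(node):
    """Height = number of edges on the longest path from node down to a leaf."""
    if not node.children:            # a leaf has height 0
        return 0
    return 1 + max(height(c) for c in node.children)

# A has children B and C; B has child D; D has child E
root = TreeNode("A")
b = root.add_child("B")
c = root.add_child("C")
d = b.add_child("D")
e = d.add_child("E")

print(depth(e))      # 3
print(height(root))  # 3  (height of tree = height of root)
print(height(b))     # 2
print(depth(c))      # 1
```

Depth walks up towards the root, while height looks down towards the leaves, which is why one is a loop over parent links and the other a recursion over children.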

<img src="images/week-04/treevalues.png">

## Some facts
- The root node does not have a parent.
- Subtree consists of a node and its descendants.

## Tree ADT

### Tree operations
- Usual operations:
    - size(): returns the number of nodes
    - isEmpty(): returns true if the tree is empty
- Accessor methods:
    - root(): returns the root of the tree; flags an error if the tree is empty
    - parent($v$): returns the parent of the node $v$; flags an error if $v$ is the root
    - children($v$): returns the collection of children of the node $v$
- Queries:
    - isRoot($v$): returns true if the node $v$ is the root
    - isExternal($v$): returns true if the node $v$ is a leaf
    - isInternal($v$): returns true if the node $v$ has at least one child
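One way to realize this ADT is to give every node a parent link and a list of children. The sketch below is an illustration under that assumption, not a reference implementation; "flags an error" is modeled as raising `ValueError`, and the builder methods `addRoot`/`addChild` are hypothetical helpers that are not part of the ADT listed above.

```python
class Tree:
    """Sketch of the Tree ADT using parent links and child lists."""

    class _Node:
        def __init__(self, key, parent=None):
            self.key = key
            self.parent = parent
            self.children = []

    def __init__(self):
        self._root = None
        self._size = 0

    # Usual operations
    def size(self):
        return self._size

    def isEmpty(self):
        return self._size == 0

    # Accessor methods
    def root(self):
        if self._root is None:
            raise ValueError("root() called on an empty tree")
        return self._root

    def parent(self, v):
        if v.parent is None:
            raise ValueError("the root has no parent")
        return v.parent

    def children(self, v):
        return list(v.children)  # copy so callers cannot modify the tree by accident

    # Queries
    def isRoot(self, v):
        return v is self._root

    def isExternal(self, v):
        return len(v.children) == 0

    def isInternal(self, v):
        return len(v.children) > 0

    # Hypothetical builder helpers (not part of the ADT above)
    def addRoot(self, key):
        if self._root is not None:
            raise ValueError("tree already has a root")
        self._root = Tree._Node(key)
        self._size = 1
        return self._root

    def addChild(self, v, key):
        child = Tree._Node(key, parent=v)
        v.children.append(child)
        self._size += 1
        return child

t = Tree()
r = t.addRoot("A")
b = t.addChild(r, "B")
t.addChild(r, "C")
print(t.size(), t.isRoot(r), t.isInternal(r), t.isExternal(b))  # 3 True True True
```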
    

## Trees - Definition
A tree is defined by a set of nodes and a set of edges with the following properties:
- One node of the tree is designated as the root node
- Every node $n$, except the root node, is connected by an edge from exactly one other node $p$, where $p$ is the parent of $n$.
- There is a unique path from the root to each node.
- If each node in the tree has a maximum of two children, we say that the tree is a **binary tree**.

A Tree with a Set of Nodes and Edges

<img src="images/week-04/treenodes.png" width="350">

It is better to define trees recursively as:
- A tree is either empty or consists of a root and zero or more subtrees, each of which is also a tree
- The root of each subtree is connected to the root of the parent tree by an edge

<img src="images/week-04/treesubtree.png" width="300">

## Definition - wrap up
- A **tree** is a set of nodes, i.e., either
    - it’s an empty set of nodes, or
    - it has one node called the root from which zero or more trees (subtrees) descend.
    
<img src="images/week-04/tree_basic.png"> 

- Root: top node
- Parent: node “above”
- Every node (except root) has exactly one parent
- Child: node “below”
    - Nodes may have zero or more children
    - Binary trees have at most two children
- Leaf: node without children
- Subtree: tree below a given node
    - That node becomes root of the subtree
- Level: distance from root

## More facts

- A tree with $n \ge 1$ nodes always has $n-1$ edges (Exercise: Prove it by induction).
- Every node except the root has exactly one parent.
- Any two nodes in a tree are connected by exactly one path (ignoring edge direction).
- Leaf (aka external) node: node without children.
- Internal node: a node that is not a leaf.
- Siblings: two nodes with the same parent.
- Trees can never have cycles (loops), so there is no non-zero length path from a node to itself.

## Implementation of Trees

One possible pointer-based implementation:
- Each tree node stores a value and a pointer to each child
- Problem:
    - How many child pointers should we allocate space for?

An alternative pointer-based implementation:
- First-child / next-sibling representation
- Each node has 2 pointers: one to its first child and one to its next sibling
- It can handle an arbitrary number of children

<img src="images/week-04/tree_implementation.png">
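The first-child / next-sibling idea can be sketched in a few lines. This is an illustration only; the class name `FCNSNode` and its methods are hypothetical. Every node stores exactly two pointers no matter how many children it has, and the children of a node form a linked list threaded through `next_sibling`.

```python
class FCNSNode:
    """Hypothetical node for the first-child / next-sibling representation."""
    def __init__(self, key):
        self.key = key
        self.first_child = None   # pointer to the leftmost child
        self.next_sibling = None  # pointer to the next child of the same parent

    def add_child(self, key):
        """Append a new child at the end of this node's child list."""
        child = FCNSNode(key)
        if self.first_child is None:
            self.first_child = child
        else:
            cur = self.first_child
            while cur.next_sibling is not None:   # walk to the last sibling
                cur = cur.next_sibling
            cur.next_sibling = child
        return child

    def children(self):
        """Collect the children by walking the sibling chain."""
        result = []
        cur = self.first_child
        while cur is not None:
            result.append(cur)
            cur = cur.next_sibling
        return result

def count_nodes(node):
    """Count the nodes in the subtree rooted at node (node itself included)."""
    if node is None:
        return 0
    return 1 + sum(count_nodes(c) for c in node.children())

root = FCNSNode("A")
for k in "BCDE":
    root.add_child(k)             # A has four children, yet stores only two pointers
root.first_child.add_child("F")   # B gets a child F
print([c.key for c in root.children()])  # ['B', 'C', 'D', 'E']
print(count_nodes(root))                 # 6
```

The trade-off is that reaching the $k$-th child means walking $k$ sibling links instead of indexing directly.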

# Binary Trees

### Definition:
A **binary tree** is either empty, or it consists of a node called the **root** together with two binary trees called the **left subtree** and the **right subtree** of the root, which are disjoint from each other and from the root.

<img src="images/week-04/binary_tree.png" width="500" height="500">

<img src="images/week-04/binary_tree2.png" width="300" height="300">

- These subtrees may be empty.

- Every node has at most two children
    - Left child
    - Right child
    - if a node has only one child, we still need to specify whether it is the left or the right child
- Binary trees are the most popular trees in computer science

<img src="images/week-04/binary_tree_details.png" width="500" height="500">


**Problem:** Given a binary tree with $N$ nodes, what is the minimum possible depth of the tree?

**Constructive answer:** The depth is smallest when every level is filled with as many nodes as possible. A binary tree of depth $d$ has at most $1 + 2 + \dots + 2^{d} = 2^{d+1}-1$ nodes, and if $d$ is the minimum depth, the nodes do not fit in a tree of depth $d-1$, so there are at least $2^{d}$ of them. Therefore:
\begin{equation}
2^{d}\le N\le 2^{d+1}-1\quad \textrm{implies}\quad d_{min}=\lfloor\log_{2}{N}\rfloor
\end{equation}

**Exercise:** Prove that for a binary tree with $N$ nodes the minimum depth is $d_{min}=\lfloor\log_{2}{N}\rfloor$. (*Hint:* Use mathematical induction.)
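A quick numerical sanity check of the formula (not a proof, so the exercise still stands): grow the depth $d$ until a tree of that depth can hold $N$ nodes, and compare with $\lfloor\log_2 N\rfloor$. The function name `min_depth` is just for illustration.

```python
import math

def min_depth(n):
    """Smallest depth d such that a binary tree of depth d can hold n nodes.

    A binary tree of depth d holds at most 2**(d+1) - 1 nodes, so we
    increase d until that capacity reaches n.
    """
    if n < 1:
        raise ValueError("need at least one node")
    d = 0
    while 2 ** (d + 1) - 1 < n:
        d += 1
    return d

for n in [1, 2, 3, 4, 7, 8, 15, 16, 100]:
    assert min_depth(n) == math.floor(math.log2(n))
    print(n, min_depth(n))
```

For integers, `n.bit_length() - 1` gives the same value as $\lfloor\log_2 n\rfloor$ without any floating-point arithmetic.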

## Binary Tree ADT
In addition to the Tree methods, binary trees have:
- left($v$): returns the left child of $v$ if it exists, else NULL
- right($v$): returns the right child of $v$ if it exists, else NULL
- hasLeft($v$): returns TRUE if node $v$ has a left child
- hasRight($v$): returns TRUE if node $v$ has a right child

### Special binary trees
- **Perfect:** A binary tree is perfect if every level is completely full.
<img src="images/week-04/binary_tree_perfect.png">

- **(Left-)Complete:** A binary tree is (left-)complete if
     - every level is completely full, possibly excluding the lowest level
     - at the lowest level, all nodes are as far left as possible.
<img src="images/week-04/binary_tree_complete.png">

- **Full:** A binary tree T is full if each node is either a leaf or possesses exactly two child nodes.
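The three definitions can be checked mechanically. A sketch, assuming a hypothetical minimal node class `BNode`: "full" is a per-node test, "perfect" uses the fact that a perfect tree of height $h$ has exactly $2^{h+1}-1$ nodes, and "complete" scans the tree level by level (left to right) and fails if any node appears after the first missing child.

```python
from collections import deque

class BNode:
    """Hypothetical minimal binary-tree node used only for these checks."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(t):
    """Height in edges; the empty tree has height -1."""
    return -1 if t is None else 1 + max(height(t.left), height(t.right))

def count(t):
    return 0 if t is None else 1 + count(t.left) + count(t.right)

def is_full(t):
    """Every node has either 0 or 2 children."""
    if t is None:
        return True
    if (t.left is None) != (t.right is None):   # exactly one child
        return False
    return is_full(t.left) and is_full(t.right)

def is_perfect(t):
    """Every level is full, so a tree of height h has 2**(h+1) - 1 nodes."""
    return count(t) == 2 ** (height(t) + 1) - 1

def is_complete(t):
    """Level-order scan: once a gap (missing child) is seen, no node may follow."""
    queue = deque([t])
    seen_gap = False
    while queue:
        node = queue.popleft()
        if node is None:
            seen_gap = True
        else:
            if seen_gap:
                return False
            queue.append(node.left)
            queue.append(node.right)
    return True

perfect = BNode(1, BNode(2, BNode(4), BNode(5)), BNode(3, BNode(6), BNode(7)))
complete = BNode(1, BNode(2, BNode(4), BNode(5)), BNode(3, BNode(6)))
full_only = BNode(1, BNode(2), BNode(3, BNode(6), BNode(7)))

for name, t in [("perfect", perfect), ("complete", complete), ("full", full_only)]:
    print(name, is_full(t), is_complete(t), is_perfect(t))
# perfect True True True
# complete False True False
# full True False False
```

Note that every perfect tree is both full and complete, but neither of the other two properties implies the third.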



### Representation

<img src="images/week-04/binary_tree_node_rep.png" width="100">

For example, 
<img src="images/week-04/binary_tree_rep1.png" width="550">

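```python
class BinaryTree:
    """A binary tree node storing a key and references to its left and right subtrees."""

    def __init__(self, root):
        self.key = root
        self.left = None
        self.right = None

    def insertLeft(self, key):
        # If there is no left child, attach a new node; otherwise push the
        # existing left subtree down to become the left child of the new node.
        if self.left is None:
            self.left = BinaryTree(key)
        else:
            t = BinaryTree(key)
            t.left = self.left
            self.left = t

    def insertRight(self, key):
        # Mirror image of insertLeft for the right side.
        if self.right is None:
            self.right = BinaryTree(key)
        else:
            t = BinaryTree(key)
            t.right = self.right
            self.right = t

    def getRightChild(self):
        return self.right

    def getLeftChild(self):
        return self.left

    def setRootVal(self, obj):
        self.key = obj

    def setNodeVal(self, obj):  # alias of setRootVal
        self.key = obj

    def getRootVal(self):
        return self.key

    def getNodeVal(self):  # alias of getRootVal
        return self.key
```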

```python
bt = BinaryTree('a')
bt.setRootVal("ABC")
print(bt.getRootVal())                  # prints: ABC
print(bt.getLeftChild())                # prints: None

bt.insertLeft('b')
print(bt.getLeftChild())                # prints the BinaryTree object, e.g. <__main__.BinaryTree object at 0x...>
print(bt.getLeftChild().getNodeVal())   # prints: b

bt.insertRight('c')
print(bt.getRightChild())
print(bt.getRightChild().getNodeVal())  # prints: c

bt.getRightChild().setNodeVal('hello')
print(bt.getRightChild().getNodeVal())  # prints: hello
```

## Tree traversal 
Tree traversal means visiting every node in the tree exactly once and doing something at each node.

- Unlike arrays, which have a natural order, trees have no single obvious order in which to visit every node.
- There are tree traversal algorithms for visiting every node in a tree
     - Each visits nodes in different orders.

     
For example, consider the expression tree:
<img src="images/week-04/expression_tree.png" width="500">

An **expression tree** is a binary tree in which each internal node corresponds to an operator and each leaf node corresponds to an operand.

**Method 1**
- Visit the root,
- Visit the left subtree (i.e., visit the tree whose root is the left child) and do this recursively,
- Visit the right subtree (i.e., visit the tree whose root is the right child) and do this recursively.
*Output:* **/ + - ^ A 2 * * 2 A B ^ B 2 - A B**

**Method 2**
- Visit the left subtree (i.e., visit the tree whose root is the left child) and do this recursively,
- Visit the root,
- Visit the right subtree (i.e., visit the tree whose root is the right child) and do this recursively.
*Output:*
- **A ^ 2 - 2 * A * B + B ^ 2 / A - B**
- But we need parentheses: **(A ^ 2 - 2 * A * B + B ^ 2) / (A - B)**
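One common fix is to print "(" before visiting the left subtree of an operator and ")" after its right subtree, which yields a fully parenthesized infix expression (with more parentheses than strictly necessary). A sketch, assuming a hypothetical `ExprNode` class; the tree shape is reconstructed from the pre-order output given for Method 1, with `2 * A * B` grouped as `(2 * A) * B`.

```python
class ExprNode:
    """Hypothetical expression-tree node: an operator or an operand."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def to_infix(node):
    """In-order traversal that wraps every operator subtree in parentheses."""
    if node.left is None and node.right is None:   # leaf: an operand
        return str(node.key)
    return "(" + to_infix(node.left) + " " + str(node.key) + " " + to_infix(node.right) + ")"

# (A ^ 2 - 2 * A * B + B ^ 2) / (A - B)
tree = ExprNode("/",
    ExprNode("+",
        ExprNode("-",
            ExprNode("^", ExprNode("A"), ExprNode(2)),
            ExprNode("*", ExprNode("*", ExprNode(2), ExprNode("A")), ExprNode("B"))),
        ExprNode("^", ExprNode("B"), ExprNode(2))),
    ExprNode("-", ExprNode("A"), ExprNode("B")))

print(to_infix(tree))
# ((((A ^ 2) - ((2 * A) * B)) + (B ^ 2)) / (A - B))
```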

**Method 3**
- Visit the left subtree (i.e., visit the tree whose root is the left child) and do this recursively,
- Visit the right subtree (i.e., visit the tree whose root is the right child) and do this recursively,
- Visit the root.
*Output:* **Exercise**

**Method 1** is known as *pre-order*, **Method 2** *in-order*, and **Method 3** *post-order*.

- **Pre-order:** visits the node, then its left subtree, then its right subtree,
- **Post-order:** visits the left subtree, then the right subtree, then the node,
- **In-order:** visits the left subtree, then the node, then the right subtree.

These are three different well-known traversal algorithms.

### Implementation

**Pre-order:**
- Visit the root,
- Visit the left subtree (i.e., visit the tree whose root is the left child) and do this recursively,
- Visit the right subtree (i.e., visit the tree whose root is the right child) and do this recursively.

```python
def preorder(tree):
    if tree:
        print(tree.getRootVal(), end=" ")
        preorder(tree.getLeftChild())
        preorder(tree.getRightChild())
```

For example, the corresponding traversal outputs for the expression tree:
<img src="images/week-04/expression_tree2.png" width="150">

```python
et = BinaryTree('+')
et.insertLeft('*')
et.insertRight('5')
et.getLeftChild().insertLeft('2')
et.getLeftChild().insertRight('4')
```

- **Pre-order output:** + * 2 4 5

```python
preorder(et)   # prints: + * 2 4 5
```

**In-order:**
- Visit the left subtree (i.e., visit the tree whose root is the left child) and do this recursively,
- Visit the root,
- Visit the right subtree (i.e., visit the tree whose root is the right child) and do this recursively.

- **In-order output:** 2 * 4 + 5

```python
def inorder(tree):
    if tree:
        inorder(tree.getLeftChild())
        print(tree.getRootVal(), end=" ")
        inorder(tree.getRightChild())
```

```python
inorder(et)   # prints: 2 * 4 + 5
```

**Post-order:**
- Visit the left subtree (i.e., visit the tree whose root is the left child) and do this recursively,
- Visit the right subtree (i.e., visit the tree whose root is the right child) and do this recursively,
- Visit the root.

- **Post-order output:** 2 4 * 5 +
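Post-order follows the same pattern as `preorder` and `inorder`, with the print moved after both recursive calls. It works with the `BinaryTree` class above; to keep this snippet runnable on its own, it includes a hypothetical stand-in class `_Tree` that exposes the same three accessor methods.

```python
class _Tree:
    """Hypothetical stand-in with the same accessors as BinaryTree."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

    def getLeftChild(self):
        return self.left

    def getRightChild(self):
        return self.right

    def getRootVal(self):
        return self.key

def postorder(tree):
    if tree:
        postorder(tree.getLeftChild())     # left subtree first
        postorder(tree.getRightChild())    # then right subtree
        print(tree.getRootVal(), end=" ")  # node last

# Same shape as `et` above: (2 * 4) + 5
et_demo = _Tree('+', _Tree('*', _Tree('2'), _Tree('4')), _Tree('5'))
postorder(et_demo)   # prints: 2 4 * 5 +
```

Post-order is the natural order for evaluating an expression tree, since both operands are known before the operator is applied.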

# Binary Search Tree (BST)

BST has two properties:
### Structure property (binary tree)
- Each node has ≤ 2 children

### Order property
- All keys in the left subtree are smaller than the node's key
- All keys in the right subtree are larger than the node's key
    - Result: easy to find any given key

<img src="images/week-04/binary_search_tree.png" width="300">

For example,
<img src="images/week-04/binary_search_tree2.png" width="300">
is a BST, but

<img src="images/week-04/binary_search_tree3.png" width="300">
is not a BST.

**A binary search tree is a type of binary tree (but not all binary trees are binary search trees!)**
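Checking only that each left child is smaller and each right child is larger than its parent is not enough: the order property constrains *all* keys in a subtree. A sketch of a correct check, assuming a hypothetical `BSTNode` class and distinct keys, passes down the lower and upper bounds inherited from the ancestors. The example trees are made up for illustration and are not the ones in the figures.

```python
class BSTNode:
    """Hypothetical minimal node: key, left, right."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def is_bst(node, low=None, high=None):
    """True if every key in this subtree lies strictly between low and high.

    low/high are the bounds inherited from the ancestors (None = unbounded).
    """
    if node is None:
        return True
    if (low is not None and node.key <= low) or (high is not None and node.key >= high):
        return False
    return is_bst(node.left, low, node.key) and is_bst(node.right, node.key, high)

good = BSTNode(8, BSTNode(5, BSTNode(2), BSTNode(6)), BSTNode(11, None, BSTNode(14)))
# 9 is larger than its parent 5, but it sits in 8's left subtree, so the order property fails
bad = BSTNode(8, BSTNode(5, BSTNode(2), BSTNode(9)), BSTNode(11))
print(is_bst(good), is_bst(bad))  # True False
```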

# BST ADT
- find(key): returns TRUE if key is in the tree.

find(11):

<img src="images/week-04/BSTfind.png" width="300">

find(14):

<img src="images/week-04/BSTfind2.png" width="400">
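The searches illustrated above follow a single root-to-leaf path: at each node, compare the key and go left or right, so each comparison discards one whole subtree. A sketch, assuming a hypothetical `BSTNode` class; the example tree is made up.

```python
class BSTNode:
    """Hypothetical minimal node: key, left, right."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def find(root, key):
    """Return True if key is in the BST rooted at root."""
    node = root
    while node is not None:
        if key == node.key:
            return True
        # smaller keys live in the left subtree, larger ones in the right
        node = node.left if key < node.key else node.right
    return False   # fell off the tree: key is absent

root = BSTNode(10,
               BSTNode(5, BSTNode(2), BSTNode(7, None, BSTNode(9))),
               BSTNode(15, BSTNode(12), BSTNode(20)))
print(find(root, 9))   # True
print(find(root, 11))  # False (path 10 -> 15 -> 12 -> empty)
```

The loop runs at most (height + 1) times, which is why the shape of the tree matters for performance.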

- findMax(): returns the maximum key in the tree, or nothing if the tree is empty.
- findMin(): returns the minimum key in the tree, or nothing if the tree is empty.
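- Here is the pseudocode for findMin:

```java
Node findMin(Node root) {
    if (root == null)
        return null;            // empty tree
    if (root.left == null)
        return root;            // no left child: this node holds the minimum
    return findMin(root.left);  // otherwise keep going left
}
```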
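A Python version of both operations, written iteratively rather than recursively. This sketch assumes a hypothetical `BSTNode` class and returns the key (the pseudocode above returns the node itself). `find_max` is the mirror image of `find_min`: keep going right instead of left.

```python
class BSTNode:
    """Hypothetical minimal node: key, left, right."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def find_min(root):
    """The leftmost node holds the minimum key; None for an empty tree."""
    if root is None:
        return None
    while root.left is not None:
        root = root.left
    return root.key

def find_max(root):
    """The rightmost node holds the maximum key; None for an empty tree."""
    if root is None:
        return None
    while root.right is not None:
        root = root.right
    return root.key

root = BSTNode(10, BSTNode(5, BSTNode(2)), BSTNode(15, None, BSTNode(20)))
print(find_min(root), find_max(root))  # 2 20
```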

## Insert

- insert(key) : Find the right spot and hook on a new node.

insert(17)

<img src="images/week-04/BSTinsert.png"  width="300">

## Delete

- Deletion can be tricky
- There are three cases to consider
     - Removing a leaf: easy, just delete it
     - Removing internal node with 1 child (e.g., 15)
     - Removing internal node with 2 children (e.g., 7)
<img src="images/week-04/BSTdelete.png"  width="300">

### Case 2 
- Delete a node with one child.
- **Strategy:** "splice out" the node by connecting its parent to its child
- For example: delete 15
    - set the parent's left pointer to 17
    - remove 15's pointer
    - no references to 15 remain, so it is erased
    - BST order is maintained
<img src="images/week-04/BSTdelete1.png"  width="300">

### Case 3 
- Delete a node with two children.
- **Strategy:** "replace" the node by its successor
    - successor: the node with the next largest key
- Delete the successor
- For example: delete 7
    - What is the successor of 7?
- Since the node has 2 children, it has both a left subtree and **a right subtree**
- The successor is the leftmost node in the right subtree, i.e., start at the right child and follow left pointers until reaching a node with NO left child.
    - Here is the code:

```python
def successor(node):
    # Assumes node has a right child, which holds in Case 3.
    curr = node.right
    while curr.left is not None:   # keep going left
        curr = curr.left
    return curr
```

<img src="images/week-04/BSTdelete2.png"  width="300">
- Now, replace the node with its successor
- Observation
    - The successor can't have a left subtree
        - …otherwise its left child would be the successor
    - so the successor has at most one child (a right child)
    - Remove the successor using either Case #1 or #2
        - In this example, use Case #2 (internal node with one child)
- Successor removed and BST order restored

<img src="images/week-04/BSTdelete3.png"  width="300">

In Delete(key), instead of the successor, it is also fine to use the **predecessor**: the rightmost node in the left subtree, i.e., start at the left child and follow right pointers until reaching a node with NO right child.

```python
class Node:
    """A class for creating a binary search tree node and inserting elements.

       Attributes:
       -----------
       key : int, str
            The value that exists at this node of the tree, e.g. tree=Node(4) initializes a tree
            whose root holds the integer value 4.

       Methods:
       --------
       insert(self, key) : Inserts a new element into the tree (duplicates are ignored).
       display(self) : Prints a text drawing of the tree to stdout.
    """

    def __init__(self, key):
        self.key = key
        self.right = None
        self.left = None

    def insert(self, key):
        if self.key == key:
            return                      # key already present: ignore duplicates
        elif self.key < key:
            if self.right is None:
                self.right = Node(key)
            else:
                self.right.insert(key)
        else:  # self.key > key
            if self.left is None:
                self.left = Node(key)
            else:
                self.left.insert(key)

    def display(self):
        lines, _, _, _ = self._display_aux()
        for line in lines:
            print(line)

    def _display_aux(self):
        """Returns list of strings, width, height, and horizontal coordinate of the root. This is
           a utility function that gets used by the <display()> method for building a pretty stdout
           visualization of the binary tree. """

        # No child exists.
        if self.right is None and self.left is None:
            line = '%s' % self.key
            width = len(line)
            height = 1
            middle = width // 2
            return [line], width, height, middle

        # Only left child exists.
        if self.right is None:
            lines, n, p, x = self.left._display_aux()
            s = '%s' % self.key
            u = len(s)
            first_line = (x + 1) * ' ' + (n - x - 1) * '_' + s
            second_line = x * ' ' + '/' + (n - x - 1 + u) * ' '
            shifted_lines = [line + u * ' ' for line in lines]
            return [first_line, second_line] + shifted_lines, n + u, p + 2, n + u // 2

        # Only right child exists.
        if self.left is None:
            lines, n, p, x = self.right._display_aux()
            s = '%s' % self.key
            u = len(s)
            first_line = s + x * '_' + (n - x) * ' '
            second_line = (u + x) * ' ' + '\\' + (n - x - 1) * ' '
            shifted_lines = [u * ' ' + line for line in lines]
            return [first_line, second_line] + shifted_lines, n + u, p + 2, u // 2

        # Two children exist.
        left, n, p, x = self.left._display_aux()
        right, m, q, y = self.right._display_aux()
        s = '%s' % self.key
        u = len(s)
        first_line = (x + 1) * ' ' + (n - x - 1) * '_' + s + y * '_' + (m - y) * ' '
        second_line = x * ' ' + '/' + (n - x - 1 + u + y) * ' ' + '\\' + (m - y - 1) * ' '

        if p < q:
            left += [n * ' '] * (q - p)
        elif q < p:
            right += [m * ' '] * (p - q)

        zipped_lines = zip(left, right)
        lines = [first_line, second_line] + [a + u * ' ' + b for a, b in zipped_lines]
        return lines, n + m + u, max(p, q) + 2, n + u // 2
```

```python
bst2 = Node(15)
keys = [10,11,12,13,14,16,17,18,19]
#keys = [11,12,13,14,15,16,17,18,19]
for key in keys:
    bst2.insert(key)

bst2.display()
```

```
  ________15_       
 /           \      
10_         16_     
   \           \    
  11_         17_   
     \           \  
    12_         18_ 
       \           \
      13_         19
         \          
        14          
```

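```python
def get_height(tree):
    '''
    Returns the height of the tree: the number of edges on the longest path
    from the root down to a leaf. A leaf has height 0 and the empty tree -1,
    so that is_balanced below compares subtree heights correctly.
    '''
    if tree is None:
        return -1
    return 1 + max(get_height(tree.left), get_height(tree.right))

get_height(bst2.right)
```

3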

```python
def is_balanced(tree):
    '''
    Method for determining if a binary tree is (height-)balanced.

    A binary tree is balanced if:
        - it's empty, or
        - the left subtree is balanced,
        - the right subtree is balanced, and
        - the difference in height between the left and right subtrees is <= 1

    Parameters:
    ____________
    tree : the node object below which the definition of 'balanced' is applied.
    '''
    if tree is None:
        return True
    return (is_balanced(tree.right) and is_balanced(tree.left)
            and abs(get_height(tree.left) - get_height(tree.right)) <= 1)

is_balanced(bst2)
```

False

## Binary Search Tree Analysis

- How fast are BST operations?
     - Given a tree, what is the worst-case node to find/remove?
- What is the best-case tree?
     - a balanced tree
     
     <img src="images/week-04/BSTbalanced.png" width="300">
     
- What is the worst-case tree?
     - a completely unbalanced tree
     
     <img src="images/week-04/BSTunbalanced.png" width="200">
     
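**Problem**: find, insert, and delete each follow a single path from the root, so their cost grows with the height of the tree: about $\log_{2}{n}$ for a balanced tree with $n$ nodes, but up to $n-1$ for a completely unbalanced one. Operations may therefore be inefficient if the BST is unbalanced.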
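The same keys can produce either extreme depending only on insertion order. A sketch, assuming a hypothetical `BSTNode` class and plain (non-self-balancing) insertion: inserting 1..127 in sorted order builds a chain, while inserting "middle key first" builds a perfect tree.

```python
class BSTNode:
    """Hypothetical minimal node: key, left, right."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(root, key):
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def height(node):
    """Height in edges; the empty tree has height -1."""
    return -1 if node is None else 1 + max(height(node.left), height(node.right))

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

def balanced_order(lo, hi):
    """Keys lo..hi ordered middle-first, so each subtree gets a central root."""
    if lo > hi:
        return []
    mid = (lo + hi) // 2
    return [mid] + balanced_order(lo, mid - 1) + balanced_order(mid + 1, hi)

n = 127
print(height(build(range(1, n + 1))))        # 126: a chain, every node has only a right child
print(height(build(balanced_order(1, n))))   # 6: a perfect tree with 127 = 2**7 - 1 nodes
```

Because plain insertion cannot control the order in which keys arrive, this motivates self-balancing variants of the BST.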