## Parse tree

With the implementation of our tree data structure complete, we now look at an example of how a tree can be used to solve some real problems. In this section we will look at parse trees. Parse trees can be used to represent real-world constructions like sentences or mathematical expressions. The Figure below shows the hierarchical structure of a simple sentence. Representing a sentence as a tree structure allows us to work with the individual parts of the sentence by using subtrees.

![](./images/p1.png)

We can also represent a mathematical expression like $((7+3)∗(5−2))$ as a parse tree.

![](./images/p2.png)

We have already looked at fully parenthesized expressions, so what do we know about this expression? We know that multiplication has a higher precedence than either addition or subtraction. Because of the parentheses, we know that before we can do the multiplication we must evaluate the parenthesized addition and subtraction expressions. The hierarchy of the tree helps us understand the order of evaluation for the whole expression. Before we can evaluate the top-level multiplication, we must evaluate the addition and the subtraction in the subtrees. The addition, which is the left subtree, evaluates to 10. The subtraction, which is the right subtree, evaluates to 3. Using the hierarchical structure of trees, we can simply replace an entire subtree with one node once we have evaluated the expressions in the children. Applying this replacement procedure gives us the simplified tree shown below.

![](./images/p3.png)

In the rest of this section we are going to examine parse trees in more detail. In particular, we will look at:

- How to build a parse tree from a fully parenthesized mathematical expression.
- How to evaluate the expression stored in a parse tree.
- How to recover the original mathematical expression from a parse tree.

The first step in building a parse tree is to break up the expression string into a list of tokens. There are four different kinds of tokens to consider: left parentheses, right parentheses, operators, and operands. We know that whenever we read a left parenthesis we are starting a new expression, and hence we should create a new tree to correspond to that expression. Conversely, whenever we read a right parenthesis, we have finished an expression. We also know that operands are going to be leaf nodes and children of their operators. Finally, we know that every operator is going to have both a left and a right child.
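As a minimal sketch of this tokenizing step, a regular expression can pull the four kinds of tokens out of an expression string, even when the tokens are not separated by spaces. The name `tokenize` is hypothetical, and the sketch assumes non-negative integer operands only.

```python
import re

# Matches either a run of digits (an operand) or a single
# parenthesis or operator character.
TOKEN_PATTERN = re.compile(r"\d+|[()+\-*/]")


def tokenize(expression):
    """Split an expression string into a list of token strings.

    Assumes non-negative integer operands. Whitespace is ignored, and any
    other unrecognized character is silently skipped.
    """
    return TOKEN_PATTERN.findall(expression)


print(tokenize("(3+(4*5))"))
print(tokenize("( ( 10 + 5 ) * 3 )"))
```

Multi-digit operands such as `10` come out as a single token, which a character-by-character split would not give.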

Using the information from above we can define four rules as follows:

- If the current token is a `'('`, add a new node as the left child of the current node, and descend to the left child.
- If the current token is in the list `['+','-','/','*']`, set the root value of the current node to the operator represented by the current token. Add a new node as the right child of the current node and descend to the right child.
- If the current token is a number, set the root value of the current node to the number and return to the parent.
- If the current token is a `')'`, go to the parent of the current node.

Before writing the Python code, let’s look at an example of the rules outlined above in action. We will use the expression $(3+(4∗5))$. We will parse this expression into the following list of character tokens: `['(', '3', '+', '(', '4', '*', '5', ')', ')']`. Initially we will start out with a parse tree that consists of an empty root node. The Figure below illustrates the structure and contents of the parse tree as each new token in the expression is processed in turn. The gray node indicates the current node as the algorithm unfolds.

<img src='./images/p4.png' width=800px/>

Using the previous Figure, let’s walk through the example step by step:

    a) Create an empty tree.
    b) Read ( as the first token. By rule 1, create a new node as the left child of the root. Make the current node this new child.
    c) Read 3 as the next token. By rule 3, set the root value of the current node to 3 and go back up the tree to the parent.
    d) Read + as the next token. By rule 2, set the root value of the current node to + and add a new node as the right child. The new right child becomes the current node.
    e) Read ( as the next token. By rule 1, create a new node as the left child of the current node. The new left child becomes the current node.
    f) Read 4 as the next token. By rule 3, set the root value of the current node to 4. Make the parent of 4 the current node.
    g) Read * as the next token. By rule 2, set the root value of the current node to * and create a new right child. The new right child becomes the current node.
    h) Read 5 as the next token. By rule 3, set the root value of the current node to 5. Make the parent of 5 the current node.
    i) Read ) as the next token. By rule 4, make the parent of * the current node.
    j) Read ) as the next token. By rule 4, make the parent of + the current node. At this point there is no parent for +, so we are done.
    
From the example above, it is clear that we need to keep track of the current node as well as the parent of the current node. The tree interface provides us with a way to get children of a node, through the `getLeftChild` and `getRightChild` methods, but how can we keep track of the parent? **A simple solution to keeping track of parents as we traverse the tree is to use a stack.** Whenever we want to descend to a child of the current node, we first push the current node on the stack. When we want to return to the parent of the current node, we pop the parent off the stack.
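The `Stack` and `BinaryTree` classes used in this section come from a `utils` module that is not shown here. The following is a minimal sketch of what such classes might look like, assuming only the method names that appear in this section. The actual `utils` implementation may differ, and the attribute names `items`, `key`, `leftChild`, and `rightChild` are assumptions.

```python
class Stack:
    """A last-in, first-out stack backed by a Python list."""

    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        return self.items.pop()

    def isEmpty(self):
        return len(self.items) == 0


class BinaryTree:
    """A binary tree node with a value and optional left/right subtrees."""

    def __init__(self, rootVal):
        self.key = rootVal
        self.leftChild = None
        self.rightChild = None

    def insertLeft(self, newVal):
        # Any existing left subtree is pushed down below the new node.
        node = BinaryTree(newVal)
        node.leftChild = self.leftChild
        self.leftChild = node

    def insertRight(self, newVal):
        # Any existing right subtree is pushed down below the new node.
        node = BinaryTree(newVal)
        node.rightChild = self.rightChild
        self.rightChild = node

    def getLeftChild(self):
        return self.leftChild

    def getRightChild(self):
        return self.rightChild

    def setRootVal(self, val):
        self.key = val

    def getRootVal(self):
        return self.key


# Descend to a child and come back, using the stack to remember the parent.
root = BinaryTree('+')
parents = Stack()

root.insertLeft(3)
parents.push(root)              # remember the parent before descending
current = root.getLeftChild()
print(current.getRootVal())     # the left child holds 3

current = parents.pop()         # return to the parent
print(current.getRootVal())     # back at the '+' node
```

Pushing the node before every descent means that each pop returns exactly the node we came from, so no parent reference has to be stored in the tree itself.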

Using the rules described above, along with the `Stack` and `BinaryTree` operations, we are now ready to write a Python function to create a parse tree. The code for our parse tree builder is presented below:

In [1]:
from utils import Stack, BinaryTree


def buildParseTree(fpexp):
    fplist = fpexp.split()
    pStack = Stack()
    eTree = BinaryTree('')
    pStack.push(eTree)
    currentTree = eTree
    for i in fplist:
        if i == '(':
            currentTree.insertLeft('')
            pStack.push(currentTree)
            currentTree = currentTree.getLeftChild()
        elif i not in ['+', '-', '*', '/', ')']:
            currentTree.setRootVal(int(i))
            parent = pStack.pop()
            currentTree = parent
        elif i in ['+', '-', '*', '/']:
            currentTree.setRootVal(i)
            currentTree.insertRight('')
            pStack.push(currentTree)
            currentTree = currentTree.getRightChild()
        elif i == ')':
            currentTree = pStack.pop()
        else:
            raise ValueError
    return eTree

#pt = buildParseTree("( ( 10 + 5 ) * 3 )")
pt = buildParseTree("( 3 + ( 4 * 5 ) )")
pt

<utils.BinaryTree at 0x4db7198>

The four rules for building a parse tree are coded as the first four clauses of the `if` statement on lines 11, 15, 19, and 24 of the code above. In each case you can see that the code implements the rule, as described above, with a few calls to the `BinaryTree` or `Stack` methods. The only error checking in this function is a `ValueError` for tokens that we do not recognize. Note that the clause on line 15 accepts every token that is not an operator or a right parenthesis, so an unrecognized token such as a letter reaches `int(i)`, which raises the `ValueError` itself; the final `else` clause acts only as a safeguard.

Now that we have built a parse tree, what can we do with it? As a first example, we will write a function to evaluate the parse tree, returning the numerical result. To write this function, we will make use of the hierarchical nature of the tree. Recall that we can replace the original tree with a simplified tree after evaluating a subtree.

![](./images/p2-3.png)

This suggests that we can write an algorithm that evaluates a parse tree by recursively evaluating each subtree.

As we have done with past recursive algorithms, we will begin the design for the recursive evaluation function by identifying the base case. A natural base case for recursive algorithms that operate on trees is to check for a leaf node. In a parse tree, the leaf nodes will always be operands. Since numerical objects like integers and floating points require no further interpretation, the `evaluate` function can simply return the value stored in the leaf node. The recursive step that moves the function toward the base case is to call `evaluate` on both the left and the right children of the current node. The recursive call effectively moves us down the tree, toward a leaf node.

To put the results of the two recursive calls together, we can simply apply the operator stored in the parent node to the results returned from evaluating both children. In the example from the following Figure

![](./images/p3.png)

we see that the two children of the root evaluate to themselves, namely 10 and 3. Applying the multiplication operator gives us a final result of 30.

The code for a recursive `evaluate` function is shown in the code Listing below. First, we obtain references to the left and the right children of the current node. If the node does not have both a left and a right child, then we know that the current node is really a leaf node, because every operator in a parse tree has two children. This check is on line 9. If the current node is not a leaf node, we look up the operator in the current node and apply it to the results from recursively evaluating the left and right children.

To implement the arithmetic, we use a dictionary with the keys `'+'`, `'-'`, `'*'`, and `'/'`. The values stored in the dictionary are functions from Python’s operator module. The operator module provides us with the functional versions of many commonly used operators. When we look up an operator in the dictionary, the corresponding function object is retrieved. Since the retrieved object is a function, we can call it in the usual way `function(param1,param2)`. So the lookup `opers['+'](2,2)` is equivalent to `operator.add(2,2)`.
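A short illustration of this dictionary lookup, using only functions from the standard `operator` module:

```python
import operator

# Map each operator token to the function that implements it.
opers = {'+': operator.add, '-': operator.sub,
         '*': operator.mul, '/': operator.truediv}

print(opers['+'](2, 2))   # same as operator.add(2, 2)
print(opers['*'](4, 5))   # same as operator.mul(4, 5)
print(opers['/'](7, 2))   # truediv always returns a float
```

Because `'/'` maps to `operator.truediv`, a division yields a float even when both operands are integers.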

In [3]:
import operator

def evaluate(parseTree):
    opers = {'+':operator.add, '-':operator.sub, '*':operator.mul, '/':operator.truediv}

    leftC = parseTree.getLeftChild()
    rightC = parseTree.getRightChild()

    if leftC and rightC:
        fn = opers[parseTree.getRootVal()]
        return fn(evaluate(leftC),evaluate(rightC))
    else:
        return parseTree.getRootVal()

#pt = buildParseTree("( 3 + ( 4 * 5 ) )")
evaluate(pt)

23

Finally, we will trace the `evaluate` function on the parse tree we created for the expression $( 3 + ( 4 * 5 ) )$. When we first call `evaluate`, we pass the root of the entire tree as the parameter `parseTree`. Then we obtain references to the left and right children to make sure they exist. The recursive call takes place on line 11. We begin by looking up the operator in the root of the tree, which is `'+'`. The `'+'` operator maps to the `operator.add` function call, which takes two parameters. As usual for a Python function call, the first thing Python does is to evaluate the parameters that are passed to the function. In this case both parameters are recursive function calls to our `evaluate` function. Using left-to-right evaluation, the first recursive call goes to the left. In the first recursive call the `evaluate` function is given the left subtree. We find that the node has no left or right children, so we are in a leaf node. When we are in a leaf node we just return the value stored in the leaf node as the result of the evaluation. In this case we return the integer 3.

At this point we have one parameter evaluated for our top-level call to `operator.add`. But we are not done yet. Continuing the left-to-right evaluation of the parameters, we now make a recursive call to evaluate the right child of the root. We find that the node has both a left and a right child so we look up the operator stored in this node, `'*'`, and call this function using the left and right children as the parameters. At this point you can see that both recursive calls will be to leaf nodes, which will evaluate to the integers four and five respectively. With the two parameters evaluated, we return the result of `operator.mul(4,5)`. At this point we have evaluated the operands for the top level `'+'` operator and all that is left to do is finish the call to `operator.add(3,20)`. The result of the evaluation of the entire expression tree for $(3+(4∗5))$ is $23$.

## Tree Traversals

Now that we have examined the basic functionality of our tree data structure, it is time to look at some additional usage patterns for trees. These usage patterns correspond to three commonly used ways of visiting all the nodes in a tree. The difference between these patterns is the order in which each node is visited. We call this visitation of the nodes a “traversal.” The three traversals we will look at are called preorder, inorder, and postorder. Let’s start out by defining these three traversals more carefully, then look at some examples where these patterns are useful.

- **preorder**: In a preorder traversal, we visit the root node first, then recursively do a preorder traversal of the left subtree, followed by a recursive preorder traversal of the right subtree.
- **inorder**: In an inorder traversal, we recursively do an inorder traversal on the left subtree, visit the root node, and finally do a recursive inorder traversal of the right subtree.
- **postorder**: In a postorder traversal, we recursively do a postorder traversal of the left subtree and the right subtree followed by a visit to the root node.
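To make the three definitions concrete, here is a small sketch that returns the visiting order as a list instead of printing it. The tuple representation `(value, left, right)` and the function names are assumptions made for this example only; they are not the `BinaryTree` interface used elsewhere in this section.

```python
# A tree is a tuple (value, left, right); None stands for a missing subtree.
# This is the parse tree for ((7+3)*(5-2)).
tree = ('*',
        ('+', (7, None, None), (3, None, None)),
        ('-', (5, None, None), (2, None, None)))


def preorder_list(node):
    if node is None:
        return []
    value, left, right = node
    return [value] + preorder_list(left) + preorder_list(right)


def inorder_list(node):
    if node is None:
        return []
    value, left, right = node
    return inorder_list(left) + [value] + inorder_list(right)


def postorder_list(node):
    if node is None:
        return []
    value, left, right = node
    return postorder_list(left) + postorder_list(right) + [value]


print(preorder_list(tree))    # ['*', '+', 7, 3, '-', 5, 2]
print(inorder_list(tree))     # [7, '+', 3, '*', 5, '-', 2]
print(postorder_list(tree))   # [7, 3, '+', 5, 2, '-', '*']
```

The three functions differ only in where `[value]` sits relative to the two recursive calls.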

Let’s look at some examples that illustrate each of these three kinds of traversals. First let’s look at the preorder traversal. As an example of a tree to traverse, we will represent a book as a tree. The book is the root of the tree, and each chapter is a child of the root. Each section within a chapter is a child of the chapter, and each subsection is a child of its section, and so on. The Figure below shows a limited version of a book with only two chapters. Note that the traversal algorithm works for trees with any number of children, but we will stick with binary trees for now.

<img src='./images/tt1.png' width=500px/>

Suppose that you wanted to read this book from front to back. The preorder traversal gives you exactly that ordering. Starting at the root of the tree (the Book node), we follow the preorder traversal instructions: we visit the Book node, then recursively call `preorder` on its left child, in this case Chapter 1. After visiting Chapter 1, we again recursively call `preorder` on the left child to get to Section 1.1. Since Section 1.1 has no children, we do not make any additional recursive calls. When we are finished with Section 1.1, we move up the tree to Chapter 1. At this point we still need to visit the right subtree of Chapter 1, which is Section 1.2. As before, we visit the node and then its left subtree, which brings us to Section 1.2.1, and then we visit the node for Section 1.2.2. With Section 1.2 finished, we return to Chapter 1. Then we return to the Book node and follow the same procedure for Chapter 2.

The code for writing tree traversals is surprisingly elegant, largely because the traversals are written recursively. The Listing below shows the Python code for a **preorder traversal** of a binary tree.

In [4]:
def preorder(tree):
    if tree:
        print(tree.getRootVal())
        preorder(tree.getLeftChild())
        preorder(tree.getRightChild())

The algorithm for the **inorder traversal**, shown next, is nearly identical to preorder except that we move the call to print after calling `getLeftChild` but before calling `getRightChild`. In the **inorder traversal** we visit the left subtree, followed by the root, and finally the right subtree. The Listing below shows our code for the **inorder traversal**. 

In [7]:
def inorder(tree):
    if tree is not None:
        inorder(tree.getLeftChild())
        print(tree.getRootVal())
        inorder(tree.getRightChild())

The algorithm for the **postorder traversal**, shown next, is nearly identical to preorder and inorder except that we move the call to print to the end of the function.

In [5]:
def postorder(tree):
    if tree is not None:
        postorder(tree.getLeftChild())
        postorder(tree.getRightChild())
        print(tree.getRootVal())

We have already seen a common use for the postorder traversal, namely evaluating a parse tree.
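The following sketch makes the postorder structure of parse tree evaluation explicit: the left subtree is evaluated, then the right subtree, and only then is the operator in the node itself applied. The tuple representation `(value, left, right)` and the names `postorder_eval` and `leaf` are assumptions made for this example.

```python
import operator

OPERS = {'+': operator.add, '-': operator.sub,
         '*': operator.mul, '/': operator.truediv}


def leaf(value):
    """Build a leaf node: a value with no children."""
    return (value, None, None)


def postorder_eval(node):
    """Evaluate a parse tree given as (value, left, right) tuples."""
    value, left, right = node
    if left is None and right is None:
        return value                              # a leaf is an operand
    left_result = postorder_eval(left)            # 1. left subtree
    right_result = postorder_eval(right)          # 2. right subtree
    return OPERS[value](left_result, right_result)  # 3. the node itself


# Parse tree for (3+(4*5))
tree = ('+', leaf(3), ('*', leaf(4), leaf(5)))
print(postorder_eval(tree))   # 23
```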

Notice that in all three of the traversal functions we are simply changing the position of the `print` call with respect to the two recursive function calls.
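The inorder traversal also gives us a way to recover a fully parenthesized expression from a parse tree, the third goal listed in the parse tree section. The sketch below is one possible approach, assuming a tuple representation `(value, left, right)`; the name `print_exp` is hypothetical.

```python
def print_exp(node):
    """Return the fully parenthesized expression stored in a parse tree.

    A tree is a tuple (value, left, right); None marks a missing subtree.
    """
    if node is None:
        return ""
    value, left, right = node
    if left is None and right is None:
        return str(value)          # operands need no parentheses
    # Inorder: left subtree, then the operator, then the right subtree.
    return "(" + print_exp(left) + str(value) + print_exp(right) + ")"


# Parse tree for (3+(4*5))
tree = ('+', (3, None, None),
             ('*', (4, None, None), (5, None, None)))
print(print_exp(tree))   # (3+(4*5))
```

Wrapping every operator node in parentheses preserves the original grouping without any knowledge of operator precedence.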

## Priority Queues with Binary Heaps

In earlier lectures you learned about the first-in first-out (**FIFO**) data structure called a queue. One important variation of a queue is called a priority queue. A priority queue acts like a queue in that you dequeue an item by removing it from the front. However, in a priority queue the logical order of items inside a queue is determined by their priority. The highest priority items are at the front of the queue and the lowest priority items are at the back. Thus when you enqueue an item on a priority queue, the new item may move all the way to the front depending on its priority. We will see that the priority queue is a useful data structure for some of the graph algorithms we will study in future lectures.

![](./images/pq.png)

You can probably think of a couple of easy ways to implement a priority queue using sorting functions and lists. However, inserting into a list is $O(n)$ and sorting a list is $O(n \log n)$. We can do better. The classic way to implement a priority queue is using a data structure called a **binary heap**. A binary heap will allow us to both enqueue and dequeue items in $O(\log n)$.

The binary heap is interesting to study because when we diagram the heap it looks a lot like a tree, but when we implement it we use only a single list as an internal representation. The binary heap has two common variations: the **min heap**, in which the smallest key is always at the front, and the **max heap**, in which the largest key value is always at the front. In this section we will implement the min heap.
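For comparison, Python's standard library already provides a binary min heap in the `heapq` module, which operates on a plain list. The sketch below uses it as a priority queue of `(priority, item)` pairs, where a smaller number means a higher priority. The task names are made up for illustration.

```python
import heapq

pq = []   # heapq works on an ordinary Python list

# Tuples compare by their first element, so the priority goes first.
heapq.heappush(pq, (2, "write report"))
heapq.heappush(pq, (1, "fix outage"))
heapq.heappush(pq, (3, "refill coffee"))

# Items come out in priority order, regardless of insertion order.
order = [heapq.heappop(pq) for _ in range(3)]
print(order)
```

Note that `heapq` stores the root at index 0, whereas the implementation developed in this section keeps an unused placeholder at index 0 and stores the root at index 1.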

## Binary Heap Operations

The basic operations we will implement for our binary heap are as follows:

- `BinaryHeap()` creates a new, empty, binary heap.
- `insert(k)` adds a new item to the heap.
- `findMin()` returns the item with the minimum key value, leaving the item in the heap.
- `delMin()` returns the item with the minimum key value, removing the item from the heap.
- `isEmpty()` returns true if the heap is empty, false otherwise.
- `size()` returns the number of items in the heap.
- `buildHeap(list)` builds a new heap from a list of keys.

We will now turn our attention to implementing these operations.

## Binary Heap Implementation

### The Heap Complete Structure Property

In order to make our heap work efficiently, we will take advantage of the logarithmic nature of the binary tree to represent our heap. In order to guarantee logarithmic performance, we must keep our tree balanced. A balanced binary tree has roughly the same number of nodes in the left and right subtrees of the root. In our heap implementation we keep the tree balanced by creating a complete binary tree. A complete binary tree is a tree in which each level has all of its nodes. The exception to this is the bottom level of the tree, which we fill in from left to right. The next Figure shows an example of a complete binary tree.

![](./images/h1.png)

Another interesting property of a complete tree is that we can represent it using a single list. We do not need to use nodes and references or even lists of lists. Because the tree is complete, the left child of a parent (at position $p$) is the node that is found in position $2p$ in the list. Similarly, the right child of the parent is at position $2p+1$ in the list. To find the parent of any node in the tree, we can simply use Python’s integer division (`//`). Given that a node is at position $n$ in the list, the parent is at position $\lfloor n/2 \rfloor$. The next Figure shows a complete binary tree and also gives the list representation of the tree. Note the $2p$ and $2p+1$ relationship between parent and children. The list representation of the tree, along with the complete structure property, allows us to efficiently traverse a complete binary tree using only a few simple mathematical operations. We will see that this also leads to an efficient implementation of our binary heap. You will notice that the binary heap implementation has a single zero as the first element of `heapList`. This zero is not used and is not part of the tree, but is there so that the root sits at position 1 and simple integer division can be used in later methods.

![](./images/h2.png)
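The index arithmetic described above can be sketched with three one-line helpers. The helper names and the sample list are illustrative assumptions; the list is not necessarily the one drawn in the figure.

```python
def left_child(p):
    return 2 * p


def right_child(p):
    return 2 * p + 1


def parent(n):
    return n // 2   # integer division implements the floor of n/2


# Index 0 holds the unused placeholder; the root is at index 1.
heap_list = [0, 5, 9, 11, 14, 18, 19, 21, 33, 17, 27]

p = 2
print(heap_list[p])                  # 9
print(heap_list[left_child(p)])      # 14, stored at index 4
print(heap_list[right_child(p)])     # 18, stored at index 5
print(heap_list[parent(9)])          # index 9 holds 17; its parent is 14
```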


### The Heap Order Property

The method that we will use to store items in a heap relies on maintaining the **heap order property**. **The heap order property** is as follows: In a heap, for every node $x$ with parent $p$, the key in $p$ is smaller than or equal to the key in $x$. The next Figure illustrates a complete binary tree that has the heap order property.

![](./images/h2.png)
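The heap order property can be checked mechanically on the list representation. The function below is a sketch, assuming the layout used in this section (an unused placeholder at index 0 and the root at index 1); the name `has_heap_order` is hypothetical.

```python
def has_heap_order(heap_list):
    """Return True if every key is >= the key of its parent."""
    # Index 0 is the placeholder and index 1 is the root, which has no
    # parent, so the check starts at index 2.
    return all(heap_list[i // 2] <= heap_list[i]
               for i in range(2, len(heap_list)))


print(has_heap_order([0, 5, 9, 11, 14, 18, 19, 21, 33, 17, 27]))  # True
print(has_heap_order([0, 9, 6, 5, 2, 3]))                         # False
```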


### Heap Operations

We will begin our implementation of a binary heap with the constructor. Since the entire binary heap can be represented by a single list, all the constructor will do is initialize the list and an attribute `currentSize` to keep track of the current size of the heap. The code Listing below shows the Python code for the constructor. Once again, an empty binary heap has a single zero as the first element of `heapList`. This zero is not used, but is there so that simple integer division can be used in later methods.

In [1]:
class BinHeap:
    def __init__(self):
        self.heapList = [0]
        self.currentSize = 0        

The next method we will implement is `insert`. The easiest, and most efficient, way to add an item to a list is to simply append the item to the end of the list. The good news about appending is that it guarantees that we will maintain the **Heap Complete Structure property**. The bad news about appending is that we will very likely violate the **Heap Order property**. However, it is possible to write a method that will allow us to regain the **Heap Order property** by comparing the newly added item with its parent. If the newly added item is less than its parent, then we can swap the item with its parent. The next Figure shows the series of swaps needed to percolate a newly added item (7) to the heap up to its proper position in the tree.

![](./images/h3.png)
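The swaps can also be followed on the list representation. The standalone function below is a sketch that percolates an item up a plain list and records each swap as a `(child index, parent index)` pair. The function name and the sample list are assumptions for illustration, and the list may not match the one in the figure.

```python
def perc_up(heap_list, i):
    """Percolate the item at index i up; return the swaps performed."""
    swaps = []
    while i // 2 > 0:
        parent = i // 2
        if heap_list[i] < heap_list[parent]:
            # The child is smaller than its parent, so exchange them.
            heap_list[i], heap_list[parent] = heap_list[parent], heap_list[i]
            swaps.append((i, parent))
        i = parent
    return swaps


heap_list = [0, 5, 9, 11, 14, 18, 19, 21, 33, 17, 27]
heap_list.append(7)                          # appending keeps the tree complete
swaps = perc_up(heap_list, len(heap_list) - 1)

print(swaps)       # [(11, 5), (5, 2)]
print(heap_list)   # [0, 5, 7, 11, 14, 9, 19, 21, 33, 17, 27, 18]
```

The new key 7 is swapped with 18 and then with 9, and stops below the root because 5 is smaller.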

Notice that when we percolate an item up, we are restoring the **Heap Order property** between the newly added item and the parent. We are also preserving the **Heap Order property** for any siblings, because a sibling's key is already greater than or equal to the old parent's key, which in turn is greater than the key that replaces it. Of course, if the newly added item is very small, we may still need to swap it up another level. In fact, we may need to keep swapping until we get to the top of the tree. The code Listing below shows the `percUp` method, which percolates a new item as far up in the tree as it needs to go to maintain the **Heap Order property**. Here is where our wasted first element in `heapList = [0]` is important. Notice that we can compute the parent of any node by using simple integer division. The parent of the current node can be computed by integer division of the index of the current node by 2.

We are now ready to write the `insert` method. Most of the work in the `insert` method is really done by `percUp`. Once a new item is appended to the tree, `percUp` takes over and positions the new item properly.

In [2]:
class BinHeap:
    def __init__(self):
        self.heapList = [0]
        self.currentSize = 0        
        
    def percUp(self,i):
        while i // 2 > 0:
            if self.heapList[i] < self.heapList[i // 2]:
                tmp = self.heapList[i // 2]
                self.heapList[i // 2] = self.heapList[i]
                self.heapList[i] = tmp
            i = i // 2        
          
    def insert(self,k):
        self.heapList.append(k)
        self.currentSize = self.currentSize + 1
        self.percUp(self.currentSize)          


With the `insert` method properly defined, we can now look at the `delMin` method. Since the **Heap Order property** requires that the root of the tree be the smallest item in the tree, finding the minimum item is easy. The hard part of `delMin` is restoring full compliance with the **Heap Complete structure** and **Heap Order** properties after the root has been removed. We can restore our heap in two steps.

1. We restore the root item by taking the last item in the list and moving it to the root position. Moving the last item maintains our **Heap Complete Structure property**. However, we have probably destroyed the **Heap Order property** of our binary heap.

2. We restore the **Heap Order property** by pushing the new root node down the tree to its proper position. The next Figure shows the series of swaps needed to move the new root node to its proper position in the heap.

![](./images/h4.png)

In order to maintain the **Heap Order** property, all we need to do is swap the root with its smallest child if that child is less than the root. After the initial swap, we may repeat the swapping process with a node and its children until the node is swapped into a position in the tree where it is already less than or equal to both children, or until it becomes a leaf. The code for percolating a node down the tree is found in the `percDown` and `minChild` methods.

In [1]:
class BinHeap:
    def __init__(self):
        self.heapList = [0]
        self.currentSize = 0        
        
    def percUp(self,i):
        while i // 2 > 0:
            if self.heapList[i] < self.heapList[i // 2]:
                tmp = self.heapList[i // 2]
                self.heapList[i // 2] = self.heapList[i]
                self.heapList[i] = tmp
            i = i // 2        
          
    def insert(self,k):
        self.heapList.append(k)
        self.currentSize = self.currentSize + 1
        self.percUp(self.currentSize)          
        
    def percDown(self,i):
        while (i * 2) <= self.currentSize:
            mc = self.minChild(i)
            if self.heapList[i] > self.heapList[mc]:
                tmp = self.heapList[i]
                self.heapList[i] = self.heapList[mc]
                self.heapList[mc] = tmp
            i = mc

    def minChild(self,i):
        if i * 2 + 1 > self.currentSize:
            return i * 2
        else:
            if self.heapList[i*2] < self.heapList[i*2+1]:
                return i * 2
            else:
                return i * 2 + 1        
                
    def delMin(self):
        retval = self.heapList[1]
        self.heapList[1] = self.heapList[self.currentSize]
        self.currentSize = self.currentSize - 1
        self.heapList.pop()
        self.percDown(1)
        return retval                
        

To finish our discussion of binary heaps, we will look at a method to build an entire heap from a list of keys. The first method you might think of may be like the following. Given a list of keys, you could easily build a heap by inserting each key one at a time. Since you are starting with a list of one item, the list is sorted and you could use binary search to find the right position to insert the next key at a cost of approximately $O(\log n)$ operations. However, remember that inserting an item in the middle of the list may require $O(n)$ operations to shift the rest of the list over to make room for the new key. Therefore, to insert $n$ keys into the heap would require a total of $O(n \log n)$ operations. However, if we start with an entire list then we can build the whole heap in $O(n)$ operations. The next Listing shows the code to build the entire heap.

In [2]:
def buildHeap(self,alist):
    i = len(alist) // 2
    self.currentSize = len(alist)
    self.heapList = [0] + alist[:]
    while (i > 0):
        self.percDown(i)
        i = i - 1
        
"""
We have defined the function buildHeap outside the scope of the class BinHeap. We can add this function as a method to the
BinHeap Class in the following manner:
"""

setattr(BinHeap, "buildHeap", buildHeap)        

The next Figure shows the swaps that the `buildHeap` method makes as it moves the nodes in an initial tree of `[9, 6, 5, 2, 3]` into their proper positions. 

![](./images/h5.png)

Although we start out in the middle of the tree and work our way back toward the root, the `percDown` method ensures that the larger key is always moved down the tree. Because the heap is a complete binary tree, any nodes past the halfway point will be leaves and therefore have no children. Notice that when `i=1`, we are percolating down from the root of the tree, so this may require multiple swaps. As you can see in the rightmost two trees of the previous Figure, first the 9 is moved out of the root position, but after 9 is moved down one level in the tree, `percDown` ensures that we check the next set of children farther down in the tree to ensure that it is pushed as low as it can go. In this case it results in a second swap with 3. Now that 9 has been moved to the lowest level of the tree, no further swapping can be done. It is useful to compare the list representation of this series of swaps, shown below, with the tree representation in the previous Figure.

```
Initial Heap [0, 9, 6, 5, 2, 3]
i = 2  [0, 9, 2, 5, 6, 3]
i = 1  [0, 2, 3, 5, 6, 9]
```
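The trace above can be reproduced with a standalone sketch of the same procedure that records the list after each call to the percolate-down step. The function names here are assumptions; the logic mirrors the `percDown`, `minChild`, and `buildHeap` methods shown earlier but operates on a plain list.

```python
def min_child(heap_list, size, i):
    """Return the index of the smaller child of the node at index i."""
    if 2 * i + 1 > size:
        return 2 * i                       # only a left child exists
    if heap_list[2 * i] < heap_list[2 * i + 1]:
        return 2 * i
    return 2 * i + 1


def perc_down(heap_list, size, i):
    """Push the item at index i down until heap order is restored."""
    while 2 * i <= size:
        mc = min_child(heap_list, size, i)
        if heap_list[i] > heap_list[mc]:
            heap_list[i], heap_list[mc] = heap_list[mc], heap_list[i]
        i = mc


def build_heap_trace(keys):
    """Build a heap from keys; return (i, snapshot) after each step."""
    heap_list = [0] + list(keys)
    size = len(keys)
    snapshots = []
    # Start at the last node that has a child and work back to the root.
    for i in range(size // 2, 0, -1):
        perc_down(heap_list, size, i)
        snapshots.append((i, list(heap_list)))
    return snapshots


for i, snapshot in build_heap_trace([9, 6, 5, 2, 3]):
    print("i =", i, snapshot)
```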

The key to understanding that you can build the heap in $O(n)$ is to remember that the $\log n$ factor is derived from the height of the tree. For most of the work in `buildHeap`, the subtree being percolated is shorter than $\log n$. Using the fact that you can build a heap from a list in $O(n)$ time, you can construct a sorting algorithm that uses a heap and sorts a list in $O(n \log n)$.
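One way to sketch such a sorting algorithm is with the standard library's `heapq` module rather than the `BinHeap` class from this section: `heapq.heapify` builds the heap in linear time, and each of the $n$ calls to `heapq.heappop` costs $O(\log n)$. The function name `heap_sort` is an assumption.

```python
import heapq


def heap_sort(items):
    """Return a new list with the items in ascending order."""
    heap = list(items)      # copy so the caller's list is not modified
    heapq.heapify(heap)     # O(n) heap construction
    # Each pop removes the current minimum in O(log n).
    return [heapq.heappop(heap) for _ in range(len(heap))]


print(heap_sort([9, 6, 5, 2, 3]))   # [2, 3, 5, 6, 9]
```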

Finally, we demonstrate the use of some of the binary heap methods. Notice that no matter the order that we add items to the heap, the smallest element is removed each time we call `delMin()`. 

In [3]:
bh = BinHeap()
bh.buildHeap([5,7,3,11])
print(bh.delMin())
print(bh.delMin())

3
5


In [4]:
bh.insert(8)
bh.insert(15)
print(bh.delMin())
print(bh.delMin())
print(bh.delMin())
print(bh.delMin())

7
8
11
15


In [13]:
bh = BinHeap()
bh.buildHeap([9, 8, 5, 6, 7])

In [14]:
bh.heapList

[0, 5, 6, 9, 8, 7]

In [15]:
bh.delMin()

5

In [16]:
bh.delMin()

6

In [17]:
bh.delMin()

7

In [18]:
bh.delMin()

8

In [19]:
bh.delMin()

9

In [20]:
bh.delMin()

IndexError: list index out of range

In [21]:
bh.heapList

[0]
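The `IndexError` above arises because `delMin` reads `heapList[1]` even when the heap is empty. One possible guard is to check the size first and fail with a clearer message. The sketch below works on a plain 1-based list rather than on the `BinHeap` class; the function name `del_min` and the error message are assumptions.

```python
def del_min(heap_list):
    """Remove and return the smallest key from a 1-based heap list."""
    size = len(heap_list) - 1          # index 0 is the unused placeholder
    if size == 0:
        raise IndexError("del_min called on an empty heap")

    smallest = heap_list[1]
    heap_list[1] = heap_list[size]     # move the last item to the root
    heap_list.pop()
    size -= 1

    i = 1
    while 2 * i <= size:               # percolate the new root down
        mc = 2 * i
        if mc + 1 <= size and heap_list[mc + 1] < heap_list[mc]:
            mc += 1                    # the right child is the smaller one
        if heap_list[i] > heap_list[mc]:
            heap_list[i], heap_list[mc] = heap_list[mc], heap_list[i]
        i = mc
    return smallest


heap_list = [0, 5, 6, 9, 8, 7]
while len(heap_list) > 1:
    print(del_min(heap_list))

try:
    del_min(heap_list)
except IndexError as err:
    print("error:", err)
```

Raising an exception keeps the behavior close to the original method; returning `None` for an empty heap would be another reasonable design, at the cost of making an empty heap indistinguishable from a stored `None` key.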

#### References

- [Problem Solving with Algorithms and Data Structures using Python by Bradley N. Miller, David L. Ranum is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.](https://runestone.academy/runestone/books/published/pythonds/Trees/toctree.html)