// run this cell to prevent Jupyter from displaying the null output cell
com.twosigma.beakerx.kernel.Kernel.showNullExecutionResult = false;

# Binary search trees

A binary search tree (BST) is a special kind of binary tree in which adding an element and searching for an element have $O(\log n)$ complexity when the tree is balanced, where $n$ is the number of elements in the tree; unfortunately, these operations have $O(n)$ complexity in the worst case. Furthermore, an in-order traversal of a BST visits the elements in sorted order. The elements of a BST must be comparable with respect to their order (that is, the elements must be sortable from smallest to largest).

A BST can be defined recursively using the binary search tree property: If the BST $t$ has a root node $n$ then:

1. the left subtree of $t$ is a BST where all elements are *less than or equal to* the element in $n$
2. the right subtree of $t$ is a BST where all elements are *greater than* the element in $n$

![Binary search tree](../resources/images/bst-2/Slide3.PNG)

The reader can easily verify that every node in the left subtree of node 50 contains an element less than 50 and every node in the right subtree of node 50 contains an element greater than 50.
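The binary search tree property can also be checked mechanically. The following is a minimal Java sketch, assuming `int` elements and a hypothetical `Node` class (neither the class nor the method names come from the notebook's implementation). It passes down the range of values that each subtree is allowed to contain, which is stricter than comparing a node only with its immediate children.

```java
// Hypothetical names throughout; a sketch, not the notebook's implementation.
final class BstPropertyCheck {

    /** A minimal binary tree node holding an int element. */
    static final class Node {
        final int elem;
        final Node left;
        final Node right;

        Node(int elem, Node left, Node right) {
            this.elem = elem;
            this.left = left;
            this.right = right;
        }
    }

    /** Returns true if the tree rooted at node satisfies the BST property. */
    static boolean isBst(Node node) {
        return isBst(node, Long.MIN_VALUE, Long.MAX_VALUE);
    }

    // Every element in the subtree must satisfy low < elem <= high.
    private static boolean isBst(Node node, long low, long high) {
        if (node == null) {
            return true;                      // an empty tree is a BST
        }
        if (node.elem <= low || node.elem > high) {
            return false;                     // element outside the allowed range
        }
        // Rule 1: left subtree elements are <= node.elem.
        // Rule 2: right subtree elements are > node.elem.
        return isBst(node.left, low, node.elem)
            && isBst(node.right, node.elem, high);
    }
}
```

A tree in which 60 is the right child of 27, and 27 is the left child of 50, is rejected by this check even though 60 is greater than its parent, because 60 lies in the left subtree of 50.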

**Note** Rule 2 can be modified so that all nodes in the right subtree have elements that are greater than *or equal to* the element of the root node. This means that duplicated elements can appear in either the left or right subtrees of a node containing the uppermost instance of a duplicated element. This modification adds some complications to most BST algorithms.

**Exercise 1** Verify that node 83 is the root of a subtree that is a BST.

**Exercise 2** Is the following statement true or false? Every tree of size 1 is a BST (assuming the operations less than, equal to, and greater than make sense for the elements of the tree).

## Traversal

A left-to-right inorder traversal of a BST visits the nodes of the tree in sorted order of its elements.

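**Proof** (by strong induction on the size of the tree) Base case: In a BST of size $n = 1$, the left-to-right inorder traversal visits the single node of the tree, which is trivially in sorted order. (A traversal of an empty tree visits no nodes, so the claim also holds trivially for a tree of size $0$.)

Induction hypothesis: For some $n \geq 1$, a left-to-right inorder traversal of any BST of size at most $n$ visits the nodes of the tree in sorted order of its elements.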

Induction step: Let $t$ be a (non-empty) BST of size $n + 1$. The root node of $t$ has up to two subtrees rooted at $t.left$ and $t.right$ where the sizes of the subtrees are between $0$ and $n$.

* In an in-order traversal of $t$, we first perform an in-order traversal of $t.left$ (which has a maximum size of $n$). By the induction hypothesis, all elements in $t.left$ are visited in sorted order.
* The next step of an in-order traversal is to visit the root node. By the binary search tree property, the element in the root node is greater than or equal to all elements in $t.left$. Therefore, visiting the root node after traversing $t.left$ visits the nodes of $t.left$ and the root node in sorted order.
* Finally, an in-order traversal performs an in-order traversal of $t.right$ (which has a maximum size of $n$). By the induction hypothesis, all elements in $t.right$ are visited in sorted order. By the binary search tree property, all elements in $t.right$ are greater than the element in the root node of $t$. Therefore, traversing $t.right$ after traversing $t.left$ and the root node visits all of the nodes of the tree in sorted order.

The reader can easily verify that an in-order traversal of the tree shown above in fact visits the nodes in order of the elements of the tree.
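The claim can also be checked experimentally. The sketch below is one possible Java rendering, assuming `int` elements; the class `InorderTraversalSketch`, its `Node` class, and its methods are hypothetical names chosen for illustration. It builds a BST by repeated insertion and then collects the elements with a left-to-right inorder traversal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; a sketch, not the notebook's implementation.
final class InorderTraversalSketch {

    static final class Node {
        final int elem;
        Node left;
        Node right;

        Node(int elem) {
            this.elem = elem;
        }
    }

    /** Inserts elem into the subtree rooted at node and returns the subtree root. */
    static Node insert(Node node, int elem) {
        if (node == null) {
            return new Node(elem);
        }
        if (elem <= node.elem) {
            node.left = insert(node.left, elem);    // duplicates go left
        } else {
            node.right = insert(node.right, elem);
        }
        return node;
    }

    /** Left-to-right inorder traversal: left subtree, then node, then right subtree. */
    static void inorder(Node node, List<Integer> out) {
        if (node == null) {
            return;
        }
        inorder(node.left, out);
        out.add(node.elem);
        inorder(node.right, out);
    }

    /** Adds the elements to an empty BST in the given order and returns the inorder sequence. */
    static List<Integer> sortedElements(int... elems) {
        Node root = null;
        for (int e : elems) {
            root = insert(root, e);
        }
        List<Integer> out = new ArrayList<>();
        inorder(root, out);
        return out;
    }
}
```

For example, `InorderTraversalSketch.sortedElements(50, 27, 73, 8, 44, 83, 73)` returns the list `[8, 27, 44, 50, 73, 73, 83]`.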

## Finding the minimum element in a BST

The minimum value in a BST is stored in the leftmost node of the tree.

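**Proof** (by contradiction) Let $l$ be the leftmost node of the tree, that is, the node reached by starting at the root and following left children until a node with no left child is reached. Assume that some other node $x$ contains an element smaller than the element in $l$. If $x$ is an ancestor of $l$ then $l$ is in the left subtree of $x$, so the element in $l$ is less than or equal to the element in $x$ by the binary search tree property, which is a contradiction. Otherwise, $x$ is in the right subtree of some node $m$ that is either $l$ itself or an ancestor of $l$. By the binary search tree property, the element in $x$ is greater than the element in $m$, which in turn is greater than or equal to the element in $l$, which is again a contradiction. Therefore, the assumption is false.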

An iterative pseudocode algorithm for finding the minimum element in a BST is:

```
E min() {
    n = root node
    while (n.hasLeft()) {
        n = n.left
    }
    return n.elem
}
```

A recursive pseudocode algorithm for finding the minimum element in a BST is:

```
E min() {
    return min(root node)
}

E min(BinaryNode n) {
    if (!n.hasLeft()) {
        return n.elem
    }
    return min(n.left)
}
```
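Both pseudocode versions translate almost directly into Java. The following sketch assumes `int` elements and a hypothetical `Node` class with public `left` and `right` fields instead of the `hasLeft()` method used in the pseudocode. Throwing `NoSuchElementException` for an empty tree is also an assumption; the pseudocode does not say what should happen in that case.

```java
import java.util.NoSuchElementException;

// Hypothetical names throughout; a sketch, not the notebook's implementation.
final class BstMinSketch {

    static final class Node {
        final int elem;
        final Node left;
        final Node right;

        Node(int elem, Node left, Node right) {
            this.elem = elem;
            this.left = left;
            this.right = right;
        }
    }

    /** Iterative version: follow left children until there are none. */
    static int minIterative(Node root) {
        if (root == null) {
            throw new NoSuchElementException("empty tree has no minimum");
        }
        Node n = root;
        while (n.left != null) {
            n = n.left;
        }
        return n.elem;
    }

    /** Recursive version: the minimum of a tree with no left subtree is its root element. */
    static int minRecursive(Node n) {
        if (n == null) {
            throw new NoSuchElementException("empty tree has no minimum");
        }
        if (n.left == null) {
            return n.elem;
        }
        return minRecursive(n.left);
    }
}
```

The iterative version uses constant extra space, whereas the recursive version uses one stack frame per level of the tree.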

**Exercise 3** Where is the maximum element located in a BST? Prove that your answer is correct.

**Exercise 4** Create iterative and recursive algorithms for finding the maximum element in a BST.

## Searching for an element

To search for an element in a BST we exploit the recursive structure of the tree. If the tree is empty then the searched-for element is not in the tree. Otherwise, we test if the searched-for element is equal to the element stored in the root node, returning the root node if the elements are equal. If the searched-for element is less than the element stored in the root then we recursively search the left subtree for the element, otherwise we recursively search the right subtree for the element. A recursive pseudocode algorithm for searching for an element in a BST is:

```
getNode(elem, node) {
    if (node == null) {        // elem is not in the tree rooted at node
        return null
    }
    if (elem == node.elem) {   // elem is in node
        return node
    }
    if (elem < node.elem) {    // search left subtree
        return getNode(elem, node.left)
    }
    else {                     // search right subtree
        return getNode(elem, node.right)
    }
}
```

`getNode` can also be implemented iteratively; see the `contains` method in the [Implementation](./bst_implementation.ipynb#notebook_id) notebook for details.
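As an illustration of the requirement that elements be comparable, here is a minimal generic Java sketch of the recursive search. The class `BstSearchSketch` and its `Node` class are hypothetical; the only library feature relied on is `Comparable.compareTo`, which returns a negative value, zero, or a positive value when the receiver is less than, equal to, or greater than the argument.

```java
// Hypothetical names throughout; a sketch, not the notebook's implementation.
final class BstSearchSketch {

    static final class Node<E> {
        final E elem;
        final Node<E> left;
        final Node<E> right;

        Node(E elem, Node<E> left, Node<E> right) {
            this.elem = elem;
            this.left = left;
            this.right = right;
        }
    }

    /**
     * Returns the uppermost node containing elem in the subtree rooted at node,
     * or null if elem is not in that subtree.
     */
    static <E extends Comparable<? super E>> Node<E> getNode(E elem, Node<E> node) {
        if (node == null) {
            return null;                         // elem is not in this subtree
        }
        int cmp = elem.compareTo(node.elem);
        if (cmp == 0) {
            return node;                         // elem is in node
        }
        if (cmp < 0) {
            return getNode(elem, node.left);     // search left subtree
        }
        return getNode(elem, node.right);        // search right subtree
    }
}
```

Because each call discards one of the two subtrees, the number of comparisons is bounded by the number of nodes on a single root-to-leaf path.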

Consider searching for the element 74 in the following BST:

![getNode(74)](../resources/images/bst-2/Slide65.PNG)

The search begins at the root node. 74 is greater than the value stored in the root node (50) so we recursively search the right subtree of the root node.

![getNode(74)](../resources/images/bst-2/Slide66.PNG)

74 is greater than the value stored in the subtree root node (73) so we recursively search the right subtree of the node.

![getNode(74)](../resources/images/bst-2/Slide67.PNG)

74 is less than the value stored in the subtree root node (83) so we recursively search the left subtree of the node.

![getNode(74)](../resources/images/bst-2/Slide68.PNG)

74 is equal to the value stored in the subtree root node so we return the subtree root node.

![getNode(74)](../resources/images/bst-2/Slide69.PNG)

Suppose that we search the same tree for the element 20. The search eventually reaches node 19. 20 is greater than 19 so we attempt to search the right subtree of node 19; however, there is no right subtree and `null` is returned.

**Exercise 5** Trace the steps required to get the nodes containing the elements 8, 76, and 55 in the tree shown above.

## Adding an element to a BST

Adding an element to a BST always results in adding a new leaf node to the tree. To find the location where the leaf node is added we exploit the recursive structure of the tree. If the tree is empty then we create a new node to store the element and assign the node to be the tree root. Otherwise, we compare the element to the value stored in the root and then recursively add the new element to the appropriate subtree: If the element to add is less than or equal to the element in the root then we recursively add to the left subtree, otherwise we recursively add to the right subtree. When the appropriate subtree is empty, the new node becomes the corresponding child of the current node. A recursive pseudocode algorithm for adding an element to a BST is:

```
add(elem) {
    if (tree is empty) {
        root node = new BinaryNode(elem)
    }
    else {
        add(elem, root node)
    }
}

add(elem, node) {
    if (elem <= node.elem) {           // elem belongs in the left subtree
        if (!node.hasLeft()) {
            node.setLeft(new BinaryNode(elem))
        }
        else {
            add(elem, node.left)
        }
    }
    else {                             // elem belongs in the right subtree
        if (!node.hasRight()) {
            node.setRight(new BinaryNode(elem))
        }
        else {
            add(elem, node.right)
        }
    }
}
```
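The pseudocode above can be sketched in Java as follows, assuming `int` elements. The class `BstAddSketch`, its `Node` class, and the `size` field are hypothetical and are not taken from the notebook's implementation. Duplicates are sent to the left subtree, in keeping with rule 1 of the binary search tree property.

```java
// Hypothetical names throughout; a sketch, not the notebook's implementation.
final class BstAddSketch {

    static final class Node {
        final int elem;
        Node left;
        Node right;

        Node(int elem) {
            this.elem = elem;
        }
    }

    Node root;   // null when the tree is empty
    int size;    // number of elements added so far

    /** Adds elem to the tree as a new leaf node. */
    void add(int elem) {
        if (root == null) {
            root = new Node(elem);
        } else {
            add(elem, root);
        }
        size++;
    }

    // Adds elem to the non-empty subtree rooted at node.
    private void add(int elem, Node node) {
        if (elem <= node.elem) {                // elem belongs in the left subtree
            if (node.left == null) {
                node.left = new Node(elem);
            } else {
                add(elem, node.left);
            }
        } else {                                // elem belongs in the right subtree
            if (node.right == null) {
                node.right = new Node(elem);
            } else {
                add(elem, node.right);
            }
        }
    }
}
```

Adding 50, 27, 73, and 8 in that order and then adding 19 follows the same path as the worked example below: left at 50, left at 27, and right at 8.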

Consider adding the element 19 to the following BST:

![Add 19: Step 1](../resources/images/bst-2/Slide10.PNG)

We begin by comparing the element to be added to the element in the root node. 19 is less than 50, thus it belongs in the left subtree of the root node. The root node has a left child so we recursively add 19 to the subtree rooted at the left child node of node 50:

![Add 19: Step 2](../resources/images/bst-2/Slide11.PNG)

We now compare the element to be added to the subtree root node. 19 is less than 27, thus it belongs in the left subtree of node 27. Node 27 has a left child so we recursively add 19 to the subtree rooted at the left child node of node 27:

![Add 19: Step 3](../resources/images/bst-2/Slide12.PNG)

Again we compare the element to be added to the subtree root node. 19 is greater than 8, thus it belongs in the right subtree of node 8. 

![Add 19: Step 4](../resources/images/bst-2/Slide13.PNG)

Node 8 has no right child so we create a new node for the added element and set the right child of node 8 to the new node.

![Add 19: Step 5](../resources/images/bst-2/Slide14.PNG)

**Exercise 6** Where in the BST shown above would the element 55 be added?

**Exercise 7** Where in the BST shown above would the element 3 be added?

**Exercise 8** Where in the BST shown above would the element 50 be added?

**Exercise 9** Suppose we start with an empty BST and then add the elements 1, 2, 3, 4, and 5 to the tree in that order. What does the tree look like?

**Exercise 10** Suppose we start with an empty BST and then add the elements 5, 4, 3, 2, and 1 to the tree in that order. What does the tree look like?

## Predecessors and successors

In a BST there is something special about a node's:

* right-most node in its left subtree
* left-most node in its right subtree

![Predecessors and successors](../resources/images/bst-2/Slide18.PNG)

The right-most node in the left subtree of a node $n$ is called the *inorder predecessor* of $n$. The inorder predecessor is the node containing the greatest element that is less than or equal to the element in $n$. In other words, if we wrote the elements of the tree in sorted order then the inorder predecessor contains the element that comes immediately before the element in $n$.

The left-most node in the right subtree of a node $n$ is called the *inorder successor* of $n$. The inorder successor is the node containing the smallest element that is greater than the element in $n$. In other words, if we wrote the elements of the tree in sorted order then the inorder successor contains the element that comes immediately after the element in $n$.

**Exercise 11** Consider a node $n$. How can you tell if $n$ has an inorder predecessor in the subtree rooted at $n$? How can you tell if $n$ has an inorder successor in the subtree rooted at $n$?

For reasons that will become clear when deletion from the tree is discussed, we describe algorithms for returning the inorder predecessor node of a node $n$.

```
inorderPredecessor(BinaryNode n) {
    if (n.hasLeft()) {
        pred = n.left                 // go to the root of the left subtree of n
        while (pred.hasRight()) {     // go right as far as possible
            pred = pred.right
        }
        return pred
    }
    else {
        return null                   // n has no predecessor
    }
}
```

Observe that the part of the algorithm inside the if block simply finds the node containing the maximum element of the left subtree of $n$. If a `max` algorithm exists then the `inorderPredecessor` algorithm becomes:

```
inorderPredecessor(BinaryNode n) {
    if (n.hasLeft()) {
        return max(n.left)
    }
    else {
        return null                   // n has no predecessor
    }
}
```
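A Java sketch of the second version is shown below, assuming `int` elements and a hypothetical `Node` class. Here `max` returns the node (rather than the element) at the right-most position of a non-empty subtree, which is what `inorderPredecessor` needs.

```java
// Hypothetical names throughout; a sketch, not the notebook's implementation.
final class InorderPredecessorSketch {

    static final class Node {
        final int elem;
        final Node left;
        final Node right;

        Node(int elem, Node left, Node right) {
            this.elem = elem;
            this.left = left;
            this.right = right;
        }
    }

    /** Returns the right-most node of the non-empty subtree rooted at n. */
    static Node max(Node n) {
        while (n.right != null) {     // go right as far as possible
            n = n.right;
        }
        return n;
    }

    /**
     * Returns the inorder predecessor of n within the subtree rooted at n,
     * or null if n has no left subtree.
     */
    static Node inorderPredecessor(Node n) {
        if (n.left == null) {
            return null;              // n has no predecessor in its own subtree
        }
        return max(n.left);
    }
}
```

The returned node never has a right child, a fact that is useful when removing elements from the tree.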

**Exercise 12** State an algorithm for finding the inorder successor node of a node $n$.

## Removing an element

Removing an element from a BST has three cases depending on whether the element is in a:

1. leaf node
2. node with one child
3. node with two children

### Removing an element in a leaf node

If the element to be removed is in a leaf node $n$ then we can simply remove the leaf node by finding the parent of $n$ and setting the appropriate child reference to `null`. A special case occurs when $n$ is the root node because the root node has no parent.

In the example shown below, deleting the element 93 is done by finding the parent of node 93 (node 83) and then setting the right child of node 83 to `null`.

![Remove 93](../resources/images/bst-2/Slide24.PNG)

![Remove 93](../resources/images/bst-2/Slide25.PNG)

**Exercise 13** What do we do in the special case where removing a leaf node removes the root node?

### Removing an element in a node with one child

If the element to be removed is in a node $n$ with one child $m$ then we can simply replace $n$ with its child $m$. This requires finding the parent of $n$ and then setting the appropriate child reference to $m$. As with removing a leaf node, a special case occurs when $n$ is the root node because the root node has no parent.

In the example shown below, deleting the element 83 is done by finding the parent of node 83 (node 73) and then setting the right child of node 73 to refer to the child of 83 (node 74).

![Remove 83](../resources/images/bst-2/Slide27.PNG)

![Remove 83](../resources/images/bst-2/Slide28.PNG)

**Exercise 14** Explain why replacing $n$ with its child node $m$ preserves the properties of a BST.

### Removing an element in a node with two children

Removing an element in a node with two children is performed by replacing the node with its inorder predecessor or successor using the following steps:

1. call the node to be removed $n$
2. find the inorder predecessor or inorder successor
    * call this node $m$
3. copy $m.elem$ into $n$ (overwriting $n.elem$)
4. remove $m$

**Exercise 15** Explain why replacing $n$ with its inorder predecessor preserves the properties of a BST.

**Exercise 16** Explain why replacing $n$ with its inorder successor preserves the properties of a BST.

**Exercise 17** Is there a special case if $n$ is the root node of the tree?

**Exercise 18** Explain why removing the inorder predecessor/successor is easier than removing $n$.

In the example shown below, deleting the element 50 is done by finding the successor node (node 51), copying the element from the successor node into node 50, and then removing the successor node.

![Remove 50](../resources/images/bst-2/Slide30.PNG)

![Remove 50](../resources/images/bst-2/Slide31.PNG)

![Remove 50](../resources/images/bst-2/Slide32.PNG)

![Remove 50](../resources/images/bst-2/Slide33.PNG)

![Remove 50](../resources/images/bst-2/Slide34.PNG)

**Exercise 19** Draw the BST shown above after removing node 51 using the inorder predecessor.

The BST element removal algorithm can be described as follows:

```
remove(elem) {
    BinaryNode n = getNode(elem)
    if (n == null) {
        do nothing, elem is not in the tree
    }
    else {
        removeNode(n)
    }
}

removeNode(node) {
    if (node.isLeaf()) {
        replaceNode(node, null)            // replace the leaf node with null
    }
    else if (node.hasOneChild()) {
        if (node.hasLeft()) {
            replaceNode(node, node.left)   // replace node with its left child
        }
        else {
            replaceNode(node, node.right)  // replace node with its right child
        }
    }
    else {
        BinaryNode m = inorderPredecessor(node)   // copy the predecessor's element into node,
        node.elem = m.elem                        // then remove the predecessor node
        removeNode(m)                             // m has no right child
    }
}

replaceNode(node, replace) {
    BinaryNode parent = node.parent
    if (parent == null) {
        // replacing the root node
        tree root = replace
    }
    else if (node == parent.left) {
        parent.setLeft(replace)
    }
    else {
        parent.setRight(replace)
    }
}
```

The `replaceNode` algorithm replaces the node `node` with the node `replace` adjusting the appropriate reference in the parent of `node` to refer to the replacement node.
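The removal algorithm can be sketched in Java as follows, assuming `int` elements. All names are hypothetical, and the sketch makes one assumption explicit that the pseudocode leaves implicit: `setLeft` and `setRight` also update the `parent` reference of the new child, and replacing the root clears the `parent` reference of the new root. The leaf case and the one-child case are merged because replacing a leaf with `null` is the same operation as replacing a node with its only child.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; a sketch, not the notebook's implementation.
final class BstRemoveSketch {

    static final class Node {
        int elem;
        Node left, right, parent;

        Node(int elem) { this.elem = elem; }

        // Assumption: the setters keep the child's parent reference consistent.
        void setLeft(Node child)  { left = child;  if (child != null) child.parent = this; }
        void setRight(Node child) { right = child; if (child != null) child.parent = this; }
    }

    Node root;

    void add(int elem) {
        if (root == null) { root = new Node(elem); return; }
        Node n = root;
        while (true) {
            if (elem <= n.elem) {
                if (n.left == null) { n.setLeft(new Node(elem)); return; }
                n = n.left;
            } else {
                if (n.right == null) { n.setRight(new Node(elem)); return; }
                n = n.right;
            }
        }
    }

    Node getNode(int elem) {
        Node n = root;
        while (n != null && n.elem != elem) {
            n = (elem < n.elem) ? n.left : n.right;
        }
        return n;
    }

    /** Removes one occurrence of elem; returns false if elem is not in the tree. */
    boolean remove(int elem) {
        Node n = getNode(elem);
        if (n == null) return false;
        removeNode(n);
        return true;
    }

    private void removeNode(Node node) {
        if (node.left != null && node.right != null) {
            Node m = node.left;                    // find the inorder predecessor
            while (m.right != null) m = m.right;
            node.elem = m.elem;                    // overwrite the removed element
            removeNode(m);                         // m has at most one (left) child
        } else {
            // child is null for a leaf, otherwise the only child of node
            Node child = (node.left != null) ? node.left : node.right;
            replaceNode(node, child);
        }
    }

    private void replaceNode(Node node, Node replace) {
        Node parent = node.parent;
        if (parent == null) {                      // replacing the root node
            root = replace;
            if (replace != null) replace.parent = null;
        } else if (node == parent.left) {
            parent.setLeft(replace);
        } else {
            parent.setRight(replace);
        }
    }

    List<Integer> inorder() {
        List<Integer> out = new ArrayList<>();
        inorder(root, out);
        return out;
    }

    private static void inorder(Node n, List<Integer> out) {
        if (n == null) return;
        inorder(n.left, out);
        out.add(n.elem);
        inorder(n.right, out);
    }
}
```

An in-order traversal after each removal is a convenient way to confirm that the remaining elements are still visited in sorted order.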

## Complexity

The algorithms for adding an element, searching for an element, and removing an element in a BST all start at the root node and, in the worst case, follow a path to a leaf node. The worst-case complexity of these operations is therefore $\Theta(h)$ where $h$ is the height of the tree.

The worst-case height of a tree having $n$ nodes is an element of $\Theta(n)$ which means the worst-case complexity of adding, searching, and removing an element from a BST is $\Theta(n)$.

In a balanced BST, the height of a tree having $n$ nodes is an element of $\Theta(\log n)$ which means the worst-case complexity of adding, searching, and removing an element from a balanced BST is $\Theta(\log n)$. It is therefore desirable to keep a BST as balanced as possible.
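The effect of insertion order on height can be observed with a small experiment. The sketch below is a hypothetical Java illustration, assuming `int` elements and measuring height as the number of edges on the longest root-to-leaf path (so a single node has height 0). It builds one tree by adding $1, 2, \ldots, n$ in increasing order, and another by adding the middle element of each range before the elements on either side of it.

```java
// Hypothetical names throughout; a sketch, not the notebook's implementation.
final class BstHeightSketch {

    static final class Node {
        final int elem;
        Node left;
        Node right;

        Node(int elem) {
            this.elem = elem;
        }
    }

    static Node insert(Node node, int elem) {
        if (node == null) {
            return new Node(elem);
        }
        if (elem <= node.elem) {
            node.left = insert(node.left, elem);
        } else {
            node.right = insert(node.right, elem);
        }
        return node;
    }

    /** Height in edges; an empty tree has height -1 by convention. */
    static int height(Node node) {
        if (node == null) {
            return -1;
        }
        return 1 + Math.max(height(node.left), height(node.right));
    }

    /** Adds 1..n in increasing order. */
    static Node buildSorted(int n) {
        Node root = null;
        for (int e = 1; e <= n; e++) {
            root = insert(root, e);
        }
        return root;
    }

    /** Adds 1..n so that the middle of every range is added before the rest of the range. */
    static Node buildBalanced(int n) {
        return insertMidpointsFirst(null, 1, n);
    }

    private static Node insertMidpointsFirst(Node root, int lo, int hi) {
        if (lo > hi) {
            return root;
        }
        int mid = lo + (hi - lo) / 2;
        root = insert(root, mid);
        root = insertMidpointsFirst(root, lo, mid - 1);
        return insertMidpointsFirst(root, mid + 1, hi);
    }
}
```

With $n = 1023$, the tree built in increasing order has height 1022, whereas the tree built midpoint-first has height 9.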

**Exercise 20** A list of $n$ elements can be sorted by adding the elements to an empty BST and then performing an in-order traversal. What is the best-case and worst-case complexity of such a sorting algorithm?