# Binary Search Trees

Trees are one of the most fundamental data structures in all of computer science.

They allow us to easily and efficiently store and search for data.

We've seen trees before, in the context of algorithms! You already know that they beautifully illustrate the behavior of algorithms and allow us to elegantly and rigorously analyze their runtimes.

In the context of data structures, a **tree** is a linked data structure consisting of nodes. A **node** contains an element and references to 0 or more child nodes.

<img src = "figures/BST-00.jpeg" width = "75%">

In contrast to our algorithms discussions, here we are using a tree to store information rather than represent the behavior of algorithms.

As you already know from our algorithms discussion, the **root** is at the top of the tree. Nodes with 0 children are **leaves**.

A **binary tree** has a branching factor of 2. Put another way, every node has at most two children.

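The **height** of a tree is the number of levels in it. A balanced binary tree with $n$ nodes has a height in $O(\lg n)$, but in the worst case, where every node has only one child, the height is $n$.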

# Binary Search Trees

A Binary Search Tree is a binary tree with a special search property.

**The Binary Search Tree Property**

For every node in the tree, all elements in its left subtree are less than it, and all elements in its right subtree are greater than (or equal to) it.

What about duplicates? Normally, duplicates are placed in the right subtree.



# Implementation

BSTs are naturally recursive.

A BST contains its root element and two subtrees: its left and right subtree. A subtree is just a tree.

Attributes:
- root element
- left subtree
- right subtree

```python
class BST:
    def __init__(self, element):
        self.element = element
        self.left = None
        self.right = None
```

## Operations

Fundamental BST operations include:

- `contains`: search for a `key`, return `True` if that `key` is in the tree, `False` otherwise.
- `insert`: insert a new element into the tree, obeying the BST property
- `remove`: remove an element from the tree, maintaining the BST property
- `traverse`: iterate over the tree, processing each element



## contains

Binary **Search** Trees have their name for a purpose. We'll start with the contains method to see how we can search through BSTs.

Suppose we are given a BST:

<img src = "figures/BST-00.jpeg" width = "75%">

First off, we can verify that this is a BST by ensuring that the BST property is satisfied for every node. 

> The BST Property:
>
> For every node in the tree, all elements in its left subtree are less than it, and all elements in its right subtree are greater than (or equal to) it.

The BST property is satisfied for this tree.

For example, the tree rooted at $3$ is a BST since $0 < 3$ and $5 > 3$.

It is easy enough to see that the tree rooted at $17$ also satisfies the property.

We can verify the same for $8$ as well. $0$, $3$, and $5$ are all less than it, and $9$, $17$, and $21$ are all greater than it.
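This check can also be automated. The following is a minimal sketch, not part of the class above: `satisfies_bst_property` is a hypothetical helper, and the tree is built by hand from the elements named in the text, with $21$ assumed to be the right child of $17$. The helper passes bounds down the tree, because comparing a node only with its immediate children is not enough.

```python
class BST:
    # minimal copy of the class above so this block runs on its own
    def __init__(self, element):
        self.element = element
        self.left = None
        self.right = None


def satisfies_bst_property(tree, low=None, high=None):
    """Return True if every element in `tree` is >= low and < high,
    and every subtree also satisfies the BST property."""
    if tree is None:
        return True
    # elements in a right subtree must be >= the ancestor above them
    if low is not None and tree.element < low:
        return False
    # elements in a left subtree must be < the ancestor above them
    if high is not None and tree.element >= high:
        return False
    return (satisfies_bst_property(tree.left, low, tree.element)
            and satisfies_bst_property(tree.right, tree.element, high))


root = BST(8)
root.left = BST(3)
root.left.left = BST(0)
root.left.right = BST(5)
root.right = BST(17)
root.right.left = BST(9)
root.right.right = BST(21)

print(satisfies_bst_property(root))  # True

# 10 > 3, so it looks fine next to its parent,
# but it sits in 8's left subtree and breaks the property there
root.left.right.element = 10
print(satisfies_bst_property(root))  # False
```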

### Searching in a BST

To search in this tree, we can use the BST property to our advantage.

If we want to search for $9$, we start at the root. Since $9 > 8$, we continue our search in $8$'s right subtree. Since $9 < 17$, we search in $17$'s left subtree. There we find $9$.

<img src = "figures/BST-search.jpeg" width = "75%">

If we were to search for an element that is not in the tree, we would eventually reach a node with no subtree in the direction we need to go, leaving nowhere else to search. In that case, we would know that the element is not in the tree.

### Implementation

We can use recursion! Depending on the direction in which we need to continue our search, we recursively call `contains` on that subtree.

```python
def contains(self, key):
    # the key is at this node: found it
    if key == self.element:
        return True
    if key < self.element:
        # if we should go left, but there is no
        # left subtree, key not in tree
        if self.left is None:
            return False
        # continue search in left subtree
        return self.left.contains(key)
    else:
        # if we should go right, but there is no
        # right subtree, key not in tree
        if self.right is None:
            return False
        # continue search in right subtree
        return self.right.contains(key)
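The same search can also be written as a loop instead of a recursion. This is a sketch of that alternative, not the method above: `contains_iterative` is a hypothetical standalone function, and the example tree is built by hand from the elements named in the text.

```python
class BST:
    # minimal copy of the class above so this block runs on its own
    def __init__(self, element):
        self.element = element
        self.left = None
        self.right = None


def contains_iterative(tree, key):
    """Search for `key` by walking down from `tree`, one level per step."""
    current = tree
    while current is not None:
        if key == current.element:
            return True
        # the BST property tells us which subtree could hold the key
        if key < current.element:
            current = current.left
        else:
            current = current.right
    # ran out of tree without finding the key
    return False


root = BST(8)
root.left, root.right = BST(3), BST(17)
root.left.left, root.left.right = BST(0), BST(5)
root.right.left, root.right.right = BST(9), BST(21)

print(contains_iterative(root, 9))   # True
print(contains_iterative(root, 11))  # False
```

Because the loop also stops when `tree` is `None`, this version handles an empty tree without a special case.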

### Runtime

We could write a recurrence for this, but we can do an even simpler analysis. In the worst case, we have to check a single node on each level. Each check is a constant-time operation, so the runtime of `contains` is proportional to the height of the tree.

Assuming that the tree is balanced, that is, that every node has roughly the same number of nodes in its left and right subtrees:

`contains()` $\in O(\lg n)$
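
To see where the $\lg n$ comes from, consider the idealized case (an assumption made here for simplicity) of a perfectly balanced tree in which every level is full. Level $i$, counting the root as level $0$, then holds $2^i$ nodes, so a tree with $h$ levels contains

$$n = \sum_{i=0}^{h-1} 2^i = 2^h - 1 \quad\Longrightarrow\quad h = \lg(n + 1)$$

Since `contains` checks at most one node per level, it checks at most $\lg(n + 1)$ nodes, which is in $O(\lg n)$.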

## insert

Having discussed `contains`, we already have the key idea: inserting is very similar.

To insert an element, we need to:

1. search for the location where the element should belong
2. create a new BST containing it and assign it there

For example, suppose we want to insert 11 into our BST:



<img src = "figures/BST-insert.jpeg" width = "75%">

Insertion is very similar to searching, except at the end we create a new subtree rather than return True or False.

### Implementation

```python
def insert(self, element):
    if element < self.element:
        # if the element belongs in the left subtree,
        # but there is nothing there, put this element there
        if self.left is None:
            self.left = BST(element)
        else:
            # otherwise, recursively insert into the left subtree
            self.left.insert(element)
    else:
        # if the element belongs in the right subtree,
        # but there is nothing there, put this element there
        if self.right is None:
            self.right = BST(element)
        else:
            # otherwise, recursively insert into the right subtree
            self.right.insert(element)

### Runtime

The runtime for `insert` is the same as `contains`.

If the tree is balanced:

`insert()` $\in O(\lg n)$
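
To trace where an insertion ends up, and how many nodes it touches on the way, here is a small sketch. It is an illustration rather than the method above: `insert_with_path` is a hypothetical helper written as a loop, and the insertion order used to build the example tree is an assumption chosen to reproduce the elements named in the text.

```python
class BST:
    # minimal copy of the class above so this block runs on its own
    def __init__(self, element):
        self.element = element
        self.left = None
        self.right = None


def insert_with_path(tree, element):
    """Insert `element` below `tree` and return the elements visited on the way down."""
    path = []
    current = tree
    while True:
        path.append(current.element)
        if element < current.element:
            # step 1: keep searching left; step 2: attach when the spot is free
            if current.left is None:
                current.left = BST(element)
                return path
            current = current.left
        else:
            if current.right is None:
                current.right = BST(element)
                return path
            current = current.right


root = BST(8)
for element in [3, 17, 0, 5, 9, 21]:
    insert_with_path(root, element)

print(insert_with_path(root, 11))     # [8, 17, 9]
print(root.right.left.right.element)  # 11
```

The path has one entry per level visited, which is why the cost of an insertion is bounded by the height of the tree.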

## remove

Removal from a BST is interesting, but I will omit a detailed discussion here in favor of moving on to tree traversals.

I will only note here that the runtime of `remove`, like that of `insert`, is proportional to the height of the tree.

If the tree is balanced:

`remove()` $\in O(\lg n)$.

# Tree Traversals

A tree traversal allows us to visit every element in a tree.

There are 3 tree traversals:
- preorder traversal
- inorder traversal
- postorder traversal

We implement traversals recursively. 

The difference between the three traversals is when the element is processed relative to the traversal of its subtrees.

Here is the pseudocode for the three traversals, using `print` to allow us to see that we are visiting each element. Traversing an empty subtree is assumed to do nothing.

**preorder traversal**
```python
print(self.element)
preorder_traverse(self.left)
preorder_traverse(self.right)
```

**inorder traversal**
```python
inorder_traverse(self.left)
print(self.element)
inorder_traverse(self.right)
```

**postorder traversal**
```python
postorder_traverse(self.left)
postorder_traverse(self.right)
print(self.element)
```
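
One way to turn the pseudocode above into runnable functions is sketched here. The explicit check for an empty subtree and the `visit` parameter (which defaults to `print`) are assumptions added for this example, so that the visited elements can also be collected into a list.

```python
class BST:
    # minimal copy of the class above so this block runs on its own
    def __init__(self, element):
        self.element = element
        self.left = None
        self.right = None


def preorder_traverse(tree, visit=print):
    if tree is None:  # empty subtree: nothing to do
        return
    visit(tree.element)
    preorder_traverse(tree.left, visit)
    preorder_traverse(tree.right, visit)


def inorder_traverse(tree, visit=print):
    if tree is None:
        return
    inorder_traverse(tree.left, visit)
    visit(tree.element)
    inorder_traverse(tree.right, visit)


def postorder_traverse(tree, visit=print):
    if tree is None:
        return
    postorder_traverse(tree.left, visit)
    postorder_traverse(tree.right, visit)
    visit(tree.element)


# a three-node tree: 8 at the root, 3 on the left, 17 on the right
root = BST(8)
root.left = BST(3)
root.right = BST(17)

for traverse in (preorder_traverse, inorder_traverse, postorder_traverse):
    visited = []
    traverse(root, visited.append)
    print(traverse.__name__, visited)
# preorder_traverse [8, 3, 17]
# inorder_traverse [3, 8, 17]
# postorder_traverse [3, 17, 8]
```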

## Performing an inorder traversal

An inorder traversal is the easiest to understand initially.

We can start on a tree with a single element.



<img src = "figures/BST-single.jpeg" width = "55%">

**inorder traversal**
```python
inorder_traverse(self.left)
print(self.element)
inorder_traverse(self.right)
```

To perform an inorder traversal on $8$, we first try to traverse its left subtree. There is nothing there. We then print $8$. Then we try to traverse its right subtree. There is nothing there. We are done.

The output of the traversal is:

`8`

Ok, here is a more involved example:

<img src = "figures/BST-traverse-3.jpeg" width = "55%">

To traverse this tree, we start at $8$.

1. Traverse its left subtree
    - we are now at $3$
2. Try to traverse $3$'s left subtree
    - nothing there
3. Print $3$.
4. Try to traverse $3$'s right subtree
    - nothing there, done at $3$
5. Return to $8$, print $8$.
6. Traverse $8$'s right subtree
    - now at $17$
7. Try to traverse $17$'s left subtree
    - nothing there
8. Print $17$.
9. Try to traverse $17$'s right subtree
    - nothing there, done at $17$
10. Return to $8$. Done.

Final output:

`3 8 17`

You will note that an inorder traversal of a BST visits all elements in sorted order.

With this, I think you get the idea. Walk through the process yourself on larger trees to verify your understanding.

You can also apply this same process to understand the behavior of the other two traversals. Their output isn't as intuitive as that of the inorder traversal, but they turn out to be very useful. We'll see more about that in this section's video.

# Balanced Trees

In each of our runtime analyses above, we explicitly noted that if the tree is balanced, the runtimes of our three major operations (`contains`, `insert`, and `remove`) are all in $O(\lg n)$.

Our tree, however, is not inherently balanced. That is, there is nothing to stop it from becoming unbalanced.

For example, given our rules for insertion, what tree do you get if you insert the following list of numbers, in order, into an empty BST?

`1 2 3 4 5 6`

Pause and draw it!

You should note that it definitely is NOT balanced.
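
To check your drawing, here is a small sketch that builds the tree and measures it. The `height` and `build` helpers are hypothetical names introduced for this example, with height counted as the number of levels, and the second insertion order is just one ordering of the same numbers that happens to produce a balanced tree.

```python
class BST:
    # minimal copy of the class above so this block runs on its own
    def __init__(self, element):
        self.element = element
        self.left = None
        self.right = None

    def insert(self, element):
        if element < self.element:
            if self.left is None:
                self.left = BST(element)
            else:
                self.left.insert(element)
        else:
            if self.right is None:
                self.right = BST(element)
            else:
                self.right.insert(element)


def height(tree):
    """Number of levels in `tree`; an empty subtree has height 0."""
    if tree is None:
        return 0
    return 1 + max(height(tree.left), height(tree.right))


def build(elements):
    """Insert `elements` one at a time, starting from the first as the root."""
    root = BST(elements[0])
    for element in elements[1:]:
        root.insert(element)
    return root


print(height(build([1, 2, 3, 4, 5, 6])))  # 6: every node has only a right child
print(height(build([4, 2, 6, 1, 3, 5])))  # 3: same elements, balanced tree
```

With sorted input the tree degenerates into a chain, so `contains`, `insert`, and `remove` take time proportional to $n$ rather than $\lg n$.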

### Actual Balanced BSTs

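We've been discussing the basic BST, but there are also search trees that keep themselves balanced as elements are inserted and removed.

If you are interested, a few examples of balanced search trees include:

- Red-Black Trees
- AVL Trees
- B-Trees (a generalization in which a node can have more than two children)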

My favorite, if you are curious, is the AVL tree. It keeps the tree balanced with "rotations", and that is fun.

B-trees are also super cool. They are used to provide efficient access to data on slow storage, as in file systems and databases.