# Binary Search Trees

*And all this time I thought "BST" stood for some kind of sandwich* 🥪

Binary Search Trees
- Only two children, specific meaning
- Inorder traversal
- Searching for a value
- Big O for search

A **Binary Search Tree (BST)** is a tree with a few additional qualities:

- There are no duplicate values in the tree
- Each node has at most two children:
  - A `left` subtree whose maximum value is smaller than the current value
  - A `right` subtree whose minimum value is larger than the current value
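The definition above constrains whole subtrees, not just immediate children. A minimal sketch of a checker follows, assuming integer values; `CheckNode` and `is_bst` are hypothetical names used only for this illustration. It carries the allowed range down the tree as optional lower and upper bounds.

```cpp
// Hypothetical node type for this sketch (int values only).
struct CheckNode {
    int value;
    CheckNode* left = nullptr;
    CheckNode* right = nullptr;
    explicit CheckNode(int v) : value(v) {}
};

// True if every value in the subtree lies strictly between *lo and *hi.
// A nullptr bound means "no bound on that side".
bool is_bst(const CheckNode* node, const int* lo = nullptr, const int* hi = nullptr) {
    if (node == nullptr) return true;                        // empty subtree is valid
    if (lo != nullptr && node->value <= *lo) return false;   // too small (or duplicate)
    if (hi != nullptr && node->value >= *hi) return false;   // too large (or duplicate)
    // Left subtree must stay below this value, right subtree above it.
    return is_bst(node->left, lo, &node->value)
        && is_bst(node->right, &node->value, hi);
}
```

Note that a 6 placed as the right child of 3, inside the left subtree of 5, passes a parent-only comparison but fails this check.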


In [None]:
%%file build_test.sh
cat << EOF > test.txt
5 > 3
3 > 2
3 < 4
2 > 1
5 < 7
7 > 6
7 < 9
9 > 8
EOF
python3 ~/projects/btrees/btrees.py test.txt

<img src='test.png' />

- Each node has at most two children
  - The left child is always smaller than the node value
  - The right child is always greater than the node value
- There are no duplicates

- What is the maximum value of the left subtree of 5?
- What is the minimum value of the right subtree of 7?

## Traversing a BST

With a generic tree, the order you iterate through children is undefined.

For a BST, you always iterate from `left` to `right`.

What is the preorder (DFS) traversal of this tree?

<img src='test.png' />

5, 3, 2, 1, 4, 7, 6, 9, 8

What is the postorder (DFS) traversal of this tree?

<img src='test.png' />

1, 2, 4, 3, 6, 8, 9, 7, 5
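The two traversals above differ only in when the node itself is visited. A minimal sketch, assuming integer values; `WalkNode`, `preorder`, and `postorder` are hypothetical names for this illustration.

```cpp
#include <vector>

// Hypothetical node type for this sketch (int values only).
struct WalkNode {
    int value;
    WalkNode* left = nullptr;
    WalkNode* right = nullptr;
    explicit WalkNode(int v) : value(v) {}
};

// Preorder: visit the node, then the left subtree, then the right subtree.
void preorder(const WalkNode* node, std::vector<int>& out) {
    if (node == nullptr) return;
    out.push_back(node->value);
    preorder(node->left, out);
    preorder(node->right, out);
}

// Postorder: visit the left subtree, then the right subtree, then the node.
void postorder(const WalkNode* node, std::vector<int>& out) {
    if (node == nullptr) return;
    postorder(node->left, out);
    postorder(node->right, out);
    out.push_back(node->value);
}
```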

With BSTs, there is another kind of traversal available: **in-order** (a.k.a. *inorder*)

You process the left subtree, then the node itself, then the right subtree.

I.e., `left`, `this`, `right`

What is the inorder traversal of our binary tree?

<img src='test.png' />

1, 2, 3, 4, 5, 6, 7, 8, 9

<div class='big centered' style='font-size: 100pt'> 🤯 </div>

<div class='big centered' style='font-size: 100pt'> 😮 🤯 🤭 🤔 😵 😶‍🌫️ 😴 </div>
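Why the inorder traversal comes out sorted can be seen in a few lines. This is a sketch under the assumption of integer values; `InNode` and `inorder` are hypothetical names for this illustration. Everything in the left subtree is smaller and is emitted first, then the node, then everything larger.

```cpp
#include <vector>

// Hypothetical node type for this sketch (int values only).
struct InNode {
    int value;
    InNode* left = nullptr;
    InNode* right = nullptr;
    explicit InNode(int v) : value(v) {}
};

// Inorder: left subtree, then the node, then the right subtree.
void inorder(const InNode* node, std::vector<int>& out) {
    if (node == nullptr) return;
    inorder(node->left, out);    // all smaller values
    out.push_back(node->value);  // this value
    inorder(node->right, out);   // all larger values
}
```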

## Code

In [1]:
#include <string>
using std::string;

#include <iostream>
using std::cout, std::endl;

#include <sstream>
using std::stringstream;

#include <vector>
using std::vector;

In [2]:
template<class T>
struct Node {
    T value;
    Node<T>* left;
    Node<T>* right;
    Node(T value) : value(value), left(nullptr), right(nullptr) {}
};

In [3]:
template<class T>
void _inorder_to_string(Node<T>* const& node, string const& prefix, vector<string>& lines) {
    if (node == nullptr) {
        return;
    }
    
    // Print left
    _inorder_to_string(node->left, prefix + "  ", lines);
    
    // Print node
    stringstream ss;
    ss << prefix << node->value;
    lines.push_back(ss.str());
    
    // Print right
    _inorder_to_string(node->right, prefix + "  ", lines);
}


template<class T>
string inorder_to_string(Node<T>* const& node) {
    vector<string> lines;
    _inorder_to_string(node, "", lines);
    
    // Transpose: each line becomes a column of the output
    size_t max_len = 0;
    for (auto const& line : lines) {
        max_len = (max_len < line.length()) ? line.length() : max_len;
    }
    
    stringstream ss;
    for (size_t i = 0; i < max_len; i++) {
        for (size_t j = 0; j < lines.size(); j++) {
            if (i < lines[j].length()) {
                ss << lines[j][i];
            } else {
                ss << ' ';
            }
        }
        ss << endl;
    }
    
    return ss.str();
}

In [4]:
template<class T>
void print_inorder(Node<T>* const& node) {
    cout << inorder_to_string(node) << endl;
}

In [5]:
Node<string>* M = new Node<string>("M");

M->left = new Node<string>("G");
M->left->left = new Node<string>("A");
M->left->right = new Node<string>("J");

M->right = new Node<string>("T");
M->right->left = new Node<string>("Q");

In [6]:
print_inorder(M)

   M  
      
 G   T
      
A J Q 



## How can we tell if a value is in our BST?

Perform a depth-first-search!

- If the current node value is the search value, return true.
- If the search value is less than our current value, search the left subtree.
- If the search value is greater than our current value, search the right subtree.
- If I'm supposed to search the left or right subtree but there is no node there, return false.

- What is my base case?
  - If the current node is the search value
  - If I'm supposed to recurse but there is no node to recurse onto
- Do we progress towards the base case?
  - Yes, at each step I move further down the tree.
  - At some point I'll find my value or run out of tree.
- Is the induction correct?
  - If the search value is equal to the current value I'm done.
  - Every value in the left subtree is less than the current value.
  - Every value in the right subtree is greater than the current value.
  - If the search value is greater than the current value, then if it is in the tree it will be found in the right subtree and we continue our search there.
    - Symmetric logic for the left side
  - So, if the problem isn't solved on step $k$, we progress towards a solution with step $k+1$.


In [7]:
template<class T>
Node<T>* search_bst(Node<T>* const& node, T item) {
    // If current node is null or the value is the same, return current
    if (node == nullptr || node->value == item) {
        return node;
    }
    
    // Check left
    if (item < node->value) {
        return search_bst(node->left, item);
    } else {
        // Check right
        return search_bst(node->right, item);
    }
}

In [9]:
auto node = search_bst<string>(M, "Z");

if (node != nullptr) {
    cout << node->value << endl;
} else {
    cout << "nullptr" << endl;
}

nullptr
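Because the recursive `search_bst` only ever follows one branch, the same search can also be written as a loop. The following is a sketch of that variant, not a replacement for the cell above; `SearchNode` and `search_iterative` are hypothetical names for this illustration.

```cpp
// Hypothetical node type for this sketch.
template <class T>
struct SearchNode {
    T value;
    SearchNode<T>* left = nullptr;
    SearchNode<T>* right = nullptr;
    explicit SearchNode(T v) : value(v) {}
};

// Walk down from the given node until the item is found or we fall off the tree.
// Returns the matching node, or nullptr if the item is not present.
template <class T>
const SearchNode<T>* search_iterative(const SearchNode<T>* node, const T& item) {
    while (node != nullptr && !(node->value == item)) {
        // Smaller items can only be on the left, larger ones only on the right.
        node = (item < node->value) ? node->left : node->right;
    }
    return node;
}
```

The loop makes the single root-to-leaf path explicit: each pass moves one level down, so the number of passes is bounded by the height of the tree.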


## BST Big O

What is the Big-O for BST search?

At each step we cut out about half of the remaining values (assuming the tree is reasonably balanced).

How many times can I cut $n$ values in half before I run out of values?

$O(\log n)$
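One way to see where the logarithm comes from, assuming each comparison really does discard half of the remaining values: after $k$ steps there are $n / 2^k$ candidates left, and the search stops when only one remains.

$$
\frac{n}{2^k} = 1 \quad\Longleftrightarrow\quad 2^k = n \quad\Longleftrightarrow\quad k = \log_2 n
$$

For example, with $n = 1024$ values that is $k = 10$ steps.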

We've observed the following properties about BSTs:

- $O(\log n)$ search time
- Does not permit duplicate values
- Prints out contents in sorted order (using the inorder traversal)

What data structure have we seen with these qualities?

A **set**!

<div style='font-size: 200px'> 🤯 </div>

Ordered sets (such as C++'s `std::set`) are typically implemented with a self-balancing BST.
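The set-like behaviour can be observed directly with the standard library's `std::set`. This is a small sketch; `set_contents` is a hypothetical helper name. The C++ standard does not require a particular data structure, but common implementations use a red-black tree, which is a self-balancing BST.

```cpp
#include <set>
#include <vector>

// Insert values into a std::set and read them back in iteration order.
std::vector<int> set_contents(const std::vector<int>& values) {
    std::set<int> s;
    for (int v : values) {
        s.insert(v);  // inserting a value that is already present has no effect
    }
    // Iterating a std::set visits its elements in ascending order.
    return std::vector<int>(s.begin(), s.end());
}
```

Calling `set_contents({5, 3, 7, 3, 4})` yields `3, 4, 5, 7`: the duplicate 3 is dropped and the values come back sorted, just like an inorder traversal.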

But wait, you say.

You can add and remove items from a set. How do we add to and remove from a BST?

## Adding Items to a BST

- Let the root be the current node.
- While the current node is not null:
  - If the item is equal to the current node value, return false (item not added)
  - If the item is greater than the current node value, let current = current.right
  - If the item is less than the current node value, let current = current.left
- Insert the item at the current location
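The loop described above can be sketched directly in C++. This version assumes integer values, and `AddNode` and `add_iterative` are hypothetical names for this illustration. To be able to "insert at the current location", it tracks the address of the pointer slot being examined rather than the node itself.

```cpp
// Hypothetical node type for this sketch (int values only).
struct AddNode {
    int value;
    AddNode* left = nullptr;
    AddNode* right = nullptr;
    explicit AddNode(int v) : value(v) {}
};

// Returns true if item was inserted, false if it was already in the tree.
bool add_iterative(AddNode*& root, int item) {
    AddNode** current = &root;  // address of the pointer slot we are looking at
    while (*current != nullptr) {
        if (item == (*current)->value) {
            return false;  // no duplicates: item not added
        }
        // Step to the right slot for larger items, the left slot for smaller ones.
        current = (item > (*current)->value) ? &(*current)->right
                                             : &(*current)->left;
    }
    *current = new AddNode(item);  // the empty slot is where the item belongs
    return true;
}
```

Tracking the slot instead of the node means the empty-tree case needs no special handling: the first slot examined is `root` itself.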

In [None]:
%%file build_first.sh
cat << EOF > first.txt
5 > 3
3 < 4
5 < 7
EOF
python3 ~/projects/btrees/btrees.py first.txt

What do I get when I insert 2?

<img src="first.png"/>

In [None]:
%%file build_first_p2.sh
cat << EOF > first_p2.txt
5 > 3
3 < 4
5 < 7
3 > 2
EOF
python3 ~/projects/btrees/btrees.py first_p2.txt

- 2 < 5, go left
- 2 < 3, go left
- No node here, add it!

<img src="first_p2.png"/>

What happens when I insert 8?

In [None]:
%%file build_first_p2_p8.sh
cat << EOF > first_p2_p8.txt
5 > 3
3 < 4
5 < 7
3 > 2
7 < 8
EOF
python3 ~/projects/btrees/btrees.py first_p2_p8.txt

- 8 > 5, go right
- 8 > 7, go right
- No node here, add it!

<img src="first_p2_p8.png"/>

What happens when I insert 4?

- 4 < 5, go left
- 4 > 3, go right
- 4 == 4, return false

In [10]:
template<class T>
bool add(Node<T>*& node, T item) {
    if (node == nullptr) {
        node = new Node<T>(item);
        return true;
    }
    if (node->value == item) {
        return false;
    }
    if (item < node->value) {
        return add(node->left, item);
    } else {
        return add(node->right, item);
    }
}

In [11]:
print_inorder(M)

   M  
      
 G   T
      
A J Q 



In [12]:
add<string>(M, "Z")

true

In [13]:
add<string>(M, "Z")

false

In [14]:
add<string>(M, "a")

true

In [15]:
print_inorder(M)

   M    
        
 G   T  
        
A J Q Z 
        
       a



### Big-O for Add

What is the big-O complexity for add?

At each iteration, I halve the size of the tree I still need to search (assuming the tree is reasonably balanced), so $O(\log n)$

### Can I use a BST to sort and find data?

Sure!

What would the big-O complexity be for such an algorithm?

Time to add an item: $O(\log n)$

How many items do I have to add? $O(n)$

Total complexity: $O(n \log n)$
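Putting the two steps together gives what is often called a tree sort. Below is a minimal sketch, assuming integer values and a tree that stays reasonably balanced; `SortNode`, `sort_insert`, `sort_collect`, and `tree_sort` are hypothetical names for this illustration. Because a BST holds no duplicates, repeated values appear only once in the result.

```cpp
#include <vector>

// Hypothetical node type for this sketch (int values only).
struct SortNode {
    int value;
    SortNode* left = nullptr;
    SortNode* right = nullptr;
    explicit SortNode(int v) : value(v) {}
};

// Standard BST insertion; duplicates are silently dropped.
void sort_insert(SortNode*& node, int item) {
    if (node == nullptr) { node = new SortNode(item); return; }
    if (item == node->value) return;
    sort_insert(item < node->value ? node->left : node->right, item);
}

// Inorder traversal that appends values in sorted order and frees the tree.
void sort_collect(SortNode* node, std::vector<int>& out) {
    if (node == nullptr) return;
    sort_collect(node->left, out);
    out.push_back(node->value);
    sort_collect(node->right, out);
    delete node;  // both subtrees are finished, so this node can be released
}

// n insertions followed by one inorder pass.
std::vector<int> tree_sort(const std::vector<int>& values) {
    SortNode* root = nullptr;
    for (int v : values) sort_insert(root, v);
    std::vector<int> sorted;
    sort_collect(root, sorted);
    return sorted;
}
```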

### Key Ideas

- Adding a node preserves the structure of the BST
- Adding a node takes $O(\log n)$ time


## Removing from a BST

<img src="first_p2_p8.png"/>

If I want to remove 4, what do I do?

- Find 4, delete it, and set 3->right to `nullptr`

What if I want to remove 7?

- Find 7, delete it, and have 5 point to 8.

<img src="first_p2_p8.png"/>

What if I want to delete 3?

Which node should take its place?

What if I want to delete 5?

Which node should take its place?

- When a node has no children, it's easy: just delete the node.
- When a node has only one child, it's easy: just replace the node with the child.
- When a node has two children (and possibly descendants!), we need a little more decision making.

### Inorder Predecessor

When removing a node with two children, we want to replace the node with the **inorder predecessor**.

I.e., the node that would come immediately before the removed node in an inorder traversal.

**The inorder predecessor is the largest node in the left subtree.**

In [None]:
%%file build_big.sh
cat << EOF > big.txt
10 > 6
6 > 3
3 > 1
3 < 4
4 < 5
1 < 2
1 > 0
    
6 < 8
8 > 7
    
10 < 15
15 > 12
12 < 14
14 > 13

15 < 18
18 > 17
17 > 16
18 < 19
EOF
python3 ~/projects/btrees/btrees.py big.txt

<img src="big.png"/>

What is the inorder predecessor for 1?

**0**

What is the inorder predecessor for 3?

**2**

What is the inorder predecessor for 6?

**5**

What is the inorder predecessor for 15?

**14**

What is the inorder predecessor for 10?

**8**

In [16]:
template<class T>
Node<T>* inorder_predecessor(Node<T>* const& node) {
    // Pass in the node itself, not its left child.
    // The node must have a left child (node->left != nullptr).
    Node<T>* iop = node->left;
    while (iop->right != nullptr) {
        iop = iop->right;
    }
    return iop;
}

In [17]:
print_inorder(M)

   M    
        
 G   T  
        
A J Q Z 
        
       a



In [18]:
auto node = inorder_predecessor(M);
cout << node->value << endl;

J


## Removing from a BST

```
bool remove(node, value):
    if node is null, return false
    
    if node->value == value:
        if node->left == null:
            node = node->right
        else if node->right == null:
            node = node->left
        else:
            iop = get_inorder_predecessor(node)
            node->value = iop->value
            remove(node->left, iop->value)
        return true

    else if value < node->value:
        return remove(node->left, value)
        
    else:
        return remove(node->right, value)
```

What will the tree look like after removing **17**?

<img src="big.png"/>

In [None]:
%%file build_big_no17.sh
cat << EOF > big_no17.txt
10 > 6
6 > 3
3 > 1
3 < 4
4 < 5
1 < 2
1 > 0
    
6 < 8
8 > 7
    
10 < 15
15 > 12
12 < 14
14 > 13

15 < 18
18 > 16
18 < 19
EOF
python3 ~/projects/btrees/btrees.py big_no17.txt

<img src="big_no17.png" />

What will the tree look like after removing **6**?

<img src="big.png"/>

In [None]:
%%file build_big_6.sh
rm -f big_6.png
cat big.txt.dot | sed -e 's/}/  n5 -> n6 [color="red"]\n}/' | neato -Tpng > big_6.png

<img src="big_6.png" />

In [None]:
%%file build_big_no6.sh
cat << EOF > big_no6.txt
10 > 5
5 > 3
3 > 1
3 < 4
1 < 2
1 > 0
    
5 < 8
8 > 7
    
10 < 15
15 > 12
12 < 14
14 > 13

15 < 18
18 > 17
17 > 16
18 < 19
EOF
python3 ~/projects/btrees/btrees.py big_no6.txt

<img src="big_no6.png" />

What will the tree look like after removing **15**?

<img src="big.png"/>

In [None]:
%%file build_big_14.sh
rm -f big_14.png
cat big.txt.dot | sed -e 's/}/  n14 -> n15 [color="red"]\n}/' | neato -Tpng > big_14.png

<img src="big_14.png" />

In [None]:
%%file build_big_no15.sh
cat << EOF > big_no15.txt
10 > 6
6 > 3
3 > 1
3 < 4
4 < 5
1 < 2
1 > 0
    
6 < 8
8 > 7
    
10 < 14
14 > 12
12 < 13

14 < 18
18 > 17
17 > 16
18 < 19
EOF
python3 ~/projects/btrees/btrees.py big_no15.txt

<img src="big_no15.png" />

In [None]:
template<class T>
bool remove(Node<T>*& node, T item) {
    if (node == nullptr) {
        return false;
    }
    if (node->value == item) {
        if (node->left == nullptr) {
            auto tmp = node;
            node = node->right;
            delete tmp;
        } else if (node->right == nullptr) {
            auto tmp = node;
            node = node->left;
            delete tmp;
        } else {
            Node<T>* iOP = inorder_predecessor(node);
            node->value = iOP->value;
            remove(node->left, iOP->value);
        }
        return true;
    }
    if (item < node->value) {
        return remove(node->left, item);
    } else {
        return remove(node->right, item);
    }
}

### Big-O complexity of remove

What is the time complexity of the remove function?

At each iteration I do constant work.

In a balanced tree I have at most about $\log n$ iterations, even counting the search for an inorder predecessor, which may have to continue down to the lowest leaf.

$O(\log n)$

### Key Ideas

- When removing a node:
  - If the node has no children, just remove it
  - If the node has only one child, just replace it with the child
  - If the node has two children, replace its value with the value of its inorder predecessor and remove the inorder predecessor

## Pathological BST

What kind of tree do I get when I add 1, 2, 3, 4, 5, 6, 7, 8, 9, in that order?

In [19]:
auto tree = new Node<int>(1);
// 1 is already the root, so start adding at 2
for (int i = 2; i <= 9; i++) {
    add(tree, i);
}
print_inorder(tree)

1        
         
 2       
         
  3      
         
   4     
         
    5    
         
     6   
         
      7  
         
       8 
         
        9



What is the big-O search complexity for this tree?

$O(n)$!

### Key Ideas

- The $O(\log n)$ lookup depends on the tree having a balanced structure.
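The effect of insertion order on the shape of the tree can be measured by its height. This is a sketch assuming integer values; `HeightNode`, `height_add`, `height`, `free_tree`, and `height_after_inserting` are hypothetical names for this illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical node type for this sketch (int values only).
struct HeightNode {
    int value;
    HeightNode* left = nullptr;
    HeightNode* right = nullptr;
    explicit HeightNode(int v) : value(v) {}
};

// Same insertion rule as before: smaller goes left, larger goes right.
void height_add(HeightNode*& node, int item) {
    if (node == nullptr) { node = new HeightNode(item); return; }
    if (item == node->value) return;  // ignore duplicates
    height_add(item < node->value ? node->left : node->right, item);
}

// Number of nodes on the longest root-to-leaf path (0 for an empty tree).
int height(const HeightNode* node) {
    if (node == nullptr) return 0;
    return 1 + std::max(height(node->left), height(node->right));
}

// Release every node in the tree.
void free_tree(HeightNode* node) {
    if (node == nullptr) return;
    free_tree(node->left);
    free_tree(node->right);
    delete node;
}

// Build a tree by inserting values in the given order and report its height.
int height_after_inserting(const std::vector<int>& order) {
    HeightNode* root = nullptr;
    for (int v : order) height_add(root, v);
    int h = height(root);
    free_tree(root);
    return h;
}
```

Inserting 1 through 9 in sorted order gives a height of 9, while inserting the same values in the order 5, 3, 7, 2, 4, 6, 8, 1, 9 gives a height of 4. A search has to walk at most one node per level, so the second tree is much cheaper to search.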


## Key Ideas for a BST

- $O(\log n)$ complexity for search, add, and remove
  - But only if the tree stays balanced!
- A (self-balancing) BST is the underlying structure typically used to implement `set` and `map`