# Balanced Search Trees

**Balanced search trees** are an implementation of symbol tables (with comparable keys) that guarantee logarithmic performance, even in the worst case, for search, insert, delete, max, min, rank, floor, ceiling, and select.

## 2-3 Search Trees

Recall from the previous section on Elementary Symbol Tables that the goal for symbol table implementations was $\lg N$ time for all operations. **2-3 trees** are a classic way to achieve this (the left-leaning red-black BSTs described below are a way to implement them). They allow 1 or 2 keys per node, so each node is either a **2-node** (one key, two children) or a **3-node** (two keys, three children). A 2-node has two links: one to keys less than the node's key and one to keys greater. A 3-node has three links: one to keys less than the smaller key, one to keys between the two keys, and one to keys greater than the larger key.

2-3 trees also have **perfect balance**: every path from the root to a null link has the same length. They also have **symmetric order**, so an in-order traversal (for a 3-node: left subtree, smaller key, middle subtree, larger key, right subtree) yields the keys in ascending order.

To **insert**, first search for the key; an unsuccessful search ends at a node at the bottom of the tree. The easy case is when that node is a 2-node: replace it with a 3-node containing both its old key and the new key, and add a null link for the third child. If the bottom node is a 3-node, first add the new key to create a temporary 4-node (three keys, four links), then move the middle key of the 4-node up into the parent. The remaining two keys become two separate 2-nodes, attached to the parent on either side of the promoted key. For example, if the 3-node was the right child of a 2-node parent, the parent becomes a 3-node whose middle link points to the smaller new 2-node and whose right link points to the larger one. If the parent was already a 3-node, it becomes a temporary 4-node in turn, and the process propagates up the tree. The height of a 2-3 tree grows only when this process reaches a root that is a 3-node: the root becomes a temporary 4-node and splits into three 2-nodes, adding one level to every path at once.

Splitting a 4-node is a **local** transformation - there are a constant number of operations and they don't touch the subtrees, no matter how many keys are below where the split happens. Each transformation maintains symmetric order and perfect balance.
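The insertion procedure above can be sketched in a few dozen lines of Python. This is a minimal illustration, not a reference implementation: it assumes a hypothetical list-based node (`Node23`) holding one or two sorted keys and either no children (a leaf) or one more child than keys, and it stores keys only (no values). The helper `_insert` returns the promoted middle key and the two halves whenever a temporary 4-node is split, so splits propagate upward exactly as described, and a split that reaches the root is the only thing that adds a level.

```python
class Node23:
    """A 2-node (1 key) or 3-node (2 keys); a leaf has no children."""

    def __init__(self, keys, children=None):
        self.keys = keys                # sorted list of 1 or 2 keys
        self.children = children or []  # empty, or len(keys) + 1 subtrees

    def is_leaf(self):
        return not self.children


def _insert(node, key):
    """Insert key into node's subtree.

    Returns None if the subtree absorbed the key, or (middle, left, right)
    if node became a temporary 4-node and had to be split.
    """
    if key in node.keys:
        return None  # key already present; this sketch ignores duplicates
    if node.is_leaf():
        node.keys = sorted(node.keys + [key])  # 2-node -> 3-node, or 3-node -> 4-node
    else:
        i = sum(1 for k in node.keys if k < key)  # index of the child link to follow
        split = _insert(node.children[i], key)
        if split is None:
            return None
        middle, left, right = split
        node.keys.insert(i, middle)             # middle key moves up into this node
        node.children[i:i + 1] = [left, right]  # the two halves take the old child's place
    if len(node.keys) < 3:
        return None
    # Temporary 4-node: split into two 2-nodes and pass the middle key up
    left = Node23(node.keys[:1], node.children[:2])
    right = Node23(node.keys[2:], node.children[2:])
    return node.keys[1], left, right


class TwoThreeTree:
    def __init__(self):
        self.root = None

    def insert(self, key):
        if self.root is None:
            self.root = Node23([key])
            return
        split = _insert(self.root, key)
        if split is not None:
            # The root itself split: the only way the tree gets taller
            middle, left, right = split
            self.root = Node23([middle], [left, right])

    def height(self):
        """Links from the root down to a leaf (the same on every path)."""
        h, node = 0, self.root
        while node is not None and not node.is_leaf():
            node, h = node.children[0], h + 1
        return h

    def keys(self):
        """In-order traversal, which yields the keys in ascending order."""
        out = []

        def walk(node):
            for i, k in enumerate(node.keys):
                if not node.is_leaf():
                    walk(node.children[i])
                out.append(k)
            if not node.is_leaf():
                walk(node.children[-1])

        if self.root is not None:
            walk(self.root)
        return out


tree = TwoThreeTree()
for k in range(1, 8):
    tree.insert(k)
print(tree.keys())    # [1, 2, 3, 4, 5, 6, 7]
print(tree.height())  # 2: seven keys, all in 2-nodes
```

Even with ascending keys, every split either stays local or lifts the whole tree by one level at the root, so all leaves remain at the same depth.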

**Tree height** worst case is $\lg N$ (with all 2-nodes), or best case $\log_{3} N \approx 0.631 \lg N$ (with all 3-nodes). This guarantees **logarithmic** performance for search and insert.
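Plugging numbers into these bounds makes them concrete. The quick calculation below (an illustration only, not part of any implementation) gives roughly 12.6 to 19.9 levels for a million keys and 18.9 to 29.9 levels for a billion keys.

```python
import math


def height_bounds(n):
    """Return (best, worst) 2-3 tree heights: all 3-nodes vs. all 2-nodes."""
    return math.log(n, 3), math.log2(n)


for n in (10**6, 10**9):
    best, worst = height_bounds(n)
    print(f"N = {n:>13,}: between {best:.1f} and {worst:.1f} levels")

# log3(N) / lg(N) = 1 / lg(3), the constant in the best-case bound
print(f"log3(N) / lg(N) = {1 / math.log2(3):.3f}")
```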

**Implementation** is complicated (see the red-black BST option below instead):
- Maintaining multiple node types is cumbersome
- Need multiple compares to move down the tree
- Need to move back up the tree to split 4-nodes
- Large number of cases for splitting

**Example of inserting in a 2-3 search tree:**

![Wikipedia 2-3 search tree example](https://upload.wikimedia.org/wikipedia/commons/thumb/4/44/2-3_insertion.svg/581px-2-3_insertion.svg.png)

Source: Wikipedia

## Red-Black BSTs

Red-black BSTs implement 2-3 trees with very little extra code beyond a basic binary search tree. The idea is to represent a 2-3 tree as a BST and use "internal" left-leaning links as "glue" for 3-nodes. The larger of the two keys in a 3-node becomes the root of this small subtree: its right link goes to keys larger than it, and its left link (colored red) connects to the smaller of the two original keys. That smaller key is now a 2-node whose left link goes to keys smaller than it and whose right link goes to keys between the two original keys.

**Black links** connect 2-nodes and 3-nodes, **red links** "glue" nodes within a 3-node.
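To make the correspondence concrete, here is a sketch that converts a small 2-3 tree into its left-leaning red-black form. It assumes a hypothetical tuple encoding for 2-3 nodes, `(keys, children)`, and a minimal hypothetical `RBNode` class in which `color` is the color of the link from the node's parent. Each 3-node `(a, b)` becomes `b` with a red left link to `a`, as described above, and every 2-3 node's top key hangs from a black link.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = 'RED', 'BLACK'


@dataclass
class RBNode:
    key: int
    left: Optional['RBNode'] = None
    right: Optional['RBNode'] = None
    color: str = BLACK  # color of the link from this node's parent


def to_llrb(node23):
    """Convert a 2-3 node given as (keys, children) into an LLRB subtree.

    keys is a tuple of 1 or 2 keys; children is () for a leaf, otherwise
    a tuple of len(keys) + 1 subtrees in the same encoding.
    """
    if node23 is None:
        return None
    keys, children = node23
    kids = [to_llrb(c) for c in children] or [None] * (len(keys) + 1)
    if len(keys) == 1:  # 2-node: a single black node with two children
        return RBNode(keys[0], kids[0], kids[1])
    a, b = keys         # 3-node: b on top, a red left link "glues" a to it
    glued = RBNode(a, kids[0], kids[1], color=RED)  # keys < a, keys between a and b
    return RBNode(b, glued, kids[2])                # keys > b


def inorder(x):
    return [] if x is None else inorder(x.left) + [x.key] + inorder(x.right)


def black_heights(x, depth=0):
    """Set of black-link counts over every root-to-null path."""
    if x is None:
        return {depth}
    depth += 1 if x.color == BLACK else 0
    return black_heights(x.left, depth) | black_heights(x.right, depth)


# Root 3-node (20, 40) with children 10, 30 and the 3-node (50, 60)
tree23 = ((20, 40), (((10,), ()), ((30,), ()), ((50, 60), ())))
root = to_llrb(tree23)
print(inorder(root))        # [10, 20, 30, 40, 50, 60]
print(black_heights(root))  # {2}: every path crosses the same number of black links
```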

Some characteristics:
- No node has two red links connected to it
- Every path from the root to a null link has the same number of black links (**perfect black balance**)
- Red links lean left
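These properties can be checked mechanically. The validator below is a sketch that assumes a minimal hypothetical node type (`RBNode`, with `key`, `left`, `right`, and `color` as the color of the link from the parent); it also checks symmetric order, since a red-black BST is still a BST.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = 'RED', 'BLACK'


@dataclass
class RBNode:
    key: int
    left: Optional['RBNode'] = None
    right: Optional['RBNode'] = None
    color: str = BLACK  # color of the link from the parent


def is_red(x):
    return x is not None and x.color == RED  # null links are black


def is_llrb(root):
    """True if root satisfies symmetric order and the three LLRB properties."""
    if is_red(root):
        return False  # the root has no parent link, so it is kept black

    def check(x, lo, hi):
        """Return the black-link count below x's parent link, or None if invalid."""
        if x is None:
            return 0
        if (lo is not None and x.key <= lo) or (hi is not None and x.key >= hi):
            return None  # symmetric order violated
        if is_red(x.right):
            return None  # red links must lean left
        if is_red(x) and is_red(x.left):
            return None  # a node may not touch two red links
        left = check(x.left, lo, x.key)
        right = check(x.right, x.key, hi)
        if left is None or right is None or left != right:
            return None  # perfect black balance violated
        return left + (0 if is_red(x) else 1)

    return check(root, None, None) is not None


good = RBNode(40, RBNode(20, RBNode(10), RBNode(30), RED),
              RBNode(60, RBNode(50, color=RED)))
print(is_llrb(good))                                     # True
print(is_llrb(RBNode(10, None, RBNode(20, color=RED))))  # False: red link leans right
print(is_llrb(RBNode(20, RBNode(10), None)))             # False: black imbalance
```

Because a red right link is rejected outright, the "two red links" rule only needs to look at a red node's left child.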

There's a 1-1 correspondence between left-leaning red-black (LLRB) BSTs and 2-3 trees: draw the red links horizontally and the LLRB BST looks exactly like a 2-3 tree. You can use the same search code as in an elementary BST and simply ignore the color; it actually runs faster because of the better balance. Most other operations (e.g. ceiling, select) are also identical.

Because each node is pointed to by precisely one link (from its parent), you can encode the color of that link as data in the node itself (e.g. `node.left.color == 'RED'` means the link from `node` to its left child is red). Null links are black.

```python
# Node object from BST notes, ignores non-critical methods
class Node:
    def __init__(self, key, val, left=None, right=None, count=1, color='BLACK'):
        self.key = key
        self.val = val
        self.left = left
        self.right = right
        self.count = count  # Number of nodes in subtree including self
        self.color = color  # NEW FOR LLRB BST - color of the link from the parent

    # NEW FOR LLRB BST
    @staticmethod
    def is_red(node_x):
        # Check the color of node_x (not self); null links are black
        if node_x is None:
            return False
        return node_x.color == 'RED'

    @staticmethod
    def is_black(node_x):
        if node_x is None:
            return True  # Null links are black
        return node_x.color == 'BLACK'
```

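```python
n = Node(1, 1)
n.right = Node(4, 4)
print(n.right is not None)  # has a right child: True
print(n.left is not None)   # has a left child: False
print(n.is_red(n.left))     # a null link counts as black: False
print(n.is_black(n.right))  # a new Node defaults to BLACK: True
```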


### Red-Black BST Operations

A new operation for red-black BSTs is a **rotation**. During an insertion operation, sometimes you end up with a right-leaning red link (the wrong direction). A rotation will re-orient the link so it leans to the left. Rotations maintain symmetric order and perfect black balance.

**Left Rotation Java Implementation:**

![Example of a right-to-left rotation. Source: Princeton.edu](https://algs4.cs.princeton.edu/33balanced/images/redblack-left-rotate.png)

Sometimes during an insertion, you'll need a right rotation to orient a left-leaning red link so that it temporarily leans right. The implementation is symmetric to the left rotation (see the `LL_Red_Black_BST` class below for both).

**Right Rotation Java Implementation:**

![Example of a left-to-right rotation. Source: Princeton.edu](https://algs4.cs.princeton.edu/33balanced/images/redblack-right-rotate.png)
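The two rotations can also be tried in isolation. This standalone sketch uses a minimal hypothetical `RBNode` (no subtree counts, unlike the class below) and the key names from the diagrams: it rotates a right-leaning red link to the left and back again, checking that the in-order key sequence never changes, which is why rotations preserve symmetric order.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = 'RED', 'BLACK'


@dataclass
class RBNode:
    key: str
    left: Optional['RBNode'] = None
    right: Optional['RBNode'] = None
    color: str = BLACK  # color of the link from the parent


def rotate_left(h):
    """h has a red right link to x; make x the subtree root with a red left link to h."""
    x = h.right
    h.right = x.left   # keys between h and x move under h
    x.left = h
    x.color = h.color  # x takes over h's link from the parent
    h.color = RED      # the link between them is still red, now leaning left
    return x


def rotate_right(h):
    """h has a red left link to x; make x the subtree root with a red right link to h."""
    x = h.left
    h.left = x.right   # keys between x and h move under h
    x.right = h
    x.color = h.color
    h.color = RED
    return x


def inorder(x):
    return [] if x is None else inorder(x.left) + [x.key] + inorder(x.right)


# E with a right-leaning red link to S; A < E, E < M < S, X > S
h = RBNode('E', RBNode('A'), RBNode('S', RBNode('M'), RBNode('X'), RED))
before = inorder(h)
x = rotate_left(h)
print(x.key, x.left.key, x.left.color)    # S E RED
print(inorder(x) == before)               # True
y = rotate_right(x)
print(y.key, y.right.key, y.right.color)  # E S RED
print(inorder(y) == before)               # True
```

Only three links and two colors change in each rotation, so both run in constant time regardless of subtree sizes.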

Another operation is called a **color flip**, which re-colors the local links to split a temporary 4-node. No links change: the parent node is black and has two red child links (left and right), and you flip the colors so the parent's own link becomes red and both child links become black. In 2-3 tree terms, this passes the middle key of the 4-node up into its parent.

**Color Flip Java Implementation:**

![Example of a red-black BST color flip. Source: princeton.edu](https://algs4.cs.princeton.edu/33balanced/images/color-flip.png)
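Here is a sketch of the color flip on the same kind of minimal hypothetical node. Counting black links along each path shows why the flip is safe: before and after, every path through the node crosses exactly one black link across these two levels, so perfect black balance is preserved.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = 'RED', 'BLACK'


@dataclass
class RBNode:
    key: str
    left: Optional['RBNode'] = None
    right: Optional['RBNode'] = None
    color: str = BLACK  # color of the link from the parent


def flip_colors(h):
    """Split a temporary 4-node: h goes from black to red, its children from red to black."""
    assert h.color == BLACK and h.left.color == RED and h.right.color == RED
    h.color = RED
    h.left.color = BLACK
    h.right.color = BLACK


def black_links(path):
    """Count black links along a list of nodes (each node's color is its parent link)."""
    return sum(node.color == BLACK for node in path)


# Temporary 4-node A-E-S: E with red links to both A and S
h = RBNode('E', RBNode('A', color=RED), RBNode('S', color=RED))
before = (black_links([h, h.left]), black_links([h, h.right]))
flip_colors(h)
after = (black_links([h, h.left]), black_links([h, h.right]))
print(h.color, h.left.color, h.right.color)  # RED BLACK BLACK
print(before, after)                         # (1, 1) (1, 1)
```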

**Insertions** use these three operations (left rotation, right rotation, and color flip) to maintain a legal red-black BST with 1-1 correspondence to a 2-3 tree. Here are the main scenarios:

1. Insert into a tree with exactly 1 node
    - **Left:** search ends at the left null link, create a red link to the new node (converts a 2-node into a 3-node)
    - **Right:** search ends at the right null link, you attach a new node with a red link on the right, then rotate left to make a legal 3-node
    - **Generalization:** (insert into a 2-node at the bottom) you do a standard BST insert and color the new link red. If it's on the right, rotate left

2. Insert into a tree with 2 nodes (see image). The generalization is to insert into a 3-node at the bottom
    - Do standard BST insert, color the new link red
    - Rotate to balance the 4-node (if needed)
    - Flip colors to pass the red link up one level
    - Rotate to make left-leaning (if needed)

![Example inserting into a tree with 2 nodes](https://x-wei.github.io/images/algoI_week5_1/pasted_image026.png)

The same code handles all cases. After the recursive insert, check each node on the way back up, in this order:
- Right child red, left child black -> rotate left
- Left child and left-left grandchild both red -> rotate right
- Both children red -> flip colors
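To see that these three rules keep the tree balanced, here is a compact, insertion-only sketch that applies them bottom-up in the same order. It is a pared-down variant for experimentation (hypothetical `insert` and `RBNode` names, no values or subtree counts), not the class below. Inserting 1023 keys in ascending order, which would give a plain BST a height of 1022, still yields a height within the $2 \lg N$ bound: red links never appear twice in a row, so no path has more red links than black ones.

```python
import math
from dataclasses import dataclass
from typing import Optional

RED, BLACK = 'RED', 'BLACK'


@dataclass
class RBNode:
    key: int
    left: Optional['RBNode'] = None
    right: Optional['RBNode'] = None
    color: str = RED  # new nodes attach with a red link


def is_red(x):
    return x is not None and x.color == RED


def rotate_left(h):
    x = h.right
    h.right, x.left = x.left, h
    x.color, h.color = h.color, RED
    return x


def rotate_right(h):
    x = h.left
    h.left, x.right = x.right, h
    x.color, h.color = h.color, RED
    return x


def flip_colors(h):
    h.color, h.left.color, h.right.color = RED, BLACK, BLACK


def _insert(h, key):
    if h is None:
        return RBNode(key)  # standard BST insert, colored red
    if key < h.key:
        h.left = _insert(h.left, key)
    elif key > h.key:
        h.right = _insert(h.right, key)
    # Fix up on the way back up, in this order:
    if is_red(h.right) and not is_red(h.left):
        h = rotate_left(h)   # right child red, left child black
    if is_red(h.left) and is_red(h.left.left):
        h = rotate_right(h)  # two left red links in a row
    if is_red(h.left) and is_red(h.right):
        flip_colors(h)       # both children red
    return h


def insert(root, key):
    root = _insert(root, key)
    root.color = BLACK  # the root is always black
    return root


def height(x):
    return -1 if x is None else 1 + max(height(x.left), height(x.right))


n = 1023
root = None
for k in range(n):  # ascending keys: the worst case for a plain BST
    root = insert(root, k)
print(height(root), 2 * math.log2(n))  # LLRB height vs. the 2 lg N bound
```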

```python
# Full Left-leaning Red-Black Binary Search Tree class
class LL_Red_Black_BST:
    def __init__(self):
        self.root = None

    def size(self):
        return self._size(self.root)

    def _size(self, node_x):
        if node_x is None:
            return 0
        else:
            return node_x.count

    def put(self, key, val):
        # Insert a new key-value node in the BST
        self.root = self._put(self.root, key, val)
        self.root.color = 'BLACK'  # The root is always black

    # Insertion code - NEW FOR LLRB BST
    def _put(self, node_h, key, val):
        if node_h is None:
            # Standard BST insert: attach the new node with a red link
            return Node(key, val, color='RED')
        if key < node_h.key:
            node_h.left = self._put(node_h.left, key, val)
        elif key > node_h.key:
            node_h.right = self._put(node_h.right, key, val)
        else:
            node_h.val = val

        # Restore the LLRB invariants on the way back up the tree
        if node_h.is_red(node_h.right) and node_h.is_black(node_h.left):
            node_h = self.rotate_left(node_h)
        if node_h.is_red(node_h.left) and node_h.is_red(node_h.left.left):
            node_h = self.rotate_right(node_h)
        if node_h.is_red(node_h.left) and node_h.is_red(node_h.right):
            self.flip_colors(node_h)
        # Update subtree count (a new node may have been added below)
        node_h.count = 1 + self._size(node_h.left) + self._size(node_h.right)
        return node_h

    # Rotate left - NEW FOR LLRB BST
    def rotate_left(self, node_h):
        # node_h is parent with right-leaning red link to node_x
        # Rotates the cluster so node_x is new parent with red link to node_h
        node_x = node_h.right
        node_h.right = node_x.left  # Move middle keys over so they're h's right link
        node_x.left = node_h
        node_x.color = node_h.color  # Assign h's original color to x
        node_h.color = 'RED'
        node_x.count = node_h.count  # Move h's size to x, its new parent
        node_h.count = 1 + self._size(node_h.left) + self._size(node_h.right)
        return node_x

    # Rotate right - NEW FOR LLRB BST
    def rotate_right(self, node_h):
        # node_h is parent with two left red links in a row (child, grandchild)
        # Rotates to the right
        node_x = node_h.left
        node_h.left = node_x.right  # Move middle keys over so they're h's left link
        node_x.right = node_h
        node_x.color = node_h.color  # Assign h's original color to x
        node_h.color = 'RED'
        node_x.count = node_h.count  # Move h's size to x, its new parent
        node_h.count = 1 + self._size(node_h.left) + self._size(node_h.right)
        return node_x

    # Color flip - NEW FOR LLRB BST
    def flip_colors(self, node_h):
        # Split a temporary 4-node: parent turns red, both child links turn black
        assert node_h.is_black(node_h)
        assert node_h.is_red(node_h.left)
        assert node_h.is_red(node_h.right)
        node_h.color = 'RED'
        node_h.left.color = 'BLACK'
        node_h.right.color = 'BLACK'

    def get(self, key):
        # Returns the value associated with given key if in tree, None otherwise
        node_x = self.root
        while node_x is not None:
            if key < node_x.key:
                node_x = node_x.left
            elif key > node_x.key:
                node_x = node_x.right
            else:
                return node_x.val
        return None

    def delete(self, key):
        # Remove the node for a given key.
        # NOTE: this is plain BST (Hibbard) deletion. It ignores colors and does
        # not restore the red-black invariants, so balance is not guaranteed
        # after deletes.
        self.root = self._delete(self.root, key)

    def _delete(self, node_x, key):
        if node_x is None:
            return None
        if key < node_x.key:
            node_x.left = self._delete(node_x.left, key)
        elif key > node_x.key:
            node_x.right = self._delete(node_x.right, key)
        else:
            if node_x.right is None:
                # No right child
                return node_x.left
            if node_x.left is None:
                # No left child
                return node_x.right

            # Node has two children - replace with successor
            node_t = node_x
            node_x = self._min(node_t.right)
            node_x.right = self._delete_min(node_t.right)
            node_x.left = node_t.left

        # Update subtree counts
        node_x.count = 1 + self._size(node_x.left) + self._size(node_x.right)
        return node_x

    def delete_min(self):
        if self.root:
            self.root = self._delete_min(self.root)

    def _delete_min(self, node_x):
        if node_x.left is None:
            return node_x.right
        node_x.left = self._delete_min(node_x.left)
        node_x.count = 1 + self._size(node_x.left) + self._size(node_x.right)
        return node_x

    def floor(self, key):
        # Return the largest key in the BST <= a given key
        node_x = self._floor(self.root, key)
        if node_x is None:
            return None
        return node_x.key

    def _floor(self, node_x, key):
        if node_x is None:
            return None
        if key == node_x.key:
            return node_x
        elif key < node_x.key:
            # Floor is in left subtree
            return self._floor(node_x.left, key)

        # Floor may be in right subtree or is root of that subtree
        node_t = self._floor(node_x.right, key)
        if node_t is not None:
            return node_t
        else:
            return node_x

    def ceiling(self, key):
        # Return the smallest key in the BST >= a given key
        node_x = self._ceiling(self.root, key)
        if node_x is None:
            return None
        return node_x.key

    def _ceiling(self, node_x, key):
        if node_x is None:
            return None
        if key == node_x.key:
            return node_x
        elif key > node_x.key:
            # Ceiling is in right subtree
            return self._ceiling(node_x.right, key)

        # Ceiling may be in the left subtree or is root of that subtree
        node_t = self._ceiling(node_x.left, key)
        if node_t is not None:
            return node_t
        else:
            return node_x

    def get_max(self):
        # Returns the largest key in the tree
        node_x = self.root
        while node_x is not None:
            if node_x.right is not None:
                node_x = node_x.right
            else:
                return node_x.key
        return None

    def get_min(self):
        # Returns the smallest key in the tree
        node_x = self.root
        while node_x is not None:
            if node_x.left is not None:
                node_x = node_x.left
            else:
                return node_x.key
        return None

    def _min(self, node_x):
        # Return the node with the smallest key in node_x's subtree
        while node_x is not None:
            if node_x.left is not None:
                node_x = node_x.left
            else:
                return node_x
        return None

    def rank(self, key):
        # Returns how many keys in the BST are < given key
        return self._rank(self.root, key)

    def _rank(self, node_x, key):
        if node_x is None:
            return 0
        if key < node_x.key:
            return self._rank(node_x.left, key)
        elif key > node_x.key:
            return 1 + self._size(node_x.left) + self._rank(node_x.right, key)
        else:
            return self._size(node_x.left)

    def __len__(self):
        return self.size()

    def __setitem__(self, key, val):
        self.put(key, val)

    def __getitem__(self, key):
        return self.get(key)

    def __contains__(self, key):
        # Compare against None so falsy values (e.g. 0) still count as present
        return self.get(key) is not None
```

```python
# Test LLRB BST functionality
llrb = LL_Red_Black_BST()

# Add nodes, value equals the key. Check put operation
print('Add nodes to the BST and get values: check put and get operations')
keys = [5, 3, 8, 1, 4, 6, 10]
for k in keys:
    llrb.put(k, k)

# Check get operation
for k in keys:
    print('Key: {}, Val: {}'.format(k, llrb.get(k)))

print('\nTree size: {}'.format(llrb.size()))

print('\nCheck ceiling and floor operations')
for k in [7, 2, 11]:
    print('Key: {}\nCeiling: {}\nFloor: {}\n'.format(k, llrb.ceiling(k), llrb.floor(k)))

print('Check max and min operations')
print('Max value: {}'.format(llrb.get_max()))
print('Min value: {}'.format(llrb.get_min()))

print('\nCheck rank operation (number of keys in the tree < given key)')
print('Rank of 6: {}'.format(llrb.rank(6)))
print('Rank of 11: {}'.format(llrb.rank(11)))
print('Rank of 0: {}'.format(llrb.rank(0)))

print('\nCheck dunder methods')
llrb[12] = 12  # set item
print('Add 12 to table. New max value is {}'.format(llrb.get_max()))
print('New tree size: {}'.format(len(llrb)))  # len
print('Value for llrb[10] is: {}'.format(llrb[10]))  # get item
print('Is 8 in the table? {}'.format(8 in llrb))  # contains
print('Is 9 in the table? {}'.format(9 in llrb))  # contains

print('\nCheck delete operation')
llrb.delete(1)
print('Remove node 1. Tree size: {}. New min value: {}'.format(len(llrb), llrb.get_min()))
llrb.delete(12)
print('Remove node 12. Tree size: {}. New max value: {}'.format(len(llrb), llrb.get_max()))
llrb.delete(8)
print('Remove node 8. Tree size: {}'.format(len(llrb)))
```

## Summary

The worst case (WC) is after $N$ inserts, and the average case (AC) is after $N$ random inserts.

| Implementation | WC Search | WC Insert | WC Delete | AC Search | AC Insert | AC Delete | Ordered Iteration? |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Sequential Search (unordered list) | $N$ | $N$ | $N$ | $N/2$ | $N$ | $N/2$ | No |
| Binary Search (ordered array) | $\lg N$ | $N$ | $N$ | $\lg N$ | $N/2$ | $N/2$ | Yes |
| Binary Search Tree (BST) | $N$ | $N$ | $N$ | $1.39 \lg N$ | $1.39 \lg N$ | ? | Yes |
| 2-3 Tree | $c \lg N$ | $c \lg N$ | $c \lg N$ | $c \lg N$ | $c \lg N$ | $c \lg N$ | Yes |
