# Balanced Search Trees

**Balanced search trees** implement symbol tables with comparable keys and guarantee efficient search, insert, delete, max, min, rank, floor, ceiling, and select operations.

## 2-3 Search Trees

Recall that the goal for symbol table implementations is a cost proportional to $\lg N$ for every operation. **2-3 trees** are an old data structure that achieves this; left-leaning red-black BSTs are a way to represent them as binary trees. A 2-3 tree allows 1 or 2 keys per node, so a node is either a **2-node** (one key, two children) or a **3-node** (two keys, three children). The 2-node has two links - one to keys less than the node key and one to keys greater. The 3-node has three links - one to keys less than both node keys, one to keys between them, and one to keys greater than both.
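The node layout and the search rule can be sketched as follows. This is a minimal illustration, not a reference implementation: the `Node` class, its `keys` and `children` fields, and the `contains` function are hypothetical names chosen for this example, and null links are modelled as an empty `children` list.

```python
class Node:
    """A 2-3 tree node: 1 key and 2 links (2-node) or 2 keys and 3 links (3-node)."""

    def __init__(self, keys, children=None):
        self.keys = list(keys)                # sorted, length 1 or 2
        # Bottom nodes carry only null links, modelled here as an empty list.
        self.children = list(children or [])  # len(keys) + 1 links when internal


def contains(node, key):
    """Return True if key is in the subtree rooted at node."""
    while node is not None:
        if key in node.keys:
            return True                       # search hit
        if not node.children:
            return False                      # reached null links: search miss
        # Follow the link whose key interval contains the search key.
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        node = node.children[i]
    return False


# Root is a 2-node; its left child is a 3-node and its right child a 2-node.
root = Node(["M"], [Node(["E", "J"]), Node(["R"])])
print(contains(root, "J"), contains(root, "K"))
```

A search in a 3-node may need two compares before it picks a link, which is one of the costs of handling two node types.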

2-3 trees also have **perfect balance**, meaning every path from the root to a null link has the same length. They also have **symmetric order**, so an in-order traversal (visit the leftmost subtree, then a key, then the next subtree, and so on) yields the keys in ascending order.
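Both properties can be checked mechanically. The sketch below assumes a hypothetical `Node` class with a `keys` list and a `children` list (empty at the bottom); `in_order`, `null_link_depths`, and `is_perfectly_balanced` are illustrative names, not part of any library.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)                # 1 or 2 sorted keys
        self.children = list(children or [])  # empty for a bottom node


def in_order(node):
    """Yield keys in ascending order: subtree, key, subtree, key, subtree."""
    if not node.children:
        yield from node.keys
        return
    for i, key in enumerate(node.keys):
        yield from in_order(node.children[i])
        yield key
    yield from in_order(node.children[-1])


def null_link_depths(node, depth=0):
    """Return the set of depths at which null links occur below node."""
    if not node.children:
        return {depth + 1}
    depths = set()
    for child in node.children:
        depths |= null_link_depths(child, depth + 1)
    return depths


def is_perfectly_balanced(root):
    """Perfect balance: every null link sits at the same depth."""
    return len(null_link_depths(root)) == 1


balanced = Node(["M"], [Node(["E", "J"]), Node(["R"])])
# Deliberately broken example: the left side is one level deeper than the right.
lopsided = Node(["M"], [Node(["E"], [Node(["A"]), Node(["H"])]), Node(["R"])])
print(list(in_order(balanced)), is_perfectly_balanced(balanced))
print(is_perfectly_balanced(lopsided))
```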

To **insert**, first search for the key. The easy case is when the search ends at a 2-node at the bottom: replace that 2-node with a 3-node holding both its old key and the new key, which adds a third null link. If the search ends at a 3-node at the bottom, add the new key to form a temporary 4-node (three keys, four links), then move the middle key of the 4-node up into the parent and split the two remaining keys into two 2-nodes. If the parent was a 2-node, it becomes a 3-node, and the two new 2-nodes take the place of the old child among its links (for example, when the old child was the right link, the 2-node with the smaller key becomes the parent's middle link and the 2-node with the larger key becomes its right link). If the parent was already a 3-node, it becomes a temporary 4-node itself and the process propagates up the tree. The height of a 2-3 tree grows only when the splitting reaches a root that was a 3-node: the root becomes a temporary 4-node and splits into three 2-nodes, which adds one level.
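The insertion procedure can be sketched recursively. This is one possible formulation under stated assumptions, not the only way to code it: the `Node` class and the functions `insert`, `_insert`, `height`, and `keys_in_order` are hypothetical names, duplicate keys are ignored, and a split is reported to the caller as a `(middle_key, left, right)` tuple.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)                # 1 or 2 keys (3 only temporarily)
        self.children = list(children or [])  # empty at the bottom


def _insert(node, key):
    """Insert key below node. Return None, or (middle_key, left, right) if node split."""
    if key in node.keys:
        return None                             # duplicate: nothing to do
    i = sum(k < key for k in node.keys)         # index of the link to follow
    if not node.children:
        node.keys.insert(i, key)                # bottom node absorbs the key
    else:
        split = _insert(node.children[i], key)
        if split is None:
            return None
        middle, left, right = split
        node.keys.insert(i, middle)             # middle key moves up into this node
        node.children[i:i + 1] = [left, right]  # re-link the two halves
    if len(node.keys) <= 2:
        return None                             # still a 2-node or a 3-node
    # Temporary 4-node: split it and hand the middle key to the caller.
    left = Node(node.keys[:1], node.children[:2])
    right = Node(node.keys[2:], node.children[2:])
    return node.keys[1], left, right


def insert(root, key):
    """Insert key and return the (possibly new) root."""
    if root is None:
        return Node([key])
    split = _insert(root, key)
    if split is None:
        return root
    middle, left, right = split
    return Node([middle], [left, right])        # root split: height grows by one


def height(node):
    """Number of links from node down to the bottom level."""
    return 0 if not node.children else 1 + height(node.children[0])


def keys_in_order(node):
    """Collect all keys in ascending order."""
    if not node.children:
        return list(node.keys)
    out = []
    for i, k in enumerate(node.keys):
        out += keys_in_order(node.children[i]) + [k]
    return out + keys_in_order(node.children[-1])


root = None
for k in "SEARCHXMPL":
    root = insert(root, k)
print(root.keys, height(root), keys_in_order(root))
```

Inserting the letters of `SEARCHXMPL` in that order ends with `M` alone at the root and a height of 2; the height increases only in the `insert` wrapper, where a split root is replaced by a new 2-node.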

Splitting a 4-node is a **local** transformation - it takes a constant number of operations and leaves the subtrees untouched, no matter how many keys are below where the split happens. Each transformation maintains symmetric order and perfect balance.
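To make the locality concrete, here is a small sketch of the split in isolation. The `Node` class and `split_four_node` are hypothetical names for this illustration; the point is that only a handful of links are reassigned and the four subtrees are reused as they are.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])


def split_four_node(four):
    """Split a temporary 4-node (3 keys, 4 links) into (middle_key, left, right)."""
    a, b, c = four.keys
    left = Node([a], four.children[:2])    # keeps the two leftmost subtrees
    right = Node([c], four.children[2:])   # keeps the two rightmost subtrees
    return b, left, right                  # b is handed to the parent


# Four subtrees of any size hang below the 4-node; only their links are moved.
t0, t1, t2, t3 = Node(["A"]), Node(["F", "G"]), Node(["L"]), Node(["S"])
middle, left, right = split_four_node(Node(["E", "J", "R"], [t0, t1, t2, t3]))
print(middle, left.keys, right.keys)
```

The work done does not depend on the size of `t0` through `t3`, and each subtree stays at the same depth relative to the key that moves up, which is why balance is preserved.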

**Tree height** is $\lg N$ in the worst case (all 2-nodes) and $\log_{3} N \approx 0.631 \lg N$ in the best case (all 3-nodes). This guarantees **logarithmic** performance for search and insert.
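As a quick numeric check of these bounds, the following sketch evaluates both logarithms for a few values of $N$. The function name `height_bounds` is made up for this example.

```python
import math


def height_bounds(n):
    """Return (best, worst) height bounds log_3 n and lg n for n keys."""
    return math.log(n, 3), math.log2(n)


for n in (1_000, 1_000_000, 1_000_000_000):
    best, worst = height_bounds(n)
    print(f"N = {n:>13,}: height between {best:.1f} and {worst:.1f}")

# The ratio log_3 N / lg N equals log_3 2 for every N.
ratio = math.log(2, 3)
print(f"log_3 2 = {ratio:.3f}")
```

For a million keys the bounds are roughly 12.6 and 19.9, so even a very large tree is only a couple of dozen levels deep.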

A direct **implementation** is possible but complicated, and there's a better way. The difficulties are:
- Maintaining multiple node types is cumbersome
- Need multiple compares to move down the tree
- Need to move back up the tree to split 4-nodes
- Large number of cases for splitting


## Red-Black BSTs

To Come.

## Summary

Worst-case (WC) costs are those guaranteed after $N$ inserts, and average-case (AC) costs are those expected after $N$ random inserts.

| Implementation | WC Search | WC Insert | WC Delete | AC Search | AC Insert | AC Delete | Ordered Iteration? |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Sequential Search (unordered list) | $N$ | $N$ | $N$ | $N/2$ | $N$ | $N/2$ | No |
| Binary Search (ordered array) | $\lg N$ | $N$ | $N$ | $\lg N$ | $N/2$ | $N/2$ | Yes |
| Binary Search Tree (BST) | $N$ | $N$ | $N$ | $1.39 \lg N$ | $1.39 \lg N$ | ? | Yes |
| 2-3 Tree | $c \lg N$ | $c \lg N$ | $c \lg N$ | $c \lg N$ | $c \lg N$ | $c \lg N$ | Yes |
