# **Chapter 13: Specialized Trees**

> *"Specialized trees are the craftsman's tools—each designed to solve a particular class of problems with elegance and efficiency."* — Anonymous

---

## **13.1 Introduction to Specialized Trees**

While binary search trees and balanced trees provide general-purpose dynamic set operations, many applications demand data structures tailored to specific query types. This chapter explores trees designed for:

- **Efficient I/O operations** (B-Trees, B+ Trees)
- **Range queries** (Segment Trees, Fenwick Trees)
- **String processing** (Tries, Suffix Trees, Suffix Arrays)
- **Range-minimum and ordering problems** (Cartesian Trees)

Each of these structures exploits the hierarchical nature of trees to achieve query and update costs that flat structures such as arrays and linked lists cannot match.

---

## **13.2 B-Trees and B+ Trees**

B-Trees are self-balancing tree data structures that maintain sorted data and allow searches, sequential access, insertions, and deletions in logarithmic time. They are optimized for systems that read and write large blocks of data, such as databases and file systems.

### **13.2.1 Why B-Trees?**

```
┌─────────────────────────────────────────────────────────────────────┐
│                    MOTIVATION FOR B-TREES                           │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  1. DISK I/O EFFICIENCY: Disk access is orders of magnitude slower  │
│     than memory access. B-Trees minimize disk reads by storing      │
│     many keys per node (high fanout).                               │
│                                                                     │
│  2. BLOCK-ORIENTED STORAGE: Data is read/written in blocks (pages). │
│     B-Tree nodes correspond to page sizes (e.g., 4KB).              │
│                                                                     │
│  3. KEEPING HEIGHT LOW: With high fanout, height remains small      │
│     (typically 3-4 levels for billions of keys).                    │
│                                                                     │
│  4. SUPPORTS RANGE QUERIES: In B+ Trees, leaves form a linked list  │
│     for efficient sequential access.                                │
│                                                                     │
│  5. USED EVERYWHERE: Almost all relational databases (MySQL,        │
│     PostgreSQL, Oracle) use B-Trees or B+ Trees for indexes.        │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

### **13.2.2 B-Tree Definition**

A B-Tree of minimum degree **t** (t ≥ 2; equivalently, order **m** = 2t, the maximum number of children per node) satisfies:

- Every node has at most **2t-1** keys (and an internal node has at most **2t** children).
- Every node (except root) has at least **t-1** keys.
- The root of a non-empty tree has at least 1 key.
- All leaves are at the same depth.
- Keys in a node are sorted, and children subtrees separate key ranges.

Commonly, **t** is chosen so that a node fits in one disk page (e.g., t = 100 for 4KB pages with 8-byte keys).
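
To make the effect of a large **t** concrete, the following sketch evaluates the standard worst-case height bound for a B-Tree holding n keys, h ≤ log_t((n+1)/2). The function name `max_btree_height` is an illustrative choice, not part of any library, and the root is counted as height 0.

```python
def max_btree_height(n_keys, t):
    """Worst-case height (root at height 0) of a B-Tree with n_keys keys
    and minimum degree t, from the bound h <= log_t((n + 1) / 2)."""
    if n_keys < 1 or t < 2:
        raise ValueError("need n_keys >= 1 and t >= 2")
    h = 0
    # A B-Tree of height h holds at least 2 * t**h - 1 keys, so keep
    # growing h while the next height is still reachable with n_keys.
    # Integer arithmetic avoids floating-point error near powers of t.
    while 2 * t ** (h + 1) - 1 <= n_keys:
        h += 1
    return h

print(max_btree_height(10**9, 100))  # 4
print(max_btree_height(10**9, 2))    # 28
```

Under these assumptions, a billion keys need at most height 4 with t = 100, compared with height 28 for t = 2. Real indexes tend to be shallower still, because nodes are usually more than minimally full.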

### **13.2.3 B-Tree Node Structure**

```python
class BTreeNode:
    def __init__(self, leaf=True):
        self.leaf = leaf          # True if leaf node
        self.keys = []            # list of keys (sorted)
        self.children = []        # list of child nodes (only for internal)
        self.n = 0                 # current number of keys
```

### **13.2.4 B-Tree Operations**

We'll implement a simplified B-Tree of minimum degree t, covering search and insertion only (deletion is omitted).

```python
class BTree:
    def __init__(self, t):
        self.root = BTreeNode(leaf=True)
        self.t = t  # minimum degree

    def search(self, node, key):
        """Search for key in subtree rooted at node."""
        i = 0
        while i < node.n and key > node.keys[i]:
            i += 1
        if i < node.n and node.keys[i] == key:
            return (node, i)
        if node.leaf:
            return None
        else:
            # Read child from disk
            return self.search(node.children[i], key)

    def split_child(self, node, i):
        """Split the full child node.children[i] around its median key."""
        t = self.t
        child = node.children[i]
        new_child = BTreeNode(leaf=child.leaf)

        # Save the median before child.keys is truncated
        median = child.keys[t-1]

        # Move the larger half of the keys to new_child
        new_child.keys = child.keys[t:]
        new_child.n = t - 1

        # Move the matching children if child is internal
        if not child.leaf:
            new_child.children = child.children[t:]
            child.children = child.children[:t]

        # Keep the smaller half in child
        child.keys = child.keys[:t-1]
        child.n = t - 1

        # Link new_child into node and move the median key up
        node.children.insert(i+1, new_child)
        node.keys.insert(i, median)
        node.n += 1

    def insert(self, key):
        """Insert key into B-Tree."""
        root = self.root
        if root.n == 2*self.t - 1:
            # Root is full, create new root
            new_root = BTreeNode(leaf=False)
            new_root.children.append(root)
            self.root = new_root
            self.split_child(new_root, 0)
            self._insert_nonfull(new_root, key)
        else:
            self._insert_nonfull(root, key)

    def _insert_nonfull(self, node, key):
        """Insert key into non-full node."""
        i = node.n - 1
        if node.leaf:
            # Find position and insert
            node.keys.append(None)
            while i >= 0 and key < node.keys[i]:
                node.keys[i+1] = node.keys[i]
                i -= 1
            node.keys[i+1] = key
            node.n += 1
        else:
            # Find child to insert into
            while i >= 0 and key < node.keys[i]:
                i -= 1
            i += 1
            child = node.children[i]
            if child.n == 2*self.t - 1:
                self.split_child(node, i)
                if key > node.keys[i]:
                    i += 1
                child = node.children[i]
            self._insert_nonfull(child, key)
```

### **13.2.5 B+ Trees**

B+ Trees are a variant where:

- **All keys are stored in leaves.**
- Internal nodes contain only **routing keys** (duplicates of keys in leaves).
- Leaves are linked together for range queries.

This structure is even more efficient for databases because:

- More keys fit in internal nodes (since they don't store data pointers).
- Range scans traverse leaf links without going up/down the tree.

```
B+ Tree Structure:

                 [60, 110]                (internal node: routing keys only)
               /     |     \
              /      |      \
      [10,20] →  [60,70] →  [110,120]     (leaf nodes: keys + data,
                                           linked left to right)
```
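
A range scan over linked leaves can be sketched as follows. The `Leaf` class and `range_scan` function are hypothetical names chosen for illustration, and the sketch assumes the caller has already descended from the root to the leaf where the lower bound would be stored.

```python
class Leaf:
    """Hypothetical B+ Tree leaf: sorted keys, parallel values, next-leaf link."""
    def __init__(self, keys, values):
        self.keys = keys
        self.values = values
        self.next = None

def range_scan(start_leaf, lo, hi):
    """Yield (key, value) pairs with lo <= key <= hi by following leaf links.

    start_leaf is assumed to be the leaf reached by a root-to-leaf search for lo.
    """
    leaf = start_leaf
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return          # keys are sorted, so nothing further can match
            if k >= lo:
                yield k, v
        leaf = leaf.next        # move sideways, never back up the tree

# Three linked leaves holding made-up row labels
a = Leaf([10, 20], ["r10", "r20"])
b = Leaf([60, 70], ["r60", "r70"])
c = Leaf([110, 120], ["r110", "r120"])
a.next, b.next = b, c

print(list(range_scan(a, 20, 110)))
# [(20, 'r20'), (60, 'r60'), (70, 'r70'), (110, 'r110')]
```

After the initial descent, the scan touches only leaves, which is the property that makes B+ Trees attractive for range queries.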

### **13.2.6 Applications in Databases**

Most relational databases use B+ Trees for indexes. Common layouts:

- **Primary key indexes** are B+ Trees keyed on the primary key; in engines that cluster on the primary key, the leaf contains the entire row.
- **Secondary indexes** are B+ Trees where the leaf contains the primary key (or a row ID), which is then used to locate the row.
- **Clustered indexes** (InnoDB) store data rows in leaf pages of the primary key B+ Tree.

---

## **13.3 Segment Trees**

Segment trees are used for range queries and updates on arrays. They can answer queries like "sum of elements from index l to r" or "minimum in range" in O(log n) time, and also support point updates.

### **13.3.1 Definition and Structure**

A segment tree is a binary tree where each node represents an interval [l, r] of the array. The root covers the entire array [0, n-1]. Leaf nodes represent single elements. Internal nodes store aggregated information (sum, min, max, etc.) of their children.

```
Array: [1, 3, 5, 7, 9, 11]
Segment Tree for sum:

                 [0-5] 36
                /      \
           [0-2] 9     [3-5] 27
           /    \      /    \
       [0-1]4  [2]5  [3-4]16 [5]11
       /   \         /   \
     [0]1  [1]3    [3]7  [4]9
```

### **13.3.2 Implementation**

```python
class SegmentTree:
    def __init__(self, data, func=sum, default=0):
        """
        Initialize segment tree.
        func: associative operation called on a list of two values (sum, min, max, etc.)
        default: identity element for func (0 for sum, float('inf') for min)
        """
        self.n = len(data)
        self.func = func
        self.default = default
        self.tree = [default] * (4 * self.n)  # 4*n is safe size
        self._build(data, 1, 0, self.n - 1)

    def _build(self, data, node, left, right):
        """Build tree recursively."""
        if left == right:
            self.tree[node] = data[left]
        else:
            mid = (left + right) // 2
            self._build(data, node*2, left, mid)
            self._build(data, node*2+1, mid+1, right)
            self.tree[node] = self.func([self.tree[node*2], self.tree[node*2+1]])

    def update(self, idx, value):
        """Update element at index idx to value."""
        self._update(1, 0, self.n-1, idx, value)

    def _update(self, node, left, right, idx, value):
        if left == right:
            self.tree[node] = value
        else:
            mid = (left + right) // 2
            if idx <= mid:
                self._update(node*2, left, mid, idx, value)
            else:
                self._update(node*2+1, mid+1, right, idx, value)
            self.tree[node] = self.func([self.tree[node*2], self.tree[node*2+1]])

    def query(self, ql, qr):
        """Query over range [ql, qr]."""
        return self._query(1, 0, self.n-1, ql, qr)

    def _query(self, node, left, right, ql, qr):
        if ql > right or qr < left:
            return self.default
        if ql <= left and right <= qr:
            return self.tree[node]
        mid = (left + right) // 2
        left_res = self._query(node*2, left, mid, ql, qr)
        right_res = self._query(node*2+1, mid+1, right, ql, qr)
        return self.func([left_res, right_res])
```

### **13.3.3 Range Minimum Query (RMQ) Example**

```python
arr = [2, 5, 1, 4, 9, 3]
seg_min = SegmentTree(arr, func=min, default=float('inf'))
print(seg_min.query(1, 4))  # min of [5,1,4,9] = 1
seg_min.update(2, 0)        # set index 2 to 0
print(seg_min.query(1, 4))  # min of [5,0,4,9] = 0
```

### **13.3.4 Lazy Propagation**

For range updates (adding a value to all elements in a range), we need **lazy propagation** to avoid updating every leaf.

```python
class LazySegmentTree:
    def __init__(self, data):
        self.n = len(data)
        self.tree = [0] * (4 * self.n)
        self.lazy = [0] * (4 * self.n)
        self._build(data, 1, 0, self.n-1)

    def _build(self, data, node, left, right):
        if left == right:
            self.tree[node] = data[left]
        else:
            mid = (left + right)//2
            self._build(data, node*2, left, mid)
            self._build(data, node*2+1, mid+1, right)
            self.tree[node] = self.tree[node*2] + self.tree[node*2+1]

    def _push(self, node, left, right):
        """Propagate lazy updates to children."""
        if self.lazy[node] != 0:
            self.tree[node] += (right-left+1) * self.lazy[node]
            if left != right:
                self.lazy[node*2] += self.lazy[node]
                self.lazy[node*2+1] += self.lazy[node]
            self.lazy[node] = 0

    def range_add(self, ql, qr, val):
        self._range_add(1, 0, self.n-1, ql, qr, val)

    def _range_add(self, node, left, right, ql, qr, val):
        self._push(node, left, right)
        if ql > right or qr < left:
            return
        if ql <= left and right <= qr:
            self.lazy[node] += val
            self._push(node, left, right)
        else:
            mid = (left+right)//2
            self._range_add(node*2, left, mid, ql, qr, val)
            self._range_add(node*2+1, mid+1, right, ql, qr, val)
            self.tree[node] = self.tree[node*2] + self.tree[node*2+1]

    def range_sum(self, ql, qr):
        return self._range_sum(1, 0, self.n-1, ql, qr)

    def _range_sum(self, node, left, right, ql, qr):
        self._push(node, left, right)
        if ql > right or qr < left:
            return 0
        if ql <= left and right <= qr:
            return self.tree[node]
        mid = (left+right)//2
        return (self._range_sum(node*2, left, mid, ql, qr) +
                self._range_sum(node*2+1, mid+1, right, ql, qr))
```

---

## **13.4 Fenwick Trees (Binary Indexed Trees)**

Fenwick Trees (BIT) are simpler than segment trees for prefix sum queries and point updates. They use less memory and are faster in practice.

### **13.4.1 Idea**

The Fenwick tree is based on the observation that any integer can be represented as a sum of powers of two. It stores partial sums in a way that both update and prefix query can be done in O(log n).

- Indices are 1-based, because the `i & -i` trick needs i > 0.
- `tree[i]` stores the sum of a range of length `lowbit(i)` ending at i.
- `lowbit(i) = i & -i` isolates the lowest set bit.
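
The following sketch prints which array positions each `tree[i]` is responsible for. The helper name `covered_range` is illustrative only.

```python
def lowbit(i):
    """Value of the lowest set bit of i (requires i > 0)."""
    return i & -i

def covered_range(i):
    """1-based inclusive range of array positions summed into tree[i]."""
    return (i - lowbit(i) + 1, i)

# Show index, binary form, lowbit, and covered range for a tree of size 8
for i in range(1, 9):
    print(i, format(i, "04b"), lowbit(i), covered_range(i))
```

For example, index 12 (binary 1100) has lowbit 4, so `tree[12]` holds the sum of positions 9 through 12, while any odd index covers only itself.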

### **13.4.2 Implementation**

```python
class FenwickTree:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # 1-indexed

    def update(self, i, delta):
        """Add delta to element at index i (1-based)."""
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix_sum(self, i):
        """Sum of first i elements (1-based)."""
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

    def range_sum(self, l, r):
        """Sum of elements from l to r inclusive (1-based)."""
        return self.prefix_sum(r) - self.prefix_sum(l - 1)
```

### **13.4.3 Building from Array**

To initialize a BIT from an array, add a method to `FenwickTree` that performs one update per element, for O(n log n) total:

```python
def build(self, arr):
    for i, val in enumerate(arr, 1):  # 1-based index
        self.update(i, val)
```

Or in O(n) by filling tree directly:

```python
def build_fast(self, arr):
    self.n = len(arr)
    self.tree = [0] + arr[:]  # copy, 1-indexed
    for i in range(1, self.n+1):
        j = i + (i & -i)
        if j <= self.n:
            self.tree[j] += self.tree[i]
```

### **13.4.4 Range Update and Point Query**

A BIT can also support range updates and point queries by storing a difference array instead of the values themselves:

- To add `x` to range [l, r]: update(l, x), update(r+1, -x)
- Point query at i: prefix_sum(i), which sums the stored differences up to i
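
A minimal self-contained sketch of this difference-array variant is shown below. The class name `RangeAddFenwick` and its method names are assumptions made for the example, and all elements are assumed to start at 0.

```python
class RangeAddFenwick:
    """Fenwick tree over a difference array: range add, point query (1-based)."""
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def _add(self, i, delta):
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def range_add(self, l, r, x):
        """Add x to every element in [l, r]."""
        self._add(l, x)
        self._add(r + 1, -x)    # does nothing when r == n

    def point_query(self, i):
        """Current value at index i (prefix sum of the differences)."""
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

bit = RangeAddFenwick(5)
bit.range_add(2, 4, 10)
bit.range_add(3, 5, 1)
print([bit.point_query(i) for i in range(1, 6)])  # [0, 10, 11, 11, 1]
```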

### **13.4.5 Applications**

- Counting inversions (using BIT as frequency array)
- Order statistics (finding kth smallest with BIT)
- Maintaining prefix sums in dynamic arrays
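
As one illustration of the first application, the sketch below counts inversions by using a BIT as a frequency array over value ranks. The function name `count_inversions` and the coordinate-compression step are choices made for this example.

```python
def count_inversions(arr):
    """Count pairs (i, j) with i < j and arr[i] > arr[j] in O(n log n)."""
    # Coordinate-compress values to ranks 1..m so they can index the tree.
    rank = {v: r for r, v in enumerate(sorted(set(arr)), 1)}
    m = len(rank)
    tree = [0] * (m + 1)

    def add(i):
        while i <= m:
            tree[i] += 1
            i += i & -i

    def prefix(i):
        s = 0
        while i > 0:
            s += tree[i]
            i -= i & -i
        return s

    inversions = 0
    for seen, v in enumerate(arr):
        r = rank[v]
        # 'seen' elements came earlier; prefix(r) of them are <= v,
        # so the remainder are strictly greater and form inversions with v.
        inversions += seen - prefix(r)
        add(r)
    return inversions

print(count_inversions([2, 4, 1, 3, 5]))  # 3
```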

---

## **13.5 Tries (Prefix Trees)**

A **Trie** (from re**trie**val) is a tree-like data structure for storing strings, where each node represents a prefix shared by all strings below it. Tries support fast prefix queries and are used in autocomplete, spell checkers, and IP routing.

### **13.5.1 Standard Trie**

Each node contains an array (or map) of children for each possible character.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end = True

    def search(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return node.is_end

    def starts_with(self, prefix):
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return True
```

### **13.5.2 Compressed Trie (Radix Tree)**

A radix tree compresses paths where nodes have only one child to save space.

```
Standard Trie:   "abc", "abd", "bcd"
        root
       /    \
      a      b
     /        \
    b          c
   / \          \
  c   d          d
 /     \          \
(abc) (abd)       (bcd)

Radix Tree:
        root
       /    \
     "ab"   "bcd"
     /  \
   "c"  "d"
   /     \
 (abc)  (abd)
```

An implementation can label each edge with a string instead of a single character.
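
One way to see the compression is to build a plain trie and then merge single-child chains. The sketch below uses nested dictionaries rather than node classes, and the `END` marker key, `build_trie`, and `compress` are hypothetical names for this example. It assumes no word contains the `$` character.

```python
END = "$"  # hypothetical end-of-word marker; assumes "$" never appears in a word

def build_trie(words):
    """Standard trie as nested dicts, one character per edge."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root

def compress(node):
    """Return a radix tree (nested dicts) whose edge labels are strings."""
    out = {}
    for label, child in node.items():
        # Merge the chain while the child has exactly one outgoing edge
        # and no word ends at it.
        while label != END and len(child) == 1 and END not in child:
            next_label, next_child = next(iter(child.items()))
            label += next_label
            child = next_child
        out[label] = compress(child)
    return out

radix = compress(build_trie(["abc", "abd", "bcd"]))
print(radix)
# {'ab': {'c': {'$': {}}, 'd': {'$': {}}}, 'bcd': {'$': {}}}
```

The result has the same shape as the radix tree in the diagram: one `"ab"` edge that branches into `"c"` and `"d"`, and a single `"bcd"` edge.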

### **13.5.3 Applications**

- **Autocomplete**: Find all words with given prefix by traversing trie.
- **Spell checking**: Check if word exists; suggest corrections using edit distance.
- **IP routing (Longest Prefix Matching)**: Radix trees are used in routing tables.
- **String databases**: Efficient storage and retrieval of large dictionaries.
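
The autocomplete use case can be sketched as a prefix walk followed by a traversal of the subtree below it. The block repeats a minimal `TrieNode` so that it runs on its own; `insert` and `autocomplete` are written as free functions purely for brevity.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True

def autocomplete(root, prefix):
    """Return all stored words that start with prefix, in sorted order."""
    node = root
    for ch in prefix:
        if ch not in node.children:
            return []           # no stored word has this prefix
        node = node.children[ch]
    results = []
    stack = [(node, prefix)]    # iterative DFS over the subtree
    while stack:
        cur, word = stack.pop()
        if cur.is_end:
            results.append(word)
        for ch, child in cur.children.items():
            stack.append((child, word + ch))
    return sorted(results)

root = TrieNode()
for w in ["car", "card", "care", "cat", "dog"]:
    insert(root, w)
print(autocomplete(root, "car"))  # ['car', 'card', 'care']
```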

---

## **13.6 Suffix Trees and Suffix Arrays**

Suffix trees and suffix arrays are powerful data structures for string processing, enabling fast substring search, pattern matching, and other operations.

### **13.6.1 Suffix Tree**

A suffix tree is a compressed trie of all suffixes of a string. For a string S of length n, there are n suffixes. A suffix tree stores these suffixes in a way that any substring of S can be found in O(m) time (m = pattern length).

Construction: Ukkonen's algorithm builds the tree in O(n) time for a constant-size alphabet, but it is intricate to implement.

```
String: "banana"
Suffixes:
0: banana
1: anana
2: nana
3: ana
4: na
5: a

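Suffix tree of "banana$" ($ marks the end; numbers are suffix start indices):
root
├── $                (6)
├── a
│   ├── $            (5)
│   └── na
│       ├── $        (3)
│       └── na$      (1)
├── banana$          (0)
└── na
    ├── $            (4)
    └── na$          (2)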

Because suffix trees are complex to build and use considerable memory, suffix arrays are often used instead.

### **13.6.2 Suffix Array**

A suffix array is an array of starting indices of all suffixes of a string, sorted lexicographically. It can be built in O(n log n) or O(n) time.

```python
def build_suffix_array(s):
    """Naive O(n^2 log n) for demonstration; better algorithms exist."""
    n = len(s)
    suffixes = [(s[i:], i) for i in range(n)]
    suffixes.sort()
    return [idx for (_, idx) in suffixes]
```

**LCP Array** (Longest Common Prefix) stores the length of the longest common prefix between consecutive suffixes in the suffix array. It enables powerful queries.

```python
def build_lcp(s, suffix_array):
    n = len(s)
    rank = [0] * n
    for i, idx in enumerate(suffix_array):
        rank[idx] = i
    lcp = [0] * (n-1)
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = suffix_array[rank[i]-1]
            while i+h < n and j+h < n and s[i+h] == s[j+h]:
                h += 1
            lcp[rank[i]-1] = h
            if h > 0:
                h -= 1
    return lcp
```

### **13.6.3 Applications**

- **Pattern matching**: Find all occurrences of pattern P in S in O(m log n) using binary search on the suffix array (O(m + log n) when the search also uses LCP information).
- **Longest repeated substring**: Scan LCP array for maximum value.
- **Longest common substring** of two strings: Concatenate with sentinel, build suffix array, check LCP across different strings.
- **Burrows-Wheeler Transform**: Used in data compression (bzip2).
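
The first two applications can be sketched in a few lines. The block below is self-contained and uses a naive suffix array construction, so it is only suitable for short strings. `find_occurrences` and `longest_repeated_substring` are illustrative names, and the LCP values are computed directly rather than with a separate LCP array.

```python
def build_suffix_array(s):
    """Naive construction, adequate for short demonstration strings."""
    return sorted(range(len(s)), key=lambda i: s[i:])

def find_occurrences(s, sa, p):
    """Sorted start positions of p in s, using two binary searches (O(m log n))."""
    m = len(p)
    # Lower bound: first suffix whose first m characters are >= p.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose first m characters are > p.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])

def longest_repeated_substring(s, sa):
    """Longest substring occurring at least twice (empty string if none)."""
    best_len, best_pos = 0, 0
    for a, b in zip(sa, sa[1:]):
        # Common prefix length of two adjacent suffixes in sorted order.
        h = 0
        while a + h < len(s) and b + h < len(s) and s[a + h] == s[b + h]:
            h += 1
        if h > best_len:
            best_len, best_pos = h, a
    return s[best_pos:best_pos + best_len]

text = "banana"
sa = build_suffix_array(text)
print(sa)                                    # [5, 3, 1, 0, 4, 2]
print(find_occurrences(text, sa, "ana"))     # [1, 3]
print(longest_repeated_substring(text, sa))  # ana
```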

---

## **13.7 Cartesian Trees**

A Cartesian tree is a tree derived from an array where:

- The root is the minimum (or maximum) element.
- Inorder traversal gives the original order.
- The left and right subtrees are Cartesian trees of left and right subarrays.
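
The recursive definition translates directly into code. The sketch below represents a tree as nested tuples `(left, value, right)`, picks the leftmost minimum as the root, and runs in quadratic time in the worst case, so it is meant only to illustrate the definition. `cartesian_tree_naive` and `inorder` are illustrative names.

```python
def cartesian_tree_naive(arr, lo=0, hi=None):
    """Min-rooted Cartesian tree of arr[lo:hi] as nested tuples
    (left_subtree, value, right_subtree), or None for an empty range."""
    if hi is None:
        hi = len(arr)
    if lo >= hi:
        return None
    # The leftmost minimum of the subarray becomes the root.
    m = min(range(lo, hi), key=arr.__getitem__)
    return (cartesian_tree_naive(arr, lo, m),
            arr[m],
            cartesian_tree_naive(arr, m + 1, hi))

def inorder(tree):
    """Inorder traversal of a nested-tuple tree."""
    if tree is None:
        return []
    left, value, right = tree
    return inorder(left) + [value] + inorder(right)

arr = [3, 2, 6, 1, 9]
tree = cartesian_tree_naive(arr)
print(tree[1])               # 1, the minimum, sits at the root
print(inorder(tree) == arr)  # True
```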

### **13.7.1 Construction**

```python
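class TreeNode:
    """Minimal binary tree node used by build_cartesian_tree."""
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def build_cartesian_tree(arr):
    """Build Cartesian tree for minimum (root is min) in O(n)."""
    n = len(arr)
    if n == 0:
        return None
    parent = [-1] * n
    left = [-1] * n
    right = [-1] * n
    stack = []  # indices on the rightmost path, values non-decreasing bottom to top
    for i in range(n):
        last = -1
        # Pop every index whose value exceeds arr[i]
        while stack and arr[stack[-1]] > arr[i]:
            last = stack.pop()
        if stack:
            right[stack[-1]] = i
            parent[i] = stack[-1]
        if last != -1:
            left[i] = last
            parent[last] = i
        stack.append(i)
    root = stack[0]
    # Reconstruct tree nodes
    nodes = [TreeNode(arr[i]) for i in range(n)]
    for i in range(n):
        if left[i] != -1:
            nodes[i].left = nodes[left[i]]
        if right[i] != -1:
            nodes[i].right = nodes[right[i]]
    return nodes[root]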
```

### **13.7.2 Properties and Uses**

- **Treap relation**: A treap is a Cartesian tree over (key, priority) pairs: it is a binary search tree on the keys and a heap on randomly chosen priorities.
- **RMQ**: In a min-rooted Cartesian tree, the LCA of the nodes for indices i and j holds the minimum of arr[i..j].
- **Suffix tree construction**: A suffix tree can be built from a suffix array and its LCP array using the same stack-based technique.
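
The RMQ property can be checked with a small sketch. It builds only the parent array with the stack technique and finds the LCA by a naive walk toward the root, which costs O(height) per query. Practical RMQ solutions pair the tree with a faster LCA method. `cartesian_parents` and `range_min_via_lca` are illustrative names.

```python
def cartesian_parents(arr):
    """Parent index of each element in the min-rooted Cartesian tree (-1 for root)."""
    parent = [-1] * len(arr)
    stack = []
    for i in range(len(arr)):
        last = -1
        while stack and arr[stack[-1]] > arr[i]:
            last = stack.pop()
        if stack:
            parent[i] = stack[-1]
        if last != -1:
            parent[last] = i
        stack.append(i)
    return parent

def range_min_via_lca(arr, parent, i, j):
    """Minimum of arr[i..j], read from the LCA of nodes i and j (naive walk)."""
    ancestors = set()
    node = i
    while node != -1:           # collect i and all of its ancestors
        ancestors.add(node)
        node = parent[node]
    node = j
    while node not in ancestors:  # climb from j until the paths meet
        node = parent[node]
    return arr[node]

arr = [3, 2, 6, 1, 9, 4]
parent = cartesian_parents(arr)
print(range_min_via_lca(arr, parent, 0, 2))  # 2
print(range_min_via_lca(arr, parent, 4, 5))  # 4
```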

---

## **13.8 Summary and Complexities**

```
┌────────────────────────────┬──────────────┬──────────────────────────┐
│ Data Structure             │ Space        │ Key Operations           │
├────────────────────────────┼──────────────┼──────────────────────────┤
│ B-Tree (m=order)           │ O(n)         │ Search/Insert/Delete     │
│                            │              │ O(log_m n) I/Os          │
├────────────────────────────┼──────────────┼──────────────────────────┤
│ Segment Tree               │ O(n)         │ Range query/point update │
│                            │              │ O(log n)                 │
├────────────────────────────┼──────────────┼──────────────────────────┤
│ Fenwick Tree               │ O(n)         │ Prefix sum/point update  │
│                            │              │ O(log n)                 │
├────────────────────────────┼──────────────┼──────────────────────────┤
│ Trie (alphabet size k)     │ O(n * avg    │ Insert/Search/StartsWith │
│                            │   length)    │ O(m)                     │
├────────────────────────────┼──────────────┼──────────────────────────┤
│ Suffix Array               │ O(n)         │ Pattern search:          │
│                            │              │ O(m log n)               │
├────────────────────────────┼──────────────┼──────────────────────────┤
│ Cartesian Tree             │ O(n)         │ Construction: O(n)       │
└────────────────────────────┴──────────────┴──────────────────────────┘
```

---

## **13.9 Practice Problems**

### **B-Trees**
1. **B-Tree Insertion Simulation**: Given a sequence of insertions into a B-Tree of order 5, draw the resulting tree.
2. **Database Index Design**: Explain why B+ Trees are preferred over B-Trees for range queries.

### **Segment Trees**
3. **Range Sum with Updates**: Implement a segment tree to support range sum queries and point updates.
4. **Range Minimum Query with Updates**: Modify the lazy-propagation segment tree to support range-add updates and range-minimum queries.

### **Fenwick Trees**
5. **Count Inversions**: Use Fenwick tree to count inversions in an array in O(n log n).
6. **2D Fenwick Tree**: Extend BIT to 2D for submatrix sum queries.

### **Tries**
7. **Word Search II**: Given a 2D board and a list of words, find all words that can be built from letters of sequentially adjacent cells. Use a trie for efficiency.
8. **Longest Common Prefix**: Find the longest common prefix among a set of strings using a trie.

### **Suffix Arrays**
9. **Longest Repeated Substring**: Given a string, find the longest substring that appears at least twice.
10. **Pattern Matching**: Given a text and pattern, find all occurrences using suffix array and binary search.

### **Cartesian Trees**
11. **Build Cartesian Tree**: Implement construction and verify inorder traversal matches original array.

---

## **13.10 Further Reading**

1. **"Introduction to Algorithms" (CLRS)** - Chapter 18 (B-Trees), Chapter 14 (Augmenting Data Structures), Chapter 32 (String Matching)
2. **"Algorithms on Strings, Trees, and Sequences"** by Dan Gusfield - Comprehensive coverage of suffix trees and arrays
3. **"The Art of Computer Programming, Vol 3"** - Section 6.2.4 (Multiway Trees)
4. **"Data Structures and Algorithms in Python"** - Chapter 15 (Tries), Chapter 16 (Suffix Arrays)
5. **Original Papers**:
   - Bayer, McCreight (1972) - "Organization and Maintenance of Large Ordered Indices" (B-Trees)
   - Ukkonen (1995) - "On-line construction of suffix trees"
   - Manber, Myers (1993) - "Suffix arrays: a new method for on-line string searches"

---

> **Coming in Chapter 14**: **Heaps and Priority Queues** - We'll dive into binary heaps, binomial heaps, Fibonacci heaps, and their applications in scheduling, graph algorithms, and more.

---

**End of Chapter 13**