# Chapter 28: Disjoint Set Union (Union-Find)

> *"Union-Find is the glue that binds components together—literally. It elegantly tracks connectivity in a world of merging sets."* — Anonymous

---

## 28.1 Introduction to Disjoint Set Union

**Disjoint Set Union (DSU)**, also known as **Union-Find**, is a data structure that tracks a partition of a set into disjoint (non-overlapping) subsets. It supports two primary operations efficiently:

- **Find:** Determine which subset a particular element belongs to (often returning a representative or "root" of that subset).
- **Union:** Merge two subsets into a single subset.

DSU is fundamental in problems involving connectivity, especially in graphs, and is a key component of Kruskal's algorithm for minimum spanning trees.

### 28.1.1 Why DSU Matters

```
┌─────────────────────────────────────────────────────────────────────┐
│                    IMPORTANCE OF DISJOINT SET UNION                   │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  1. CONNECTIVITY: Track connected components in dynamic graphs      │
│     where edges are added over time.                                 │
│                                                                      │
│  2. MINIMUM SPANNING TREES: Kruskal's algorithm uses DSU to         │
│     efficiently check for cycles.                                   │
│                                                                      │
│  3. PERCOLATION: Determine when a grid becomes connected.           │
│                                                                      │
│  4. IMAGE PROCESSING: Connected component labeling in binary images.│
│                                                                      │
│  5. NETWORK CONNECTIVITY: Check if two computers are in the same    │
│     network segment.                                                 │
│                                                                      │
│  6. SOCIAL NETWORKS: Find if two people are in the same friend      │
│     circle.                                                          │
│                                                                      │
│  7. EQUIVALENCE RELATIONS: Group elements that satisfy some         │
│     equivalence relation (e.g., same remainder modulo k).           │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘
```

### 28.1.2 Basic Operations

The DSU maintains a forest of trees, where each tree represents a set. The root of each tree is the **representative** of that set.

- **Find(x):** Follow parent pointers from x until reaching the root. Return the root.
- **Union(x, y):** Find roots of x and y. If they are different, make one root the parent of the other, merging the trees.

Without optimizations, a sequence of unions can produce a skewed tree (a single chain), so one operation can take O(n) time in the worst case. With **path compression** and **union by rank**, the amortized cost per operation becomes nearly constant.

---

## 28.2 Basic Implementation

We'll start with a simple implementation that tracks an array `parent` where `parent[i]` is the parent of element i (initially each element is its own parent).

```python
class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
    
    def find(self, x):
        # Follow parent pointers until reaching root
        while self.parent[x] != x:
            x = self.parent[x]
        return x
    
    def union(self, x, y):
        root_x = self.find(x)
        root_y = self.find(y)
        if root_x != root_y:
            self.parent[root_y] = root_x  # make root_x parent of root_y
```

**Time Complexity:** O(n) in the worst case for both `find` and `union` (`union` calls `find` twice, so it inherits the same bound).
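
To make the O(n) worst case concrete, here is a small sketch that builds a chain with the basic implementation. The class is renamed `NaiveDSU` and the `depth` helper is a hypothetical addition for illustration; neither name comes from the chapter.

```python
class NaiveDSU:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        root_x = self.find(x)
        root_y = self.find(y)
        if root_x != root_y:
            self.parent[root_y] = root_x


def depth(dsu, x):
    """Number of parent pointers followed from x to its root."""
    steps = 0
    while dsu.parent[x] != x:
        x = dsu.parent[x]
        steps += 1
    return steps


n = 8
dsu = NaiveDSU(n)
for i in range(n - 1):
    # The current root i is always hung under the fresh element i + 1,
    # so the tree degenerates into the chain 0 -> 1 -> ... -> n - 1.
    dsu.union(i + 1, i)

print(depth(dsu, 0))  # 7, i.e. n - 1 pointer hops for a single find
```

Each `find(0)` on this structure walks the whole chain, which is exactly the behavior the optimizations in the next section are meant to prevent.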

---

## 28.3 Optimizations

### 28.3.1 Union by Rank (or Size)

To keep trees shallow, we always attach the root of the shallower (or smaller) tree under the root of the other tree. We maintain a `rank` array, an upper bound on each tree's height, or alternatively a `size` array holding the number of elements in each tree.

```python
class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n   # rank approximates tree height
    
    def find(self, x):
        while self.parent[x] != x:
            x = self.parent[x]
        return x
    
    def union(self, x, y):
        root_x = self.find(x)
        root_y = self.find(y)
        if root_x == root_y:
            return False
        # Union by rank: attach smaller rank tree under larger rank
        if self.rank[root_x] < self.rank[root_y]:
            self.parent[root_x] = root_y
        elif self.rank[root_x] > self.rank[root_y]:
            self.parent[root_y] = root_x
        else:
            self.parent[root_y] = root_x
            self.rank[root_x] += 1
        return True
```

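**Union by size** is similar, but each root stores the number of elements in its tree, and the root of the smaller tree is attached under the root of the larger one. It gives the same O(log n) height bound and also lets you report component sizes.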
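
A minimal sketch of union by size, mirroring the rank-based class above. The class name `DSUBySize` and the `component_size` method are illustrative additions, not part of the chapter's code.

```python
class DSUBySize:
    """Union-find with union by size (no path compression)."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n  # size[r] is meaningful only while r is a root

    def find(self, x):
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        root_x = self.find(x)
        root_y = self.find(y)
        if root_x == root_y:
            return False
        # Make root_x the root of the larger tree
        if self.size[root_x] < self.size[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x
        self.size[root_x] += self.size[root_y]
        return True

    def component_size(self, x):
        return self.size[self.find(x)]
```

Size is often the more convenient choice in practice, because many problems ask for the size of a component anyway.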

### 28.3.2 Path Compression

During `find`, we can make nodes point directly to the root, flattening the tree for future queries.

```python
def find(self, x):
    if self.parent[x] != x:
        self.parent[x] = self.find(self.parent[x])  # recursion compresses path
    return self.parent[x]
```

**Iterative version** (avoids hitting Python's recursion limit on long parent chains):

```python
def find(self, x):
    # Find root
    root = x
    while self.parent[root] != root:
        root = self.parent[root]
    # Path compression
    while x != root:
        next_node = self.parent[x]
        self.parent[x] = root
        x = next_node
    return root
```

### 28.3.3 Complexity with Optimizations

With both union by rank and path compression, the amortized time per operation is O(α(n)), where α is the inverse Ackermann function. It grows so slowly that α(n) ≤ 4 for any n that can arise in practice.

- **Find:** O(α(n)) amortized
- **Union:** O(α(n)) amortized

In practice, DSU operations therefore behave like O(1), although the formal bound is amortized over a sequence of operations and is not a worst-case bound on each single call.
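
The two optimizations have been shown separately so far. One way to combine them into a single class is sketched below; the `components` counter and the `connected` helper are assumptions added for convenience, not something the sections above define.

```python
class DSU:
    """Union-find with union by rank and path compression."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.components = n  # assumption: extra counter of disjoint sets

    def find(self, x):
        # First pass: locate the root
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path directly at the root
        while x != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, x, y):
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return False
        if self.rank[root_x] < self.rank[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x      # lower-rank root goes under the other
        if self.rank[root_x] == self.rank[root_y]:
            self.rank[root_x] += 1        # equal ranks: merged tree may be one taller
        self.components -= 1
        return True

    def connected(self, x, y):
        return self.find(x) == self.find(y)


dsu = DSU(6)
dsu.union(0, 1)
dsu.union(1, 2)
dsu.union(3, 4)
print(dsu.connected(0, 2))  # True
print(dsu.connected(0, 3))  # False
print(dsu.components)       # 3
```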

---

## 28.4 Applications of DSU

### 28.4.1 Kruskal's Algorithm (MST)

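As seen in Chapter 18, Kruskal's algorithm adds edges in order of increasing weight and uses DSU to skip any edge that would close a cycle. The code below relies on `union` returning `True` only when it merges two different components, as in the version from Section 28.3.1.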

```python
def kruskal(n, edges):
    edges.sort(key=lambda e: e[2])
    dsu = DSU(n)
    mst_weight = 0
    for u, v, w in edges:
        if dsu.union(u, v):  # returns True if edge added (different components)
            mst_weight += w
    return mst_weight
```

### 28.4.2 Connected Components in a Graph

Given a static graph, we can use DSU to find connected components.

```python
def connected_components(n, edges):
    dsu = DSU(n)
    for u, v in edges:
        dsu.union(u, v)
    components = {}
    for i in range(n):
        root = dsu.find(i)
        components.setdefault(root, []).append(i)
    return components
```

### 28.4.3 Dynamic Connectivity

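When edges are only added over time (never removed), DSU can efficiently answer queries like "are u and v connected?" after each addition. Plain DSU does not support edge deletions; those require other techniques such as the rollback variant in Section 28.5.1 combined with offline processing.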
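
As a rough sketch of this incremental setting, the function below processes a mixed stream of edge additions and connectivity queries. The event format (`("add", u, v)` and `("query", u, v)` tuples) and the name `process_events` are hypothetical, chosen only for this example.

```python
def process_events(n, events):
    """Return the answers to all "query" events, in order."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving, a one-pass form of path compression
            x = parent[x]
        return x

    answers = []
    for kind, u, v in events:
        ru, rv = find(u), find(v)
        if kind == "add":
            if ru != rv:
                if size[ru] < size[rv]:    # union by size
                    ru, rv = rv, ru
                parent[rv] = ru
                size[ru] += size[rv]
        else:
            answers.append(ru == rv)
    return answers


events = [("query", 0, 1), ("add", 0, 1), ("query", 0, 1),
          ("add", 1, 2), ("query", 0, 2), ("query", 0, 3)]
print(process_events(4, events))  # [False, True, True, False]
```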

### 28.4.4 Number of Islands II (LeetCode 305)

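You are given an empty grid and a list of positions at which land cells are added. After each addition, return the number of islands. Each new land cell starts as its own island; union it with every adjacent land cell and decrement the island count for each union that actually merges two sets.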
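
A possible sketch of this approach follows. It assumes positions are `(row, col)` pairs on an `m` x `n` grid and that a position may be repeated; the function name is illustrative, and union by size is left out to keep the example short.

```python
def num_islands2(m, n, positions):
    parent = {}   # cell index -> parent; only land cells are present
    count = 0
    result = []

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for r, c in positions:
        idx = r * n + c
        if idx not in parent:              # ignore a cell that is already land
            parent[idx] = idx
            count += 1
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                nidx = nr * n + nc
                if 0 <= nr < m and 0 <= nc < n and nidx in parent:
                    ra, rb = find(idx), find(nidx)
                    if ra != rb:           # two islands become one
                        parent[ra] = rb
                        count -= 1
        result.append(count)
    return result


print(num_islands2(3, 3, [(0, 0), (0, 1), (1, 2), (2, 1)]))  # [1, 1, 2, 3]
```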

### 28.4.5 Redundant Connection (LeetCode 684)

Find an edge that can be removed to make the graph a tree. Use DSU: when adding an edge, if its endpoints are already connected, it's redundant.
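
The idea can be sketched as follows, assuming the usual input format for this problem: nodes are labeled 1 through n and there are exactly n edges. The function name is illustrative.

```python
def find_redundant_connection(edges):
    n = len(edges)
    parent = list(range(n + 1))   # index 0 is unused because labels start at 1

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return [u, v]         # u and v were already connected: this edge closes a cycle
        parent[ru] = rv
    return []                     # no cycle found (not expected for valid input)


print(find_redundant_connection([[1, 2], [1, 3], [2, 3]]))  # [2, 3]
```

Because edges are processed in input order, the edge returned is the first one that closes a cycle, which for a graph with exactly one cycle is also the last edge of that cycle in the input.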

### 28.4.6 Equations with Variables (LeetCode 990)

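Given equations like "a==b" and "a!=b", check if they are consistent. Process all equalities first, unioning the two sides of each. The system is then consistent exactly when no inequality relates two variables that ended up in the same set.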
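
A short sketch of the two-pass idea, assuming each equation is a 4-character string such as `"a==b"` or `"a!=b"` with single-letter variables. The function name is illustrative.

```python
def equations_possible(equations):
    parent = {}

    def find(x):
        parent.setdefault(x, x)            # create the variable on first use
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Pass 1: merge the two sides of every equality
    for eq in equations:
        if eq[1:3] == "==":
            parent[find(eq[0])] = find(eq[3])

    # Pass 2: every inequality must relate variables from different sets
    return all(find(eq[0]) != find(eq[3])
               for eq in equations if eq[1:3] == "!=")


print(equations_possible(["a==b", "b!=a"]))          # False
print(equations_possible(["a==b", "b==c", "c!=d"]))  # True
```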

### 28.4.7 Regions Cut by Slashes (LeetCode 959)

Represent each cell as 4 triangles and union them based on slashes. DSU tracks connectivity.
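
One way to realize the four-triangle idea is sketched below. The triangle numbering (0 top, 1 right, 2 bottom, 3 left) and the function name are choices made for this example; `grid` is assumed to be a list of n strings of length n containing only `' '`, `'/'`, and `'\'`.

```python
def regions_by_slashes(grid):
    n = len(grid)
    parent = list(range(4 * n * n))
    regions = 4 * n * n                    # start with every triangle separate

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        nonlocal regions
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            regions -= 1

    for r in range(n):
        for c in range(n):
            base = 4 * (r * n + c)   # triangles: +0 top, +1 right, +2 bottom, +3 left
            ch = grid[r][c]
            if ch != "\\":           # ' ' or '/': top joins left, right joins bottom
                union(base + 0, base + 3)
                union(base + 1, base + 2)
            if ch != "/":            # ' ' or '\': top joins right, bottom joins left
                union(base + 0, base + 1)
                union(base + 2, base + 3)
            if r + 1 < n:            # bottom triangle touches the top of the cell below
                union(base + 2, base + 4 * n + 0)
            if c + 1 < n:            # right triangle touches the left of the next cell
                union(base + 1, base + 4 + 3)
    return regions


print(regions_by_slashes([" /", "/ "]))      # 2
print(regions_by_slashes(["/\\", "\\/"]))    # 5
```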

### 28.4.8 Longest Consecutive Sequence (LeetCode 128)

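Given an unsorted array, find the length of the longest run of consecutive integers. With DSU, treat each distinct value as an element, union every value x with x + 1 whenever x + 1 is also present, and return the size of the largest set. (A hash-set scan solves this problem too, but the DSU version is a good exercise in union by size.)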
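
A sketch of the DSU approach, keyed by value and using union by size so the answer can be read from the set sizes. The function name is illustrative.

```python
def longest_consecutive(nums):
    parent = {x: x for x in nums}   # duplicates collapse into one key
    size = {x: 1 for x in nums}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for x in parent:
        if x + 1 in parent:                # x and x + 1 belong to the same run
            ra, rb = find(x), find(x + 1)
            if ra != rb:
                if size[ra] < size[rb]:
                    ra, rb = rb, ra
                parent[rb] = ra
                size[ra] += size[rb]

    # The largest set corresponds to the longest run; 0 for empty input
    return max((size[find(x)] for x in parent), default=0)


print(longest_consecutive([100, 4, 200, 1, 3, 2]))  # 4
```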

---

## 28.5 Advanced Variants

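### 28.5.1 DSU with Rollback (Undoable Union-Find)

Sometimes we need to undo unions (e.g., in offline algorithms). We can implement DSU with a stack that records each change so it can be rolled back in reverse order. Path compression is left out, because a single `find` could rewrite many parent pointers and make undoing expensive; union by rank alone still keeps `find` at O(log n).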

```python
class RollbackDSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.stack = []  # stores (child_root, new_root, old_rank_of_new_root) or None

    def find(self, x):
        # No path compression: each union changes one parent pointer, so it is easy to undo
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        x = self.find(x)
        y = self.find(y)
        if x == y:
            self.stack.append(None)  # record a no-op so every union has one stack entry
            return False
        if self.rank[x] < self.rank[y]:
            x, y = y, x
        # y is attached under x; remember x's rank before any change
        self.stack.append((y, x, self.rank[x]))
        self.parent[y] = x
        if self.rank[x] == self.rank[y]:
            self.rank[x] += 1
        return True

    def snapshot(self):
        return len(self.stack)

    def rollback(self, snap):
        # Undo unions in reverse order until the stack is back to the snapshot size
        while len(self.stack) > snap:
            op = self.stack.pop()
            if op is not None:
                y, x, old_rank = op
                self.parent[y] = y        # y was a root before the union
                self.rank[x] = old_rank   # restore x's rank
```

### 28.5.2 DSU on Trees (DSU with Heuristics)

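In tree problems, subtree queries can often be answered offline by merging per-subtree data from children into their parent, always keeping the data of the largest (heavy) child and re-adding the nodes of the smaller (light) children. This technique is called "DSU on trees" or "small-to-large merging" (also known as the "sack" trick). Despite the name, it borrows only the smaller-into-larger heuristic from union by size; no union-find structure is involved.

**Example:** Count the number of nodes with a given color in each subtree. A node is re-added only when it lies in a light subtree of some ancestor, which happens O(log n) times per node, so the total work is O(n log n).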

```python
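from collections import defaultdict

def dsu_on_tree(adj, color, queries):
    # adj: adjacency lists of a tree rooted at node 0; color[u]: color of node u
    # queries: list of (node, c) asking how many nodes in node's subtree have color c
    # returns the answers in query order (recursive, so best suited to shallow trees)
    n = len(adj)
    size = [1] * n
    heavy = [-1] * n  # heavy[u]: child of u with the largest subtree, -1 for a leaf

    # compute subtree sizes and heavy children via DFS
    def dfs_sz(u, p):
        for v in adj[u]:
            if v != p:
                dfs_sz(v, u)
                size[u] += size[v]
                if heavy[u] == -1 or size[v] > size[heavy[u]]:
                    heavy[u] = v
    dfs_sz(0, -1)

    by_node = defaultdict(list)  # node -> [(query index, color)]
    for i, (u, c) in enumerate(queries):
        by_node[u].append((i, c))
    answers = [0] * len(queries)
    count = defaultdict(int)     # color -> occurrences among currently added nodes

    def add_subtree(u, p, delta):
        # add (delta = +1) or remove (delta = -1) every node in u's subtree
        count[color[u]] += delta
        for v in adj[u]:
            if v != p:
                add_subtree(v, u, delta)

    # DSU on tree: light children first (data discarded), heavy child last (data kept)
    def dfs(u, p, keep):
        for v in adj[u]:
            if v != p and v != heavy[u]:
                dfs(v, u, False)
        if heavy[u] != -1:
            dfs(heavy[u], u, True)
        # now add the current node and the light children's contributions
        count[color[u]] += 1
        for v in adj[u]:
            if v != p and v != heavy[u]:
                add_subtree(v, u, 1)
        # answer queries at u
        for i, c in by_node[u]:
            answers[i] = count[c]
        if not keep:
            add_subtree(u, p, -1)  # clear data

    dfs(0, -1, True)
    return answers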
```

This is a powerful technique for subtree queries.

### 28.5.3 Persistent Union-Find

For problems where we need to query connectivity at different points in time (versioning), we can build a persistent DSU using path copying or functional data structures. This is advanced and rarely needed in interviews.

---

## 28.6 Offline Queries and DSU

Many problems can be solved offline by sorting queries and using DSU to maintain connectivity.

**Example:** Given a graph with edge weights and queries asking if two nodes are connected using only edges with weight ≤ w, we can sort queries by w and add edges incrementally, using DSU to check connectivity.

```python
def offline_connectivity(n, edges, queries):
    # edges: (u, v, w)
    # queries: (w, u, v, index)
    edges.sort(key=lambda e: e[2])
    queries.sort(key=lambda q: q[0])
    dsu = DSU(n)
    answers = [False] * len(queries)
    e_idx = 0
    for w, u, v, idx in queries:
        while e_idx < len(edges) and edges[e_idx][2] <= w:
            a, b, _ = edges[e_idx]
            dsu.union(a, b)
            e_idx += 1
        answers[idx] = dsu.find(u) == dsu.find(v)
    return answers
```

---

## 28.7 Summary

```
┌─────────────────────────────────────────────────────────────────────┐
│                    DISJOINT SET UNION SUMMARY                         │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Operations: find(x), union(x, y)                                   │
│                                                                      │
│  Optimizations:                                                     │
│    • Path compression: flatten tree during find                     │
│    • Union by rank/size: attach smaller tree under larger           │
│                                                                      │
│  Time Complexity: O(α(n)) per operation (inverse Ackermann)         │
│                                                                      │
│  Applications:                                                      │
│    • Kruskal's MST                                                  │
│    • Connected components                                           │
│    • Dynamic connectivity                                           │
│    • Redundant connection detection                                 │
│    • Percolation                                                    │
│    • Image segmentation                                             │
│    • Offline queries with edge weights                              │
│    • DSU on trees (small-to-large merging)                          │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘
```

---

## 28.8 Practice Problems

### Easy/Medium
1. **Number of Provinces** (LeetCode 547) – Find number of connected components in a graph.
2. **Redundant Connection** (LeetCode 684)
3. **Friend Circles** (LeetCode 547) – the earlier title of Number of Provinces; it is the same problem.
4. **Accounts Merge** (LeetCode 721)
5. **Number of Islands II** (LeetCode 305) – premium, but classic.
6. **Graph Valid Tree** (LeetCode 261) – check if edges form a tree.
7. **The Earliest Moment When Everyone Become Friends** (LeetCode 1101) – sort by time and union.

### Medium/Hard
8. **Redundant Connection II** (LeetCode 685) – directed graph version.
9. **Regions Cut By Slashes** (LeetCode 959)
10. **Couples Holding Hands** (LeetCode 765) – use DSU to count cycles.
11. **Bricks Falling When Hit** (LeetCode 803) – reverse time + DSU.
12. **Rank Transform of a Matrix** (LeetCode 1632) – DSU + topological sort.
13. **Swim in Rising Water** (LeetCode 778) – can be solved with DSU (add cells in order of elevation until the two corners connect) or with binary search + BFS.
14. **Minimize Malware Spread** (LeetCode 924) – DSU to track component sizes.

### Advanced (DSU on Trees)
15. **Tree Queries** – problems requiring small-to-large merging (e.g., count nodes with certain property in subtree).
16. **Frequency of Most Frequent Element** (LeetCode 1838) – not DSU, but sliding window.
17. **Online Queries** – offline DSU with time.

---

## 28.9 Further Reading

1. **"Introduction to Algorithms" (CLRS)** – Chapter 21 (Data Structures for Disjoint Sets)
2. **"The Algorithm Design Manual"** by Steven Skiena – Section 6.1 (Union-Find)
3. **"Competitive Programming"** by Halim & Halim – Chapter 2 (Data Structures) includes DSU.
4. **Original Papers**:
   - Galler, B. A., & Fischer, M. J. (1964) – "An improved equivalence algorithm"
   - Tarjan, R. E. (1975) – "Efficiency of a good but not linear set union algorithm"
   - Tarjan, R. E., & van Leeuwen, J. (1984) – "Worst-case analysis of set union algorithms"

---

> **Coming in Chapter 29**: **Advanced Data Structures** – We'll explore treaps, splay trees, link-cut trees, and other powerful structures.

---

**End of Chapter 28**