# Union Find

Solves the problem of dynamic connectivity.

Given a set of $N$ objects:
- Union command: connects two objects
- Find/connected query: is there a path connecting two objects?

Connectivity properties:
- Reflexive: $p$ is connected to $p$
- Symmetric: if $p$ is connected to $q$, then $q$ is connected to $p$
- Transitive: if $p$ is connected to $q$ and $q$ is connected to $r$, then $p$ is connected to $r$

After a number of connections have been made, the objects fall into connected components: maximal sets of mutually connected objects.
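As a small illustration of connected components, the sketch below groups six objects after three connections. The helper name `components` and its naive relabeling strategy are assumptions made for this example only; it is not meant as an efficient implementation.

```python
def components(n, pairs):
    """Return the connected components of objects 0..n-1 as sorted lists."""
    label = list(range(n))  # label[i] identifies the component of object i
    for p, q in pairs:
        old, new = label[p], label[q]
        # Relabel everything in p's component with q's label.
        label = [new if x == old else x for x in label]
    groups = {}
    for obj, lab in enumerate(label):
        groups.setdefault(lab, []).append(obj)
    return sorted(groups.values())


print(components(6, [(0, 1), (1, 2), (3, 4)]))  # prints [[0, 1, 2], [3, 4], [5]]
```

Object 5 was never connected to anything, so by reflexivity it forms a component of its own.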

The goal is to design an efficient data structure for union-find. The number of objects $N$ and the number of operations $M$ may both be huge, and find queries and union commands may be intermixed.

## Quick-find: the eager approach

Quick-find uses an integer array `id[]` of length $N$, where the index represents each object, and $p$ and $q$ are connected if and only if they have the same id (same number stored in the array at their respective indices).

In general, quick-find is slow. The table gives the order of growth of the number of array accesses (reads and writes):

| Algorithm | Initialize | Union | Find |
| - | - | - | - |
| quick-find | N | N | 1|

The defect with this algorithm is that the `union` operation is too expensive: it takes on the order of $N^2$ (quadratic) array accesses to process a sequence of $N$ union commands on $N$ objects.
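The quadratic growth can be observed by counting array accesses directly. The sketch below is an instrumented variant of quick-find; the class `CountingQuickFind`, its `accesses` counter, and the helper `accesses_for_chain` are hypothetical names introduced for this example, and the exact way reads and writes are counted is an assumption.

```python
class CountingQuickFind:
    """Quick-find that counts reads and writes of its id array."""

    def __init__(self, n):
        self.id = list(range(n))
        self.accesses = 0

    def union(self, p, q):
        pid, qid = self.id[p], self.id[q]
        self.accesses += 2          # two reads to fetch pid and qid
        for i in range(len(self.id)):
            self.accesses += 1      # one read per entry scanned
            if self.id[i] == pid:
                self.id[i] = qid
                self.accesses += 1  # one write per entry relabeled


def accesses_for_chain(n):
    """Count accesses for the n - 1 unions (0,1), (1,2), ..., (n-2,n-1)."""
    uf = CountingQuickFind(n)
    for i in range(n - 1):
        uf.union(i, i + 1)
    return uf.accesses


for n in (100, 200, 400):
    print(n, accesses_for_chain(n))
```

Each doubling of `n` roughly quadruples the count, which is the signature of quadratic growth.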

In general, quadratic algorithms aren't acceptable for large problems, because they don't scale with new technology.

**Conclusion:** quick-find is too slow for big problems.

In [3]:
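class QuickFindUF:
    """Quick-find: id[i] is the component identifier of object i."""

    def __init__(self, N):
        # Each object starts in its own component.
        self.id = list(range(N))

    def connected(self, p, q):
        # Same identifier means same component.
        return self.id[p] == self.id[q]

    def union(self, p, q):
        # Read both identifiers before the loop: id[p] may change inside it.
        pid = self.id[p]
        qid = self.id[q]
        # Relabel every entry in p's component with q's identifier.
        for i in range(len(self.id)):
            if self.id[i] == pid:
                self.id[i] = qid
        print("ID: {}".format(self.id))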

In [8]:
quick_find = QuickFindUF(10)
quick_find.union(0, 8)

ID: [8, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [9]:
quick_find.connected(0, 8)

True

In [10]:
quick_find.connected(3, 4)

False

## Quick-union: the lazy approach

Quick-union uses an integer array `id[]` of length $N$, where `id[i]` is the parent of `i`. The root of `i` is `id[id[id[...id[i]...]]]`.

The `connected` method checks whether $p$ and $q$ have the same root. The `union` method merges components by setting the `id` of $p$'s root to the `id` of $q$'s root.

| Algorithm | Initialize | Union | Find |
| - | - | - | - |
| quick-find | N | N | 1|
| quick-union | N | N\* | N |

\* includes cost of finding roots

The defect with this algorithm is that the trees can get too tall, which makes the `find` operation too expensive (it could take $N$ array accesses).
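A tall tree is easy to provoke. The sketch below uses plain functions rather than the class from this section; `root_with_depth` and the particular union order are assumptions chosen to produce the worst case, where the whole structure degenerates into a single chain.

```python
def root_with_depth(parent, i):
    """Follow parent links from i; return (root, number of links followed)."""
    depth = 0
    while i != parent[i]:
        i = parent[i]
        depth += 1
    return i, depth


def union(parent, p, q):
    """Quick-union: point p's root at q's root."""
    i, _ = root_with_depth(parent, p)
    j, _ = root_with_depth(parent, q)
    parent[i] = j


n = 8
parent = list(range(n))
# Always hang the growing tree under a fresh single node.
for k in range(1, n):
    union(parent, 0, k)

print(parent)                      # prints [1, 2, 3, 4, 5, 6, 7, 7]
print(root_with_depth(parent, 0))  # prints (7, 7)
```

Finding the root of object 0 now follows $N - 1$ links, so a single `find` costs time linear in $N$.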

In [12]:
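class QuickUnionUF:
    """Quick-union: id[i] is the parent of i; a root is its own parent."""

    def __init__(self, N):
        # Each object starts as the root of its own one-node tree.
        self.id = list(range(N))

    def root(self, i):
        # Chase parent pointers until reaching a root.
        while i != self.id[i]:
            i = self.id[i]
        return i

    def connected(self, p, q):
        return self.root(p) == self.root(q)

    def union(self, p, q):
        # Link p's root under q's root.
        i = self.root(p)
        j = self.root(q)
        self.id[i] = j
        print("ID: {}".format(self.id))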

In [14]:
quick_union = QuickUnionUF(10)
quick_union.union(0, 8)
quick_union.union(2, 8)

ID: [8, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ID: [8, 1, 8, 3, 4, 5, 6, 7, 8, 9]


In [15]:
quick_union.connected(0, 8)

True

In [16]:
quick_union.connected(3, 4)

False

## Improvements

One possible improvement to quick-union is weighting: keep track of the size of each tree and, in a union, always link the root of the smaller tree under the root of the larger one. This helps reduce tree depth.

- The `find` operation takes time proportional to the depths of $p$ and $q$
- The `union` operation takes constant time, given the roots
- The depth of any node $x$ is at most $\lg N$

Proof of the bound on the depth of any node $x$: The depth of $x$ increases by 1 only when the tree $T_1$ containing $x$ is merged into another tree $T_2$. Because $|T_2| \geq |T_1|$, the size of the tree containing $x$ at least doubles. Starting from size 1, the size can double at most $\lg N$ times before reaching $N$, so the depth of $x$ is at most $\lg N$.
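The depth bound can also be checked empirically. The sketch below runs random weighted unions without path compression and measures the deepest node; the function name `max_depth_after_random_unions`, the fixed seed, and the choice of `4 * n` random union attempts are assumptions made for this experiment.

```python
import math
import random


def max_depth_after_random_unions(n, seed=0):
    """Run random weighted unions (no path compression); return the max node depth."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n

    def root(i):
        while i != parent[i]:
            i = parent[i]
        return i

    for _ in range(4 * n):
        i, j = root(rng.randrange(n)), root(rng.randrange(n))
        if i == j:
            continue
        if size[i] < size[j]:
            i, j = j, i      # make i the root of the larger tree
        parent[j] = i        # the smaller tree goes under the larger root
        size[i] += size[j]

    def depth(i):
        d = 0
        while i != parent[i]:
            i = parent[i]
            d += 1
        return d

    return max(depth(i) for i in range(n))


n = 1024
print(max_depth_after_random_unions(n), "<=", int(math.log2(n)))
```

Random unions usually stay well below the bound; it is reached only when trees of equal size are merged repeatedly.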

| Algorithm | Initialize | Union | Find |
| - | - | - | - |
| quick-find | N | N | 1 |
| quick-union | N | N\* | N |
| weighted QU | N | lg(N)\* | lg(N) |

\* includes cost of finding roots

**Conclusion:** Weighting improves on the quick-find and quick-union algorithms, but it can be improved further with path compression. A simple one-pass variant makes every other node in the path point to its grandparent, which halves the path length.

Starting from an empty data structure, weighted quick-union with path compression (WQUPC) makes $\leq c (N + M \lg^* N)$ array accesses for any sequence of $M$ union-find operations on $N$ objects, where $\lg^*$ is the iterated logarithm: the number of times $\lg$ must be applied to $N$ before the result is at most 1. In theory, WQUPC is not quite linear, but in practice it is.
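To see why the bound is linear in practice, it helps to compute the iterated logarithm for a few values. The helper `lg_star` below is a hypothetical name for a minimal sketch that repeatedly applies `math.log2`.

```python
import math


def lg_star(n):
    """Iterated logarithm: how many times lg is applied until the value is <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count


print([lg_star(n) for n in (1, 2, 4, 16, 65536, 2 ** 65536)])
# prints [0, 1, 2, 3, 4, 5]
```

Even for $N = 2^{65536}$, far more objects than any real input could hold, $\lg^* N$ is only 5, so the $M \lg^* N$ term behaves like a small constant multiple of $M$.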

In [22]:
class WeightedQuickUnionUF:
    """Weighted quick-union with path compression (path halving)."""

    def __init__(self, N):
        self.id = list(range(N))  # id[i] is the parent of i
        self.sz = [1] * N         # sz[r] is the number of nodes in the tree rooted at r

    def root(self, i):
        while i != self.id[i]:
            # Make every other node in path point to its grandparent
            self.id[i] = self.id[self.id[i]]
            i = self.id[i]
        return i

    def connected(self, p, q):
        return self.root(p) == self.root(q)

    def union(self, p, q):
        i = self.root(p)
        j = self.root(q)
        # Already in the same component: nothing to merge.
        if i == j:
            return
        # Link the root of the smaller tree under the root of the larger one.
        if self.sz[i] < self.sz[j]:
            self.id[i] = j
            self.sz[j] += self.sz[i]
        else:
            self.id[j] = i
            self.sz[i] += self.sz[j]

        print("ID: {}".format(self.id))

In [23]:
wtd_quick_union = WeightedQuickUnionUF(10)
wtd_quick_union.union(0, 8)
wtd_quick_union.union(2, 8)

ID: [0, 1, 2, 3, 4, 5, 6, 7, 0, 9]
ID: [0, 1, 0, 3, 4, 5, 6, 7, 0, 9]


In [24]:
wtd_quick_union.connected(0, 8)

True

In [25]:
wtd_quick_union.connected(3, 4)

False

## Summary of Union-find Algorithms

Given $M$ union-find operations on a set of $N$ objects:

| Algorithm | Worst-Case Time |
| - | - |
| quick-find | M N |
| quick-union | M N |
| weighted QU | N + M lg N |
| QU + path compression | N + M lg N |
| weighted QU + path compression | N + M lg\* N |

## Union-find Applications

- Percolation
- Games (Go, Hex)
- Dynamic connectivity
- Least common ancestor
- Equivalence of finite state automata
- Hoshen-Kopelman algorithm in physics
- Hindley-Milner polymorphic type inference
- Kruskal's minimum spanning tree algorithm
- Compiling equivalence statements in Fortran
- Morphological attribute openings and closings
- Matlab's `bwlabel()` function in image processing
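
To make one of the applications listed above concrete, here is a minimal sketch of Kruskal's minimum spanning tree algorithm built on weighted quick-union with path halving. The function name `kruskal`, the `(weight, u, v)` edge format, and the sample graph are all assumptions made for this illustration.

```python
def kruskal(n, edges):
    """Return (total_weight, chosen_edges) of a minimum spanning forest.

    edges is a list of (weight, u, v) tuples over vertices 0..n-1.
    """
    parent = list(range(n))
    size = [1] * n

    def root(i):
        while i != parent[i]:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    total, chosen = 0, []
    for w, u, v in sorted(edges):          # consider the lightest edges first
        ru, rv = root(u), root(v)
        if ru == rv:
            continue                       # this edge would close a cycle
        if size[ru] < size[rv]:
            ru, rv = rv, ru                # make ru the larger tree's root
        parent[rv] = ru
        size[ru] += size[rv]
        total += w
        chosen.append((u, v))
    return total, chosen


edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (7, 2, 3), (3, 1, 3)]
print(kruskal(4, edges))  # prints (6, [(0, 1), (1, 2), (1, 3)])
```

The union-find structure answers the only question Kruskal's algorithm needs at each step: are the two endpoints of this edge already connected?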