# Chapter 07: Union-Find Structures

**Connected component**: a subset T of the members of a group S who are all related to one another in some way, for example through friendship in a social network. Such a subset satisfies the following properties:

- every member of T is connected through friendship: for any x and y in T, either x and y are friends or a chain of friendships connects them
- no one in T is friends with anyone outside of T

**Union-find structures**: data structures used to identify connected components in networks, also known as **partitions**. Each member of the network starts in its own singleton set, and the edges (x,y) are considered one at a time: a union is performed for every pair of friends x and y that belong to different sets at the moment their edge is considered. These structures implement the following methods:

- makeSet(e): create a singleton set containing element e and name this set 'e'
- union(A,B): merge sets A and B into $A \cup B$, naming the result 'A' or 'B'
- find(e): return the name of the set containing element e

## Applications of Union-Find

### Connected Components

The following algorithm identifies connected components from a given social network N, defined by a set S of people and a set E of edges defining relationships:

![union-find-algo](./res/07-union-find-algo.PNG)

This algorithm runs in $O((n+m)\log n)$ time using a list-based implementation, where m is the number of union and find operations and n is the number of singleton sets created with the makeSet() method
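The figure's pseudocode is not reproduced here, so the following is only a minimal Python sketch of the approach described above: one makeSet per person, then a union for every edge whose endpoints are still in different sets. The function name `connected_components`, the `label` dictionary, and the sample names are assumptions made for illustration, and the partition used is deliberately naive rather than the list-based structure discussed later.

```python
def connected_components(people, edges):
    """Group people into connected components; edges are (x, y) friend pairs."""
    # Naive partition (an assumption of this sketch): label[x] is the name
    # of the set that currently contains x.
    label = {x: x for x in people}          # makeSet(x) for every person

    def find(x):
        return label[x]

    def union(a, b):
        # Rename every member of set b to a. Linear time, fine for a sketch.
        for x in label:
            if label[x] == b:
                label[x] = a

    for x, y in edges:
        a, b = find(x), find(y)
        if a != b:                          # x and y are in different sets
            union(a, b)

    # Collect the members of each set under the set's name.
    components = {}
    for x in people:
        components.setdefault(find(x), []).append(x)
    return list(components.values())


people = ["ann", "bob", "cat", "dan", "eve"]
edges = [("ann", "bob"), ("bob", "cat"), ("dan", "eve")]
print(connected_components(people, edges))
# [['ann', 'bob', 'cat'], ['dan', 'eve']]
```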

### Mazes

**Maze**: a two-dimensional visual puzzle made up of cells that can be traversed, with walls acting as barriers between them. Mazes may be generated with the following algorithm:

![maze-gen](./res/07-maze-gen.PNG)


This algorithm runs in $O(t(n,m))$ time, where $t(n,m)$ is the running time for performing m union and find operations starting from n singleton sets

When generating a maze from 100 cells, the algorithm builds a list of 99 edges (walls) to remove, since connecting n cells into a single set takes n-1 unions. A wall between cells x and y may be removed only if find(x) != find(y), that is, if the two cells are not yet connected. The maze entrance and exit are each created by removing one wall of an outer cell, and may be located anywhere on the boundary. The algorithm also works for three-dimensional mazes
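The wall-removal rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the algorithm from the figure: the function name `generate_maze`, the `rows`, `cols`, and `seed` parameters, and the choice to visit walls in random order are all hypothetical. Cutting the entrance and exit is left out.

```python
import random


def generate_maze(rows, cols, seed=None):
    """Return the interior walls to remove, as pairs of adjacent cells."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    parent = {cell: cell for cell in cells}   # makeSet for every cell

    def find(cell):
        # Walk up to the root, then point every visited cell at it.
        root = cell
        while parent[root] != root:
            root = parent[root]
        while parent[cell] != root:
            nxt = parent[cell]
            parent[cell] = root
            cell = nxt
        return root

    # One wall between each pair of horizontally or vertically adjacent cells.
    walls = []
    for r, c in cells:
        if c + 1 < cols:
            walls.append(((r, c), (r, c + 1)))
        if r + 1 < rows:
            walls.append(((r, c), (r + 1, c)))
    rng.shuffle(walls)                        # assumed: random visiting order

    removed = []
    for x, y in walls:
        rx, ry = find(x), find(y)
        if rx != ry:                          # cells not yet connected
            parent[rx] = ry                   # union the two sets
            removed.append((x, y))
    return removed


print(len(generate_maze(10, 10, seed=0)))     # 99 walls for 100 cells
```

Because a wall is removed only when it joins two different sets, the removed walls never form a cycle, so any two cells end up connected by exactly one path.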

## List-based Implementation

A simple implementation of a union-find structure uses a collection of linked lists. Each list has a head node that contains the following: 

- size of the set
- name of the set
- pointers to the first and last nodes of a linked list containing pointers to all elements of the set

Under this architecture, find(e) takes $O(1)$ time because each element keeps a pointer back to the head node of its set. makeSet(e) also takes $O(1)$ time since it only involves creating a new head node. union(A,B) takes $O(n)$ time in the worst case, as the two linked lists must be joined into one and the head pointer of every node that moves must be updated

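Performing a sequence, $\phi$, of m union and find operations starting from n singleton sets using the list-based implementation of a union-find structure takes $O(n\log n + m)$ time, provided each union moves the elements of the smaller set into the larger one: an element's head pointer then changes at most $\log n$ times, because the set containing it at least doubles in size with every move. The amortized running time is therefore $O(\log n)$ for each union operation and $O(1)$ for each makeSet and find operation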
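A minimal Python sketch of the list-based structure may make the costs above concrete. The class name `ListPartition`, its method names, and the use of a Python list in place of a linked list with first and last pointers are assumptions of this sketch. The smaller set is always moved into the larger one, which is what the amortized $O(\log n)$ bound for union relies on.

```python
class ListPartition:
    """List-based union-find sketch: each element points at its set's head."""

    class _Head:
        def __init__(self, name):
            self.name = name            # name of the set
            self.size = 1               # size of the set
            self.members = [name]       # stand-in for the linked list

    def __init__(self):
        self._head = {}                 # element -> head node of its set

    def make_set(self, e):
        self._head[e] = self._Head(e)   # O(1): one new head node
        return e

    def find(self, e):
        return self._head[e].name       # O(1): follow the head pointer

    def union(self, a, b):
        """Merge the sets named a and b; the larger set keeps its name."""
        big, small = self._head[a], self._head[b]
        if big is small:
            return big.name
        if small.size > big.size:
            big, small = small, big
        for e in small.members:         # only the smaller set is relabelled
            self._head[e] = big
        big.members.extend(small.members)
        big.size += small.size
        return big.name


p = ListPartition()
for e in "abcde":
    p.make_set(e)
p.union("a", "b")
p.union("c", "d")
p.union(p.find("b"), p.find("e"))
print(p.find("e"), p.find("d"))         # a c
```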

## Tree-based Implementation

Alternatively, a collection of trees may be used, with each tree associated with a different set. Under this approach, union operations are performed in $O(1)$ time, since the root of one tree is simply made a child of the root of the other by setting its parent pointer. A find(e) operation follows parent pointers from e up to the root, which identifies the set

The following heuristics are added to improve speed:

- **union-by-size**: store with each node v the size of the sub-tree rooted at v, denoted by n(v). In a union, the tree of the smaller set is made a sub-tree of the other tree, and the size field of the root is updated
- **path compression**: in a find operation, for each node v that find visits, reset the parent pointer from v to point to the root
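Both heuristics can be sketched together in Python. The class name `TreePartition` and the dictionary-based parent and size fields are assumptions of this sketch. As a convenience it also lets union accept any two elements and look up their roots first, whereas the description above passes the set names (roots) directly.

```python
class TreePartition:
    """Tree-based union-find sketch with union-by-size and path compression."""

    def __init__(self):
        self._parent = {}               # element -> parent (a root is its own parent)
        self._size = {}                 # root -> number of nodes in its tree

    def make_set(self, e):
        self._parent[e] = e
        self._size[e] = 1
        return e

    def find(self, e):
        # First pass: locate the root.
        root = e
        while self._parent[root] != root:
            root = self._parent[root]
        # Second pass (path compression): point every visited node at the root.
        while self._parent[e] != root:
            nxt = self._parent[e]
            self._parent[e] = root
            e = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        # Union-by-size: the smaller tree becomes a sub-tree of the larger.
        if self._size[ra] < self._size[rb]:
            ra, rb = rb, ra
        self._parent[rb] = ra
        self._size[ra] += self._size[rb]
        return ra


t = TreePartition()
for i in range(6):
    t.make_set(i)
t.union(0, 1)
t.union(2, 3)
t.union(1, 3)
print(t.find(3) == t.find(0), t.find(4) == t.find(0))   # True False
```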

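Performing a sequence, $\phi$, of m union and find operations, starting from n singleton sets using the tree-based implementation of a union-find structure, takes $O(n + m\log n)$ time. This bound already follows from union-by-size alone, which keeps the height of every tree at $O(\log n)$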

The analysis of this tree-based implementation is based on the use of $\alpha(n)$, a very slow-growing function that is the inverse of the fast-growing **Ackermann function**

Define the rank of a node v as $r(v) = \lfloor \log n(v) \rfloor + 2$. If node w is the parent of node v, then $r(v) < r(w)$

The number of nodes of rank s, $0 \leq s \leq \lfloor \log n \rfloor + 2$, is at most $\frac{n}{2^{s-2}}$

Therefore, in a sequence $\phi$ of m union and find operations performed using union-by-size and path compression, starting from a collection of n single-element sets, the total time to perform the operations in $\phi$ is $O((n+m)\alpha(n))$