Chapter 22 (Disjoint) Sets

One common data structuring problem requires us to store a collection
of sets that can change. It is called the Disjoint Sets problem. In this
problem, we have a collection of things, and the things are grouped into
disjoint sets. We’d like to be able to quickly find out whether two things
are in the same group. On the other hand, we are sometimes told that two
things are in the same group, in which case we may need to change our idea
of the grouping, combining two groups into one.

A data structure to store a collection of disjoint sets is often called
a Union-Find data structure.

22.1 The Disjoint Sets ADT

- union(a, b) Replace the sets containing a and b with a single set
that is their union.
- find(a, b) Return True if a and b are in the same set. Otherwise,
return False.

22.2 A Simple Implementation

In [1]:
class DisjointSetsMapping:
    def __init__(self, L):
        self._map = {item : {item} for item in L}
    
    def find(self, a, b):
        return a in self._map[b]
    
    def union(self, a, b):
        if not self.find(a,b):
            union = self._map[a] | self._map[b]
            for item in union:
                self._map[item] = union
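
As a quick check of the ADT, the following sketch repeats the mapping-based class in compact form and runs a few operations on it. The item values and the variable name `ds` are illustrative only.

```python
class DisjointSetsMapping:
    """Map every item to the (shared) set object that contains it."""
    def __init__(self, L):
        self._map = {item: {item} for item in L}

    def find(self, a, b):
        return a in self._map[b]

    def union(self, a, b):
        if not self.find(a, b):
            merged = self._map[a] | self._map[b]
            # Every member of the merged set must point to the new set.
            for item in merged:
                self._map[item] = merged

ds = DisjointSetsMapping(range(5))
ds.union(0, 1)
ds.union(3, 4)
print(ds.find(0, 1))   # True
print(ds.find(1, 3))   # False
ds.union(1, 4)
print(ds.find(0, 3))   # True
print(ds.find(2, 0))   # False
```

Note that a union touches every member of the merged set, so a single union can take time proportional to the number of items.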

We could just label the items. If two items have
the same label, then they are in the same set. Union just relabels the items
in one of the sets. In this way, the sets themselves are implicit.

In [2]:
class DisjointSetsLabels:
    def __init__(self, L):
        self._label = {item : item for item in L}
    
    def find(self, a, b):
        return self._label[a] is self._label[b]
    
    def union(self, a, b):
        if not self.find(a,b):
            # Read both labels once; self._label[b] changes during the loop.
            old, new = self._label[b], self._label[a]
            for key, value in self._label.items():
                if value is old:
                    self._label[key] = new
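
One detail is worth spelling out when relabeling in place: the label of b changes as soon as b itself is relabeled, so the old label should be read once before the loop begins. The following is a minimal sketch of that idea. The class name `LabelSets` and the local names `old` and `new` are hypothetical, chosen for this example.

```python
class LabelSets:
    def __init__(self, L):
        self._label = {item: item for item in L}

    def find(self, a, b):
        return self._label[a] == self._label[b]

    def union(self, a, b):
        # Read both labels once; self._label[b] changes during the loop.
        old, new = self._label[b], self._label[a]
        if old != new:
            for key, value in self._label.items():
                if value == old:
                    self._label[key] = new

s = LabelSets("abc")
s.union("b", "c")   # c takes b's label
s.union("a", "b")   # both b and c must take a's label
print(s.find("a", "c"))  # True
```

Changing the values of existing keys while iterating over `items()` is safe here because the dictionary never changes size.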

We can try to make our union operation much faster by changing fewer
labels. Instead of mapping items to labels, we map each item to another
item that we’ll call its parent. An item that is its own parent is a root,
and the root plays the role of the label for its set. If every other node
has a single parent and there are no loops, then we get a forest, that is,
a collection of trees.

In [3]:
class DisjointSetsForest:
    def __init__(self, L):
        self._parent = {item : item for item in L}
    
    def _root(self, item):
        while item is not self._parent[item]:
            item = self._parent[item]
        return item
    
    def find(self, a, b):
        return self._root(a) is self._root(b)
    
    def union(self, a, b):
        if not self.find(a,b):
            self._parent[self._root(b)] = self._root(a)
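
Nothing in this version keeps the trees shallow. As an illustration, the sketch below repeats the forest class, adds a hypothetical `depth` helper (not part of the original code), and performs a sequence of unions that produces a single path containing every item.

```python
class DisjointSetsForest:
    def __init__(self, L):
        self._parent = {item: item for item in L}

    def _root(self, item):
        while item != self._parent[item]:
            item = self._parent[item]
        return item

    def find(self, a, b):
        return self._root(a) == self._root(b)

    def union(self, a, b):
        if not self.find(a, b):
            self._parent[self._root(b)] = self._root(a)

def depth(ds, item):
    """Count the parent links between item and the root of its tree."""
    steps = 0
    while item != ds._parent[item]:
        item = ds._parent[item]
        steps += 1
    return steps

n = 100
forest = DisjointSetsForest(range(n))
for i in range(1, n):
    forest.union(i, 0)   # the root of the growing tree is hung under i

print(depth(forest, 0))  # 99
```

Every later call involving item 0 has to walk that whole path, which is the cost the next sections try to reduce.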

22.3 Path Compression

If we want to avoid traversing long paths too many times, we can just go
and make them shorter each time we traverse them. A simple way to do this
is just to replace parents with grandparents as we go up the tree. This only
requires one more line of code. The effect is that the depth of every node
on the path we traverse gets cut in half (plus one). This means that the
longest path can only get traversed O(log n) times before it is compressed
down to a single edge.

In [4]:
# Path Compression halving as we go.
# Every node on the path to root is updated to point to its grandparent.
# single pass
class DisjointSetsPathCompression:
    def __init__(self, L):
        self._parent = {item : item for item in L}
    
    def _root(self, item):
        while item is not self._parent[item]:
            parent = self._parent[item]
            # Assign self._parent[item] first: targets are assigned left to right.
            self._parent[item], item = self._parent[parent], parent
        return item
    
    def find(self, a, b):
        return self._root(a) is self._root(b)
    
    def union(self, a, b):
        if not self.find(a,b):
            self._parent[self._root(b)] = self._root(a)
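
A point of Python semantics matters for the one-line update in `_root`: the right-hand side of a tuple assignment is evaluated first, but the targets are then assigned from left to right. If `item` is reassigned before `self._parent[item]`, the subscript uses the new value of `item` and the update has no effect. The sketch below writes the parent update first. The helper names `root_halving` and `depth` and the sample path are made up for illustration.

```python
def root_halving(parent, item):
    """Find the root while pointing each visited node at its grandparent."""
    while item != parent[item]:
        p = parent[item]
        # Update parent[item] first, then step to the old parent.
        parent[item], item = parent[p], p
    return item

def depth(parent, item):
    """Count the parent links between item and its root."""
    steps = 0
    while item != parent[item]:
        item = parent[item]
        steps += 1
    return steps

# A path 0 -> 1 -> 2 -> ... -> 8, where 8 is the root.
parent = {i: i + 1 for i in range(8)}
parent[8] = 8

print(depth(parent, 0))          # 8
print(root_halving(parent, 0))   # 8
print(depth(parent, 0))          # 4
```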

In [5]:
# Path compression with two passes.
# Retraverse the path to the root, pointing every node all the way up to the new root.
# two pass
class DisjointSetsTwoPassPC:
    def __init__(self, L):
        self._parent = {item : item for item in L}
    
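    def _root(self, item):
        # First pass: find the root.
        root = item
        while root is not self._parent[root]:
            root = self._parent[root]
        # Second pass: point every node on the path at the root.
        self._compress(item, root)
        return root
    
    def _compress(self, item, newroot):
        while item is not newroot:
            # Assign self._parent[item] first: targets are assigned left to right.
            self._parent[item], item = newroot, self._parent[item]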
    
    def find(self, a, b):
        return self._root(a) is self._root(b)
    
    def union(self, a, b):
        if not self.find(a,b):
            self._parent[self._root(b)] = self._root(a)

You may squeeze out a little improvement by removing the redundancy
involved in calling _root twice for each item in the union method.
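
One way to remove that redundancy, sketched here under the assumption that two-pass compression is used, is to look up each root once in `union` and compare the results directly. The class name `DisjointSetsOneLookup` is hypothetical.

```python
class DisjointSetsOneLookup:
    def __init__(self, L):
        self._parent = {item: item for item in L}

    def _root(self, item):
        # First pass: find the root.
        root = item
        while root != self._parent[root]:
            root = self._parent[root]
        # Second pass: point every node on the path at the root.
        while item != root:
            self._parent[item], item = root, self._parent[item]
        return root

    def find(self, a, b):
        return self._root(a) == self._root(b)

    def union(self, a, b):
        # Each root is computed exactly once.
        ra, rb = self._root(a), self._root(b)
        if ra != rb:
            self._parent[rb] = ra

ds = DisjointSetsOneLookup(range(6))
for a, b in [(0, 1), (2, 3), (1, 3), (4, 5)]:
    ds.union(a, b)
print(ds.find(0, 2))  # True
print(ds.find(0, 4))  # False
```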

22.4 Merge by Height

Another way you might try to keep paths short is to be just a little more
careful about who gets to be the new root when doing a union operation.
The root of the taller tree should become the new root. Then the height
will not increase unless you are merging two trees of the same height.

In [6]:
# Merge by height
class DisjointSetsMergeByHeight:
    def __init__(self, L):
        self._parent = {item : item for item in L}
        self._height = {item : 0 for item in L}
    
    def _root(self, item):
        while item is not self._parent[item]:
            item = self._parent[item]
        return item
    
    def find(self, a, b):
        return self._root(a) is self._root(b)
    
    def union(self, a, b):
        # Heights are only meaningful at roots, so look up both roots first.
        a, b = self._root(a), self._root(b)
        if a is not b:
            if self._height[a] < self._height[b]:
                a,b = b,a
            self._parent[b] = a
            self._height[a] = max(self._height[a], self._height[b] + 1)
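
The height bookkeeping only makes sense for roots, so the sketch below looks up both roots first and keys the height comparison on them. It then merges 64 singletons pairwise, always joining trees of equal height, which is the case in which the height grows. Even so, the tallest tree ends up with height log2(64) = 6. The class name `HeightForest` and the `depth` helper are hypothetical.

```python
class HeightForest:
    def __init__(self, L):
        self._parent = {item: item for item in L}
        self._height = {item: 0 for item in L}

    def _root(self, item):
        while item != self._parent[item]:
            item = self._parent[item]
        return item

    def find(self, a, b):
        return self._root(a) == self._root(b)

    def union(self, a, b):
        ra, rb = self._root(a), self._root(b)
        if ra == rb:
            return
        # Make ra the root of the taller tree.
        if self._height[ra] < self._height[rb]:
            ra, rb = rb, ra
        self._parent[rb] = ra
        # The height grows only when the two trees were equally tall.
        self._height[ra] = max(self._height[ra], self._height[rb] + 1)

def depth(ds, item):
    """Count the parent links between item and the root of its tree."""
    steps = 0
    while item != ds._parent[item]:
        item = ds._parent[item]
        steps += 1
    return steps

n = 64
ds = HeightForest(range(n))
step = 1
while step < n:
    # Join neighboring trees, each of which currently holds `step` items.
    for i in range(0, n, 2 * step):
        ds.union(i, i + step)
    step *= 2

tallest = max(depth(ds, i) for i in range(n))
print(tallest)  # 6
```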

22.5 Merge By Weight

Instead of looking at the heights of the trees, one could look at the number
of nodes in the trees. If one tree has more nodes, maybe it is also taller. The
advantage over merge by height is that this information is not affected by
path compression. Therefore we can (and will soon) combine these tricks.

In [7]:
# Merge by weight
class DisjointSetsMergeByWeight:
    def __init__(self, L):
        self._parent = {item : item for item in L}
        self._weight = {item : 1 for item in L}
    
    def _root(self, item):
        while item is not self._parent[item]:
            item = self._parent[item]
        return item
    
    def find(self, a, b):
        return self._root(a) is self._root(b)
    
    def union(self, a, b):
        # Weights are only meaningful at roots, so look up both roots first.
        a, b = self._root(a), self._root(b)
        if a is not b:
            if self._weight[a] < self._weight[b]:
                a,b = b,a
            self._parent[b] = a
            self._weight[a] += self._weight[b]
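
A side benefit of tracking weights is that the weight stored at a root is exactly the size of its set. The sketch below keeps weights keyed on roots and adds a hypothetical `size` method (not part of the original class) to read that value back. The class name `WeightForest` is also made up for this example.

```python
class WeightForest:
    def __init__(self, L):
        self._parent = {item: item for item in L}
        self._weight = {item: 1 for item in L}

    def _root(self, item):
        while item != self._parent[item]:
            item = self._parent[item]
        return item

    def find(self, a, b):
        return self._root(a) == self._root(b)

    def union(self, a, b):
        ra, rb = self._root(a), self._root(b)
        if ra == rb:
            return
        # Make ra the root of the heavier tree.
        if self._weight[ra] < self._weight[rb]:
            ra, rb = rb, ra
        self._parent[rb] = ra
        self._weight[ra] += self._weight[rb]

    def size(self, a):
        """Number of items in the set containing a (hypothetical helper)."""
        return self._weight[self._root(a)]

ds = WeightForest("abcdef")
ds.union("a", "b")
ds.union("c", "d")
ds.union("b", "d")
print(ds.size("c"))  # 4
print(ds.size("e"))  # 1
```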

22.6 Combining Heuristics

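As mentioned before, we can use both heuristics, combining merge by weight
and path compression. It turns out that this is very efficient, both in theory
and in practice. The running time of n operations is (as close as you will
ever be able to tell) proportional to n. More precisely, it is O(n α(n)),
where α is the inverse Ackermann function, which grows so slowly that it is
at most 4 for any n that could arise in practice.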

In [8]:
# Merge by weight and path compression
class DisjointSets:
    def __init__(self, L):
        self._parent = {item : item for item in L}
        self._weight = {item : 1 for item in L}
    
    def _root(self, item):
        root = item
        while root is not self._parent[root]:
            root = self._parent[root]
        self._compress(item, root)
        return root
    
    def _compress(self, item, newroot):
        while item is not newroot:
            # Assign self._parent[item] first: targets are assigned left to right.
            self._parent[item], item = newroot, self._parent[item]
    
    def find(self, a, b):
        return self._root(a) is self._root(b)
    
    def union(self, a, b):
        # Weights are only meaningful at roots, so look up both roots first.
        a, b = self._root(a), self._root(b)
        if a is not b:
            if self._weight[a] < self._weight[b]:
                a,b = b,a
            self._parent[b] = a
            self._weight[a] += self._weight[b]
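
To get a feel for how short the paths stay, one can count the parent links followed during root lookups. The sketch below is an informal experiment, not a proof. The class name `CountingDisjointSets`, the `hops` and `calls` counters, the problem size, and the random workload are all assumptions made for this example.

```python
import random

class CountingDisjointSets:
    """Merge by weight plus two-pass compression, with a hop counter."""
    def __init__(self, L):
        self._parent = {item: item for item in L}
        self._weight = {item: 1 for item in L}
        self.hops = 0    # parent links followed while searching for roots
        self.calls = 0   # number of root lookups

    def _root(self, item):
        self.calls += 1
        root = item
        while root != self._parent[root]:
            root = self._parent[root]
            self.hops += 1
        # Second pass: point every node on the path at the root.
        while item != root:
            self._parent[item], item = root, self._parent[item]
        return root

    def find(self, a, b):
        return self._root(a) == self._root(b)

    def union(self, a, b):
        ra, rb = self._root(a), self._root(b)
        if ra == rb:
            return
        if self._weight[ra] < self._weight[rb]:
            ra, rb = rb, ra
        self._parent[rb] = ra
        self._weight[ra] += self._weight[rb]

n = 10_000
rng = random.Random(0)   # fixed seed so that runs are repeatable
ds = CountingDisjointSets(range(n))
for _ in range(n):
    ds.union(rng.randrange(n), rng.randrange(n))
    ds.find(rng.randrange(n), rng.randrange(n))

average = ds.hops / ds.calls
print(f"{ds.calls} root lookups, {average:.2f} hops on average")
```

Merge by weight alone already guarantees that no lookup follows more than log2(n) links, and compression can only shorten paths further, so the reported average should be a small number.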

22.7 Kruskal’s Algorithm

It is natural to think about the union and find operations in terms of
graphs. That is, you can think of union as adding an edge to a graph and
find as answering if two vertices are connected. This is a useful perspective
and it naturally leads to an algorithm for computing minimum spanning
trees (MSTs). The idea is simple: sort the edges by weight. Then try to add
in the edges, one at a time, as long as the edge doesn’t form a cycle. An
edge forms a cycle exactly when find reports that its endpoints are already
connected. The result will be an MST.

In [9]:
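from ds2.disjointsets import DisjointSets
from ds2.graph import Graph

def kruskall(G):
    V = list(G.vertices())
    UF = DisjointSets(V)
    # Consider the edges from lightest to heaviest.
    edges = sorted(G.edges(), key=lambda e: G.wt(*e))
    T = Graph(V, set())
    for u, v in edges:
        # Keep the edge only if it joins two different components.
        if not UF.find(u, v):
            UF.union(u, v)
            T.addedge(u, v)
    return T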

In [10]:
from ds2.graph import Graph

G = Graph({1,2,3,4,5},
        {(1,2,1),
        (1,3,4),
        (2,3,2),
        (2,4,4),
        (5,3,1),
        })
MST = kruskall(G)
print(list(MST.edges()))

[frozenset({2, 4}), frozenset({2, 3}), frozenset({3, 5}), frozenset({1, 2})]
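
For readers without the `ds2` package at hand, here is a self-contained sketch of the same idea that works on a plain list of `(u, v, weight)` tuples. The function name `kruskal_edges` and the small `_UnionFind` class are stand-ins written for this example, not the `ds2` API.

```python
class _UnionFind:
    def __init__(self, items):
        self._parent = {x: x for x in items}
        self._weight = {x: 1 for x in items}

    def root(self, x):
        r = x
        while r != self._parent[r]:
            r = self._parent[r]
        # Point every node on the path at the root.
        while x != r:
            self._parent[x], x = r, self._parent[x]
        return r

    def union(self, a, b):
        """Merge the sets of a and b; return False if already merged."""
        ra, rb = self.root(a), self.root(b)
        if ra == rb:
            return False
        if self._weight[ra] < self._weight[rb]:
            ra, rb = rb, ra
        self._parent[rb] = ra
        self._weight[ra] += self._weight[rb]
        return True

def kruskal_edges(vertices, edges):
    """Return the edges of a minimum spanning forest as (u, v, weight) tuples."""
    uf = _UnionFind(vertices)
    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if uf.union(u, v):      # the edge joins two different components
            tree.append((u, v, w))
    return tree

edges = [(1, 2, 1), (1, 3, 4), (2, 3, 2), (2, 4, 4), (5, 3, 1)]
mst = kruskal_edges({1, 2, 3, 4, 5}, edges)
print(sum(w for _, _, w in mst))  # 8
print(len(mst))                   # 4
```

Having `union` report whether it merged anything lets the loop avoid a separate `find` call for each edge.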
