## Union-Find Data Structure

Let's build our first data structure, Union-Find, and then use it with some of the sparse-matrix material.

For this one it's easiest to use object-oriented programming.

"In some applications we wish to know simply whether or not a vertex x is connected to a vertex y in a graph; the actual path connecting them may not be relevant. The efficient algorithms that have been developed are of independent interest because they can also be used for processing sets (collections of objects)."

"Graphs correspond to sets of objects in a natural way: vertices correspond to objects and edges mean 'is in the same set as.' Each connected component corresponds to a different set. For sets, we're interested in the fundamental question 'is x in the same set as y?' This clearly corresponds to the fundamental graph question 'is vertex x connected to vertex y?'"

"Our objective is to write a function that can check if two vertices x and y are in the same set (or, in the graph representation, the same connected component) and, if not, can put them in the same set (put an edge between them in the graph). From the correspondence with the set problem, the addition of a new edge is called a <i>union</i> operation and the queries are called <i>find</i> operations."

"Instead of building a direct adjacency-list or other representation of the graph, we'll gain efficiency by using an internal structure specifically oriented towards supporting the <i>union</i> and <i>find</i> operations. This internal structure will be a <i>forest of trees</i>, one for each connected component. We need to be able to find out if two vertices belong to the same tree and to be able to combine two trees into one."

"It must be emphasized that the only relationship between these union-find trees and the underlying graph with the given edges is that they divide the vertices into sets in the same way. There is no correspondence between the paths that connect nodes in the tree and the paths that connect nodes in the graph."

One practical consequence is that if two vertices are already connected, even by a very long path, then adding a new edge between them (i.e. taking the union of the two vertices) changes nothing in the forest: the union algorithm finds the two roots, notices that they are the same, and stops.
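
Here is a minimal sketch of that behaviour, using a bare parent-pointer list instead of a class. The names `parent`, `find_root`, and `link` are illustrative assumptions, and `link` returns whether it actually had to change a pointer.

```python
# Illustrative sketch: `parent`, `find_root`, and `link` are assumed names.
parent = list(range(5))  # every vertex starts as the root of its own tree

def find_root(x):
    """Follow parent pointers until reaching a vertex that points to itself."""
    while parent[x] != x:
        x = parent[x]
    return x

def link(x, y):
    """Union x and y; return True only if a pointer had to change."""
    x_root, y_root = find_root(x), find_root(y)
    if x_root == y_root:
        return False  # already in the same set: nothing to do
    parent[y_root] = x_root
    return True

# Connect 0-1-2-3-4 by a long path.
merged = [link(0, 1), link(1, 2), link(2, 3), link(3, 4)]
before = list(parent)

# Adding the edge (4, 0) closes a cycle; the forest is left untouched.
redundant = link(4, 0)
print(merged, redundant, parent == before)
```

The first four calls each merge two trees, while the last call only performs the two root lookups.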

In [1]:
class Graph(object):
    """Undirected Graph with Union-Find Data Structure"""
    
    def __init__(self, n):
        #Check the type first, so that a non-integer n gets a clear error message
        if(not isinstance(n, int)):
            raise TypeError("n must be an integer")
        if(n < 1):
            raise ValueError("n must be at least one")
        self.order = n
        self.points_to = list(range(self.order)) #points_to[i] is the parent of i; a root points to itself
        
        #We won't be using the following two for union-find, but they're useful in general
        self.size = 0
        self.edges = {}
        
    def find(self, x):
        """Return the root of the tree containing x"""
        if(not 0 <= x < self.order):
            raise IndexError("x must be between 0 and n - 1")
        out = x
        #Follow the pointers until we reach a vertex that points to itself
        while(out != self.points_to[out]):
            out = self.points_to[out]
        return out
        
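    def union(self, x, y, weight = 1):
        """Add the edge (x, y) and merge the trees containing x and y"""
        if(not (0 <= x < self.order and 0 <= y < self.order)):
            raise IndexError("x and y must be between 0 and n - 1")

        y_root = self.find(y)
        x_root = self.find(x)
        if(x_root != y_root):
            self.points_to[y_root] = x_root #y's tree now hangs off x's root

        #Record the edge in both directions, whether or not the trees were merged
        self.edges[(x,y)] = weight
        self.edges[(y,x)] = weight
        self.size += weight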
            
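    def connected(self):
        """Checks to see if the graph is all connected"""
        #Compare every root with the root of the last vertex. We must call find here:
        #points_to[order - 1] is only that vertex's parent, which need not be a root.
        root = self.find(self.order - 1)
        for i in range(self.order - 1):
            if(self.find(i) != root):
                return False
        return True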
        
        

G = Graph(5) 

G.union(0,1)
G.union(1,2)
G.union(2,3)

print(G.connected())

G.union(3,4)
G.union(4,0)

print(G.connected())
G.points_to

False
True


[0, 0, 0, 0, 0]

"The algorithm described above has bad worst-case performance because the trees formed can be degenerate." For example, just by swapping the two arguments of each union as below, we can see that this "produces a long chain with 0 pointing to 1, 1 pointing to 2, etc."

In [2]:
G = Graph(5) 

G.union(1,0)
G.union(2,1)
G.union(3,2)

print(G.connected())

G.union(4,3)
#G.union(0,4)

print(G.connected())
G.points_to

False
True


[1, 2, 3, 4, 4]

"This kind of structure takes time proportional to $V^2$ to build, and has time proportional to $V$ for an average equivalence test." Here $V$ is the number of vertices.

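One way to see the quadratic growth is to count pointer hops. The sketch below is an illustration rather than a benchmark from the quoted text: `count_hops` is a hypothetical helper, and the particular sequence of unions (each new vertex is unioned with vertex 0) is an assumption chosen so that every `find` has to walk the whole chain.

```python
def count_hops(v):
    """Build a chain on v vertices and count the pointer hops made by find."""
    points_to = list(range(v))
    hops = 0

    def find(x):
        nonlocal hops
        while points_to[x] != x:
            x = points_to[x]
            hops += 1
        return x

    # Always union the next vertex with vertex 0, the deepest node so far.
    for k in range(1, v):
        x_root, y_root = find(k), find(0)
        if x_root != y_root:
            points_to[y_root] = x_root  # same linking rule as union above
    return hops, points_to

for v in (5, 10, 20, 40):
    hops, _ = count_hops(v)
    print(v, hops)
```

Under these assumptions the k-th union walks k - 1 pointers, so the total is (V - 1)(V - 2)/2 hops: doubling V roughly quadruples the work.
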
"Several methods have been suggested to deal with this problem. When a tree rooted at i is to be merged with a tree rooted at j, one of the nodes must remain a root and the other (and all its descendants) must go one level down the tree. To minimize the distance to the root for the most nodes, it makes sense to take as the root the node with more descendants. This idea is called <i>weight balancing</i>."

Also, "Ideally, we would like every node to point directly to the root of its tree.... We can approach the ideal by making all the nodes we do examine point to the root. This method is called <i>path compression</i>."
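
Both ideas fit in a few lines. What follows is a minimal sketch, not the quoted book's implementation: the class name `WeightedUnionFind` and the `weight` list (the number of nodes in each root's tree) are assumptions made for illustration, and the edge bookkeeping of `Graph` is left out.

```python
class WeightedUnionFind:
    """Union-find with weight balancing and path compression (illustrative)."""

    def __init__(self, n):
        self.points_to = list(range(n))  # parent pointers; a root points to itself
        self.weight = [1] * n            # number of nodes in the tree rooted at i

    def find(self, x):
        # First pass: walk up to the root.
        root = x
        while self.points_to[root] != root:
            root = self.points_to[root]
        # Second pass (path compression): point every examined node at the root.
        while self.points_to[x] != root:
            parent = self.points_to[x]
            self.points_to[x] = root
            x = parent
        return root

    def union(self, x, y):
        x_root, y_root = self.find(x), self.find(y)
        if x_root == y_root:
            return
        # Weight balancing: the root with more descendants stays the root.
        if self.weight[x_root] < self.weight[y_root]:
            x_root, y_root = y_root, x_root
        self.points_to[y_root] = x_root
        self.weight[x_root] += self.weight[y_root]

# The same sequence of unions that produced the long chain earlier.
uf = WeightedUnionFind(5)
for x, y in [(1, 0), (2, 1), (3, 2), (4, 3)]:
    uf.union(x, y)
print(uf.points_to)
```

With weight balancing, each single vertex is attached under the root of the larger tree, so this sequence of unions gives a flat tree instead of a chain.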

In [6]:
%pylab inline
from numpy.random import choice
#Let's do a square lattice in 2-d
def make_sq_lattice(n):
    nn = n**2
    out = [[0 for i in range(nn)] for j in range(nn)]
    for i in range(n):
        for j in range(n):
            if(j+1 < n):
                out[n*i+j][n*i+j+1] = 1
            if(j-1 >= 0):
                out[n*i+j][n*i+j-1] = 1
            if(i+1 < n):
                out[n*i+j][n*(i+1)+j] = 1
            if(i-1 >= 0):
                out[n*i+j][n*(i-1)+j] = 1
    return array(out)
G = make_sq_lattice(4)
G

Populating the interactive namespace from numpy and matplotlib


array([[0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0]])
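
To tie the lattice back to union-find, the sketch below walks an adjacency matrix, unions the endpoints of every edge, and counts the distinct roots. It is a hedged illustration: `lattice_adjacency` is a plain-list re-creation of `make_sq_lattice` so that the block runs without NumPy, and `count_components` is a hypothetical helper.

```python
def lattice_adjacency(n):
    """Adjacency matrix (nested lists) of an n-by-n square lattice."""
    nn = n * n
    adj = [[0] * nn for _ in range(nn)]
    for i in range(n):
        for j in range(n):
            here = n * i + j
            if j + 1 < n:  # right neighbour
                adj[here][here + 1] = adj[here + 1][here] = 1
            if i + 1 < n:  # neighbour in the next row
                adj[here][here + n] = adj[here + n][here] = 1
    return adj

def count_components(adj):
    """Union the endpoints of every edge, then count distinct roots."""
    points_to = list(range(len(adj)))

    def find(x):
        while points_to[x] != x:
            x = points_to[x]
        return x

    for x, row in enumerate(adj):
        for y in range(x + 1, len(row)):  # upper triangle: each undirected edge once
            if row[y]:
                x_root, y_root = find(x), find(y)
                if x_root != y_root:
                    points_to[y_root] = x_root
    return len({find(v) for v in range(len(adj))})

adj = lattice_adjacency(4)
print(count_components(adj))  # the full lattice

# Cut every edge between row 1 and row 2 (sites 4-7 and sites 8-11).
for j in range(4):
    adj[4 + j][8 + j] = adj[8 + j][4 + j] = 0
print(count_components(adj))  # the lattice after the cut
```

The full lattice comes out as a single component, and cutting it between two rows leaves two.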

In [7]:
n = 4
G = make_sq_lattice(n)
state = [0 for i in range(n**2)]
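#let's put 120 heat units on site 0
state[0] = 120
#state = array([ 0,  3,  0, 11,  1,  0, 15,  0,  0, 25,  0, 26,  6,  0, 33,  0])

def diffuse(state, steps):
    """Move every heat unit to a random neighbouring site, steps times (uses the global n and G)"""
    state = list(state) #work on a copy so the caller's list is not emptied
    for s in range(steps):
        new_state = [0 for i in range(n**2)]
        for i in range(n**2):
            while(state[i]>0):
                state[i] -= 1
                #row i of G, normalised, gives equal probability to each neighbour of i
                next_pos = choice(len(G), p = G[i]/sum(G[i]))
                new_state[next_pos] += 1
        state = new_state
    return array(state)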

diffuse(state, 5)

array([ 0, 27,  0,  9, 22,  0,  7,  0,  0, 27,  0,  9, 16,  0,  3,  0])