# Graphs

## 1. Disjoint Set

This section covers the “disjoint set” data structure, also known as the “union-find” data structure. Note that some sources refer to it as an algorithm.

The primary use of disjoint sets is to answer connectivity questions about the components of a network. The “network” here can be a computer network or a social network. For instance, we can use a disjoint set to determine whether two people share a common ancestor.

**Implementation:**
1. The find function finds the root node of a given vertex.
2. The union function merges the subsets of two vertices so that they share the same root node.


**Quick Find:** the time complexity of the find function is O(1). However, the union function takes O(N) time, where N is the number of vertices in the graph. We need O(N) space to store the array of size N.

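**Quick Union:** compared with the Quick Find implementation, the union function relinks a single root instead of scanning the whole array, but it must first find both roots. The find function takes more time in this case: O(N) in the worst case, when a tree degenerates into a chain. Because union calls find, its worst-case time complexity is also O(N); it is cheaper than Quick Find's union only while the trees stay shallow.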

### 1.1. QuickFind - Disjointset
The idea is to define a root list that stores the root of every node. 
Nodes in a subset can be connected to one another in any way, but they all share one root node, which is stored in the root list. 
The find function can therefore return the root of a vertex with a single list lookup. 
However, the union function is a bit more complicated. 
If two nodes have the same root, they are in the same subset and there is nothing to combine. 
If they have different roots, we need to merge their subsets. 
To do so, we iterate over all nodes in the root list. If the root of a node is identical to the root of one of the subsets, we replace it with the root of the other subset.

In [3]:
# UnionFind class
class UnionFind:
    def __init__(self, size):
        self.root = [i for i in range(size)] # we keep the root node of each node here.

    def find(self, x):
        return self.root[x]
    
    def union(self, x, y):
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX != rootY:
            for i in range(len(self.root)):
                if self.root[i] == rootY:
                    self.root[i] = rootX

    def connected(self, x, y):
        return self.find(x) == self.find(y)
    
# Test Case
uf = UnionFind(10)
# 1-2-5-6-7 3-8-9 4
uf.union(1, 2)
uf.union(2, 5)
uf.union(5, 6)
uf.union(6, 7)
uf.union(3, 8)
uf.union(8, 9)
print(uf.connected(1, 5))  # true
print(uf.connected(5, 7))  # true
print(uf.connected(4, 9))  # false
# 1-2-5-6-7 3-8-9-4
uf.union(9, 4)
print(uf.connected(4, 9))  # true

True
True
False
True


### 1.2. Quick Union - Disjoint Set
The idea in quick union is that the root list contains the parent of each node. 
At the beginning, the parent of each node is itself. 
The find function then needs to walk up through the parents of a node to reach the root of its subset. 
The union function, in turn, does less work of its own. 
To combine two subsets, we find their root nodes. 
We select one of the subsets.
Then we set the parent of the other subset's root to the root of the selected subset. 



In [4]:
# UnionFind class
class UnionFind:
    def __init__(self, size):
        self.root = [i for i in range(size)]
    def find(self, x):
        while x != self.root[x]:
            x = self.root[x]
        return x

    def union(self, x, y):
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX != rootY:
            self.root[rootY] = rootX

    def connected(self, x, y):
        return self.find(x) == self.find(y)


# Test Case
uf = UnionFind(10)
# 1-2-5-6-7 3-8-9 4
uf.union(1, 2)
uf.union(2, 5)
uf.union(5, 6)
uf.union(6, 7)
uf.union(3, 8)
uf.union(8, 9)
print(uf.connected(1, 5))  # true
print(uf.connected(5, 7))  # true
print(uf.connected(4, 9))  # false
# 1-2-5-6-7 3-8-9-4
uf.union(9, 4)
print(uf.connected(4, 9))  # true

True
True
False
True


### 1.3. Union by Rank - Disjoint Set
In both quick find and quick union, when we combine two subsets, we pick the new root arbitrarily (the code above always keeps the root of the first argument). An arbitrary choice like this is often a place where an algorithm can be improved with a heuristic. In the disjoint set data structure, it is more efficient to keep the root of the subset with the larger height. Therefore, the subset with the smaller height becomes a child of the subset with the larger height, and the height of the merged tree grows only when both heights are equal.
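
To make the effect of the heuristic concrete, here is a minimal sketch that builds the same subset twice, once with plain quick union and once with union by rank, and compares the deepest node in each tree. The helper names (`find_root`, `find_depth`, `union_naive`, `union_by_rank`) are made up for this illustration, and the union order is chosen on purpose to be the worst case for plain quick union.

```python
def find_root(parent, x):
    """Walk up the parent links until reaching a node that is its own parent."""
    while parent[x] != x:
        x = parent[x]
    return x


def find_depth(parent, x):
    """Count the edges between x and the root of its subset."""
    depth = 0
    while parent[x] != x:
        x = parent[x]
        depth += 1
    return depth


def union_naive(parent, x, y):
    # Plain quick union: always hang y's root under x's root.
    root_x, root_y = find_root(parent, x), find_root(parent, y)
    if root_x != root_y:
        parent[root_y] = root_x


def union_by_rank(parent, rank, x, y):
    root_x, root_y = find_root(parent, x), find_root(parent, y)
    if root_x == root_y:
        return
    # Make root_x the root with the larger (or equal) rank.
    if rank[root_x] < rank[root_y]:
        root_x, root_y = root_y, root_x
    parent[root_y] = root_x
    if rank[root_x] == rank[root_y]:
        rank[root_x] += 1


n = 8
naive = list(range(n))
ranked, rank = list(range(n)), [1] * n
for i in range(n - 1):
    union_naive(naive, i + 1, i)
    union_by_rank(ranked, rank, i + 1, i)

naive_depth = max(find_depth(naive, v) for v in range(n))
ranked_depth = max(find_depth(ranked, v) for v in range(n))
print(naive_depth, ranked_depth)  # 7 1
```

With this union order, plain quick union produces a chain of length N - 1, while union by rank keeps every node directly under the root.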

Time complexity

1) Constructor: O(N)

2) Find time complexity: O(logN)

3) Union time complexity: O(logN)

4) Connected time complexity: O(logN)

Space complexity

1) Overall: O(N)


In [5]:
# UnionFind class
class UnionFind:
    def __init__(self, size):
        self.root = [i for i in range(size)]
        self.rank = [1] * size

    def find(self, x):
        while x != self.root[x]:
            x = self.root[x]
        return x
    
    def union(self, x, y):
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX != rootY:
            if self.rank[rootX] > self.rank[rootY]:
                self.root[rootY] = rootX
            elif self.rank[rootX] < self.rank[rootY]:
                self.root[rootX] = rootY
            else:
                self.root[rootY] = rootX
                self.rank[rootX] += 1

    def connected(self, x, y):
        return self.find(x) == self.find(y)


# Test Case
uf = UnionFind(10)
# 1-2-5-6-7 3-8-9 4
uf.union(1, 2)
uf.union(2, 5)
uf.union(5, 6)
uf.union(6, 7)
uf.union(3, 8)
uf.union(8, 9)
print(uf.connected(1, 5))  # true
print(uf.connected(5, 7))  # true
print(uf.connected(4, 9))  # false
# 1-2-5-6-7 3-8-9-4
uf.union(9, 4)
print(uf.connected(4, 9))  # true

True
True
False
True


### 1.4. Path Compression - Disjoint Set
The main idea here is to improve the performance of the find function. If we look up the root of the same node multiple times, we walk the same path from that node to the root every time. So after finding the root node, we can update the parent of every traversed element to point directly at the root. Recursion is a natural way to implement this idea, because the same function is applied to each node along the path.
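
As a complement to the recursive formulation, the sketch below shows an iterative two-pass variant of the same idea. It is an illustration rather than the version used in this notebook: the function name `find_compress` and the standalone `parent` list are assumptions made for the example. An iterative version can be handy in Python because very long parent chains could otherwise run into the interpreter's recursion limit.

```python
def find_compress(parent, x):
    """Return the root of x and point every node on the path directly at it."""
    # First pass: walk up to the root without modifying anything.
    root = x
    while parent[root] != root:
        root = parent[root]
    # Second pass: re-walk the path and relink each node to the root.
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root


# A chain 4 -> 3 -> 2 -> 1 -> 0, where 0 is the root.
parent = [0, 0, 1, 2, 3]
print(find_compress(parent, 4))  # 0
print(parent)                    # [0, 0, 0, 0, 0]
```

After a single call, every node that was on the path has the root as its parent, so later lookups for those nodes take one step.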

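Time complexity

1) Constructor: O(N)

2) Find time complexity: O(logN) amortized with path compression alone; O(α(N)) amortized when combined with union by rank, as in the code below, where α is the inverse Ackermann function

3) Union time complexity: same as find

4) Connected time complexity: same as find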



In [7]:
# UnionFind class
class UnionFind:
    def __init__(self, size):
        self.root = [i for i in range(size)]
        self.rank = [1] * size
        # Use a rank array to record the height of each vertex, i.e., the "rank" of each vertex.
        # The initial "rank" of each vertex is 1, because each of them is
        # a standalone vertex with no connection to other vertices.
        

    # The find function here is the same as that in the disjoint set with path compression.
    def find(self, x):
        if x == self.root[x]:
            return x
        self.root[x] = self.find(self.root[x])
        return self.root[x]

    # The union function with union by rank
    def union(self, x, y):
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX != rootY:
            if self.rank[rootX] > self.rank[rootY]:
                self.root[rootY] = rootX
            elif self.rank[rootX] < self.rank[rootY]:
                self.root[rootX] = rootY
            else:
                self.root[rootY] = rootX
                self.rank[rootX] += 1

    def connected(self, x, y):
        return self.find(x) == self.find(y)


# Test Case
uf = UnionFind(10)
# 1-2-5-6-7 3-8-9 4
uf.union(1, 2)
uf.union(2, 5)
uf.union(5, 6)
uf.union(6, 7)
uf.union(3, 8)
uf.union(8, 9)
print(uf.connected(1, 5))  # true
print(uf.connected(5, 7))  # true
print(uf.connected(4, 9))  # false
# 1-2-5-6-7 3-8-9-4
uf.union(9, 4)
print(uf.connected(4, 9))  # true

True
True
False
True


## Applications of Disjoint Set

1) **Finding the number of partitions (connected subgraphs) in a graph**
    
    **idea:** keep a count variable initialized to the number of nodes and decrease it by 1 every time a union actually merges two subsets.
    
2) **Finding a loop in a graph (is the graph a tree?)**
    
    **idea:** if the `rootX != rootY` condition in union fails, you are trying to connect two nodes that are already connected, so the new edge closes a loop.
    
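
Both ideas can be sketched with one small quick-union variant. The class name `CountingUnionFind`, its `count` attribute, and the boolean return value of `union` are assumptions introduced for this illustration; the problems below use their own classes.

```python
class CountingUnionFind:
    """Quick-union variant that tracks the number of subsets (illustrative helper)."""

    def __init__(self, size):
        self.root = list(range(size))
        self.count = size  # every node starts in its own subset

    def find(self, x):
        while x != self.root[x]:
            x = self.root[x]
        return x

    def union(self, x, y):
        """Merge the subsets of x and y; return False if they were already joined."""
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return False  # this edge would close a loop
        self.root[root_y] = root_x
        self.count -= 1  # two subsets became one
        return True


edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
uf = CountingUnionFind(6)
loop_edges = [edge for edge in edges if not uf.union(*edge)]
print(uf.count)    # 3 -> {0, 1, 2}, {3, 4}, {5}
print(loop_edges)  # [(0, 2)]
```

The edge `(0, 2)` is reported because nodes 0 and 2 are already connected through node 1 when it is processed.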

## Problem: Number of Provinces

There are ``n`` cities. Some of them are connected, while some are not. If city ``a`` is connected directly with city ``b``, and city ``b`` is connected directly with city ``c``, then city ``a`` is connected indirectly with city ``c``.

A province is a group of directly or indirectly connected cities and no other cities outside of the group.

You are given an ``n x n`` matrix ``isConnected``, where ``isConnected[i][j] = 1`` if the ith city and the jth city are directly connected, and ``isConnected[i][j] = 0`` otherwise.

Return the total number of provinces.


```
Input: isConnected = [[1,1,0],[1,1,0],[0,0,1]]
Output: 2
```

In [21]:
# solution
from typing import List
class Province:
    def __init__(self,n):
        self.root = [i for i in range(n)]
    
    def find(self, x):
        while x!= self.root[x]:
            x = self.root[x]
        return x
    
    def union(self, x, y):
        root_of_x = self.find(x)
        root_of_y = self.find(y)
        if root_of_x != root_of_y:
            self.root[root_of_x]= root_of_y

    def connected(self, x, y):
        return self.find(x)== self.find(y)
    
    def number_of_roots(self):
        # Each distinct root identifies one connected group.
        roots = set()
        for i in range(len(self.root)):
            roots.add(self.find(i))
        return len(roots)
    
class Solution:
    def __init__(self):
        pass
    def findCircleNum(self, isConnected: List[List[int]]) -> int:
        n = len(isConnected)
        province = Province(n)
        for i in range(n):
            for j in range(i, n):
                if isConnected[i][j] == 1:
                    province.union(i,j)
        return province.number_of_roots()
        
# test case
isConnected= [[1,1,0],[1,1,0],[0,0,1]]
solution = Solution()
num_of_province = solution.findCircleNum(isConnected)
print(f"num_of_province: {num_of_province}, reference: 2")



isConnected = [[1,0,0,1],[0,1,1,0],[0,1,1,1],[1,0,1,1]]
solution = Solution()
num_of_province = solution.findCircleNum(isConnected)
print(f"num_of_province: {num_of_province}, reference: 1")


num_of_province: 2, reference: 2
num_of_province: 1, reference: 1


In [24]:
# Solution 2: using union by rank and path compression
class Province():
    def __init__(self, n):
        self.root = [i for i in range(n)]
        self.rank = [1] * n
        self.count = n # number of provinces at the beginning is equal to the number of nodes
        
    def find(self, x): # path compression
        if x == self.root[x]:
            return x
        self.root[x] = self.find(self.root[x])
        return self.root[x]
    def union(self, x, y):
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX != rootY:
            if self.rank[rootX] < self.rank[rootY]:
                self.root[rootX] = rootY
            elif self.rank[rootY] < self.rank[rootX]:
                self.root[rootY]= rootX
            else:
                self.root[rootX] = rootY
                self.rank[rootY] += 1
            self.count -= 1

class Solution:
    def __init__(self):
        pass
    def findCircleNum(self, isConnected: List[List[int]]) -> int:
        n = len(isConnected)
        province = Province(n)
        for i in range(n):
            for j in range(i, n):
                if isConnected[i][j] == 1:
                    province.union(i,j)
        return province.count
        
# test case
isConnected= [[1,1,0],[1,1,0],[0,0,1]]
solution = Solution()
num_of_province = solution.findCircleNum(isConnected)
print(f"num_of_province: {num_of_province}, reference: 2")



isConnected = [[1,0,0,1],[0,1,1,0],[0,1,1,1],[1,0,1,1]]
solution = Solution()
num_of_province = solution.findCircleNum(isConnected)
print(f"num_of_province: {num_of_province}, reference: 1")

num_of_province: 2, reference: 2
num_of_province: 1, reference: 1


## Problem: Graph Valid Tree

You have a graph of ``n`` nodes labeled from ``0`` to ``n - 1``. You are given an integer ``n`` and a list of edges where ``edges[i] = [ai, bi]`` indicates that there is an undirected edge between nodes ``ai`` and ``bi`` in the graph.

Return true if the edges of the given graph make up a valid tree, and false otherwise.




In [27]:
from typing import List
class DS():
    def __init__(self, n):
        self.root = [i for i in range(n)]
        self.rank = [1] * n
    
    def find(self, x):
        if x == self.root[x]:
            return x
        self.root[x] = self.find(self.root[x])
        return self.root[x]
    
    def union(self, x, y):
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX != rootY:
            if self.rank[rootX] < self.rank[rootY]:
                self.root[rootX] = rootY
            elif self.rank[rootY] < self.rank[rootX]:
                self.root[rootY] = rootX
            else:
                self.root[rootX] = rootY
                self.rank[rootY] += 1
                
            return False
        else:
            return True

class Solution:
    def validTree(self, n: int, edges: List[List[int]]) -> bool:
        ds = DS(n)
        if len(edges)!= n-1:
            return False
        for u,v in edges:
            if ds.union(u,v):
                return False
        return True
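
The solution relies on the fact that a graph with `n` nodes is a tree exactly when it has `n - 1` edges and no cycle; with that edge count, the absence of a cycle also guarantees that every node is reachable. The following compact, self-contained sketch applies the same two checks to a few inputs. The function name `valid_tree` and the sample edge lists are made up for this illustration, and the sketch uses plain quick union rather than the ranked class above.

```python
def valid_tree(n, edges):
    # A tree on n nodes has exactly n - 1 edges.
    if len(edges) != n - 1:
        return False
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # u and v were already connected: cycle
        parent[root_u] = root_v
    return True


print(valid_tree(5, [[0, 1], [0, 2], [0, 3], [1, 4]]))          # True
print(valid_tree(5, [[0, 1], [1, 2], [2, 3], [1, 3], [1, 4]]))  # False, too many edges
print(valid_tree(4, [[0, 1], [1, 2], [2, 0]]))                  # False, cycle and node 3 is isolated
```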
            

## Problem: The Earliest Moment When Everyone Become Friends

There are ``n`` people in a social group labeled from ``0`` to ``n - 1``. 
You are given an array logs where ``logs[i] = [timestampi, xi, yi]`` indicates that ``xi`` and ``yi`` will be friends at the time ``timestampi``.

Friendship is symmetric. That means if ``a`` is friends with ``b``, then ``b`` is friends with ``a``. Also, person ``a`` is acquainted with a person ``b`` if ``a`` is friends with ``b``, or ``a`` is a friend of someone acquainted with ``b``.

Return the earliest time for which every person became acquainted with every other person. If there is no such earliest time, return -1.

 ```
Logs = [[0,2,0],[1,0,1],[3,0,3],[4,1,2],[7,3,1]], n = 4

Output: 3
 ```

In [23]:
# sort wrt time
# keep track of time
# union the nodes one by one
    # if nodes are not from the same group, then combine their groups + update the time
    # if nodes are from the same group, just pass 
#
from typing import List
class DS():
    def __init__(self, n):
        self.root = [i for i in range(n)]
        self.rank = [1] * n
        self.time = -1
        self.count_groups = n

    def find(self, x):
        if x == self.root[x]:
            return x
        self.root[x] = self.find(self.root[x])
        return self.root[x]
    
    def union(self, x, y, t):
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX != rootY: 
            if self.rank[rootX] < self.rank[rootY]:
                self.root[rootX] = rootY
            elif self.rank[rootY] < self.rank[rootX]:
                self.root[rootY] = rootX
            else:
                self.root[rootX] = rootY
                self.rank[rootY] += 1
            self.time = t
            self.count_groups -= 1
            
            
        
class Solution:
    def earliestAcq(self, logs: List[List[int]], n: int) -> int:
        
        logs = sorted(logs, key=lambda x:x[0])
        
        ds = DS(n)
        
        for [t, u, v] in logs:
            
            ds.union(u,v,t)
        
        # Everyone is acquainted once a single group remains.
        if ds.count_groups == 1:
            return ds.time
        return -1
        

        
        

In [24]:
# Test case
sl = Solution()
sl.earliestAcq(logs = [[20190101,0,1],
                       [20190104,3,4],
                       [20190107,2,3],
                       [20190211,1,5],
                       [20190224,2,4],
                       [20190301,0,3],
                       [20190312,1,2],
                       [20190322,4,5]]
, n = 6)



sl.earliestAcq(logs = [[9,3,0],
                       [0,2,1],
                       [8,0,1],
                       [1,3,2],
                       [2,2,0],
                       [3,3,1]
                      ], 
               n = 4)


2

## Problem: Smallest String With Swaps

You are given a string ``s``, and an array of pairs of indices in the string, ``pairs``, where ``pairs[i] = [a, b]`` indicates 2 indices (0-indexed) of the string.

You can swap the characters at any pair of indices in the given pairs any number of times.

Return the lexicographically smallest string that ``s`` can be changed to after using the swaps.

```
Input: s = "dcab", pairs = [[0,3],[1,2]]
Output: "bacd"
Explanation: 
Swap s[0] and s[3], s = "bcad"
Swap s[1] and s[2], s = "bacd"


Input: s = "dcab", pairs = [[0,3],[1,2],[0,2]]
Output: "abcd"
Explanation: 
Swap s[0] and s[3], s = "bcad"
Swap s[0] and s[2], s = "acbd"
Swap s[1] and s[2], s = "abcd"

Input: s = "cba", pairs = [[0,1],[1,2]]
Output: "abc"
Explanation: 
Swap s[0] and s[1], s = "bca"
Swap s[1] and s[2], s = "bac"
Swap s[0] and s[1], s = "abc"

```




In [63]:
# group items that are related to each other 
# sort items in each group
class DS():
    def __init__(self, n):
        self.root = [i for i in range(n)]
        self.rank = [1] * n
        
        
    def find(self, x):
        if x == self.root[x]:
            return x
        self.root[x] = self.find(self.root[x])
        return self.root[x]
    
    def union(self, x, y):
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX != rootY:
            if self.rank[rootX] < self.rank[rootY]:
                self.root[rootX] = rootY
            elif self.rank[rootY] < self.rank[rootX]:
                self.root[rootY] = rootX
            else:
                self.root[rootX] = rootY
                self.rank[rootY] += 1
            
        
class Solution:
    def smallestStringWithSwaps(self, s: str, pairs: List[List[int]]) -> str:
        ds = DS(len(s))
        for [u,v] in pairs:
            ds.union(u,v)
        output = ''
        group_ids = [ds.find(i) for i in range(len(s))]
       
        groups = {}
        for inx, char in enumerate(s):
            group_id = group_ids[inx]
            if group_id in groups:
                groups[group_id].append(char)
            else:
                groups[group_id] = [char]

        # Sort each group in descending order so that the smallest remaining
        # character can be taken from the end of the list in O(1).
        for group_id in groups:
            groups[group_id].sort(reverse=True)

        chars = []
        for i in range(len(s)):
            chars.append(groups[group_ids[i]].pop())
        return ''.join(chars)


In [64]:
# Test case
sl = Solution()
sl.smallestStringWithSwaps(s = "dcab", pairs = [[0,3],[1,2]]) # ->"bacd"
sl.smallestStringWithSwaps(s = "dcab", pairs = [[0,3],[1,2],[0,2]]) # ->"abcd"
sl.smallestStringWithSwaps(s = "cba", pairs = [[0,1],[1,2]]) # ->"abc"



'abc'

## Problem: Evaluate Division
You are given an array of variable pairs ``equations`` and an array of real numbers ``values``, where ``equations[i] = [Ai, Bi]`` and ``values[i]`` represent the equation ``Ai / Bi = values[i]``. Each ``Ai`` or ``Bi`` is a string that represents a single variable.

You are also given some queries, where ``queries[j] = [Cj, Dj]`` represents the jth query where you must find the answer for ``Cj / Dj = ?``.

Return the answers to all queries. If a single answer cannot be determined, return ``-1.0``.

Note: The input is always valid. You may assume that evaluating the queries will not result in division by zero and that there is no contradiction.

```
equations = [["a","b"],["b","c"]], 
values = [2.0,3.0], 
queries = [["a","c"],["b","a"],["a","e"],["a","a"],["x","x"]]
output = [6.00000,0.50000,-1.00000,1.00000,-1.00000]


equations = [["a","b"],["b","c"],["bc","cd"]], 
values = [1.5,2.5,5.0], 
queries = [["a","c"],["c","b"],["bc","cd"],["cd","bc"]]
Output: [3.75000,0.40000,5.00000,0.20000]

equations = [["a","b"]], 
values = [0.5], 
queries = [["a","b"],["b","a"],["a","c"],["x","y"]]
output = [0.50000,2.00000,-1.00000,-1.00000]
```

In [81]:
# 1. Is there any path between the two variables in a query? -> disjoint set
# 2. If so, we evaluate the query.
# How? Treat each equation as a weighted edge: moving along the edge direction
# multiplies by the weight, moving in the opposite direction divides by it.
# Each node stores its weight relative to its root, so whenever a root is
# re-attached, the stored weights of its subset must be rescaled to stay consistent.
class DS():
    def __init__(self, n):
        self.root = [i for i in range(n)]
        self.rank = [1] * n
        self.weight = [1] * n
        
    def find(self, x):
        if x == self.root[x]:
            return x
        self.root[x] = self.find(self.root[x])
        return self.root[x]

    def union(self, x, y, w):
        # Record the equation value(x) / value(y) = w.
        # Invariant: weight[i] == value(i) / value(root of i's subset).
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX != rootY:
            if self.rank[rootX] < self.rank[rootY]:
                self.attach(rootX, rootY, (w * self.weight[y]) / self.weight[x])
            elif self.rank[rootY] < self.rank[rootX]:
                self.attach(rootY, rootX, (1 / w / self.weight[y]) * self.weight[x])
            else:
                self.rank[rootY] += 1
                self.attach(rootX, rootY, (w * self.weight[y]) / self.weight[x])

    def attach(self, child, parent, factor):
        # Rescale every member of the child's subset (not only the direct
        # children of its root) so that weights are relative to the new root,
        # then link the old root under the new one.
        members = [i for i in range(len(self.root)) if self.find(i) == child]
        for i in members:
            self.weight[i] = self.weight[i] * factor
        self.root[child] = parent

from typing import List            
class Solution:
    def calcEquation(self, 
                     equations: List[List[str]], 
                     values: List[float], 
                     queries: List[List[str]]) -> List[float]:
        variables = {}
        for [x,y] in equations:
            if x not in variables:
                variables[x] = len(variables)
            if y not in variables:
                variables[y] = len(variables)
        n = len(variables)
        ds = DS(n)
        for inx, [x,y] in enumerate(equations):
            ds.union(variables[x],variables[y],values[inx])

        output = []
        for [x, y] in queries:
            if x not in variables or y not in variables:
                output.append(-1)
                continue
            x_id = variables[x]
            y_id = variables[y]
            if ds.find(x_id) != ds.find(y_id):
                output.append(-1)
                continue
            else:
                output.append(ds.weight[x_id]/ds.weight[y_id])
        return output     

In [82]:
# test case
sl = Solution()

equations = [["a","b"]]
values = [0.5]
queries = [["a","b"],["b","a"],["a","c"],["x","y"]]
output = [0.50000,2.00000,-1.00000,-1.00000]
print(sl.calcEquation(equations, values, queries))


equations = [["a","b"],["b","c"]]
values = [2.0,3.0]
queries = [["a","c"],["b","a"],["a","e"],["a","a"],["x","x"]]
output = [6.00000,0.50000,-1.00000,1.00000,-1.00000]
print(sl.calcEquation(equations, values, queries))

equations = [["a","b"],["b","c"],["bc","cd"]]
values = [1.5,2.5,5.0]
queries = [["a","c"],["c","b"],["bc","cd"],["cd","bc"]]
output = [3.75000,0.40000,5.00000,0.20000]
print(sl.calcEquation(equations, values, queries))


equations = [["x1","x2"],["x2","x3"],["x3","x4"],["x4","x5"]]
values = [3.0,4.0,5.0,6.0]
queries = [["x1","x5"],["x5","x2"],["x2","x4"],["x2","x2"],["x2","x9"],["x9","x9"]]
output = [360.00000,0.00833,20.00000,1.00000,-1.00000,-1.00000]
print(sl.calcEquation(equations, values, queries))


equations = [["a","b"],["e","f"],["b","e"]]
values =  [3.4,1.4,2.3]
queries = [["b","a"],["a","f"],["f","f"],["e","e"],["c","c"],["a","c"],["f","e"]]
output = [0.29412,10.94800,1.00000,1.00000,-1.00000,-1.00000,0.71429]
print(sl.calcEquation(equations, values, queries))

equations = [["a","b"],["c","b"],["d","b"],["w","x"],["y","x"],["z","x"],["w","d"]]
values =  [2.0,3.0,4.0,5.0,6.0,7.0,8.0]
queries = [["a","c"],["b","c"],["a","e"],["a","a"],["x","x"],["a","z"]]
output = [0.66667,0.33333,-1.00000,1.00000,1.00000,0.04464]
print(sl.calcEquation(equations, values, queries))
    

[0.5, 2.0, -1, -1]
[6.0, 0.5, -1, 1.0, -1]
[3.75, 0.4, 5.0, 0.2]
[360.0, 0.008333333333333333, 20.0, 1.0, -1, -1]
[0.29411764705882354, 10.947999999999999, 1.0, 1.0, -1, -1, 0.7142857142857143]
[0.6666666666666666, 0.3333333333333333, -1, 1.0, 1.0, 0.04464285714285714]
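
For comparison, a common alternative keeps each node's ratio relative to its parent and folds the ratios together during path compression, which avoids scanning the whole array on every union. The sketch below is one possible formulation under that assumption, not the approach used in the cells above; the class name `WeightedUnionFind` and its methods are hypothetical, and union by rank is left out to keep it short.

```python
class WeightedUnionFind:
    def __init__(self):
        self.parent = {}
        self.ratio = {}  # ratio[v] == value(v) / value(parent[v])

    def add(self, v):
        if v not in self.parent:
            self.parent[v] = v
            self.ratio[v] = 1.0

    def find(self, v):
        # Path compression: afterwards v points at the root and
        # ratio[v] == value(v) / value(root).
        if self.parent[v] != v:
            old_parent = self.parent[v]
            self.parent[v] = self.find(old_parent)
            self.ratio[v] *= self.ratio[old_parent]
        return self.parent[v]

    def union(self, x, y, w):
        # Record the equation value(x) / value(y) == w.
        self.add(x)
        self.add(y)
        root_x, root_y = self.find(x), self.find(y)
        if root_x != root_y:
            self.parent[root_x] = root_y
            # value(root_x) / value(root_y) follows from the three known ratios.
            self.ratio[root_x] = w * self.ratio[y] / self.ratio[x]

    def query(self, x, y):
        if x not in self.parent or y not in self.parent:
            return -1.0
        if self.find(x) != self.find(y):
            return -1.0
        return self.ratio[x] / self.ratio[y]


wuf = WeightedUnionFind()
wuf.union("a", "b", 2.0)
wuf.union("b", "c", 3.0)
answers = [wuf.query(*q) for q in [["a", "c"], ["b", "a"], ["a", "e"], ["a", "a"], ["x", "x"]]]
print(answers)  # [6.0, 0.5, -1.0, 1.0, -1.0]
```

Because `find` compresses the path before any ratio is read, `ratio[x] / ratio[y]` compares two values measured against the same root.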
