### Graph Representation
A graph is represented by  
$$G = (V, E)$$  
Here, $V$ is the set of all vertices  
$$V = \{v_0, v_1, ..., v_k \}$$  
and $E$ is a set of ordered pairs of vertices called edges. An edge is represented as  
$$(i, j) \text{ where } i, j \in V$$  
Therefore,  
$$E = \{(v_a, v_b), ..., (v_x, v_y)\}$$  
A path is a sequence of vertices $v_0, v_1, ..., v_k$ where for every $i \in \{1, 2, ..., k\}$, the edge $(v_{i-1}, v_i) \in E$. This path is a cycle if $(v_k, v_0)$ is also present in $E$.  

Also, let $n$ be the number of vertices and $m$ be the number of edges.

### Common Operations
- **addEdge(i, j):** add $(i,j)$ to $E$
- **removeEdge(i, j):** remove $(i,j)$ from $E$
- **hasEdge(i, j):** check if edge $(i,j) \in E$
- **inEdge(i):** return set of all $j$ where $(j,i) \in E$
- **outEdge(i):** return set of all $j$ where $(i,j) \in E$
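
The operations above can be sketched directly on top of the definition $G = (V, E)$. This is a minimal illustration that assumes the graph is stored literally as a Python set of ordered pairs; the class name `EdgeSetGraph` and its method names are hypothetical and chosen only for this example.

```python
class EdgeSetGraph:
    """Directed graph stored as the set E of ordered pairs (i, j)."""

    def __init__(self):
        self.edges = set()

    def add_edge(self, i, j):
        # add (i, j) to E
        self.edges.add((i, j))

    def remove_edge(self, i, j):
        # remove (i, j) from E; does nothing if the edge is absent
        self.edges.discard((i, j))

    def has_edge(self, i, j):
        # check if (i, j) is in E
        return (i, j) in self.edges

    def in_edge(self, i):
        # all j where (j, i) is in E
        return {a for (a, b) in self.edges if b == i}

    def out_edge(self, i):
        # all j where (i, j) is in E
        return {b for (a, b) in self.edges if a == i}


g = EdgeSetGraph()
g.add_edge(0, 1)
g.add_edge(2, 1)
g.add_edge(1, 3)
print(g.has_edge(0, 1))
print(sorted(g.in_edge(1)))
print(sorted(g.out_edge(1)))
```

With this layout `has_edge` is a constant-time set lookup on average, but `in_edge` and `out_edge` scan all $m$ edges, which is the motivation for the representations discussed next.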

### Adjacency Matrix
An adjacency matrix represents a graph as an $n\times n$ boolean matrix. If `matrix[i][j] == true`, then $(i,j) \in E$. For an undirected graph, the matrix is symmetric about the main diagonal.
![Adj Matrix](images/qXMwUGq.png)

```java
private int n;
private boolean[][] matrix;

public AdjacencyMatrix(int n) {
    this.n = n;
    matrix = new boolean[n][n];
}

public void addEdge(int i, int j) {
    matrix[i][j] = true;
}

public void removeEdge(int i, int j) {
    matrix[i][j] = false;
}

public boolean hasEdge(int i, int j) {
    return matrix[i][j];
}
```

All the above operations take $O(1)$ time. However, an adjacency matrix performs poorly for `inEdge` and `outEdge`: both take $O(n)$ time, since a full column or row has to be scanned.
```java
List<Integer> inEdge(int i){
    List<Integer> list = new ArrayList<Integer>();
    for(int j=0; j<n; j++) {
        if(matrix[j][i])
            list.add(j);
    }
    return list;
}

List<Integer> outEdge(int i){
    List<Integer> list = new ArrayList<Integer>();
    for(int j=0; j<n; j++) {
        if(matrix[i][j])
            list.add(j);
    }
    return list;
}
```

The space used by matrix is $O(n^2)$.

### Adjacency Matrix Property
Let us represent adjacency matrix as $A$.  
![adj matrix](images/k54OSd6.png)  

$$A= \left(\begin{matrix}0&1&0&1&1\\0&0&0&1&0\\0&0&0&0&1\\0&0&0&0&0\\0&1&0&0&0\end{matrix}\right)$$  
Entry $(i, j)$ of $A^2$ is the number of walks of length 2 (sequences of two edges, possibly repeating vertices) from $i$ to $j$.  
$$A^2= \left(\begin{matrix}0&1&0&1&0\\0&0&0&0&0\\0&1&0&0&0\\0&0&0&0&0\\0&0&0&1&0\end{matrix}\right)$$  
Similarly, entry $(i, j)$ of $A^3$ is the number of walks of length 3 from $i$ to $j$.  
$$A^3= \left(\begin{matrix}0&0&0&1&0\\0&0&0&0&0\\0&0&0&1&0\\0&0&0&0&0\\0&0&0&0&0\end{matrix}\right)$$  

![adj matrix 2](images/07OflVc.png)  

$$A= \left(\begin{matrix}0&1&1&1\\1&0&1&1\\1&1&0&1\\1&1&1&0\end{matrix}\right)$$  
$$A^3= \left(\begin{matrix}6&7&7&7\\7&6&7&7\\7&7&6&7\\7&7&7&6\end{matrix}\right)$$  

This property can help us identify whether a directed graph has a cycle. We compute the powers $A^k$ for $k$ up to $n$. If any diagonal element of any of these powers is non-zero, some vertex has a walk back to itself, so the graph is cyclic. Checking up to $k = n$ is enough because the shortest cycle visits at most $n$ vertices.
```java
public boolean isCyclic() {
    SimpleMatrix m = new SimpleMatrix(convert(matrix));
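    SimpleMatrix mPower = m;
    // mPower holds A^k; start at k=1 so that self loops are detected too
    for(int k=1; k<=n; k++) {
        // non-zero trace: some vertex has a closed walk of length k
        if(mPower.trace() != 0.0)
            return true;
        mPower = mPower.mult(m);
    }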
    return false;
}

private double[][] convert(boolean[][] input) {
    double[][] result = new double[n][n];
    for(int i=0; i<n; i++) {
        for(int j=0; j<n; j++) {
            if(input[i][j])
                result[i][j] = 1.0;
            else
                result[i][j] = 0.0;
        }
    }
    return result;
}
```
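
The matrices above can be checked with a few lines of Python. This is a minimal sketch that uses plain nested lists instead of a matrix library; the helper names `mat_mult` and `has_cycle_by_powers` are made up for this example.

```python
def mat_mult(X, Y):
    """Multiply two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def has_cycle_by_powers(A):
    """Return True if some power A^k (1 <= k <= n) has a non-zero diagonal."""
    n = len(A)
    power = A  # holds A^k, starting with k = 1
    for _ in range(n):
        if any(power[i][i] != 0 for i in range(n)):
            return True
        power = mat_mult(power, A)
    return False


# The 5-vertex directed graph from the first example
A = [[0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 0, 0],
     [0, 1, 0, 0, 0]]
A2 = mat_mult(A, A)
A3 = mat_mult(A2, A)
print(A2)
print(A3)
print(has_cycle_by_powers(A))

# The 4-vertex graph from the second example
K4 = [[0, 1, 1, 1],
      [1, 0, 1, 1],
      [1, 1, 0, 1],
      [1, 1, 1, 0]]
print(has_cycle_by_powers(K4))
```

Each multiplication costs $O(n^3)$ here, so the whole check is $O(n^4)$; it illustrates the property rather than being an efficient cycle test.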

### Adjacency List
In an adjacency list, we maintain a list for every vertex. The list for vertex $i$ contains all vertices $j$ such that $(i,j) \in E$.  
![Adj List](images/vjeEDHu.png)

```java
private int n;
List<Integer>[] adj;

public AdjacencyList(int n) {
    this.n = n; // needed by inEdges, which loops over all n vertices
    adj = (List<Integer>[]) new List[n];
    for(int i=0; i<n; i++) {
        adj[i] = new ArrayList<Integer>();
    }
}

public void addEdge(int i, int j) {
    adj[i].add(j);
}

public void removeEdge(int i, int j) {
    Iterator<Integer> iterator = adj[i].iterator();
    while(iterator.hasNext()) {
        if(iterator.next() == j) {
            iterator.remove();
            return;
        }
    }
}

public boolean hasEdge(int i, int j) {
    return adj[i].contains(j);
}

public List<Integer> inEdges(int i) {
    List<Integer> list = new ArrayList<Integer>();
    for(int j=0; j<n; j++) {
        if(adj[j].contains(i))
            list.add(j);
    }
    return list;
}

public List<Integer> outEdges(int i) {
    return adj[i];
}
```

`addEdge` takes $O(1)$ time  
`removeEdge` and `hasEdge` take $O(deg(i))$ time, where $deg(i)$ counts the number of edges in $E$ that have $i$ as their source  
`inEdges` takes $O(n+m)$ time  
`outEdges` takes $O(1)$ time  

Space complexity is $O(n+m)$

### Graph Traversal
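**Breadth First Search** for a graph is a generalization of the level-order traversal of a binary tree. The difference is that a graph node can be reached in more than one way, so we must remember which nodes we have already seen.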
```java
public void bfs(int i, IntConsumer c) {
    // the seen array is required because a node can
    // be reached in more than one way
    boolean[] seen = new boolean[n];
    List<Integer> q = new ArrayList<Integer>();
    q.add(i);
    seen[i] = true;
    while(!q.isEmpty()) {
        int value = q.remove(0);
        c.accept(value);
        for(Integer x: outEdges(value)) {
            if(!seen[x]) {
                q.add(x);
                seen[x] = true;
            }
        }
    }
}
```
For the example above, the sequence is `0->1->3->4->2->5->6`. Breadth first traversal gives us the shortest path (fewest edges) between two nodes in an unweighted graph.
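
The shortest-path property can be made concrete by recording a distance for every node during the traversal. The sketch below is in Python and assumes the graph is given as a dictionary of adjacency lists; the function name `bfs_distances` and the small sample graph are made up for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return a dict mapping every reachable node to its edge distance from source."""
    dist = {source: 0}   # doubles as the "seen" set
    q = deque([source])  # deque gives O(1) pops from the front
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                # first time v is reached, so this is its shortest distance
                dist[v] = dist[u] + 1
                q.append(v)
    return dist


# A 5-node undirected cycle: 0-1-2-4-3-0
adj = {0: [1, 3], 1: [0, 2], 2: [1, 4], 3: [0, 4], 4: [2, 3]}
print(bfs_distances(adj, 0))
```

Using `collections.deque` instead of removing from the front of a list keeps the whole traversal at $O(n+m)$.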

**Depth First Search:** we divide the graph vertices into three types: WHITE (not visited), GRAY (currently being visited) and BLACK (finished visiting).
```java
private final int GRAY = -1;
private final int WHITE = 0;
private final int BLACK = 1;
public void dfs(int i, byte[] color, IntConsumer c) {
    color[i] = GRAY;
    for(Integer j: outEdges(i)) {
        if(color[j] == WHITE) {
            dfs(j, color, c);
        }
    }
    // visit i after all of its descendants (post-order)
    c.accept(i);
    color[i] = BLACK;
}
```
For the above example, the sequence is `6->2->1->3->5->4->0`. Colors are not strictly needed for DFS; a boolean `visited` array is enough, as the algorithm below illustrates:

```java
public void dfs(int i, boolean[] visited, IntConsumer c) {
    visited[i] = true;
    for(Integer j: outEdges(i)) {
        if(!visited[j]) {
            dfs(j, visited, c);
        }
    }
    // visit i after all of its descendants (post-order)
    c.accept(i);
}

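Three colors are needed by some algorithms, for example to detect a cycle in a directed graph, where we must distinguish a node that is still on the current DFS path (GRAY) from one that is completely finished (BLACK).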

### Classification of Graphs
On the basis of edges:
- Directed graph (edges are unidirectional)
- Undirected graph (edges are bidirectional)

On the basis of edge weight:
- Weighted graph (attribute assigned to edges, attribute needs to be quantified and comparable)
- Unweighted graph

A weighted graph in which all weights are equal is equivalent to an unweighted graph. Considering the above classification, a graph can be
- Weighted directed (Splitwise)
- Unweighted directed (Twitter)
- Weighted undirected (Metro line)
- Unweighted undirected (Facebook)

**Simple Graph:**
- has no *self loop*, an edge like $(i,i)$
- has no *multi edge*, i.e. at most one edge for any pair of vertices

**Clique:** a set of vertices in which every vertex is connected to every other vertex. A graph whose whole vertex set is a clique is called a complete graph.

**Disconnected Graph:** if it is possible to pick two vertices such that there is no path between them, it is a disconnected graph.
![dg](images/b3wJjo4.png)
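
One way to test whether an undirected graph is disconnected is to count its connected components with repeated BFS: the graph is disconnected exactly when there is more than one component. The sketch below is in Python; the function name `count_components` and the input format (vertex count plus a list of undirected edges) are assumptions made for this example.

```python
from collections import deque


def count_components(n, edges):
    """Count connected components of an undirected graph with vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    seen = [False] * n
    components = 0
    for s in range(n):
        if seen[s]:
            continue
        # s belongs to a component we have not explored yet
        components += 1
        seen[s] = True
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    q.append(v)
    return components


print(count_components(5, [(0, 1), (1, 2), (3, 4)]))  # two components: {0,1,2} and {3,4}
```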

### Problems
**Q 1:** Detect whether the given directed graph has a cycle. A graph contains a cycle if and only if a DFS finds a *back edge*. A back edge is an edge from a node to itself or to one of its ancestors in the DFS tree.  
**Answer:** We do a DFS traversal and color the nodes. In a directed graph, merely reaching an already visited node does not imply a cycle, because two different paths may lead to the same finished node. A cycle is present if and only if a node is seen again before all its descendants have been visited. In other words, if a node has a neighbor which is gray, then there is a cycle (and not when the neighbor is black).

In [1]:
# A is number of vertices
# B is list of edges
def has_cycle(A, B):
    # Using the edge list B, we form the adjacency list
    # (index 0 is unused because vertices are numbered from 1)
    adj = [[] for _ in range(A+1)]
        
    # Filling the lists
    for edge in B:
        adj[edge[0]].append(edge[1])
        
    def dfs(i):
        visited[i] = 'GRAY'
        for n in adj[i]:
            if visited.get(n, 'WHITE') == 'GRAY':
                return 1
            
            if visited.get(n, 'WHITE') == 'WHITE' and dfs(n) == 1:
                return 1
        visited[i] = 'BLACK'
        return 0
        
    visited = {}
    for i in range(1, A+1):
        if visited.get(i, 'WHITE') == 'WHITE':
            if dfs(i) == 1:
                return 1
            
    return 0

A = 5
B = [
  [1, 2],
  [1, 3],
  [2, 3],
  [1, 4],
  [4, 3],
  [4, 5],
  [3, 5]
]

print(has_cycle(A,B))

A = 2
B = [
    [1, 2],
    [2, 1]
]

print(has_cycle(A,B))

0
1


The reason why we require coloring can be expressed by the below example:  
<img src="images/7bGflD8.png" width="600" height="auto">  

<img src="images/Bp41MQt.png" width="600" height="auto">  

What if the graph is undirected? The above solution will not work in this case. Consider two nodes connected by a single undirected edge, stored as the two directed edges $(1,2)$ and $(2,1)$: the above algorithm reports a cycle although there is none. To avoid this, we keep track of each node's parent (the node from which we traversed to the current node) and ignore the edge leading back to it.

In [2]:
def has_cycle_undirected(A, B):
    # Using the edge list B, we form the adjacency list
    # (B must list every undirected edge in both directions)
    adj = [[] for _ in range(A+1)]

    # Filling the lists
    for edge in B:
        adj[edge[0]].append(edge[1])

    def dfs(i, parent):
        visited[i] = 'GRAY'
        for n in adj[i]:
            if visited.get(n, 'WHITE') == 'GRAY' and parent != n:
                return 1

            if visited.get(n, 'WHITE') == 'WHITE' and dfs(n, i) == 1:
                return 1
        visited[i] = 'BLACK'
        return 0

    visited = {}
    for i in range(1, A+1):
        if visited.get(i, 'WHITE') == 'WHITE':
            if dfs(i, 0) == 1:
                return 1

    return 0


A = 3
B = [
    [1, 2],
    [2, 1],
    [2, 3],
    [3, 2],
    [3, 1],
    [1, 3]
]

print(has_cycle_undirected(A, B))

1


It is not really required to make use of colors in this case.

In [2]:
def has_cycle_undirected(A, B):
    # Using the edge list B, we form the adjacency list
    # (B must list every undirected edge in both directions)
    adj = [[] for _ in range(A+1)]

    # Filling the lists
    for edge in B:
        adj[edge[0]].append(edge[1])

    def dfs(i, parent):
        visited[i] = True
        for n in adj[i]:
            if visited[n] and parent != n:
                return 1

            if not visited[n] and dfs(n, i) == 1:
                return 1
        return 0

    visited = [False] * (A+1)
    for i in range(1, A+1):
        if visited[i] == False:
            if dfs(i, 0) == 1:
                return 1

    return 0


A = 3
B = [
    [1, 2],
    [2, 1],
    [2, 3],
    [3, 2],
    [3, 1],
    [1, 3]
]

print(has_cycle_undirected(A, B))

1


**Q 2:** Given an `MxN` matrix containing 4 types of entries: `s` denotes the starting point, `d` denotes the destination, `o` denotes a non-traversable cell and `*` denotes a traversable cell. Find the shortest distance between the start and the destination. We can only move up, down, left or right. For example, consider the matrix:
```
o * o s
* o * *
o * * *
d * * *
```
The shortest path in this case is 6. It may also be possible that we are unable to reach destination. In that case return -1.  
**Answer:** We can represent the above matrix path as a graph and then perform BFS starting from start.

In [4]:
def shortest_path(A):
    # Find the starting cell
    start = None
    for i in range(len(A)):
        for j in range(len(A[0])):
            if A[i][j] == 's':
                start = (i,j)
                break
                
    q = []
    q.append([start, 0])
    
    visited = set()
    visited.add(start)
    
    distance = 0
    
    while len(q) > 0:
        popped = q.pop(0)
        if A[popped[0][0]][popped[0][1]] == 'd':
            return popped[1]
        
        # Up
        if popped[0][0]-1 >= 0 and A[popped[0][0]-1][popped[0][1]] != 'o':
            if (popped[0][0]-1, popped[0][1]) not in visited:
                visited.add((popped[0][0]-1, popped[0][1]))
                q.append([(popped[0][0]-1, popped[0][1]), popped[1]+1])
                
        # Down
        if popped[0][0]+1 < len(A) and A[popped[0][0]+1][popped[0][1]] != 'o':
            if (popped[0][0]+1, popped[0][1]) not in visited:
                visited.add((popped[0][0]+1, popped[0][1]))
                q.append([(popped[0][0]+1, popped[0][1]), popped[1]+1])
                
        # Left
        if popped[0][1]-1 >= 0 and A[popped[0][0]][popped[0][1]-1] != 'o':
            if (popped[0][0], popped[0][1]-1) not in visited:
                visited.add((popped[0][0], popped[0][1]-1))
                q.append([(popped[0][0], popped[0][1]-1), popped[1]+1])
                
        # Right
        if popped[0][1]+1 < len(A[0]) and A[popped[0][0]][popped[0][1]+1] != 'o':
            if (popped[0][0], popped[0][1]+1) not in visited:
                visited.add((popped[0][0], popped[0][1]+1))
                q.append([(popped[0][0], popped[0][1]+1), popped[1]+1])
                
    return -1

A = [
    ['o', '*', 'o', 's'],
    ['*', 'o', '*', '*'],
    ['o', '*', '*', '*'],
    ['d', '*', '*', '*']
]

print(shortest_path(A))

6


In general, for an unweighted graph we use BFS to get the shortest path. If, in addition to the path length, we want to know the nodes on the shortest path, we need to record the predecessor of every node.

In [7]:
# A is adjacency list
def print_shortest_path(A, start, end):
    pred = {}
    visited = set()
    
    def bfs(i):
        nonlocal end
        q = []
        q.append(i)
        visited.add(i)
        
        while len(q) > 0:
            popped = q.pop(0)
            if popped == end:
                return
            
            for n in A[popped]:
                if n not in visited:
                    pred[n] = popped
                    visited.add(n)
                    q.append(n)
    
    bfs(start)
    
    i = end
    path = []
    path.append(i)
    while pred.get(i, None) is not None:
        path.append(pred[i])
        i = pred[i]
        
    print(list(reversed(path)))
                    
A = [[1,3], [2], [], [4,7], [5,6,7], [6], [7]]
print_shortest_path(A, 0, 7)

[0, 3, 7]


In the previous matrix problem, we can return the shortest path by:

In [9]:
def shortest_path(A):
    # Find the starting cell
    start = None
    for i in range(len(A)):
        for j in range(len(A[0])):
            if A[i][j] == 's':
                start = (i,j)
                break
                
    q = []
    q.append([start, 0])
    
    visited = set()
    visited.add(start)
    
    # Create predecessor map
    # (one row per matrix row, one column per matrix column)
    pred = [[None for x in range(len(A[0]))] for y in range(len(A))]
    
    found_path = False
    
    while len(q) > 0:
        popped = q.pop(0)
        if A[popped[0][0]][popped[0][1]] == 'd':
            found_path = True
            end = [popped[0][0], popped[0][1]]
            break
        
        # Up
        if popped[0][0]-1 >= 0 and A[popped[0][0]-1][popped[0][1]] != 'o':
            if (popped[0][0]-1, popped[0][1]) not in visited:
                visited.add((popped[0][0]-1, popped[0][1]))
                q.append([(popped[0][0]-1, popped[0][1]), popped[1]+1])
                pred[popped[0][0]-1][ popped[0][1]] = [popped[0][0], popped[0][1]]
                
        # Down
        if popped[0][0]+1 < len(A) and A[popped[0][0]+1][popped[0][1]] != 'o':
            if (popped[0][0]+1, popped[0][1]) not in visited:
                visited.add((popped[0][0]+1, popped[0][1]))
                q.append([(popped[0][0]+1, popped[0][1]), popped[1]+1])
                pred[popped[0][0]+1][ popped[0][1]] = [popped[0][0], popped[0][1]]
                
        # Left
        if popped[0][1]-1 >= 0 and A[popped[0][0]][popped[0][1]-1] != 'o':
            if (popped[0][0], popped[0][1]-1) not in visited:
                visited.add((popped[0][0], popped[0][1]-1))
                q.append([(popped[0][0], popped[0][1]-1), popped[1]+1])
                pred[popped[0][0]][ popped[0][1]-1] = [popped[0][0], popped[0][1]]
                
        # Right
        if popped[0][1]+1 < len(A[0]) and A[popped[0][0]][popped[0][1]+1] != 'o':
            if (popped[0][0], popped[0][1]+1) not in visited:
                visited.add((popped[0][0], popped[0][1]+1))
                q.append([(popped[0][0], popped[0][1]+1), popped[1]+1])
                pred[popped[0][0]][ popped[0][1]+1] = [popped[0][0], popped[0][1]]
                
    if found_path:
        # Walk back from the destination to the start using pred
        path = []
        p = end
        while p is not None:
            path.append([p[0], p[1]])
            p = pred[p[0]][p[1]]
        return list(reversed(path))
    else:
        return None

A = [
    ['o', '*', 'o', 's'],
    ['*', 'o', '*', '*'],
    ['o', '*', '*', '*'],
    ['d', '*', '*', '*']
]

print(shortest_path(A))

[[0, 3], [1, 3], [2, 3], [3, 3], [3, 2], [3, 1], [3, 0]]


**Q 3:** Given an `MxN` array where each cell is either 0 or 1, return a new array where each cell holds its distance to the nearest 1 cell. For example, if the array is 
```
0 0 0 1
0 0 1 1
0 1 1 0
```
then return
```
3 2 1 0
2 1 0 0
1 0 0 1
```
**Answer:** The easy way is to do a BFS from each 0 cell to find the shortest distance to a 1. However, there is a better way. Instead of starting from a 0, we start from all the 1s at once (multi source BFS). Every immediate neighbour of a 1 cell has distance 1, and so on.

In [1]:
def distance_to_one(A):
    q = []
    visited = set()
    
    # Iterate through the matrix and add all 1s to the queue
    for i in range(len(A)):
        for j in range(len(A[0])):
            if A[i][j] == 1:
                q.append([(i,j), 0])
                visited.add((i,j))
                
    while len(q) > 0:
        popped = q.pop(0)
        A[popped[0][0]][popped[0][1]] = popped[1]
        
        # Up
        if popped[0][0]-1 >= 0:
            if (popped[0][0]-1, popped[0][1]) not in visited:
                visited.add((popped[0][0]-1, popped[0][1]))
                q.append([(popped[0][0]-1, popped[0][1]), popped[1]+1])
        
        # Down
        if popped[0][0]+1 < len(A):
            if (popped[0][0]+1, popped[0][1]) not in visited:
                visited.add((popped[0][0]+1, popped[0][1]))
                q.append([(popped[0][0]+1, popped[0][1]), popped[1]+1])
                
        # Left
        if popped[0][1]-1 >= 0:
            if (popped[0][0], popped[0][1]-1) not in visited:
                visited.add((popped[0][0], popped[0][1]-1))
                q.append([(popped[0][0], popped[0][1]-1), popped[1]+1])
                
        # Right
        if popped[0][1]+1 < len(A[0]):
            if (popped[0][0], popped[0][1]+1) not in visited:
                visited.add((popped[0][0], popped[0][1]+1))
                q.append([(popped[0][0], popped[0][1]+1), popped[1]+1])
                
    return A

A = [[0,0,0,1],[0,0,1,1],[0,1,1,0]]
print(distance_to_one(A))

[[3, 2, 1, 0], [2, 1, 0, 0], [1, 0, 0, 1]]


**Q 4:** A matrix of size `MxN` contains these three values 0 - no orange, 1 - fresh orange, 2 - rotten orange. A rotten orange will rot all the adjacent oranges. Find the time taken to rot all oranges. If all oranges cannot rot, return -1.  
**Answer:** Do multi source BFS as in the above problem

In [6]:
def rotten_orange(A):
    q = []
    
    # count contains number of fresh and rotten oranges
    count = 0
    # Iterate through the matrix and add all rotten oranges to the queue
    for i in range(len(A)):
        for j in range(len(A[0])):
            if A[i][j] == 2:
                q.append([(i, j), 0])
                count += 1
            elif A[i][j] == 1:
                count += 1

    time = 0
    c = 0
    while len(q) > 0:
        popped = q.pop(0)
        # Oranges are marked rotten (2) when they are enqueued, so the
        # matrix itself doubles as the visited set. Every popped cell is
        # rotten, and the last one popped carries the total time.
        c += 1
        time = popped[1]

        # Up
        if popped[0][0]-1 >= 0 and A[popped[0][0]-1][popped[0][1]] == 1:
            A[popped[0][0]-1][popped[0][1]] = 2
            q.append([(popped[0][0]-1, popped[0][1]), popped[1]+1])

        # Down
        if popped[0][0]+1 < len(A) and A[popped[0][0]+1][popped[0][1]] == 1:
            A[popped[0][0]+1][popped[0][1]] = 2
            q.append([(popped[0][0]+1, popped[0][1]), popped[1]+1])

        # Left
        if popped[0][1]-1 >= 0 and A[popped[0][0]][popped[0][1]-1] == 1:
            A[popped[0][0]][popped[0][1]-1] = 2
            q.append([(popped[0][0], popped[0][1]-1), popped[1]+1])

        # Right
        if popped[0][1]+1 < len(A[0]) and A[popped[0][0]][popped[0][1]+1] == 1:
            A[popped[0][0]][popped[0][1]+1] = 2
            q.append([(popped[0][0], popped[0][1]+1), popped[1]+1])

    # Now we need to check if all oranges were rotten or not. If all the
    # fresh and rotten oranges were iterated upon, then it means all the
    # oranges eventually rotted. Or we could have iterated over the matrix
    # and checked if there are any fresh orange left
    if c != count:
        return -1

    return time

# A case where all oranges rot
A = [[2,1,0,2,1], [1,0,1,2,1], [1,0,0,2,1]]
print(rotten_orange(A))

# A case where all oranges do not rot
A = [[1,0,2], [0,0,1], [0,0,1]]
print(rotten_orange(A))

2
-1


**Q 5:** Given an undirected tree, find the longest path in the tree.  
**Answer:** Pick any node in the tree and do a BFS from it; the last node reached is a farthest node, and it is always one end of a longest path. Now do a BFS from that node; the distance to the farthest node found this time is the length of the longest path in the tree (its diameter).

In [9]:
def longest_path_in_tree(A):
    def bfs(i):
        visited = set()
        last = None
        distance = 0
        
        q = []
        q.append((i, 0))
        visited.add(i)
        
        while len(q) > 0:
            popped = q.pop(0)
            last = popped[0]
            distance = popped[1]
            
            for n in A[last]:
                if n not in visited:
                    visited.add(n)
                    q.append((n, distance + 1))
                    
        return (last, distance)
        
    root = 0
    start, distance = bfs(root)
    start2, distance2 = bfs(start)
    
    return distance2

A = [[1], [0,2,6], [1,3,4,9], [2], [2,5], [4], [1,7,8], [6], [6], [2]]
print(longest_path_in_tree(A))

5


**Q 6:** Given a graph, clone it. We represent a node of graph as:

In [11]:
class UndirectedGraphNode:
    def __init__(self, x):
        self.label = x
        self.neighbors = []

**Answer:** Do a BFS and create nodes as required

In [None]:
def clone_graph(node):
    # Starting Node of the cloned Graph
    start = UndirectedGraphNode(node.label)

    # We will do a BFS on original
    q_org = []
    q_new = []
    visited = set()

    q_org.append(node)
    q_new.append(start)

    visited.add(node.label)

    # Map which stores all nodes
    nodes = {}
    nodes[start.label] = start

    while len(q_org) > 0:
        popped_org = q_org.pop(0)
        popped_new = q_new.pop(0)

        for n in popped_org.neighbors:
            if n.label in nodes:
                n_new = nodes[n.label]
            else:
                n_new = UndirectedGraphNode(n.label)
                nodes[n.label] = n_new

            popped_new.neighbors.append(n_new)

            if n.label not in visited:    
                visited.add(n.label)

                q_org.append(n)
                q_new.append(n_new)

    return start

Or we can do a DFS (shown here in Java):

In [None]:
public UndirectedGraphNode cloneGraph(UndirectedGraphNode start) {
    Set<UndirectedGraphNode> visited = new HashSet<>();

    UndirectedGraphNode newStart = new UndirectedGraphNode(start.label);

    HashMap<Integer, UndirectedGraphNode> nodes = new HashMap<>();
    nodes.put(newStart.label, newStart);

    class Helper {
        public void dfs(UndirectedGraphNode oldNode, UndirectedGraphNode newNode) {
            visited.add(oldNode);
            for (UndirectedGraphNode n : oldNode.neighbors) {
                if (nodes.get(n.label) != null) {
                    newNode.neighbors.add(nodes.get(n.label));
                }

                if (!visited.contains(n)) {
                    UndirectedGraphNode temp = new UndirectedGraphNode(n.label);
                    nodes.put(temp.label, temp);
                    newNode.neighbors.add(temp);
                    dfs(n, temp);
                }
            }
        }
    }

    Helper h = new Helper();
    h.dfs(start, newStart);

    return newStart;
}

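**Q 7:** Given a weighted undirected graph having $A$ nodes, a source node $C$ and a destination node $D$, find the shortest distance from $C$ to $D$. If it is impossible to reach node $D$ from $C$, return -1.  
**Answer:** If the weights are small positive integers, we can make use of dummy nodes to turn the graph into an unweighted one and then do a BFS. For example, graph `A-2->B-1->C` will be converted to `A-1->d0-1->B-1->C`. The code below uses the gcd $g$ of all weights as the unit length, so an edge of weight $w$ is split into $w/g$ unit edges.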

In [1]:
def shortest_path(A, B, C, D):
    if C == D:
        return 0

    # Find the gcd of all weights, that will be the
    # weight between any 2 vertex
    g = B[0][2]
    import math
    for i in range(len(B)):
        g = math.gcd(g, B[i][2])

    # Form the adjacency list, add dummy nodes
    adj = {}
    for i in range(A):
        adj[str(i)] = []

    dc = 0  # dummy counter
    while len(B) > 0:
        edge = B.pop(0)
        start = edge[0]
        end = edge[1]
        weight = edge[2]

        if weight != g:
            # add weight/g - 1 dummy nodes
            i = weight // g - 1
            while i >= 1:
                # Dummy node to previous node and
                # prev node to dummy node
                adj[str(start)].append('d' + str(dc))
                if 'd' + str(dc) not in adj:
                    adj['d' + str(dc)] = []
                adj['d' + str(dc)].append(str(start))

                # Last dummy node
                if i == 1:
                    adj[str(end)].append('d' + str(dc))
                    adj['d' + str(dc)].append(str(end))
                start = 'd' + str(dc)

                dc += 1
                i -= 1
        else:
            adj[str(start)].append(str(end))
            adj[str(end)].append(str(start))

    # Do BFS.
    visited = set()
    q = []
    q.append((str(C), 0))
    visited.add(str(C))

    while len(q) > 0:
        popped = q.pop(0)

        if popped[0] in adj:
            for n in adj[popped[0]]:
                if n == str(D):
                    return (popped[1] + 1) * g
                if n not in visited:
                    q.append((n, popped[1] + 1))
                    visited.add(n)

    return -1

A = 6
B = [   [2, 5, 1] ,
        [1, 3, 1] ,
        [0, 5, 2] ,
        [0, 2, 2] ,
        [1, 4, 1] ,
        [0, 1, 1] ] 
C = 3
D = 2

print(shortest_path(A,B,C,D))

4


**Q 8:** Given a 2d matrix, count the number of islands. An island is formed by contiguous `1`s. For example, if the array is:
```   
[1, 1, 0, 0, 0]
[0, 1, 0, 0, 0]
[1, 0, 0, 1, 1]
[0, 0, 0, 0, 0]
[1, 0, 1, 0, 1] 
```
Then it contains 5 islands (cells that touch diagonally are connected as well).  
**Answer:** We pick an unvisited `1` cell and do a DFS from it, clearing every cell of its island. The number of times DFS is started is the number of islands.

In [5]:
def islands(A):

    def dfs(i, j):
        if A[i][j] == 0:
            return
        else:
            A[i][j] = 0

        if i+1 < len(A):
            dfs(i+1, j)
        if j+1 < len(A[0]):
            dfs(i, j+1)
        if i-1 >= 0:
            dfs(i-1, j)
        if j-1 >= 0:
            dfs(i, j-1)
        if i+1 < len(A) and j+1 < len(A[0]):
            dfs(i+1, j+1)
        if i+1 < len(A) and j-1 >= 0:
            dfs(i+1, j-1)
        if i-1 >= 0 and j+1 < len(A[0]):
            dfs(i-1, j+1)
        if i-1 >= 0 and j-1 >= 0:
            dfs(i-1, j-1)

    islands = 0
    for i in range(len(A)):
        for j in range(len(A[0])):
            if A[i][j] == 0:
                continue
            else:
                dfs(i, j)
                islands += 1

    return islands

A = [[1, 1, 0, 0, 0],
[0, 1, 0, 0, 0],
[1, 0, 0, 1, 1],
[0, 0, 0, 0, 0],
[1, 0, 1, 0, 1] ]

print(islands(A))

5


In the above solution we can replace DFS with BFS.
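
A minimal sketch of the BFS variant follows. The function name `islands_bfs` is made up for this example, and unlike the DFS version it keeps a separate `seen` grid instead of overwriting the input, which is a design choice of this sketch rather than something the problem requires.

```python
from collections import deque


def islands_bfs(grid):
    """Count islands of 1s, treating all 8 neighbours as connected."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0 or seen[r][c]:
                continue
            # (r, c) starts a new island; flood it with BFS
            count += 1
            seen[r][c] = True
            q = deque([(r, c)])
            while q:
                i, j = q.popleft()
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and grid[ni][nj] == 1 and not seen[ni][nj]):
                            seen[ni][nj] = True
                            q.append((ni, nj))
    return count


A = [[1, 1, 0, 0, 0],
     [0, 1, 0, 0, 0],
     [1, 0, 0, 1, 1],
     [0, 0, 0, 0, 0],
     [1, 0, 1, 0, 1]]
print(islands_bfs(A))  # 5
```

BFS also avoids deep recursion, which matters in Python for large islands because of the default recursion limit.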

**Q 9:** Given an $N\times N$ chess board, how many steps does a knight need to reach the destination cell from the starting cell?  
**Answer:** Do BFS. From any cell the knight has up to 8 possible moves.
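
A minimal sketch of this BFS is given below. The function name `knight_steps`, the zero-based `(row, column)` cell format and the choice to return -1 for an unreachable destination are assumptions made for this example; the problem statement does not fix them.

```python
from collections import deque


def knight_steps(n, start, dest):
    """Minimum number of knight moves from start to dest on an n x n board."""
    # The 8 possible knight moves as (row offset, column offset)
    moves = [(1, 2), (2, 1), (-1, 2), (-2, 1),
             (1, -2), (2, -1), (-1, -2), (-2, -1)]
    seen = {start}
    q = deque([(start, 0)])
    while q:
        (r, c), steps = q.popleft()
        if (r, c) == dest:
            return steps
        for dr, dc in moves:
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n and (nr, nc) not in seen:
                seen.add((nr, nc))
                q.append(((nr, nc), steps + 1))
    return -1  # destination cannot be reached


print(knight_steps(8, (0, 0), (7, 7)))  # 6
print(knight_steps(3, (0, 0), (1, 1)))  # -1, the centre of a 3x3 board is unreachable
```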

**Q 10:** Given a 2-D board $A$ of size $N \times M$ containing 'X' and 'O', capture all regions surrounded by 'X'. A region is captured by flipping all 'O's into 'X's in that surrounded region. For example:
```
[X, X, X, X]             [X, X, X, X]
[X, O, O, X] converts to [X, X, X, X] 
[X, X, O, X]             [X, X, X, X]
[X, O, X, X]             [X, O, X, X]
```
Another example:
```
[X, O, O]             [X, O, O]
[X, O, X] converts to [X, O, X]
[O, O, O]             [O, O, O]
```
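**Answer:** An `O` region is surrounded only if it does not touch the boundary. So we pick all the `O`s on the boundary and start a multi source BFS from them; every `O` that is not reached gets flipped.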

In [8]:
def capture(A):
    q = []

    # Iterate over boundary and add all Os to the queue
    for j in range(len(A[0])):
        if(A[0][j] == 'O'):
            q.append([ 0, j ])
            A[0][j] = '-'

        if (A[len(A) - 1][j] == 'O'):
            q.append([len(A) - 1, j ])
            A[len(A) - 1][j] = '-'

    for i in range(len(A)) :
        if(A[i][0] == 'O'):
            q.append([ i, 0 ])
            A[i][0] = '-'

        if (A[i][len(A[0]) - 1] == 'O'):
            q.append([i, len(A[0]) - 1])
            A[i][len(A[0]) - 1] = '-'


    # Do BFS
    while len(q) > 0:
        popped = q.pop(0)
        i = popped[0]
        j = popped[1]

        if (i - 1 >= 0 and A[i - 1][j] == 'O'):
            A[i - 1][j] = '-'
            q.append([ i - 1, j ])

        if (i + 1 < len(A) and A[i + 1][j] == 'O'):
            A[i + 1][j] = '-'
            q.append([i + 1, j ])

        if (j - 1 >= 0 and A[i][j - 1] == 'O'):
            A[i][j - 1] = '-'
            q.append([ i, j - 1 ])

        if (j + 1 < len(A[0]) and A[i][j + 1] == 'O'):
            A[i][j + 1] = '-'
            q.append([ i, j + 1 ])


    for i in range(len(A)):
        for j in range(len(A[0])):
            if (A[i][j] == '-'):
                A[i][j] = 'O'
            elif (A[i][j] == 'O'):
                A[i][j] = 'X'



    return A

A = [
   ['X', 'X', 'X', 'X'],
   ['X', 'X', 'X', 'X'],
   ['X', 'X', 'X', 'X'],
   ['X', 'O', 'X', 'X']
 ]
    
print(capture(A))

[['X', 'X', 'X', 'X'], ['X', 'X', 'X', 'X'], ['X', 'X', 'X', 'X'], ['X', 'O', 'X', 'X']]


### Dijkstra Algorithm
To find the shortest path in a weighted graph with non-negative weights, we use Dijkstra's algorithm, which gives the shortest distance from a source node to all the other nodes.

In [1]:
def dijkstra(A, B, C):
    # Need a minheap to get the neighbouring node with minimum
    # weight in logarithmic time
    import heapq as hq

    # Form the adjacency list with the given nodes
    adj = []
    for i in range(A):
        adj.append([])

    # Fill the adjacency list, (weight, node)
    for edge in B:
        adj[edge[0]].append((edge[2], edge[1]))
        adj[edge[1]].append((edge[2], edge[0]))

    # Visited array stores what node has been visited already
    visited = [False] * A

    # Weight array
    import sys
    weight = [sys.maxsize] * A
    # Weight from source node to source node is 0
    weight[C] = 0

    q = [] # This q will store the heap
    # Push the source node to the heap
    hq.heappush(q, (0, C))

    # Pred array stores what node we come from,
    # don't really need for this problem
    pred = [None] * A

    while len(q) > 0:
        popped = hq.heappop(q)
        popped_v = popped[1]
        popped_w = popped[0]

        # Skip stale heap entries: this node was already finalised
        if visited[popped_v]:
            continue
        visited[popped_v] = True

        # Iterate through all the neighbouring nodes of popped
        for node in adj[popped_v]:
            node_v = node[1]
            node_w = node[0]

            if not visited[node_v]:
                # Distance to this neighbour through popped: distance to
                # popped + weight of the connecting edge
                total_w = popped_w + node_w
                if total_w < weight[node_v]:
                    weight[node_v] = total_w
                    pred[node_v] = popped_v
                    hq.heappush(q, (total_w, node_v))

    # If a node can't be visited from the source
    # node, we set weight as -1
    for i in range(len(weight)):
        if weight[i] == sys.maxsize:
            weight[i] = -1

    return weight

A = 6
B = [
    [0, 4, 9],
    [3, 4, 6],
    [1, 2, 1],
    [2, 5, 1],
    [2, 4, 5],
    [0, 3, 7],
    [0, 1, 1],
    [4, 5, 7],
    [0, 5, 1]
]
C = 4

print(dijkstra(A, B, C))

[7, 6, 5, 6, 0, 6]
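
The `pred` array above is filled in but never used. As a minimal sketch of what it is good for, the version below returns `pred` alongside the distances and walks it backwards to recover an actual shortest path. The names `dijkstra_with_paths` and `build_path` are hypothetical helpers introduced for this illustration, and the graph is the same edge list used above.

```python
import heapq


def dijkstra_with_paths(n, edges, source):
    """Return (dist, pred) for an undirected graph with non-negative weights."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))

    dist = [float("inf")] * n
    pred = [None] * n
    dist[source] = 0
    heap = [(0, source)]

    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: a shorter route to u was already found
        for w, v in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                pred[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, pred


def build_path(pred, source, target):
    """Walk pred backwards from target; return [] if target is unreachable."""
    path = []
    node = target
    while node is not None:
        path.append(node)
        if node == source:
            return path[::-1]
        node = pred[node]
    return []


edges = [
    [0, 4, 9], [3, 4, 6], [1, 2, 1], [2, 5, 1], [2, 4, 5],
    [0, 3, 7], [0, 1, 1], [4, 5, 7], [0, 5, 1],
]
dist, pred = dijkstra_with_paths(6, edges, 4)
print(dist)                    # [7, 6, 5, 6, 0, 6]
print(build_path(pred, 4, 1))  # [4, 2, 1]
```

Node 1 is reached from node 4 through node 2, for a total weight of 5 + 1 = 6, which matches the distance reported for node 1.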


### Disjoint Set
Suppose we have $N$ nodes, all unconnected. This means that we have $N$ connected components. We can form an undirected edge between any two nodes; we call this the $union(x, y)$ operation. It is illustrated by the diagrams below:

Initially, there is no edge between any two vertices:  
<img src="https://he-s3.s3.amazonaws.com/media/uploads/039a333.jpg" width="600" height="auto">

We use an array `Arr` to represent connectivity. The indices of this array represent the nodes; if `Arr[A] == Arr[B]`, then nodes `A` and `B` are connected.
<img src="https://he-s3.s3.amazonaws.com/media/uploads/1539ad6.jpg" width="600" height="auto">

On performing $union(2, 1)$, we get:  
<img src="https://he-s3.s3.amazonaws.com/media/uploads/32f6a91.jpg" width="600" height="auto">  

On performing $union(4, 3)$, $union(8,4)$ and $union(9,3)$ we get:  
<img src="https://he-s3.s3.amazonaws.com/media/uploads/4c11a99.jpg" width="600" height="auto">
<img src="https://he-s3.s3.amazonaws.com/media/uploads/6a7bc9a.jpg" width="600" height="auto"><br>  

After performing $union(6,5)$, we get:  
<img src="https://he-s3.s3.amazonaws.com/media/uploads/66d9b5d.jpg" width="600" height="auto">
<img src="https://he-s3.s3.amazonaws.com/media/uploads/7439d01.jpg" width="600" height="auto"><br>  

We can define another operation $find(x,y)$ which will return true if nodes $x$ and $y$ are in the same connected component. In the above graph, $find(0,7)$ will be false. Now if we do $union(5,2)$, we get:  
<img src="https://he-s3.s3.amazonaws.com/media/uploads/a7f5551.jpg" width="600" height="auto">
<img src="https://he-s3.s3.amazonaws.com/media/uploads/8538800.jpg" width="600" height="auto"><br>  

In total, we are left with 4 sets or 4 connected components.

In [5]:
def init(arr, N):
    for i in range(N):
        arr.append(i)
        
def find(arr, x, y):
    return arr[x] == arr[y]

def union(arr, N, x, y):
    if not find(arr, x, y):
        temp = arr[x]
        for i in range(N):
            if arr[i] == temp:
                arr[i] = arr[y]
            
N = 10
arr = []
init(arr, N)

union(arr,N,2,1); union(arr,N,4,3); union(arr,N,8,4); union(arr,N,9,3); union(arr,N,6,5); union(arr,N,5,2)
print(arr)

[0, 1, 1, 3, 3, 1, 1, 7, 3, 3]


There is a better approach for the $union$ operation, which in the version above scans the whole array and therefore takes $O(N)$ time. Here, $Arr[A]$ stores the parent node of $A$. Consider 6 nodes. Initially, since all nodes are unconnected, every node is its own parent:  
<img src="https://he-s3.s3.amazonaws.com/media/uploads/fd8e878.jpg" width="600" height="auto">  

We also define a new operation $root(x)$, which returns the root of the tree containing node $x$. After $union(1,0)$, node 0 becomes the root of node 1.  
<img src="https://he-s3.s3.amazonaws.com/media/uploads/1ddeb33.jpg" width="600" height="auto">  

$union(0,2)$:  
<img src="https://he-s3.s3.amazonaws.com/media/uploads/201a115.jpg" width="600" height="auto"> 

$union(3,4)$:  
<img src="https://he-s3.s3.amazonaws.com/media/uploads/4e3e47b.jpg" width="600" height="auto"> 

$union(1,4)$:  
<img src="https://he-s3.s3.amazonaws.com/media/uploads/62016dc.jpg" width="600" height="auto"> 

In [None]:
def root(arr, x):
    while arr[x] != x:
        x = arr[x]
        
    return x
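
As a rough sketch of how the parent array evolves under the unions listed above, the snippet below pairs `root` with a simple `union` that attaches the root of `x` under the root of `y`. That direction is an assumption, inferred from the statement that node 0 becomes the root of node 1 after $union(1,0)$; it is not taken from the diagrams.

```python
def root(arr, x):
    # Follow parent links until reaching a node that is its own parent
    while arr[x] != x:
        x = arr[x]
    return x


def union(arr, x, y):
    # Assumption: the root of x is attached under the root of y
    root_x = root(arr, x)
    root_y = root(arr, y)
    if root_x != root_y:
        arr[root_x] = root_y


arr = list(range(6))  # every node starts as its own parent
for x, y in [(1, 0), (0, 2), (3, 4), (1, 4)]:
    union(arr, x, y)

print(arr)           # [2, 0, 4, 4, 4, 5]
print(root(arr, 1))  # 4, reached via 1 -> 0 -> 2 -> 4
```

Note that `root(arr, 1)` already has to follow three parent links here, which is the problem the size-based union addresses.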

Following parent links can be slow if the trees grow tall: in the worst case a tree degenerates into a chain and $root$ takes $O(N)$ time. If we keep the trees balanced, $root$ (and therefore $find$) takes $O(\log N)$ time. So whenever we connect two roots, we attach the root of the smaller component under the root of the larger component.  
<img src="https://he-s3.s3.amazonaws.com/media/uploads/a1f5858.jpg" width="600" height="auto">

Illustrated example:  
<img src="https://he-s3.s3.amazonaws.com/media/uploads/d95f145.jpg" width="600" height="auto">
<img src="https://he-s3.s3.amazonaws.com/media/uploads/e8364d0.jpg" width="600" height="auto">
<img src="https://he-s3.s3.amazonaws.com/media/uploads/ff4f029.jpg" width="600" height="auto">

In [None]:
# We make use of a size array which will contain size of tree rooted at i
def init(arr, size, N):
    for i in range(N):
        arr.append(i)
        size.append(1)
        
def root(arr, x):
    while arr[x] != x:
        x = arr[x]
        
    return x

def find(arr, x, y):
    return root(arr, x) == root(arr, y)
    
def union(arr, size, N, x, y):
    root_x = root(arr, x)
    root_y = root(arr, y)

    # Already in the same component: merging again would corrupt the sizes
    if root_x == root_y:
        return

    if size[root_x] >= size[root_y]:
        arr[root_y] = root_x
        size[root_x] += size[root_y]
    else:
        arr[root_x] = root_y
        size[root_y] += size[root_x]
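
To make the benefit of union by size concrete, the sketch below runs the same sequence of unions with and without the size rule and compares the deepest node in each forest. The helper names (`root_depth`, `naive_union`, `weighted_union`) and the choice of 16 nodes joined in a chain are assumptions made for this illustration.

```python
def root_depth(arr, x):
    """Return (root, number of parent links followed) for node x."""
    steps = 0
    while arr[x] != x:
        x = arr[x]
        steps += 1
    return x, steps


def naive_union(arr, x, y):
    # Always attach the root of x under the root of y, ignoring sizes
    rx, _ = root_depth(arr, x)
    ry, _ = root_depth(arr, y)
    if rx != ry:
        arr[rx] = ry


def weighted_union(arr, size, x, y):
    # Attach the root of the smaller tree under the root of the larger one
    rx, _ = root_depth(arr, x)
    ry, _ = root_depth(arr, y)
    if rx == ry:
        return
    if size[rx] < size[ry]:
        rx, ry = ry, rx
    arr[ry] = rx
    size[rx] += size[ry]


N = 16
naive = list(range(N))
weighted = list(range(N))
size = [1] * N

# Join the nodes in a chain: (0,1), (1,2), ..., (14,15)
for i in range(N - 1):
    naive_union(naive, i, i + 1)
    weighted_union(weighted, size, i, i + 1)

max_naive = max(root_depth(naive, v)[1] for v in range(N))
max_weighted = max(root_depth(weighted, v)[1] for v in range(N))
print(max_naive, max_weighted)  # 15 1
```

On this input the naive version builds a single chain of depth 15, while the size-based version keeps every node one link away from the root. In general the size rule bounds the depth by $\log_2 N$.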

What if the nodes are not numeric? Use a map (a Python `dict`) instead of an array.

In [13]:
arr = {}
size = {}
vertices = ['a', 'b', 'c', 'd']

def init(arr, size, vertices):
    for v in vertices:
        arr[v] = v
        size[v] = 1
        
def root(arr, x):
    while arr[x] != x:
        x = arr[x]
        
    return x

def find(arr, x, y):
    return root(arr, x) == root(arr, y)

def union(arr, size, x, y):
    root_x = root(arr, x)
    root_y = root(arr, y)

    # Already in the same component: merging again would corrupt the sizes
    if root_x == root_y:
        return

    if size[root_x] >= size[root_y]:
        arr[root_y] = root_x
        size[root_x] += size[root_y]
    else:
        arr[root_x] = root_y
        size[root_y] += size[root_x]
        
init(arr, size, vertices)
union(arr, size, 'a', 'b'); union(arr, size, 'c', 'd')
print(find(arr, 'a', 'c'))
print(find(arr, 'a', 'b'))

False
True


### Kruskal's Algorithm
Given a connected and undirected graph, a **spanning tree** of that graph is a subgraph that is a tree and connects all the vertices together. A **minimum spanning tree (MST)** of a connected and undirected graph is a spanning tree with weight less than or equal to the weight of every other spanning tree. 

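Kruskal's algorithm finds the MST. Given the vertices and edges of a weighted undirected graph, we sort the edges by weight in ascending order. We then go through the edges in that order and pick an edge only if adding it does not form a cycle with the edges already picked, which we check with the disjoint set: an edge forms a cycle exactly when both of its endpoints already have the same root.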

In [14]:
# Vertex count and edges
vertices = 4
edges = [(0, 1, 10), (0, 2, 6) ,(0, 3, 5), (1, 3, 15) ,(2, 3, 4)]

# Finds root of a vertex x
def root(arr, x):
    while arr[x] != x:
        x = arr[x]
        
    return x

# Do union operation
def union(arr, size, N, x, y):
    root_x = root(arr, x)
    root_y = root(arr, y)
    
    if size[root_x] >= size[root_y]:
        arr[root_y] = root_x
        size[root_x] += size[root_y]
    else:
        arr[root_x] = root_y
        size[root_y] += size[root_x]
        
def kruskal(N, edges):
    result = []
    
    # Sort the edges by weight (ascending)
    g = sorted(edges, key=lambda x: x[2])
    
    arr = []
    size = []
    
    # Form arr and size arrays
    for i in range(N):
        arr.append(i)
        size.append(1)
        
    # Number of edges in MST will be N-1
    j = 0
    k = 0
    # Also stop when the edges run out (happens if the graph is disconnected)
    while j < N-1 and k < len(g):
        f, t, w = g[k]
        
        root_f = root(arr, f)
        root_t = root(arr, t)
        
        if root_f != root_t:
            result.append((f, t, w))
            j += 1
            union(arr, size, N, f, t)
            
        k += 1
        
    min_cost = 0
    for e in result:
        min_cost += e[2]
        
    return min_cost

print(kruskal(4, edges))

19
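
The function above returns only the total cost. As a hedged variation, the sketch below also returns the edges that were picked, so the tree itself can be inspected. `kruskal_edges` is a hypothetical name, and the union-by-size logic is inlined rather than reusing the helpers above.

```python
def kruskal_edges(n, edges):
    """Return (total_weight, chosen_edges).

    Fewer than n - 1 chosen edges means the graph is disconnected.
    """
    parent = list(range(n))
    size = [1] * n

    def root(x):
        while parent[x] != x:
            x = parent[x]
        return x

    chosen = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = root(u), root(v)
        if ru == rv:
            continue  # u and v are already connected: this edge would close a cycle
        if size[ru] < size[rv]:
            ru, rv = rv, ru
        parent[rv] = ru       # attach the smaller tree under the larger one
        size[ru] += size[rv]
        chosen.append((u, v, w))
        if len(chosen) == n - 1:
            break             # a spanning tree has exactly n - 1 edges
    return sum(w for _, _, w in chosen), chosen


edges = [(0, 1, 10), (0, 2, 6), (0, 3, 5), (1, 3, 15), (2, 3, 4)]
total, mst = kruskal_edges(4, edges)
print(total)  # 19
print(mst)    # [(2, 3, 4), (0, 3, 5), (0, 1, 10)]
```

The edge `(0, 2, 6)` is skipped because vertices 0 and 2 are already connected through vertex 3 by the time it is considered.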


The time complexity is $O(E \log E)$, or equivalently $O(E \log V)$. Sorting the edges takes $O(E \log E)$ time. After sorting, we iterate through the edges and apply the find-union algorithm; with union by size, each find and union operation takes at most $O(\log V)$ time. So the overall complexity is $O(E \log E + E \log V)$. Since $E$ is at most $O(V^2)$, we have $\log E \le 2 \log V$, so $O(\log E)$ and $O(\log V)$ are the same. Therefore, the overall time complexity is $O(E \log E)$ or $O(E \log V)$.