## Strongly Connected Component (SCC)

The file contains the edges of a directed graph. Vertices are labeled as positive integers from 1 to 875714. Every row indicates an edge: the vertex label in the first column is the tail and the vertex label in the second column is the head (recall that the graph is directed, and the edges are directed from the first column vertex to the second column vertex). So, for example, the 11th row looks like: "2 47646". This just means that the vertex with label 2 has an outgoing edge to the vertex with label 47646.

Your task is to code up the algorithm from the video lectures for computing strongly connected components (SCCs), and to run this algorithm on the given graph.

Output Format: You should output the sizes of the 5 largest SCCs in the given graph, in decreasing order of sizes, separated by commas (avoid any spaces). So if your algorithm computes the sizes of the five largest SCCs to be 500, 400, 300, 200 and 100, then your answer should be "500,400,300,200,100" (without the quotes). If your algorithm finds fewer than 5 SCCs, then write 0 for the remaining terms. Thus, if your algorithm computes only 3 SCCs whose sizes are 400, 300, and 100, then your answer should be "400,300,100,0,0" (without the quotes). (Note also that your answer should not have any spaces in it.)
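
The output format described above can be sketched as a small helper. The name `format_top_sizes` and its parameter `k` are hypothetical and only serve this illustration; the helper assumes the SCC sizes have already been computed as a list of integers.

```python
def format_top_sizes(sizes, k=5):
    """Format the k largest sizes as a comma-separated string, padded with zeros."""
    # Largest first, keep at most k entries
    top = sorted(sizes, reverse=True)[:k]
    # Pad with zeros when fewer than k SCCs were found
    top += [0] * (k - len(top))
    # Join without spaces, as the assignment requires
    return ",".join(str(size) for size in top)


print(format_top_sizes([400, 300, 100]))  # 400,300,100,0,0
```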

WARNING: This is the most challenging programming assignment of the course. Because of the size of the graph you may have to manage memory carefully. The best way to do this depends on your programming language and environment, and we strongly suggest that you exchange tips for doing this on the discussion forums.
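
As a rough illustration of the task, here is a minimal sketch of one standard approach, Kosaraju's two-pass algorithm. It is an assumption that this is the algorithm meant by "the video lectures". The function name `scc_sizes` and the edge-list input format are hypothetical. Both passes use an explicit stack rather than recursion, since a graph of this size would exceed Python's default recursion limit.

```python
from collections import defaultdict


def scc_sizes(edge_list):
    """Return the sizes of all SCCs, largest first, for (tail, head) pairs."""
    graph = defaultdict(list)    # tail -> list of heads
    reverse = defaultdict(list)  # head -> list of tails
    vertices = set()
    for tail, head in edge_list:
        graph[tail].append(head)
        reverse[head].append(tail)
        vertices.update((tail, head))

    # Pass 1: DFS on the reversed graph, recording vertices as they finish.
    finished = []
    seen = set()
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(reverse[start]))]
        while stack:
            v, neighbours = stack[-1]
            for w in neighbours:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(reverse[w])))
                    break
            else:
                # No unexplored neighbour left: v is finished
                finished.append(v)
                stack.pop()

    # Pass 2: DFS on the original graph in decreasing finishing order.
    # Every vertex reached from one start belongs to the same SCC.
    seen.clear()
    sizes = []
    for start in reversed(finished):
        if start in seen:
            continue
        seen.add(start)
        size = 0
        stack = [start]
        while stack:
            v = stack.pop()
            size += 1
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)


print(scc_sizes([(1, 2), (2, 3), (3, 1), (3, 4)]))  # [3, 1]
```

Storing only adjacency lists of integers, rather than one object per edge, is one way to keep memory use manageable on the full input.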


In [None]:
import sys
sys.path.insert(1, '/Users/gauravbakale/projects/practice/Algorithms')

from common.graphs import (
    Vertex, 
    Edge, 
    Graph, 
    create_graph
)
from common.data_structures import Queue

In [None]:
nodes = "a,b,c,d,e,f,g,h,i,j,k"
edges = "a-c,b-a,c-b,b-j,b-k,j-i,j-h,i-h,h-k,k-j,j-e,e-f,f-g,g-e,c-d,d-f,d-e"
G = create_graph("G", nodes, edges, False)

### Breadth First Search

    Given:
        - a graph G
        - a start node n (any node can be chosen to start the algorithm)
        - a FIFO queue Q
    1. Assume all nodes are unexplored.
    2. Mark n as explored and add it to Q.
    3. Remove the node at the front of Q and loop through its adjacent nodes; add each one that is not already explored or in the queue to Q.
    4. Repeat step 3 until the queue is empty; at that point every node reachable from n has been explored.
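
The steps above can also be sketched without the notebook's `common` package. This version assumes the graph is given as a plain dict mapping each node to a list of its adjacent nodes; the name `bfs_order` and that input format are hypothetical. With an adjacency dict, each node's neighbours are looked up directly instead of scanning every edge.

```python
from collections import deque


def bfs_order(adjacency, start):
    """Return the nodes reachable from start in BFS order."""
    explored = {start}      # nodes already explored or waiting in the queue
    order = []
    queue = deque([start])  # FIFO queue
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adjacency.get(v, []):
            if w not in explored:
                explored.add(w)
                queue.append(w)
    return order


example = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
print(bfs_order(example, "s"))  # ['s', 'a', 'b', 'c']
```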

In [None]:

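def breadth_first_search(G: Graph, v: Vertex):
    """Return the vertexes reachable from v in the order BFS explores them."""
    explored_vertexes = []
    fifo_queue = Queue('FIFO')
    fifo_queue.push(v)
    while len(fifo_queue) > 0:
        v = fifo_queue.pop()
        # Mark v explored before scanning its edges so a self-loop cannot re-queue it
        explored_vertexes.append(v)
        for edge_name in G.edges:
            edge = G.edges[edge_name]
            if edge.source_vertex == v:
                # Queue each neighbour that is neither explored nor already waiting
                if edge.destination_vertex not in explored_vertexes and edge.destination_vertex not in fifo_queue.get():
                    fifo_queue.push(edge.destination_vertex)

    return explored_vertexes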


breadth_first_search(G, G.vertexes['a'])

##### Application : Shortest Path
    
    Since BFS explores nodes in layers, if node n is in the i-th layer from s, then the distance from s to n (counted in edges) is i. This is also the length of the shortest path.
    Proof sketch: Suppose the distance from s to n given by BFS is 4. The shortest path can't be shorter than 4, because if it were, say 3, then n would already have been explored in the 3rd layer.

    This is an additional property of BFS.

    __Proof__ : Review
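
The layer argument can be made concrete with a small sketch that records the layer of every reachable node. It assumes an unweighted graph given as a plain dict of adjacency lists; the name `bfs_layers` and that input format are hypothetical.

```python
from collections import deque


def bfs_layers(adjacency, s):
    """Map every node reachable from s to its BFS layer (distance in edges)."""
    distance = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adjacency.get(v, []):
            # The first time w is seen is via a shortest path,
            # so its layer is one more than the layer of v
            if w not in distance:
                distance[w] = distance[v] + 1
                queue.append(w)
    return distance


example = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"], "d": []}
print(bfs_layers(example, "s"))  # {'s': 0, 'a': 1, 'b': 1, 'c': 2, 'd': 3}
```

Nodes that never appear in the returned dict are unreachable from `s`.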

In [None]:
def breadth_first_search_with_shortest_distance(G: Graph, s: Vertex, n: Vertex):
    """Return the number of edges on a shortest path from s to n (None if n is unreachable)."""
    explored_vertexes = []
    fifo_queue = Queue('FIFO')
    fifo_queue.push(s)
    distance = {}
    distance[s] = 0
    while len(fifo_queue) > 0:
        v = fifo_queue.pop()
        # Mark v explored before scanning its edges so a self-loop cannot re-queue it
        explored_vertexes.append(v)
        for edge_name in G.edges:
            edge = G.edges[edge_name]
            if edge.source_vertex == v:
                if edge.destination_vertex not in explored_vertexes and edge.destination_vertex not in fifo_queue.get():
                    fifo_queue.push(edge.destination_vertex)
                    # A newly discovered vertex is one layer further out than v
                    distance[edge.destination_vertex] = distance[v] + 1

    return distance.get(n)

breadth_first_search_with_shortest_distance(G, G.vertexes['a'], G.vertexes['h'])

### Depth First Search

    Depth first search works by exploring aggressively. For example, if we start at node s, we go to an adjacent node of s, let's say r, then to one of the adjacent nodes of r, let's say c, and so on. We backtrack from c only when every adjacent node of c is already explored or c is an end node (it has no outgoing edges). Otherwise we go on to the next unexplored adjacent node of c.

It can be implemented either in the same way as BFS but with a LIFO stack instead of a FIFO queue, or recursively.
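
The recursive variant mentioned above might look like the following minimal sketch. It assumes the graph is a plain dict of adjacency lists rather than the notebook's `Graph` class; the name `dfs_recursive` and its parameters are hypothetical.

```python
def dfs_recursive(adjacency, v, visited=None, order=None):
    """Return the nodes reachable from v in the order DFS first visits them."""
    if visited is None:
        visited, order = set(), []
    visited.add(v)
    order.append(v)
    for w in adjacency.get(v, []):
        if w not in visited:
            # Go as deep as possible before trying the next neighbour of v
            dfs_recursive(adjacency, w, visited, order)
    return order


example = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
print(dfs_recursive(example, "s"))  # ['s', 'a', 'c', 'b']
```

Python's default recursion limit is about 1000 frames, so on a graph with hundreds of thousands of vertices the stack-based version is the safer choice.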

In [None]:
def depth_first_search(G: Graph, s: Vertex):
    """Iterative DFS from s; returns the vertexes in the order they are first visited."""
    Q = Queue('LIFO')
    explored_vertexes = [s]
    Q.push(s)
    while len(Q) > 0:
        # Peek at the top of the stack (pop, then push back)
        v = Q.pop()
        Q.push(v)
        found_unexplored = False
        for edge_name in G.all_edges:
            edge = G.all_edges[edge_name]
            if edge.source_vertex == v and edge.destination_vertex not in explored_vertexes:
                # Go deeper: mark the neighbour explored and continue from it
                explored_vertexes.append(edge.destination_vertex)
                Q.push(edge.destination_vertex)
                found_unexplored = True
                break
        if not found_unexplored:
            # Every neighbour of v is already explored, so backtrack
            Q.pop()

    return explored_vertexes


depth_first_search(G, G.vertexes['a'])
    
    
    

In [None]:
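# Dicts serve as insertion-ordered sets, so duplicate checks stay O(1) on large files
vertex_set = {}
edge_set = {}
with open('SCC_small.txt', 'r') as file:
    for row in file:
        row = row.split()
        if len(row) < 2:
            continue  # skip blank or malformed rows

        tail, head = row[0], row[1]

        # add the vertexes
        vertex_set[tail] = None
        vertex_set[head] = None

        # add the edges as "tail-head" strings
        edge_set[f"{tail}-{head}"] = None

vertexes = list(vertex_set)
edges = list(edge_set)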

        