# Programming Assignment 1 Graph Algorithms
<br>
Your task is to code up the algorithm from the video lectures for computing strongly connected components (SCCs), and to run this algorithm on the given graph.
<br>
Output Format: You should output the sizes of the 5 largest SCCs in the given graph, in decreasing order of size, separated by commas and without any spaces. So if your algorithm computes the sizes of the five largest SCCs to be 500, 400, 300, 200 and 100, then your answer should be "500,400,300,200,100" (without the quotes). If your algorithm finds fewer than 5 SCCs, then write 0 for the remaining terms. Thus, if your algorithm computes only 3 SCCs whose sizes are 400, 300, and 100, then your answer should be "400,300,100,0,0" (without the quotes).
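The output format can be sketched as a small helper. This is a minimal sketch, not part of the assignment: the name `format_scc_sizes` and its parameter `k` are hypothetical, and it assumes the SCC sizes are already available as a list of integers.

```python
def format_scc_sizes(sizes, k=5):
    """Return the k largest sizes in decreasing order, comma-separated and zero-padded."""
    top = sorted(sizes, reverse=True)[:k]
    top += [0] * (k - len(top))  # pad with zeros when fewer than k SCCs exist
    return ",".join(str(s) for s in top)

print(format_scc_sizes([100, 300, 400]))  # 400,300,100,0,0
```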

## Preparation

In [1]:
#various common imports
import numpy as np
import sys
import pandas as pd
import random as rnd
import copy
#defaultdict makes it easier to build dictionaries of lists, from https://stackoverflow.com/questions/26367812/appending-to-list-in-python-dictionary
from collections import defaultdict
from collections import Counter
#import resource
import threading
#import resources

In [2]:
#Careful recursion and stack preparation. Apparently absolutely necessary
sys.setrecursionlimit(4000)
#hardlimit = resource.getrlimit(resource.RLIMIT_STACK)[1]

### File import
<br>
The file contains the edges of a directed graph. Vertices are labeled as positive integers from 1 to 875714. Every row indicates an edge, the vertex label in first column is the tail and the vertex label in second column is the head (recall the graph is directed, and the edges are directed from the first column vertex to the second column vertex). So for example, the 11th row looks like: "2 47646". This just means that the vertex with label 2 has an outgoing edge to the vertex with label 47646.
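The row format described above could be parsed along these lines. This is only a sketch: `parse_edges` is a hypothetical helper, and silently skipping blank or malformed rows is an assumption about the desired behaviour, not something the assignment specifies.

```python
def parse_edges(lines):
    """Yield (tail, head) integer pairs from rows such as '2 47646'."""
    for raw in lines:
        parts = raw.split()
        if len(parts) != 2:
            continue  # assumption: ignore blank or malformed rows
        yield int(parts[0]), int(parts[1])

sample = ["1 1", "1 2", "2 47646 ", ""]
print(list(parse_edges(sample)))  # [(1, 1), (1, 2), (2, 47646)]
```

Because the function accepts any iterable of strings, an open file handle can be passed directly, which avoids holding the whole file in memory as a single string.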

In [3]:
input_file = 'SCC.txt'
with open(input_file, 'r') as data:
    line = data.read().strip().split("\n")
#produces correctly elements with 2 values

In [4]:
len(line)

5105043

In [5]:
line[1].strip()

'1 2'

In [6]:
line[1].split()

['1', '2']

Maybe turn into a **tuple** ?
<br>
Suggested to represent as adjacency list
<br>

In [7]:
prbNet=[[int(s) for s in lin.split()] for lin in line]

In [8]:
#Adjacency list
prbNet2=np.array(prbNet)
uniqueID=np.unique(prbNet2)
print(uniqueID[:30])

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
 26 27 28 29 30]


In [9]:
#All unique values in the first column (edge sources). Note the naming: the assignment text calls this column the tail
uniqueHead=np.unique(prbNet2[:,0])

In [10]:
len(uniqueHead)

739454

In [11]:
uniqueID[:20]

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20])

In [12]:
#All unique values in the second column (edge targets). Note the naming: the assignment text calls this column the head
uniqueTail=np.unique(prbNet2[:,1])

In [13]:
len(uniqueTail)

714547

In [14]:
print(uniqueID.tolist()==list(range(1,875715)))

True


In [15]:
#Values in the first column but never in the second: nodes with outgoing edges only (no incoming edges)
diffinHead=np.setdiff1d(uniqueHead,uniqueTail)

In [16]:
len(diffinHead)

161167

In [17]:
diffinHead[:10]

array([ 253,  435,  637, 1181, 1218, 1254, 1522, 1656, 1904, 2136])

In [18]:
#Values in the second column but never in the first: nodes with incoming edges only (no outgoing edges)
diffinTail=np.setdiff1d(uniqueTail,uniqueHead)

In [19]:
len(diffinTail)

136260

In [20]:
diffinTail[:10]

array([  4,  21,  37,  39, 108, 150, 174, 223, 230, 232])

The two sets differ clearly in size: <br>
- Values that appear in the first column but never in the second (`diffinHead`) are nodes with outgoing edges only
- Conversely, values that appear in the second column but never in the first (`diffinTail`) are nodes with incoming edges only
<br>
The dictionary-building routine must therefore add the `diffinTail` nodes to the normal graph, because they never appear in the first position of an edge, and the `diffinHead` nodes to the reverse graph, because they never appear in the second position.
<br>
Whether these entries are strictly needed is hard to say, since `uniqueID` can still serve as the full list of nodes.
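One way to sidestep the question is to register both endpoints of every edge in both dictionaries while building them, so that every node ends up as a key with a (possibly empty) list. The sketch below is an illustration on a toy edge list, with the hypothetical name `build_graphs`; it is not the routine used later in this notebook.

```python
def build_graphs(edges):
    """Return (graph, graph_rev); every node is a key in both, possibly with an empty list."""
    graph, graph_rev = {}, {}
    for tail, head in edges:
        # setdefault registers both endpoints in both dictionaries
        graph.setdefault(tail, []).append(head)
        graph.setdefault(head, [])
        graph_rev.setdefault(head, []).append(tail)
        graph_rev.setdefault(tail, [])
    return graph, graph_rev

g, g_rev = build_graphs([(1, 2), (2, 3), (3, 1), (3, 4)])
print(g[4], g_rev[4])  # [] [3]
```

Node 4 only receives an edge, yet it is present in `g` with an empty list, so no separate pass over `diffinTail` or `diffinHead` would be needed with this construction.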

In [21]:
#Remember how the problem is structured
prbNet[:10]

[[1, 1],
 [1, 2],
 [1, 5],
 [1, 6],
 [1, 7],
 [1, 3],
 [1, 8],
 [1, 4],
 [2, 47646],
 [2, 47647]]

In [22]:
prbNet2[prbNet2[:,0] == 2,1].tolist()

[47646, 47647, 13019, 47648, 47649, 47650, 7700, 47651, 47652]

In [23]:
list(filter(lambda x : x[0]==2,prbNet))

[[2, 47646],
 [2, 47647],
 [2, 13019],
 [2, 47648],
 [2, 47649],
 [2, 47650],
 [2, 7700],
 [2, 47651],
 [2, 47652]]

In [24]:
#Quite an interesting form of high performance container https://docs.python.org/2/library/collections.html#collections.defaultdict

In [25]:
test=defaultdict(list)
print(test)
test['D'].append(5)
print(test)
test['D'].append(6)
print(test)
test['C'].append(8)
print(test)
print(test['D'])
print(type(test['D']))

defaultdict(<class 'list'>, {})
defaultdict(<class 'list'>, {'D': [5]})
defaultdict(<class 'list'>, {'D': [5, 6]})
defaultdict(<class 'list'>, {'D': [5, 6], 'C': [8]})
[5, 6]
<class 'list'>


In [26]:
#Build a dictionary with structure
# node ID: [list of nodes reached by its outgoing edges]
#Built this way, it omits every node that has no outgoing edge
prbDict=defaultdict(list)
for i in prbNet:
    prbDict[str(i[0])].append(i[1])

In [27]:
#Fix: add the nodes that only receive edges (diffinTail), with a None placeholder
for i in diffinTail.tolist():
    prbDict[str(i)].append(None)

In [28]:
#This one is present
prbDict['2']

[47646, 47647, 13019, 47648, 47649, 47650, 7700, 47651, 47652]

In [29]:
#Test with one that should have no ingoing
prbDict[str(diffinHead[0])]

[254, 255, 256]

In [30]:
#Show that it is the same as selecting for the same head
list(filter(lambda x : x[0]==diffinHead[0],prbNet))

[[253, 254], [253, 255], [253, 256]]

In [75]:
#A node with no outgoing edges never appears in the first column, so this filter returns nothing
list(filter(lambda x : x[0]==diffinTail[0],prbNet))

[]

In [31]:
#One vertex that has no outgoing
diffinTail[0]

4

In [32]:
#Missing initially; now present with a [None] placeholder
prbDict['4']

[None]

In [33]:
#see if those other are present
list(filter(lambda x : x[0]==diffinTail[0],prbNet))

[]

In [34]:
#Let's check out if it also works for reversal
#Want to see ALL nodes that go to 9, instead of where 9 goes to
list(filter(lambda x : x[1]==9,prbNet))

[[5, 9],
 [11, 9],
 [26, 9],
 [32, 9],
 [71106, 9],
 [71107, 9],
 [71110, 9],
 [71112, 9],
 [104769, 9],
 [104773, 9],
 [104807, 9],
 [115331, 9],
 [280789, 9],
 [292544, 9],
 [297779, 9],
 [467260, 9],
 [547599, 9],
 [675019, 9],
 [732157, 9],
 [832536, 9],
 [851820, 9],
 [874197, 9]]

In [35]:
#where 9 goes to
list(filter(lambda x : x[0]==9,prbNet))

[[9, 71107],
 [9, 71108],
 [9, 71109],
 [9, 21],
 [9, 71110],
 [9, 71111],
 [9, 39350],
 [9, 71112],
 [9, 71113]]

In [36]:
#Same problem as above, mirrored: nodes with no incoming edges are missing from the reverse graph
graphRev=defaultdict(list)
for i in prbNet:
    graphRev[str(i[1])].append(i[0])
#Fix: add the nodes that only send edges (diffinHead), with a None placeholder
for i in diffinHead.tolist():
    graphRev[str(i)].append(None)

In [37]:
graphRev['9']

[5,
 11,
 26,
 32,
 71106,
 71107,
 71110,
 71112,
 104769,
 104773,
 104807,
 115331,
 280789,
 292544,
 297779,
 467260,
 547599,
 675019,
 732157,
 832536,
 851820,
 874197]

In [38]:
len(graphRev)

875714

In [39]:
#Let's check for sanity
print(prbDict.keys()==graphRev.keys())

True


In [40]:
len(uniqueID)

875714

In [41]:
type(uniqueID)

numpy.ndarray

In [42]:
uniqueID[:10]

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [43]:
len(prbDict.keys())

875714

In [44]:
sorted(map(int,list(prbDict.keys())))[:10]

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

The following checks confirm that **all** nodes are present in both graphs

In [45]:
print(uniqueID.tolist()==sorted(map(int,list(prbDict.keys()))))

True


In [46]:
print(uniqueID.tolist()==sorted(map(int,list(graphRev.keys()))))

True


In [153]:
list(graphRev.values())[5]

[1, 511593, 840125]

In [147]:
graphRev['3']

[1, 511593, 840125]

In [152]:
graphRev['1']

[1,
 5,
 6,
 7,
 8,
 10,
 12,
 13,
 16,
 17,
 18,
 19,
 20,
 23,
 24,
 25,
 26,
 27,
 28,
 29,
 30,
 31,
 32,
 34,
 35,
 36,
 3121,
 53206,
 53211,
 71113,
 88664,
 88665,
 104772,
 104782,
 104783,
 104784,
 104786,
 104787,
 104788,
 104789,
 104790,
 104791,
 104809,
 104812,
 104813,
 104814,
 104815,
 104817,
 104818,
 104819,
 104820,
 104821,
 104822,
 110437,
 110438,
 110439,
 110440,
 110441,
 110442,
 115314,
 115336,
 115342,
 115355,
 115399,
 115401,
 124672,
 124673,
 133568,
 171168,
 171169,
 171170,
 171172,
 171175,
 171176,
 171177,
 171178,
 171179,
 171182,
 171188,
 176657,
 176658,
 176659,
 176660,
 176661,
 176662,
 176663,
 176664,
 176667,
 212282,
 212283,
 212284,
 212285,
 212286,
 212287,
 212288,
 212290,
 212291,
 212292,
 212293,
 212294,
 212295,
 224091,
 240432,
 280787,
 280871,
 297779,
 303368,
 303369,
 303370,
 303371,
 303372,
 307622,
 307624,
 307625,
 307627,
 307628,
 307629,
 307630,
 313160,
 313976,
 318620,
 320269,
 320270,
 347557,


In [151]:
list(graphRev.values())[:2]

[[1,
  5,
  6,
  7,
  8,
  10,
  12,
  13,
  16,
  17,
  18,
  19,
  20,
  23,
  24,
  25,
  26,
  27,
  28,
  29,
  30,
  31,
  32,
  34,
  35,
  36,
  3121,
  53206,
  53211,
  71113,
  88664,
  88665,
  104772,
  104782,
  104783,
  104784,
  104786,
  104787,
  104788,
  104789,
  104790,
  104791,
  104809,
  104812,
  104813,
  104814,
  104815,
  104817,
  104818,
  104819,
  104820,
  104821,
  104822,
  110437,
  110438,
  110439,
  110440,
  110441,
  110442,
  115314,
  115336,
  115342,
  115355,
  115399,
  115401,
  124672,
  124673,
  133568,
  171168,
  171169,
  171170,
  171172,
  171175,
  171176,
  171177,
  171178,
  171179,
  171182,
  171188,
  176657,
  176658,
  176659,
  176660,
  176661,
  176662,
  176663,
  176664,
  176667,
  212282,
  212283,
  212284,
  212285,
  212286,
  212287,
  212288,
  212290,
  212291,
  212292,
  212293,
  212294,
  212295,
  224091,
  240432,
  280787,
  280871,
  297779,
  303368,
  303369,
  303370,
  303371,
  303372,
  3076

In [158]:
dictM={'1':[3,6,7],'2':[10,5,30],'4':[100,200]}
print(dictM.values())

dict_values([[3, 6, 7], [10, 5, 30], [100, 200]])


In [112]:
def loaddata(filename):

    with open(filename,"r") as f:
        edges = [list(map(int,line.split())) for line in f]
    
    nodes = list(set([v for edge in edges for v in edge]))
    G = {i: [] for i in range(1,len(nodes)+1)}
    Grev = {i: [] for i in range(1,len(nodes)+1)}
    for edge in edges:
        G[edge[0]] += [edge[1]]
        Grev[edge[1]] += [edge[0]]

    return G,Grev,nodes

#let's see what it can do

In [95]:
with open('SCC.txt',"r") as f:
    edges = [list(map(int,line.split())) for line in f]
print(edges[:20])    

[[1, 1], [1, 2], [1, 5], [1, 6], [1, 7], [1, 3], [1, 8], [1, 4], [2, 47646], [2, 47647], [2, 13019], [2, 47648], [2, 47649], [2, 47650], [2, 7700], [2, 47651], [2, 47652], [3, 511596], [5, 1], [5, 9]]


In [96]:
[v-1 for edge in edges for v in edge][:30]

[0,
 0,
 0,
 1,
 0,
 4,
 0,
 5,
 0,
 6,
 0,
 2,
 0,
 7,
 0,
 3,
 1,
 47645,
 1,
 47646,
 1,
 13018,
 1,
 47647,
 1,
 47648,
 1,
 47649,
 1,
 7699]

In [97]:
nodes = list(set([v-1 for edge in edges for v in edge]))

In [98]:
nodes[-30:]

[875684,
 875685,
 875686,
 875687,
 875688,
 875689,
 875690,
 875691,
 875692,
 875693,
 875694,
 875695,
 875696,
 875697,
 875698,
 875699,
 875700,
 875701,
 875702,
 875703,
 875704,
 875705,
 875706,
 875707,
 875708,
 875709,
 875710,
 875711,
 875712,
 875713]

In [99]:
print(nodes[:20])
print(len(nodes))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
875714


In [113]:
graph1,graph2,nods=loaddata('SCC.txt')

In [114]:
graph1[1]

[1, 2, 5, 6, 7, 3, 8, 4]

In [115]:
graph1[4]

[]

In [116]:
type(graph1)

dict

In [117]:
graph2[4]

[1]

In [118]:
len(list(graph1.keys()))

875714

In [119]:
len(list(graph2.keys()))

875714

In [120]:
print(type(nods))

<class 'list'>


In [121]:
len(nods)

875714

In [122]:
nods==list(range(875714))

False

In [123]:
nods[:20]

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

In [124]:
nods[-20:]

[875695,
 875696,
 875697,
 875698,
 875699,
 875700,
 875701,
 875702,
 875703,
 875704,
 875705,
 875706,
 875707,
 875708,
 875709,
 875710,
 875711,
 875712,
 875713,
 875714]

Note: the `loaddata` function above comes from a working implementation and produces a graph equivalent to ours, so the problem is not in the graph creation. It must be in the sorted graph.
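One way to probe the sorted graph is to check that the finishing times form a permutation of 1..n before relabelling, since duplicate or missing values would make nodes collide or vanish. The sketch below is an assumption about where to look rather than a diagnosis; `relabel_by_finishing` is a hypothetical helper and the finishing times in the example are made up.

```python
def relabel_by_finishing(graph, finishing):
    """Rename every node v to finishing[v], in keys and in adjacency lists."""
    n = len(graph)
    # Finishing times must be a permutation of 1..n, otherwise nodes collide or vanish
    assert sorted(finishing.values()) == list(range(1, n + 1)), "finishing times are not a permutation"
    return {finishing[v]: [finishing[w] for w in nbrs] for v, nbrs in graph.items()}

graph = {1: [2], 2: [3], 3: [1]}
finishing = {1: 3, 2: 1, 3: 2}  # hypothetical finishing times
print(relabel_by_finishing(graph, finishing))  # {3: [1], 1: [2], 2: [3]}
```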

### Test Cases
<br>
Test cases imported from the forums


In [125]:
ans_Dict={
    'A':[3,3,3,0,0],
    'B':[3,3,2,0,0],
    'C':[3,3,1,1,0],
    'D':[7,1,0,0,0],
    'E':[6,3,2,1,0]
}

In [126]:
testLists={'A' :[[1, 4],[2, 8],[3, 6],[4, 7],[5 ,2],[6, 9],[7, 1],[8, 5],[8, 6],[9, 7],[9, 3]], 
           'B':[[1, 2],[2 ,6],[2, 3],[2, 4],[3, 1],[3, 4],[4, 5],[5, 4],[6, 5],[6, 7],[7, 6],[7, 8],[8, 5],[8, 7]],
           'C':[[1, 2],[2, 3],[3, 1],[3, 4],[5, 4],[6, 4],[8, 6],[6, 7],[7, 8]],
           'D':[[1, 2],[2, 3],[3, 1],[3, 4],[5, 4],[6, 4],[8, 6],[6, 7],[7, 8],[4, 3],[4, 6]],
           'E':[[1,2],[2,3],[2,4],[2,5],[3,6],[4,5],[4,7],[5,2],[5,6],[5,7],[6,3],[6,8],[7,8],[7,10],[8,7],[9,7],[10,9],
                [10,11],[11,12],[12,10]]
}

In [127]:
#not needed
'''#Turn into adjacency list
#First get all unique values
testIDs=defaultdict(list)
for i,k in test_Dict.items():
    testIDs.append(np.unique(np.array(k)).tolist())
print(testIDs)'''

'#Turn into adjacency list\n#First get all unique values\ntestIDs=defaultdict(list)\nfor i,k in test_Dict.items():\n    testIDs.append(np.unique(np.array(k)).tolist())\nprint(testIDs)'

In [58]:
#Turn all lists into dictionaries

In [59]:
#Useful
nodeN=len(uniqueID)
print(nodeN)

875714


In [60]:
list(range(10,0,-1))

[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

## Actual Code

Look at
https://github.com/ladamalina/coursera-algo/blob/master/PQ4.%20SCCs/kosaraju.py
<br>
Here a coherent use is made of global variables, which keeps the code closer to the variant written down in the course.
<br>
The following version is also quite interesting: <br>
https://github.com/hennymac/Algorithms1/blob/master/Week4/scc.py
<br>
See also
<br>
https://teacode.wordpress.com/2013/07/27/algo-week-4-graph-search-and-kosaraju-ssc-finder/ 
<br>
and
<br>
https://codereview.stackexchange.com/questions/29404/calculating-strongly-connected-components-in-a-directed-graph-using-dfs
<br>
Why are global variables used?
<br>
The intent is to refer to the graph only by reference, as it is very large, with especially many vertices.
Passing temporary variables back and forth might not be the best option.
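For comparison, Python passes references to objects rather than copies, so a large dictionary handed to a function is not duplicated, and a function can mutate the caller's dictionary in place without a `global` statement. The sketch below illustrates this on a toy graph; the function names are hypothetical and it is offered as an alternative to consider, not as the approach taken in the linked implementations.

```python
def count_edges(graph):
    """Receives a reference to the caller's dict; nothing is copied."""
    return sum(len(nbrs) for nbrs in graph.values())

def mark_explored(explored, node):
    """Mutates the caller's dict in place, so no global statement is needed."""
    explored[node] = True

graph = {1: [2, 3], 2: [3], 3: []}
explored = {}
mark_explored(explored, 1)
print(count_edges(graph), explored)  # 3 {1: True}
```

A `global` declaration is only required when a function rebinds a name (as with the counter `t` in the recursive version), not when it modifies the contents of an existing dictionary.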

In [61]:
#Note. Could push a list of nodes in either of the two

In [159]:
#Kosaraju's two-pass algorithm, recursive version

def SSC(graph,graphRev,nNodes):
    #Shared state, indexed by integer node label
    global finishing
    global leader
    global totN
    totN =nNodes
    finishing={}
    leader={}
    #First pass: DFS on the reverse graph to compute finishing times
    DFSLoop(graphRev)
    #Second pass runs on the normal graph, with every node relabelled by its
    #finishing time, so that DFSLoop visits nodes in decreasing finishing time
    graphSort={}
    for l in range(1,totN+1):
        if not graph[l]:
            graphSort[finishing[l]]=[]
        else:
            graphSort[finishing[l]]=sorted([finishing[k] for k in graph[l]])
    #Leaders come out in finishing-time labels, which is enough to count SCC sizes
    DFSLoop(graphSort)
    return leader
    
#Depth First Search Loop
def DFSLoop(graph):
    #These must be inherited by all loops below
    #T is a time variable, for exploration
    global t
    #s is the current leader node
    global s
    #List of explored nodes
    global exp
    t=0
    s=0
    #tried list, but dictionaries are easier retrieval and faster
    exp={}
    for i in range(1,totN+1):
        exp[i]=False
    #print(list(exp.keys()))
    #as it appears in lectures
    for node in range(totN,0,-1):
        if exp[node]==False:
            s = node
            DFSInner(graph,node)
            
    #Empty return as we work with global variables
    return
        
#Depth-first search from a single node
def DFSInner(graph,node):
    #t is reassigned below, so it must be declared global here
    global t
    #Mark as explored
    exp[node]=True
    #Record the leader: the start node of the current outer-loop DFS
    leader[node]=s
    for vrt in graph[node]:
        #Only follow edges to nodes not yet explored
        if exp[vrt]==False:
            #Explore with a recursive call
            DFSInner(graph,vrt)

    #All neighbours are done: assign the finishing time
    t+=1
    finishing[node]=t
    #Empty return, as we work with global variables
    return


Neither version works for the 4th test case. How does it differ from the other ones?

In [63]:
alpha=[5]
alpha[-1]

5

In [64]:
alpha=[True, True, True]
if all(el==True for el in alpha):
    print('Worked')
else:
    print('Didnt work')

Worked


Interesting solution from Stack Overflow at <br>
https://stackoverflow.com/questions/24051386/kosaraju-finding-finishing-time-using-iterative-dfs <br>
Note the importance of using an explicit stack <br>
Let's rewrite this to use an order of visit
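As a point of comparison, the sketch below keeps an explicit list of nodes in finishing order and runs the second pass over that list in reverse, instead of relabelling the graph. It is a self-contained illustration under stated assumptions, not the code used in this notebook: the names `dfs_order` and `scc_sizes` are hypothetical, and the edge list is assumed to be a list of (tail, head) pairs.

```python
from collections import Counter

def dfs_order(graph, nodes):
    """Iterative DFS started from each of `nodes` in turn; returns (finish_order, leader)."""
    explored, order, leader = set(), [], {}
    for start in nodes:
        if start in explored:
            continue
        explored.add(start)
        leader[start] = start
        # Each stack entry keeps an iterator, so a node resumes where it left off
        stack = [(start, iter(graph.get(start, [])))]
        while stack:
            node, nbrs = stack[-1]
            for nxt in nbrs:
                if nxt not in explored:
                    explored.add(nxt)
                    leader[nxt] = start
                    stack.append((nxt, iter(graph.get(nxt, []))))
                    break
            else:
                # Iterator exhausted: every neighbour is handled, the node is finished
                stack.pop()
                order.append(node)
    return order, leader

def scc_sizes(edges):
    """Return SCC sizes in decreasing order for a list of (tail, head) pairs."""
    graph, graph_rev, nodes = {}, {}, set()
    for tail, head in edges:
        graph.setdefault(tail, []).append(head)
        graph_rev.setdefault(head, []).append(tail)
        nodes.update((tail, head))
    # First pass on the reverse graph gives the finishing order
    order, _ = dfs_order(graph_rev, sorted(nodes, reverse=True))
    # Second pass on the normal graph, in decreasing finishing time
    _, leader = dfs_order(graph, reversed(order))
    return sorted(Counter(leader.values()).values(), reverse=True)

print(scc_sizes([(1, 2), (2, 3), (3, 1), (3, 4)]))  # [3, 1]
```

Storing an iterator per stack entry means each adjacency list is scanned once overall, rather than being rescanned every time the node returns to the top of the stack.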

In [163]:
#Fully iterative

def SSC(graph,graphRev,nNodes):
    #Shared state, indexed by integer node label
    global finishing
    global leader
    global totN
    totN =nNodes
    finishing={}
    leader={}
    #First pass: DFS on the reverse graph to compute finishing times
    DFSLoopIterative(graphRev)
    #Second pass runs on the normal graph, relabelled by finishing time
    graphSort={}
    for l in range(1,totN+1):
        if not graph[l]:
            graphSort[finishing[l]]=[]
        else:
            graphSort[finishing[l]]=sorted([finishing[k] for k in graph[l]])

    #Based on finishing time
    DFSLoopIterative(graphSort)
    return leader

#Depth First Search Loop
def DFSLoopIterative(graph):
    #These must be inherited by all loops below
    #However now it's not necessary to have them as global anymore
    #T is a time variable, for exploration
    #s is the current leader node
    #List of explored nodes
    exp={}

    t=0
    s=0
    #tried list, but dictionaries are easier retrieval and faster
    for i in range(1,totN+1):
        exp[i]=False
    #print(list(exp.keys()))
    #this is the outer loop, rewritten iteratively
        
    for node in range(totN,0,-1):
        if exp[node]==False:
            #establish this node as current leader
            s = node
            #add to stack
            stack = [node]
            while stack:
                #Look at the node on top of the stack without removing it
                node = stack[-1]
                #This node is explored
                exp[node] = True
                leader[node]=s
                #Find the first unexplored neighbour, if any
                nextVrt=None
                if node in graph:
                    for vrt in graph[node]:
                        if not exp[vrt]:
                            nextVrt=vrt
                            break
                if nextVrt is not None:
                    #Equivalent of the recursive call: descend into that neighbour first
                    stack.append(nextVrt)
                else:
                    #Every neighbour is explored, so the node is finished
                    stack.pop()
                    t += 1
                    finishing[node]= t            
    #Empty return as we work with global variables
    return


In [164]:
alpha={'A':0,'B':2}
print('A' in alpha)

True


In [165]:
testKeys=list(testLists.keys())
print(testKeys)

['A', 'B', 'C', 'D', 'E']


In [166]:
#Test cases
sys.setrecursionlimit(800000)
threading.stack_size(67108864)


def main():
    for k in testKeys:
        testNet=testLists[k]
        #print('network being tested \n',testNet)
        #Adjacency list
        testNet2=np.array(testNet)
        uniqueIDTest=np.unique(testNet2)
        netNodes=len(uniqueIDTest)
        uniqueHeadTest=np.unique(testNet2[:,0])
        uniqueTailTest=np.unique(testNet2[:,1])
        diffinHeadTest=np.setdiff1d(uniqueHeadTest,uniqueTailTest)
        diffinTailTest=np.setdiff1d(uniqueTailTest,uniqueHeadTest)
        #Make normal graph and reverse graph
        graphT=defaultdict(list)
        graphRevT=defaultdict(list)
        for i in testNet:
            graphT[i[0]].append(i[1])
            graphRevT[i[1]].append(i[0])
        
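        #Register nodes that have no edges on one side, using an empty list.
        #Note: str(i) creates string keys (e.g. '4') while the edges above use int keys.
        #SSC looks nodes up by int, so the defaultdict supplies the empty list on access
        #and these string entries are never read.
        for i in diffinTailTest.tolist():
            graphT[str(i)]=[]   
        #Same for the reverse graph
        for i in diffinHeadTest.tolist():
            graphRevT[str(i)]=[]  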
        print('graph to be tested',graphT)
        #Run the actual algorithm
        ldTest=SSC(graphT,graphRevT,netNodes)
        #ldTest maps each node to the leader of its SCC; count the nodes per leader
        ldCount=Counter(list(ldTest.values())).most_common(5)
        solution=[]
        for i in ldCount:
            solution.append(i[1])
            
        print('net solution \n',solution)
        print('test solution \n',ans_Dict[k])
thread = threading.Thread(target=main)
thread.start()

graph to be tested defaultdict(<class 'list'>, {1: [4], 2: [8], 3: [6], 4: [7], 5: [2], 6: [9], 7: [1], 8: [5, 6], 9: [7, 3]})
net solution 
 [3, 3, 3]
test solution 
 [3, 3, 3, 0, 0]
graph to be tested defaultdict(<class 'list'>, {1: [2], 2: [6, 3, 4], 3: [1, 4], 4: [5], 5: [4], 6: [5, 7], 7: [6, 8], 8: [5, 7]})
net solution 
 [3, 3, 2]
test solution 
 [3, 3, 2, 0, 0]
graph to be tested defaultdict(<class 'list'>, {1: [2], 2: [3], 3: [1, 4], 5: [4], 6: [4, 7], 8: [6], 7: [8], '4': []})
net solution 
 [3, 3, 1, 1]
test solution 
 [3, 3, 1, 1, 0]
graph to be tested defaultdict(<class 'list'>, {1: [2], 2: [3], 3: [1, 4], 5: [4], 6: [4, 7], 8: [6], 7: [8], 4: [3, 6]})
net solution 
 [7, 1]
test solution 
 [7, 1, 0, 0, 0]
graph to be tested defaultdict(<class 'list'>, {1: [2], 2: [3, 4, 5], 3: [6], 4: [5, 7], 5: [2, 6, 7], 6: [3, 8], 7: [8, 10], 8: [7], 9: [7], 10: [9, 11], 11: [12], 12: [10]})
net solution 
 [6, 3, 2, 1]
test solution 
 [6, 3, 2, 1, 0]


### Problem Case
<br>
Some suggestions from the forum: run the algorithm in a thread with a larger stack and raise the recursion limit<br>
A very useful article on threading:<br>
http://chriskiehl.com/article/parallelism-in-one-line/
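The forum suggestion can be sketched as a small wrapper that raises the recursion limit, requests a larger stack for new threads, and waits for the thread to finish. This is a minimal sketch under assumptions: `run_with_big_stack` and `depth` are hypothetical names, the 64 MB stack mirrors the value used in this notebook, and whether a given stack size is accepted depends on the platform.

```python
import sys
import threading

def run_with_big_stack(target, stack_bytes=64 * 1024 * 1024, recursion_limit=100_000):
    """Run target() in a thread with a larger stack and return its result."""
    result = {}

    def wrapper():
        result["value"] = target()

    sys.setrecursionlimit(recursion_limit)  # process-wide setting
    threading.stack_size(stack_bytes)       # applies to threads created afterwards
    thread = threading.Thread(target=wrapper)
    thread.start()
    thread.join()                           # wait, so the result is available here
    return result.get("value")

def depth(n):
    """Toy recursive function, used only to exercise deep recursion."""
    return 0 if n == 0 else 1 + depth(n - 1)

print(run_with_big_stack(lambda: depth(10_000)))
```

Calling `join()` matters in a notebook: without it, the cell can return before the thread has printed or stored its result.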

In [167]:
sys.setrecursionlimit(800000)
threading.stack_size(67108864)

def main():
    

    prbNet2=np.array(prbNet)
    uniqueID=np.unique(prbNet2)
    netNodes=len(uniqueID)
    
    #Build Graphs
    prbDict=defaultdict(list)
    graphRev=defaultdict(list)
    for i in prbNet:
        prbDict[i[0]].append(i[1])
        graphRev[i[1]].append(i[0])
    #Nodes that only send edges get an empty list in the reverse graph; nodes that only receive get one in the normal graph
    for i in diffinHead.tolist():
        graphRev[i]=[]
    for i in diffinTail.tolist():
        prbDict[i]=[]
    
    #Run the actual algorithm
    ldTest=SSC(prbDict,graphRev,netNodes)
    #ldTest maps each node to the leader of its SCC; count the nodes per leader
    ldCount=Counter(list(ldTest.values()))
    print('net solution \n',ldCount.most_common(5))
        
thread = threading.Thread(target=main)
thread.start()

net solution 
 [(615986, 434821), (617403, 968), (798411, 459), (43840, 313), (709991, 211)]
