# Minimum Spanning Tree & Shortest Path
The goal of this assignment is to develop a better understanding of the minimum spanning tree and shortest path algorithms and their use in practice.

## Exercise – Is an edge e in some MST?
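To begin with, we consider a simple solution: we compute an MST of an undirected, connected, weighted graph G (with weights >= 0), then check whether e belongs to that MST. Note that this only tests membership in the one MST that was computed; when several edges share a weight, e may belong to a different MST. We practice implementing Prim's algorithm. Try also implementing Kruskal's algorithm.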

In [2]:
class Graph:
    def __init__(self, edges):
        self.adjList = dict()
        self.mstEdges = []

        for (u, v), weight in edges.items():
            if  u in self.adjList:
                self.adjList[u].append((v, weight)) 
            else:
                self.adjList[u] = [(v, weight)]
                
            if v in self.adjList:
                self.adjList[v].append((u, weight)) 
            else:
                self.adjList[v] = [(u, weight)]

    def primMST(self):
        mstNodes = dict()
        mstcost = 0
        self.mstEdges = []  # start fresh so repeated calls do not accumulate edges
        
        source = None
        for k in self.adjList.keys():
            if source is None:
                source = k
            mstNodes[k] = False
                    
        priority_queue = { source : [0, None] }
 
        while priority_queue :
            # choose the adjacent node with the least edge cost
            # remove it from the priority list
            node = min(priority_queue, key = lambda x : priority_queue[x][0])
            cost = priority_queue[node][0]
            parent = priority_queue[node][1]
            
            del priority_queue[node]

            if mstNodes[node] == False :
                mstcost = mstcost + cost
                mstNodes[node] = True
                if parent is not None:
                    self.mstEdges.append((parent, node, cost))

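                for (v, w) in self.adjList[node]:
                    # keep only the cheapest known edge connecting v to the tree;
                    # overwriting unconditionally could replace it with a heavier one
                    if mstNodes[v] == False and (v not in priority_queue or w < priority_queue[v][0]):
                        priority_queue[v] = (w, node)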
                        
        print("MST edges:", self.mstEdges)                
        return mstcost
    
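    def isEdgeInMST(self, edge):
        u, v, w = edge
        # the graph is undirected and primMST stores edges as (parent, node, cost),
        # so accept either orientation of the edge
        return (u, v, w) in self.mstEdges or (v, u, w) in self.mstEdges

    # debug code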
    def printGraph(self):
        for u, l in self.adjList.items():
            print("from", u, "i can reach:", l)


In [3]:
#test code

testedges = {(0,1):3, (0,2):2, (1,2):1, (1,4):7, (1,3):10, (3,4):9}
testG = Graph(testedges)

testG.printGraph()
cost = testG.primMST()

# expected output: 19, True, False

print("MST cost is:", cost)
print("(0,2,2) in MST?", testG.isEdgeInMST((0,2,2)))
print("(0,1,3) in MST?", testG.isEdgeInMST((0,1,3)))


from 0 i can reach: [(1, 3), (2, 2)]
from 1 i can reach: [(0, 3), (2, 1), (4, 7), (3, 10)]
from 2 i can reach: [(0, 2), (1, 1)]
from 4 i can reach: [(1, 7), (3, 9)]
from 3 i can reach: [(1, 10), (4, 9)]
MST edges: [(0, 2, 2), (2, 1, 1), (1, 4, 7), (4, 3, 9)]
MST cost is: 19
(0,2,2) in MST? True
(0,1,3) in MST? False
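
The exercise also suggests trying Kruskal's algorithm. The following is a minimal sketch, not a reference solution: the function name `kruskal_mst` and the small union-find helper `find` are our own choices for illustration, and the input is assumed to be an edge dictionary in the same format as `testedges`, describing a connected graph.

```python
def kruskal_mst(edges):
    """Return (total cost, list of (u, v, w) MST edges).

    `edges` maps (u, v) pairs to weights, like `testedges` above.
    """
    parent = {}

    def find(x):
        # locate the representative of x's component, halving paths on the way
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst_edges = []
    total = 0
    # consider the edges from lightest to heaviest
    for (u, v), w in sorted(edges.items(), key=lambda item: item[1]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # the edge joins two components: keep it
            parent[root_u] = root_v
            mst_edges.append((u, v, w))
            total += w
    return total, mst_edges


testedges = {(0,1):3, (0,2):2, (1,2):1, (1,4):7, (1,3):10, (3,4):9}
cost, mst = kruskal_mst(testedges)
print("MST cost is:", cost)
print("MST edges:", mst)
```

On this test graph the sketch selects the same four edges as the Prim's run above (total cost 19), only listed in order of increasing weight rather than in order of discovery.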


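Because we explicitly built the MST, the above approach takes linearithmic time, O(E log E), when the priority queue is a binary heap (the dictionary-based queue used above scans for the minimum on every extraction, which costs roughly O(V^2) overall instead). To answer the question in linear time, O(V + E), consider the graph G' that contains only the edges of G with weight strictly less than that of edge e = (u, v, w). We then check whether u and v are still connected in G' (for example, using DFS or BFS). If they are, then e does not belong to any minimum spanning tree: the path between u and v in G', together with e, forms a cycle on which e is the strictly heaviest edge. If they are not connected, then e is in some minimum spanning tree. The check takes linear time in the worst case, since it is a single DFS / BFS run.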

In [None]:
from collections import deque

class Graph:
    def __init__(self, edges):
        self.adjList = dict()
        self.mstEdges = []

        for (u, v), weight in edges.items():
            if  u in self.adjList:
                self.adjList[u].append((v, weight)) 
            else:
                self.adjList[u] = [(v, weight)]
                
            if v in self.adjList:
                self.adjList[v].append((u, weight)) 
            else:
                self.adjList[v] = [(u, weight)]
                
    def printGraph(self):
        for u, l in self.adjList.items():
            print("from", u, "i can reach:", l)

            
            
def isEdgeInMSTLinear(graph, edge):
    source = edge[0]
    dest = edge[1]
    maxweight = edge[2]

    visited = dict()      
    visited[source] = True

    q = deque()
    q.append(source)

    while len(q)>0:

        u = q.popleft()

        # do for every edge u --> (v, w)
        for v, w in graph.adjList[u]:
            # if v has not been visited yet and the edge is strictly lighter than e
            if v not in visited and w < maxweight:
                visited[v] = True
                q.append(v)
                  
    # source is always visited, so e is in some MST exactly when dest was not reached
    return dest not in visited

In [None]:
testedges = {(0,1):3, (0,2):2, (1,2):1, (1,4):7, (1,3):10, (3,4):9}
testG = Graph(testedges)

# expected output: F / T / T / T / F / T
print("is (0,1,3) in MST?", isEdgeInMSTLinear(testG, (0,1,3)))
print("is (0,2,2) in MST?", isEdgeInMSTLinear(testG, (0,2,2)))
print("is (1,2,1) in MST?", isEdgeInMSTLinear(testG, (1,2,1)))
print("is (1,4,7) in MST?", isEdgeInMSTLinear(testG, (1,4,7)))
print("is (1,3,10) in MST?", isEdgeInMSTLinear(testG, (1,3,10)))
print("is (3,4,9) in MST?", isEdgeInMSTLinear(testG, (3,4,9)))
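
The text above mentions that DFS works as well as BFS for the connectivity check. As a hedged sketch of that variant, the function below uses an explicit stack and stops as soon as the other endpoint is reached. The names `build_adjacency` and `is_edge_in_some_mst_dfs` are hypothetical helpers introduced here, and the adjacency dictionary is assumed to have the same shape as `Graph.adjList`.

```python
def build_adjacency(edges):
    # same shape as Graph.adjList: node -> list of (neighbour, weight)
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


def is_edge_in_some_mst_dfs(adj, edge):
    source, dest, maxweight = edge
    visited = {source}
    stack = [source]
    while stack:
        u = stack.pop()
        if u == dest:
            # dest is reachable using only strictly lighter edges
            return False
        for v, w in adj[u]:
            if v not in visited and w < maxweight:
                visited.add(v)
                stack.append(v)
    return True


testedges = {(0,1):3, (0,2):2, (1,2):1, (1,4):7, (1,3):10, (3,4):9}
adj = build_adjacency(testedges)
results = [is_edge_in_some_mst_dfs(adj, (u, v, w)) for (u, v), w in testedges.items()]
print(results)
```

Each node is pushed at most once and each adjacency list is scanned at most once, so the running time stays linear in the size of the graph.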



## Exercise - Monotonic Shortest Path
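To find shortest paths that are monotonic (with edge weights that are strictly increasing or strictly decreasing along the path), we implement a slightly modified version of Dijkstra's shortest paths algorithm, where we relax an edge only if it respects the ascending / descending order along the relevant path. The code below covers the strictly increasing case. Because it remembers a single incoming edge per node, it can miss a monotonic path that has to reach an intermediate node through a longer route ending in a lighter edge, so its output is best read as a simplification rather than a guaranteed optimum.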

In [5]:
import sys 

class Graph:
    def __init__(self, edges):
        self.adjList = dict()
        self.mstEdges = []

        for (u, v), weight in edges.items():
            if  u in self.adjList:
                self.adjList[u].append((v, weight)) 
            else:
                self.adjList[u] = [(v, weight)]
                
            if v in self.adjList:
                self.adjList[v].append((u, weight)) 
            else:
                self.adjList[v] = [(u, weight)]
                
    def printGraph(self):
        for u, l in self.adjList.items():
            print("from", u, "i can reach:", l)

            
    # original version
    def relax(self, u, v, w, d_edge, d_dist, pq):
        if d_dist[v] > d_dist[u] + w:
            d_dist[v] = d_dist[u] + w
            d_edge[v] = (u,v,w) # latest edge in SP to v
            pq[v] = d_dist[v]

    def dijkstra(self, source):

        spEdgeTo = dict()
        spDistTo = dict()
        
        for k in self.adjList.keys():
            spEdgeTo[k] = None
            spDistTo[k] = sys.maxsize
        spDistTo[source] = 0
        
        priority_queue = { source : 0 }
 
        while priority_queue :
            node = min(priority_queue, key = lambda x : priority_queue[x])
            del priority_queue[node]
            for v, w in self.adjList[node]: 
                self.relax(node, v, w, spEdgeTo, spDistTo, priority_queue)

        return spEdgeTo
    
    # monotonic version 
    def monotonic_relax(self, u, v, w, d_edge, d_dist, pq):
        if d_edge[u] is not None:
            # monotonic condition check: the weight of the SP edge into u (d_edge[u][2])
            # must be strictly less than the weight w of the next edge (u,v)
            if d_edge[u][2] < w and d_dist[v] > d_dist[u] + w:
                d_dist[v] = d_dist[u] + w
                d_edge[v] = (u,v,w) # latest edge in SP to v
                pq[v] = d_dist[v]

    def monotonic_dijkstra(self, source):
        spEdgeTo = dict()
        spDistTo = dict()
        
        for k in self.adjList.keys():
            spEdgeTo[k]= None
            spDistTo[k] = sys.maxsize
        spDistTo[source] = 0
        # to be able to check the monotonic condition, for each node x in the SP
        # we maintain (from, x, weight(from,x))
        # for the root node, we set from=None and weight=-1 (below any edge weight, since weights are >= 0)
        spEdgeTo[source] = (None, source, -1) 
        
        priority_queue = { source : 0 }
 
        while priority_queue :
            node = min(priority_queue, key = lambda x : priority_queue[x])
            del priority_queue[node]
            for v, w in self.adjList[node]: 
                self.monotonic_relax(node, v, w, spEdgeTo, spDistTo, priority_queue)

        return spEdgeTo


In [6]:
# test code (Dijkstra and monotonic Dijkstra give same SP)

edges = {(0,1):0, (0,2):50, (1,2):30, (1,4):35, (1,3):10, (3,4):15}
g = Graph(edges)

sp = g.dijkstra(0)
print("EdgeTo array using Dijkstra:", sp)

spm = g.monotonic_dijkstra(0)
print("EdgeTo array using monotonic Dijkstra:", spm)

EdgeTo array using Dijkstra: {0: None, 1: (0, 1, 0), 2: (1, 2, 30), 4: (3, 4, 15), 3: (1, 3, 10)}
EdgeTo array using monotonic Dijkstra: {0: (None, 0, -1), 1: (0, 1, 0), 2: (1, 2, 30), 4: (3, 4, 15), 3: (1, 3, 10)}


In [7]:
# test code (Dijkstra and monotonic Dijkstra give different SP)

edges = {(0,1):40, (0,2):50, (1,2):30, (1,4):20, (1,3):10, (3,4):15}
g = Graph(edges)

sp = g.dijkstra(0)
print("EdgeTo array using Dijkstra:", sp)

spm = g.monotonic_dijkstra(0)
print("EdgeTo array using monotonic Dijkstra:", spm)

EdgeTo array using Dijkstra: {0: None, 1: (0, 1, 40), 2: (0, 2, 50), 4: (1, 4, 20), 3: (1, 3, 10)}
EdgeTo array using monotonic Dijkstra: {0: (None, 0, -1), 1: (0, 1, 40), 2: (0, 2, 50), 4: None, 3: None}
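
For comparison, here is a sketch of a different technique for the strictly increasing case, not the modified Dijkstra used above: process the edges in order of increasing weight and relax each one once. When an edge of weight w is relaxed, every lighter edge has already been handled, so the stored distance of its endpoint is the cost of the cheapest increasing path that ends with an edge lighter than w. Edges of equal weight are relaxed as a batch against the previous distances so that they are never chained. The function name `increasing_shortest_paths` is our own, and the input is assumed to be an edge dictionary like the ones in the tests above.

```python
import math
from itertools import groupby


def increasing_shortest_paths(edges, source):
    """Cost of the cheapest strictly increasing path from source to each reachable node."""
    dist = {source: 0}
    by_weight = sorted(edges.items(), key=lambda item: item[1])
    for w, group in groupby(by_weight, key=lambda item: item[1]):
        # relax all edges of weight w against distances built from lighter edges only
        updates = {}
        for (u, v), _ in group:
            for a, b in ((u, v), (v, u)):      # undirected: try both directions
                if a in dist:
                    candidate = dist[a] + w
                    best = min(dist.get(b, math.inf), updates.get(b, math.inf))
                    if candidate < best:
                        updates[b] = candidate
        dist.update(updates)
    return dist


# same graph as the last test above: nodes 3 and 4 have no increasing path from 0
edges = {(0,1):40, (0,2):50, (1,2):30, (1,4):20, (1,3):10, (3,4):15}
print(increasing_shortest_paths(edges, 0))

# a hypothetical graph where node 5 is only reachable through the longer route 0-1-2-4
tricky = {(0,4):10, (0,1):3, (1,2):4, (2,4):5, (4,5):7}
print(increasing_shortest_paths(tricky, 0))
```

On the second graph the cheapest path to node 4 is the direct edge of weight 10, but the only increasing path to node 5 goes through 0-1-2-4 (weights 3, 4, 5) before taking the edge of weight 7. Tracing `monotonic_dijkstra` by hand on this graph, it appears to leave node 5 without an incoming edge, which illustrates why tracking one edge per node may not be enough. The sorting step makes this sketch O(E log E).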


## Exercise – Centrality metrics in the IMDB actor graph
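We use the same representation of the IMDB graph as in the previous exercise sheet (week 7). With that representation, degree centrality is simply the length of an actor's adjacency list. For closeness centrality, we compute shortest paths with Dijkstra's algorithm, then return the number of reachable actors (the actor itself included) divided by the sum of their shortest-path distances, i.e. the reciprocal of the average distance.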

In [8]:
import sys 
from collections import deque

class Graph:
    def __init__(self, edges):
        # we assume "edges" is a dict that maps each pair (actor a, actor b) to a count
        # the adjacency list is also a dict(), so that actor names can be used as indices
        self.adjList = dict()

        for (a1, a2), count in edges.items():
            if  a1 in self.adjList:
                self.adjList[a1].append((a2, count)) 
            else:
                self.adjList[a1] = [(a2, count)]
                
            if a2 in self.adjList:
                self.adjList[a2].append((a1, count)) 
            else:
                self.adjList[a2] = [(a1, count)]

    def printGraph(self):
        for a, costars in self.adjList.items():
            print("actor", a, "played with:", costars)

    def relax(self, u, v, w, d_edge, d_dist, pq):
        if d_dist[v] > d_dist[u] + w:
            d_dist[v] = d_dist[u] + w
            d_edge[v] = (u,v,w) # latest edge in SP to v
            pq[v] = d_dist[v]

    def dijkstra(self, source):
        spEdgeTo = dict()
        spDistTo = dict()
        
        for k in self.adjList.keys():
            spEdgeTo[k] = None
            spDistTo[k] = sys.maxsize
        spDistTo[source] = 0
        spEdgeTo[source] = ("root node", source, 0)
        
        priority_queue = { source : 0 }
 
        while priority_queue :
            node = min(priority_queue, key = lambda x : priority_queue[x])
            del priority_queue[node]
            for v, w in self.adjList[node]: 
                self.relax(node, v, w, spEdgeTo, spDistTo, priority_queue)

        return spEdgeTo
    
    def degreeCentrality(self, actor):
        if actor in self.adjList:
            return len(self.adjList[actor])
        else:
            return 0

    def closenessCentrality(self, actor):
        if actor not in self.adjList:
            # unknown actor: report the lowest closeness rather than the highest
            return 0
        
        spEdges = self.dijkstra(actor)
 
        distances = dict()
        distances[actor] = 0

        q = deque()
        q.append(actor)
        while len(q) > 0:
            u = q.popleft()            
            nextiter = {a:sp for a, sp in spEdges.items() if sp and sp[0] == u}
            for k, v in nextiter.items():
                q.append(k)
                distances[k] = distances[u] + v[2]
        
        sumdist = sum(distances.values())
        # reciprocal of the average distance (the actor itself counts, at distance 0)
        return len(distances) / sumdist
        

In [9]:
# debug code on a small graph

edges = {("A", "B"): 10, ("C", "B"): 5, ("D", "B"): 20, ("C", "D"): 15, ("C", "E"): 30}
testg = Graph(edges)
testg.printGraph()
print("degree A", testg.degreeCentrality("A"))
print("degree B", testg.degreeCentrality("B"))
print("degree C", testg.degreeCentrality("C"))
print("degree D", testg.degreeCentrality("D"))
print("degree E", testg.degreeCentrality("E"))

print("closeness A", testg.closenessCentrality("A"))
print("closeness B", testg.closenessCentrality("B"))
print("closeness C", testg.closenessCentrality("C"))
print("closeness D", testg.closenessCentrality("D"))
print("closeness E", testg.closenessCentrality("E"))

actor A played with: [('B', 10)]
actor B played with: [('A', 10), ('C', 5), ('D', 20)]
actor C played with: [('B', 5), ('D', 15), ('E', 30)]
actor D played with: [('B', 20), ('C', 15)]
actor E played with: [('C', 30)]
degree A 1
degree B 3
degree C 3
degree D 2
degree E 1
closeness A 0.05
closeness B 0.07142857142857142
closeness C 0.07692307692307693
closeness D 0.045454545454545456
closeness E 0.03225806451612903
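
The closeness definition used here can also be computed directly from the distances that Dijkstra's algorithm maintains, without walking the edge-to array afterwards. The sketch below is one possible way to do this with a binary heap from the standard `heapq` module; the function name `closeness` and the plain adjacency dictionary are assumptions made for illustration, and the value returned follows the convention of the code above (number of reachable nodes, source included, divided by the sum of distances).

```python
import heapq


def closeness(adj, source):
    """Number of reachable nodes divided by the sum of shortest-path distances."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry, a shorter path was found
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    total = sum(dist.values())
    # an isolated node has no distances to average; report 0 in that case
    return len(dist) / total if total > 0 else 0.0


edges = {("A", "B"): 10, ("C", "B"): 5, ("D", "B"): 20, ("C", "D"): 15, ("C", "E"): 30}
adj = {}
for (a1, a2), count in edges.items():
    adj.setdefault(a1, []).append((a2, count))
    adj.setdefault(a2, []).append((a1, count))

for actor in "ABCDE":
    print("closeness", actor, closeness(adj, actor))
```

With a heap, one closeness query costs O(E log V), which matters on a graph the size of the IMDB data where the dictionary-based queue scans all pending nodes on every extraction.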


In [10]:
#driver code on the whole IMDB graph

print("1. reading the IMDB file ...")
edges = dict()
with open('4-imdbcostars.txt','r') as f:
    # each line is split on "/" into: actor 1, actor 2, count
    for line in f:
        a1, a2, count = line.split("/")[:3]
        edges[(a1,a2)] = int(count)
             
print("2. building the IMDB graph ...")
imdbGraph = Graph(edges)


print("3. computing degrees ...")
degrees = dict()
maxdegree = -1
actor_d = ""

for actor in imdbGraph.adjList.keys():
    d = imdbGraph.degreeCentrality(actor)
    degrees[actor] = d
    if d > maxdegree:
        maxdegree = d
        actor_d = actor
print("actor with highest degree:", actor_d, maxdegree)


print("4. computing closeness ...")
closeness = dict()
maxcloseness = -1
actor_c = ""

#for actor in imdbGraph.adjList.keys():
actornames = ["Gray, Ian (I)", "Haywood, Chris (I)", "Flowers, Bess", "Venora, Diane", "Free, Christopher"]
for actor in actornames:
    c = imdbGraph.closenessCentrality(actor)
    closeness[actor] = c
    if c > maxcloseness:
        maxcloseness = c
        actor_c = actor    
print("actor with highest closeness:", actor_c, maxcloseness)


1. reading the IMDB file ...


FileNotFoundError: [Errno 2] No such file or directory: '4-imdbcostars.txt'