In [1]:
# setup
from IPython.display import display, HTML
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
display(HTML(open('../rise.css').read()))

# imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", font_scale=1.5, rc={'figure.figsize':(12, 6)})




# CMPS 2200
# Introduction to Algorithms

## Minimal Spanning Trees (MST) - Cont'd


 <center>
<img src="figures/muddy_city2.png" width=70%/>
</center>

<span style="color:red">**Problem 1:**</span> [**Traveling Salesperson Problem**] Find the shortest "tour" that visits every site exactly once and returns to the starting site.

<span style="color:red">**Problem 2:**</span> [**Single-Source Shortest Path**] Find the shortest path from a given starting site to every other site.

<span style="color:red">**Problem 3:**</span> [**Minimal Spanning Tree**] Find the minimum-total-weight set of edges that connects all sites.





### Spanning Tree
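For a connected undirected graph $G = (V,E)$, a **spanning tree** is a tree $T = (V,E')$ with $E' \subseteq E$. That is, a spanning tree includes all vertices of $G$ and uses only edges of $G$.

> A tree is a graph that is connected and acyclic (meaning it has no cycles or loops). A tree on $n$ vertices has exactly $n-1$ edges.

- **minimum spanning tree (MST)**: a spanning tree of a weighted graph whose total edge weight is minimum among all spanning trees. Applications include: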

| Domain         | Application                            |
| ---------------| -------------------------------------- |
| Networks       | Cable layout, power grids, water pipes |
| Data science   | Clustering, segmentation               |
| Algorithms     | TSP approximation, Steiner heuristics  |
| Hardware       | VLSI layout optimization               |
| Logistics      | Road/pipeline planning                 |
| Graph theory   | Bottleneck edges, connectivity         |
| Robotics       | Mapping & motion planning              |


### Graph Cut: Light-edge Property

We can view the $visited$ and $frontier$ sets as defining a **graph cut**.


A **graph cut** of a graph $G = (V,E)$ is a partition of the vertices into two sets $V_1 \subset V$ and $V_2 = V - V_1$.

Each vertex set $V_i \subset V$ defines a **vertex-induced subgraph** consisting of $V_i$ and the edges whose endpoints are both in $V_i$.

For example:

<center>
    <img src="figures/cut1.jpg"/>
</center>

In this partition, we have:

- $G_1 = (V_1, E_1)~~~~V_1=\{a,b,c,d\}, E_1 = \{(a,b), (a,c), (b,d), (c, d)\}$
- $G_2 = (V_2, E_2)~~~~V_2=\{e,f\}, E_2 = \{(e,f)\}$


The **cut edges** are those that join the two subgraphs, e.g., $\{(b,e), (d,f)\}$.
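
As a minimal sketch of these definitions, the snippet below classifies edges relative to a partition. The edge list is assembled only from $E_1$, $E_2$, and the cut edges listed above (the figure may contain more), and the helper name `split_by_cut` is an assumption made for illustration.

```python
def split_by_cut(edges, v1):
    """Split undirected edges into (inside V1, inside V2, crossing the cut)."""
    e1, e2, cut = set(), set(), set()
    for u, v in edges:
        if u in v1 and v in v1:
            e1.add((u, v))      # both endpoints in V1
        elif u not in v1 and v not in v1:
            e2.add((u, v))      # both endpoints in V2 = V - V1
        else:
            cut.add((u, v))     # one endpoint on each side: a cut edge
    return e1, e2, cut

# edge list taken from the sets listed in the example above
edges = [('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd'),
         ('e', 'f'), ('b', 'e'), ('d', 'f')]
e1, e2, cut = split_by_cut(edges, v1={'a', 'b', 'c', 'd'})
print(sorted(cut))  # [('b', 'e'), ('d', 'f')]
```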


We want to know if the **lightest edge** between the $visited$ and $frontier$ sets will be in the MST.



## Prim's Algorithm

Perform **priority-first search** on $G$ starting from an <span style="color:red">**arbitrary**</span> vertex $s$.

To select the next vertex $v$ to add to the visited set $X$, use the priority:
- $p(v) = \min_{x \in X} w(x,v)$
- Add the edge $(u,v)$ that achieves this minimum to the tree.



<center>
    <img src="figures/prim.jpg" width=40%/>
</center>

- Edge $(c, f)$ has minimum weight across the cut $(X,Y)$.
- So, we visit $f$ next, adding it to the visited set $X$ and adding edge $(c,f)$ to the tree.


This sounds very similar to Dijkstra's algorithm. **What's the difference?**

In [1]:
from heapq import heappush, heappop 

def dijkstra(graph, source):
    def dijkstra_helper(visited, frontier):
        if len(frontier) == 0:
            return visited
        else:
            distance, node = heappop(frontier)
            if node in visited:
                return dijkstra_helper(visited, frontier)
            else:
                print('visiting', node)
                visited[node] = distance
                for neighbor, weight in graph[node]:
                    heappush(frontier, (distance + weight, neighbor))                
                return dijkstra_helper(visited, frontier)
        
    frontier = []
    heappush(frontier, (0, source))
    visited = dict()  # store the final shortest distance to each node.
    return dijkstra_helper(visited, frontier)

graph = {
            's': {('a', 4), ('b', 8)},
            'a': {('s', 4), ('b', 2), ('c', 5)},
            'b': {('s', 8), ('a', 2), ('c', 3)}, 
            'c': {('a', 5), ('b', 3), ('d', 3)},
            'd': {('c', 3)},
        }
dijkstra(graph, 's')

visiting s
visiting a
visiting b
visiting c
visiting d


{'s': 0, 'a': 4, 'b': 6, 'c': 9, 'd': 12}

In [4]:
from heapq import heappush, heappop 

def prim(graph, source):
    def prim_helper(visited, frontier):
        if len(frontier) == 0:
            return visited
        else:
            weight, node = heappop(frontier)
            if node in visited:
                return prim_helper(visited, frontier)
            else:
                print('visiting', node)
                # record the weight of the edge used to reach this node
                visited[node] = weight
                for neighbor, w in graph[node]:
                    # priority is the edge weight alone (no accumulated distance)
                    heappush(frontier, (w, neighbor))
                return prim_helper(visited, frontier)
        
    frontier = []
    heappush(frontier, (0, source))
    visited = dict()  # weight of the edge that connected each node to the tree
    return prim_helper(visited, frontier)

graph = {
            's': {('a', 4), ('b', 8)},
            'a': {('s', 4), ('b', 2), ('c', 5)},
            'b': {('s', 8), ('a', 2), ('c', 3)}, 
            'c': {('a', 5), ('b', 3), ('d', 3)},
            'd': {('c', 3)},
        }
prim(graph, 's')

visiting s
visiting a
visiting b
visiting c
visiting d


{'s': 0, 'a': 4, 'b': 2, 'c': 3, 'd': 3}

In [5]:
def prim(graph):
    def prim_helper(visited, frontier, tree):
        if len(frontier) == 0:
            return tree
        else:
            weight, node, parent = heappop(frontier)
            if node in visited:
                return prim_helper(visited, frontier, tree)
            else:
                print('visiting', node)
                # record this edge in the tree
                tree.add((weight, node, parent))
                visited.add(node)
                for neighbor, w in graph[node]:
                    heappush(frontier, (w, neighbor, node))    
                    # compare with dijkstra:
                    # heappush(frontier, (distance + weight, neighbor))                

                return prim_helper(visited, frontier, tree)
        
    # pick first node as source arbitrarily
    source = list(graph.keys())[0]
    frontier = []
    heappush(frontier, (0, source, source))
    visited = set()  # store the visited nodes (don't need distance anymore)
    tree = set()
    prim_helper(visited, frontier, tree)
    return tree

prim(graph)

visiting s
visiting a
visiting b
visiting c
visiting d


{(0, 's', 's'), (2, 'b', 'a'), (3, 'c', 'b'), (3, 'd', 'c'), (4, 'a', 's')}

<span style="color:red">**Question**:</span> Suppose you run Dijkstra’s algorithm from node s, and the shortest distance to node t is 10. If all edge weights are increased by 5, what is the new shortest distance to t?

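<span style="color:blue">**Answer**:</span> Cannot determine without re-running the algorithm. A path with $k$ edges becomes longer by $5k$, so paths with fewer edges are penalized less and the shortest path itself may change.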
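
A small experiment makes this concrete. The graph below is a hypothetical example (not from the slides) with a three-edge path of weight 10 and a direct edge of weight 12; the helper names `shortest_distance` and `add_to_all_weights` are assumptions made for this sketch.

```python
from heapq import heappush, heappop

def shortest_distance(graph, source, target):
    """Iterative Dijkstra; returns the shortest distance from source to target."""
    frontier = [(0, source)]
    visited = {}
    while frontier:
        distance, node = heappop(frontier)
        if node in visited:
            continue
        visited[node] = distance
        for neighbor, weight in graph[node]:
            heappush(frontier, (distance + weight, neighbor))
    return visited[target]

def add_to_all_weights(graph, delta):
    """Return a copy of the graph with delta added to every edge weight."""
    return {u: [(v, w + delta) for v, w in nbrs] for u, nbrs in graph.items()}

# hypothetical graph: path s-a-b-t has weight 3+3+4 = 10, direct edge s-t has weight 12
g = {
    's': [('a', 3), ('t', 12)],
    'a': [('s', 3), ('b', 3)],
    'b': [('a', 3), ('t', 4)],
    't': [('b', 4), ('s', 12)],
}
print(shortest_distance(g, 's', 't'))                         # 10
print(shortest_distance(add_to_all_weights(g, 5), 's', 't'))  # 17
```

The new distance is 17, which is neither $10 + 5$ nor $10 + 3 \cdot 5$: the direct edge becomes the shortest path.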

<br>

<br>

<br>


<span style="color:orange">**Question**:</span> What about the minimum spanning tree: does it change if all edge weights are increased by 5?
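
One way to explore this question is to run Prim's algorithm before and after the shift and compare the chosen edges. The sketch below reuses the example graph from the cells above; `prim_tree` is a hypothetical iterative variant written for this comparison, not the notebook's `prim`. Every spanning tree has exactly $n-1$ edges, so every spanning tree's total weight grows by the same amount, $5(n-1)$.

```python
from heapq import heappush, heappop

def prim_tree(graph, source):
    """Prim's algorithm; returns {frozenset({u, v}): weight} for the tree edges."""
    frontier = [(0, source, source)]
    visited, tree = set(), {}
    while frontier:
        weight, node, parent = heappop(frontier)
        if node in visited:
            continue
        visited.add(node)
        if node != parent:  # skip the dummy entry used to start at the source
            tree[frozenset((parent, node))] = weight
        for neighbor, w in graph[node]:
            heappush(frontier, (w, neighbor, node))
    return tree

graph = {
    's': {('a', 4), ('b', 8)},
    'a': {('s', 4), ('b', 2), ('c', 5)},
    'b': {('s', 8), ('a', 2), ('c', 3)},
    'c': {('a', 5), ('b', 3), ('d', 3)},
    'd': {('c', 3)},
}
shifted = {u: {(v, w + 5) for v, w in nbrs} for u, nbrs in graph.items()}

before = prim_tree(graph, 's')
after = prim_tree(shifted, 's')
print(set(before) == set(after))                   # True: same edges chosen
print(sum(before.values()), sum(after.values()))   # 12 32
```

The total grows by $5 \cdot 4 = 20$ because the tree has $n - 1 = 4$ edges, but the set of edges is unchanged.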

## Work of Prim's Algorithm

Prim's algorithm does the same work as Dijkstra's algorithm, so its work is $O(|E| \log |E|)$.

Can we just pick an arbitrary source node? Why or why not?

What about directed graphs? Will this work?

No, not if the source node cannot reach all other nodes.

Even if it can, we may get a suboptimal solution:

![figures/prim-fail.png](figures/prim-fail.png)


<span style="color:red">**Note:**</span> When all nodes lie within one strongly connected component (SCC), we can pick any source node.

### Parallelism: Can we start from all nodes at the same time?


 <center>
<img src="figures/muddy_city2.png" width=70%/>
</center>


Consider a trivial cut: **one vertex in one partition, everything else in the other.**

By the light-edge property, the lowest-weight edge out of each vertex must be in the MST. Call these the **vertex-bridges** of the graph.

<center>
<img src="figures/bridges.jpg" width=50%/>
</center>

Are we done? 

We haven't necessarily selected $n-1$ edges, which we need for an MST.

The problem is some edges are selected by multiple vertices -- e.g., $a$ and $b$ both pick edge $(a,b)$.
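
To illustrate, the sketch below picks the lightest edge out of every vertex on a small hypothetical path graph (not the one in the figure). The function name `vertex_bridges` is an assumption; it returns each chosen edge once, no matter how many vertices picked it.

```python
def vertex_bridges(graph):
    """Return the set of (edge, weight) pairs chosen as the lightest edge of some vertex."""
    chosen = set()
    for u, nbrs in graph.items():
        w, v = min((w, v) for v, w in nbrs)   # lightest edge out of u
        chosen.add((frozenset((u, v)), w))    # frozenset: (a,b) and (b,a) are the same edge
    return chosen

# hypothetical path graph a - b - c - d with weights 1, 5, 1
graph = {
    'a': {('b', 1)},
    'b': {('a', 1), ('c', 5)},
    'c': {('b', 5), ('d', 1)},
    'd': {('c', 1)},
}
bridges = vertex_bridges(graph)
print(len(bridges), len(graph) - 1)  # 2 3
```

Four vertices make four picks, but only two distinct edges are selected, one short of the $n - 1 = 3$ edges required.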

So, we need to repeat this somehow, efficiently.

If we could collapse together the vertices connected by the selected edges, then we could solve a smaller version of this problem.

This is exactly what **contraction** is for!

<center>
    <img src="figures/borukva1.jpg" width=40%/>
    <img src="figures/borukva2.jpg" width=10%/>    
</center>

<br><br>

By the light-edge property, we know we should select $(e,f)$, which has the minimum weight of $4$.

<br>

Notice that by collapsing vertices, we ignore internal edges -- e.g., if there were an edge from $c$ to $f$, we would ignore it when collapsing $c,d,f$. Why is this okay? 

## Borůvka's Algorithm

While there are edges remaining:

- select the minimum weight edge out of each vertex and contract each part defined by these edges into a vertex;

- remove self edges, and when there are redundant edges keep the minimum weight edge; and

- add all selected edges to the MST.
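
The loop above can be sketched sequentially as follows. This is only an illustration under stated assumptions: it uses a union-find structure in place of explicit graph contraction, skips self edges instead of deleting them, and breaks ties by comparing `(weight, u, v)` tuples so that equal weights cannot create a cycle. The function name `boruvka` and the edge-list format are choices made for this sketch, and vertex labels are assumed to be comparable.

```python
def boruvka(vertices, edges):
    """edges: list of (weight, u, v) for a connected undirected graph."""
    parent = {v: v for v in vertices}

    def find(v):
        # representative of the contracted component containing v
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # normalize endpoints so each edge has one canonical tuple (consistent tie-breaking)
    edges = [(w, min(u, v), max(u, v)) for w, u, v in edges]
    mst = set()
    components = len(vertices)
    while components > 1:
        best = {}  # lightest edge leaving each component
        for edge in edges:
            w, u, v = edge
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # self edge after contraction: ignore
            for r in (ru, rv):
                if r not in best or edge < best[r]:
                    best[r] = edge
        if not best:
            break  # no edges leave any component: graph is disconnected
        for w, u, v in set(best.values()):
            ru, rv = find(u), find(v)
            if ru != rv:  # contract the two components joined by this edge
                parent[ru] = rv
                mst.add((w, u, v))
                components -= 1
    return mst

vertices = ['s', 'a', 'b', 'c', 'd']
edges = [(4, 's', 'a'), (8, 's', 'b'), (2, 'a', 'b'),
         (5, 'a', 'c'), (3, 'b', 'c'), (3, 'c', 'd')]
mst = boruvka(vertices, edges)
print(sorted(mst))  # [(2, 'a', 'b'), (3, 'b', 'c'), (3, 'c', 'd'), (4, 'a', 's')]
```

On this graph a single round already selects all four tree edges, which matches the best case discussed next.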


How many vertices will we contract at each iteration? Consider the example again:

<center>
    <img src="figures/borukva1.jpg" width=40%/>
</center>



**Best-case**: contract away $n-1$ vertices. We found the MST in one iteration!


<center>
    <img src="figures/borukva3.jpg" width=40%/>
</center>

**Worst-case**: contract away $\frac{n}{2}$ vertices. Each edge removes a single vertex.
<center>
    <img src="figures/borukva4.png" width=40%/>
</center>

<br>

So, we're guaranteed to remove at least $n/2$ vertices at each iteration.

The total number of contraction iterations is therefore $O(\lg n)$.
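
To spell out the bound, let $n_i$ be the number of vertices remaining after $i$ iterations (the symbol $n_i$ is introduced here only for illustration). Each iteration removes at least half of the remaining vertices, so

$$n_0 = n, \qquad n_{i+1} \le \frac{n_i}{2} \quad\Longrightarrow\quad n_i \le \frac{n}{2^i}.$$

Only one vertex can remain once $2^i \ge n$, so at most $\lceil \lg n \rceil$ iterations are needed.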

## Contracting in Borůvka's Algorithm

How can we contract these partitions?

<center>
    <img src="figures/borukva1.jpg" width=40%/>
</center>



## Implementing Borůvka's Algorithm

While there are edges remaining:

- select the minimum weight edge out of each vertex and contract each part defined by these edges into a vertex;
  - Can implement with a min `reduce` at each vertex: span $\Rightarrow O(\lg |V|)$.
  
  
- remove self edges, and when there are redundant edges keep the minimum weight edge;
  - One run of contraction: span $\Rightarrow O(\lg |V|)$.


- add all selected edges to the MST.
  - `filter`: span $\Rightarrow O(\lg |E|) = O(\lg |V|)$, since $|E| \le |V|^2$.


<br>

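Each round has $O(\lg |V|)$ span and shrinks the number of vertices by a constant factor, so the total span satisfies:

$S(|V|) = S\left(\frac{3|V|}{4}\right) + \lg |V| \in O(\lg^2 |V|)$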
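
A sketch of why this recurrence solves to $O(\lg^2 |V|)$: write $n = |V|$ and unroll the recurrence until the problem size reaches a constant, which takes $k = \log_{4/3} n$ steps.

$$S(n) = \sum_{i=0}^{k} \lg\left(\left(\tfrac{3}{4}\right)^i n\right) + O(1) \le (k+1)\lg n + O(1), \qquad k = \log_{4/3} n \in O(\lg n)$$

Each of the $O(\lg n)$ terms is at most $\lg n$, so $S(n) \in O(\lg^2 n)$.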