In [1]:
# setup
from IPython.display import display, HTML
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
display(HTML(open('../rise.css').read()))

# imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", font_scale=1.5, rc={'figure.figsize':(12, 6)})




# CMPS 2200
# Introduction to Algorithms

## Minimum Spanning Tree:  Kruskal Algorithm


Prim's algorithm builds the MST with a graph search, using the **light-edge** property. <br><br>

<center>
    <img src="figures/mst_example.jpeg" width=50%/>
</center>

<br>
<br>

<span style="color:red">**Question:**</span>  Can we just greedily add edges in increasing order of weight?

### Kruskal's Algorithm

“Perform the following step as many times as possible: Among the edges of $G$ not yet chosen, choose the shortest edge which does not form any loops with those edges already chosen.” [Kruskal, 1956]



**Implementation:** 
- **Step 1**: Edge Sorting 
- **Step 2**: For an edge $(u,v)$, we must check if $u$ and $v$ are in the same connected component, based on the edges added so far.


<span style="color:red">**Question:**</span>  When will we stop? 

<span style="color:blue">**Question:**</span>  How can we check if $u$ and $v$ from an edge $(u,v)$ are in the same connected component or not?



We can run BFS or DFS starting at $u$ or $v$, but this would be expensive: $O(|V| + |E|)$ work at each iteration.
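To make that cost concrete, here is a minimal sketch of the naive approach, assuming vertices are labeled `0..n-1` and edges are `(u, v, w)` tuples (the names `connected` and `kruskal_bfs` are hypothetical). Every candidate edge triggers a fresh BFS over the forest built so far.

```python
from collections import deque


def connected(adj, u, v):
    """BFS over the current forest: is v reachable from u?"""
    seen = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return True
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False


def kruskal_bfs(n, edges):
    """Hypothetical naive Kruskal: edges are (u, v, w), vertices are 0..n-1."""
    adj = [[] for _ in range(n)]  # adjacency lists of the forest T
    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if not connected(adj, u, v):  # a full graph search for every edge
            tree.append((u, v, w))
            adj[u].append(v)
            adj[v].append(u)
    return tree


# The four-vertex graph from the driver code in this lecture
edges = [(0, 1, 10), (0, 2, 6), (0, 3, 5), (1, 3, 15), (2, 3, 4)]
print(kruskal_bfs(4, edges))  # [(2, 3, 4), (0, 3, 5), (0, 1, 10)]
```

The result is correct; the problem is only the repeated search, which the set-based structure below replaces.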

### Illustration Example

<center>
<img src="figures/kruskal2.png" width="90%"/>
</center>




### Better Implementation

If we think of each connected component as a set of nodes, we need an efficient way of:

- checking which set $u$ and $v$ are in
- determining if these two sets are equal
- if they are not equal, then we need to take their union




To make checking set equality fast, we will assign a **representative** node in each set (**root**).

E.g., suppose we have two sets $\mathbf{S} = \{S_1, S_2\}$ where:

$S_1 = \{\mathbf{a},b,c\}~~~~ S_2 = \{\mathbf{s}, d, e\}$

We can (arbitrarily) assign the representative of $S_1$ to be $a$, and the representative of $S_2$ to be $s$.

$r(S_1) = a ~~~ r(S_2) = s$

<br>

If $S_u$ is the set containing $u$ and $S_v$ is the set containing $v$, then we can check whether $u$ and $v$ are in the same set by checking whether $r(S_u) = r(S_v)$.
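As a toy illustration, assume we simply store each node's representative in a dictionary `r` (a hypothetical flat representation, not the forest structure developed later). The same-set check for the example $S_1$, $S_2$ is then a single comparison:

```python
# Representatives for S1 = {a, b, c} (rep a) and S2 = {s, d, e} (rep s)
r = {"a": "a", "b": "a", "c": "a",
     "s": "s", "d": "s", "e": "s"}


def same_set(u, v):
    """u and v are in the same set iff r(S_u) == r(S_v)."""
    return r[u] == r[v]


print(same_set("b", "c"))  # True: both represented by a
print(same_set("b", "d"))  # False: a != s
```

The catch in this flat scheme is `union`: merging two sets means rewriting `r` for every node of one of them.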





## Implementation with three operations:

1. `make_set(u)`: create a new set containing the single element $u$
   - $u$ will be the representative of this set

2. `find_set(u)`: returns the representative of the set containing $u$:  $r(S_u)$


3. `union(u,v)`: replace $S_u$ and $S_v$ with $S_u \cup S_v$ in the collection of sets $\mathbf{S}$

What data structures can we use to represent each set?

## Data Structure of Balanced Forests

<center>
    <img src="figures/forest.png"/>
</center>

Each set is a balanced tree, where the root is the representative.

Assuming we represent a tree node with a pointer to its parent, what is the work of `find_set(u)` (to find the representative of $S_u$)?

$O(\log n)$, assuming a balanced tree, to walk from a node to its root.
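A minimal sketch of the parent-pointer forest, assuming nodes are hashable labels and `parent` is a plain dictionary (the dictionary layout is an assumption; `make_set` and `find_set` mirror the operations above):

```python
parent = {}  # parent[u] is u's parent; a root is its own parent


def make_set(u):
    parent[u] = u  # u alone in its set, and u is the representative


def find_set(u):
    # Walk parent pointers up to the root: O(height of the tree)
    while parent[u] != u:
        u = parent[u]
    return u


for x in "abcd":
    make_set(x)
# Build by hand the tree a <- b <- c (c's parent is b, b's parent is a)
parent["b"] = "a"
parent["c"] = "b"
print(find_set("c"), find_set("d"))  # a d
```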

How about `union(u,v)`?


<center>
    <img src="figures/merge.png" width=50%/>
</center>

- find representative of $u ~~~~ O(\log n)$
- find representative of $v ~~~~ O(\log n)$
- link root of one tree to the root of another
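Under the same assumptions (a `parent` dictionary; all names hypothetical), `union` is two `find_set` walks plus one pointer update. Linking roots arbitrarily, as this sketch does, is correct but can build a tall chain, which is exactly the imbalance we need to avoid:

```python
parent = {}


def make_set(u):
    parent[u] = u


def find_set(u):
    while parent[u] != u:
        u = parent[u]
    return u


def union(u, v):
    ru, rv = find_set(u), find_set(v)  # two walks to the roots
    if ru != rv:
        parent[ru] = rv  # link: u's root now points at v's root


def depth(u):
    # number of parent pointers from u up to its root
    d = 0
    while parent[u] != u:
        u, d = parent[u], d + 1
    return d


n = 8
for x in range(n):
    make_set(x)
# Unlucky order: always hang the bigger tree under a singleton
for x in range(1, n):
    union(x - 1, x)
print(find_set(0), depth(0))  # 7 7: a chain of n nodes
```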



### How to keep balanced?

<center>
    <img src="figures/rank.png" width="70%">
</center>

Attach the root of the "shorter" tree under the root of the "taller" tree.
- store the "rank" of each tree, i.e., its height

<br>

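Thus, if $height(S_v) < height(S_u)$ and we attach the root of $S_v$ under the root of $S_u$, the height of $S_u \cup S_v$ is

$\max \{ height(S_u), height(S_v)+1\} = height(S_u)$

so the height does not grow; it grows by one only when both trees have the same height.

<br>
Using similar arguments as in leftist heaps, we can ensure that the height of any tree is $O(\log n)$: by induction, a tree of rank $r$ has at least $2^r$ nodes (a rank can only reach $r$ by merging two trees of rank $r-1$), so $r \le \log_2 n$.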
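A sketch of union by rank, again assuming a dictionary-based forest with hypothetical names. It also checks the bound numerically on random unions: without path compression the rank equals the tree height, and no node ends up deeper than $\lfloor \log_2 n \rfloor$.

```python
import math
import random

parent, rank = {}, {}


def make_set(u):
    parent[u] = u
    rank[u] = 0  # rank = height of the tree rooted at u


def find_set(u):
    while parent[u] != u:
        u = parent[u]
    return u


def union(u, v):
    ru, rv = find_set(u), find_set(v)
    if ru == rv:
        return
    if rank[ru] < rank[rv]:   # make ru the root of the taller tree
        ru, rv = rv, ru
    parent[rv] = ru           # shorter tree goes under the taller one
    if rank[ru] == rank[rv]:  # equal heights: the merged tree grows by one
        rank[ru] += 1


def depth(u):
    d = 0
    while parent[u] != u:
        u, d = parent[u], d + 1
    return d


n = 1000
for x in range(n):
    make_set(x)
random.seed(0)
for _ in range(5 * n):
    union(random.randrange(n), random.randrange(n))
max_depth = max(depth(x) for x in range(n))
print(max_depth, math.floor(math.log2(n)))  # max_depth <= floor(log2(n)) = 9
```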

### Kruskal's Algorithm

0. Initialize tree $T \leftarrow \emptyset$  

<br>


1. For each $v \in V$, run `make_set(v)`

<br>

2. Sort edges in increasing order of weight

<br>

3. For each edge $e=(u,v)$ in sorted set:
  - if `find_set(u)` $\ne$ `find_set(v)`:
    - $T \leftarrow T \cup \{(u,v)\}$
    - `union(u,v)`


In [1]:
# Python program for Kruskal's algorithm to find the
# Minimum Spanning Tree of a given connected,
# undirected and weighted graph


# Class to represent a graph
class Graph:
    def __init__(self, vertices):
        self.V = vertices  # number of vertices, labeled 0 .. V-1
        self.graph = []    # list of edges, each stored as [u, v, w]

    # function to add an undirected edge (u, v) with weight w
    def edge_graph(self, u, v, w):
        self.graph.append([u, v, w])

    # make_set(v): v starts in its own set, as its own root with rank 0
    def make_set(self, parent, rank, v):
        parent[v] = v
        rank[v] = 0

    # find_set(i): follow parent pointers up to the root (representative)
    # of i's set. No path compression: union by rank alone keeps every
    # tree O(log V) tall, so this walk takes O(log V) work.
    def find_set(self, parent, i):
        while parent[i] != i:
            i = parent[i]
        return i

    # union(x, y): merge the sets containing x and y (union by rank)
    def union(self, parent, rank, x, y):
        xroot = self.find_set(parent, x)
        yroot = self.find_set(parent, y)
        if xroot == yroot:
            return  # already in the same set

        # Attach the smaller-rank tree under the root of the
        # higher-rank tree (union by rank)
        if rank[xroot] < rank[yroot]:
            parent[xroot] = yroot
        elif rank[xroot] > rank[yroot]:
            parent[yroot] = xroot
        # If ranks are equal, make one the root and
        # increment its rank by one
        else:
            parent[yroot] = xroot
            rank[xroot] += 1

    # The main function to construct the MST using Kruskal's algorithm
    def KruskalMST(self):
        result = []  # edges of the resulting MST

        # Step 1: Sort all edges in non-decreasing order of weight.
        # sorted() returns a new list, so self.graph is left unchanged.
        edges = sorted(self.graph, key=lambda item: item[2])

        # Create V singleton sets, one per vertex
        parent = [0] * self.V
        rank = [0] * self.V
        for node in range(self.V):
            self.make_set(parent, rank, node)

        # Step 2: scan the edges from lightest to heaviest
        for u, v, w in edges:
            x = self.find_set(parent, u)
            y = self.find_set(parent, v)

            # If including this edge doesn't create a cycle, keep it
            if x != y:
                result.append([u, v, w])
                self.union(parent, rank, x, y)
                # A spanning tree has exactly V - 1 edges
                if len(result) == self.V - 1:
                    break
            # Otherwise discard the edge

        minimumCost = 0
        print("Edges in the constructed MST")
        for u, v, weight in result:
            minimumCost += weight
            print("%d -- %d == %d" % (u, v, weight))
        print("Minimum Spanning Tree", minimumCost)

# Driver code
g = Graph(4)
## node, node, weight
g.edge_graph(0, 1, 10)
g.edge_graph(0, 2, 6)
g.edge_graph(0, 3, 5)
g.edge_graph(1, 3, 15)
g.edge_graph(2, 3, 4)

# Function call
g.KruskalMST()

Edges in the constructed MST
2 -- 3 == 4
0 -- 3 == 5
0 -- 1 == 10
Minimum Spanning Tree 19


### Kruskal's Algorithm - work

0. Initialize tree $T \leftarrow \emptyset~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~O(1)$


1. For each $v \in V$, run `make_set(v)` $~~~~~O(|V|)$


2. Sort edges in increasing order of weight $~~O(|E|\lg |E|)$


3. For each edge $e=(u,v)$ in sorted set: $~~O(|E|\lg |V|)$
  - if `find_set(u)` $\ne$ `find_set(v)`: $~~O(\lg |V|)$
    - $T \leftarrow T \cup \{(u,v)\}$
    - `union(u,v)`
    
    
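Thus, the total work is $O(|E|\lg |E|)$. Since $|E| \in O(|V|^2)$, we have $\lg|E| \le 2\lg|V|$, so this is also $O(|E|\lg|V|)$; and since $|E| \in \Omega(|V|)$ (the graph is connected), the $O(|V|)$ cost of the `make_set` calls is dominated.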


Comparison: Prim's Algorithm (with a binary-heap priority queue): $O(|E| \log |E|)$


### Questions # 1

Which of the following statements about Minimum Spanning Trees is always true?

- A) MST is always unique.
- B) MST always includes the smallest weight edge incident to every vertex.
- C) MST always minimizes the total edge weight among all spanning trees.
- D) MST is a subgraph that may contain cycles.


Answer: C

### Questions # 2

Which of the following actions will definitely not change the MST?

- A) Adding a new edge with a very large weight.
- B) Decreasing the weight of an edge in the MST.
- C) Increasing the weight of an edge not in the MST.
- D) Decreasing the weight of an edge not in the MST.


Answer: C

### Questions # 3

Which of the following statements is TRUE?

- A) A graph’s MST always contains the edge with the smallest weight.

- B) A graph’s MST always contains the edge with the largest weight.

- C) If all edges have the same weight, then every spanning tree is an MST.

- D) If the graph is not connected, MST still exists but only for each component separately.

Answer: C

### Questions # 4

In Prim's algorithm, if implemented with an adjacency matrix (no heap), what is the time complexity?

- A) $O(V^2)$
 
- B) $O(E\log V)$

- C) $O(E+V\log V)$

- D) $O(V+E)$

Answer: A



## Traveling Salesperson Problem [Optional]

Consider a slight variant of the MST problem:

Given a weighted graph $G=(V,E)$, find a minimum-weight tour that visits each node exactly once and then returns to the origin node.
 - every node is visited
 - no edges are repeated

<center>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/11/GLPK_solution_of_a_travelling_salesman_problem.svg/480px-GLPK_solution_of_a_travelling_salesman_problem.svg.png"/>
</center>

Often, we assume the graph is *complete* (fully connected) and the edge weights are the distances between cities.

<br>

How does this differ from the MST problem?

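- TSP solution has one more edge than MST solution (a cycle instead of a tree)


- Removing any one edge from the TSP tour leaves a path through every node, which is a spanning tree; since weights are non-negative, weight(MST solution) ≤ weight(TSP solution)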


Thus, MST solution provides a lower bound on the TSP solution.

Can we also use MST to find an upper bound?

### Euclidean TSP

Variant of TSP where the triangle inequality holds (often called *metric* TSP; Euclidean distances are the classic example):

$w(u,v) + w(v,x) \ge w(u,x)$ for all nodes $u, v, x$,

where all weights are non-negative.

Consider an MST solution for the graph:

<center>
    <img src="figures/tsp1.jpg"/>
</center>

<br>

How could we convert this tree into a tour for TSP?

<br><br>


We need to determine an order to visit the nodes in the MST solution.

Let's try depth-first search:

<center>
    <img src="figures/tsp2.jpg"/>
</center>

This is called the **Euler tour** of the tree:

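 - an Euler tour is a closed walk that traverses every edge exactly once; here each tree edge is traversed once in each direction (equivalently, we double every edge of $T$).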
 - Since $T$ spans the graph, the Euler tour will visit every vertex at least once, but possibly multiple times.

<br>

This is close to a TSP solution, but: 

- it visits each edge twice, and therefore revisits vertices

- $(d,f)$ should have an edge

- The weight of the Euler tour is equal to twice the MST weight (since we visit each edge twice).
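A minimal sketch of the Euler tour of a tree, assuming the tree is an adjacency dictionary `{vertex: {neighbor: weight}}`; the tree below is made up, not the one in the figure, and `euler_tour`/`walk_weight` are hypothetical names. Each edge is walked once going down and once coming back, so the walk weighs exactly twice the tree.

```python
def euler_tour(tree, root):
    """Depth-first walk of the tree, recording every vertex we stand on."""
    tour = [root]

    def visit(u, par):
        for v in tree[u]:
            if v != par:
                tour.append(v)  # go down edge (u, v)
                visit(v, u)
                tour.append(u)  # come back up edge (v, u)

    visit(root, None)
    return tour


def walk_weight(tree, walk):
    return sum(tree[a][b] for a, b in zip(walk, walk[1:]))


# Made-up tree: a-b (1), a-c (2), b-d (3), b-e (4); total weight 10
tree = {
    "a": {"b": 1, "c": 2},
    "b": {"a": 1, "d": 3, "e": 4},
    "c": {"a": 2},
    "d": {"b": 3},
    "e": {"b": 4},
}
tour = euler_tour(tree, "a")
print(tour)  # ['a', 'b', 'd', 'b', 'e', 'b', 'a', 'c', 'a']
print(walk_weight(tree, tour))  # 20 == 2 * 10
```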


How can we convert this to a proper solution to TSP?

**idea**: 

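Follow the DFS order, but whenever the walk would return to an already-visited vertex, jump directly to the next unvisited vertex instead (i.e., visit the vertices in DFS preorder, then return to the start).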

<center>
    <img src="figures/tsp3.jpg"/>
</center>

The red edges above are called *shortcut edges*.

Because of the triangle inequality, we know that the shortcut edges are no longer than the paths they replace:

$w(f,c) \le w(\langle f,e,b,a,c \rangle)$
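Shortcutting amounts to keeping only the first visit to each vertex and then closing the cycle. A sketch on a made-up Euler walk, assuming hypothetical planar coordinates so that distances are Euclidean and the triangle inequality holds (`shortcut`, `pts`, and `length` are illustrative names; `math.dist` needs Python 3.8+):

```python
import math

# Hypothetical coordinates; Euclidean distance obeys the triangle inequality
pts = {"a": (0, 0), "b": (1, 0), "c": (0, 2), "d": (2, 1), "e": (1, -2)}


def length(walk):
    return sum(math.dist(pts[u], pts[v]) for u, v in zip(walk, walk[1:]))


def shortcut(walk):
    """Keep the first occurrence of each vertex, then return to the start."""
    seen = set()
    tour = []
    for v in walk:
        if v not in seen:
            seen.add(v)
            tour.append(v)
    return tour + [tour[0]]


euler = ['a', 'b', 'd', 'b', 'e', 'b', 'a', 'c', 'a']
print(shortcut(euler))  # ['a', 'b', 'd', 'e', 'c', 'a']
print(length(shortcut(euler)) <= length(euler))  # True
```

Each shortcut edge, such as `e -> c`, replaces a sub-walk (`e, b, a, c`) that is at least as long, so the total can only shrink.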
  
<br>
  
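Since we know:


  - $weight(MST) \le weight(TSP)$,
  
  - $weight(Euler) = 2 \cdot weight(MST)$, and
  
  - $weight(Shortcut) \le weight(Euler)$ by the triangle inequality, where $Shortcut$ is the tour built with shortcut edges,
  
and the optimal tour is no heavier than the $Shortcut$ tour, then we know

 - $weight(MST) \le weight(TSP) \le weight(Shortcut) \le 2 \cdot weight(MST)$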
 
 
### Thus, we now have a polynomial-time algorithm for TSP with the triangle inequality whose tour is no worse than 2 times the optimal solution!
 

For NP-hard problems such as TSP, approximation algorithms like this one are often the best we can do in polynomial time.
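Putting the pieces together, here is a minimal sketch of the 2-approximation, assuming cities are random points in the unit square with Euclidean distances (purely illustrative data). Since the graph is complete, it uses the array-based $O(|V|^2)$ variant of Prim's algorithm; the DFS preorder of the MST is the shortcut Euler tour. A brute-force search on 8 points confirms $weight(MST) \le OPT \le approx \le 2 \cdot weight(MST)$. All function names are hypothetical.

```python
import itertools
import math
import random


def prim_mst(pts):
    """O(n^2) Prim on the complete Euclidean graph; returns parent pointers."""
    n = len(pts)
    in_tree = [False] * n
    best = [math.inf] * n  # lightest known edge from the tree to each vertex
    parent = [-1] * n
    best[0] = 0.0
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        for v in range(n):
            d = math.dist(pts[u], pts[v])
            if not in_tree[v] and d < best[v]:
                best[v], parent[v] = d, u
    return parent


def approx_tsp(pts):
    """MST + DFS preorder (= shortcut Euler tour), closed into a cycle."""
    parent = prim_mst(pts)
    children = {u: [] for u in range(len(pts))}
    for v, p in enumerate(parent):
        if p != -1:
            children[p].append(v)
    order, stack = [], [0]
    while stack:  # iterative DFS preorder from vertex 0
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))
    mst_weight = sum(math.dist(pts[v], pts[p])
                     for v, p in enumerate(parent) if p != -1)
    return order + [0], mst_weight


def tour_length(pts, tour):
    return sum(math.dist(pts[a], pts[b]) for a, b in zip(tour, tour[1:]))


def optimal_tsp(pts):
    """Brute force over all tours starting at 0: only for tiny inputs."""
    n = len(pts)
    return min(tour_length(pts, (0,) + perm + (0,))
               for perm in itertools.permutations(range(1, n)))


random.seed(1)
pts = [(random.random(), random.random()) for _ in range(8)]
tour, mst_w = approx_tsp(pts)
approx, opt = tour_length(pts, tour), optimal_tsp(pts)
print(f"MST {mst_w:.3f} <= OPT {opt:.3f} <= approx {approx:.3f} <= 2*MST {2 * mst_w:.3f}")
```

The brute-force `optimal_tsp` is there only to check the bound; it takes $(n-1)!$ work and is not part of the approximation.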
 