# Minimum Spanning Tree Algorithms


![MST](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d2/Minimum_spanning_tree.svg/600px-Minimum_spanning_tree.svg.png)

## Problem Statement

A minimum spanning tree (MST) of a connected, undirected graph with weighted edges is a subgraph that connects all the vertices, contains no cycles, and has the minimum possible total edge weight. Finding such a subgraph is the MST problem, a well-known problem in computer science and graph theory.

In other words, we want a tree that spans all vertices of the graph and has the minimum possible total weight. A tree is a connected acyclic graph: it has no cycles (loops), and there is a path between any two of its vertices. A spanning tree of a graph with V vertices always has exactly V - 1 edges.

There are several algorithms for solving the MST problem, including:

* Kruskal's algorithm: a greedy algorithm that starts with an empty subgraph and adds edges in order of increasing weight, skipping any edge that would create a cycle.

* Prim's algorithm: another greedy algorithm that starts with a single vertex and repeatedly adds the lowest-weight edge connecting a vertex in the current subgraph to a vertex outside it, until all vertices are included.

* Boruvka's algorithm: a greedy algorithm that starts with every vertex as its own component. In each round it finds, for every component, the minimum-weight edge leaving that component, adds those edges, and merges the components they connect.

These algorithms differ in the data structures they rely on and in how well they suit sparse or dense graphs, but each of them is guaranteed to find a minimum spanning tree of a connected graph.


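How would we solve the MST problem if all edges had the same weight?

In that case every spanning tree has the same total weight, so any spanning tree is an MST. We can keep connecting a vertex that is not yet connected to one that is, using a single edge each time.

Skipping vertices that are already connected avoids cycles/loops.

Once every vertex is connected, we have a tree.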
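The equal-weight case can be sketched with a breadth-first search. This is only an illustration: the function name `bfs_spanning_tree` and the adjacency-list input format are assumptions made for this example, not something used elsewhere in the article.

```python
from collections import deque

def bfs_spanning_tree(adj, start=0):
    """Return the edges of a spanning tree of a connected, unweighted graph.

    adj maps each vertex to a list of its neighbours (hypothetical input format).
    """
    visited = {start}
    tree = []
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in visited:      # skipping visited vertices is what avoids cycles
                visited.add(v)
                tree.append((u, v))
                queue.append(v)
    return tree

adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
print(bfs_spanning_tree(adj))   # [(0, 1), (0, 2), (0, 3)]
```

Any traversal that never revisits a vertex (depth-first search would do as well) yields V - 1 edges and therefore a spanning tree.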

Now let's consider the case when edges have different weights.

## Kruskal's Algorithm

Kruskal's algorithm is a popular greedy algorithm for finding the Minimum Spanning Tree (MST) of a connected, undirected graph. The algorithm works by iteratively adding edges to a subgraph, starting from an empty subgraph, until the subgraph connects all vertices of the graph.

Here are the steps of Kruskal's algorithm:

1. Sort all the edges in the graph in increasing order of their weight.

2. Initialize an empty subgraph, which will eventually become the MST.

3. Iterate through the sorted edges, and for each edge, do the following:

a. If adding the edge to the subgraph does not create a cycle, add it to the subgraph.

b. If adding the edge to the subgraph creates a cycle, discard it and move on to the next edge.

4. Stop the iteration when all vertices are included in the subgraph, or when the subgraph contains n-1 edges, where n is the number of vertices in the graph.

5. Return the subgraph, which is the MST of the graph.

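Kruskal's algorithm is relatively simple to implement and has a time complexity of O(E log E), where E is the number of edges in the graph. The cost is dominated by sorting the edges. Because E is at most V^2 in a simple graph, O(E log E) is the same as O(E log V). This makes Kruskal's algorithm a good fit for sparse graphs, or when the edges are already available as a list.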

One way to implement step 3a is by using a disjoint-set data structure, which allows us to quickly determine whether adding an edge will create a cycle in the subgraph. By keeping track of the connected components of the subgraph, we can check whether two vertices are already connected before adding an edge between them.

## Disjoint-set data structure

The disjoint-set data structure, also known as a union-find data structure, is a way to efficiently represent a partition of a set into disjoint subsets. It supports two operations: "find" and "union".

The "find" operation returns the identifier of the subset that contains a given element. The "union" operation merges two subsets into a single subset.

In the context of Kruskal's algorithm, we can use the disjoint-set data structure to keep track of the connected components of the subgraph, which will help us determine whether adding an edge will create a cycle in the subgraph.

At the beginning of the algorithm, each vertex is in its own subset. As we add edges to the subgraph, we perform a "find" operation on the vertices of each edge to determine which subsets they belong to. If the subsets are different, we can safely add the edge to the subgraph without creating a cycle, and then perform a "union" operation to merge the two subsets.

If the subsets are the same, the two endpoints are already connected, so adding the edge would create a cycle in the subgraph. We discard the edge and move on to the next one.

By using the disjoint-set data structure to keep track of the connected components of the subgraph, we can efficiently detect cycles and avoid adding them to the subgraph. This is a key part of Kruskal's algorithm and helps to ensure that the subgraph is a tree, and therefore a valid MST of the original graph.

In [1]:
class DisjointSet:
    def __init__(self, n):
        # parent[i] == i means i is the root (representative) of its subset
        self.parent = list(range(n))
        # rank[i] is an upper bound on the height of the tree rooted at i
        self.rank = [0] * n

    def find(self, i):
        # path compression: point every visited node directly at the root
        if self.parent[i] != i:
            self.parent[i] = self.find(self.parent[i])
        return self.parent[i]

    def union(self, i, j):
        pi, pj = self.find(i), self.find(j)
        if pi == pj:
            return
        # union by rank: attach the shallower tree under the deeper one
        if self.rank[pi] < self.rank[pj]:
            self.parent[pi] = pj
        elif self.rank[pi] > self.rank[pj]:
            self.parent[pj] = pi
        else:
            self.parent[pi] = pj
            self.rank[pj] += 1

def kruskal_mst(graph):
    n = len(graph)
    edges = []
    # collect each undirected edge once from the upper triangle of the matrix (0 means "no edge")
    # worst case is a complete graph (https://en.wikipedia.org/wiki/Complete_graph) with n(n-1)/2 edges
    for i in range(n):
        for j in range(i + 1, n):
            if graph[i][j] != 0:
                edges.append((graph[i][j], i, j))
    edges.sort()

    mst = []
    dsu = DisjointSet(n)

    for w, u, v in edges:
        if dsu.find(u) != dsu.find(v):
            mst.append((u, v, w))
            dsu.union(u, v)
            if len(mst) == n - 1:
                break

    return mst



In [2]:
g = [
    [0,5,7,6],
    [5,0,0,0],
    [7,0,0,0],
    [6,0,0,0]
]
# adjacency matrix of a tiny graph: g[i][j] is the weight of edge i-j, 0 means no edge


In [3]:
kruskal_mst(g)

[(0, 1, 5), (0, 3, 6), (0, 2, 7)]

In [4]:
# we add an extra edge (2-3) that is cheaper, which should change the solution
# the input graph is also no longer a tree: it now contains the cycle 0-2-3
g1 = [
    [0,5,7,6],
     [5,0,0,0],
     [7,0,0,3],
     [6,0,3,0]
]
kruskal_mst(g1)

[(2, 3, 3), (0, 1, 5), (0, 3, 6)]

In [5]:
# we add an extra edge (1-2) that is expensive, which should not change the solution

g2 = [
     [0,5,7,6],
     [5,0,9,0],
     [7,9,0,3],
     [6,0,3,0]
]
kruskal_mst(g2)

[(2, 3, 3), (0, 1, 5), (0, 3, 6)]
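A result like the ones above can be sanity-checked independently of the algorithm that produced it. The helpers below are a sketch written for this purpose; the names `is_spanning_tree` and `total_weight` are assumptions for this example. They check that the edges form a spanning tree and add up the weights, but they do not prove that the weight is minimal.

```python
def is_spanning_tree(n, edges):
    """Check that `edges`, given as (u, v, w) tuples, form a spanning tree on vertices 0..n-1."""
    if len(edges) != n - 1:
        return False
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v, _ in edges:
        ru, rv = find(u), find(v)
        if ru == rv:          # this edge would close a cycle
            return False
        parent[ru] = rv
    return True               # n-1 acyclic edges on n vertices are necessarily connected

def total_weight(edges):
    return sum(w for _, _, w in edges)

result = [(2, 3, 3), (0, 1, 5), (0, 3, 6)]   # the edge list printed for g1 above
print(is_spanning_tree(4, result), total_weight(result))   # True 14
```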

## Alternative implementation of Kruskal's Algorithm

Here we represent the graph as a dictionary of edges and keep track of the disjoint sets with a plain dictionary that maps each vertex to the label of the set it currently belongs to. Dictionary lookups take O(1) time on average, so checking whether two vertices are in the same set is fast. Merging two sets, however, requires relabeling every vertex of one of them, which takes O(V) time, so the DisjointSet class above remains the more efficient choice.

In [6]:
# Kruskal's algorithm on a dictionary where keys are tuples of vertices and values are the edge weights
# we sort the edges by weight and add each one to the MST if it does not create a cycle

def kruskal_mst(graph: dict[tuple[str,str], int]) -> list[tuple[str,str,int]]:
    edges = sorted(graph.items(), key=lambda x: x[1]) # this is where E log E comes from
    mst = []
    # initially every vertex is in its own set, labeled by the vertex itself
    dsu = {v: v for edge in graph for v in edge}

    for (u, v), w in edges:
        # as long as the two vertices are not in the same set we can add the edge to the MST
        if dsu[u] != dsu[v]:
            mst.append((u, v, w))
            # merge the two sets: relabel every vertex in v's set with u's label
            # capture both labels first, because dsu[v] itself changes during the loop
            old_label, new_label = dsu[v], dsu[u]
            # this relabeling is linear time, O(V), so the DisjointSet class was better
            for key in dsu:
                if dsu[key] == old_label:
                    dsu[key] = new_label

    return mst



In [7]:
gdict = {("A","B"):5, ("A","C"):7, ("A","D"):6, ("C","D"):3}
# so we provided the graph as an edge list in a dictionary
mst = kruskal_mst(gdict)
print(mst)  

[('C', 'D', 3), ('A', 'B', 5), ('A', 'D', 6)]


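In the first version above, graph is a weighted adjacency matrix of the input graph, where graph[i][j] is the weight of the edge between vertices i and j and 0 means there is no edge. In the second version, graph is a dictionary that maps each edge, given as a tuple of its two endpoints, to its weight. In both cases kruskal_mst returns a list of tuples representing the edges in the MST, where each tuple contains the endpoints of the edge and its weight.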
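The edge-dictionary input can also be combined with a proper union-find, so that merging two sets no longer costs O(V). The following is a minimal sketch under stated assumptions: the name `kruskal_mst_dsu` and the `parent`/`size` dictionaries are introduced here for illustration, and union by size is used in place of the union by rank shown in the DisjointSet class.

```python
def kruskal_mst_dsu(graph):
    """Kruskal on an edge dictionary {(u, v): weight} with a dict-based union-find."""
    parent = {}
    size = {}
    for u, v in graph:
        for x in (u, v):
            parent.setdefault(x, x)
            size.setdefault(x, 1)

    def find(x):
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:        # path compression
            parent[x], x = root, parent[x]
        return root

    mst = []
    for (u, v), w in sorted(graph.items(), key=lambda item: item[1]):
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] < size[rv]:     # union by size: smaller tree goes under larger
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]
            mst.append((u, v, w))
    return mst

gdict = {("A", "B"): 5, ("A", "C"): 7, ("A", "D"): 6, ("C", "D"): 3}
print(kruskal_mst_dsu(gdict))   # [('C', 'D', 3), ('A', 'B', 5), ('A', 'D', 6)]
```

Because vertices are dictionary keys, any hashable label works, not only the integers 0..n-1 that the list-based class expects.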

## Uniqueness of MST

![Multiple solutions to MST](https://upload.wikimedia.org/wikipedia/commons/thumb/c/c9/Multiple_minimum_spanning_trees.svg/440px-Multiple_minimum_spanning_trees.svg.png)

In [7]:
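# when several edges have the same weight, the MST may not be unique:
# which one we get depends on how ties are broken (for example, the order of equal-weight edges after sorting)
# here vertices 0, 1, 2 are joined by 3 edges of cost 5 each and we only need 2 of them
# note: g3 is an adjacency matrix, so this call uses the matrix version of kruskal_mst defined earlier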
g3 = [
     [0,5,5,0],
     [5,0,5,7],
     [5,5,0,8],
     [0,7,8,0]
]
kruskal_mst(g3)

[(0, 1, 5), (0, 2, 5), (1, 3, 7)]
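For a graph this small, non-uniqueness can be checked by brute force. The sketch below enumerates every subset of n - 1 edges, keeps those that form a tree, and counts how many reach the minimum weight. The function name `all_minimum_spanning_trees` and the edge-list transcription of g3 are assumptions for this example, and the approach is exponential, so it is only suitable for tiny graphs.

```python
from itertools import combinations

def all_minimum_spanning_trees(n, edges):
    """Enumerate every spanning tree by brute force and keep the lightest ones."""
    def is_tree(subset):
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                x = parent[x]
            return x

        for u, v, _ in subset:
            ru, rv = find(u), find(v)
            if ru == rv:              # cycle: not a tree
                return False
            parent[ru] = rv
        return True                   # n-1 acyclic edges span all n vertices

    trees = [t for t in combinations(edges, n - 1) if is_tree(t)]
    best = min(sum(w for _, _, w in t) for t in trees)
    return best, [t for t in trees if sum(w for _, _, w in t) == best]

# edge list of g3 above: (u, v, weight)
g3_edges = [(0, 1, 5), (0, 2, 5), (1, 2, 5), (1, 3, 7), (2, 3, 8)]
best, msts = all_minimum_spanning_trees(4, g3_edges)
print(best, len(msts))   # 17 3
```

The three minimum spanning trees each use the edge 1-3 plus two of the three weight-5 edges. When all edge weights are distinct, the MST is unique.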

## Prim's Algorithm

Prim's algorithm is another popular algorithm for finding the Minimum Spanning Tree (MST) of a connected, undirected graph. Like Kruskal's algorithm, it is a greedy algorithm that works by iteratively adding edges to a subgraph until all vertices of the graph are included in the subgraph.

Here are the steps of Prim's algorithm:

1. Choose an arbitrary vertex to start with and add it to the subgraph.

2. While not all vertices are in the subgraph, do the following:

a. Identify the edge with the minimum weight that connects a vertex in the subgraph to a vertex outside the subgraph.

b. Add that edge, together with its endpoint that is outside the subgraph, to the subgraph.

3. Stop when all vertices are included in the subgraph, or when the subgraph contains n-1 edges, where n is the number of vertices in the graph.

4. Return the subgraph, which is the MST of the graph.

In other words, Prim's algorithm starts with a single vertex and iteratively adds the lowest-weight edge that connects a vertex in the current subgraph to a vertex outside the subgraph, until all vertices are included in the subgraph. The subgraph is always connected, and because every added edge brings in a new vertex, it never contains a cycle, so it is always a tree. Choosing the minimum-weight edge across the boundary at every step is what makes the final tree a minimum spanning tree (this is the cut property of MSTs).

Prim's algorithm can be implemented using a priority queue to efficiently find the edge with the minimum weight. Each time a vertex is added to the subgraph, we push the edges that connect it to vertices outside the subgraph onto the priority queue. Popping from the queue then yields the cheapest candidate edge; if both of its endpoints are already in the subgraph, we discard it and pop again.

With a binary heap, Prim's algorithm has a time complexity of O(E log V), where E is the number of edges in the graph and V is the number of vertices, the same asymptotic bound as Kruskal's algorithm. For dense graphs stored as an adjacency matrix, a simple array-based variant runs in O(V^2), which can be faster than either heap-based Prim or Kruskal.

In [8]:
import heapq

def prim_mst(graph, start_v=0):
    n = len(graph)
    mst = []
    visited = set()

    # Choose an arbitrary starting vertex
    start_vertex = start_v # random would work just as well
    visited.add(start_vertex)

    # Initialize the priority queue with the edges that connect the starting vertex to other vertices
    edges = [(w, start_vertex, i) for i, w in enumerate(graph[start_vertex]) if w != 0]
    heapq.heapify(edges)  # heapify is actually linear but even if it was ElogE it would not affect the complexity

    while edges and len(visited) < n:  # also stop if the graph is disconnected and we run out of edges
        # Find the edge with the minimum weight that connects a vertex in the subgraph to a vertex outside the subgraph
        w, u, v = heapq.heappop(edges)  # heappop is log n operation

        if v in visited: # means this candidate would create a loop/cycle so we discard it and look for next edge
            # note checking for membership in a set is O(1) very efficient, if it was a list instead it would be O(n)
            continue

        # Add the newly included vertex to the subgraph and add the minimum-weight edge to the MST
        visited.add(v)
        mst.append((u, v, w))

        # Update the priority queue with the edges that connect the newly included vertex to other vertices
        for i, w in enumerate(graph[v]):
            if w != 0 and i not in visited:
                heapq.heappush(edges, (w, v, i)) # these are log n operations each

    return mst

In [10]:
g

[[0, 5, 7, 6], [5, 0, 0, 0], [7, 0, 0, 0], [6, 0, 0, 0]]

In [11]:
prim_mst(g)

[(0, 1, 5), (0, 3, 6), (0, 2, 7)]

In [11]:
prim_mst(g1)

[(0, 1, 5), (0, 3, 6), (3, 2, 3)]

In [12]:
prim_mst(g2)

[(0, 1, 5), (0, 3, 6), (3, 2, 3)]

In [16]:
prim_mst(g3)

[(0, 1, 5), (0, 2, 5), (1, 3, 7)]

In [17]:
prim_mst(g3, start_v = 2)

[(2, 0, 5), (0, 1, 5), (1, 3, 7)]

In [18]:
prim_mst(g3, start_v = 1)

[(1, 0, 5), (0, 2, 5), (1, 3, 7)]
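For dense graphs given as an adjacency matrix, Prim's algorithm can also be written without a heap. The sketch below is a different variant from the heap-based function above, not a replacement for it: it keeps, for every vertex outside the tree, the cheapest known edge into the tree and picks the minimum by a linear scan, for O(V^2) total time. The name `prim_mst_dense` and the helper arrays are assumptions for this example.

```python
def prim_mst_dense(graph, start=0):
    """Heap-free Prim on an adjacency matrix (0 means 'no edge'); O(V^2) time."""
    n = len(graph)
    INF = float("inf")
    in_tree = [False] * n
    best_w = [INF] * n        # cheapest known edge weight linking each vertex to the tree
    best_from = [None] * n    # tree endpoint of that cheapest edge
    best_w[start] = 0
    mst = []

    for _ in range(n):
        # pick the vertex outside the tree with the cheapest connecting edge
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best_w[v])
        if best_w[u] == INF:
            break             # remaining vertices are unreachable (disconnected graph)
        in_tree[u] = True
        if best_from[u] is not None:
            mst.append((best_from[u], u, graph[u][best_from[u]]))
        # relax the edges leaving the newly added vertex
        for v, w in enumerate(graph[u]):
            if w != 0 and not in_tree[v] and w < best_w[v]:
                best_w[v], best_from[v] = w, u
    return mst

g2 = [[0, 5, 7, 6], [5, 0, 9, 0], [7, 9, 0, 3], [6, 0, 3, 0]]
print(prim_mst_dense(g2))   # [(0, 1, 5), (0, 3, 6), (3, 2, 3)]
```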

In [28]:
# let's rewrite prim's to take a dictionary of edges as input
def prim_mst(graph: dict[tuple[str,str], int], start_v:str=None, debug=False) -> list[tuple[str,str,int]]:
    n = len(set([v1 for (v1,v2) in graph.keys()]+[v2 for (v1,v2) in graph.keys()]))
    mst = []
    visited = set()

    # Choose an arbitrary starting vertex
    if start_v is None:
        start_vertex = next(iter(graph.keys()))[0] # first endpoint of the first edge (arbitrary, not random)
    else:
        start_vertex = start_v 
    visited.add(start_vertex)

    # Initialize the priority queue with the edges that connect the starting vertex to other vertices
    edges = [(w, u, v) for (u,v), w in graph.items() if u == start_vertex or v == start_vertex]
    # we are going to heapify by weights
    heapq.heapify(edges)  # heapify is actually linear but even if it was ElogE it would not affect the complexity
    if debug is True:
        print(f"From Starting Vertex {start_vertex} ", edges) # should be printed in order of their weights

    while len(visited) < n:
        # Find the edge with the minimum weight that connects a vertex in the subgraph to a vertex outside the subgraph
        try:
            w, u, v = heapq.heappop(edges)  # heappop is log n operation
        except IndexError:
            # if we run out of edges we are done
            if debug:
                print("We ran out of edges")
            break
        if debug is True:
            print(f"Considering edge {u}-{v} with weight {w}")

        if u in visited and v in visited: # means this candidate would create a loop/cycle so we discard it and look for next edge
            # note checking for membership in a set is O(1) very efficient, if it was a list instead it would be O(n)
            if debug is True:
                print(f"Both {u} and {v} are already in the MST so we discard this edge")
            continue

        # Add the newly included vertex to the subgraph and add the minimum-weight edge to the MST
        visited.add(u) # nothing happens if it is already in the set
        visited.add(v) # nothing happens if it is already in the set
        mst.append((u, v, w))

        # Update the priority queue with the edges that connect the newly included vertex to other vertices
        for (u1,v1), w in graph.items():
            if (u1 == v and v1 not in visited) or (v1 == v and u1 not in visited):
                heapq.heappush(edges, (w, u1, v1)) # these are log n operations each
            # need to also check u
            if (u1 == u and v1 not in visited) or (v1 == u and u1 not in visited):
                heapq.heappush(edges, (w, u1, v1)) # these are log n operations each

    return mst

In [29]:
# let's test it on our small graph
gdict = {("A","B"):5, ("A","C"):7, ("A","D"):6, ("C","D"):3}
mst = prim_mst(gdict, start_v="C", debug=True)
print(mst)

From Starting Vertex C  [(3, 'C', 'D'), (7, 'A', 'C')]
Considering edge C-D with weight 3
Considering edge A-D with weight 6
Considering edge A-B with weight 5
[('C', 'D', 3), ('A', 'D', 6), ('A', 'B', 5)]


In [30]:
mst = prim_mst(gdict, start_v="A", debug=True)
print(mst)

From Starting Vertex A  [(5, 'A', 'B'), (7, 'A', 'C'), (6, 'A', 'D')]
Considering edge A-B with weight 5
Considering edge A-D with weight 6
Considering edge C-D with weight 3
[('A', 'B', 5), ('A', 'D', 6), ('C', 'D', 3)]


In [None]:
# depending on the starting vertex you may get a different MST with the same total cost if some edges have equal weights
# here both runs find the same three edges, only in a different order

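In the first version of prim_mst, graph is a weighted adjacency matrix of the input graph, where graph[i][j] is the weight of the edge between vertices i and j and 0 means there is no edge. In the dictionary version, graph maps each edge, given as a tuple of its two endpoints, to its weight. Both versions return a list of tuples representing the edges in the MST, where each tuple contains the endpoints of the edge and its weight.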

The function starts by choosing an arbitrary starting vertex and adding it to the subgraph. It then initializes a priority queue with the edges that connect the starting vertex to other vertices.

In each iteration, the function finds the edge with the minimum weight that connects a vertex in the subgraph to a vertex outside the subgraph. It adds the newly included vertex to the subgraph and adds the minimum-weight edge to the MST. It then updates the priority queue with the edges that connect the newly included vertex to other vertices.

The loop continues until all vertices are included in the subgraph. At that point, the function returns the MST.

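A heap-based implementation of Prim's algorithm over adjacency lists runs in O(E log V), where E is the number of edges in the graph and V is the number of vertices. The two versions above do some extra work: the matrix version scans a full row of V entries for every added vertex, and the dictionary version scans all E edges after every added edge, so it is closer to O(V E).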
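To get closer to the O(E log V) bound with an edge dictionary as input, the edges can be grouped into adjacency lists once, so that each step only looks at the edges of the newly added vertex. This is a sketch under stated assumptions: the name `prim_mst_adj` is introduced here, and heap entries are (weight, from, to) tuples, so vertex labels need to be mutually comparable when weights tie.

```python
import heapq
from collections import defaultdict

def prim_mst_adj(graph, start=None):
    """Lazy Prim on an edge dictionary {(u, v): weight} using adjacency lists."""
    adj = defaultdict(list)
    for (u, v), w in graph.items():
        adj[u].append((w, u, v))      # store both directions as (weight, from, to)
        adj[v].append((w, v, u))
    if not adj:
        return []
    if start is None:
        start = next(iter(adj))       # arbitrary starting vertex

    visited = {start}
    heap = list(adj[start])
    heapq.heapify(heap)
    mst = []
    while heap and len(visited) < len(adj):
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue                  # stale entry: both endpoints are already in the tree
        visited.add(v)
        mst.append((u, v, w))
        for edge in adj[v]:
            if edge[2] not in visited:
                heapq.heappush(heap, edge)
    return mst

gdict = {("A", "B"): 5, ("A", "C"): 7, ("A", "D"): 6, ("C", "D"): 3}
print(prim_mst_adj(gdict, start="C"))   # [('C', 'D', 3), ('D', 'A', 6), ('A', 'B', 5)]
```

Each edge is pushed at most twice, once from each endpoint, which is where the O(E log V) bound comes from. Edges are reported in the direction they were traversed, so A-D appears here as ('D', 'A', 6).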

## Boruvka's Algorithm

Boruvka's algorithm is a greedy algorithm for finding the Minimum Spanning Tree (MST) of a connected, undirected graph. It was published by Czech mathematician Otakar Boruvka in 1926.

Wikipedia on Boruvka's algorithm: https://en.wikipedia.org/wiki/Bor%C5%AFvka%27s_algorithm

The algorithm starts with every vertex as its own subgraph (component) and finds the minimum-weight edge leaving each subgraph. It then merges the subgraphs by adding those edges, creating larger subgraphs. This process is repeated until all vertices are included in a single subgraph, which is the MST of the graph.

Here are the steps of Boruvka's algorithm:

1. Initialize a subgraph for each vertex in the graph.

2. While there is more than one subgraph, do the following:

a. For each subgraph, find the minimum-weight edge that connects it to a different subgraph.

b. Add the minimum-weight edges to the MST, merging the subgraphs they connect.

c. Skip any edge that was selected twice, once from each of the two subgraphs it connects.

3. Stop when all vertices are included in a single subgraph, which is the MST of the graph.

Boruvka's algorithm does not need a priority queue. A disjoint-set data structure is enough to tell which subgraph each vertex belongs to. In each round, we scan the edges, record the minimum-weight edge that connects each subgraph to a different subgraph, add these edges, and merge the subgraphs, skipping any edge selected twice. If several edges have the same weight, ties must be broken consistently (for example by edge index) so that the selected edges cannot form a cycle. This process continues until all vertices are included in a single subgraph.

Boruvka's algorithm has a time complexity of O(E log V), where E is the number of edges in the graph and V is the number of vertices. Every round at least halves the number of subgraphs, so there are at most log V rounds, and each round takes O(E) time. This is the same asymptotic bound as Kruskal's and Prim's algorithms. An advantage of Boruvka's algorithm is that the work within a round can be done for all subgraphs in parallel.


In [19]:
def boruvka_mst(graph):
    n = len(graph)
    # in the beginning each vertex is its own subgraph (component)
    subgraphs = [{i} for i in range(n)]
    # once again the edges of the solution go into the mst list
    mst = []

    while len(subgraphs) > 1:
        # For each subgraph, find the minimum-weight edge that leaves it
        candidates = []
        for subgraph in subgraphs:
            best = None
            for j in subgraph:
                for k, w in enumerate(graph[j]):
                    if w != 0 and k not in subgraph:
                        # store endpoints in sorted order so that ties break consistently
                        edge = (w, min(j, k), max(j, k))
                        if best is None or edge < best:
                            best = edge
            if best is not None:
                candidates.append(best)

        if not candidates:
            break  # disconnected graph: nothing left to merge

        # Drop edges selected by both of their subgraphs, then add the rest and merge
        for w, u, v in remove_duplicate_edges(candidates):
            i = find_subgraph_index(subgraphs, u)
            l = find_subgraph_index(subgraphs, v)
            if i != l:
                mst.append((u, v, w))
                subgraphs[i] |= subgraphs[l]
                del subgraphs[l]

    return mst

def find_subgraph_index(subgraphs, i):
    # linear scan over the subgraphs
    for j, subgraph in enumerate(subgraphs):
        if i in subgraph:
            return j
    return None

def remove_duplicate_edges(edges):
    # the same edge may be the cheapest one for both subgraphs it connects
    return sorted(set(edges))

In [21]:
# Boruvka's algorithm to find Minimum Spanning
# Tree of a given connected, undirected and weighted graph
# https://www.geeksforgeeks.org/boruvkas-algorithm-greedy-algo-9/
 
from collections import defaultdict
 
#Class to represent a graph
class Graph:
 
    def __init__(self,vertices):
        self.V= vertices #No. of vertices
        self.graph = [] # list of edges, each stored as [u, v, w]
         
  
    # function to add an edge to graph
    def addEdge(self,u,v,w):
        self.graph.append([u,v,w])
 
    # A utility function to find set of an element i
    # (uses path compression technique)
    def find(self, parent, i):
        if parent[i] != i:
            parent[i] = self.find(parent, parent[i])
        return parent[i]
 
    # A function that does union of two sets of x and y
    # (uses union by rank)
    def union(self, parent, rank, x, y):
        xroot = self.find(parent, x)
        yroot = self.find(parent, y)
 
        # Attach smaller rank tree under root of high rank tree
        # (Union by Rank)
        if rank[xroot] < rank[yroot]:
            parent[xroot] = yroot
        elif rank[xroot] > rank[yroot]:
            parent[yroot] = xroot
        #If ranks are same, then make one as root and increment
        # its rank by one
        else :
            parent[yroot] = xroot
            rank[xroot] += 1
 
    # The main function to construct MST using Boruvka's algorithm
    def boruvkaMST(self):
        parent = []; rank = [];
 
        # An array to store index of the cheapest edge of
        # subset. It store [u,v,w] for each component
        cheapest =[]
 
        # Initially there are V different trees.
        # Finally there will be one tree that will be MST
        numTrees = self.V
        MSTweight = 0
 
        # Create V subsets with single elements
        for node in range(self.V):
            parent.append(node)
            rank.append(0)
        cheapest =[-1] * self.V
     
        # Keep combining components (or sets) until all
        # components are combined into a single MST
 
        while numTrees > 1:
 
            # Traverse through all edges and update
               # cheapest of every component
            for i in range(len(self.graph)):
 
                # Find components (or sets) of two corners
                # of current edge
                u,v,w =  self.graph[i]
                set1 = self.find(parent, u)
                set2 = self.find(parent ,v)
 
                # If two corners of current edge belong to
                # same set, ignore current edge. Else check if
                # current edge is closer to previous
                # cheapest edges of set1 and set2
                if set1 != set2:    
                     
                    if cheapest[set1] == -1 or cheapest[set1][2] > w :
                        cheapest[set1] = [u,v,w]
 
                    if cheapest[set2] == -1 or cheapest[set2][2] > w :
                        cheapest[set2] = [u,v,w]
 
            # Consider the above picked cheapest edges and add them
            # to MST
            for node in range(self.V):
 
                #Check if cheapest for current set exists
                if cheapest[node] != -1:
                    u,v,w = cheapest[node]
                    set1 = self.find(parent, u)
                    set2 = self.find(parent ,v)
 
                    if set1 != set2 :
                        MSTweight += w
                        self.union(parent, rank, set1, set2)
                        print ("Edge %d-%d with weight %d included in MST" % (u,v,w))
                        numTrees = numTrees - 1
             
            #reset cheapest array
            cheapest =[-1] * self.V
 
             
        print ("Weight of MST is %d" % MSTweight)
                           
 
     
g = Graph(4)
g.addEdge(0, 1, 10)
g.addEdge(0, 2, 6)
g.addEdge(0, 3, 5)
g.addEdge(1, 3, 15)
g.addEdge(2, 3, 4)
 
g.boruvkaMST()
 
#This code is contributed by Neelam Yadav

Edge 0-3 with weight 5 included in MST
Edge 0-1 with weight 10 included in MST
Edge 2-3 with weight 4 included in MST
Weight of MST is 19


In [25]:
g1 = Graph(4)
g1.addEdge(0,1, 5)
g1.addEdge(0,2, 5)
g1.addEdge(1,2, 5)
g1.addEdge(2,3, 7)
g1.addEdge(1,3, 8)
g1.boruvkaMST()

Edge 0-1 with weight 5 included in MST
Edge 0-2 with weight 5 included in MST
Edge 2-3 with weight 7 included in MST
Weight of MST is 17


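In boruvka_mst, graph is a weighted adjacency matrix of the input graph, where graph[i][j] is the weight of the edge between vertices i and j and 0 means there is no edge. The function boruvka_mst returns a list of tuples representing the edges in the MST, where each tuple contains the endpoints of the edge and its weight. The Graph class, in contrast, stores the graph as a list of edges added with addEdge, and its boruvkaMST method prints the selected edges and the total weight instead of returning them.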

The function starts by initializing a subgraph for each vertex in the graph. It then repeatedly finds the minimum-weight edge that connects each subgraph to a different subgraph, adds these edges to the subgraph, and merges the subgraphs. The process continues until all vertices are included in a single subgraph, which is the MST of the graph.

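The find_subgraph_index function finds the index of the subgraph that contains a given vertex by scanning the subgraphs one by one. The remove_duplicate_edges function is meant to remove duplicate edges, which arise when the same edge is selected by both of the subgraphs it connects.

The number of subgraphs at least halves in every round, so there are O(log V) rounds. The Graph class scans all E edges in each round, which gives roughly O(E log V) overall, where E is the number of edges in the graph and V is the number of vertices. The adjacency-matrix version does at least O(V^2) work per round, because it looks at every entry of the matrix.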
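For symmetry with the dictionary-based versions of Kruskal's and Prim's algorithms, here is a sketch of Boruvka's algorithm that takes an edge dictionary and returns the edges instead of printing them. The name `boruvka_mst_dict` is an assumption for this example. Edges are sorted once so that their position in the list gives a consistent tie-breaking order; this requires vertex labels to be mutually comparable when weights tie.

```python
def boruvka_mst_dict(graph):
    """Boruvka on an edge dictionary {(u, v): weight}; returns a list of (u, v, w)."""
    parent = {}
    for u, v in graph:
        parent.setdefault(u, u)
        parent.setdefault(v, v)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # A fixed total order on edges (weight, then endpoints) breaks ties consistently.
    edges = sorted((w, u, v) for (u, v), w in graph.items())
    mst = []
    components = len(parent)
    while components > 1:
        cheapest = {}                       # component root -> index of its lightest outgoing edge
        for idx, (w, u, v) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru != rv:
                # edges are visited in increasing order, so the first hit is the cheapest
                cheapest.setdefault(ru, idx)
                cheapest.setdefault(rv, idx)
        if not cheapest:
            break                           # disconnected input: nothing left to merge
        for idx in sorted(set(cheapest.values())):   # set() drops edges selected twice
            w, u, v = edges[idx]
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                mst.append((u, v, w))
                components -= 1
    return mst

gdict = {("A", "B"): 5, ("A", "C"): 7, ("A", "D"): 6, ("C", "D"): 3}
print(boruvka_mst_dict(gdict))   # [('C', 'D', 3), ('A', 'B', 5), ('A', 'D', 6)]
```

On this input the first round selects C-D and A-B, and the second round joins the two resulting components with A-D.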

## Karger's Algorithm

There exists a randomized algorithm that solves the Minimum Spanning Tree (MST) problem in expected linear time, O(E). It was published by David Karger, Philip Klein, and Robert Tarjan in 1995 (see References). It combines Boruvka-style contraction rounds with random sampling of edges and a linear-time MST verification step that discards edges which cannot be part of the MST. It should not be confused with the random edge-contraction algorithm usually called Karger's algorithm, published by David Karger in 1993, which solves a different problem: finding a minimum cut of a graph.

Karger's contraction algorithm works by iteratively contracting random edges in the graph until only two vertices remain. The contraction of an edge merges its two endpoints into a single vertex, and replaces the edges connecting the two endpoints with edges that connect the merged vertex to the other vertices. This process reduces the number of vertices in the graph, but it may introduce multiple edges and self-loops, which need to be handled correctly.

Here are the steps of Karger's contraction algorithm:

1. Initialize a graph with the same vertices and edges as the input graph.

2. While there are more than two vertices, do the following:

a. Randomly choose an edge from the graph.

b. Contract the edge, merging its two endpoints into a single vertex.

c. Remove any self-loops that may have been introduced during the contraction.

3. Return the edges between the two remaining vertices. They form a cut of the input graph, not a spanning tree.

A single run finds a minimum cut with probability at least 2/(n(n-1)), where n is the number of vertices in the graph, which is on the order of 1/n^2. Therefore, by running the algorithm many times and keeping the smallest cut found, we can achieve a high probability of finding a minimum cut.

With an adjacency matrix, a single run takes O(n^2) time, which is quadratic in the number of vertices. Random contraction on its own does not produce a minimum spanning tree, because it chooses edges at random rather than by minimum weight.

In [26]:
import random

def karger_mst(graph):
    n = len(graph)

    # Repeat the contraction process n^2 times to achieve a high probability of success
    for i in range(n * n):
        # Initialize a copy of the graph
        contracted_graph = [row[:] for row in graph]

        # Contract edges until there are only two vertices left
        num_vertices = n
        while num_vertices > 2:
            # Randomly choose an edge to contract
            u = random.randint(0, num_vertices - 1)
            v = random.randint(0, num_vertices - 1)
            while contracted_graph[u][v] == 0:
                u = random.randint(0, num_vertices - 1)
                v = random.randint(0, num_vertices - 1)

            # Contract the edge
            for w in range(num_vertices):
                contracted_graph[u][w] += contracted_graph[v][w]
                contracted_graph[w][u] = contracted_graph[u][w]
            contracted_graph.pop(v)
            for row in contracted_graph:
                row.pop(v)

            # Remove any self-loops that may have been introduced during the contraction
            for j in range(num_vertices - 1):
                if contracted_graph[u][j] != 0:
                    contracted_graph[u][j] = contracted_graph[j][u] = min(contracted_graph[u][j], contracted_graph[j][u])
                    contracted_graph[j][j] = 0

            num_vertices -= 1

        # Return the remaining graph, which is the MST with high probability
        u, v = 0, 1
        for i in range(n):
            if contracted_graph[i][u] < contracted_graph[i][v]:
                v = i
            if contracted_graph[i][u] > contracted_graph[i][v]:
                u = i

        mst = []
        for i in range(n):
            if i != u and i != v:
                if contracted_graph[i][u] < contracted_graph[i][v]:
                    mst.append((i, u, contracted_graph[i][u]))
                else:
                    mst.append((i, v, contracted_graph[i][v]))
        if contracted_graph[u][v] != 0:
            mst.append((u, v, contracted_graph[u][v]))

        return mst
# TODO fix the code

In [27]:
g2

[[0, 5, 7, 6], [5, 0, 9, 0], [7, 9, 0, 3], [6, 0, 3, 0]]

In [28]:
karger_mst(g2)

IndexError: ignored

Here, graph is a weighted adjacency matrix of the input graph, where graph[i][j] is the weight of the edge between vertices i and j. The function karger_mst is intended to return a list of tuples representing the edges in the MST, but as written it does not work: the call above fails with an IndexError.

The function is meant to repeat the contraction process up to n^2 times, where n is the number of vertices in the graph. For each iteration, it initializes a copy of the input graph and contracts random edges until only two vertices remain. Because the return statement is inside the loop, only the first iteration ever runs.

The contraction process randomly chooses a pair of vertices joined by an edge, merges them into a single vertex, and removes the row and column of the merged vertex. This continues until there are only two vertices left.

Finally, the function loops over the original n vertices while indexing the contracted 2 x 2 matrix, which is one source of the IndexError.

Even with the indexing repaired, each run of this approach would cost O(n^2), which is quadratic in the number of vertices, and the two remaining vertices describe a cut of the graph rather than a spanning tree. To compute an MST, use one of the algorithms from the previous sections.

## Real Life Applications of MST

The Minimum Spanning Tree (MST) problem and algorithms to solve it have numerous real-life applications in various fields. Here are some examples:

* Network design: MST algorithms can be used to design efficient networks for communication and transportation systems, such as cable or phone networks, road networks, and airline routes. By finding the minimum-weight set of edges that connect all vertices, MST algorithms can help reduce the cost and complexity of these networks.

* Clustering: MST algorithms can be used to group similar objects or data points into clusters based on their pairwise distances. By constructing a tree that connects all points with the minimum total distance, MST algorithms can help identify clusters and outliers in the data.

* Image processing: MST algorithms can be used to analyze and segment images based on their visual features, such as color or texture. By finding the minimum-weight set of edges that connect adjacent pixels or regions, MST algorithms can help extract and highlight important structures and patterns in the image.

* Circuit design: MST algorithms can be used to design efficient electronic circuits that connect multiple components or devices. By finding the minimum-weight set of edges that connect all components, MST algorithms can help optimize the performance and cost of the circuit.

* Biology: MST algorithms can be used to analyze and compare genetic sequences or protein structures. By constructing a tree that represents the evolutionary relationships between different organisms or molecules, MST algorithms can help identify common ancestors, genetic mutations, and functional domains.
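As an illustration of the clustering use case, here is a minimal sketch of MST-based (single-linkage) clustering. It runs Kruskal's algorithm on the pairwise Euclidean distances and stops once k components remain, which gives the same grouping as building the full MST and removing its k - 1 heaviest edges. The function name `mst_clusters` and the sample points are assumptions made for this example.

```python
from itertools import combinations
from math import dist

def mst_clusters(points, k):
    """Group points into k clusters by running Kruskal until k components remain."""
    n = len(points)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # all pairwise distances, sorted from closest to farthest
    edges = sorted((dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2))
    components = n
    for _, i, j in edges:
        if components <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(points[i])
    return sorted(clusters.values())

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
print(mst_clusters(points, 2))
# [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11)]]
```

Building all pairwise distances takes O(n^2) memory, so this form is only practical for modest numbers of points.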

## Conclusion

In this article, we explored three algorithms for finding the Minimum Spanning Tree (MST) of a connected, undirected graph: Kruskal's algorithm, Prim's algorithm, and Boruvka's algorithm. We also touched on randomized algorithms: an MST can be found in expected linear time with the algorithm of Karger, Klein, and Tarjan (see References).

We implemented each algorithm in Python and compared their time complexities. We also discussed the advantages and disadvantages of each algorithm.

### More Information

* Minimum Spanning Tree: https://en.wikipedia.org/wiki/Minimum_spanning_tree
* Kruskal's algorithm: https://en.wikipedia.org/wiki/Kruskal%27s_algorithm
* Prim's algorithm: https://en.wikipedia.org/wiki/Prim%27s_algorithm
* Boruvka's algorithm: https://en.wikipedia.org/wiki/Bor%C5%AFvka%27s_algorithm

### References

* Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to algorithms (3rd ed.). MIT press.

* Dasgupta, S., Papadimitriou, C. H., & Vazirani, U. V. (2006). Algorithms. McGraw-Hill Higher Education.

* Karger, D. R., Klein, P. N., & Tarjan, R. E. (1995). A randomized linear-time algorithm to find minimum spanning trees. Journal of the ACM, 42(2), 321–328. https://doi.org/10.1145/201019.201022