
# Safe edges and MSTs

A safe edge is a fundamental building block for a minimum spanning tree (MST). MSTs are an important concept for undirected, weighted graphs. A spanning tree of a connected graph is a tree, built from the graph's edges, that reaches every vertex in the graph, i.e., it spans the graph.

An undirected, weighted graph may have several spanning trees. One (or more) of them is a *minimum* spanning tree, i.e., a spanning tree whose edges have a total weight no greater than that of any other spanning tree. The example below shows a graph with its edge weights, and three of its spanning trees. For each tree, we add the weights of its edges: the green tree has a total edge weight of 66, the magenta 31, and the brown 71. The magenta tree is the minimum spanning tree of the graph.

![Picture](https://drive.google.com/uc?id=1ZUqOJPDr6JfvBzDt-LWBcYm3np7pf_ev)


Let's get a bit technical: remember that a graph $G$ is a pair of two sets: a set $V$ of vertices and a set $E$ of edges. For a weighted graph, we also need a function $w: E \to \mathbb{R}$ to represent the weights. For example, in the graph above $w((0,3)) = w((3,0)) = 5$, where $(0,3)$ and $(3,0)$ both name the edge between vertices $0$ and $3$.

A spanning tree of graph $G$, denoted $T_G$, is also a graph. It has a set of vertices $V_T$ and a set of edges $E_T$. The vertices of the spanning tree are the same as the vertices of the graph: $V_T=V$. The edges of the tree are a subset of the edges of the graph: $E_T \subseteq E$. The tree also has a weight function $w_T: E_T \to \mathbb{R}$.


**Any attempt to find the MST of a graph** must begin by copying the graph's vertices. That's because the MST has the same vertices as the graph.



In terms of implementation, we represent weighted graphs with an adjacency matrix: entry `[u][v]` holds the weight of edge $(u,v)$, and infinity marks a missing edge. The matrix for the example graph above is given below.

```python
_ = float('inf')  # In Python _ is a legal variable name.

adj_G = [  
          [ _,  _,  _,  5,  1,  _],
          [ _,  _, 20,  5,  _, 10],
          [ _, 20,  _, 10,  _,  _],
          [ 5,  5, 10,  _,  _, 15],
          [ 1,  _,  _,  _,  _, 20],
          [ _,  10, _, 15, 20,  _]  
]
```
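The matrix is the weight function $w$ in table form: `adj_G[u][v]` is $w((u,v))$. As a minimal sketch (the helper `edges_of` is our own name, not part of the original notebook), the code below checks that the matrix is symmetric, as an undirected graph requires, and lists each edge once with its weight by scanning only the upper triangle:

```python
INF = float('inf')
_ = INF  # same convention as above: _ marks a missing edge

adj_G = [
    [ _,  _,  _,  5,  1,  _],
    [ _,  _, 20,  5,  _, 10],
    [ _, 20,  _, 10,  _,  _],
    [ 5,  5, 10,  _,  _, 15],
    [ 1,  _,  _,  _,  _, 20],
    [ _, 10,  _, 15, 20,  _],
]

def edges_of(adj):
    """Return (u, v, weight) tuples with u < v, one per undirected edge."""
    n = len(adj)
    # Undirected graph: w((u,v)) must equal w((v,u)) for every pair.
    for u in range(n):
        for v in range(n):
            assert adj[u][v] == adj[v][u], f"asymmetric entry at ({u}, {v})"
    # Upper triangle only (v > u), so each edge is reported exactly once.
    return [(u, v, adj[u][v])
            for u in range(n)
            for v in range(u + 1, n)
            if adj[u][v] < INF]

for u, v, weight in edges_of(adj_G):
    print(f"w(({u},{v})) = {weight}")
```

Running it prints the eight edges of the example graph, starting with `w((0,3)) = 5`.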

Given the representation above, we can construct the adjacency matrix for what will eventually become the minimum spanning tree.

```python
adj_T = [  
          [ _,  _,  _,  _,  _,  _],
          [ _,  _,  _,  _,  _,  _],
          [ _,  _,  _,  _,  _,  _],
          [ _,  _,  _,  _,  _,  _],
          [ _,  _,  _,  _,  _,  _],
          [ _,  _,  _,  _,  _,  _]  
]
```

Matrix ``adj_T`` above represents a graph with 6 vertices (the matrix has 6 rows). There are no edges in this graph for now: every element in the matrix has an infinite value. Every vertex is its own little component in the graph $T$. We can write that $T=(V,\emptyset)$, where $V$ is the set of vertices of graph $G$, and $\emptyset$ is the empty set.

When we are done with graph $T$, its edges will form a tree, i.e., a connected, acyclic graph, that spans every vertex of $G$ with the smallest possible total weight. But how do we get there?

## How to get there?

Minimum spanning trees trace their beginnings to [Otakar Borůvka](https://en.wikipedia.org/wiki/Bor%C5%AFvka%27s_algorithm), in the 1920s. In today's terms, his algorithm is as simple as this:

**Put together all safe edges and recurse.** Or, as Erickson states in his book (Chapter 7): "add all the safe edges and recurse".

### Safe edge

In graph $G$, a component of $T$ may have one or more edges to other components of $T$. Among these edges, the one with the minimum weight is the safe edge out of that component. It's important to realize that a safe edge spans two components. Consider the earlier graph, but now organized in two components: dark blue to the left and light blue to the bottom-right.

![Picture](https://drive.google.com/uc?id=1ZVWhaBtS3djxhhMx4Nxzjob-IIdLx6LD)


A safe edge is a minimum-weight edge between two components. No edge between two dark blue vertices can be a safe edge, and neither can an edge between two light blue vertices.

Here, there are three edges between the two components. Their weights are $\{20, 15, 10\}$. The edge with the minimum weight is between vertices $5$ and $1$, with $w_{51}=10$. That's the safe edge between the dark and the light blue components.
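To make the definition concrete, here is a minimal sketch, assuming the two components are given as Python sets of vertex numbers (`dark` and `light` are our names for the colors in the figure, and `min_crossing_edge` is a hypothetical helper). It looks at every edge with one endpoint in each set and keeps the lightest one:

```python
INF = float('inf')
_ = INF

adj_G = [
    [ _,  _,  _,  5,  1,  _],
    [ _,  _, 20,  5,  _, 10],
    [ _, 20,  _, 10,  _,  _],
    [ 5,  5, 10,  _,  _, 15],
    [ 1,  _,  _,  _,  _, 20],
    [ _, 10,  _, 15, 20,  _],
]

def min_crossing_edge(adj, side_a, side_b):
    """Return (weight, u, v) for the lightest edge with u in side_a and
    v in side_b, or None if no edge joins the two sides."""
    best = None
    for u in side_a:
        for v in side_b:
            # An edge exists and beats the best crossing edge seen so far
            if adj[u][v] < INF and (best is None or adj[u][v] < best[0]):
                best = (adj[u][v], u, v)
    return best

dark = {0, 1, 2, 3, 4}
light = {5}
print(min_crossing_edge(adj_G, dark, light))  # (10, 1, 5)
```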

### Organizing vertices into components

In the light/dark blue example above, we were able to discover the safe edge between two components as follows:

```text
for every edge (u,v) in G whose vertices are in different components of T:
  safe edge out of component with vertex u := edge with least weight
  safe edge out of component with vertex v := edge with least weight
```

The pseudocode above underscores some practical needs.

* Separate vertices by component. In this simple example, we refer to components by color. For a computational implementation we need something more manageable.

* Track the safe edge between two components.

#### Label vertices by component

We've already seen how to [count the components in a graph](https://colab.research.google.com/drive/100poDx0uk7L69y9T9OhDFN_y-E9JzYgy?usp=sharing). We can use each component's count value as a label for its vertices. Returning to our example, there are two components in the graph, which we called dark blue and light blue. Now we'll call them components 1 and 2.

![Picture](https://drive.google.com/uc?id=1Zg5UaM3McUHvR5lAqrMlm6qq0rMRLbeA)


We could use a humble array (list) to track the component label for each vertex. For example:

```python
comp = [-1] * 6  # one label per vertex; -1 means "not labeled yet"
comp[0] = 1
comp[1] = 1
comp[2] = 1
comp[3] = 1
comp[4] = 1
comp[5] = 2
```

Using this labeling, we can explore the edges between any vertex in component 1 and any vertex in component 2, and find the shortest one. **Wait, what edges?** The graph above has two components (1 and 2). If there were any edges between these two components, there wouldn't be two components but one.

The graph above is a *copy* of the graph for which we wish to find a minimum spanning tree. The situation is illustrated below, where both the original graph (left) and its copy (right) are shown.

![Picture](https://drive.google.com/uc?id=1ZdNcELnkOmqrBYhDm-bXHfkKUHFbquLm)


Somehow (and we're about to see how) we connected vertices 0, 1, 2, 3, and 4 in one component, and vertex 5 in another component. Now we need to find which of the edges **in the original graph can be added** to the copy graph to connect these two components.
For that, we need to check every edge in the original graph that has one endpoint in component 1 and the other in component 2. The brute force pseudocode is below:

```text
for every edge (u,v) in the original graph:
  if comp[u] != comp[v]:
    edge (u,v) is a safe edge candidate
```
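A runnable version of this brute-force scan, as a sketch assuming the `comp` labels shown earlier (component 1 is vertices 0 to 4, component 2 is vertex 5). To report each undirected edge once, it only looks at pairs with `u < v`; the pseudocode's full scan would also report each candidate a second time as `(v,u)`:

```python
INF = float('inf')
_ = INF

adj_G = [
    [ _,  _,  _,  5,  1,  _],
    [ _,  _, 20,  5,  _, 10],
    [ _, 20,  _, 10,  _,  _],
    [ 5,  5, 10,  _,  _, 15],
    [ 1,  _,  _,  _,  _, 20],
    [ _, 10,  _, 15, 20,  _],
]
comp = [1, 1, 1, 1, 1, 2]  # component label of each vertex

candidates = []
for u in range(len(adj_G)):
    for v in range(u + 1, len(adj_G)):
        # The edge exists and its endpoints carry different labels
        if adj_G[u][v] < INF and comp[u] != comp[v]:
            candidates.append((u, v))

print(candidates)  # [(1, 5), (3, 5), (4, 5)]
```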

Among the candidate safe edges, the actual safe edge is the one with the minimum weight. To determine which candidate becomes the safe edge, we employ a *greedy* strategy. We assume that the first candidate is the safe edge. Then we compare the safe edge to each remaining candidate: if a candidate has a smaller weight, it becomes the safe edge.


```text
for every edge (u,v) in the original graph:
  if comp[u] != comp[v]:
    edge (u,v) is a safe edge candidate
    if comp[u] has no safe edge yet:
      safe edge for comp[u] := (u,v)
    else:
      compare comp[u]'s existing safe edge
      with edge (u,v). If edge (u,v) has smaller
      weight than the existing safe edge, make (u,v)
      the safe edge for comp[u].
```

In the example above, there are three candidate edges out of the component with vertex 5, with weights $w_{45}=20$, $w_{35}=15$, and $w_{15}=10$. Suppose the ``if-else`` block in the pseudocode above first meets edge $(4,5)$ and takes it as the safe edge. It then moves to the next candidate, $(3,5)$, finds that its weight is less than that of the currently assumed safe edge, and makes $(3,5)$ the safe edge. It moves to the last candidate, $(1,5)$, finds again that its weight is less, and makes it the safe edge. At the end of these comparisons, the assumed safe edge is the actual safe edge.

![Picture](https://drive.google.com/uc?id=1ZkNBtVnpLXebVeoIzMqkekShKeMgtn-a)
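The same running-minimum selection in Python, as a sketch under the same assumptions (`comp` labels from above, labels starting at 1, and `count` set by hand to 2). Like the pseudocode, it only updates the component of `u`; because the loops visit every ordered pair, the component of `v` is handled when the scan reaches `(v,u)`. The order in which candidates arrive does not change the final answer:

```python
INF = float('inf')
_ = INF

adj_G = [
    [ _,  _,  _,  5,  1,  _],
    [ _,  _, 20,  5,  _, 10],
    [ _, 20,  _, 10,  _,  _],
    [ 5,  5, 10,  _,  _, 15],
    [ 1,  _,  _,  _,  _, 20],
    [ _, 10,  _, 15, 20,  _],
]
comp = [1, 1, 1, 1, 1, 2]
count = 2  # number of components

# safe[c] holds the current safe edge (u, v) out of component c; index 0 is unused
safe = [None] * (count + 1)

for u in range(len(adj_G)):
    for v in range(len(adj_G)):
        if adj_G[u][v] < INF and comp[u] != comp[v]:  # (u, v) is a candidate
            c = comp[u]
            if safe[c] is None:
                safe[c] = (u, v)                  # first candidate: assume it is safe
            else:
                x, y = safe[c]
                if adj_G[u][v] < adj_G[x][y]:     # a lighter candidate replaces it
                    safe[c] = (u, v)

print(safe[1], safe[2])  # (1, 5) (5, 1)
```

Both components pick the same undirected edge $\{1,5\}$ of weight 10, so adding it merges them into one.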


The technique we applied above, discovering the safe edge between two components and connecting them through it, can be applied to any two components in graph $T$. It works even when the components have one vertex each, which is the initial state of $T$: remember that we create $T$ as a copy of $G$ but without $G$'s edges.

# *The* MST algorithm


```text
Given an undirected, weighted graph G:
Initialize T as a copy of G with all its vertices and none of its edges.

while T has more than one component:
  Count components and label all vertices in T with their component.
  Assume there are no safe edges out of any component in T.
  for every edge (u,v) in G and u,v are in different components of T:
    if there is no safe edge out of component with vertex u:
      make (u,v) the safe edge out of component with vertex u.
    else:
      if weight(u,v) < weight of current safe edge out of component with vertex u:
        make (u,v) the safe edge out of component with vertex u.
    if there is no safe edge out of component with vertex v:
      make (v,u) the safe edge out of component with vertex v
    else:
      if weight(v,u) < weight of current safe edge out of component with vertex v:
        make (v,u) the safe edge out of component with vertex v.
  add safe edges to T
T is the Minimum Spanning Tree of G.
```
The algorithm above is simpler than it looks. It's important to remember that we use two adjacency matrices: the adjacency matrix of the input graph $G$, to **look up edge weights**, and the adjacency matrix of the output graph $T$, where we add only the safe edges.

Adding an edge of $G$ to $T$ is quite straightforward. Let's say that the safe edge between two components of $T$ is $(x,y)$. Adding this edge to $T$ requires two simple assignments:

```python
adj_T[x][y] = adj_G[x][y]
adj_T[y][x] = adj_G[y][x]
```

The two assignments are necessary because the adjacency matrix of an undirected graph is symmetric.
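Wrapped in a function, the two assignments might look like the sketch below. `add_edge` is a hypothetical helper name, not part of the original notebook; the demonstration copies the safe edge $(1,5)$ from the two-component example into an edgeless $T$:

```python
INF = float('inf')
_ = INF

adj_G = [
    [ _,  _,  _,  5,  1,  _],
    [ _,  _, 20,  5,  _, 10],
    [ _, 20,  _, 10,  _,  _],
    [ 5,  5, 10,  _,  _, 15],
    [ 1,  _,  _,  _,  _, 20],
    [ _, 10,  _, 15, 20,  _],
]

def add_edge(T, G, x, y):
    """Copy edge (x, y), with its weight, from graph G into graph T.
    Both are undirected adjacency matrices, so both orientations are written."""
    T[x][y] = G[x][y]
    T[y][x] = G[y][x]

# An edgeless T, like adj_T above. Each row is built separately so the rows
# are independent lists (writing into one row does not change the others).
adj_T = [[INF] * len(adj_G) for _row in range(len(adj_G))]

add_edge(adj_T, adj_G, 1, 5)
print(adj_T[1][5], adj_T[5][1])  # 10 10
```

Building the rows with a comprehension matters: `[[INF] * n] * n` would repeat one row object `n` times, so a single assignment would appear in every row.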

What makes this algorithm a <s>nightmare</s> challenge to implement is choosing the data structures to represent the safe edge out of each component in $T$. To overcome this challenge, it is important to remember that **an edge can be represented as a pair of vertices.**


```python
def initialize_tree_of(graph):
  """Creates a new graph that has all the vertices of the input graph and none
  of its edges. The function expects the input graph in adjacency matrix form
  and returns an edgeless copy of the graph, also in adjacency matrix form.
  """
  # What the input graph uses for infinity (any diagonal element should be inf)
  _ = graph[0][0]
  return [[ _ for i in range(len(graph))] for j in range(len(graph))]


def count_and_label(graph):
  """Labels vertices in the same component with the component count. As the
  function discovers new components, it increments the count value and assigns
  it to every vertex in that component. The function expects the input graph in
  adjacency matrix form. It returns the count of components in the input graph
  and a list with each vertex's component label.
  """
  # Initialize count of components
  count = 0
  # Initialize list of visited vertices for the depth-first search.
  visited = []
  # Initialize list with component labels for each vertex.
  comp = [-1] * len(graph)
  # Explore every vertex in the graph
  for u in range(len(graph)):
    # But only if we have not visited it before
    if u not in visited:
      # First time at this vertex: we just found a new component
      count += 1
      # Label this and adjacent vertices with this component count
      bag = [u]
      # Perform a depth first search for all vertices reachable from u
      # These are vertices in the same component as u
      while bag:
        v = bag.pop()
        if v not in visited:
          visited.append(v)
          comp[v] = count
          for w in range(len(graph[v])):
            if graph[v][w] < graph[0][0]:
              bag.append(w)
  return count, comp
```

```python
def minimum_spanning_tree(G):
  """Returns the adjacency matrix of a minimum spanning tree of G.
  G must be a connected, undirected graph in adjacency matrix form.
  """

  # Initialize T as a copy of G with all its vertices and none of its edges.
  T = initialize_tree_of(G)

  # count components of T and while at it, label T's vertices by their components
  count, comp = count_and_label(T)

  # while T has more than one component:
  while count > 1:

    # Assume there are no safe edges out of any component in T.
    safe = [None] * (count+1) # +1 to offset 0-indexing since first component label is 1

    # for every edge (u,v) in G and u,v are in different components of T:
    for u in range(len(G)):
      for v in range(len(G)):
        if G[u][v] < G[0][0]: # There is an edge (u,v)
          if comp[u] != comp[v]: # edge (u,v) is across diff components

            if safe[comp[u]] is None:
              # if there is no safe edge out of component with vertex u:
              #  make (u,v) the safe edge out of component with vertex u.
              safe[comp[u]] = [u,v]
            else:
              # if weight(u,v) < weight of current safe edge out of component with vertex u:
              # make (u,v) the safe edge out of component with vertex u.
              current_safe_edge = safe[comp[u]]
              x, y = current_safe_edge[0], current_safe_edge[1]
              current_weight = G[x][y]
              if G[u][v] < current_weight:
                safe[comp[u]] = [u,v]

            if safe[comp[v]] is None:
              safe[comp[v]] = [v,u]
            else:
              current_safe_edge = safe[comp[v]]
              x, y = current_safe_edge[0], current_safe_edge[1]
              current_weight = G[x][y]
              if G[v][u] < current_weight:
                safe[comp[v]] = [v,u]

    # Done with exploring safe edges in this iteration
    # Add safe edges to T
    for i in range(1, count+1):
      safe_edge = safe[i]
      x = safe_edge[0]
      y = safe_edge[1]
      T[x][y] = G[x][y]
      T[y][x] = G[y][x]
    count, comp = count_and_label(T)

  return T # T is the Minimum Spanning Tree of G.
```

```python
################################################################################
#                                                                              #
#     DO   N O T   MODIFY THIS CODE CELL. IT IS USED FOR TESTING  OUR CODE!    #
#                                                                              #
################################################################################

_ = float('inf')

# The adjacency matrix for the graph used in the examples above

adj_G = [  #0   1   2   3   4   5   <---- column labels
          [ _,  _,  _,  5,  1,  _], # 0 \
          [ _,  _, 20,  5,  _, 10], # 1  \
          [ _, 20,  _, 10,  _,  _], # 2   row
          [ 5,  5, 10,  _,  _, 15], # 3   labels
          [ 1,  _,  _,  _,  _, 20], # 4  /
          [ _,  10, _, 15, 20,  _]  # 5 /
]

# Test your MST:
minimum_spanning_tree(adj_G)

#
# If our code is successful, the output of MST(adj_G) will be:
#
# [[inf, inf, inf,   5,   1, inf],
#  [inf, inf, inf,   5, inf,  10],
#  [inf, inf, inf,  10, inf, inf],
#  [  5,   5,  10, inf, inf, inf],
#  [  1, inf, inf, inf, inf, inf],
#  [inf,  10, inf, inf, inf, inf]]
```

```text
[[inf, inf, inf, 5, 1, inf],
 [inf, inf, inf, 5, inf, 10],
 [inf, inf, inf, 10, inf, inf],
 [5, 5, 10, inf, inf, inf],
 [1, inf, inf, inf, inf, inf],
 [inf, 10, inf, inf, inf, inf]]
```
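As a quick sanity check against the figure at the top, the sketch below sums the weights in the returned matrix. The helpers `tree_edges` and `total_weight` are our own names, and `mst` is simply the expected output copied from above. Because the matrix is symmetric, only the upper triangle is scanned so each edge counts once; a spanning tree on $n$ vertices must also have exactly $n-1$ edges:

```python
INF = float('inf')
_ = INF

# Copied from the expected output of minimum_spanning_tree(adj_G) above
mst = [
    [_,  _,  _,  5, 1,  _],
    [_,  _,  _,  5, _, 10],
    [_,  _,  _, 10, _,  _],
    [5,  5, 10,  _, _,  _],
    [1,  _,  _,  _, _,  _],
    [_, 10,  _,  _, _,  _],
]

def tree_edges(adj):
    """List each undirected edge (u, v, weight) once, from the upper triangle."""
    n = len(adj)
    return [(u, v, adj[u][v])
            for u in range(n)
            for v in range(u + 1, n)
            if adj[u][v] < INF]

def total_weight(adj):
    """Sum of the edge weights, counting each undirected edge once."""
    return sum(weight for _u, _v, weight in tree_edges(adj))

print(len(tree_edges(mst)), total_weight(mst))  # 5 31
```

Five edges on six vertices, with total weight 31: the same as the magenta tree.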

# WRITE YOUR ESSAY HERE

At the beginning of the algorithm, each vertex of the graph is its own component, with no edges between components. A component is a group of one or more connected vertices, so initially each vertex is a single-vertex component. We then count the components and label each vertex with its component number, so we can tell which component each vertex belongs to. While there is more than one component left, we iterate and connect components in the most cost-effective way possible. For each component, we find its safe edge: the cheapest edge in the original graph that connects a vertex in the component to a vertex in a different component. Once we've identified these cheapest edges, we add them to our evolving MST, merging the components they connect. This continues until only one component is left, and that component is the MST. By repeatedly adding safe edges, we gradually grow the MST while ensuring that it stays acyclic and, at the end, spans the graph with the minimum total weight.


# Midterm Reflection
### How do you rate your attendance? It's ok to miss a class meeting now and then, but chronic absences or tardiness need to be addressed.
- I have had one absence this semester, and that was due to a meeting I had to attend. I have been present for all other classes.

### How do you rate your class participation?
- I don't always speak out, but I am always attentive and engaged in the lectures and course material.

### How do you rate your assignment work so far?
- I think it was a little sloppy at first; I did not add very many comments to the code, and I think some of my solutions could have been more concise. But I feel like my work has improved as I've gotten more comfortable with Python and all its capabilities.

### If any of the above are not at the level where you hoped to be by then, what do you plan to do to improve?
- I will continue to try to expand my knowledge of Python and add more comments to the methods that I write to make my thoughts more clear and explicit.

### What can I do to improve both as an instructor and to make the course better?
- I think you do a great job of making class interesting and engaging through your jokes and short stories! The only thing I wish were different is that we could have spent more time on dynamic programming topics.

### What do you believe your course grade is right now (choose from A, B, C, or D).
 - I think my course grade should be an A