# Assignment Week 4
## Randomised Graph Contraction

The file contains the adjacency list representation of a simple undirected graph. There are 200 vertices labeled 1 to 200. The first column in the file represents the vertex label, and the rest of the row (all entries except the first column) lists the vertices that the vertex is adjacent to. So for example, the 6th row looks like: "6	155	56	52	120	......". This just means that the vertex with label 6 is adjacent to (i.e., shares an edge with) the vertices with labels 155, 56, 52, 120, ......, etc.

Your task is to code up and run the **randomized contraction algorithm** for the min cut problem and use it on the above graph to compute the **min cut**.  (HINT: Note that you'll have to figure out an implementation of edge contractions.  Initially, you might want to do this naively, creating a new graph from the old every time there's an edge contraction.  But you should also think about more efficient implementations.)   

(WARNING: As per the video lectures, please make sure to run the algorithm many times with different random seeds, and remember the smallest cut that you ever find.)  Write your numeric answer in the space provided.  So e.g., if your answer is 5, just type 5 in the space provided.

### Solution

#### Some definitions

To solve this assignment, it's important to understand first what graphs and minimum cuts are, as well as how the Randomised Contraction Algorithm (RandCon) works.

- **Graph**: in Computer Science, a **graph** is a data structure formed by nodes (usually called **vertices**, or vertex in its singular form) and a set of connections between these vertices, normally known as **edges**. Graphs are used to model relationships between objects and are widely used in network design, social network analysis, routing algorithms, etc. A graph G is usually written as G = (V, E), where V is the set of vertices and E the set of edges in the graph.

- **Edges**: as described above, the connections between vertices are known as edges, which can be undirected (defining a connection that goes 'both ways' between the vertices it connects) or directed (when the connection goes in only one direction between the nodes or vertices). A graph whose edges are all undirected is called undirected, while one whose edges are all directed is called directed. Notice that two vertices can be directly connected by more than one edge (parallel edges), even when it might seem redundant; this matters here because contractions create parallel edges.

**Image 1** below shows a 10-vertex undirected graph:
<img src="Week_4_graph_example_iter0.png">

As indicated in the problem, graphs can be represented as lists of lists (or arrays), where each list represents a vertex and its connections with other vertices via edges. For example, the array representing vertex '1' and its connections in the graph of "Image 1" can be written as: [1, 2, 4, 7, 8], since vertex 1 is directly connected with nodes/vertices 2, 4, 7, and 8.

The complete definition of the graph would then be:

$\quad$[\
$\quad$ [1, 2, 4, 7, 8], \
$\quad$ [2, 1, 3], \
$\quad$ [3, 2, 6], \
$\quad$ [4, 1, 5], \
$\quad$ [5, 4, 6], \
$\quad$ [6, 3, 5, 9, 10], \
$\quad$ [7, 1, 8, 10], \
$\quad$ [8, 1, 7, 9], \
$\quad$ [9, 6, 8, 10], \
$\quad$ [10, 6, 7, 9] \
$\quad$]
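As a quick sanity check on this representation, the sketch below converts the list of lists into a dictionary and counts the edges. The names `example_graph`, `to_dict` and `edge_count` are illustrative and not part of the assignment. In an undirected graph every edge appears in two adjacency lists, so the edge count is half the total number of adjacencies.

```python
# Graph of "Image 1": first element of each row is the vertex, the rest are its neighbours
example_graph = [
    [1, 2, 4, 7, 8], [2, 1, 3], [3, 2, 6], [4, 1, 5], [5, 4, 6],
    [6, 3, 5, 9, 10], [7, 1, 8, 10], [8, 1, 7, 9], [9, 6, 8, 10], [10, 6, 7, 9],
]

def to_dict(graph):
    """Map each vertex label to the list of its neighbours."""
    return {row[0]: row[1:] for row in graph}

def edge_count(graph):
    """Each undirected edge is stored twice, once per endpoint."""
    return sum(len(row) - 1 for row in graph) // 2

adjacency = to_dict(example_graph)

# Every edge must be listed from both of its endpoints
assert all(u in adjacency[v] for u, nbrs in adjacency.items() for v in nbrs)

print(edge_count(example_graph))  # 14
```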



A **cut** of a graph is a partition of its vertices into 2 non-empty subsets. Cuts are written as (A, B), where A and B are disjoint subsets of the vertex set V of the original graph whose union is V (A U B = V). The edges with one endpoint in A and the other in B are the ones that 'cross' the cut. There are some special kinds of cuts that deserve some attention:

- **Min-Cut**: represents a cut that minimises the number of edges between the subsets. Finding the cut that minimises connections between sets has many applications in network flow (and vulnerability), clustering, etc.

- **Max-Cut**: represents the cut that maximises the possible edges or connections between the subsets defined by the cut.

- **s-t Cut**: this is an (A, B) cut defined such that the **source vertex** 's' is in subset A and the **target vertex** 't' is in subset B. This cut is often used in network flow and graph connectivity.
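To make the notion of edges crossing a cut concrete, here is a minimal sketch that lists them for a given side A of the graph in "Image 1". The helper name `crossing_edges` is hypothetical; the size of the cut is simply the length of the returned list.

```python
# Graph of "Image 1" as a list of lists (vertex first, then its neighbours)
graph = [
    [1, 2, 4, 7, 8], [2, 1, 3], [3, 2, 6], [4, 1, 5], [5, 4, 6],
    [6, 3, 5, 9, 10], [7, 1, 8, 10], [8, 1, 7, 9], [9, 6, 8, 10], [10, 6, 7, 9],
]

def crossing_edges(graph, side_a):
    """Return the edges (u, w) with u in side_a and w outside of it."""
    side_a = set(side_a)
    return [(row[0], w)
            for row in graph if row[0] in side_a
            for w in row[1:] if w not in side_a]

print(crossing_edges(graph, {4, 5}))          # [(4, 1), (5, 6)]
print(len(crossing_edges(graph, {1, 2, 3})))  # 4
```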

#### Random Contraction (Karger) algorithm

The **Random Contraction algorithm** (also known as Karger's algorithm) is a randomised algorithm used to find the minimum cut of a graph. It repeatedly picks a random edge and contracts it, merging its two endpoints into a single vertex, until only 2 'mega-contracted vertices' remain; the edges between them define a cut of the original graph.

It can be shown that a single run finds a given minimum cut with probability at least $\frac{2}{n(n-1)}$, which is at least $\frac{1}{n^2}$. Although this probability is not particularly high, we usually run the algorithm many times and keep the smallest cut found, which increases the chance of finding the optimal cut (with about $n^2 \ln n$ independent trials on an $n$-vertex graph, the probability of missing it is at most $\frac{1}{n}$).
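To get a feeling for how many repetitions are needed, the sketch below turns the per-run bound $\frac{2}{n(n-1)}$ into a number of independent trials for a chosen failure probability. The function name `trials_needed` is illustrative, and the calculation assumes $n \geq 3$ and that every run really achieves that lower bound, so treat the result as a rough guide.

```python
import math

def trials_needed(n, max_failure):
    """Smallest T with (1 - p)^T <= max_failure, where p = 2 / (n (n - 1))."""
    p_success = 2 / (n * (n - 1))  # lower bound on the success probability of one run
    # (1 - p)^T <= max_failure  <=>  T >= log(max_failure) / log(1 - p)
    return math.ceil(math.log(max_failure) / math.log(1 - p_success))

n = 200
t = trials_needed(n, 1 / n)
failure_bound = (1 - 2 / (n * (n - 1))) ** t

print(f"trials: {t}, failure probability bound: {failure_bound:.5f}")
```

Since $(1 - p)^T \leq e^{-pT}$, the count never exceeds $\frac{n(n-1)}{2} \ln \frac{1}{\delta}$ for a target failure probability $\delta$.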

##### Karger's algorithm example:
Let's use the graph in Image 1 to see how the algorithm would work:

-  **1st contraction**: Let's assume we randomly selected vertex '1' and then vertex '2' among all vertices connected to vertex '1' via one or more edges. This means that vertices 1 and 2 are collapsed into one, so we:
   -  Select one vertex to 'eliminate' or collapse onto the other one (say we chose vertex '2')
   -  Eliminate vertex '2' from the list of connected vertices of vertex '1' (the remaining vertex)
   -  Add all vertices connected to vertex '2' (other than vertex '1' itself) to the list of connections of vertex '1', keeping duplicates, since each one is a parallel edge that still counts towards the cut.
   - Replace vertex '2' (collapsed) with vertex '1' (remaining) in all other vertices' connection lists/arrays.

So, after implementing the previous steps, the graph is transformed into the following image:
<img src="Week_4_graph_example_iter1.png">

And the following list:\
$\quad$[\
$\quad$ [1, **_3_**, 4, 7, 8], \
$\quad$ [3, **_1_**, 6], \
$\quad$ [4, 1, 5], \
$\quad$ [5, 4, 6], \
$\quad$ [6, 3, 5, 9, 10], \
$\quad$ [7, 1, 8, 10], \
$\quad$ [8, 1, 7, 9], \
$\quad$ [9, 6, 8, 10], \
$\quad$ [10, 6, 7, 9] \
$\quad$]
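The contraction steps can also be sketched on a dictionary of adjacency lists, which avoids searching for the row of each vertex. This is only an illustration under that assumed representation; `contract`, `keep` and `drop` are hypothetical names, and the graph is modified in place.

```python
def contract(adj, keep, drop):
    """Merge vertex `drop` into vertex `keep` in a dict-based multigraph (in place)."""
    for neighbour in adj[drop]:
        if neighbour == keep:
            continue  # this edge would become a self-loop, so it is discarded
        adj[keep].append(neighbour)  # duplicates are kept: they are parallel edges
        # relabel the other endpoint of the edge
        adj[neighbour] = [keep if v == drop else v for v in adj[neighbour]]
    # remove the contracted edge(s) from the surviving vertex
    adj[keep] = [v for v in adj[keep] if v != drop]
    del adj[drop]

adj = {
    1: [2, 4, 7, 8], 2: [1, 3], 3: [2, 6], 4: [1, 5], 5: [4, 6],
    6: [3, 5, 9, 10], 7: [1, 8, 10], 8: [1, 7, 9], 9: [6, 8, 10], 10: [6, 7, 9],
}

contract(adj, keep=1, drop=2)
print(adj[1], adj[3])  # [4, 7, 8, 3] [1, 6]
```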

-  **2nd contraction**: Now, we randomly select vertex '6' and then vertex '9' among all vertices connected to vertex '6' via one or more edges. This means that vertices 6 and 9 are collapsed into one. Notice that we now have 2 edges connecting vertices 6 and 10. Performing the same operations, we have the following graph and list:
<img src="Week_4_graph_example_iter2.png">\
$\quad$[\
$\quad$ [1, 3, 4, 7, 8], \
$\quad$ [3, 1, 6], \
$\quad$ [4, 1, 5], \
$\quad$ [5, 4, 6], \
$\quad$ [6, 3, 5, **_8_**, 10, **_10_**], \
$\quad$ [7, 1, 8, 10], \
$\quad$ [8, 1, 7, 6], \
$\quad$ [10, 6, 7, 6] \
$\quad$]

If we continue with the contractions in the following (supposedly random) sequence:

- **3rd contraction**: Vertex 6 and then connected vertex 10.
- **4th contraction**: Vertex 4 and then connected vertex 5.
- **5th contraction**: Vertex 8 and then connected vertex 6.
- **6th contraction**: Vertex 4 and then connected vertex 8. By this step, we have the following graph and list:
<img src="Week_4_graph_example_iter6.png">\
$\quad$[\
$\quad$ [1, 3, 4, 4, 7], \
$\quad$ [3, 1, 4], \
$\quad$ [4, 1, 1, 3, 7, 7], \
$\quad$ [7, 1, 4, 4] \
$\quad$]

- **7th contraction**: Vertex 4 and then connected vertex 1. Notice that we need to eliminate a self-connection here.
- **8th contraction**: Vertex 4 and then connected vertex 7. Now we have just two vertices, so we stop iterating and use this as the final cut. 
<img src="Week_4_graph_example_iter8.png">

Now we need to return the cut represented by these two final vertices (identifying the contractions we performed to arrive at the last 2 collapsed vertices):

<img src="Week_4_graph_cut_example.png">

With the example above we can see that we need the following functions to solve the assignment:

- A function to **randomly select an element (vertex)** from a list using a uniform distribution, with a different seed for each execution of Karger's algorithm so that the repeated runs are independent and the probability of finding the optimal cut increases. We can use this function to randomly pick both the first vertex and then another vertex adjacent to it.
- Another function that will receive both adjacent **vertices and collapse them**.
- Another main function that receives the original Graph and calls the previous function to **execute each section of the solution**. This function will take the initial graph and iterate a number of times (in this example, we'll use 50) to arrive at the final cut. We can store the pairs of vertices that were collapsed if we want to 'reconstruct' the cut from the original graph.
- Another function to **load the data** in the .txt file.

Though it is not necessary to solve the problem, we will also have another function to take the initial graph and the pairs of vertices that were collapsed to 'reconstruct' the cut from the initial graph.

#### Function 1: Vertex random selection

In [291]:
import random

# One generator per seed: repeated calls with the same seed advance a
# reproducible stream instead of restarting it
_rngs = {}

def rand_vertex(seed: int, vertices_list: list)->int:
    """
    This function returns an element chosen uniformly at random from a list
    
    Input:
        seed: integer allowing the definition of a different random seed for every execution
        vertices_list: defines the list of vertices (or indices) from which an element will be chosen at random
        
    Output:
        rand_loc = the randomly chosen element of vertices_list
    """
    
    # Reuse (or create) the generator tied to this seed
    if seed not in _rngs:
        _rngs[seed] = random.Random(seed)
    
    # Pick a random element using that generator
    rand_loc = _rngs[seed].choice(vertices_list)
    
    return rand_loc
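
One caveat with picking a vertex uniformly and then one of its neighbours uniformly: edges between low-degree vertices are chosen more often than the rest, whereas the usual analysis of Karger's algorithm assumes every remaining edge is equally likely. The sketch below is one possible way to select an edge uniformly, by weighting the first vertex by its degree. `rand_edge` is a hypothetical helper, not part of the solution above; it returns indices in the same `(v1, v2)` convention that `vertex_collapse` expects.

```python
import random

def rand_edge(X, rng=random):
    """Pick an edge uniformly at random from a list-of-lists multigraph.

    Returns (v1, v2): v1 is the index of a row in X and v2 the position of
    the chosen neighbour inside X[v1] (position 0 holds the vertex itself).
    """
    degrees = [len(row) - 1 for row in X]
    # A vertex is chosen with probability proportional to its degree...
    v1 = rng.choices(range(len(X)), weights=degrees, k=1)[0]
    # ...and then one of its adjacency entries uniformly, so every entry
    # (and therefore every edge) is equally likely
    v2 = rng.randrange(1, len(X[v1]))
    return v1, v2

star = [[1, 2, 3, 4], [2, 1], [3, 1], [4, 1]]
rng = random.Random(0)
v1, v2 = rand_edge(star, rng)
print(star[v1][0], star[v1][v2])
```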
        

#### Function 2: Vertices Collapse

In [292]:
def vertex_collapse(v1: int, v2: int, X: list)-> list:
    """
    This function collapses two preselected vertices and returns a new graph.
    There are two important features to this function: 
        1st we must remove all self-loops from the collapsed vertices (that is, adjacencies to itself must be eliminated)
        2nd we must keep adjacencies between the vertex and other nodes even when they are duplicates 
            (i.e: collapsing vertex 4 into vertex 10, [10,4,5] and [4,10,5] become [10,5,5] ) 
    
    Input:
        v1: index of the 1st selected vertex within the list of lists X.
        v2: index of the 2nd selected vertex (adjacent to the vertex defined by v1) within the list X[v1]
        X: a graph including several vertices and their adjacencies as a list of lists
    
    Output:
        X: new graph after collapsing the vertices represented by the indices v1 and v2
           (the list passed as input is left unmodified)
    
    """
    
    # Work on a shallow copy so the caller's list is not modified (rows are replaced, never changed in place)
    X = list(X)
    
    # We assign the labels of the two vertices to variables vert_1 and vert_2
    vert_1 = X[v1][0]
    vert_2 = X[v1][v2]
    
    # define index for the row of the vertex corresponding to v2
    idx_vertex2 = next((i for i, sublist in enumerate(X) if sublist[0] == vert_2), -1)
    assert idx_vertex2 != -1, "every adjacent vertex must have its own row in the graph"
    
    # We 1st merge the vertices adjacent to vert_2 into those adjacent to vert_1. We include all adjacencies, even if they already exist in X[v1]
    X[v1] = X[v1] + [adj_vert for adj_vert in X[idx_vertex2][1:] if adj_vert != vert_1]
    
    # We update the whole graph replacing vert_2 with vert_1 wherever vert_2 is adjacent to other vertices 
    # (except for the row where vert_2 is the main vertex, which remains unchanged and is removed below)
    X = [vertex if vertex[0] == vert_2 else [vert_1 if v == vert_2 else v for v in vertex] for vertex in X]
    
    # Every vert_2 in X[v1] is now vert_1, so dropping all vert_1 entries after the first discards the self-loops
    X[v1] = [X[v1][0]] + [v for v in X[v1] if v != vert_1]
    
    # Finally we remove the list where vert_2 is the main vertex
    X = [vert for vert in X if vert[0] != vert_2]
    
    return X

#### Function 3: read txt file

In [293]:
def import_file(file_path:str)->list:
    
    """
    Function receives a file path and returns a list of lists containing the vertex and its adjacent vertices for a graph
    
        Input: file path

        Output: list of lists
    """
    
    data = []

    # Open the file in read mode
    with open(file_path, 'r') as file:
        # Iterate over each line in the file
        for line in file:
            # Split the line on whitespace (tabs included) and convert each element to integer
            row = [int(x) for x in line.split()]
            # Append the row to the data list, skipping blank lines
            if row:
                data.append(row)

    return data

#### Function 4: Main function

In [328]:
import copy
import random

def karger_contraction(graph: list)->tuple:
    """
    This function receives the graph, collapses vertices until only 2 remain and counts the edges between them.
    It repeats this for several trials and returns the smallest number of edges found as the min_cut, 
    along with the contractions and the 2 remaining vertices that produced it
    
    
    Input: A list of lists containing a graph data. The first element of every list is a vertex, and the remaining elements are its adjacent vertices
    
    Output:  
        min_cut_edges: integer representing the number of edges connecting the 2 sections of the min_cut
        collapsed_vertices: contains lists of all the pairs of vertices collapsed until we arrived at the min_cut
        min_cut: contains the 2 remaining (collapsed) vertices of the min_cut and their adjacencies

    """
    
    # initialise a min_cut value that we'll update with the results of the cuts obtained in each iteration
    min_cut_edges = float('inf')
    collapsed_vertices = []
    min_cut = []
    
    # set a loop to try several reductions to find the min cut
    for i in range(50):
        
        # set random seed
        seed = i
        
        # make a copy of the graph
        X = copy.deepcopy(graph)
        
        # list to store collapsed vertices
        collapsing_vertex_list = []
        
        # contracts edges (vertices) until only 2 vertices remain
        while len(X) > 2:
            
            # select vertices v1 (index for vertex to collapse into) and v2 (index for collapsing vertex adjacent to v1)
            # notice that we are changing the seed for every iteration
            v1 = rand_vertex(seed = seed, vertices_list = range(len(X)))
            v2 = rand_vertex(seed = seed, vertices_list = range(1, len(X[v1])))  # need to exclude the 1st element since it is the vertex and we need the adjacent ones
            
            # Store collapsing vertices to re-construct the graph when the final cut is defined
            collapsing_vertex_list.append([X[v1][0], X[v1][v2]])
            
            # redefine X as the collapsed graph to iterate again
            X = vertex_collapse(v1 = v1, v2 = v2, X = X)
        
        # after collapsing the graph into just 2 cuts, evaluate the edges and reset the min + collapse vertices
        assert len(X[0][1:]) == len(X[1][1:])
        if len(X[0][1:]) < min_cut_edges:
            min_cut_edges = len(X[0][1:])
            collapsed_vertices = collapsing_vertex_list
            min_cut = X
            
            
    return min_cut_edges, collapsed_vertices, min_cut
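
Regarding the hint about more efficient implementations, one common alternative avoids rebuilding the adjacency lists altogether: shuffle the edge list once and merge endpoints with a union-find structure, skipping edges whose endpoints are already merged. Processing edges in a uniformly random order is equivalent to contracting uniformly random remaining edges. The sketch below assumes a simple, connected input graph with comparable labels; `karger_union_find` is a hypothetical name and is not used elsewhere in this notebook.

```python
import random

def karger_union_find(graph, rng):
    """One trial of random contraction; returns the size of the cut found."""
    parent = {row[0]: row[0] for row in graph}

    def find(v):
        # Follow parent pointers to the representative, halving the path on the way
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Each undirected edge once (u < w), in uniformly random order
    edges = [(row[0], w) for row in graph for w in row[1:] if row[0] < w]
    rng.shuffle(edges)

    components = len(parent)
    for u, w in edges:
        if components == 2:
            break
        root_u, root_w = find(u), find(w)
        if root_u != root_w:  # otherwise the edge is already a self-loop
            parent[root_u] = root_w
            components -= 1

    # Edges whose endpoints ended up in different groups cross the cut
    return sum(1 for u, w in edges if find(u) != find(w))

example_graph = [
    [1, 2, 4, 7, 8], [2, 1, 3], [3, 2, 6], [4, 1, 5], [5, 4, 6],
    [6, 3, 5, 9, 10], [7, 1, 8, 10], [8, 1, 7, 9], [9, 6, 8, 10], [10, 6, 7, 9],
]

rng = random.Random(1)
results = [karger_union_find(example_graph, rng) for _ in range(200)]
print(min(results))
```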
        

Now, we'll execute the functions to determine the number of edges of the min_cut

In [329]:
# Read the txt file and assign the graph to a list
graph_list = import_file("Week_4_kargerMinCut.txt")

min_cut_edges, collapsed_vertices, min_cut = karger_contraction(graph_list)

print(f"Number of edges for the min_cut: {min_cut_edges}")
print(f"Min_cut: {min_cut}\n")
print(f"Series of collapsing vertices to achieve the min_cut {collapsed_vertices}")

Number of edges for the min_cut: 17
Min_cut: [[63, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200], [200, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63]]

Series of collapsing vertices to achieve the min_cut [[94, 154], [31, 98], [27, 39], [169, 110], [41, 129], [196, 46], [37, 97], [58, 86], [63, 168], [147, 59], [48, 179], [25, 185], [153, 48], [11, 33], [130, 189], [127, 191], [28, 141], [49, 136], [130, 3], [54, 41], [176, 13], [159, 105], [109, 148], [119, 71], [54, 161], [132, 18], [125, 53], [155, 172], [133, 56], [24, 11], [181, 176], [81, 25], [1, 140], [166, 34], [87, 164], [85, 150], [24, 169], [163, 153], [175, 79], [158, 60], [173, 130], [120, 160], [58, 38], [90, 117], [188, 61], [89, 37], [174, 155], [111, 162], [82, 66], [107, 81], [7, 156], [107, 121], [132, 120], [32, 158], [106, 24], [101, 28], [40, 119], [143, 142], [183, 188], [174, 108], [20, 133], [149, 96], [76, 85], [178, 45], [32, 175], [10, 90], [31, 1

#### Tests

The following are tests performed to evaluate the functions:

In [307]:
A = [[1,2,4,7,8], [2,1,3], [3,2,6], 
     [4,1,5], [5,4,6], [6,3,5,9,10], [7,1,8,10],
    [8,1,7,9], [9,8,6,10], [10,7,9,6]]

# collapse [1,2,4,7,8] and [4,1,5]
collapsed = vertex_collapse(v1=0, v2= 2, X=A)
print(f"Collapsed graph: {collapsed}")

Collapsed graph: [[1, 2, 7, 8, 5], [2, 1, 3], [3, 2, 6], [5, 1, 6], [6, 3, 5, 9, 10], [7, 1, 8, 10], [8, 1, 7, 9], [9, 8, 6, 10], [10, 7, 9, 6]]


In [330]:
A = [[1,2,4,7,8], [2,1,3], [3,2,6], 
     [4,1,5], [5,4,6], [6,3,5,9,10], [7,1,8,10],
    [8,1,7,9], [9,8,6,10], [10,7,9,6]]

min_cut_edges_test, collapsed_vertices_test, min_cut_test = karger_contraction(A)
print(f"Number of edges for the min_cut: {min_cut_edges_test}")
print(f"Min_cut: {min_cut_test}\n")
print(f"Series of collapsing vertices to achieve the min_cut {collapsed_vertices_test}")

Number of edges for the min_cut: 2
Min_cut: [[5, 8, 8], [8, 5, 5]]

Series of collapsing vertices to achieve the min_cut [[1, 2], [6, 10], [3, 6], [8, 7], [8, 9], [8, 3], [5, 4], [8, 1]]


In [302]:
A = [[1,2,4,7,8], [2,1,3], [3,2,6], 
     [4,1,5], [5,4,6], [6,3,5,9,10], [7,1,8,10],
    [8,1,7,9], [9,8,6,10], [10,7,9,6]]

B = vertex_collapse(v1=0, v2=1, X = A)

print(f"First collapse: {B}")


C = vertex_collapse(v1=4, v2=3, X = B)

print(f"2nd collapse: {C}")

D = vertex_collapse(v1=4, v2=3, X = C)

print(f"3rd collapse: {D}")

E = vertex_collapse(v1=2, v2=2, X = D)

print(f"4th collapse: {E}")

F = vertex_collapse(v1=5, v2=3, X = E)

print(f"5th collapse: {F}")

G = vertex_collapse(v1=2, v2=2, X = F)

print(f"6th collapse: {G}")

H = vertex_collapse(v1=2, v2=1, X = G)

print(f"7th collapse: {H}")

I = vertex_collapse(v1=1, v2=1, X = H)

print(f"8th collapse: {I}")

First collapse: [[1, 4, 7, 8, 3], [3, 1, 6], [4, 1, 5], [5, 4, 6], [6, 3, 5, 9, 10], [7, 1, 8, 10], [8, 1, 7, 9], [9, 8, 6, 10], [10, 7, 9, 6]]
2nd collapse: [[1, 4, 7, 8, 3], [3, 1, 6], [4, 1, 5], [5, 4, 6], [6, 3, 5, 10, 8, 10], [7, 1, 8, 10], [8, 1, 7, 6], [10, 7, 6, 6]]
3rd collapse: [[1, 4, 7, 8, 3], [3, 1, 6], [4, 1, 5], [5, 4, 6], [6, 3, 5, 8, 7], [7, 1, 8, 6], [8, 1, 7, 6]]
4th collapse: [[1, 4, 7, 8, 3], [3, 1, 6], [4, 1, 6], [6, 3, 4, 8, 7], [7, 1, 8, 6], [8, 1, 7, 6]]
5th collapse: [[1, 4, 7, 8, 3], [3, 1, 8], [4, 1, 8], [7, 1, 8, 8], [8, 1, 7, 3, 4, 7]]
6th collapse: [[1, 4, 7, 4, 3], [3, 1, 4], [4, 1, 1, 7, 3, 7], [7, 1, 4, 4]]
7th collapse: [[3, 4, 4], [4, 7, 3, 7, 7, 3], [7, 4, 4, 4]]
8th collapse: [[3, 4, 4], [4, 3, 3]]
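
A useful invariant to check on these intermediate graphs is that every edge is recorded the same number of times from both of its endpoints and that no self-loops survive. The helper below, `is_consistent`, is a hypothetical sketch of such a check, applied here to two of the lists printed above.

```python
from collections import Counter

def is_consistent(X):
    """True if X has no self-loops and every edge multiplicity is symmetric."""
    counts = {row[0]: Counter(row[1:]) for row in X}
    for u, neighbours in counts.items():
        if neighbours[u] > 0:
            return False  # self-loop
        for w, k in neighbours.items():
            # w must have its own row and list u exactly k times
            if w not in counts or counts[w][u] != k:
                return False
    return True

second_collapse = [[1, 4, 7, 8, 3], [3, 1, 6], [4, 1, 5], [5, 4, 6],
                   [6, 3, 5, 10, 8, 10], [7, 1, 8, 10], [8, 1, 7, 6], [10, 7, 6, 6]]
eighth_collapse = [[3, 4, 4], [4, 3, 3]]

print(is_consistent(second_collapse), is_consistent(eighth_collapse))  # True True
```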


#### Function 5: Reconstructing the Cut

In [362]:
def reconstruct_cut(orig_graph: list, collapsed_vertices: list)-> list:
    """
    This function receives the original graph and the collapsed vertices, and groups the vertices into the 
    sides of the cut, returned as a list of the form
    
    cut = [ [ [collapsed vertices in cut 1],[vertices adjacent to the collapsed ones in cut 1]], 
            [ [collapsed vertices in cut 2],[vertices adjacent to the collapsed ones in cut 2]], 
          ]
    
    input:
        orig_graph: the original graph in the form of list of lists
        collapsed_vertices: list of pairs of collapsing vertices in sequential order
    
    output:
        cuts: list with one element per side of the cut, each a list of collapsed vertices and adjacent ones.
            Notice that an adjacency list can still contain vertices from its own side; the edges of the 
            min_cut are the adjacencies that point to a vertex on the other side.
            A vertex that never took part in a contraction does not appear in the output.
    """
    
    # initiate list to contain the cuts
    cuts = []
    
    # set a for loop to go through all the collapsed pairs
    for (v1, v2) in collapsed_vertices:
        
        # find the indices for the collapsed vertices in the original graph
        idx_vertex1 = next((i for i, sublist in enumerate(orig_graph) if sublist[0] == v1), -1)
        idx_vertex2 = next((i for i, sublist in enumerate(orig_graph) if sublist[0] == v2), -1)
        
        # both indices should exist in the graph
        assert idx_vertex1 != -1
        assert idx_vertex2 != -1
        
        # Retrieve vertices and adjacencies for each vertex
        vert1, adj1 = orig_graph[idx_vertex1][0], orig_graph[idx_vertex1][1:]
        vert2, adj2 = orig_graph[idx_vertex2][0], orig_graph[idx_vertex2][1:]
        
        # We define the new adjacencies without v1 and v2 themselves; the two lists are concatenated, so a shared neighbour appears once per edge
        new_adjs = list(set(adj1) - set([v1, v2])) + list(set(adj2) - set([v1, v2]))
        
        # we define the new vertices 
        new_verts = list(set([vert1, vert2]))
        
        # we find the new collapsing vertices in the cut (section vertices)
        idx_cut_v1 = next((i for i, sublist in enumerate(cuts) if vert1 in sublist[0]), -1)
        idx_cut_v2 = next((j for j, sublist in enumerate(cuts) if vert2 in sublist[0]), -1)
        
        # if neither of the new collapsing vertices are in the current cut, we add them as a new cut        
        if idx_cut_v1 == -1 and idx_cut_v2 == -1:
            cuts.append([new_verts, new_adjs])
                
        else: # if either of the new collapsing vertices are in the current cut, we merge them into the current cut

            if idx_cut_v1 != -1 and idx_cut_v2 == -1:
                cuts[idx_cut_v1][0] = list(set(cuts[idx_cut_v1][0] + list(new_verts)))
                cuts[idx_cut_v1][1] = cuts[idx_cut_v1][1] + adj2
                
            if idx_cut_v2 != -1 and idx_cut_v1 == -1:
                cuts[idx_cut_v2][0] = list(set(cuts[idx_cut_v2][0] + list(new_verts)))
                cuts[idx_cut_v2][1] = cuts[idx_cut_v2][1] + adj1

            # if both vertices already are in the cut, we need to merge the cuts (and use the cut's adjacencies, not the original graph's)
            if idx_cut_v2 != -1 and idx_cut_v1 != -1:
                cuts[idx_cut_v1][0] = list(set(cuts[idx_cut_v1][0] + cuts[idx_cut_v2][0]))
                cuts[idx_cut_v1][1] = cuts[idx_cut_v1][1] + cuts[idx_cut_v2][1]
                cuts.pop(idx_cut_v2) # we erase one of the collapsed cuts
            
    return cuts
    

In [363]:
# test reconstruction function

A = [[1,2,4,7,8], [2,1,3], [3,2,6], 
     [4,1,5], [5,4,6], [6,3,5,9,10], [7,1,8,10],
    [8,1,7,9], [9,8,6,10], [10,7,9,6]]

collapsed_vertices = [[1, 2], [6, 10], [3, 6], [8, 7], [8, 9], [8, 3], [5, 4], [8, 1]]

cuts = reconstruct_cut(orig_graph = A, collapsed_vertices = collapsed_vertices)

print(cuts)


[[[1, 2, 3, 6, 7, 8, 9, 10], [1, 9, 1, 10, 8, 6, 10, 9, 3, 5, 9, 7, 2, 6, 8, 4, 7, 3]], [[4, 5], [6, 1]]]
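
If only the two sides of the cut are needed, a shorter route is to replay the contractions on sets of vertices. The sketch below assumes each pair is stored as `[surviving vertex, collapsed vertex]`, which is how `karger_contraction` records them; `sides_from_contractions` is a hypothetical name. Unlike the reconstruction above, it also reports a vertex that never took part in a contraction, as happens with vertex 3 in the worked example.

```python
def sides_from_contractions(graph, collapsed_vertices):
    """Replay the contractions and return the two sides of the cut."""
    # Every vertex starts as its own group, keyed by its label
    group = {row[0]: {row[0]} for row in graph}
    for keep, drop in collapsed_vertices:
        # The collapsed group is absorbed by the surviving one
        group[keep] |= group.pop(drop)
    return sorted(sorted(g) for g in group.values())

A = [[1, 2, 4, 7, 8], [2, 1, 3], [3, 2, 6],
     [4, 1, 5], [5, 4, 6], [6, 3, 5, 9, 10], [7, 1, 8, 10],
     [8, 1, 7, 9], [9, 8, 6, 10], [10, 7, 9, 6]]

pairs = [[1, 2], [6, 10], [3, 6], [8, 7], [8, 9], [8, 3], [5, 4], [8, 1]]
print(sides_from_contractions(A, pairs))
# [[1, 2, 3, 6, 7, 8, 9, 10], [4, 5]]

worked_example = [[1, 2], [6, 9], [6, 10], [4, 5], [8, 6], [4, 8], [4, 1], [4, 7]]
print(sides_from_contractions(A, worked_example))
# [[1, 2, 4, 5, 6, 7, 8, 9, 10], [3]]
```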
