# Assignment 8
#### Due November 18, 2022, 11:59

In this week’s assignment, we are going to dive back into graph theory and expand on the subject of network science.  
Graphs are powerful constructs with even more powerful mathematical properties that we can take advantage of when we can formulate our problems as a graph. This time around, we are interested in one network property in particular: the **local clustering coefficient** of a node.

## Submission
Edit and turn in this Jupyter notebook file containing your solutions to each task.  
Implement your solution to each of the exercises in the code field below the exercise description.  

The libraries you may need are already imported; no additional imports are allowed.

___

### Local clustering coefficient
In this assignment, we want to calculate the local clustering coefficient of a node in an undirected graph. 

Recall that an undirected graph consists of a set of nodes, some of which are connected by edges, where every edge is bidirectional. 
Imagine, for example, a graph where the nodes represent people at a party pre-corona and there is an edge between two people if they shook hands. This example graph is undirected because any person, A, can shake hands with another person, B, only if B also shakes hands with A. This means that if A is connected to B, then B is, by definition, also connected to A.
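
The symmetry described above can be made concrete with a small sketch. The handshake pairs and the names `handshakes` and `neighbors` are hypothetical, chosen only for illustration; the point is that every undirected edge is recorded in both directions.

```python
# Hypothetical handshakes at the party (illustration only).
handshakes = [("A", "B"), ("B", "C"), ("A", "D")]

# Store each undirected edge in both directions.
neighbors = {}
for person, other in handshakes:
    neighbors.setdefault(person, set()).add(other)
    neighbors.setdefault(other, set()).add(person)

# In an undirected graph, u is a neighbor of v exactly when v is a neighbor of u.
symmetric = all(u in neighbors[v] for u in neighbors for v in neighbors[u])
```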

The intuition behind the **local clustering coefficient** is that it describes how densely connected the neighbourhood of a node is: the number of connections that actually exist among its neighbours, divided by the number of all possible connections among them.
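
Written as a formula, with symbols that are assumptions of this sketch rather than notation from the assignment ($k_v$ for the number of neighbours of node $v$, and $e_v$ for the number of edges that exist among those neighbours):

$$LCC_v = \frac{e_v}{\binom{k_v}{2}} = \frac{2\,e_v}{k_v\,(k_v - 1)}$$

The ratio is undefined when $k_v < 2$, because there are no pairs of neighbours to connect. One common convention, assumed here and not stated in the assignment, is to report 0 in that case.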

Imagine a person, A, that has three friends: B, C, and D. These friends are person A’s neighbourhood. They all have in common that they are friends with A. However, they might not be friends with each other. The local clustering coefficient expresses what fraction of the possible pairs among A’s friends are in fact also friends with each other. 

Different scenarios for the local clustering coefficient of A:
- $LCC_A = \frac{0}{3}$ -- no one is friends in the neighbourhood: no nodes are connected
- $LCC_A = \frac{1}{3}$ -- only B and C are friends (or only C and D, or only D and B)
- $LCC_A = \frac{2}{3}$ -- we have two pairs of friends in the neighbourhood
- $LCC_A = \frac{3}{3}$ -- everybody is friends in the neighbourhood: all nodes are connected
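
The four scenarios above can be reproduced with a minimal sketch. The helper names `local_clustering` and `make_friends` are hypothetical, and returning 0 for a node with fewer than two neighbours is an assumed convention.

```python
from itertools import combinations

def local_clustering(adjacency, node):
    """Fraction of neighbour pairs of `node` that are themselves connected."""
    neighbors = sorted(adjacency[node])
    possible = len(neighbors) * (len(neighbors) - 1) // 2
    if possible == 0:
        return 0.0  # assumed convention: no pairs of neighbours to connect
    realised = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return realised / possible

def make_friends(extra_pairs):
    """A is friends with B, C and D; extra_pairs adds friendships among them."""
    adjacency = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}
    for u, v in extra_pairs:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency

scenarios = [
    [],                                    # nobody else is friends
    [("B", "C")],                          # one pair
    [("B", "C"), ("C", "D")],              # two pairs
    [("B", "C"), ("C", "D"), ("D", "B")],  # everybody is friends
]
coefficients = [local_clustering(make_friends(pairs), "A") for pairs in scenarios]
```

Here `coefficients` holds one value per scenario, in the same order as the list above.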


<img src="img/clustering_coeff.png" align="center">

___

## Assignment
Your task in the following exercises is to compute the local clustering coefficient from various representations of the same undirected graph, `tiny`, consisting of 5 nodes and 7 edges.


In [2]:
import numpy as np

### Exercise 1
As we know, one way of representing a graph is with an edge list. 
This representation can be found in the file `tiny_edgelist.txt`. The file contains one edge per line, given as a pair of two integers separated by whitespace. Inspect the file yourself to see the exact formatting of the edge pairs. 
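
As a rough illustration of this format, the sketch below parses an edge list from an in-memory string with `np.loadtxt`. The sample text is hypothetical and is not the contents of `tiny_edgelist.txt`; `ndmin=2` is used so that a file with a single edge would still produce a two-dimensional array.

```python
import io
import numpy as np

# Hypothetical edge list: one edge per line, two whitespace-separated integers.
sample_text = "0 1\n0 2\n1 2\n2 3\n"

# np.loadtxt accepts file-like objects as well as file names.
edges = np.loadtxt(io.StringIO(sample_text), dtype=int, ndmin=2)

num_edges = edges.shape[0]  # one row per edge
nodes = np.unique(edges)    # every node id that appears in some edge
```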

Write a function called `coefficient_from_edgelist(edgefile, node_id)` that takes an edge list file in this format and a node, and returns the local clustering coefficient of that node, rounded to 3 decimals.
___
`coefficient_from_edgelist('data/tiny_edgelist.txt', 2)`  
\>\> `0.667`

In [3]:
# find local cluster coefficient of a node from edge list file

def edge_reader(G, node_id):
    # indices of the rows (edges) of the edge list that contain node_id
    rows_with_node = np.where(G == node_id)[0]
    
    # collect the unique neighbors of node_id (self-loops and repeated edges are skipped)
    neighbor_nodes = []
    for i in rows_with_node:
        for j in range(2):
            if G[i][j] != node_id and G[i][j] not in neighbor_nodes:
                neighbor_nodes.append(G[i][j])
    
    # count the edges between neighbors; an undirected edge may be stored as (a, b) or (b, a)
    num_edges = 0
    for i in range(len(neighbor_nodes)):
        for j in range(i+1, len(neighbor_nodes)):
            a, b = neighbor_nodes[i], neighbor_nodes[j]
            if np.any(np.all(G == [a, b], axis=1)) or np.any(np.all(G == [b, a], axis=1)):
                num_edges += 1
    
    # the number of possible edges between the neighbors is (number of neighbors) choose 2
    total_edges = len(neighbor_nodes) * (len(neighbor_nodes) - 1) / 2
    
    # with fewer than two neighbors there are no possible edges; return 0.0 to avoid dividing by zero
    if total_edges == 0:
        return 0.0
    
    # the local clustering coefficient is the fraction of possible edges that actually exist
    return round(num_edges / total_edges, 3)

def coefficient_from_edgelist(edgefile, node_id):
    # read edge list file into a 2D numpy array (ndmin=2 keeps a single-edge file two-dimensional)
    G = np.loadtxt(edgefile, dtype=int, ndmin=2)
    
    return edge_reader(G, node_id)
    
coefficient_from_edgelist('data/tiny_edgelist.txt', 2)


0.667
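
An edge list for an undirected graph does not have to store each pair in a fixed order, so a lookup that depends on the order can miss edges. The sketch below is one possible order-insensitive variant, not the assignment's reference solution: it stores each edge as a `frozenset`. The function name `lcc_from_edges` and the sample edges are hypothetical, and returning 0.0 for nodes with fewer than two neighbors is an assumed convention.

```python
import numpy as np

def lcc_from_edges(edges, node_id):
    edges = np.asarray(edges, dtype=int).reshape(-1, 2)
    # Each undirected edge becomes a frozenset, so (a, b) and (b, a) are identical.
    # Self-loops are dropped.
    edge_set = {frozenset((int(u), int(v))) for u, v in edges if u != v}
    # Neighbors are the other endpoints of the edges that contain node_id.
    neighbors = sorted({n for e in edge_set if node_id in e for n in e if n != node_id})
    possible = len(neighbors) * (len(neighbors) - 1) // 2
    if possible == 0:
        return 0.0  # assumed convention for fewer than two neighbors
    realised = sum(
        1
        for i, a in enumerate(neighbors)
        for b in neighbors[i + 1:]
        if frozenset((a, b)) in edge_set
    )
    return round(realised / possible, 3)

# Hypothetical edges with mixed orientation (not the contents of tiny_edgelist.txt).
sample_edges = [[1, 0], [0, 2], [2, 1], [3, 2]]
result_node_2 = lcc_from_edges(sample_edges, 2)  # neighbors 0, 1, 3; only 0-1 is connected
```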

### Exercise 2
Another common way to represent a graph is with an adjacency matrix. 
This representation can be found in the file `tiny_adjmatrix.txt`. Inspect the file yourself to see the formatting of the adjacency matrix. 
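
For orientation, here is a minimal sketch of how the coefficient could be read directly off an adjacency matrix without converting it to an edge list first. The matrix below is hypothetical and is not the contents of `tiny_adjmatrix.txt`. The function name `lcc_from_matrix` is illustrative, and the sketch assumes a symmetric matrix in which any nonzero entry marks an edge.

```python
import numpy as np

def lcc_from_matrix(A, node_id):
    A = np.asarray(A)
    # Neighbors are the columns with a nonzero entry in the node's row (self-loop ignored).
    neighbors = np.flatnonzero(A[node_id])
    neighbors = neighbors[neighbors != node_id]
    degree = len(neighbors)
    if degree < 2:
        return 0.0  # assumed convention: no pairs of neighbors to connect
    # Submatrix restricted to the neighbors; the strict upper triangle counts each edge once.
    sub = A[np.ix_(neighbors, neighbors)]
    realised = np.count_nonzero(np.triu(sub, k=1))
    return round(realised / (degree * (degree - 1) / 2), 3)

# Hypothetical 4-node graph with edges 0-1, 0-2, 1-2, 2-3.
sample_matrix = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
result_node_2 = lcc_from_matrix(sample_matrix, 2)  # neighbors 0, 1, 3; only 0-1 is connected
```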

Write a function called `coefficient_from_adjmatrix(matrixfile, node_id)` that takes an adjacency matrix file in this format and a node, and returns the local clustering coefficient of that node, rounded to 3 decimals.
___
`coefficient_from_adjmatrix('data/tiny_adjmatrix.txt', 0)`  
\>\> `1.0`

In [4]:
def coefficient_from_adjmatrix(matrixfile, node_id):
    # read adjacency matrix file into numpy array
    G = np.loadtxt(matrixfile, dtype=int)
    
    # convert adjacency matrix to edge list
    G = np.argwhere(G == 1)
    
    # sort all numbers in each row
    G = np.sort(G, axis=1)
    
    # remove all duplicate rows in G
    G = np.unique(G, axis=0)

    return edge_reader(G, node_id)

coefficient_from_adjmatrix("data/tiny_adjmatrix.txt", 0)

1.0