# BT3017 Tutorial 5

- An online copy<sup>+</sup> of this tutorial is available on GitHub [here](https://github.com/KohSiXing/Feature-Engineering-for-Machine-Learning/blob/master/BT3017%20Tutorial%205.ipynb)

<sup>+</sup> The online copy will only be published after Wednesday 1000 of that week to prevent plagiarism.

### Preprocessing

- The graph is built in this format:
    - Node A is mapped to index 0, Node B to index 1, and so on up to Node L at index 11
    - a<sub>ij</sub> is 0 when *$i = j$* (no self-loops)

In [1]:
import numpy as np
import pandas as pd
from functools import reduce

# Dictionary Data for easy conversion
nodesChart = {'A':0, 'B':1, 'C':2, 'D':3, 'E':4, 'F':5, 'G':6, 'H':7, 'I':8, 'J':9, 'K':10, 'L':11}
nodesChart

{'A': 0,
 'B': 1,
 'C': 2,
 'D': 3,
 'E': 4,
 'F': 5,
 'G': 6,
 'H': 7,
 'I': 8,
 'J': 9,
 'K': 10,
 'L': 11}

### 1a

- Build the adjacency matrix (12 by 12)
- The graph is unweighted and undirected: a<sub>ij</sub> is set to 1 if nodes i and j are connected and 0 otherwise (where *$i \ne j$*)
- Because the graph is undirected, a<sub>ij</sub> = a<sub>ji</sub>, so the matrix is symmetric

In [2]:
# Construct the adjacency matrix
A1 = np.zeros((12,12))

connectedPairs = (('A','I'), ('A','K'), ('B', 'C'), ('B', 'E'), ('B', 'G'), ('C', 'D'), ('D', 'E'), ('F', 'G'), 
                  ('F','J'), ('G','H'), ('I','K'), ('J','K'), ('J','L'))

for i in connectedPairs:
    A1[nodesChart[i[0]]][nodesChart[i[1]]] = 1
    A1[nodesChart[i[1]]][nodesChart[i[0]]] = 1
    
A1

array([[0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0.],
       [0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 1., 0., 0.],
       [0., 1., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.]])
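As a quick sanity check on the construction above, the same matrix can be built with NumPy index arrays and tested for symmetry and a zero diagonal. This is a minimal sketch rather than part of the original notebook; the names `edges`, `rows`, `cols`, and `A_check` are illustrative.

```python
import numpy as np

# Same node-to-index mapping and edge list as in the tutorial
nodesChart = {name: idx for idx, name in enumerate("ABCDEFGHIJKL")}
edges = [('A','I'), ('A','K'), ('B','C'), ('B','E'), ('B','G'), ('C','D'),
         ('D','E'), ('F','G'), ('F','J'), ('G','H'), ('I','K'), ('J','K'), ('J','L')]

rows = np.array([nodesChart[a] for a, _ in edges])
cols = np.array([nodesChart[b] for _, b in edges])

A_check = np.zeros((12, 12))
A_check[rows, cols] = 1   # set a_ij for every edge
A_check[cols, rows] = 1   # mirror to a_ji because the graph is undirected

print(np.array_equal(A_check, A_check.T))   # is the matrix symmetric?
print(np.all(np.diag(A_check) == 0))        # is the diagonal all zero?
print(int(A_check.sum()) // 2)              # number of undirected edges
```

Each undirected edge contributes two entries, so the sum of the matrix is twice the number of edges.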

- Compute the degree matrix, where the degree of each node is stored on the leading diagonal

In [3]:
D1 = np.zeros((12,12))

for i in range(len(D1)):
    deg = reduce(lambda x,y : x + y, A1[i])
    D1[i][i] = deg
D1

array([[2., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 3., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 2., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 2., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 2., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 2., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 3., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 2., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 3., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 3., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])

- Compute *Graph Laplacian*

$L1 = D1 - A1$

In [4]:
L1 = D1 - A1
L1

array([[ 2.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1.,  0., -1.,  0.],
       [ 0.,  3., -1.,  0., -1.,  0., -1.,  0.,  0.,  0.,  0.,  0.],
       [ 0., -1.,  2., -1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0., -1.,  2., -1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0., -1.,  0., -1.,  2.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  2., -1.,  0.,  0., -1.,  0.,  0.],
       [ 0., -1.,  0.,  0.,  0., -1.,  3., -1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0., -1.,  1.,  0.,  0.,  0.,  0.],
       [-1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  2.,  0., -1.,  0.],
       [ 0.,  0.,  0.,  0.,  0., -1.,  0.,  0.,  0.,  3., -1., -1.],
       [-1.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1., -1.,  3.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1.,  0.,  1.]])
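The degree matrix and the Laplacian can also be computed without explicit loops. The sketch below is an alternative under stated assumptions, not the tutorial's own code: `graph_laplacian` is a hypothetical helper name, and it is shown on a small 3-node path graph so that it runs on its own. Each row of a graph Laplacian sums to zero, which is why the all-ones vector is always an eigenvector with eigenvalue 0.

```python
import numpy as np

def graph_laplacian(A):
    """Return (D, L) for an adjacency matrix A, with L = D - A."""
    D = np.diag(A.sum(axis=1))   # node degrees on the leading diagonal
    return D, D - A

# Path graph 0 - 1 - 2, a small stand-in for A1
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
D, L = graph_laplacian(A)

print(L)
print(L.sum(axis=1))      # row sums of the Laplacian
print(L @ np.ones(3))     # Laplacian applied to the all-ones vector
```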

- Get the eigenvalues and eigenvectors, store them in *w* and *v* respectively

In [5]:
w,v = np.linalg.eig(L1)
w

array([4.87563651e+00, 4.56155281e+00, 3.68563892e+00, 3.82999437e-18,
       1.11040072e-01, 4.38447187e-01, 6.12480357e-01, 1.43605649e+00,
       2.27914764e+00, 3.00000000e+00, 3.00000000e+00, 2.00000000e+00])

- Sort the eigenvalues and eigenvectors

In [6]:
idx = np.argsort(w)
w = w[idx]
v = v[:,idx]
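Because the Laplacian of an undirected graph is symmetric, `np.linalg.eigh` is an alternative worth knowing: it is intended for symmetric (Hermitian) matrices, returns real eigenvalues in ascending order, and returns orthonormal eigenvectors, so the explicit `argsort` step is not needed. A minimal sketch on a small stand-in Laplacian follows; the matrix `L_demo` (two disconnected edges) is illustrative and not from the tutorial.

```python
import numpy as np

# Laplacian of a 4-node graph with edges 0-1 and 2-3 (two components)
L_demo = np.array([[ 1., -1.,  0.,  0.],
                   [-1.,  1.,  0.,  0.],
                   [ 0.,  0.,  1., -1.],
                   [ 0.,  0., -1.,  1.]])

# eigh returns eigenvalues already sorted in ascending order
w_demo, v_demo = np.linalg.eigh(L_demo)

print(np.round(w_demo, 5))
# Check that each column v_demo[:, k] satisfies L v = w v
print(np.allclose(L_demo @ v_demo, v_demo * w_demo))
```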

In [7]:
lmbd1 = np.zeros((12,12))

for i in range(len(lmbd1)):
    lmbd1[i][i] = w[i]
    
pd.DataFrame(lmbd1).round(decimals = 5)

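|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.11104 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.43845 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.61248 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 1.43606 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.27915 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 |
| 9 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.68564 | 0.0 | 0.0 |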


### 1b

- Since the whole graph is connected, there is only one cluster. The number of zero eigenvalues of the Laplacian equals the number of connected components, and here only the first diagonal entry (column 0) is 0. In the matching first column of the eigenvector matrix every node has the same value, which confirms that there is only 1 cluster:
    - cluster 1: *[0, 1, ..., 11]* corresponds to all nodes `[A, B, ..., L]`
- By inspecting the graph, we can see that there is only **1 connected component**, i.e. all nodes are connected. The program's output agrees with this.

In [8]:
df1 = pd.DataFrame(v).round(decimals = 5)
df1

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.28868 | -0.38265 | -0.30057 | 0.24358 | -0.07023 | -0.0 | 0.06819 | -0.75277 | 0.00053 | 0.1234 | -0.15825 | -0.06365 |
| 1 | -0.28868 | 0.2824 | -0.06589 | 0.00509 | 0.09134 | -0.0 | 0.60058 | 0.04428 | 0.12124 | -0.16542 | -0.36093 | 0.53778 |
| 2 | -0.28868 | 0.34016 | -0.23468 | -0.09439 | -0.03063 | -0.70711 | 0.08722 | 0.04428 | 0.12124 | 0.33141 | 0.20268 | -0.24667 |
| 3 | -0.28868 | 0.36016 | -0.30057 | -0.13606 | -0.10861 | 0.0 | -0.62493 | -0.08856 | -0.24247 | -0.39322 | -0.15825 | 0.17156 |
| 4 | -0.28868 | 0.34016 | -0.23468 | -0.09439 | -0.03063 | 0.70711 | 0.08722 | 0.04428 | 0.12124 | 0.33141 | 0.20268 | -0.24667 |
| 5 | -0.28868 | -0.04336 | 0.30057 | -0.04387 | 0.69595 | -0.0 | -0.21218 | -0.08856 | -0.24247 | 0.33754 | 0.15825 | 0.29583 |
| 6 | -0.28868 | 0.1355 | 0.30057 | 0.20093 | 0.20411 | -0.0 | 0.25848 | -0.08856 | -0.24247 | -0.5494 | 0.15825 | -0.51533 |
| 7 | -0.28868 | 0.15243 | 0.53525 | 0.5185 | -0.46808 | 0.0 | -0.20208 | 0.04428 | 0.12124 | 0.20457 | -0.04443 | 0.13297 |
| 8 | -0.28868 | -0.38265 | -0.30057 | 0.24358 | -0.07023 | -0.0 | 0.06819 | 0.57565 | -0.48548 | 0.1234 | -0.15825 | -0.06365 |
| 9 | -0.28868 | -0.21741 | 0.16879 | -0.2618 | 0.18837 | 0.0 | -0.19925 | 0.17712 | 0.48495 | -0.01957 | -0.56362 | -0.33537 |


In [9]:
print("-- Values from Column 0 ---")
col_values = df1[0].values.ravel()
unique_values =  pd.unique(col_values)

for j in range(len(unique_values)):
    print("Cluster ", (j + 1), " : ", df1.index[df1[0].values == unique_values[j]].tolist())

-- Values from Column 0 ---
Cluster  1  :  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
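The number of connected components equals the multiplicity of the eigenvalue 0, so it can be read off numerically by counting eigenvalues that are zero up to floating-point error. A hedged sketch follows; the tolerance `1e-8` and the helper name `count_components` are assumptions, and the Laplacian is rebuilt from the edge list so that the block runs on its own.

```python
import numpy as np

def count_components(L, tol=1e-8):
    """Count eigenvalues of a symmetric Laplacian that are numerically zero."""
    w = np.linalg.eigvalsh(L)          # real eigenvalues, ascending
    return int(np.sum(np.abs(w) < tol))

idx = {n: i for i, n in enumerate("ABCDEFGHIJKL")}
# Edge list of part 1
edges = [('A','I'), ('A','K'), ('B','C'), ('B','E'), ('B','G'), ('C','D'),
         ('D','E'), ('F','G'), ('F','J'), ('G','H'), ('I','K'), ('J','K'), ('J','L')]

A = np.zeros((12, 12))
for a, b in edges:
    A[idx[a], idx[b]] = A[idx[b], idx[a]] = 1
L = np.diag(A.sum(axis=1)) - A

print(count_components(L))   # number of connected components
```

A tolerance is needed because the computed zero eigenvalue is typically a tiny nonzero number such as `3.8e-18` rather than an exact 0.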


### 2a

- Build Adjacency Matrix (12 by 12)
- The graph is the same as in `part 1`, but the connection **between F and G** is removed

In [10]:
# Copy adjacency matrix A1 into A2 (.copy() so that A1 itself is not modified)
A2 = A1.copy()

# Disconnect nodes F and G
A2[nodesChart['F']][nodesChart['G']] = 0
A2[nodesChart['G']][nodesChart['F']] = 0
    
A2

array([[0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0.],
       [0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.]])

- Compute the degree matrix, where the degree of each node is stored on the leading diagonal

In [11]:
D2 = np.zeros((12,12))

for i in range(len(D2)):
    deg = reduce(lambda x,y : x + y, A2[i])
    D2[i][i] = deg
D2

array([[2., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 3., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 2., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 2., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 2., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 2., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 2., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 3., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 3., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])

- Compute *Graph Laplacian*

In [12]:
L2 = D2 - A2
L2

array([[ 2.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1.,  0., -1.,  0.],
       [ 0.,  3., -1.,  0., -1.,  0., -1.,  0.,  0.,  0.,  0.,  0.],
       [ 0., -1.,  2., -1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0., -1.,  2., -1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0., -1.,  0., -1.,  2.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0., -1.,  0.,  0.],
       [ 0., -1.,  0.,  0.,  0.,  0.,  2., -1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0., -1.,  1.,  0.,  0.,  0.,  0.],
       [-1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  2.,  0., -1.,  0.],
       [ 0.,  0.,  0.,  0.,  0., -1.,  0.,  0.,  0.,  3., -1., -1.],
       [-1.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1., -1.,  3.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1.,  0.,  1.]])

- Get the eigenvalues and eigenvectors, store them in *w* and *v* respectively

In [13]:
w,v = np.linalg.eig(L2)
w

array([-5.55111512e-17,  4.38447187e-01,  3.00000000e+00,  4.56155281e+00,
        4.56155281e+00,  2.00000000e+00, -5.73555674e-16,  4.38447187e-01,
        3.00000000e+00,  3.00000000e+00,  1.00000000e+00,  2.00000000e+00])

- Sort the eigenvalues and eigenvectors

In [14]:
idx = np.argsort(w)
w = w[idx]
v = v[:,idx]

In [15]:
lmbd2 = np.zeros((12,12))

for i in range(len(lmbd2)):
    lmbd2[i][i] = w[i]
    
pd.DataFrame(lmbd2).round(decimals = 5)

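|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | -0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.43845 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.43845 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 |
| 9 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 |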


### 2b

- The eigenvalues along the diagonal are 0 in the first and second columns (i.e. columns 0 and 1), so we read the matching columns of the eigenvector matrix and conclude that there are 2 clusters:
    - cluster 1: *[0, 5, 8, 9, 10, 11]* corresponds to nodes `[A, F, I, J, K, L]`
    - cluster 2: *[1, 2, 3, 4, 6, 7]* corresponds to nodes `[B, C, D, E, G, H]`
- The result makes sense. By inspecting the graph, we can see that there are **2 connected components**, and the clusters generated by the program match the nodes of each connected component.

In [16]:
df2 = pd.DataFrame(v).round(decimals = 5)
df2

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.10876 | -0.40825 | -0.09537 | 0.46471 | 0.0 | -0.0 | -0.0 | 0.76376 | -0.01568 | -0.02464 | -0.18452 | -0.01002 |
| 1 | -0.39349 | 0.0 | 0.08456 | 0.0 | -0.0 | 0.17109 | -0.37336 | -0.0 | 0.21471 | 0.18359 | 0.0 | 0.70077 |
| 2 | -0.39349 | 0.0 | 0.30116 | 0.0 | -0.0 | -0.66442 | -0.47033 | 0.0 | 0.21471 | 0.18359 | -0.0 | -0.39352 |
| 3 | -0.39349 | 0.0 | 0.38571 | 0.0 | 0.0 | -0.17109 | 0.37336 | -0.0 | -0.42942 | -0.36717 | -0.0 | 0.30725 |
| 4 | -0.39349 | 0.0 | 0.30116 | 0.0 | -0.0 | 0.66442 | 0.47033 | 0.0 | 0.21471 | 0.18359 | -0.0 | -0.39352 |
| 5 | -0.10876 | -0.40825 | 0.09537 | -0.46471 | 0.70711 | 0.0 | 0.0 | 0.10911 | -0.1808 | 0.20255 | 0.18452 | 0.01002 |
| 6 | -0.39349 | 0.0 | -0.38571 | 0.0 | 0.0 | 0.17109 | -0.37336 | 0.0 | -0.42942 | -0.36717 | -0.0 | -0.30725 |
| 7 | -0.39349 | 0.0 | -0.68687 | 0.0 | -0.0 | -0.17109 | 0.37336 | -0.0 | 0.21471 | 0.18359 | -0.0 | 0.08627 |
| 8 | -0.10876 | -0.40825 | -0.09537 | 0.46471 | 0.0 | 0.0 | 0.0 | -0.54554 | -0.34592 | 0.42974 | -0.18452 | -0.01002 |
| 9 | -0.10876 | -0.40825 | 0.05356 | -0.26096 | -0.0 | -0.0 | -0.0 | -0.21822 | 0.3616 | -0.4051 | -0.65719 | -0.03569 |


In [17]:
for i in range(2):
    print("--- Values from Column ", i, " ---")
    col_values = df2[i].values.ravel()
    unique_values =  pd.unique(col_values)
    
    for j in range(len(unique_values)):
        print("Cluster ", (j + 1), " : ", df2.index[df2[i].values == unique_values[j]].tolist())

--- Values from Column  0  ---
Cluster  1  :  [0, 5, 8, 9, 10, 11]
Cluster  2  :  [1, 2, 3, 4, 6, 7]
--- Values from Column  1  ---
Cluster  1  :  [0, 5, 8, 9, 10, 11]
Cluster  2  :  [1, 2, 3, 4, 6, 7]
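The spectral result can be cross-checked with a plain graph traversal, which finds connected components directly from the edge list. This is a sketch for verification only; `connected_components` here is a hypothetical helper written for this example, not a library call.

```python
from collections import deque

def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as sorted lists."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component
        queue, component = deque([start]), []
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in sorted(neighbours[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(component))
    return components

nodes = list("ABCDEFGHIJKL")
# Edge list of part 2: the original graph without the F-G connection
edges = [('A','I'), ('A','K'), ('B','C'), ('B','E'), ('B','G'), ('C','D'),
         ('D','E'), ('F','J'), ('G','H'), ('I','K'), ('J','K'), ('J','L')]

for comp in connected_components(nodes, edges):
    print(comp)
```

The traversal should report the same two groups of nodes as the eigenvector columns.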


### 3a

- Build Adjacency Matrix (12 by 12)
- The graph additionally has the connection **between J and K** removed, on top of the F and G connection removed in `part 2`

In [18]:
# Copy adjacency matrix A2 into A3 (.copy() so that A2 itself is not modified)
A3 = A2.copy()

# Disconnect nodes J and K
A3[nodesChart['J']][nodesChart['K']] = 0
A3[nodesChart['K']][nodesChart['J']] = 0
    
A3

array([[0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0.],
       [0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.]])

- Compute the degree matrix, where the degree of each node is stored on the leading diagonal

In [19]:
D3 = np.zeros((12,12))

for i in range(len(D3)):
    deg = reduce(lambda x,y : x + y, A3[i])
    D3[i][i] = deg
D3

array([[2., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 3., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 2., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 2., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 2., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 2., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 2., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 2., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 2., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])

- Compute *Graph Laplacian*

In [20]:
L3 = D3 - A3
L3

array([[ 2.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1.,  0., -1.,  0.],
       [ 0.,  3., -1.,  0., -1.,  0., -1.,  0.,  0.,  0.,  0.,  0.],
       [ 0., -1.,  2., -1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0., -1.,  2., -1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0., -1.,  0., -1.,  2.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0., -1.,  0.,  0.],
       [ 0., -1.,  0.,  0.,  0.,  0.,  2., -1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0., -1.,  1.,  0.,  0.,  0.,  0.],
       [-1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  2.,  0., -1.,  0.],
       [ 0.,  0.,  0.,  0.,  0., -1.,  0.,  0.,  0.,  2.,  0., -1.],
       [-1.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1.,  0.,  2.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1.,  0.,  1.]])

- Get the eigenvalues and eigenvectors, store them in *w* and *v* respectively

In [21]:
w,v = np.linalg.eig(L3)
w

array([ 3.00000000e+00, -2.22044605e-16,  4.56155281e+00,  3.00000000e+00,
        2.00000000e+00,  4.38447187e-01,  8.80490372e-17, -3.11043117e-16,
        1.00000000e+00,  2.00000000e+00,  3.00000000e+00,  3.00000000e+00])

- Sort the eigenvalues and eigenvectors

In [22]:
idx = np.argsort(w)
w = w[idx]
v = v[:,idx]

In [23]:
lmbd3 = np.zeros((12,12))

for i in range(len(lmbd3)):
    lmbd3[i][i] = w[i]
    
pd.DataFrame(lmbd3).round(decimals = 5)

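|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | -0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.43845 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 |
| 9 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 |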


### 3b

- The eigenvalues along the diagonal are 0 in the first to third columns (i.e. columns 0, 1 and 2), so we read the matching columns of the eigenvector matrix and conclude that there are 3 clusters:
    - cluster 1: *[0, 8, 10]* corresponds to nodes `[A, I, K]`
    - cluster 2: *[1, 2, 3, 4, 6, 7]* corresponds to nodes `[B, C, D, E, G, H]`
    - cluster 3: *[5, 9, 11]* corresponds to nodes `[F, J, L]`
- A single column need not separate every cluster: column 1 takes only two distinct values and so merges clusters 2 and 3, whereas columns 0 and 2 each separate all three.
- By inspecting the graph, we can see that there are **3 connected components**, and the clusters generated by the program match the nodes of each connected component.

In [24]:
df3 = pd.DataFrame(v).round(decimals = 5)
df3

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.51639 | 0.57735 | -0.01323 | 0.0 | -0.0 | -0.0 | -0.0 | -0.00285 | 0.07271 | 0.8165 | 0.14034 | 0.0 |
| 1 | 0.18186 | 0.0 | 0.01623 | -0.0864 | -0.0 | 0.07285 | 0.5 | -0.28867 | 0.01126 | 0.0 | -0.00183 | -0.70181 |
| 2 | 0.18186 | 0.0 | 0.01623 | -0.30771 | -0.0 | -0.69956 | 0.0 | -0.28867 | 0.01126 | 0.0 | -0.00183 | 0.3941 |
| 3 | 0.18186 | 0.0 | 0.01623 | -0.3941 | -0.0 | -0.07285 | -0.5 | 0.57735 | -0.02251 | 0.0 | 0.00365 | -0.30771 |
| 4 | 0.18186 | 0.0 | 0.01623 | -0.30771 | -0.0 | 0.69956 | 0.0 | -0.28867 | 0.01126 | 0.0 | -0.00183 | 0.3941 |
| 5 | 0.02302 | 0.0 | 0.57674 | -0.0 | 0.70711 | 0.0 | 0.0 | 0.0 | -0.1989 | 0.0 | 0.3084 | -0.0 |
| 6 | 0.18186 | 0.0 | 0.01623 | 0.3941 | 0.0 | 0.07285 | 0.5 | 0.57735 | -0.02251 | 0.0 | 0.00365 | 0.30771 |
| 7 | 0.18186 | 0.0 | 0.01623 | 0.70181 | 0.0 | -0.07285 | -0.5 | -0.28867 | 0.01126 | 0.0 | -0.00183 | -0.0864 |
| 8 | 0.51639 | 0.57735 | -0.01323 | 0.0 | -0.0 | -0.0 | -0.0 | 0.00142 | 0.57731 | -0.40825 | 0.37692 | -0.0 |
| 9 | 0.02302 | 0.0 | 0.57674 | -0.0 | 0.0 | -0.0 | 0.0 | 0.0 | 0.3978 | 0.0 | -0.61679 | -0.0 |


In [25]:
for i in range(3):
    print("--- Values from Column ", i, " ---")
    col_values = df3[i].values.ravel()
    unique_values =  pd.unique(col_values)
    
    for j in range(len(unique_values)):
        print("Cluster ", (j + 1), " : ", df3.index[df3[i].values == unique_values[j]].tolist())

--- Values from Column  0  ---
Cluster  1  :  [0, 8, 10]
Cluster  2  :  [1, 2, 3, 4, 6, 7]
Cluster  3  :  [5, 9, 11]
--- Values from Column  1  ---
Cluster  1  :  [0, 8, 10]
Cluster  2  :  [1, 2, 3, 4, 5, 6, 7, 9, 11]
--- Values from Column  2  ---
Cluster  1  :  [0, 8, 10]
Cluster  2  :  [1, 2, 3, 4, 6, 7]
Cluster  3  :  [5, 9, 11]
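Reading a single eigenvector column can merge components, as column 1 does above. A more robust reading, sketched here under stated assumptions, groups nodes by their whole row across all zero-eigenvalue eigenvectors: nodes in the same component share an identical row, while nodes in different components do not. The tolerance `1e-8`, the rounding to 5 decimals, and the helper name `spectral_components` are illustrative choices, and `np.linalg.eigh` is used here in place of `np.linalg.eig`.

```python
import numpy as np

def spectral_components(A, tol=1e-8, decimals=5):
    """Group node indices by their rows in the Laplacian's null-space eigenvectors."""
    L = np.diag(A.sum(axis=1)) - A
    w, v = np.linalg.eigh(L)               # ascending eigenvalues, orthonormal vectors
    k = int(np.sum(np.abs(w) < tol))       # multiplicity of eigenvalue 0
    rows = np.round(v[:, :k], decimals)    # one k-dimensional row per node

    groups = {}
    for node, row in enumerate(rows):
        groups.setdefault(tuple(row), []).append(node)
    return sorted(groups.values())

idx = {n: i for i, n in enumerate("ABCDEFGHIJKL")}
# Edge list of part 3: the original graph without F-G and J-K
edges = [('A','I'), ('A','K'), ('B','C'), ('B','E'), ('B','G'), ('C','D'),
         ('D','E'), ('F','J'), ('G','H'), ('I','K'), ('J','L')]

A = np.zeros((12, 12))
for a, b in edges:
    A[idx[a], idx[b]] = A[idx[b], idx[a]] = 1

for comp in spectral_components(A):
    print(comp)
```

Grouping on the full row works for any basis of the null space, so it does not depend on which particular eigenvectors the solver happens to return for the repeated eigenvalue 0.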
