# 02-04 - Spectral Clustering

*April 24 2024*

In the final practice session, we demonstrate how the Laplacian matrix, algebraic connectivity, and the Fiedler vector can be used to detect communities in networks.

In [2]:
import pathpyG as pp
import numpy as np
from tqdm.notebook import tqdm

import seaborn as sns
import matplotlib.pyplot as plt

plt.style.use('default')
sns.set_style("whitegrid")
    
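from numpy import linalg as npl
import scipy as sp
# import the SciPy submodules used below explicitly (older SciPy versions do not load them lazily)
import scipy.sparse
import scipy.sparse.linalg
import scipy.linalg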

In the previous unit, we showed how the Fiedler vector can be used to find a small edge cut that bisects a network into two partitions that naturally map to communities in the graph. The sign of each node's entry in the Fiedler vector determines its partition, which we can visualize using node colors:

In [4]:
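def laplacian(network: pp.Graph):
    """Return the combinatorial Laplacian L = D - A as a sparse matrix."""
    A = network.get_sparse_adj_matrix()
    D = sp.sparse.diags([d for d in network.degrees().values()])
    L = D - A
    return L

def algconn(network: pp.Graph):
    """Return the algebraic connectivity, i.e. the second-smallest eigenvalue of L."""
    L = laplacian(network).astype(float)
    # L is symmetric, so we can use eigsh; which="SM" requests the two smallest-magnitude eigenvalues
    w = sp.sparse.linalg.eigsh(L, k=2, which="SM", return_eigenvectors=False)
    return np.sort(w)[1]


def fiedler(network: pp.Graph):
    """Return the Fiedler vector, i.e. the eigenvector of the second-smallest eigenvalue of L."""
    L = laplacian(network).toarray()
    # eigh returns the eigenvalues of a symmetric matrix in ascending order,
    # with the corresponding eigenvectors as columns
    w, v = np.linalg.eigh(L)
    return v[:, 1]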
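
As a cross-check that does not depend on `pathpyG`'s matrix conventions, the same quantities can be computed with dense NumPy arrays for the small example graph used in the next cell. This is a minimal sketch, assuming an undirected, unweighted graph given as an edge list; the helpers `laplacian_dense` and `fiedler_dense` are hypothetical and not part of pathpy. Because $L$ is symmetric, `numpy.linalg.eigh` returns real eigenvalues in ascending order, so the algebraic connectivity is the second eigenvalue and the Fiedler vector is the second eigenvector column.

```python
import numpy as np

def laplacian_dense(nodes, edges):
    """Hypothetical helper: combinatorial Laplacian L = D - A of an undirected, unweighted graph."""
    idx = {v: i for i, v in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))
    for v, w in edges:
        # undirected graph: fill both entries of the symmetric adjacency matrix
        A[idx[v], idx[w]] = A[idx[w], idx[v]] = 1.0
    return np.diag(A.sum(axis=1)) - A

def fiedler_dense(L):
    """Hypothetical helper: algebraic connectivity and Fiedler vector of a symmetric Laplacian."""
    # eigh sorts the eigenvalues in ascending order; eigenvectors are the columns of v
    w, v = np.linalg.eigh(L)
    return w[1], v[:, 1]

nodes = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
edges = [('a', 'b'), ('b', 'c'), ('c', 'a'), ('d', 'e'), ('e', 'f'),
         ('f', 'g'), ('g', 'd'), ('d', 'f'), ('b', 'd')]
lam2_dense, f_dense = fiedler_dense(laplacian_dense(nodes, edges))
print('Algebraic connectivity =', lam2_dense)
# the overall sign of an eigenvector is arbitrary; only the sign pattern defines the partition
print({v: '+' if x > 0 else '-' for v, x in zip(nodes, f_dense)})
```

The eigenvalue should agree with the algebraic connectivity printed for this graph below, while the Fiedler vector may differ from the pathpy output by an overall sign, which swaps the two colors but not the partition.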

In [5]:
n = pp.Graph.from_edge_list([('a', 'b'), ('b', 'c'), ('c', 'a'), ('d', 'e'), ('e', 'f'), ('f', 'g'), ('g', 'd'), ('d', 'f'), ('b', 'd')]).to_undirected()
pp.plot(n, edge_color='white', node_label = [x for x in n.mapping.node_ids]);

In [10]:
print('Algebraic connectivity =', algconn(n))

f = fiedler(n)
print('Fiedler vector =', f)

color = lambda x: 'red' if x < 0 else 'green'
pp.plot(n, node_color={v: color(f[n.mapping.to_idx(v)]) for v in n.nodes }, node_label = [x for x in n.mapping.node_ids])

Algebraic connectivity = 0.3983208681168456
Fiedler vector = [-0.4928865  -0.29655952 -0.4928865   0.21422028  0.35603741  0.35603741
  0.35603741]



Let us now generalize this approach to a recursive spectral clustering algorithm that can detect any number of communities. We test it on a modular network that consists of four Erdős–Rényi random graphs with $n = 20$ nodes and link probability $p = 10/n = 0.5$ each. We use `pathpy`'s $+$ operator to create the union of the four networks. We then remove one randomly chosen link from each module and use the endpoints of the removed links to add four links that connect the modules in a ring, so that the result is a single connected network with strong cluster structure.

In [None]:
n = 20
p = 10/n

random_1 = pp.generators.random_graphs.ER_np(n, p, node_uids=[str(x) for x in range(n)])
random_2 = pp.generators.random_graphs.ER_np(n, p, node_uids=[str(x+1*n) for x in range(n)])
random_3 = pp.generators.random_graphs.ER_np(n, p, node_uids=[str(x+2*n) for x in range(n)])
random_4 = pp.generators.random_graphs.ER_np(n, p, node_uids=[str(x+3*n) for x in range(n)])

# Add four networks to a single network with 80 nodes
modular = random_1 + random_2 + random_3 + random_4

# Pick one random link from each module, remove it, and reconnect the endpoints of the removed links in a "ring"
e1 = np.random.choice([e.uid for e in random_1.edges])
e2 = np.random.choice([e.uid for e in random_2.edges])
e3 = np.random.choice([e.uid for e in random_3.edges])
e4 = np.random.choice([e.uid for e in random_4.edges])
e1 = random_1.edges[e1]
e2 = random_2.edges[e2]
e3 = random_3.edges[e3]
e4 = random_4.edges[e4]
modular.remove_edges(e1, e2, e3, e4)
modular.add_edge(e1.w, e2.v)
modular.add_edge(e2.w, e3.v)
modular.add_edge(e3.w, e4.v)
modular.add_edge(e4.w, e1.v)
modular.plot(edge_color='gray')
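
The construction above can be mirrored with a plain NumPy adjacency matrix, which may help to see exactly which links are rewired. This is a minimal sketch, not the pathpy generator: the hypothetical function `modular_adjacency` samples each module as an Erdős–Rényi graph $G(n, p)$ with the parameters used above ($n = 20$, $p = 0.5$), removes one random link $(v, w)$ per module, and then adds the links $(w_m, v_{m+1})$ that close the ring, mirroring the four `add_edge` calls.

```python
import numpy as np

def modular_adjacency(n_modules=4, n=20, p=0.5, seed=None):
    """Hypothetical sketch: block-diagonal G(n, p) modules, rewired into a ring."""
    rng = np.random.default_rng(seed)
    N = n_modules * n
    A = np.zeros((N, N), dtype=int)
    removed = []
    for m in range(n_modules):
        off = m * n
        # sample an undirected G(n, p): draw the upper triangle, then symmetrize
        upper = np.triu(rng.random((n, n)) < p, k=1)
        block = (upper | upper.T).astype(int)
        A[off:off + n, off:off + n] = block
        # pick one existing link (v, w) of this module and remove it
        vs, ws = np.nonzero(np.triu(block, k=1))
        k = rng.integers(len(vs))
        v, w = off + vs[k], off + ws[k]
        A[v, w] = A[w, v] = 0
        removed.append((v, w))
    # connect w of module m to v of module m + 1 (cyclically), which closes the ring
    for m in range(n_modules):
        w = removed[m][1]
        v_next = removed[(m + 1) % n_modules][0]
        A[w, v_next] = A[v_next, w] = 1
    return A

A = modular_adjacency(seed=1)
print(A.shape, A.sum() // 2, 'links')
```

Exactly four links cross module boundaries, one between each pair of consecutive modules in the ring.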

We first compute the algebraic connectivity and the Fiedler vector of this network.

In [None]:
print('Algebraic connectivity =', algconn(modular))

f = fiedler(modular)
print('Fiedler vector =', f)

In [None]:
for v in modular.nodes:
    v['color'] = color(f[modular.nodes.index[v.uid]])
modular.plot(edge_color='gray')

Using this approach, we find one of several minimal edge cuts, in which the two partitions are connected by two edges. Note that a cut that separates only one of the four communities from the other three would also cut two edges, i.e. it would have the same cut size. However, it would have a larger ratio cut (the cut size normalized by the sizes of the two partitions), which is why we obtain a cut in which each partition contains two of the four communities.
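
The comparison of the two candidate cuts can be made concrete with the ratio cut $\mathrm{RatioCut}(A, B) = \mathrm{cut}(A, B)\left(\frac{1}{|A|} + \frac{1}{|B|}\right)$, a continuous relaxation of which is solved by the Fiedler vector of the unnormalized Laplacian $L = D - A$. This small sketch assumes the idealized setting described above: four modules of $20$ nodes each and two inter-module links crossing either cut; `ratio_cut` is a hypothetical helper.

```python
def ratio_cut(cut_edges, size_a, size_b):
    """Ratio cut: number of cut edges times (1/|A| + 1/|B|)."""
    return cut_edges * (1 / size_a + 1 / size_b)

# two modules on each side: the ring of inter-module links is cut in two places
balanced = ratio_cut(2, 40, 40)
# one module against the other three: the two ring links of that module are cut
unbalanced = ratio_cut(2, 20, 60)
print(balanced, unbalanced)  # about 0.1 vs about 0.133
```

Both cuts remove the same number of links, but the balanced split has the smaller ratio cut.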

To detect the communities contained in each of those partitions, we can recursively apply the bisection. To simplify this, we write a function that (i) uses the Fiedler vector to bisect the network and (ii) returns two new network objects that contain the nodes and edges of the two partitions.

In [None]:
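def spectral_bisection(network):
    # compute the Fiedler vector of the network
    f = fiedler(network)

    cluster_a = network.copy()
    cluster_b = network.copy()

    # cluster_a keeps the nodes with positive entries, cluster_b the nodes with non-positive entries
    for v in network.nodes.uids:
        if f[network.nodes.index[v]] <= 0:
            cluster_a.remove_node(v)
        else:
            cluster_b.remove_node(v)

    return cluster_a, cluster_b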
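
Because `spectral_bisection` operates on pathpy network objects, a library-independent sketch on index lists may help to see what it does. The hypothetical function `spectral_bisection_dense` below takes a NumPy adjacency matrix and a list of node indices, computes the Fiedler vector of the induced subgraph, and applies the same sign rule ($f \le 0$ on one side, $f > 0$ on the other).

```python
import numpy as np

def spectral_bisection_dense(A, nodes):
    """Hypothetical sketch: split `nodes` (row indices of A) by the sign of the
    Fiedler vector of the subgraph they induce; returns two lists of indices."""
    sub = A[np.ix_(nodes, nodes)]
    L = np.diag(sub.sum(axis=1)) - sub
    _, v = np.linalg.eigh(L)
    f = v[:, 1]
    # same sign rule as spectral_bisection: f <= 0 on one side, f > 0 on the other
    non_positive = [u for u, x in zip(nodes, f) if x <= 0]
    positive = [u for u, x in zip(nodes, f) if x > 0]
    return non_positive, positive

# example: two 5-cliques joined by the single link (4, 5)
A = np.zeros((10, 10))
A[:5, :5] = 1 - np.eye(5)
A[5:, 5:] = 1 - np.eye(5)
A[4, 5] = A[5, 4] = 1
part_1, part_2 = spectral_bisection_dense(A, list(range(10)))
print(sorted(part_1), sorted(part_2))
```

Which clique ends up in which list depends on the arbitrary sign of the eigenvector; the partition itself separates the two cliques.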

We can write a function that colors the nodes in the network based on the clusters to which they belong (assuming that we pass a list of clusters as argument, where each cluster is a `pathpy` network). To achieve this, we first create a dictionary that maps nodes to cluster labels and then use `seaborn`'s `color_palette` function to assign easily distinguishable colors to the cluster labels:

In [None]:
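def color_nodes(network, clusters):
    # "Set1" is a qualitative palette with 9 distinct colors; seaborn cycles through
    # them when more colors are requested, so colors repeat for more than 9 clusters
    colors = sns.color_palette("Set1", 200)

    # map each node to the index of the cluster that contains it
    mapping = {}
    for v in network.nodes.uids:
        for i, cluster in enumerate(clusters):
            if v in cluster.nodes.uids:
                mapping[v] = i

    # store the cluster color as a node attribute that is used by plot()
    for v in network.nodes:
        v['color'] = colors[mapping[v.uid]]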

Let's try our functions on the example network:

In [None]:
a, b = spectral_bisection(modular)
color_nodes(modular, [a,b])
modular.plot(edge_color='gray')

That's nice! But how do we know whether we should continue to partition the subgraphs `a` and `b`? We can compute their algebraic connectivity: a small value indicates that the subgraph can be split along a small cut. In this case, we can apply the bisection again to find natural communities.
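
To get a feeling for what counts as "small", we can compare two graphs with ten nodes each using a dense NumPy computation (the helper `algebraic_connectivity` is hypothetical, not part of pathpy). In the complete graph $K_{10}$ every non-zero Laplacian eigenvalue equals $10$. For two $5$-cliques joined by a single bridge, the vector $x$ that is $+1$ on one clique and $-1$ on the other is orthogonal to the all-ones vector and has Rayleigh quotient $x^\top L x / x^\top x = 4/10$, so $\lambda_2 \le 0.4$.

```python
import numpy as np

def algebraic_connectivity(A):
    """Hypothetical helper: second-smallest eigenvalue of L = D - A for a symmetric A."""
    L = np.diag(A.sum(axis=1)) - A
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    return np.linalg.eigvalsh(L)[1]

# K10: a single dense group without any small cut
K10 = 1 - np.eye(10)

# two 5-cliques connected by the single bridge (4, 5)
bridged = np.zeros((10, 10))
bridged[:5, :5] = 1 - np.eye(5)
bridged[5:, 5:] = 1 - np.eye(5)
bridged[4, 5] = bridged[5, 4] = 1

print(algebraic_connectivity(K10))      # 10 up to rounding
print(algebraic_connectivity(bridged))  # about 0.298, below the bound 0.4
```

Which threshold separates "small" from "large" depends on the size and density of the graphs, so the threshold is a modeling choice rather than a universal constant.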

In [None]:
print(algconn(a))
print(algconn(b))

We again apply spectral bisection to both subgraphs to find a total of four clusters:

In [None]:
a1, a2 = spectral_bisection(a)
color_nodes(a, [a1, a2])
a.plot(edge_color='gray')

In [None]:
b1, b2 = spectral_bisection(b)
color_nodes(b, [b1, b2])
b.plot(edge_color='gray')

To check whether we should continue, we calculate the algebraic connectivity of the four subgraphs:

In [None]:
print(algconn(a1))
print(algconn(a2))
print(algconn(b1))
print(algconn(b2))

We find that the four remaining subgraphs have a large algebraic connectivity, which means that they do not contain a small cut along which we could naturally partition them into communities. We thus stop the recursive partitioning.

We can automate the process above with the following recursive function, which returns a set of subgraphs that represent the clusters in the network. A threshold parameter specifies the algebraic connectivity below which a (sub)graph is bisected further. This parameter thus determines the "resolution" of our community detection algorithm.

In [None]:
def spectral_clustering(network: pp.Graph, threshold=1e-1) -> set:
    # bisect the network if its algebraic connectivity is below the threshold
    if algconn(network) < threshold:
        a, b = spectral_bisection(network)

        # recursively apply the clustering to the two clusters
        clusters_a = spectral_clustering(a, threshold)
        clusters_b = spectral_clustering(b, threshold)

        # merge the two sets of clusters
        return clusters_a | clusters_b
    else:
        # do not bisect, i.e. stop the recursion
        return {network}

In [None]:
clusters = spectral_clustering(modular)
color_nodes(modular, list(clusters))
modular.plot(edge_color='gray')
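
The same recursion can be sketched without pathpy on lists of node indices of a NumPy adjacency matrix. This is a minimal sketch under stated assumptions: the names `fiedler_split` and `spectral_clustering_dense` are hypothetical, the function returns a list of index lists instead of a set of networks, and it adds a guard against single-node subgraphs and one-sided sign splits, which `spectral_clustering` does not handle explicitly.

```python
import numpy as np

def fiedler_split(A, nodes):
    """Hypothetical helper: algebraic connectivity of the subgraph induced by `nodes`
    and the sign split (f <= 0 vs. f > 0) of its Fiedler vector."""
    sub = A[np.ix_(nodes, nodes)]
    w, v = np.linalg.eigh(np.diag(sub.sum(axis=1)) - sub)
    f = v[:, 1]
    non_positive = [u for u, x in zip(nodes, f) if x <= 0]
    positive = [u for u, x in zip(nodes, f) if x > 0]
    return w[1], non_positive, positive

def spectral_clustering_dense(A, nodes=None, threshold=0.1):
    """Hypothetical sketch: recursively bisect while the algebraic connectivity is
    below `threshold`; returns a list of clusters, each a list of row indices of A."""
    if nodes is None:
        nodes = list(range(A.shape[0]))
    if len(nodes) < 2:
        # a single node has no second eigenvalue and cannot be split
        return [nodes]
    lam2, non_positive, positive = fiedler_split(A, nodes)
    if lam2 >= threshold or not non_positive or not positive:
        # stop: no small cut, or a degenerate one-sided split
        return [nodes]
    return (spectral_clustering_dense(A, non_positive, threshold)
            + spectral_clustering_dense(A, positive, threshold))

# example: four 5-cliques joined in a path by the links (4, 5), (9, 10), (14, 15)
A = np.zeros((20, 20))
for k in range(4):
    A[5 * k:5 * k + 5, 5 * k:5 * k + 5] = 1 - np.eye(5)
for u, v in [(4, 5), (9, 10), (14, 15)]:
    A[u, v] = A[v, u] = 1

clusters = spectral_clustering_dense(A, threshold=1.0)
print(sorted(sorted(c) for c in clusters))
```

In this example, the default `threshold=0.1` stops after the first split, because each remaining pair of bridged 5-cliques has an algebraic connectivity of about 0.3; raising the threshold to 1.0 resolves all four cliques, which illustrates how the threshold sets the resolution.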