***

*Course:* [Math 535](https://people.math.wisc.edu/~roch/mmids/) - Mathematical Methods in Data Science (MMiDS)  
*Chapter:* 4-Spectral graph theory   
*Author:* [Sebastien Roch](https://people.math.wisc.edu/~roch/), Department of Mathematics, University of Wisconsin-Madison  
*Updated:* Jan 6, 2024   
*Copyright:* &copy; 2024 Sebastien Roch

***

In [None]:
# IF RUNNING ON GOOGLE COLAB, RUN THE FOLLOWING CODE CELL (SKIP IT OTHERWISE)
# When prompted, upload: 
#     * mmids.py
# from your local file system
# Files at: https://github.com/MMiDS-textbook/MMiDS-textbook.github.io/tree/main/utils
# Alternative instructions: https://colab.research.google.com/notebooks/io.ipynb

In [None]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

In [None]:
# PYTHON 3
import numpy as np
from numpy import linalg as LA
from numpy.random import default_rng
rng = default_rng(535)
import matplotlib.pyplot as plt
import pandas as pd
import networkx as nx
import mmids

## Motivating example: finding communities

In this chapter, we analyze datasets in the form of networks. As motivation, we first look at the [Karate Club dataset](https://en.wikipedia.org/wiki/Zachary%27s_karate_club). 

From [Wikipedia](https://en.wikipedia.org/wiki/Zachary%27s_karate_club):
> A social network of a karate club was studied by Wayne W. Zachary for a period of three years from 1970 to 1972. The network captures 34 members of a karate club, documenting links between pairs of members who interacted outside the club. During the study a conflict arose between the administrator "John A" and instructor "Mr. Hi" (pseudonyms), which led to the split of the club into two. Half of the members formed a new club around Mr. Hi; members from the other part found a new instructor or gave up karate. Based on collected data Zachary correctly assigned all but one member of the club to the groups they actually joined after the split.

![Karate club network](https://upload.wikimedia.org/wikipedia/commons/thumb/0/0d/Social_Network_Model_of_Relationships_in_the_Karate_Club.png/480px-Social_Network_Model_of_Relationships_in_the_Karate_Club.png)

**Figure:** Karate Club network ([Source](https://commons.wikimedia.org/wiki/File:Social_Network_Model_of_Relationships_in_the_Karate_Club.png))

We use the [`NetworkX`](https://networkx.org) package to load the data and visualize it. We will say more about it later in this chapter. In the meantime, there is a good tutorial [here](https://networkx.org/documentation/stable/tutorial.html).

In [None]:
import networkx as nx

In [None]:
G = nx.karate_club_graph()
nx.draw_networkx(G)

Our goal: 

> identify natural sub-communities in the network 

That is, we want to find groups of nodes that have many links between them, but relatively few with the other nodes. 
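As a quick check of this intuition on the Karate Club data, the sketch below takes the split recorded in the `club` node attribute of `nx.karate_club_graph()` (values `'Mr. Hi'` and `'Officer'`). It counts the edges inside each group and the edges crossing between them. It assumes that attribute is present, as it is in recent NetworkX releases. It uses `nx.cut_size()` with its default `weight=None`, which counts edges rather than summing weights.

```python
import networkx as nx

G = nx.karate_club_graph()

# Split the vertices according to the club each member joined after the split.
hi = {v for v, club in G.nodes(data="club") if club == "Mr. Hi"}
officer = set(G.nodes()) - hi

# Edges with both endpoints in the same group.
inside_hi = G.subgraph(hi).number_of_edges()
inside_officer = G.subgraph(officer).number_of_edges()

# Edges with one endpoint in each group (unweighted count).
crossing = nx.cut_size(G, hi, officer)

print("edges inside Mr. Hi's group:", inside_hi)
print("edges inside the Officer's group:", inside_officer)
print("edges between the two groups:", crossing)
```

Because the two groups partition the vertices, the three counts add up to the total number of edges.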

It will turn out that the eigenvectors of the Laplacian matrix, a matrix naturally associated to the graph, contain useful information about such communities.
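To preview this claim, here is a minimal sketch. It assumes the standard (combinatorial) Laplacian $L = D - A$, where $D$ is the diagonal matrix of degrees and $A$ the adjacency matrix. The sketch splits the Karate Club vertices by the sign of the eigenvector associated with the second-smallest eigenvalue of $L$, and compares the result with the recorded `club` attribute. This is only an illustration of one common heuristic, not necessarily the method developed in this chapter. `weight=None` is passed so that the edge weights stored in recent NetworkX versions are ignored.

```python
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
n = G.number_of_nodes()

# Unweighted adjacency matrix, as a dense NumPy array.
A = nx.adjacency_matrix(G, weight=None).toarray()

# Laplacian L = D - A, with D the diagonal matrix of degrees.
D = np.diag(A.sum(axis=1))
L = D - A

# Cross-check against NetworkX's own Laplacian.
assert np.allclose(L, nx.laplacian_matrix(G, weight=None).toarray())

# eigh is for symmetric matrices; it returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(L)
v2 = eigvecs[:, 1]  # eigenvector of the second-smallest eigenvalue

# Split the vertices by the sign of their entry in v2.
group = v2 > 0

# Compare with the recorded clubs; an eigenvector's sign is arbitrary,
# so keep the better of the two possible labelings.
truth = np.array([G.nodes[v]["club"] == "Mr. Hi" for v in G.nodes()])
agree = int(np.sum(group == truth))
agree = max(agree, n - agree)
print(f"{agree} of {n} members placed on the same side as their club")
```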

## Background: review of graphs and associated matrices

**NUMERICAL CORNER:** In Python, the [`NetworkX`](https://networkx.org) package provides many functionalities for defining, modifying and plotting graphs. For instance, many standard graphs can be defined conveniently. The [`petersen_graph()`](https://networkx.org/documentation/stable/reference/generated/networkx.generators.small.petersen_graph.html#networkx.generators.small.petersen_graph) function defines the Petersen graph.

In [None]:
import networkx as nx

In [None]:
G = nx.petersen_graph()

This graph can be plotted using the function [`draw_networkx()`](https://networkx.org/documentation/networkx-1.7/reference/generated/networkx.drawing.nx_pylab.draw_networkx.html).

In [None]:
nx.draw_networkx(G, node_size=600, node_color='black', font_size=16, font_color='white')

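The Petersen graph has $10$ vertices and $15$ edges, and every vertex has degree $3$. The short check below confirms this using only basic `NetworkX` queries. It also checks the handshake identity: the degrees sum to twice the number of edges.

```python
import networkx as nx

G = nx.petersen_graph()

# Degree of every vertex, as a list.
degrees = [d for _, d in G.degree()]

print(G.number_of_nodes(), G.number_of_edges())  # 10 15
print(set(degrees))                               # {3}: every vertex has degree 3
# Handshake identity: the degrees sum to twice the number of edges.
print(sum(degrees) == 2 * G.number_of_edges())    # True
```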
Other standard graphs can be generated with special functions, e.g. complete graphs using [`complete_graph()`](https://networkx.org/documentation/stable/reference/generated/networkx.generators.classic.complete_graph.html#networkx.generators.classic.complete_graph). See [here](https://networkx.org/documentation/stable/reference/generators.html#module-networkx.generators.classic) for a complete list.

In [None]:
G = nx.complete_graph(3)

In [None]:
nx.draw_networkx(G, node_size=600, node_color='black', font_size=16, font_color='white')

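In the complete graph on $n$ vertices, every pair of distinct vertices is joined by an edge, so it has $\binom{n}{2} = n(n-1)/2$ edges. The loop below is a small sanity check of that count for a few values of $n$.

```python
import networkx as nx

# Compare the edge count of complete_graph(n) with n(n-1)/2.
for n in range(1, 7):
    K = nx.complete_graph(n)
    m = K.number_of_edges()
    print(n, m, n * (n - 1) // 2)
```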
See [here](https://networkx.org/documentation/stable/reference/functions.html) and [here](https://networkx.org/documentation/stable/reference/algorithms/index.html) for a list of functions to access various properties of a graph. Here are a few examples:

In [None]:
G = nx.path_graph(10)

In [None]:
nx.draw_networkx(G, node_size=600, node_color='black', font_size=16, font_color='white')

In [None]:
G.number_of_nodes() # number of nodes

In [None]:
G.number_of_edges() # number of edges

In [None]:
G.has_node(7) # checks whether the graph has a particular vertex

In [None]:
G.has_node(10)

In [None]:
G.has_edge(0, 1) # checks whether the graph has a particular edge

In [None]:
G.has_edge(0, 2)

In [None]:
[n for n in G.neighbors(2)] # returns a list of neighbors of the specified vertex

In [None]:
nx.is_connected(G) # checks whether the graph is connected

In [None]:
[cc for cc in nx.connected_components(G)] # returns the connected components

In [None]:
for e in G.edges():
    print(e)

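A few more queries on the same path graph illustrate degrees, distances and connectivity. Note that removing any single edge of a path splits it into two connected components. The sketch works on a copy so that the original `G` is unchanged.

```python
import networkx as nx

G = nx.path_graph(10)

# Degrees: the two endpoints have degree 1, interior vertices have degree 2.
print(dict(G.degree()))

# Number of edges on a shortest path between the two endpoints.
print(nx.shortest_path_length(G, 0, 9))  # 9

# Removing an edge of the path disconnects it.
H = G.copy()
H.remove_edge(4, 5)
print(nx.is_connected(H), [sorted(cc) for cc in nx.connected_components(H)])
```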
Another way of specifying a graph is to start with an empty graph with a given number of vertices and then add edges one by one. The following command creates a graph with $4$ vertices and no edges (see [`empty_graph()`](https://networkx.org/documentation/stable/reference/generated/networkx.generators.classic.empty_graph.html#networkx.generators.classic.empty_graph)).

In [None]:
G = nx.empty_graph(4)

In [None]:
G.add_edge(0, 1)
G.add_edge(2, 3)
G.add_edge(0, 3)
G.add_edge(3, 0)  # same edge as (0, 3): the graph is undirected, so nothing new is added

In [None]:
nx.draw_networkx(G, node_size=600, node_color='black', font_size=16, font_color='white')

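Edges can also be added several at a time with `add_edges_from()`. The sketch below rebuilds the same graph that way and checks that it has $3$ distinct edges. It then shows that `add_edge()` creates any endpoint that is not yet in the graph.

```python
import networkx as nx

G = nx.empty_graph(4)
# (3, 0) is the same undirected edge as (0, 3), so only 3 distinct edges result.
G.add_edges_from([(0, 1), (2, 3), (0, 3), (3, 0)])
print(G.number_of_nodes(), G.number_of_edges())  # 4 3

# Adding an edge between vertices that do not exist yet creates them.
G.add_edge(4, 5)
print(sorted(G.nodes()))  # [0, 1, 2, 3, 4, 5]
```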
$\unlhd$

**NUMERICAL CORNER:** The package `NetworkX` also supports digraphs.

In [None]:
G = nx.DiGraph()
nx.add_star(G, [0, 1, 2, 3, 4])

In [None]:
nx.draw_networkx(G, node_size=600, node_color='black', font_size=16, font_color='white')

Another way of specifying a digraph is to start with an empty digraph and then add edges one by one (compare to the undirected case above). The following command creates a digraph with no vertices; each vertex is added automatically the first time an edge involving it is added.

In [None]:
G = nx.DiGraph()

In [None]:
G.add_edge(0, 1)
G.add_edge(2, 3)
G.add_edge(0, 3)
G.add_edge(3, 0)  # distinct from (0, 3) in a digraph
G.add_edge(1, 1)  # a self-loop at vertex 1

In [None]:
nx.draw_networkx(G, node_size=600, node_color='black', font_size=16, font_color='white')

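The sketch below makes the directed structure explicit. `nx.add_star()` adds an edge from the first listed vertex (the center) to each of the others. In a digraph, $(0,3)$ and $(3,0)$ are different edges, and each vertex has separate in- and out-degrees. `nx.selfloop_edges()` lists the self-loops.

```python
import networkx as nx

# add_star adds an edge from the first vertex (the center) to each of the others.
S = nx.DiGraph()
nx.add_star(S, [0, 1, 2, 3, 4])
print(S.out_degree(0), S.in_degree(0))      # 4 0

G = nx.DiGraph()
G.add_edges_from([(0, 1), (2, 3), (0, 3), (3, 0), (1, 1)])

# In a digraph, (0, 3) and (3, 0) are two distinct edges.
print(G.number_of_edges())                  # 5
print(G.has_edge(0, 3), G.has_edge(3, 2))   # True False

# Out-degree counts edges leaving a vertex, in-degree counts edges entering it.
print(dict(G.out_degree()))
print(dict(G.in_degree()))

# The edge (1, 1) is a self-loop.
print(list(nx.selfloop_edges(G)))           # [(1, 1)]
```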
$\unlhd$

**NUMERICAL CORNER:** Using `NetworkX`, the adjacency matrix of a graph can be obtained with [`adjacency_matrix()`](https://networkx.org/documentation/stable/reference/generated/networkx.linalg.graphmatrix.adjacency_matrix.html). By default, it returns a `SciPy` sparse array (a sparse matrix in older versions of `NetworkX`). Alternatively, one can get a regular dense array with [`toarray()`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.toarray.html). Recall that in NumPy (and SciPy) array indices start at $0$. Consistent with this, NetworkX also names vertices starting at $0$. **Note, however, that this conflicts with our mathematical conventions, where vertices are numbered starting at $1$.**

In [None]:
G = nx.complete_graph(3)

In [None]:
A = nx.adjacency_matrix(G)
print(A)

In [None]:
A = nx.adjacency_matrix(G).toarray()
print(A)

In [None]:
G = nx.petersen_graph()
A = nx.adjacency_matrix(G)
print(A)

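To connect the output above with the definition, the sketch below builds the Petersen adjacency matrix by hand from the edge list and compares it with `NetworkX`. The mapping `idx` from vertex names to row indices is a helper introduced here for illustration. With $0$-based indexing, the entry `A[i-1, j-1]` in code corresponds to $A_{ij}$ in the mathematical notation. The sketch also checks that the matrix is symmetric and that its row sums are the degrees.

```python
import numpy as np
import networkx as nx

G = nx.petersen_graph()
A = nx.adjacency_matrix(G).toarray()

# Row/column index of each vertex, in the order NetworkX uses by default.
idx = {v: i for i, v in enumerate(G.nodes())}
n = len(idx)

# A[x, y] = A[y, x] = 1 for each edge {x, y}, and 0 otherwise.
A_hand = np.zeros((n, n), dtype=int)
for x, y in G.edges():
    A_hand[idx[x], idx[y]] = 1
    A_hand[idx[y], idx[x]] = 1

print(np.array_equal(A, A_hand))  # True: same matrix
print(np.array_equal(A, A.T))     # True: symmetric, since G is undirected
print(A.sum(axis=1))              # row sums are the degrees (all equal to 3)
```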
The incidence matrix is obtained with [`incidence_matrix()`](https://networkx.org/documentation/stable/reference/generated/networkx.linalg.graphmatrix.incidence_matrix.html) -- again as a sparse array.

In [None]:
B = nx.incidence_matrix(G)
print(B)

In [None]:
B = nx.incidence_matrix(G).toarray()
print(B)

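In the (unoriented) incidence matrix of a graph without self-loops, rows correspond to vertices, columns to edges, and each column contains exactly two $1$'s. A consequence worth checking numerically is that $B B^T = D + A$, where $D$ is the diagonal matrix of degrees. The reason is that the diagonal entry $(BB^T)_{ii}$ counts the edges at vertex $i$, and the off-diagonal entry $(BB^T)_{ij}$ counts the edges joining $i$ and $j$.

```python
import numpy as np
import networkx as nx

G = nx.petersen_graph()
A = nx.adjacency_matrix(G).toarray()
B = nx.incidence_matrix(G).toarray()   # rows: vertices, columns: edges
D = np.diag(A.sum(axis=1))             # diagonal matrix of degrees

print(B.shape)                         # (10, 15): vertices by edges
print(B.sum(axis=0))                   # each column contains exactly two 1's
print(np.allclose(B @ B.T, D + A))     # True
```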
In the digraph case, the definitions are adapted as follows. The adjacency matrix $A$ of a digraph $G = (V, E)$ is the matrix defined as

\begin{align*}
A_{xy} 
= 
\begin{cases}
1 & \text{if $(x,y) \in E$}\\ 
0 & \text{o.w.}
\end{cases}
\end{align*}

The incidence matrix of a digraph $G$ with vertices $1,\ldots,n$ and edges $e_1, \ldots, e_m$ is the $n \times m$ matrix $B$ such that $B_{ij} = -1$ if edge $e_j$ leaves vertex $i$, $B_{ij} = 1$ if edge $e_j$ enters vertex $i$, and $B_{ij} = 0$ otherwise. 
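The two definitions translate directly into code. The sketch below builds $A$ and $B$ entry by entry for a small, hypothetical digraph and compares them with `NetworkX`, passing `oriented=True` to `incidence_matrix()`. Self-loops are avoided on purpose, since an edge that both leaves and enters the same vertex does not fit the incidence definition above. As a side effect, every column of $B$ sums to $0$.

```python
import numpy as np
import networkx as nx

# A small digraph without self-loops (hypothetical example).
G = nx.DiGraph([(0, 1), (1, 2), (2, 0), (0, 2)])
nodes = list(G.nodes())
edges = list(G.edges())
idx = {v: i for i, v in enumerate(nodes)}
n, m = len(nodes), len(edges)

# Adjacency matrix: A[x, y] = 1 if (x, y) is an edge.
A = np.zeros((n, n))
for x, y in edges:
    A[idx[x], idx[y]] = 1

# Incidence matrix: column j has -1 at the vertex edge j leaves, +1 at the one it enters.
B = np.zeros((n, m))
for j, (x, y) in enumerate(edges):
    B[idx[x], j] = -1
    B[idx[y], j] = 1

print(np.array_equal(A, nx.adjacency_matrix(G, nodelist=nodes).toarray()))
print(np.array_equal(B, nx.incidence_matrix(G, nodelist=nodes, edgelist=edges,
                                            oriented=True).toarray()))
print(B.sum(axis=0))  # every column sums to 0
```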

**NUMERICAL CORNER:** We revisit an earlier directed graph.

In [None]:
G = nx.DiGraph()

In [None]:
G.add_edge(0, 1)
G.add_edge(2, 3)
G.add_edge(0, 3)
G.add_edge(3, 0)  # distinct from (0, 3) in a digraph
G.add_edge(1, 1)  # a self-loop at vertex 1

We compute the adjacency and incidence matrices. For the incidence matrix, one must specify `oriented=True` for the oriented version.

In [None]:
A = nx.adjacency_matrix(G).toarray()
print(A)

In [None]:
B = nx.incidence_matrix(G, oriented=True).toarray()
print(B)

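For a digraph, row $x$ of the adjacency matrix marks the edges leaving $x$, and column $y$ marks the edges entering $y$. The row sums are therefore the out-degrees and the column sums are the in-degrees. The sketch below checks this on the digraph above. It also confirms that, unlike in the undirected case, $A$ is not symmetric here.

```python
import numpy as np
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([(0, 1), (2, 3), (0, 3), (3, 0), (1, 1)])
nodes = list(G.nodes())
A = nx.adjacency_matrix(G, nodelist=nodes).toarray()

# Row sums are out-degrees, column sums are in-degrees.
print(np.array_equal(A.sum(axis=1), [G.out_degree(v) for v in nodes]))  # True
print(np.array_equal(A.sum(axis=0), [G.in_degree(v) for v in nodes]))   # True
# A need not be symmetric: edge (0, 1) is present but (1, 0) is not.
print(np.array_equal(A, A.T))                                           # False
```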
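Revisiting an earlier undirected graph, we note that `incidence_matrix()` can also produce an oriented incidence matrix by using the `oriented=True` option. Each edge is then oriented from its first to its second endpoint, as listed by `G.edges()`, an orientation that is arbitrary from the point of view of the undirected graph.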

In [None]:
G = nx.empty_graph(4)

In [None]:
G.add_edge(0, 1)
G.add_edge(2, 3)
G.add_edge(0, 3)
G.add_edge(3, 0)  # same edge as (0, 3): the graph is undirected, so nothing new is added

In [None]:
B = nx.incidence_matrix(G, oriented=True).toarray()
print(B)

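Although the orientation is arbitrary, a standard identity does not depend on it. For an oriented incidence matrix $B$ of an undirected graph without self-loops, $B B^T = D - A$, where $D$ is the diagonal matrix of degrees (the combinatorial Laplacian). Flipping the orientation of an edge only negates its column of $B$, which leaves $B B^T$ unchanged. The sketch below checks both facts on the graph above.

```python
import numpy as np
import networkx as nx

G = nx.empty_graph(4)
G.add_edges_from([(0, 1), (2, 3), (0, 3)])

A = nx.adjacency_matrix(G).toarray()
D = np.diag(A.sum(axis=1))
B = nx.incidence_matrix(G, oriented=True).toarray()

# Diagonal of B @ B.T counts edges at each vertex; off-diagonal entries are -1 per edge.
print(np.allclose(B @ B.T, D - A))                       # True

# Reversing the orientation of the first edge (negating its column) changes nothing.
B_flipped = B.copy()
B_flipped[:, 0] *= -1
print(np.allclose(B_flipped @ B_flipped.T, B @ B.T))     # True
```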
$\unlhd$