# Spectral clustering / embedding

Author: Matt Smart

[Overview](#linkOverview)  
[Graph Theory](#linkGraph)  
[Algorithm](#linkAlgorithm)  
[Example 1: Simple graph cluster walkthrough](#linkExample1)  
[Example 2: Community structure in cell type data](#linkExample2)  
[Resources](#linkResources)  

### Overview <a id='linkOverview'></a>
- non-linear dimension reduction technique
- rough idea: Find structure in data from the eigendecomposition of the Laplacian of the data's distance matrix
    - Step 1: Construct graph Laplacian from the pairwise distances between data points
    - Step 2: Find eigendecomposition of the Laplacian
    - Step 3: Use eigendecomposition to find a low dimension embedding of the data or clusters
- can be thought of as the "vanilla version" of related techniques such as diffusion maps and UMAP

### Some minimal graph theory <a id='linkGraph'></a>

#### Graphs
Graphs are denoted by $G=(V,E)$  
- $V$ or $V(G)$ denotes the "vertices" or "nodes"  
- $E$ or $E(G)$ denotes the "edges"  

#### Graph properties
- **_adjacent vertices_**: directly connected by an edge
- **_edge weights_**: edges may or may not have "weights" associated with them, giving a notion of connection strength
- **_connected graph_**: there exists a path between any two nodes $i$ and $j$
- **_weighted graph_**: unweighted graphs have all edge weights constant (or $1$), weighted graphs are any deviation from this
- **_undirected graph_**: all edge weights are symmetric ($w_{ij}=w_{ji}$)
- **_directed graph_**: one or more edge weights are asymmetric
- **_degree of a node_**: $d(j) = \sum_{i} w_{ij} $

#### Matrices that characterize a graph
- $A$ - **_adjacency matrix_**: $a_{ij}=1$ if nodes are adjacent, $0$ otherwise (note $a_{ii}=0$)
- $W$ - **_weight matrix_**: a scaled (i.e. non-boolean) version of the adjacency matrix; typically restrict weights $w_{ij}\geq 0$
- $D$ - **_degree matrix_**: diagonal matrix of degrees of each node; $d_{jj}=\sum_i w_{ij}$
- $L$ - **_graph laplacian_**: $L=D-A$ for unweighted graphs, or $L=D-W$ for weighted graphs

Example from https://towardsdatascience.com/spectral-clustering-for-beginners-d08b7d25b4d8:
<img src="http://cdn-images-1.medium.com/max/1600/1*GD0E2nvpj853wOz5WTB35Q.jpeg" width="700" height="470" />  

$L$ is a useful construction to store and infer graph properties
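
To make the definitions above concrete, here is a minimal sketch that builds $A$, $D$ and $L=D-A$ for a small unweighted, undirected graph. The 4-node edge list is a made-up example (not the graph in the figure), and the variable names are only illustrative.

```python
import numpy as np

# Hypothetical graph: a triangle (0, 1, 2) with node 3 attached to node 2
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
num_nodes = 4

# adjacency matrix: a_ij = 1 if nodes i and j share an edge, a_ii = 0
A = np.zeros((num_nodes, num_nodes))
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1  # undirected graph, so A is symmetric

# degree matrix: d_jj = sum_i a_ij
D = np.diag(A.sum(axis=0))

# graph Laplacian for the unweighted case
L = D - A
print(L)
```

The diagonal of $L$ holds the node degrees and each off-diagonal entry is $-1$ where an edge exists, so every column sums to zero.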

#### Basic properties of graph laplacian
- off-diagonals are non-positive (if we constrain the weights to be non-negative); diagonals are non-negative
- columns of $L$ sum to zero
- $L$ is positive semi-definite
- the smallest eigenvalue is $\lambda_1=0$, with corresponding eigenvector a "steady state" of the graph (more below)
- algebraic multiplicity of $\lambda_1=0$ equals the number of connected components of $G$ (multiplicity one means $G$ is connected)
- $L$ has $p$ non-negative, real eigenvalues $0=\lambda_1\leq\lambda_2\leq\ldots\leq\lambda_p$
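
The properties listed above can be checked numerically. The sketch below does so for a made-up weighted graph with two connected components; the helper name `laplacian` and the tolerance `1e-10` are assumptions for illustration.

```python
import numpy as np

def laplacian(W):
    """Unnormalized Laplacian L = D - W for a symmetric, non-negative weight matrix W."""
    return np.diag(W.sum(axis=0)) - W

# Hypothetical weighted graph with two components: {0, 1, 2} and {3, 4}
W = np.zeros((5, 5))
W[0, 1] = W[1, 0] = 1.0
W[1, 2] = W[2, 1] = 0.5
W[3, 4] = W[4, 3] = 2.0

L = laplacian(W)
eigvals = np.linalg.eigvalsh(L)  # real eigenvalues in ascending order (L is symmetric)

columns_sum_to_zero = np.allclose(L.sum(axis=0), 0.0)
is_psd = bool(np.all(eigvals > -1e-10))            # non-negative up to round-off
num_zero = int(np.sum(np.abs(eigvals) < 1e-10))    # multiplicity of the zero eigenvalue

print(columns_sum_to_zero, is_psd, num_zero)
```

For this graph the zero eigenvalue should appear twice, once per connected component.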

#### Connection to general stochastic processes
- consider using $L$ as generator of dynamics between the nodes of the graph; $v(t)$ describes the state of the graph as real values at each node
- dynamics: $\dot v = -Lv \implies$ trajectories: $v(t)=e^{-Lt}v_0$
- these dynamics preserve $\sum_i v_i$, as well as the sign of the components
- probability flow: $-L_{ij}$ describes the infinitesimal transition rate of probability (or heat, water.. etc) from node $j$ to node $i$
- steady state for connected graph: the unique zero eigenvector $v_0$
- steady state for non-connected graph: any point $v=\sum_a c_a v_{0,a}$ in the zero-eigenspace with $\lvert v\rvert=1$
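
A minimal sketch of these dynamics, assuming a made-up 4-node path graph with unit weights and all of the initial "mass" on node 0. It uses `scipy.linalg.expm` to evaluate $v(t)=e^{-Lt}v_0$ at a few times.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical connected graph: path 0 - 1 - 2 - 3 with unit edge weights
A = np.diag(np.ones(3), k=1)
A = A + A.T
L = np.diag(A.sum(axis=0)) - A

v0 = np.array([1.0, 0.0, 0.0, 0.0])  # initial state: everything on node 0

for t in [0.0, 0.5, 2.0, 50.0]:
    v_t = expm(-L * t) @ v0
    print(f"t={t:5.1f}  v(t)={np.round(v_t, 3)}  sum={v_t.sum():.3f}")

# late-time state, used to inspect the steady state
v_late = expm(-L * 50.0) @ v0
```

The sum of the components stays at one throughout, and at late times the state flattens out to the uniform vector, which is the zero eigenvector of this symmetric $L$.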

#### Why is it called "Laplacian"?
- recall the heat equation $\frac{\partial u}{\partial t}=\nabla^2 u$
- $L$ is the discrete version of the continuous laplacian operator $\nabla^2$
- suppose space were not continuous but discrete (e.g. mesh of $p$ connected nodes), then $\dot u = -Lu$ would describe heat flow on the mesh
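
As a rough numerical check of this correspondence, the sketch below builds the Laplacian of a 1D mesh (a path graph) and compares $-Lu/h^2$ with the exact second derivative of a test function. The mesh size `m`, the spacing `h` and the choice $u(x)=\sin(2\pi x)$ are assumptions made for illustration.

```python
import numpy as np

# Hypothetical 1D mesh: m nodes on [0, 1], neighbors joined by unit-weight edges
m = 101
x = np.linspace(0.0, 1.0, m)
h = x[1] - x[0]

A = np.diag(np.ones(m - 1), k=1)
A = A + A.T
L = np.diag(A.sum(axis=0)) - A

u = np.sin(2 * np.pi * x)
second_derivative = -(2 * np.pi) ** 2 * u  # exact u''(x)

# interior rows of L are the stencil [-1, 2, -1], so -(L u) / h^2 approximates u''
approx = -(L @ u) / h**2

# compare on interior nodes only; the two end nodes have a single neighbor
max_err = np.max(np.abs(approx[1:-1] - second_derivative[1:-1]))
print(max_err)
```

The error shrinks as the mesh is refined, which is the sense in which $-L$ plays the role of $\nabla^2$ on a graph.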

#### For our purposes
- we will assume connected graph and make sure any constructions satisfy this
- nodes of the graph are in 1-1 correspondence with our data points/samples
- edges describe some relationship between data samples (e.g. proximity)

### Algorithm <a id='linkAlgorithm'></a>

#### Setup / input
- suppose one has $p$ samples of N-dimensional data points, $x_i\in\mathbb{R}^N$
- store the data samples columnwise as $X\in\mathbb{R}^{N\,\times\,p}$

#### Step 1: Distance matrix -> Similarity matrix  -> Graph Laplacian

**_Pairwise distances_**: Choose a metric $d(\cdot, \cdot)$.
- often one uses Euclidean distance if data is continuous samples from $\mathbb{R}^N$
- other distances may be more appropriate (e.g. 1-norm if data is binarized)

**_Similarity matrix_**: Crudely, make the similarities decrease with the distances (e.g. inversely proportional). The goal is to store local relationships between the data points.  
Example choices:
- $s_{ij}=\frac{1}{d_{ij}^2}$  
- $s_{ij}(\sigma)=\exp(-d_{ij}^2\,/\, 2\sigma^2)$  
- $s_{ij}$ as a boolean indicator function for k-NN
- alternatively, it could correspond to a qualitative ordering / ranking of similarities
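
Two of the choices above, sketched with NumPy on made-up data. Note the assumptions: samples are stored row-wise here (the transpose of the $X\in\mathbb{R}^{N\times p}$ convention above), the diagonal of the Gaussian similarity is zeroed to avoid self-loops, and the k-NN graph is symmetrized with an elementwise maximum. All three are choices, not the only options.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 2))  # hypothetical data: 6 samples in R^2, one per row

# pairwise Euclidean distances via broadcasting
diff = X_demo[:, None, :] - X_demo[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Gaussian similarity; sigma sets the length scale of what counts as "local"
sigma = 1.0
S_gauss = np.exp(-dist**2 / (2 * sigma**2))
np.fill_diagonal(S_gauss, 0.0)  # no self-loops, matching a_ii = 0

# boolean k-NN similarity: s_ij = 1 if j is among the nearest neighbors of i
n_neighbors = 2
nearest = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]  # column 0 is the point itself
S_knn = np.zeros_like(dist)
rows = np.repeat(np.arange(len(X_demo)), n_neighbors)
S_knn[rows, nearest.ravel()] = 1.0
S_knn = np.maximum(S_knn, S_knn.T)  # symmetrize so the graph is undirected

print(np.round(S_gauss, 2))
print(S_knn)
```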

Intuition: by associating transition probabilities with similarities, trajectories beginning within "clusters of similar points" will flow throughout said cluster before moving to other clusters. We expect this to be reflected in the eigenspectrum, since for a Markov process (e.g. master equation) the eigenspectrum captures information about timescales. 

(This is more of a physicist's/dynamicist's interpretation. There is another interpretation based on minimal cuts through graphs, discussed in the references, which appears better suited to image detection/clustering scenarios.)

**_Laplacian construction_**: Need to make some choices to build something resembling $L=D-S$.  
First construct diagonal degree matrix $D$ by summing the columns of similarity matrix $S$: $\:\:d_{jj} = \sum_i s_{ij} $.  
Possible Laplacian choices (some are outlined in von Luxburg, 2007): 
1. $L=D-S\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:$ (standard, unnormalized)
2. $L=I-SD^{-1}\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:$  (normalized)
3. $L=I-D^{-1/2}SD^{-1/2}\:\:\:\:\:\:$ (normalized, symmetrized)
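
The three constructions can be sketched as follows, using made-up data and a Gaussian similarity with $\sigma=1$ (both assumptions). The products are matrix products (`@`), not elementwise. Since $I-SD^{-1}=D^{1/2}\,(I-D^{-1/2}SD^{-1/2})\,D^{-1/2}$, the two normalized forms are similar matrices and should share eigenvalues, which the last lines check.

```python
import numpy as np

rng = np.random.default_rng(1)
pts = rng.normal(size=(8, 2))  # hypothetical data, one sample per row
sq_dist = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
S = np.exp(-sq_dist / 2.0)     # Gaussian similarity with sigma = 1

d = S.sum(axis=0)              # degrees d_jj = sum_i s_ij
D = np.diag(d)
D_inv = np.diag(1.0 / d)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
eye = np.eye(len(d))

L_unnorm = D - S                           # 1. standard, unnormalized
L_rw = eye - S @ D_inv                     # 2. normalized (columns sum to zero)
L_sym = eye - D_inv_sqrt @ S @ D_inv_sqrt  # 3. normalized, symmetrized

# the two normalized Laplacians are similar matrices, so their spectra agree
ev_rw = np.sort(np.linalg.eigvals(L_rw).real)
ev_sym = np.linalg.eigvalsh(L_sym)
print(np.allclose(ev_rw, ev_sym))
```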


#### Step 2: Find eigendecomposition of the Laplacian
- normalize eigenvectors to sum to one; recall spectral structure of Laplacian
- this step is otherwise standard


#### Step 3: (SPECTRAL EMBEDDING) Use eigendecomposition to construct a low-dimension embedding 
...
??????
...  
...  

#### Step 4: (SPECTRAL CLUSTERING) Use low-dimension embedding to choose clusters
??????

#### Output
- ???????? 
- ???????? maybe remove
- ???????? 
- locally optimal embedding (k-dim representation) $Y\in\mathbb{R}^{k\,\times\,p}$
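
For reference, one common recipe for going from the eigendecomposition to an embedding and clusters follows Ng et al. (2002, see Resources): take the eigenvectors for the $k$ smallest eigenvalues of the symmetrized Laplacian, normalize each row to unit length, then run k-means on the rows. The sketch below is only one possible way to fill in Steps 3 and 4, not necessarily the procedure intended in these notes. The helper name `spectral_embed_and_cluster`, the blob centers and `sigma=1.0` are assumptions, and the embedding is returned as $p\times k$ (rows are samples), i.e. the transpose of the $Y$ above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances

def spectral_embed_and_cluster(X, k, sigma=1.0, seed=0):
    """Hypothetical helper: Gaussian similarity -> L_sym -> k smallest eigenvectors -> k-means."""
    dist = pairwise_distances(X, metric="euclidean")
    S = np.exp(-dist**2 / (2 * sigma**2))

    # L_sym = I - D^{-1/2} S D^{-1/2}; scaling rows and columns is the same product
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=0))
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]

    eigvals, eigvecs = np.linalg.eigh(L_sym)           # eigenvalues in ascending order
    Y = eigvecs[:, :k]                                 # p x k embedding
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)   # row normalization (Ng et al., 2002)

    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Y)
    return Y, labels, eigvals

# Hypothetical test data: three well-separated blobs in R^2
X_demo, y_true = make_blobs(n_samples=60, centers=[[-10, 0], [0, 10], [10, 0]],
                            cluster_std=0.5, random_state=0)
Y_embed, labels, eigvals = spectral_embed_and_cluster(X_demo, k=3)
print(Y_embed.shape, np.bincount(labels))
```

With blobs this well separated, the similarity graph is nearly disconnected, so the first three eigenvalues are close to zero and each blob collapses to a single point in the embedding.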

#### Notes
- when the similarity graph is not fully connected, the multiplicity of the zero eigenvalue gives an estimate of the number of clusters

#### Complexity (according to scikit-learn)
- Spectral Embedding is less than $O\left(p^3\right)$  (where $p$ is the number of $\mathbb{R}^N$ data points)
- suggests kNN approach typically faster than diffusion kernel approach, seems counterintuitive
- compare vs. PCA $\approx O(p^2)$ and MDS $\approx O(p^3)$

### Example 1: Walkthrough + basic idea to identify graph clusters <a id='linkExample1'></a>

In [185]:
%matplotlib notebook
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import networkx as nx
import numpy as np
import scipy.linalg
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances

p = 15     # num samples
n = 3      # original dimension R^n
k = 3      # number clusters

# generate k blobs in n-dim
X, Y = make_blobs(n_samples=p, centers=k, n_features=n, shuffle=False, random_state=40)
print("X data samples shape:", X.shape)
print("Y data labels shape:", Y.shape)

# visualize data in R^n
fig = plt.figure()
if n == 2:
    for label in set(Y):
        mask = Y==label
        plt.scatter(X[:,0][mask], X[:,1][mask], marker = '.', label = label, s=120, alpha=0.7)
    for i in range(p):
        plt.text(X[i,0],X[i,1], i, fontsize=9)
if n == 3:
    ax = fig.add_subplot(111, projection='3d')
    for label in set(Y):
        mask = Y==label
        ax.scatter(X[:,0][mask], X[:,1][mask], zs=X[:,2][mask], zdir='z', s=20, c=None, depthshade=True, label=label)
    for i in range(p):
        ax.text(X[i,0],X[i,1],X[i,2], i, fontsize=9)
plt.legend()
plt.show()

X data samples shape: (15, 3)
Y data labels shape: (15,)



In [183]:
# compute data distances
distances = pairwise_distances(X, metric='euclidean')

# convert to similarity matrix
sigma = 1
S = np.exp(-distances**2 / (2*sigma**2))  # Gaussian kernel: the 2*sigma^2 belongs inside the exponent

# construct degree matrix 
D = np.diag(np.sum(S, axis=0))
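# D is diagonal, so invert / take the square root entrywise (avoids complex output from sqrtm)
D_inv = np.diag(1.0 / np.diag(D))
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))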

# construct graph laplacian L = D - S and analogs
L_1 = D - S
L_2 = np.eye(p) - S @ D_inv                      # matrix product, not elementwise *
L_3 = np.eye(p) - D_inv_sqrt @ S @ D_inv_sqrt

print(np.eye(2))
# show laplacians
fig = plt.figure()
plt.imshow(L_1, cmap=plt.get_cmap('PiYG'))
fig = plt.figure()
plt.imshow(L_2, cmap=plt.get_cmap('PiYG'))
fig = plt.figure()
plt.imshow(L_3, cmap=plt.get_cmap('PiYG'))


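# eigendecomposition of each, sorted by ascending eigenvalue
# L_1 and L_3 are symmetric, so eigh returns real, sorted eigenpairs
E_1, V_1 = np.linalg.eigh(L_1)
E_3, V_3 = np.linalg.eigh(L_3)
# L_2 is not symmetric: use eig, then sort by the real part of the eigenvalues
E_2, V_2 = np.linalg.eig(L_2)
idx = np.argsort(E_2.real)
E_2, V_2 = E_2[idx].real, V_2[:, idx].real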

# normalization step
# ... TODO
print(V_1.shape)
print(E_1.shape)

# plotting direct
fig = plt.figure()
plt.plot(V_1[:,0], '-o')
plt.plot(V_1[:,1], '--o')
plt.plot(V_1[:,2], 'o')
plt.plot(V_1[:,3], 'o')
plt.plot(V_1[:,4], 'o')
plt.show()
fig = plt.figure()
plt.plot(V_1[0,:])
plt.plot(V_1[1,:])
plt.plot(V_1[2,:])
plt.plot(V_1[3,:])
plt.plot(V_1[4,:])
plt.show()

# r-dim embedding:
r = 2
embedding_1 = V_1[:,0:r]
embedding_2 = V_2[:,0:r]
embedding_3 = V_3[:,0:r]



fig = plt.figure()
for i in range(p):
    plt.plot(embedding_1[i,0], embedding_1[i,1], 'o')
    plt.text(embedding_1[i,0], embedding_1[i,1], i)

[[1. 0.]
 [0. 1.]]


(15, 15)
(15,)



In [151]:
# compare to scikit-learn
from sklearn import cluster, manifold
import time

colors = np.array([x for x in 'bgrcmykbgrcmykbgrcmykbgrcmyk'])
colors = np.hstack([colors] * 20)

spectral_cluster = cluster.SpectralClustering(n_clusters=2, eigen_solver='arpack', affinity="nearest_neighbors")

# predict cluster memberships
t0 = time.time()
spectral_cluster.fit(X)
t1 = time.time()
if hasattr(spectral_cluster, 'labels_'):
    y_pred = spectral_cluster.labels_.astype(int)  # np.int was removed in NumPy 1.24
else:
    y_pred = spectral_cluster.predict(X)

# plot
if n == 2:
    plt.scatter(X[:, 0], X[:, 1], color=colors[y_pred].tolist(), s=10)
    plt.text(.99, .01, ('%.2fs' % (t1 - t0)).lstrip('0'),
             transform=plt.gca().transAxes, size=15,
             horizontalalignment='right')
if n == 3:
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.scatter(X[:, 0], X[:, 1], zs=X[:, 2], color=colors[y_pred].tolist(), s=10)
plt.title('Spectral scikit', size=18)





Text(0.5,0.92,'Spectral scikit')

In [165]:
# compare vs embedding
spectral_embed = manifold.SpectralEmbedding(n_components=r, eigen_solver='arpack', affinity="rbf", gamma=1)

# predict cluster memberships
t0 = time.time()
X_transformed = spectral_embed.fit_transform(X)
t1 = time.time()
if hasattr(spectral_embed, 'labels_'):
    y_pred = spectral_embed.labels_.astype(int)

# plot
fig = plt.figure()
print(X_transformed.shape)
if r == 2:
    plt.plot(X_transformed[:, 0], X_transformed[:, 1], '--o', alpha=0.2)
    for i in range(p):
        plt.text(X_transformed[i,0], X_transformed[i,1], i, fontsize=9)
if r == 3:
    ax = fig.add_subplot(111, projection='3d')
    ax.plot(X_transformed[:, 0], X_transformed[:, 1], zs=X_transformed[:, 2])
    for i in range(p):
        ax.text(X_transformed[i,0], X_transformed[i,1], X_transformed[i,2], i, fontsize=9)
plt.title('Spectral embedding scikit', size=18)
plt.show()

(150, 2)


### Example 2: Community structure in cell type data <a id='linkExample2'></a>

In [None]:
# load memory data + simulation data
celltypes, genes, labels = None, None, None  # placeholder: data loading not yet implemented
simulation_data = None                       # placeholder

# construct distance matrix from celltypes

# construct similarity matrix from celltypes

# plot

In [20]:
# import here so this cell runs on its own (avoids NameError on `cluster` / `manifold`)
from sklearn import cluster, manifold

# spectral clustering
spectral_cluster = cluster.SpectralClustering(n_clusters=2, eigen_solver='arpack', affinity="rbf", gamma=2.0)

# spectral embedding
spectral_embed = manifold.SpectralEmbedding(n_components=2, eigen_solver='arpack', affinity="rbf", gamma=2.0)

### Resources <a id='linkResources'></a>
- On Spectral Clustering: Analysis and an Algorithm (Ng, 2002, NIPS)
- A Tutorial on Spectral Clustering (von Luxburg, 2007)
- Community Detection in Graphs, review (Fortunato, 2010, arXiv)
- Detecting Communities in Large Networks (Capocci, 2005, Physica A)

#### scikit-learn links
- https://scikit-learn.org/stable/modules/clustering.html#spectral-clustering
- https://scikit-learn.org/stable/modules/generated/sklearn.cluster.SpectralClustering.html#sklearn.cluster.SpectralClustering
- https://scikit-learn.org/stable/modules/manifold.html#spectral-embedding
- https://scikit-learn.org/stable/modules/generated/sklearn.manifold.SpectralEmbedding.html#sklearn.manifold.SpectralEmbedding