## Formula Sheet for Referencing

In [4]:
# Library Import
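import numpy as np
import scipy.sparse     # `import scipy` alone may not load the sparse submodule
import scipy.linalg
import sklearn.metrics  # provides sklearn.metrics.pairwise.pairwise_distances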

## Adjacency Matrix, A

Refer to Lecture 02: Graph Science, Slide 19. 

* data: Vector of edge weights, one value per edge $(i, j)$. Binary weights: $A_{ij} \in \{0, 1\}$. Real-valued weights: $A_{ij} \in [0, 1]$, with weights normalized to the range $[0, 1]$
* row_idx: Vector holding the source node index i of each edge, i.e. the row index of A
* col_idx: Vector holding the target node index j of each edge, i.e. the column index of A
* shape: Shape of A = (n, m) where n = m = |V| = number of vertices

In [None]:
# Adjacency Matrix, A
A = scipy.sparse.csr_matrix((data, (row_idx, col_idx)), shape=(n, m))  # Size of A is |V| x |V|
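
The cell above leaves `data`, `row_idx`, `col_idx`, and `n` undefined. As a minimal sketch, assuming a hypothetical undirected 4-node graph (the edge list below is illustrative and not taken from the lecture), the three vectors could be built like this:

```python
import numpy as np
import scipy.sparse

# Hypothetical undirected graph on 4 nodes (illustrative edge list)
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Store every undirected edge in both directions so that A is symmetric
row_idx = np.array([i for i, j in edges] + [j for i, j in edges])
col_idx = np.array([j for i, j in edges] + [i for i, j in edges])
data = np.ones(len(row_idx))  # binary weights: one entry per stored edge

A = scipy.sparse.csr_matrix((data, (row_idx, col_idx)), shape=(n, n))
A_dense = A.toarray()  # dense copy, only sensible for small graphs
print(A_dense)
```

Storing each edge twice is a design choice for undirected graphs; for a directed graph only the (source, target) direction would be stored.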

## Degree Matrix, D

Refer to Lecture 02: Graph Science, Slide 30

* D = $diag(d_{1},...,d_{n})$ <- Diagonal matrix of size n x n where n = |V| = number of vertices
* Value of $d_{i} = \sum_{j}A_{ij}$ <- Sum of all values in row i of the adjacency matrix A

In [2]:
# Degree Matrix, D
# D = diag(d_1, ..., d_n) where d_i = summation of row A_ij

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
d = np.sum(A, axis=1)
D = np.diag(d)
print(D)

[[ 6  0  0]
 [ 0 15  0]
 [ 0  0 24]]


## Un-normalized Graph Laplacian, L

Refer to Lecture 02: Graph Science, Slide 30

* $L_{un} = D - A$ where D = Degree Matrix, A = Adjacency Matrix

In [None]:
L_un = D - A
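
A small worked case may help here. The sketch below assumes a hypothetical symmetric, binary 4-node adjacency matrix (not from the lecture) and checks that every row of $L_{un}$ sums to zero, which follows from $d_i = \sum_j A_{ij}$:

```python
import numpy as np

# Hypothetical symmetric binary adjacency matrix of a 4-node graph
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

d = A.sum(axis=1)  # node degrees d_i
D = np.diag(d)     # degree matrix
L_un = D - A       # un-normalized graph Laplacian

# Row sums vanish because the diagonal entry d_i cancels the row sum of A
row_sums = L_un.sum(axis=1)
print(L_un)
print(row_sums)
```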

## Normalized Graph Laplacian, L

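Refer to Lecture 02: Graph Science, Slide 30

* L = $D^{-1/2}L_{un}D^{-1/2} = I_{n} - D^{-1/2}AD^{-1/2}$ where $I_{n}$ = identity matrix of size n, D = degree matrix, A = adjacency matrix

In [None]:
I = np.eye(n)  # Identity matrix of size n x n
# D ** -1/2 would compute (1 / D) / 2 elementwise, so build D^{-1/2} from the degrees instead
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))  # Assumes every degree d_i > 0
L = I - D_inv_sqrt.dot(A.dot(D_inv_sqrt))  # .dot = matrix product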
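
The two expressions for the normalized Laplacian can be checked numerically. This is a sketch on a hypothetical 4-node graph in which every node has non-zero degree; an isolated node would cause a division by zero and needs separate handling:

```python
import numpy as np

# Hypothetical symmetric binary adjacency matrix (no isolated nodes)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = A.shape[0]

d = A.sum(axis=1)
D = np.diag(d)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^{-1/2}, valid only if all d_i > 0

L_from_unnormalized = D_inv_sqrt @ (D - A) @ D_inv_sqrt
L_from_identity = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt

# Both forms should agree up to floating-point round-off
forms_agree = np.allclose(L_from_unnormalized, L_from_identity)
print(forms_agree)
```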

## Graph Spectrum

Refer to Lecture 02: Graph Science, Slide 31

* L = $U\Lambda U^{T} \in \R^{n\times n}$
* U = $[u_{1},...,u_{n}] \in \R^{n\times n}$ where $u_{k}$ = Laplacian eigenvectors
* $U^{T}U = I_{n}$ where $\langle u_{k},u_{k'} \rangle = 
\begin{cases}
1 & \quad \text{if } k = k'\\ 
0 & \quad \text{otherwise}
\end{cases}$
* $\Lambda = diag(\lambda_{1},...,\lambda_{n}) \in \R^{n\times n}$ where $0 = \lambda_{1} \leq \lambda_{2} \leq ... \leq \lambda_{n} \leq 2$ for the normalized Laplacian and $\lambda_{k}$ = Laplacian eigenvalues

In [None]:
L = U.dot(lamb.dot(U.T))  # lamb = Lambda, the diagonal matrix of eigenvalues
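
The cell above reconstructs L from U and Lambda but does not show how to obtain them. One possible route, sketched here on a hypothetical connected 4-node graph, is `np.linalg.eigh`, which is intended for symmetric matrices and returns eigenvalues in ascending order:

```python
import numpy as np

# Hypothetical connected graph; L is its normalized Laplacian
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = A.shape[0]
d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt

# eigvals is a 1-D array of eigenvalues; the columns of U are the eigenvectors
eigvals, U = np.linalg.eigh(L)
lamb = np.diag(eigvals)  # Lambda as a diagonal matrix

orthonormal = np.allclose(U.T @ U, np.eye(n))   # U^T U = I_n
reconstructs = np.allclose(U @ lamb @ U.T, L)   # L = U Lambda U^T
print(eigvals)
```

For large sparse graphs a sparse solver such as `scipy.sparse.linalg.eigsh` is usually preferable, since it computes only a few eigenpairs.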

## Euclidean Distance

Refer to Lecture 02: Graph Science, Slide 38

* $d_{l_2}(x_i,x_j)$ = $||x_{i} - x_{j}||_{2} = \sqrt{\sum_{m=1}^{d}{|x_{i,m} - x_{j,m}|^{2}}}$

In [None]:
# Using scikit-learn; X is the data matrix with one data point per row
D = sklearn.metrics.pairwise.pairwise_distances(X, metric='euclidean', n_jobs=1)
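
To see the formula at work without scikit-learn, the pairwise distances can also be computed directly with NumPy broadcasting. The data below is a hypothetical set of three 2-D points chosen so the distances are easy to verify by hand:

```python
import numpy as np

# Hypothetical data: 3 points in 2 dimensions (one data point per row)
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])

# diff[i, j, :] = x_i - x_j, shape (3, 3, 2)
diff = X[:, None, :] - X[None, :, :]

# Square, sum over the feature axis, take the square root
dist = np.sqrt((diff ** 2).sum(axis=-1))
print(dist)
# [[ 0.  5. 10.]
#  [ 5.  0.  5.]
#  [10.  5.  0.]]
```

The broadcasting version allocates an n x n x d array, so it is only a teaching aid for small inputs.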

## Cosine Distance

Refer to Lecture 02: Graph Science, Slide 38

* $d_{cos}(x_i, x_j)$ = $\left| \cos^{-1}\left(\dfrac{\langle x_i, x_j \rangle}{||x_i||_2 ||x_j||_2}\right) \right|$ = $|\theta_{ij}|$

In [None]:
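# Cosine similarity of vectors v and w (no absolute value here, otherwise angles above pi/2 are lost)
cos_sim = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

# Cosine Distance using arccos <- This is mainly used in Lab Tutorials
D = np.abs(np.arccos(np.clip(cos_sim, -1.0, 1.0)))  # clip guards against round-off outside [-1, 1]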
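
A quick sanity check of the formula on hand-picked vectors, as a sketch: `cosine_distance` is a hypothetical helper name, and the three vectors are chosen so the expected angles are 0, $\pi/2$, and $\pi$.

```python
import numpy as np

def cosine_distance(a, b):
    """Angle between a and b in radians, following d_cos = |arccos(cos_sim)|."""
    cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip guards against round-off pushing the value outside [-1, 1]
    return np.abs(np.arccos(np.clip(cos_sim, -1.0, 1.0)))

v = np.array([1.0, 0.0])
w = np.array([0.0, 2.0])   # orthogonal to v
u = np.array([-3.0, 0.0])  # points opposite to v

theta_vv = cosine_distance(v, v)  # same direction
theta_vw = cosine_distance(v, w)  # right angle
theta_vu = cosine_distance(v, u)  # opposite direction
print(theta_vv, theta_vw, theta_vu)
```

Note that the result depends only on direction, not on vector length, which is why `w` and `u` can have norms different from `v`.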

## Standard Pre-Processing

Refer to Lecture 02: Graph Science, Slide 40

There are 4 types of preprocessing techniques:
* Center data along feature dimension a.k.a zero-mean property = $x_i \leftarrow x_i - mean(\{x_i\}) \in \R^d$

* Normalize data variance along feature dimension a.k.a z-scoring property = $x_i \leftarrow \dfrac{x_i}{std(\{x_i\})} \in \R^d, std(\{x_i\})^2 = mean(\{(x_i - mean(\{x_i\}))^2\})$

* Project data on L2-sphere along feature dimension or data dimension = $x_i \leftarrow \dfrac{x_i}{||x_i||_2} \in \R^d$

* Normalize max and min of feature value = $x_i \leftarrow \dfrac{x_i - min(\{x_i\})}{max(\{x_i\}) - min(\{x_i\})} \in [0, 1]^d$

In [None]:
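# x is the data matrix of shape (n, d): one data point per row, one feature per column

# Center data along feature dimension
x = x - np.mean(x, axis=0)

# Normalize data variance along feature dimension
x = x / np.std(x, axis=0)  # From NumPy definition: np.std = sqrt(mean(abs(x - x.mean()) ** 2)), here per feature

# Project data on L2-sphere (each data point, i.e. each row, gets unit L2 norm)
x = x / scipy.linalg.norm(x, 2, axis=1, keepdims=True)

# Normalize max and min of feature value
x = (x - np.min(x, axis=0)) / (np.max(x, axis=0) - np.min(x, axis=0))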
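
The four techniques can be compared side by side on a toy dataset. This is a sketch assuming a hypothetical 3 x 2 data matrix with one data point per row; each result is stored under its own name so the steps do not overwrite each other, and the z-scoring step is applied to the centered data:

```python
import numpy as np

# Hypothetical data matrix: 3 data points (rows), 2 features (columns)
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

# 1. Zero mean per feature
X_centered = X - X.mean(axis=0)

# 2. Unit standard deviation per feature (applied to the centered data)
X_zscored = X_centered / X.std(axis=0)

# 3. Unit L2 norm per data point (projection on the L2-sphere)
X_sphere = X / np.linalg.norm(X, axis=1, keepdims=True)

# 4. Rescale every feature to the range [0, 1]
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(X_centered.mean(axis=0))
print(X_zscored.std(axis=0))
print(np.linalg.norm(X_sphere, axis=1))
print(X_minmax)
```

Steps 2 and 4 divide by zero when a feature is constant, so real data may need a small epsilon or a check for zero spread.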
