In [1]:
import numpy as np

# Description of PCA
---


**The principle**

We have an input data matrix $X \in \mathbb{R}^{m \times p}$:
* each of the $m$ rows is a sample
* each of the $p$ columns is a feature

We look for a transformation $W \in \mathbb{R}^{p \times l}$, where $l$ is typically lower than $p$:
* The new data will be $T = XW \in \mathbb{R}^{m \times l}$
* The new features will be ordered by decreasing variance

We can see this matrix $W$ as a set of $l$ column vectors $w_j$ of dimension $p$. Feature $j$ of sample $i$ is obtained by taking the dot product of the sample $x_i$ (row $i$ of $X$) with $w_j$: $T_{i,j} = x_i w_j$.
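
To make the shapes concrete, here is a minimal sketch of $T = XW$ and of the entry-wise formula $T_{i,j} = x_i w_j$. The sizes and the random `W` are illustrative assumptions only: this `W` is an arbitrary matrix, not a PCA solution.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, l = 5, 3, 2                 # illustrative sizes (assumption)

X = rng.normal(size=(m, p))       # m samples, p features
W = rng.normal(size=(p, l))       # arbitrary W, only used to check shapes
T = X @ W                         # new data, one row per sample

# A single entry is the dot product of sample i with column j of W
i, j = 2, 1
t_ij = X[i] @ W[:, j]

print(T.shape)
print(np.isclose(T[i, j], t_ij))
```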


**How we proceed**

The variance along axis $k$ is, up to a constant factor $1/m$, $\sum_{i=1}^m (t_{i,k} - \bar{t}_k)^2$ where:

* $i$ runs through all the samples
* $\bar{t}_k$ is the mean of $t_{i,k}$ over all samples $i$

We can rework this expression, assuming the data is centered at zero (so that $\bar{t}_k = 0$):

$\sum_{i=1}^m (t_{i,k})^2 = \sum_{i=1}^m (x_i w_k)^2 = ||X w_k||^2 = w_k^T X^T X w_k$.

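For a unit-norm $w_k$ (that is, $w_k^T w_k = 1$), this is the Rayleigh quotient of $X^T X$. To maximize it, we compute the eigenvalues and eigenvectors of the matrix $X^T X$ and order them from the largest to the smallest eigenvalue: the maximum equals the largest eigenvalue and is reached at the corresponding eigenvector.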

We then pick the first $l < p$ eigenvectors (each of size $p$) as the columns of the matrix $W \in \mathbb{R}^{p \times l}$.
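
The maximization claim can be checked numerically. The sketch below uses synthetic random data (an assumption, chosen only for illustration) and `np.linalg.eigh`, which is suited to symmetric matrices such as $X^T X$ and returns eigenvalues in ascending order. It compares $w^T X^T X w$ for the top eigenvector against many random unit vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X = X - X.mean(axis=0)            # center each feature at zero

S = X.T @ X                       # symmetric p x p matrix
eigen_values, eigen_vectors = np.linalg.eigh(S)   # ascending eigenvalues
w_top = eigen_vectors[:, -1]      # unit eigenvector of the largest eigenvalue
best = w_top @ S @ w_top          # value of w^T X^T X w at w_top

# Evaluate the same quantity on 1000 random unit vectors
candidates = rng.normal(size=(1000, 4))
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)
values = np.sum((candidates @ S) * candidates, axis=1)

print(np.isclose(best, eigen_values[-1]))   # value equals the largest eigenvalue
print(values.max() <= best + 1e-9)          # no random direction does better
```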


**Important notes**

* the covariance only captures linear (not even affine) relationships (we will see an example later)
* each feature must be centered at zero individually for the method to work
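
The centering requirement can be illustrated with a small sketch. The helper `top_direction` and the offset of `(10, 10)` are hypothetical choices made for this illustration. The points spread along the direction $(1, -2)$, but without centering the leading eigenvector of $X^T X$ mostly points toward the mean of the data instead.

```python
import numpy as np

def top_direction(X):
    """Return the unit eigenvector of X.T @ X with the largest eigenvalue."""
    eigen_values, eigen_vectors = np.linalg.eigh(X.T @ X)  # ascending order
    return eigen_vectors[:, -1]

# Points spread along the direction (1, -2), then shifted away from the origin
direction = np.array([1., -2.]) / np.sqrt(5)
X_raw = np.array([[-1., 2.], [0., 0.], [1., -2.]]) + np.array([10., 10.])

w_raw = top_direction(X_raw)
w_centered = top_direction(X_raw - X_raw.mean(axis=0))

# Absolute cosine with the true direction of spread
# (absolute value because the sign of an eigenvector is arbitrary)
print("without centering:", abs(w_raw @ direction))
print("with centering:   ", abs(w_centered @ direction))
```

With centering, the recovered direction matches the direction of spread; without it, the two are far from aligned.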

# Simple example (workable by hand)
---

The notations here:

* $X \in \mathbb{R}^{m \times p}$ represents the data set
* each of the $m$ rows of $X$ is a sample
* each of the $p$ columns of $X$ is a feature

In [53]:
# 3 samples, with clearly correlated features
X = np.array([
    [-1., 2.],
    [ 0., 0.],
    [ 1., -2.]])

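# X is already centered, so X.T @ X is the covariance matrix
# up to a constant normalization factor
print("Covariance matrix:")
cov = X.T @ X
print(cov)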

# Example of an eigenvector, worked by hand
# Solve the 2 equations with two unknowns (a is the eigenvalue):
#   2x - 4y = a * x
#  -4x + 8y = a * y
# The second row is -2 times the first, so: a * (y + 2x) = 0
# Which implies that: y = -2x (or else the eigenvalue a is zero)
v = np.array([1, -2]) / np.sqrt(5)
print("* eigen vector:", v)
print("* eigen value:", (cov @ v) / v)

# We can use NumPy to find the eigenvalues and eigenvectors
eigen_values, eigen_vectors = np.linalg.eig(cov)

# Keep only the eigenvector with the largest eigenvalue
descending_indices = np.argsort(eigen_values)[::-1]
W = eigen_vectors[:, descending_indices[:1]]
print("\nTransformation:")
print(W)

# Project the data onto the selected component
print("\nNew data:")
T = X @ W
print(T)

Covariance matrix:
[[ 2. -4.]
 [-4.  8.]]
* eigen vector: [ 0.4472136  -0.89442719]
* eigen value: [10. 10.]

Transformation:
[[ 0.4472136 ]
 [-0.89442719]]

New data:
[[-2.23606798]
 [ 0.        ]
 [ 2.23606798]]
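
As a cross-check, the same component can be obtained from the singular value decomposition of $X$, a different route from the eigendecomposition used above. In this sketch the rows of `Vt` are the eigenvectors of $X^T X$ and the squared singular values are its eigenvalues. The names `W_svd` and `T_svd` are introduced here for illustration, and the sign of the component may differ from the one printed above.

```python
import numpy as np

X = np.array([
    [-1., 2.],
    [ 0., 0.],
    [ 1., -2.]])

# Thin SVD: X = U @ diag(singular_values) @ Vt, singular values in descending order
U, singular_values, Vt = np.linalg.svd(X, full_matrices=False)

W_svd = Vt[:1].T          # first right singular vector as a (p, 1) column
T_svd = X @ W_svd         # projected data, defined up to a global sign

print("Squared singular values:", singular_values ** 2)
print("Absolute projected data:", np.abs(T_svd).ravel())
```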


# Kernel PCA
---

# Auto-encoders
---