In [1]:
import random
import numpy as np
import pandas as pd
import sklearn as skl

## Matrix Decompositions

### Spectral Decomposition

Given a **symmetric matrix** $A$, we can decompose it into the form $A = P\Lambda P^T$, where $P$ is an orthogonal matrix whose columns are **eigenvectors** of $A$, and $\Lambda$ is a diagonal matrix whose diagonal entries are the corresponding **eigenvalues**, conventionally listed in descending order from left to right.  
This is often called the **spectral decomposition**.

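Note: symmetry is required for this form, not coincidental. The spectral theorem guarantees that a real symmetric matrix has real eigenvalues and a full set of orthonormal eigenvectors, which is what gives $P^{-1} = P^T$. A non-symmetric square matrix may still be diagonalizable as $A = PDP^{-1}$, but $P$ is generally not orthogonal and the eigenvalues can be complex.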

All **symmetric** matrices are square; suppose $A$ is $n \times n$.
Then $P$ and $\Lambda$ are $n \times n$.
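
As a quick numerical check of the decomposition, here is a minimal sketch using `np.linalg.eigh`, NumPy's routine for symmetric matrices. The matrix `A` is made up purely for illustration. `eigh` returns eigenvalues in ascending order, so the sketch reorders them to match the descending convention used here.

```python
import numpy as np

# Hypothetical 3x3 symmetric matrix, chosen only for illustration.
A = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.0],
              [2.0, 0.0, 5.0]])

# eigh is meant for symmetric matrices; eigenvalues come back in ascending order.
eigenvalues, P = np.linalg.eigh(A)

# Reorder to descending, keeping each eigenvector column paired with its eigenvalue.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
P = P[:, order]
Lambda = np.diag(eigenvalues)

# Rebuild A from its factors and compare with the original.
A_rebuilt = P @ Lambda @ P.T
print(np.allclose(A, A_rebuilt))
```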

### Eigenvectors and Eigenvalues

The **eigenvectors** and **eigenvalues** have some useful properties.

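**eigenvector** properties (for symmetric $A$):
- All **eigenvectors** $\vec{e}$ can be scaled to unit length, and numerical routines typically return them normalized
- **Eigenvectors** belonging to distinct **eigenvalues** are mutually orthogonal, and an orthogonal set can always be chosen when eigenvalues repeat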
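
Both properties can be checked numerically. The sketch below assumes a symmetric matrix built from seeded random numbers (the construction `B + B.T` is just a convenient way to get one) and verifies that the eigenvector columns have unit length and zero pairwise dot products.

```python
import numpy as np

# Hypothetical symmetric matrix: B + B^T is symmetric for any square B.
rng = np.random.default_rng(0)
B = rng.random((4, 4))
A = B + B.T

_, P = np.linalg.eigh(A)

# Normalized: every column of P has Euclidean norm 1.
column_norms = np.linalg.norm(P, axis=0)

# Orthogonal: P^T P holds all pairwise dot products, so it should be the identity.
gram = P.T @ P

print(np.allclose(column_norms, 1.0))
print(np.allclose(gram, np.eye(4)))
```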

When $A$ is a covariance matrix, the sum of all eigenvalues equals its trace, which is the total variance across the variables.
Dividing an eigenvalue by that sum gives the proportion of total variance explained by the corresponding component, i.e. its "importance."
We refer to the eigenvectors that make up the columns of $P$ as components.  
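
A small sketch of the variance bookkeeping, assuming the symmetric matrix in question is a covariance matrix. The data here are seeded random numbers invented for the example, and the name `importance` is just an illustrative label.

```python
import numpy as np

# Hypothetical data: 50 observations of 4 variables.
rng = np.random.default_rng(1)
data = rng.random((50, 4)) * 10

cov = np.cov(data, rowvar=False)              # 4x4 covariance matrix
eigenvalues = np.linalg.eigvalsh(cov)[::-1]   # eigvalsh is ascending; reverse it

# The trace of the covariance matrix is the sum of the per-variable variances.
total_variance = np.trace(cov)

# Share of the total variance attributed to each component.
importance = eigenvalues / eigenvalues.sum()

print(np.isclose(eigenvalues.sum(), total_variance))
print(importance)
```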

### Singular Value Decomposition

Similar to the spectral decomposition, we can decompose a general matrix $A$ with dimensions $n \times p$.
That is, $A$ need not be symmetric (or positive definite), or even square.
In this case, we have $A = P \Lambda Q^T$, where $P$ ($n \times n$) and $Q$ ($p \times p$) are orthogonal but not necessarily equal, and the diagonal of $\Lambda$ ($n \times p$) holds the **singular values** rather than eigenvalues.
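
To make the shapes concrete, here is a minimal sketch with `np.linalg.svd` on a made-up non-square matrix. NumPy returns the diagonal of $\Lambda$ as a 1-D array and the third factor already transposed, so the sketch rebuilds the full $n \times p$ matrix before multiplying the factors back together.

```python
import numpy as np

# Hypothetical non-square matrix with n = 6 rows and p = 4 columns.
rng = np.random.default_rng(2)
n, p = 6, 4
A = rng.random((n, p))

# full_matrices=True gives P as n x n and Q^T as p x p.
P, singular_values, Qt = np.linalg.svd(A, full_matrices=True)

# Place the 1-D singular values on the diagonal of an n x p matrix.
k = min(n, p)
Lambda = np.zeros((n, p))
Lambda[:k, :k] = np.diag(singular_values)

print(P.shape, Lambda.shape, Qt.shape)
print(np.allclose(A, P @ Lambda @ Qt))
```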

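However, the two decompositions are closely related. If we center the columns of $A$ and compute its covariance matrix, the eigenvectors of that symmetric matrix are the columns of $Q$ from the singular value decomposition of the centered $A$, and its eigenvalues are the squared singular values divided by $n - 1$. The same holds for the correlation matrix if the columns are also scaled to unit variance first.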
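
Here is a small numerical check of how the two decompositions relate, assuming the columns of $A$ are centered before the SVD. The data are seeded random numbers invented for the example. Eigenvectors are only defined up to sign, so the comparison uses absolute values.

```python
import numpy as np

# Hypothetical data: 20 observations of 3 variables.
rng = np.random.default_rng(3)
A = rng.random((20, 3)) * 10
n = A.shape[0]

# SVD of the column-centered matrix.
centered = A - A.mean(axis=0)
_, singular_values, Qt = np.linalg.svd(centered, full_matrices=False)

# Eigendecomposition of the covariance matrix, reordered to descending.
cov = np.cov(A, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)
eigenvalues = eigenvalues[::-1]
eigenvectors = eigenvectors[:, ::-1]

# Eigenvalues are the squared singular values divided by n - 1.
print(np.allclose(eigenvalues, singular_values**2 / (n - 1)))

# Eigenvectors match the right singular vectors up to sign.
print(np.allclose(np.abs(eigenvectors), np.abs(Qt.T)))
```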

So, after performing SVD, we have $\Lambda$ with **singular values** in descending order.
The first component, then, is the most "important."
It best represents the variability within the original matrix $A$.
We can then take the first column of $P$, $\vec{p_1}$, the first singular value $\sigma_1$, and the first column of $Q$ written as a row vector, $\vec{q_1}^T$.
Then $\sigma_1 \vec{p_1} \vec{q_1}^T$ is the best rank-1 approximation of the original $A$. For a symmetric positive semidefinite $A$ (such as a covariance or correlation matrix), this reduces to $\lambda_1 \vec{e_1} \vec{e_1}^T$, with $\lambda_1$ the first **eigenvalue** and $\vec{e_1}$ the first **eigenvector**.

We can instead sum the first two such terms of the decomposition and get a better approximation of $A$, and so forth.
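
The effect of adding components can be sketched as follows. The matrix is seeded random data invented for the example, and `rank_k_approximation` is a hypothetical helper name, not a NumPy function. The error is measured with the Frobenius norm, which is what `np.linalg.norm` computes by default for a 2-D array.

```python
import numpy as np

# Hypothetical 5x5 matrix of random values.
rng = np.random.default_rng(4)
A = rng.random((5, 5)) * 10

U, s, Vt = np.linalg.svd(A)

def rank_k_approximation(k):
    """Hypothetical helper: rebuild A from only its first k components."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Approximation error for k = 1, ..., 5; it shrinks as components are added.
errors = [np.linalg.norm(A - rank_k_approximation(k)) for k in range(1, 6)]
print(errors)
```

With all five components the product reproduces `A` up to floating-point rounding, so the last error is essentially zero.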

Also note that for a correlation matrix each original (standardized) variable has variance 1, and the eigenvalues sum to the number of variables $p$. An eigenvalue > 1 therefore indicates that the corresponding eigenvector carries more information than a single original variable would.
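
A minimal sketch of this rule of thumb, assuming the eigenvalues come from a correlation matrix. The data are seeded random numbers invented for the example, so which components exceed 1 here carries no meaning beyond illustration.

```python
import numpy as np

# Hypothetical data: 30 observations of 5 variables.
rng = np.random.default_rng(5)
data = rng.random((30, 5)) * 10

corr = np.corrcoef(data, rowvar=False)         # 5x5, ones on the diagonal
eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # descending order

# The eigenvalues sum to the trace, i.e. the number of variables.
print(eigenvalues.sum())

# Components that carry more variance than one standardized variable.
above_one = eigenvalues > 1
print(eigenvalues[above_one])
```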

In [14]:
mat = pd.DataFrame([[random.random() * 10 for j in range(5)] for i in range(5)])
print(f"Original matrix:\n{mat}\n")

correlations = mat.corr()
print(f"Correlation matrix:\n{correlations}\n")

covariances = mat.cov()
print(f"Covariance matrix:\n{covariances}\n")

# np.linalg.svd returns (U, singular values as a 1-D array, V^T)
mat_svd = np.linalg.svd(mat)
corr_svd = np.linalg.svd(correlations)
cov_svd = np.linalg.svd(covariances)
print(
f"""The matrix svd:
{mat_svd[0]}\n{mat_svd[1]}\n{mat_svd[2]}
The correlation svd:
{corr_svd[0]}\n{corr_svd[1]}\n{corr_svd[2]}\n
The covariance svd:
{cov_svd[0]}\n{cov_svd[1]}\n{cov_svd[2]}\n"""
)

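# np.linalg.eig returns (eigenvalues, eigenvectors as columns) in no guaranteed order;
# for a non-symmetric matrix such as mat, the eigenvalues may be complex
mat_eig = np.linalg.eig(mat)
corr_eig = np.linalg.eig(correlations)
print(f"The matrix eig:\n{mat_eig[0]}\n{mat_eig[1]}\nThe correlation eig:\n{corr_eig[0]}\n{corr_eig[1]}\n")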

Original matrix:
          0         1         2         3         4
0  5.382380  7.096721  6.040735  5.291639  9.525496
1  0.365113  0.972874  9.297729  6.265712  5.637460
2  7.186884  8.494899  8.472400  5.064667  7.765232
3  1.071014  6.345808  7.561702  7.595883  9.517038
4  9.241452  9.527494  6.503701  0.726309  0.094821

Correlation matrix:
          0         1         2         3         4
0  1.000000  0.854053 -0.534764 -0.854091 -0.514550
1  0.854053  1.000000 -0.663922 -0.584032 -0.206453
2 -0.534764 -0.663922  1.000000  0.454595  0.086395
3 -0.854091 -0.584032  0.454595  1.000000  0.864872
4 -0.514550 -0.206453  0.086395  0.864872  1.000000

Covariance matrix:
           0          1         2         3          4
0  14.805748  10.908341 -2.774997 -8.489727  -7.771384
1  10.908341  11.018384 -2.972083 -5.008063  -2.689898
2  -2.774997  -2.972083  1.818737  1.583739   0.457330
3  -8.489727  -5.008063  1.583739  6.673422   8.769632
4  -7.771384  -2.689898  0.457330  8.769632