## Principal Component Analysis (PCA)

Given a dataset with multiple features, PCA reduces the dimensionality of the data while preserving as much of its variance as possible. It transforms the original features into a new set of mutually orthogonal features, the principal components, ordered by how much variance each one captures.
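In symbols (the notation here is introduced for illustration): if $X_c$ is the $n \times p$ data matrix with every column shifted to zero mean, the principal components are the eigenvectors $\mathbf{v}_i$ of the sample covariance matrix $C$:

$$ C = \frac{1}{n-1} X_c^\top X_c, \qquad C\,\mathbf{v}_i = \lambda_i \mathbf{v}_i, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0 $$

Because $C$ is symmetric, its eigenvectors can be chosen orthonormal, which is why the components are orthogonal, and the sample variance of the data projected onto $\mathbf{v}_i$ is $\mathbf{v}_i^\top C\,\mathbf{v}_i = \lambda_i$.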

For example, take the data matrix
$$ D= 
    \begin{pmatrix} 
        1 & 2 & 1 \\
        0 & 1 & 2 \\
        3 & 1 & 2 \\
        3 & 3 & 1 \\
    \end{pmatrix} $$
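For reference, the column means, the centered matrix, and the sample covariance of this $D$ (with $n = 4$ rows) work out by hand as follows; the symbols $\boldsymbol{\mu}$, $D_c$, $C$ and the column of ones $\mathbf{1}$ are introduced here for illustration:

$$ \boldsymbol{\mu} = \begin{pmatrix} 1.75 & 1.75 & 1.5 \end{pmatrix}, \qquad
D_c = D - \mathbf{1}\boldsymbol{\mu} =
    \begin{pmatrix}
        -0.75 & 0.25 & -0.5 \\
        -1.75 & -0.75 & 0.5 \\
        1.25 & -0.75 & 0.5 \\
        1.25 & 1.25 & -0.5
    \end{pmatrix} $$

$$ C = \frac{1}{n-1} D_c^\top D_c = \frac{1}{12}
    \begin{pmatrix}
        27 & 7 & -2 \\
        7 & 11 & -6 \\
        -2 & -6 & 4
    \end{pmatrix} $$

The trace of $C$ is $42/12 = 3.5$, which is the total variance that the principal components redistribute.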

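We want to compute the first principal component, PC1, and the projection of the data onto it. The steps are:
1. **Center the Data**: Subtract the mean of each feature (column) from the data. Full standardization would also divide each column by its standard deviation; this example only centers.
2. **Compute Eigenvalues and Eigenvectors**: Find the eigenvalues and eigenvectors of the covariance matrix of the centered data. Equivalently, take the singular value decomposition (SVD) of the centered data: its right singular vectors are the covariance eigenvectors, and each eigenvalue equals the corresponding squared singular value divided by $n-1$. The code below uses the SVD.
3. **Select Principal Components**: Choose the top $k$ eigenvectors corresponding to the $k$ largest eigenvalues.
4. **Project the Data**: Project the centered data onto the selected principal components.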
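Step 2 is phrased in terms of the covariance matrix, while the notebook code computes an SVD instead. The following is a minimal sketch of the covariance route on the same `D`, using NumPy's `np.cov` and `np.linalg.eigh`; the names `C`, `eigvals`, `eigvecs` and `pc1` are illustrative.

```python
import numpy as np

D = np.array([[1, 2, 1], [0, 1, 2], [3, 1, 2], [3, 3, 1]], dtype=float)

# Step 1: center each column at zero mean
D_centered = D - D.mean(axis=0)

# Sample covariance: columns are the variables (rowvar=False),
# normalized by n - 1 (NumPy's default)
C = np.cov(D_centered, rowvar=False)

# eigh targets symmetric matrices and returns eigenvalues in ascending order,
# so reorder eigenvalues and eigenvector columns from largest to smallest
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Steps 3 and 4: keep the top eigenvector and project onto it
pc1 = eigvecs[:, 0]
projected = D_centered @ pc1

print(pc1, projected, eigvals[0])
```

`eigh` is preferred over `eig` here because a covariance matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors. The resulting `pc1` should agree with the first right singular vector from the SVD up to an overall sign.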


```python
import numpy as np

D = np.array([[1, 2, 1], [0, 1, 2], [3, 1, 2], [3, 3, 1]])

reduced_dimension = 1  # number of principal components to keep (k)
```

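```python
# Step 1: center each feature (column) at zero mean
mean = D.mean(axis=0)
D_centered = D - mean

# Step 2: SVD of the centered data; the rows of Vt are the covariance
# eigenvectors, sorted by decreasing singular value
U, S, Vt = np.linalg.svd(D_centered, full_matrices=False)

# Step 3: keep the first k right singular vectors as columns (features x k)
X = Vt[:reduced_dimension].T

# Step 4: project the centered data onto the selected components (samples x k)
projected = D_centered @ X

# Optional: compute the sample variance of the projection (ddof=1 divides
# by n - 1); for k = 1 this equals the largest eigenvalue, S[0]**2 / (n - 1)
var = np.var(projected, ddof=1)

X, projected, var
```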

```
(array([[ 0.91168524],
        [ 0.38001702],
        [-0.15625968]]),
 array([[-0.51062983],
        [-1.95859178],
        [ 0.77646395],
        [ 1.69275766]]),
 np.float64(2.521716446805694))
```
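The printed variance is the largest eigenvalue of the sample covariance matrix. A minimal sketch that checks this directly from the singular values and computes the share of total variance each component explains, reusing the same `D`; the names `eigvals`, `explained_ratio` and `projected_pc1` are illustrative.

```python
import numpy as np

D = np.array([[1, 2, 1], [0, 1, 2], [3, 1, 2], [3, 3, 1]], dtype=float)
D_centered = D - D.mean(axis=0)
n = D.shape[0]

U, S, Vt = np.linalg.svd(D_centered, full_matrices=False)

# Covariance eigenvalues are the squared singular values divided by n - 1
eigvals = S ** 2 / (n - 1)

# Fraction of the total variance captured by each component
explained_ratio = eigvals / eigvals.sum()

# Since D_centered = U @ diag(S) @ Vt, the PC1 projection is also S[0] * U[:, 0]
projected_pc1 = S[0] * U[:, 0]

print(eigvals, explained_ratio, projected_pc1)
```

For this `D` the total variance is 3.5, so PC1 captures roughly 72% of it. The SVD determines each singular vector only up to sign, so a different NumPy or LAPACK build may print `X` and `projected` with all signs flipped; the variance is unaffected.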