# Principal Component Analysis (PCA) Implementation from Scratch

PCA is a dimensionality reduction technique that projects a high-dimensional dataset onto a lower-dimensional subspace while preserving as much of the data's variance as possible.

## Mathematical Steps
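1. **Center the data**: Subtract the mean from each feature so every column has zero mean. (Full standardization would also divide by each feature's standard deviation; that step is not used here.)
2. **Compute Covariance Matrix**: Calculate the covariance between every pair of features, which measures how they vary together.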
3. **Eigendecomposition**: Compute eigenvalues and eigenvectors of the covariance matrix.
4. **Sort and Select**: Rank eigenvectors by their eigenvalues in descending order and keep the top $k$ components.
5. **Project**: Multiply the centered data by the matrix whose columns are the top $k$ eigenvectors.
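
The five steps above can be sketched directly in NumPy. This is a minimal illustration, not the code in `pca_scratch`: the function name `pca_sketch` and the synthetic `X_demo` array are assumptions made for this example. It uses `np.linalg.eigh`, which suits the covariance matrix because that matrix is symmetric.

```python
import numpy as np

def pca_sketch(X, k):
    """Hypothetical helper: project X (n_samples, n_features) onto its top-k components."""
    X = np.asarray(X, dtype=float)

    # 1. Center the data: subtract each feature's mean.
    X_centered = X - X.mean(axis=0)

    # 2. Covariance matrix of the features (columns are variables).
    cov = np.cov(X_centered, rowvar=False)

    # 3. Eigendecomposition; eigh returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # 4. Sort in descending order and keep the top k eigenvectors.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    components = eigenvectors[:, order][:, :k]

    # 5. Project the centered data onto the selected components.
    projected = X_centered @ components
    return projected, components, eigenvalues

# Synthetic data (an assumption for this sketch), with unequal spread per feature.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
Z, W, eigvals = pca_sketch(X_demo, k=2)
print(Z.shape)  # (100, 2)
```

The sign of each eigenvector is arbitrary, so a projection produced this way may be mirrored along an axis relative to another implementation's output.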

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from pca_scratch import PCA  # from-scratch implementation (local module pca_scratch)
from sklearn.datasets import load_iris
%matplotlib inline

## Loading Data

In [None]:
iris = load_iris()
X = iris.data
y = iris.target
print("Original Shape:", X.shape)

## Reducing Dimensions

We reduce the 4-dimensional Iris data to 2 dimensions for visualization.

In [None]:
# Fit PCA on the data, keeping the top 2 components
pca = PCA(n_components=2)
pca.fit(X)
# Project the samples onto those 2 components
X_projected = pca.transform(X)

print("Projected Shape:", X_projected.shape)

plt.figure(figsize=(8, 6))
plt.scatter(X_projected[:, 0], X_projected[:, 1], c=y, edgecolor='none', alpha=0.8, cmap='viridis')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('Iris Data Reduced to 2D via PCA')
plt.colorbar(label='Class index (iris.target)')
plt.show()
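
To check how much variance a 2D projection retains, one option is to compare the eigenvalues of the covariance matrix. The sketch below is self-contained and does not rely on any attribute of the `PCA` class from `pca_scratch`. The helper name `explained_variance_ratio` and the synthetic `X_demo` data are assumptions for illustration; the same function could be called on the Iris array `X`.

```python
import numpy as np

def explained_variance_ratio(X):
    """Hypothetical helper: fraction of total variance captured by each principal component."""
    X = np.asarray(X, dtype=float)
    # Covariance of the centered features.
    cov = np.cov(X - X.mean(axis=0), rowvar=False)
    # eigvalsh returns ascending eigenvalues; reverse to get descending order.
    eigenvalues = np.linalg.eigvalsh(cov)[::-1]
    return eigenvalues / eigenvalues.sum()

# Synthetic data (an assumption for this sketch).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4)) * np.array([3.0, 2.0, 1.0, 0.5])

ratios = explained_variance_ratio(X_demo)
print("Per-component ratio:", np.round(ratios, 3))
print("Retained by first 2:", round(float(ratios[:2].sum()), 3))
```

A cumulative ratio close to 1 for the first two components suggests the 2D scatter plot loses little of the original spread.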