# Principal Component Analysis (PCA) - Mathematical Foundation

## Objective
PCA is a **dimensionality reduction** technique that projects a dataset onto a lower-dimensional linear subspace while preserving **as much of its variance as possible**.

---

## Step-by-Step Mathematical Derivation

Let:  
- $X \in \mathbb{R}^{n \times d}$ be the dataset, where:
  - $n$ = number of samples  
  - $d$ = number of features (dimensions)

---

### 1. Center the Data

We subtract the mean of each feature to center the data:

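- Compute the per-feature mean $\mu \in \mathbb{R}^{d}$, where $X_i$ denotes the $i$-th sample (row of $X$):
  $$
  \mu = \frac{1}{n} \sum_{i=1}^{n} X_i
  $$

- Center the data by subtracting $\mu$ from every row:
  $$
  \tilde{X} = X - \mathbf{1}_n \mu^T
  $$
  where $\mathbf{1}_n \in \mathbb{R}^{n}$ is the all-ones vector.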
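
The centering step can be sketched in NumPy as follows. The toy array `X` is a made-up example, not data from this notebook; broadcasting subtracts the mean vector from every row.

```python
import numpy as np

# Hypothetical toy data: 4 samples (rows), 2 features (columns).
X = np.array([[1.0, 2.0],
              [3.0, 6.0],
              [5.0, 10.0],
              [7.0, 14.0]])

mu = X.mean(axis=0)    # per-feature mean, shape (d,)
X_centered = X - mu    # broadcasting subtracts mu from every row

print(mu)                        # [4. 8.]
print(X_centered.mean(axis=0))   # [0. 0.]
```

After centering, every feature has mean zero, which is what the covariance formula in the next step assumes.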

---

### 2. Compute the Covariance Matrix

The covariance matrix captures the pairwise linear dependencies between features:

$$
\Sigma = \frac{1}{n - 1} \tilde{X}^T \tilde{X} \in \mathbb{R}^{d \times d}
$$

Each entry $\Sigma_{ij}$ represents the covariance between features $i$ and $j$.
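
As a quick check of the formula, the sketch below builds $\Sigma$ by hand on randomly generated data (an assumption for illustration) and compares it with `np.cov`, which uses the same $n - 1$ normalization by default.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # hypothetical data: n = 50, d = 3

X_centered = X - X.mean(axis=0)
n = X.shape[0]

# Covariance matrix from the formula: (1 / (n - 1)) * X~^T X~
Sigma = X_centered.T @ X_centered / (n - 1)

# rowvar=False tells np.cov that columns (not rows) are the features.
Sigma_np = np.cov(X, rowvar=False)

print(Sigma.shape)                    # (3, 3)
print(np.allclose(Sigma, Sigma_np))   # True
```

The diagonal of `Sigma` holds the per-feature variances, and the matrix is symmetric because $\Sigma_{ij} = \Sigma_{ji}$.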

---

### 3. Eigen-Decomposition

We compute the eigenvalues and eigenvectors of the covariance matrix:

$$
\Sigma v_k = \lambda_k v_k
$$

Where:
- $v_k \in \mathbb{R}^{d}$ is the $k$-th eigenvector (principal axis)  
- $\lambda_k \in \mathbb{R}$ is the $k$-th eigenvalue (variance along $v_k$)

Because $\Sigma$ is symmetric, its eigenvectors can be chosen to form an orthonormal basis:

$$
V = [v_1, v_2, \dots, v_d] \quad \text{such that} \quad V^T V = I
$$
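
The two properties above can be verified numerically. This is a minimal sketch on random data (an assumption for illustration); it uses `np.linalg.eigh`, which is intended for symmetric matrices and returns the eigenvectors as the columns of `V`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))         # hypothetical data: n = 100, d = 4
Sigma = np.cov(X, rowvar=False)

# eigh returns eigenvalues in ascending order; column j of V pairs with eigenvalues[j].
eigenvalues, V = np.linalg.eigh(Sigma)

# Sigma v_j = lambda_j v_j for every column j (V * eigenvalues scales column j by lambda_j).
print(np.allclose(Sigma @ V, V * eigenvalues))   # True

# Orthonormality: V^T V = I
print(np.allclose(V.T @ V, np.eye(4)))           # True
```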

---

### 4. Sort Eigenvalues and Select Top $k$

Sort eigenvalues in decreasing order:

$$
\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_d
$$

Select the top $k$ eigenvectors to form the projection matrix:

$$
V_k = [v_1, v_2, \dots, v_k] \in \mathbb{R}^{d \times k}
$$
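
A sketch of the sorting and selection step, assuming random data whose columns are scaled so the variances differ. Because `np.linalg.eigh` returns eigenvalues in ascending order, the order has to be reversed before taking the first $k$ columns.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 4 features with deliberately different scales.
X = rng.normal(size=(200, 4)) * np.array([3.0, 1.0, 0.5, 2.0])
Sigma = np.cov(X, rowvar=False)

eigenvalues, eigenvectors = np.linalg.eigh(Sigma)

# Indices that sort the eigenvalues from largest to smallest.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]   # reorder columns to match

k = 2
V_k = eigenvectors[:, :k]               # projection matrix, shape (d, k)

print(V_k.shape)   # (4, 2)
```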

---

### 5. Project the Data

To get the reduced representation $Z \in \mathbb{R}^{n \times k}$:

$$
Z = \tilde{X} V_k
$$

Each row of $Z$ holds the coordinates of the corresponding centered sample in the basis formed by the top $k$ principal axes.
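
The projection can be sketched end to end as below, again on made-up random data. The final check illustrates a useful consequence: the sample covariance of $Z$ is diagonal, with the top $k$ eigenvalues on the diagonal.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 5))           # hypothetical data: n = 150, d = 5
X_centered = X - X.mean(axis=0)

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_centered, rowvar=False))
order = np.argsort(eigenvalues)[::-1]   # largest eigenvalue first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

k = 2
V_k = eigenvectors[:, :k]

# Z = X~ V_k, shape (n, k)
Z = X_centered @ V_k
print(Z.shape)   # (150, 2)

# Covariance of the projected data is diag(lambda_1, ..., lambda_k).
print(np.allclose(np.cov(Z, rowvar=False), np.diag(eigenvalues[:k])))   # True
```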

---

## Summary

- **Goal**: Find a lower-dimensional subspace that captures the maximum variance in the data.
- **Variance Maximization**: PCA chooses directions (principal components) where data varies the most.
- **Uncorrelated Features**: The principal axes are mutually orthogonal, and the projected coordinates (the columns of $Z$) are statistically uncorrelated.
- **Dimensionality Reduction**: We keep only the top $k$ eigenvectors to reduce from $d$ dimensions to $k$.
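
The variance-maximization and uncorrelatedness claims follow from the definitions above. A short sketch of the standard argument, where the unit vector $w \in \mathbb{R}^{d}$ is notation introduced here for illustration: the variance of the centered data projected onto $w$ is

$$
\frac{1}{n - 1} \lVert \tilde{X} w \rVert^2 = w^T \Sigma w
$$

Maximizing this subject to $w^T w = 1$ with a Lagrange multiplier $\lambda$ gives

$$
\nabla_w \left[ w^T \Sigma w - \lambda (w^T w - 1) \right] = 2 \Sigma w - 2 \lambda w = 0
\quad \Longrightarrow \quad \Sigma w = \lambda w
$$

So every stationary point is an eigenvector of $\Sigma$, and the variance it attains is $w^T \Sigma w = \lambda$. The maximum is therefore $\lambda_1$, reached at $w = v_1$. For the projected data, using $\Sigma V_k = V_k \, \mathrm{diag}(\lambda_1, \dots, \lambda_k)$ and $V_k^T V_k = I$:

$$
\frac{1}{n - 1} Z^T Z = V_k^T \Sigma V_k = \mathrm{diag}(\lambda_1, \dots, \lambda_k)
$$

The off-diagonal entries are zero, which is what it means for the components to be uncorrelated.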

---

## Example

If your data is in $\mathbb{R}^3$ and you apply PCA with $k = 2$, then:

- The data is projected from 3D to 2D: $\mathbb{R}^3 \rightarrow \mathbb{R}^2$
- The 2D coordinates lie along the two orthogonal directions of **maximum variance** in the original data; only the variance along the third, lowest-variance direction is discarded.
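
A minimal sketch of this 3D to 2D case, using synthetic data that is assumed (for illustration only) to lie close to a plane. The `retained` ratio, a name introduced here, is the share of total variance kept by the two leading components.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 3D data near a plane: the third feature is almost a + b.
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + b + 0.05 * rng.normal(size=200)])

X_centered = X - X.mean(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_centered, rowvar=False))

order = np.argsort(eigenvalues)[::-1]   # largest eigenvalue first
V_2 = eigenvectors[:, order][:, :2]     # top-2 principal axes, shape (3, 2)

Z = X_centered @ V_2                    # R^3 -> R^2
print(Z.shape)   # (200, 2)

# Fraction of the total variance kept by the two leading components.
retained = eigenvalues[order][:2].sum() / eigenvalues.sum()
print(retained > 0.99)   # True for this near-planar data
```

Because the data was constructed to be nearly planar, almost no variance is lost; on real data the retained fraction depends on how quickly the eigenvalues decay.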

## Import libraries

In [None]:
import numpy as np

In [None]:
class PCA:
    """
    Principal Component Analysis (PCA) for dimensionality reduction.

    This class implements PCA using eigen-decomposition of the covariance matrix.
    It projects the data to a lower-dimensional subspace that captures the maximum variance.

    Attributes:
        k_features (int): Number of principal components (dimensions) to retain.
        mean (np.ndarray): Per-feature mean of the training data, computed during fitting.
        best_vectors (np.ndarray): Matrix of top-k eigenvectors (principal components) found during fitting.
    """

    def __init__(self, k_features):
        """
        Initializes the PCA model.

        Args:
            k_features (int): The number of principal components to keep.
        """
        self.k_features = k_features
        self.mean = None
        self.best_vectors = None

    def fit(self, X: np.ndarray):
        """
        Fits the PCA model to the data by computing the top-k eigenvectors.

        Args:
            X (np.ndarray): Input data of shape (n_samples, n_features).
                           Each row corresponds to a sample and each column to a feature.

        Process:
            - Centers the data by subtracting the mean.
            - Computes the covariance matrix.
            - Performs eigen-decomposition.
            - Selects the top-k eigenvectors based on largest eigenvalues.
        """
        # Store the per-feature mean so transform() can center data the same way.
        self.mean = X.mean(axis=0)
        X_centered = X - self.mean
        # Sample covariance (divides by n - 1); rowvar=False means columns are features.
        cov_matrix = np.cov(X_centered, rowvar=False)
        # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
        eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
        # Indices of the eigenvalues sorted from largest to smallest.
        order = np.argsort(eigenvalues)[::-1]
        self.best_vectors = eigenvectors[:, order][:, :self.k_features]

    def transform(self, X: np.ndarray):
        """
        Projects the data onto the top-k principal components.

        The data is centered with the mean learned in fit() before projecting,
        matching Z = X_centered @ V_k from the derivation.

        Args:
            X (np.ndarray): Input data of shape (n_samples, n_features).

        Returns:
            np.ndarray: Transformed data of shape (n_samples, k_features).
        """
        return (X - self.mean) @ self.best_vectors

    def fit_transform(self, X: np.ndarray):
        """
        Fits the model and transforms the data in one step.

        Args:
            X (np.ndarray): Input data of shape (n_samples, n_features).

        Returns:
            np.ndarray: Transformed data of shape (n_samples, k_features).
        """
        self.fit(X)
        return self.transform(X)