## Principal Component Analysis (PCA)

### Overview

Principal Component Analysis (PCA) is a statistical technique for linear dimensionality reduction in machine learning and data science. It projects high-dimensional data onto a smaller set of orthogonal axes (the principal components), chosen to preserve as much of the data's variance (information) as possible.

### Mathematical Foundations

#### 1. **Data Standardization**
The first step in PCA is to standardize the data. Standardization involves subtracting the mean and dividing by the standard deviation for each feature.

Given a dataset $X$ with $n$ observations and $p$ features:
$$ X_{std} = \frac{X - \mu}{\sigma} $$
where $\mu$ is the vector of per-feature means and $\sigma$ is the vector of per-feature standard deviations; the subtraction and division are applied to each column (feature) separately.
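
As a minimal sketch of this step, the snippet below standardizes a small made-up array with NumPy. The helper name `standardize` and the sample values are illustrative assumptions, and `ddof=1` (the sample standard deviation) is chosen here only so that it matches the $n-1$ denominator used for the covariance matrix; the text itself does not fix that choice.

```python
import numpy as np

def standardize(X):
    """Center each column to mean 0 and scale it to unit sample standard deviation."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)            # per-feature mean, shape (p,)
    sigma = X.std(axis=0, ddof=1)  # per-feature sample std (assumed n - 1 denominator)
    return (X - mu) / sigma        # broadcasting applies the operation column by column

# Hypothetical data: 4 observations, 2 features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 200.0],
              [6.0, 800.0]])
X_std = standardize(X)
```

A feature with zero variance would cause a division by zero in this sketch, so constant columns would need to be dropped or handled separately.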

#### 2. **Covariance Matrix Computation**
Next, compute the covariance matrix of the standardized data to understand how variables are correlated with each other.
$$ \Sigma = \frac{1}{n-1} X_{std}^T X_{std} $$
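The covariance matrix $\Sigma$ is a symmetric $p \times p$ matrix whose element $\Sigma_{ij}$ is the covariance between feature $i$ and feature $j$. Because the data are standardized, $\Sigma$ is also the correlation matrix of the original features (exactly so when $\sigma$ is the sample standard deviation, computed with the same $n-1$ denominator).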
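
The formula can be checked numerically. The sketch below builds $\Sigma$ directly from the matrix product and keeps the result in `cov`; the synthetic data, seed, and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 50 observations, 3 features with different scales.
X = rng.normal(size=(50, 3)) * np.array([1.0, 10.0, 100.0])

# Standardize with the sample standard deviation (ddof=1).
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

n = X_std.shape[0]
cov = X_std.T @ X_std / (n - 1)  # p x p covariance matrix of the standardized data

# Two library routes to the same matrix, for comparison.
cov_np = np.cov(X_std, rowvar=False)   # rowvar=False: columns are the variables
corr_np = np.corrcoef(X, rowvar=False) # correlation matrix of the original data
```

Under these assumptions, `cov`, `cov_np`, and `corr_np` should agree up to floating-point error.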

#### 3. **Eigenvalues and Eigenvectors**
Calculate the eigenvalues and eigenvectors of the covariance matrix. Each eigenvector ($v_i$) defines a direction in feature space, and its eigenvalue ($\lambda_i$) is the variance of the data along that direction.

$$ \Sigma v_i = \lambda_i v_i $$
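
One way to compute the decomposition is `numpy.linalg.eigh`, which is intended for symmetric matrices such as $\Sigma$. The $2 \times 2$ matrix below is a hypothetical example, not taken from the text.

```python
import numpy as np

# Hypothetical symmetric covariance matrix.
cov = np.array([[2.0, 0.8],
                [0.8, 1.0]])

# eigh returns eigenvalues in ascending order; eigenvectors are the COLUMNS of eigvecs.
eigvals, eigvecs = np.linalg.eigh(cov)

# Residual of the defining equation, Sigma v_i - lambda_i v_i, for every pair at once.
residual = cov @ eigvecs - eigvecs * eigvals
```

The entries of `residual` should be zero up to rounding. The sign of each eigenvector is arbitrary, so different libraries may return $v_i$ or $-v_i$.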

#### 4. **Sorting Eigenvalues and Eigenvectors**
Sort the eigenvalues in descending order and arrange the corresponding eigenvectors to form a new basis. The eigenvectors corresponding to the largest eigenvalues capture the most variance in the data.
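
A short sketch of the sorting step follows. The covariance matrix is a made-up example, and the "explained variance ratio" (each eigenvalue divided by the sum of all eigenvalues) is a common convention for judging how many components to keep, added here as an assumption rather than something the text prescribes.

```python
import numpy as np

# Hypothetical covariance matrix: features 0 and 1 are strongly correlated.
cov = np.array([[1.0, 0.9, 0.0],
                [0.9, 1.0, 0.0],
                [0.0, 0.0, 1.0]])

eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues

order = np.argsort(eigvals)[::-1]       # indices for descending order
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]             # reorder the columns to match

explained_ratio = eigvals / eigvals.sum()  # share of total variance per component
```

Reordering the columns of `eigvecs` with the same index array keeps each eigenvector paired with its eigenvalue.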

#### 5. **Projection onto Principal Components**
Select the top $k$ eigenvectors (principal components) and project the standardized data onto these vectors.

$$ X_{PCA} = X_{std} W_k $$
where $W_k$ is a $p \times k$ matrix containing the top $k$ eigenvectors.
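
The projection is a single matrix product. The sketch below uses synthetic data and $k = 2$, both chosen arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))  # synthetic data: n = 100, p = 4
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

eigvals, eigvecs = np.linalg.eigh(np.cov(X_std, rowvar=False))
order = np.argsort(eigvals)[::-1]  # descending eigenvalue order

k = 2
W_k = eigvecs[:, order[:k]]  # p x k matrix of the top-k eigenvectors
X_pca = X_std @ W_k          # n x k matrix of projected data
```

Under these assumptions, the sample variance of each column of `X_pca` should equal the corresponding eigenvalue.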

### Example

Suppose we have a dataset with two features: height and weight. Our goal is to reduce this to a single dimension.

1. **Standardize the Data**

    | Height (cm) | Weight (kg) |
    |-------------|-------------|
    | 170         | 70          |
    | 180         | 80          |
    | 160         | 60          |

    After standardizing (the means are 170 cm and 70 kg, and the sample standard deviation is 10 for both features):

    | Height (std) | Weight (std) |
    |--------------|--------------|
    | 0            | 0            |
    | 1            | 1            |
    | -1           | -1           |

2. **Compute Covariance Matrix**

    $$
    \Sigma = \begin{bmatrix}
    1.0 & 1.0 \\
    1.0 & 1.0
    \end{bmatrix}
    $$

3. **Compute Eigenvalues and Eigenvectors**

    Eigenvalues: $\lambda_1 = 2, \lambda_2 = 0$

    Eigenvectors: $v_1 = \frac{1}{\sqrt{2}}[1, 1]^T, v_2 = \frac{1}{\sqrt{2}}[-1, 1]^T$

4. **Projection**

    Project the data onto the first principal component $v_1$:

    $$
    X_{PCA} = X_{std} \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}
    $$

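    The result is a one-dimensional representation of the original data. Because $\lambda_2 = 0$ (the two features are perfectly correlated), the first component retains all of the variance.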
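
The arithmetic of this example can be reproduced with a few lines of NumPy. This is a sketch under one assumption: the standard deviation uses the $n-1$ denominator (`ddof=1`), matching the covariance formula. The sign flip is only there because the direction of an eigenvector is arbitrary.

```python
import numpy as np

# Height (cm) and weight (kg) from the example table.
X = np.array([[170.0, 70.0],
              [180.0, 80.0],
              [160.0, 60.0]])

X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cov = X_std.T @ X_std / (X.shape[0] - 1)

eigvals, eigvecs = np.linalg.eigh(cov)  # ascending order
v1 = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue
if v1[0] < 0:
    v1 = -v1                            # fix the arbitrary sign for readability

scores = X_std @ v1                     # one-dimensional representation

print(np.round(X_std, 3))
print(np.round(cov, 3))
print(np.round(scores, 3))
```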

### When to Use PCA

- **High-dimensional data**: When dealing with datasets with many features, PCA helps in reducing the dimensionality.
- **Data visualization**: PCA can reduce data to 2D or 3D for visualization purposes.
- **Noise reduction**: PCA can help in removing noise from the data by retaining only the principal components with significant variance.
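
To illustrate the noise-reduction use case, the sketch below projects noisy data onto the top component and maps it back to the original space with $\hat{X} = X W_k W_k^T$. The rank-1 signal, the noise level, and the choice to center without rescaling are all assumptions made for this toy setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 200, 10, 1

# Synthetic data: a rank-1 signal plus isotropic Gaussian noise.
direction = np.ones(p) / np.sqrt(p)
clean = 5.0 * rng.normal(size=(n, 1)) * direction
noisy = clean + 0.5 * rng.normal(size=(n, p))

# Center only (assumption: all features already share one scale).
mean = noisy.mean(axis=0)
centered = noisy - mean

eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
W_k = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # top-k eigenvectors

# Project onto the top-k components, then map back to the original space.
denoised = centered @ W_k @ W_k.T + mean

mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
```

In this setup `mse_denoised` should come out smaller than `mse_noisy`, because the discarded components carry mostly noise. On real data the benefit depends on how well the signal is concentrated in the leading components.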

### How to Use PCA

1. **Standardize the data** to ensure each feature contributes equally.
2. **Compute the covariance matrix** of the standardized data.
3. **Calculate eigenvalues and eigenvectors** of the covariance matrix.
4. **Sort eigenvalues and select the top k eigenvectors**.
5. **Transform the data** by projecting it onto the selected eigenvectors.
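
The five steps can be combined into one small function. This is a minimal sketch, not a replacement for a library implementation: the function name `pca`, its return values, and the synthetic data are assumptions, and the standard deviation uses `ddof=1`.

```python
import numpy as np

def pca(X, k):
    """Return (scores, components, explained_variance) for the top-k components."""
    X = np.asarray(X, dtype=float)
    # 1. Standardize each feature (sample standard deviation).
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the standardized data (p x p).
    cov = X_std.T @ X_std / (X_std.shape[0] - 1)
    # 3. Eigen-decomposition; eigh is suited to symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort by descending eigenvalue and keep the top k eigenvectors.
    order = np.argsort(eigvals)[::-1][:k]
    W_k = eigvecs[:, order]
    # 5. Project the standardized data onto the selected eigenvectors.
    return X_std @ W_k, W_k, eigvals[order]

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
X[:, 1] += 2.0 * X[:, 0]  # synthetic data with one correlated pair of features
scores, components, explained_variance = pca(X, k=2)
```

Here `scores` has shape `(200, 2)` and `components` has shape `(5, 2)`. For large or ill-conditioned problems, implementations based on the singular value decomposition are generally preferred over forming the covariance matrix explicitly.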

### Advantages

- **Simplifies the complexity** of high-dimensional data.
- **Improves computational efficiency** by reducing the number of dimensions.
- **Helps in data visualization** by reducing data to 2D or 3D.
- **Decorrelates features**: the principal components are mutually uncorrelated, which reduces multicollinearity in downstream models.

### Disadvantages

- **Loss of interpretability**: Principal components are linear combinations of original features, which may not be easily interpretable.
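- **Not suitable for non-linear data**: PCA captures only linear relationships among variables, so non-linear structure is not preserved.
- **Sensitivity to scaling**: Without standardization, features with larger numeric ranges dominate the principal components, so the data must be standardized for each feature to contribute equally.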
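
The scaling issue is easy to demonstrate. In the sketch below, two synthetic features carry a similar signal, but one is expressed in units 1000 times larger; the data, seed, and helper `first_component` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
z = rng.normal(size=500)  # shared underlying signal
X = np.column_stack([
    z + 0.5 * rng.normal(size=500),             # feature on a unit scale
    1000.0 * (z + 0.5 * rng.normal(size=500)),  # similar signal, 1000x larger units
])

def first_component(data):
    """Eigenvector of the covariance matrix with the largest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
    return eigvecs[:, np.argmax(eigvals)]

v_raw = first_component(X)  # PCA on unscaled data

X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
v_std = first_component(X_std)  # PCA on standardized data
```

On the raw data, `v_raw` points almost entirely along the large-scale feature. After standardization, both features receive weights of equal magnitude in `v_std`.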

### Assumptions

1. **Linearity**: PCA assumes linear relationships among variables.
2. **Large variances have important structure**: PCA assumes that components with larger variances represent more significant information.
3. **Orthogonality of principal components**: PCA constrains the principal components to be mutually orthogonal, which makes the projected coordinates uncorrelated.
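
The orthogonality property can be verified numerically. The following sketch uses synthetic correlated features (an assumption) and checks two things: the eigenvector matrix is orthonormal, and the covariance matrix of the projected data is diagonal.

```python
import numpy as np

rng = np.random.default_rng(7)
# Synthetic correlated features: independent noise mixed by a random matrix.
X = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 3))
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

eigvals, W = np.linalg.eigh(np.cov(X_std, rowvar=False))
scores = X_std @ W  # data expressed in the principal-component basis

gram = W.T @ W                            # identity if the columns are orthonormal
score_cov = np.cov(scores, rowvar=False)  # diagonal if the scores are uncorrelated
```

Up to rounding, `gram` should be the identity matrix and `score_cov` should be diagonal with the eigenvalues on its diagonal.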

### Conclusion

PCA is a versatile tool for dimensionality reduction, especially useful in high-dimensional datasets. By transforming data into a new basis defined by the principal components, PCA retains the most significant variance in fewer dimensions, aiding in visualization, noise reduction, and improving computational efficiency. However, it is important to be aware of its limitations and assumptions when applying it to real-world problems.