# Principal Component Analysis (PCA)

## Definition

**Principal Component Analysis (PCA)** is a dimensionality reduction technique widely used in data analysis, machine learning, and statistics. It projects high-dimensional data onto a smaller set of orthogonal directions (the principal components), chosen so that the projection retains as much of the original variance as possible.

---

## Why PCA?

### Key Motivations:

1. **Curse of Dimensionality**: High-dimensional data is computationally expensive and difficult to visualize.
2. **Noise Reduction**: Dropping low-variance components can filter out noise, and correlated (redundant) features are merged into shared components.
3. **Feature Extraction**: It identifies the most significant patterns in the data.

---

## Applications of PCA

- **Image Compression**: Reducing the dimensionality of image data while preserving visual quality
- **Stock Market Trend Analysis**: Identifying key factors driving market movements
- **Genetics**: Analyzing gene expression data and identifying patterns in genetic variation

---

## Mathematical Foundations of PCA

PCA relies on **linear algebra** concepts, particularly **eigenvalue decomposition** of the covariance matrix.

### Steps in PCA:

1. **Standardize data**
2. **Compute covariance matrix**
3. **Find eigenvalues & eigenvectors**
4. **Select top-k principal components**
5. **Project data onto these components**

---

## Detailed Steps with Formulas

### Step 1: Standardize the Data

**Purpose**: Ensure all features have the same scale to prevent features with larger magnitudes from dominating the analysis.

**Formula**:
$$X_{\text{std}} = \frac{X - \mu}{\sigma}$$

where:
- $X$ is the original data
- $\mu$ is the mean of each feature
- $\sigma$ is the standard deviation of each feature
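
As a minimal sketch of this step, the snippet below standardizes a small made-up matrix with NumPy (rows are samples, columns are features). The toy values are hypothetical. Using the sample standard deviation (`ddof=1`) is an assumption chosen to match the $n-1$ denominator in the covariance formula; scikit-learn's `StandardScaler` uses the population standard deviation (`ddof=0`) instead.

```python
import numpy as np

# Hypothetical toy data: 5 samples, 2 features on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 250.0],
              [4.0, 400.0],
              [5.0, 350.0]])

mu = X.mean(axis=0)             # per-feature mean
sigma = X.std(axis=0, ddof=1)   # per-feature sample standard deviation
X_std = (X - mu) / sigma        # broadcast over rows

print("Means after standardization:", X_std.mean(axis=0))
print("Std devs after standardization:", X_std.std(axis=0, ddof=1))
```

After this transformation every column has mean 0 and standard deviation 1, so no feature dominates purely because of its units.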

---

### Step 2: Compute the Covariance Matrix

**Purpose**: Measure how features vary together to understand relationships between variables.

**Formula**:
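$$C = \frac{1}{n-1} X_{\text{std}}^T X_{\text{std}}$$

where:
- $C$ is the covariance matrix (d × d for d features)
- $X_{\text{std}}$ is the standardized, and therefore centered, data matrix from Step 1 (n × d); for standardized data, $C$ is (up to the choice of $n$ or $n-1$ in $\sigma$) the correlation matrix of the original features
- $n$ is the number of samples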
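
The formula can be checked numerically. The sketch below, on hypothetical random data, computes the covariance matrix directly from centered data and compares it with `np.cov`, which also divides by $n-1$ by default.

```python
import numpy as np

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(50, 3))           # hypothetical data: 50 samples, 3 features

X_centered = X_raw - X_raw.mean(axis=0)    # the formula requires centered columns
n = X_centered.shape[0]

# Covariance matrix from the formula: (1 / (n - 1)) * X^T X
C = (X_centered.T @ X_centered) / (n - 1)

print("Shape of C:", C.shape)
print("Matches np.cov:", np.allclose(C, np.cov(X_raw, rowvar=False)))
```

Note the `rowvar=False` argument: by default `np.cov` treats rows as variables, whereas here the columns are the features.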

---

### Step 3: Find Eigenvalues & Eigenvectors

**Purpose**: Identify the directions (eigenvectors) of maximum variance and their corresponding magnitudes (eigenvalues).

**Formula**:
$$C v = \lambda v$$

where:
- $C$ is the covariance matrix
- $v$ is an eigenvector (principal component direction)
- $\lambda$ is the corresponding eigenvalue (variance along that direction)
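
To make the eigenvalue equation concrete, the following sketch uses a hypothetical 2 × 2 covariance matrix and verifies $C v = \lambda v$ for each eigenpair. It uses `np.linalg.eigh`, which is intended for symmetric matrices; this is one reasonable choice, not the only one.

```python
import numpy as np

# Hypothetical covariance matrix (symmetric, as every covariance matrix is)
C = np.array([[2.0, 0.8],
              [0.8, 1.0]])

# eigh returns real eigenvalues in ascending order and orthonormal eigenvectors
eigenvalues, eigenvectors = np.linalg.eigh(C)

for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]    # eigenvectors are stored as columns
    lam = eigenvalues[i]
    print(f"lambda = {lam:.4f}, C v == lambda v:", np.allclose(C @ v, lam * v))

# The eigenvalues sum to the trace of C, i.e. the total variance
print("Sum of eigenvalues:", eigenvalues.sum(), "trace:", np.trace(C))
```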

---

### Step 4: Select Top-k Principal Components

**Purpose**: Choose the most important components based on explained variance.

**Explained Variance Ratio**:
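$$\text{Explained Variance Ratio}_i = \frac{\lambda_i}{\sum_{j=1}^{d} \lambda_j}$$

Sort the eigenvalues in descending order and select the eigenvectors belonging to the k largest ones. A common rule is to take the smallest k whose cumulative explained variance ratio reaches a chosen threshold (for example, 95%).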
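
A small worked example of this selection, using hypothetical eigenvalues that do not come from any real dataset and an assumed 90% threshold:

```python
import numpy as np

# Hypothetical, unsorted eigenvalues
eigenvalues = np.array([1.0, 4.0, 0.5, 2.5])

order = np.argsort(eigenvalues)[::-1]      # indices from largest to smallest
sorted_eigenvalues = eigenvalues[order]    # [4.0, 2.5, 1.0, 0.5]

ratios = sorted_eigenvalues / sorted_eigenvalues.sum()   # [0.5, 0.3125, 0.125, 0.0625]
cumulative = np.cumsum(ratios)                            # [0.5, 0.8125, 0.9375, 1.0]

threshold = 0.90                           # assumed target, chosen for illustration
k = int(np.argmax(cumulative >= threshold) + 1)   # first position reaching the threshold

print("Explained variance ratios:", ratios)
print("Cumulative:", cumulative)
print("Components needed for 90%:", k)     # 3
```

The same `order` array would be used to reorder the eigenvector columns, so that eigenvalues and eigenvectors stay paired.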

---

### Step 5: Project Data onto Principal Components

**Purpose**: Transform the original data into the new lower-dimensional space.

**Formula**:
$$X_{\text{PCA}} = X_{\text{std}} \, V_k$$

where:
- $X_{\text{PCA}}$ is the transformed data (n × k)
- $X_{\text{std}}$ is the standardized data from Step 1 (n × d)
- $V_k$ is the matrix whose columns are the top k eigenvectors (d × k)
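
The projection is a single matrix product. The sketch below runs it on hypothetical random data and checks a useful property: the variance of each projected column equals the corresponding eigenvalue. Variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X_raw = rng.normal(size=(100, 4))          # hypothetical data: 100 samples, 4 features

# Standardize, then eigendecompose the covariance matrix
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)
C = np.cov(X_std, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(C)

# Keep the k eigenvectors with the largest eigenvalues
k = 2
order = np.argsort(eigenvalues)[::-1]
V_k = eigenvectors[:, order[:k]]           # shape (d, k)

# Project: (n x d) @ (d x k) -> (n x k)
X_pca = X_std @ V_k

print("Projected shape:", X_pca.shape)
print("Column variances equal top eigenvalues:",
      np.allclose(X_pca.var(axis=0, ddof=1), eigenvalues[order[:k]]))
```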

---

## Matrix Decomposition

Matrix decomposition techniques are fundamental to understanding PCA, as they provide the mathematical foundation for extracting principal components.

### 1. Eigen-Decomposition:

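- Decomposes a diagonalizable square matrix **A** into:

$$A = Q\Lambda Q^{-1}$$

where:
- $Q$ = Matrix whose columns are the eigenvectors of $A$.
- $\Lambda$ = Diagonal matrix of the corresponding eigenvalues.

For a symmetric matrix such as a covariance matrix, $Q$ can be chosen orthogonal, so $Q^{-1} = Q^T$ and $A = Q\Lambda Q^T$.

**Use Case**: Solving systems of linear differential equations; in PCA, diagonalizing the covariance matrix.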

---

### 2. Singular Value Decomposition (SVD):

- Decomposes any matrix **A** (m×n) into:

$$A = U\Sigma V^T$$

where:
- $U$ = Left singular vectors (eigenvectors of $AA^T$).
- $\Sigma$ = Diagonal matrix of non-negative singular values, conventionally sorted in descending order.
- $V$ = Right singular vectors (eigenvectors of $A^T A$).
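
SVD connects directly to PCA. If $X$ is a centered n × d data matrix with $X = U\Sigma V^T$, then $\frac{1}{n-1} X^T X = V \frac{\Sigma^2}{n-1} V^T$, so the right singular vectors are the principal directions and $\lambda_i = \sigma_i^2 / (n-1)$. The sketch below checks this on hypothetical data; the mixing matrix is only there to give the features different variances.

```python
import numpy as np

rng = np.random.default_rng(2)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X_raw = rng.normal(size=(60, 3)) @ mixing   # hypothetical correlated data
X_c = X_raw - X_raw.mean(axis=0)            # center the columns
n = X_c.shape[0]

# Route 1: eigendecomposition of the covariance matrix (sorted descending)
C = (X_c.T @ X_c) / (n - 1)
eigenvalues, eigenvectors = np.linalg.eigh(C)
eigenvalues = eigenvalues[::-1]
eigenvectors = eigenvectors[:, ::-1]

# Route 2: SVD of the centered data (singular values already descending)
U, S, Vt = np.linalg.svd(X_c, full_matrices=False)
svd_eigenvalues = S**2 / (n - 1)

print("Eigenvalues agree:", np.allclose(eigenvalues, svd_eigenvalues))
# Directions agree up to sign, so compare absolute values of the dot products
print("Directions agree:", np.allclose(np.abs(eigenvectors.T @ Vt.T), np.eye(3), atol=1e-6))
```

Working from the SVD avoids forming $X^T X$ explicitly, which is generally the more numerically stable route.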

---

## Understanding Variance Ratio in Detail

### What is Variance Ratio?

The term **variance ratio** has more than one meaning in statistics. In a variance ratio F-test, it is the ratio of two sample variances (the larger divided by the smaller), which is compared against a critical value from the F-distribution to test whether two populations have equal variances. It can also refer to a measure of the dispersion of events in a statistical distribution. In PCA, the sense used throughout this document, it is the proportion of total variance explained by each principal component.

### In Principal Component Analysis (PCA)

**Purpose**:
- To determine the optimal number of dimensions needed to explain the variance in a dataset.

**Calculation**:
- It is the ratio of the variance explained by a specific principal component to the total variance of the dataset.

**Application**:
- By analyzing the variance ratio for each principal component, one can select the components that contribute most to the overall variation in the data.
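
As an illustration, the sketch below fits scikit-learn's `PCA` with all components on the standardized Iris data and picks the smallest number of components whose cumulative explained variance ratio reaches a threshold. The 95% threshold is an assumed value for illustration, not a universal rule.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data
X_std = StandardScaler().fit_transform(X)

# Fit with all components so every variance ratio is available
pca = PCA().fit(X_std)
ratios = pca.explained_variance_ratio_     # sorted from largest to smallest
cumulative = np.cumsum(ratios)

threshold = 0.95                           # assumed target
k = int(np.argmax(cumulative >= threshold) + 1)

for i, (r, c) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PC{i}: ratio = {r:.4f}, cumulative = {c:.4f}")
print("Components needed:", k)
```

Passing a float such as `PCA(n_components=0.95)` asks scikit-learn to make the same selection internally.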

---

## Advantages of PCA

1. **Dimensionality Reduction**: Simplifies models without significant information loss
2. **Noise Reduction**: Discarding low-variance components can filter out noise
3. **Visualization**: Enables 2D/3D plotting of high-dimensional data
4. **Uncorrelated Features**: Principal components are orthogonal (uncorrelated)
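
The last point can be verified numerically. In this sketch, two strongly correlated hypothetical features are projected onto their principal directions, and the covariance matrix of the resulting scores turns out to be diagonal.

```python
import numpy as np

rng = np.random.default_rng(3)
cov = [[1.0, 0.8],
       [0.8, 1.0]]
X = rng.multivariate_normal([0.0, 0.0], cov, size=500)   # hypothetical correlated data
X_c = X - X.mean(axis=0)

original_corr = np.corrcoef(X_c, rowvar=False)[0, 1]

# Project onto all principal directions
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_c, rowvar=False))
scores = X_c @ eigenvectors

cov_scores = np.cov(scores, rowvar=False)
off_diag = cov_scores[0, 1]

print("Correlation between original features:", round(original_corr, 3))
print("Covariance between principal components:", off_diag)
print("Diagonal equals eigenvalues:", np.allclose(np.diag(cov_scores), eigenvalues))
```

Uncorrelated does not mean independent in general; PCA only removes linear dependence between the components.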

---

## Limitations of PCA

1. **Linear Assumption**: PCA may fail with nonlinear relationships (Kernel PCA is an alternative)
   - **Linear relationship**: Can be represented mathematically by a linear equation, such as $y = mx + b$
   - **Nonlinear relationships**: Relationships between two variables that cannot be described by a straight line. Instead, they may follow a curve or some other pattern

2. **Interpretability**: Principal components may not have clear real-world meaning

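3. **Sensitive to Scaling**: Features measured on different scales or units should be standardized first; otherwise the features with the largest variance dominate the components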
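
The scaling issue is easy to demonstrate. The sketch below builds two independent hypothetical features, one in metres and one in currency units, and compares the share of variance captured by the first component before and after standardization. The feature names and distributions are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
height_m = rng.normal(1.7, 0.1, size=n)          # small numeric range
income = rng.normal(50_000, 10_000, size=n)      # huge numeric range
X = np.column_stack([height_m, income])

def first_component_ratio(data):
    """Explained variance ratio of the first principal component."""
    centered = data - data.mean(axis=0)
    eigenvalues = np.linalg.eigvalsh(np.cov(centered, rowvar=False))  # ascending
    return eigenvalues[-1] / eigenvalues.sum()

raw_ratio = first_component_ratio(X)

X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
std_ratio = first_component_ratio(X_std)

print(f"PC1 share without standardization: {raw_ratio:.6f}")
print(f"PC1 share with standardization:    {std_ratio:.6f}")
```

Without standardization, the first component is essentially the income axis, purely because of its units. After standardization the two independent features share the variance roughly equally.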

---

# Practical Examples

## Example 1: Simple PCA Demonstration

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

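# Sample data (uniform random, so features are uncorrelated and no component dominates)
np.random.seed(42)  # fixed seed for reproducible output
X = np.random.rand(100, 5)  # 100 samples, 5 features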

# Step 1: Standardize
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# Step 2: Apply PCA
pca = PCA(n_components=2)  # Reduce to 2D
X_pca = pca.fit_transform(X_std)

# Explained variance
print("Explained Variance Ratio:", pca.explained_variance_ratio_)
print("Total Explained Variance:", sum(pca.explained_variance_ratio_))

# Plot
plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1])
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.title('PCA Projection')
plt.show()

## Example 2: Eigenvalue Decomposition

In [None]:
import numpy as np

# Create a square matrix
A = np.array([[4, 2, 1],
              [2, 5, 3],
              [1, 3, 6]])

print("Original matrix A:")
print(A)

# Perform eigen decomposition
eigenvalues, eigenvectors = np.linalg.eig(A)

print("\nEigenvalues:")
print(eigenvalues)

print("\nEigenvectors (columns are eigenvectors):")
print(eigenvectors)

# Create diagonal matrix of eigenvalues
Lambda = np.diag(eigenvalues)

print("\nDiagonal matrix of eigenvalues (Λ):")
print(Lambda)

# Reconstruct A using A = QΛQ⁻¹
Q = eigenvectors
Q_inv = np.linalg.inv(Q)
A_reconstructed = Q @ Lambda @ Q_inv

print("\nReconstructed matrix A = QΛQ⁻¹:")
print(A_reconstructed)

print("\nVerification - Max absolute error:", np.max(np.abs(A - A_reconstructed)))

## Example 3: Singular Value Decomposition (SVD)

In [None]:
import numpy as np

np.random.seed(42)  # fixed seed for reproducible output
A = np.random.rand(4, 3)  # Example matrix

# Using numpy
U, S, Vt = np.linalg.svd(A, full_matrices=False)

print("U (left singular vectors):\n", U)
print("\nS (singular values):\n", S)
print("\nVt (right singular vectors transposed):\n", Vt)

# Create diagonal matrix from singular values
Sigma = np.diag(S)

# Verify: A ≈ U @ Sigma @ Vt
print("\nVerification (A ≈ U @ Sigma @ Vt):", np.allclose(A, U @ Sigma @ Vt))

## Applications of Matrix Decomposition

Matrix decomposition techniques like Eigen-Decomposition and SVD have numerous real-world applications:

### Key Applications:

1. **Image Compression**: Keeping only the largest singular values gives a low-rank approximation of an image, reducing storage requirements while preserving most of the visual structure. (The JPEG standard itself is based on the discrete cosine transform, not SVD.)

2. **Natural Language Processing (Latent Semantic Analysis)**: Matrix decomposition helps identify latent patterns and relationships in text data, enabling better document similarity and topic modeling.

3. **Recommender Systems (Netflix, Amazon)**: Matrix factorization methods related to SVD underpin many collaborative filtering algorithms, which predict user preferences from historical ratings.

4. **Medical Imaging (MRI)**: SVD helps in noise reduction and efficient storage of medical images, improving diagnostic capabilities while reducing storage costs.
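
The compression idea in the first item can be sketched with a truncated SVD. The "image" below is a synthetic pattern plus noise, used as a stand-in for real pixel data. The storage count assumes one keeps k columns of $U$, k singular values, and k rows of $V^T$.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical 40 x 60 "image": a smooth pattern plus a little noise
rows = np.linspace(0, 1, 40)[:, None]
cols = np.linspace(0, 1, 60)[None, :]
image = np.sin(4 * rows) * np.cos(6 * cols) + 0.05 * rng.normal(size=(40, 60))

U, S, Vt = np.linalg.svd(image, full_matrices=False)

def rank_k_approximation(k):
    """Rebuild the image from its k largest singular values."""
    return U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

errors = {}
for k in (1, 5, 20):
    approx = rank_k_approximation(k)
    errors[k] = np.linalg.norm(image - approx)           # Frobenius norm of the residual
    stored = k * (image.shape[0] + image.shape[1] + 1)   # numbers kept for rank k
    print(f"k={k:2d}: error = {errors[k]:.4f}, stored values = {stored} of {image.size}")
```

The residual norm for rank k equals the square root of the sum of the squared discarded singular values, so the decay of `S` shows in advance how well a given rank will do.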

---

# Hands-On PCA Problems

## Problem 1: Basic PCA with NumPy

**Objective**: Implement PCA from scratch using NumPy on a synthetic dataset.

**Steps**:
1. Center the data
2. Compute covariance matrix
3. Perform eigendecomposition
4. Sort eigenvalues and eigenvectors
5. Project data to principal components

In [None]:
import numpy as np

# Generate synthetic data
np.random.seed(42)
mean = [0, 0]
cov = [[1, 0.8], [0.8, 1]]
X = np.random.multivariate_normal(mean, cov, 100)

# 1. Center the data
X_centered = X - np.mean(X, axis=0)

# 2. Compute covariance matrix
cov_matrix = np.cov(X_centered.T) # Same as np.cov(X_centered, rowvar=False)

# 3. Perform eigendecomposition
# (eigh is intended for symmetric matrices and always returns real eigenvalues)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# 4. Sort eigenvalues and eigenvectors
sorted_idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[sorted_idx]
eigenvectors = eigenvectors[:, sorted_idx]

# 5. Project data to principal components
principal_components = X_centered.dot(eigenvectors)

print("Explained variance:", eigenvalues)
print("Principal components shape:", principal_components.shape)
print("\nEigenvectors (Principal Component Directions):")
print(eigenvectors)

## Problem 2: PCA with scikit-learn

**Objective**: Use scikit-learn's PCA on the Iris dataset and visualize the results.

**Steps**:
1. Perform PCA with 2 components
2. Plot the transformed data
3. Print explained variance ratio

In [None]:
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

iris = load_iris()
X = iris.data
y = iris.target

# 1. Perform PCA with 2 components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# 2. Plot the transformed data
plt.figure(figsize=(10, 7))
scatter = plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='viridis', s=50, alpha=0.7, edgecolors='k')
plt.xlabel('First Principal Component', fontsize=12)
plt.ylabel('Second Principal Component', fontsize=12)
plt.title('PCA of Iris Dataset', fontsize=14, fontweight='bold')

# Convert handles to a list
handles, labels = scatter.legend_elements()
plt.legend(handles=list(handles), labels=list(iris.target_names), title="Species")
plt.show()

# 3. Print explained variance ratio
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Total explained variance:", sum(pca.explained_variance_ratio_))

## Problem 3: PCA for Dimensionality Reduction

**Objective**: Apply PCA for dimensionality reduction on the Digits dataset and evaluate classification performance.

**Steps**:
1. Perform PCA keeping 95% of variance
2. Transform train and test data
3. Train logistic regression on original and reduced data
4. Compare accuracy

In [None]:
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

digits = load_digits()
X = digits.data
y = digits.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 1. Perform PCA keeping 95% of variance
pca = PCA(n_components=0.95)
pca.fit(X_train)

# 2. Transform train and test data
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)

print("Original shape:", X_train.shape)
print("Reduced shape:", X_train_pca.shape)
print(f"Number of components selected: {pca.n_components_}")
print(f"Total variance explained: {sum(pca.explained_variance_ratio_):.4f}")

# 3. Train logistic regression on original and reduced data
# Original data
lr_original = LogisticRegression(max_iter=1000, random_state=42)
lr_original.fit(X_train, y_train)
y_pred_original = lr_original.predict(X_test)

# Reduced data
lr_pca = LogisticRegression(max_iter=1000, random_state=42)
lr_pca.fit(X_train_pca, y_train)
y_pred_pca = lr_pca.predict(X_test_pca)

# 4. Compare accuracy
acc_original = accuracy_score(y_test, y_pred_original)
acc_pca = accuracy_score(y_test, y_pred_pca)

print(f"\nAccuracy with original data: {acc_original:.4f}")
print(f"Accuracy with PCA-reduced data: {acc_pca:.4f}")
print(f"Dimensionality reduction: {X_train.shape[1]} → {X_train_pca.shape[1]} features")

## Problem 4: PCA for Image Compression

**Objective**: Use PCA to compress a grayscale image and analyze the reconstruction error.

**Steps**:
1. Perform PCA with increasing number of components
2. Reconstruct image for each case
3. Calculate and plot reconstruction error (MSE)
4. Display reconstructed images

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.datasets import load_sample_image

# Load sample image
china = load_sample_image("china.jpg")
X = np.mean(china, axis=2) # Convert to grayscale
X = X / 255.0  # Scale to [0, 1]

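# Treat each row of pixels as a sample and each column as a feature.
# X is already 2D (height × width), so this reshape leaves its shape unchanged.
X_flat = X.reshape(-1, X.shape[1])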

# Try different numbers of components
n_components = [1, 5, 10, 20, 50, 100, 200]
mse_values = []

plt.figure(figsize=(15, 10))
plt.subplot(3, 3, 1)
plt.imshow(X, cmap='gray')
plt.title("Original", fontsize=12, fontweight='bold')
plt.axis('off')

for i, n in enumerate(n_components):
    # Perform PCA
    pca = PCA(n_components=n)
    X_pca = pca.fit_transform(X_flat)
    X_reconstructed = pca.inverse_transform(X_pca)

    # Calculate MSE
    mse = np.mean((X_flat - X_reconstructed) ** 2)
    mse_values.append(mse)

    # Display reconstructed image
    plt.subplot(3, 3, i+2)
    plt.imshow(X_reconstructed.reshape(X.shape), cmap='gray')
    plt.title(f"n={n}\nMSE={mse:.5f}", fontsize=10)
    plt.axis('off')

plt.tight_layout()
plt.show()

# Plot MSE vs n_components
plt.figure(figsize=(10, 6))
plt.plot(n_components, mse_values, 'o-', linewidth=2, markersize=8)
plt.xlabel('Number of Components', fontsize=12)
plt.ylabel('Reconstruction MSE', fontsize=12)
plt.title('PCA Image Compression: MSE vs Number of Components', fontsize=14, fontweight='bold')
plt.grid(True)
plt.show()

---

# Summary & Key Takeaways

## Comprehensive Overview of Key Concepts

| Concept | Definition | Application |
|---------|-----------|-------------|
| **Eigenvalues** | Scalars $\lambda$ by which a matrix scales its eigenvectors ($Av = \lambda v$) | Stability analysis, Google PageRank |
| **Eigenvectors** | Non-zero vectors whose direction is unchanged by the matrix, only scaled by the eigenvalue | Facial recognition, Quantum mechanics |
| **PCA** | Dimensionality reduction using the top eigenvectors of the covariance matrix | Finance, Image compression |
| **Matrix Decomposition** | Breaking matrices into simpler forms (SVD, Eigen-decomposition) | MRI, Recommender systems |

### Key Points to Remember:

- **PCA** is a powerful tool for reducing dimensionality while preserving most of the variance in data
- **Eigenvalues** and **eigenvectors** are fundamental to understanding how PCA works
- **Variance ratio** helps determine how many principal components to retain
- **Matrix decomposition** techniques (Eigen-decomposition and SVD) have wide-ranging applications across multiple domains
- **Standardize** your data before applying PCA when features are on different scales or units, so that all features contribute equally

---