# Module 05: Linear Algebra for Machine Learning

**Difficulty**: ⭐⭐⭐ Advanced

**Estimated Time**: 90 minutes

**Prerequisites**: 
- Module 04: Linear Algebra Foundations
- Understanding of matrices, vectors, and matrix operations

## Learning Objectives

By the end of this notebook, you will be able to:
1. Understand eigenvalues and eigenvectors and their significance
2. Perform matrix decomposition using SVD
3. Apply Principal Component Analysis (PCA) for dimensionality reduction
4. Understand the mathematical foundation of PCA
5. Use PCA on real datasets
6. Interpret PCA results for data science applications

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris, load_wine

# Configure visualization
%matplotlib inline
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")

# Set random seed
np.random.seed(42)

# Display options
np.set_printoptions(precision=4, suppress=True)
pd.set_option('display.precision', 4)

print("Setup complete!")

## 1. Eigenvalues and Eigenvectors

**Definition**: For a square matrix $A$, if there exists a non-zero vector $\vec{v}$ and scalar $\lambda$ such that:

$$A\vec{v} = \lambda\vec{v}$$

then:
- $\vec{v}$ is an **eigenvector** of $A$
- $\lambda$ is the corresponding **eigenvalue**

**Intuition**: An eigenvector stays on its own line when the matrix transformation is applied: it is only scaled by $\lambda$ (stretched, shrunk, or flipped when $\lambda$ is negative).

**Why Important in ML:**
- PCA: Find principal components (eigenvectors of covariance matrix)
- PageRank: Dominant eigenvector of web link matrix
- Spectral clustering: Eigenvectors reveal data structure
- Neural networks: Eigenvalues of weight matrices and of the loss Hessian affect training stability
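The intuition above can be checked numerically. The following is a minimal sketch, not part of the original notebook: it uses the matrix `A` from the next cell, a hand-computed eigenvector `v = (2, 1)`, and an arbitrary comparison vector `w`. The helper `cross2d` is a hypothetical name introduced here for illustration.

```python
import numpy as np

A = np.array([[4, 2],
              [1, 3]])

def cross2d(a, b):
    """z-component of the 2D cross product; zero when a and b are parallel."""
    return a[0] * b[1] - a[1] * b[0]

v = np.array([2.0, 1.0])   # eigenvector of A (hand-computed, eigenvalue 5)
w = np.array([1.0, 0.0])   # arbitrary vector that is not an eigenvector

Av = A @ v
Aw = A @ w

# An eigenvector stays on its own line; a generic vector is rotated off it
print("A @ v =", Av, "| parallel to v:", np.isclose(cross2d(v, Av), 0.0))
print("A @ w =", Aw, "| parallel to w:", np.isclose(cross2d(w, Aw), 0.0))
```

Only `v` keeps its direction, which is what distinguishes eigenvectors from ordinary vectors.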

In [None]:
# Example: Find eigenvalues and eigenvectors

A = np.array([[4, 2],
              [1, 3]])

print("Matrix A:")
print(A)

# Calculate eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)

print("\nEigenvalues:")
print(eigenvalues)

print("\nEigenvectors (columns):")
print(eigenvectors)

# Verify: A * v = λ * v for first eigenvector
v1 = eigenvectors[:, 0]
lambda1 = eigenvalues[0]

Av1 = A @ v1
lambda_v1 = lambda1 * v1

print("\nVerification for first eigenvector:")
print(f"A * v1 = {Av1}")
print(f"λ1 * v1 = {lambda_v1}")
print(f"Equal? {np.allclose(Av1, lambda_v1)}")

## 2. Singular Value Decomposition (SVD)

**SVD** decomposes any matrix $A_{m \times n}$ into three matrices:

$$A = U \Sigma V^T$$

where:
- $U_{m \times m}$: Left singular vectors (orthogonal)
- $\Sigma_{m \times n}$: Diagonal matrix of singular values
- $V^T_{n \times n}$: Right singular vectors (orthogonal)

**Properties:**
- Works for any matrix (not just square!)
- Singular values in $\Sigma$ are non-negative
- Reveals the rank of the matrix (the number of non-zero singular values)
- Used in: PCA, image compression, recommender systems
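As a rough illustration of these properties, the sketch below reads the rank off the singular values and checks that the squared singular values match the eigenvalues of $A^T A$. The matrix is the 4×3 example used in this section; the tolerance `tol` is an assumed cutoff for "numerically zero", not a value from the original notebook.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]], dtype=float)

# Singular values only, returned in descending order
S = np.linalg.svd(A, compute_uv=False)

tol = 1e-10  # assumed threshold for treating a singular value as zero
rank = int(np.sum(S > tol))

# Squared singular values equal the eigenvalues of the symmetric matrix A^T A
eig_AtA = np.linalg.eigvalsh(A.T @ A)[::-1]  # reverse to descending order

print("Singular values:", S)
print("Numerical rank:", rank)
print("S**2 matches eig(A^T A):", np.allclose(S**2, eig_AtA))
```

The rows of this matrix form an arithmetic progression, so only two singular values are meaningfully non-zero.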

In [None]:
# SVD example

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])

print("Matrix A (4×3):")
print(A)

# Perform SVD (full_matrices=False returns the reduced form: U is 4×3, not 4×4)
U, S, VT = np.linalg.svd(A, full_matrices=False)

print(f"\nU shape: {U.shape}")
print("U (left singular vectors):")
print(U)

print(f"\nS shape: {S.shape}")
print("Singular values:")
print(S)

print(f"\nV^T shape: {VT.shape}")
print("V^T (right singular vectors):")
print(VT)

# Reconstruct A
Sigma = np.diag(S)
A_reconstructed = U @ Sigma @ VT

print("\nReconstructed A (should match original):")
print(A_reconstructed)
print(f"Reconstruction error: {np.linalg.norm(A - A_reconstructed):.10f}")
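
Compression with SVD works by keeping only the largest singular values. The sketch below is an illustration under assumptions, not the notebook's own method: it builds a rank-`k` approximation of the same matrix (with `k = 1` chosen arbitrarily) and compares the Frobenius-norm error with the discarded singular values.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]], dtype=float)

U, S, VT = np.linalg.svd(A, full_matrices=False)

k = 1  # number of singular values to keep (arbitrary choice for illustration)
A_k = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

# Frobenius-norm error of the truncated reconstruction
err = np.linalg.norm(A - A_k)

# The error equals the root of the sum of squared discarded singular values
expected = np.sqrt(np.sum(S[k:] ** 2))

print(f"Rank-{k} approximation error: {err:.6f}")
print(f"From discarded singular values: {expected:.6f}")
```

Storing `U[:, :k]`, `S[:k]`, and `VT[:k, :]` takes fewer numbers than storing `A` when `k` is small relative to the matrix dimensions, which is the idea behind SVD-based compression.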

## 3. Principal Component Analysis (PCA)

**PCA** is a technique for dimensionality reduction that:
1. Finds directions of maximum variance in data
2. Projects data onto these directions (principal components)
3. Reduces dimensions while preserving as much of the variance as possible

**Mathematical Steps:**
1. Center the data: $X_{centered} = X - \bar{X}$
2. Compute covariance matrix: $C = \frac{1}{n-1}X_{centered}^T X_{centered}$
3. Find eigenvectors of $C$ (these are the principal components)
4. Project data: $X_{pca} = X_{centered} \cdot \text{eigenvectors}$
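
These steps connect directly to the SVD from Section 2. The following sketch, on synthetic data generated with an assumed seed, suggests that the eigen-decomposition of the covariance matrix and the SVD of the centered data give the same principal directions (up to sign) and the same variances. The variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.multivariate_normal([0, 0], [[3, 1.5], [1.5, 1]], size=200)

# Step 1: center
X_centered = X - X.mean(axis=0)
n = X_centered.shape[0]

# Steps 2-3: covariance matrix and its eigen-decomposition
C = (X_centered.T @ X_centered) / (n - 1)
eigvals, eigvecs = np.linalg.eigh(C)          # eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]             # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Alternative route: SVD of the centered data
U, S, VT = np.linalg.svd(X_centered, full_matrices=False)
svd_vals = S**2 / (n - 1)                     # variances along each direction

print("Eigenvalues of C:    ", eigvals)
print("S**2 / (n - 1):      ", svd_vals)
print("Same directions (up to sign):",
      np.allclose(np.abs(eigvecs), np.abs(VT.T)))
```

The sign of each direction is arbitrary in both routes, which is why the comparison uses absolute values.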

**Applications:**
- Data visualization (reduce to 2D/3D)
- Feature extraction
- Noise reduction
- Data compression
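
To make the compression use case concrete, here is a small sketch (an illustration, assuming the standardized Iris data used later in this notebook) that compresses each sample to `k` numbers and maps it back with `inverse_transform`, reporting how much is lost for each `k`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)

errors = {}
for k in range(1, X.shape[1] + 1):
    pca = PCA(n_components=k)
    Z = pca.fit_transform(X)           # compress: k numbers per sample
    X_hat = pca.inverse_transform(Z)   # reconstruct: back to 4 features
    errors[k] = np.mean((X - X_hat) ** 2)
    print(f"k={k}: mean squared reconstruction error = {errors[k]:.4f}")
```

The error shrinks as more components are kept and vanishes (up to floating-point precision) when all four are retained.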

In [None]:
# PCA from scratch

# Generate sample data
np.random.seed(42)
mean = [0, 0]
cov = [[3, 1.5], [1.5, 1]]
X = np.random.multivariate_normal(mean, cov, 200)

print("Original data shape:", X.shape)

# Step 1: Center the data
X_mean = np.mean(X, axis=0)
X_centered = X - X_mean

print(f"\nData mean before centering: {X_mean}")
print(f"Data mean after centering: {np.mean(X_centered, axis=0)}")

# Step 2: Compute covariance matrix
cov_matrix = np.cov(X_centered.T)
print("\nCovariance matrix:")
print(cov_matrix)

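# Step 3: Compute eigenvectors and eigenvalues
# (eigh is intended for symmetric matrices such as a covariance matrix
# and always returns real eigenvalues)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)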

# Sort by eigenvalues (descending)
idx = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

print("\nEigenvalues (variance explained):")
print(eigenvalues)
print("\nEigenvectors (principal components):")
print(eigenvectors)

# Variance explained ratio
var_explained = eigenvalues / np.sum(eigenvalues)
print("\nVariance explained ratio:")
for i, var in enumerate(var_explained):
    print(f"PC{i+1}: {var:.4f} ({var*100:.2f}%)")

In [None]:
# Visualize PCA

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Original data
axes[0].scatter(X[:, 0], X[:, 1], alpha=0.6)
axes[0].scatter(X_mean[0], X_mean[1], c='red', s=200, marker='x', linewidths=3, label='Mean')

# Plot principal components
scale = 3
for i in range(2):
    axes[0].arrow(X_mean[0], X_mean[1], 
                 eigenvectors[0, i] * scale * np.sqrt(eigenvalues[i]),
                 eigenvectors[1, i] * scale * np.sqrt(eigenvalues[i]),
                 head_width=0.3, head_length=0.3, fc=f'C{i+2}', ec=f'C{i+2}', linewidth=2.5,
                 label=f'PC{i+1}')

axes[0].set_xlabel('Feature 1', fontsize=12)
axes[0].set_ylabel('Feature 2', fontsize=12)
axes[0].set_title('Original Data with Principal Components', fontsize=13, fontweight='bold')
axes[0].legend(fontsize=10)
axes[0].grid(True, alpha=0.3)
axes[0].set_aspect('equal')

# Project onto principal components
X_pca = X_centered @ eigenvectors

axes[1].scatter(X_pca[:, 0], X_pca[:, 1], alpha=0.6)
axes[1].axhline(0, color='k', linewidth=0.5)
axes[1].axvline(0, color='k', linewidth=0.5)
axes[1].set_xlabel('PC1', fontsize=12)
axes[1].set_ylabel('PC2', fontsize=12)
axes[1].set_title('Data in PCA Space', fontsize=13, fontweight='bold')
axes[1].grid(True, alpha=0.3)
axes[1].set_aspect('equal')

plt.tight_layout()
plt.show()

print("PC1 captures the direction of maximum variance.")
print("PC2 is orthogonal to PC1 and captures the next most variance.")
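
The orthogonality claim can be verified numerically. The sketch below regenerates comparable synthetic data (with an assumed seed, so the numbers differ from the cell above) and checks two things: the principal directions are orthonormal, and the projected data has a diagonal covariance matrix whose entries are the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.multivariate_normal([0, 0], [[3, 1.5], [1.5, 1]], size=200)
X_centered = X - X.mean(axis=0)

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_centered, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

X_pca = X_centered @ eigenvectors

# Orthonormal directions: V^T V should be the identity
gram = eigenvectors.T @ eigenvectors

# In PCA space the features are uncorrelated: covariance is diagonal
cov_pca = np.cov(X_pca, rowvar=False)

print("V^T V:\n", gram)
print("Covariance in PCA space:\n", cov_pca)
```

The off-diagonal entries of the projected covariance are zero up to rounding, which is why PCA is sometimes described as decorrelating the features.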

## 4. PCA with scikit-learn

While understanding the math is important, we typically use scikit-learn's PCA implementation in practice.
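
To connect the library with the from-scratch version, the sketch below compares scikit-learn's fitted attributes with a manual eigen-decomposition on the standardized Iris data. It is an illustrative check rather than part of the original notebook; the `manual_*` names are made up for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)

# Library route
pca = PCA(n_components=2).fit(X)

# Manual route: eigen-decomposition of the sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1][:2]      # indices of the two largest
manual_variance = eigvals[order]
manual_components = eigvecs[:, order].T    # rows = components, like sklearn

print("sklearn explained_variance_:", pca.explained_variance_)
print("manual eigenvalues:         ", manual_variance)
print("Components agree up to sign:",
      np.allclose(np.abs(pca.components_), np.abs(manual_components)))
```

Note that `pca.components_` stores one component per row, whereas `np.linalg.eigh` returns eigenvectors as columns.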

In [None]:
# PCA using scikit-learn

from sklearn.decomposition import PCA

# Load Iris dataset
iris = load_iris()
X_iris = iris.data
y_iris = iris.target

print("Iris dataset:")
print(f"Shape: {X_iris.shape} (150 samples, 4 features)")
print(f"Features: {iris.feature_names}")

# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_iris)

# Apply PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print(f"\nPCA reduced shape: {X_pca.shape} (150 samples, 2 components)")
print("\nVariance explained by each component:")
for i, var in enumerate(pca.explained_variance_ratio_):
    print(f"PC{i+1}: {var:.4f} ({var*100:.2f}%)")
print(f"\nTotal variance explained: {np.sum(pca.explained_variance_ratio_)*100:.2f}%")

In [None]:
# Visualize Iris dataset in PCA space

plt.figure(figsize=(10, 7))

# Plot each class
colors = ['red', 'green', 'blue']
targets = iris.target_names

for target, color in zip([0, 1, 2], colors):
    indices = y_iris == target
    plt.scatter(X_pca[indices, 0], X_pca[indices, 1], 
               c=color, label=targets[target], s=50, alpha=0.7, edgecolors='black')

plt.xlabel(f'PC1 ({pca.explained_variance_ratio_[0]*100:.1f}% variance)', fontsize=12)
plt.ylabel(f'PC2 ({pca.explained_variance_ratio_[1]*100:.1f}% variance)', fontsize=12)
plt.title('Iris Dataset - PCA Projection', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("PCA reduced the 4D data to 2D: setosa separates cleanly, while versicolor and virginica partly overlap.")

## 5. Choosing Number of Components

How many principal components should we keep?

**Methods:**
1. **Variance threshold**: Keep components that explain X% of variance (e.g., 95%)
2. **Scree plot**: Look for elbow in variance explained
3. **Cross-validation**: Choose number that gives best model performance
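
For the variance-threshold method, scikit-learn's `PCA` accepts a float between 0 and 1 as `n_components` and keeps just enough components to reach that fraction of variance. The sketch below assumes the standardized Iris data and a 95% threshold, and cross-checks the result against a manual cumulative sum.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)

# A float n_components is read as a target fraction of explained variance
pca_95 = PCA(n_components=0.95, svd_solver="full").fit(X)

# Manual check: first position where the cumulative ratio reaches 0.95
cumulative = np.cumsum(PCA().fit(X).explained_variance_ratio_)
n_manual = int(np.argmax(cumulative >= 0.95) + 1)

print("Components chosen by PCA(n_components=0.95):", pca_95.n_components_)
print("Components from manual cumulative sum:      ", n_manual)
print(f"Variance retained: {pca_95.explained_variance_ratio_.sum():.4f}")
```

Setting `svd_solver="full"` explicitly is a conservative choice here so the float form of `n_components` is handled the same way across scikit-learn versions.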

In [None]:
# Scree plot - visualizing variance explained

# Fit PCA with all components
pca_full = PCA()
pca_full.fit(X_scaled)

# Calculate cumulative variance explained
cumsum_var = np.cumsum(pca_full.explained_variance_ratio_)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Individual variance explained
axes[0].bar(range(1, len(pca_full.explained_variance_ratio_) + 1),
           pca_full.explained_variance_ratio_, alpha=0.7, edgecolor='black')
axes[0].set_xlabel('Principal Component', fontsize=12)
axes[0].set_ylabel('Variance Explained Ratio', fontsize=12)
axes[0].set_title('Scree Plot', fontsize=13, fontweight='bold')
axes[0].set_xticks(range(1, len(pca_full.explained_variance_ratio_) + 1))
axes[0].grid(True, alpha=0.3, axis='y')

# Cumulative variance explained
axes[1].plot(range(1, len(cumsum_var) + 1), cumsum_var, 
            marker='o', linewidth=2.5, markersize=8)
axes[1].axhline(0.95, color='red', linestyle='--', linewidth=2, label='95% threshold')
axes[1].set_xlabel('Number of Components', fontsize=12)
axes[1].set_ylabel('Cumulative Variance Explained', fontsize=12)
axes[1].set_title('Cumulative Variance Explained', fontsize=13, fontweight='bold')
axes[1].set_xticks(range(1, len(cumsum_var) + 1))
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Find number of components for 95% variance
n_components_95 = np.argmax(cumsum_var >= 0.95) + 1
print(f"\nNumber of components needed for 95% variance: {n_components_95}")
print(f"With {n_components_95} components: {cumsum_var[n_components_95-1]*100:.2f}% variance explained")
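
The cross-validation method can be sketched with a pipeline and a grid search. This is one possible setup under stated assumptions: the classifier (`LogisticRegression`), the 5-fold split, and the step names `scale`, `pca`, and `clf` are choices made for this illustration and do not come from the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scaling and PCA live inside the pipeline so they are refit on each
# training fold, which avoids leaking information from the validation fold
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),  # assumed downstream model
])

param_grid = {"pca__n_components": [1, 2, 3, 4]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

best_n = search.best_params_["pca__n_components"]
print(f"Best number of components: {best_n}")
print(f"Mean cross-validated accuracy: {search.best_score_:.4f}")
```

Unlike the variance threshold, this criterion depends on the downstream task, so a different model or metric may favor a different number of components.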

## 6. Practice Exercises

### Exercise 1: Eigenvalues and Eigenvectors

Given matrix:
$$A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$$

Tasks:
1. Calculate eigenvalues and eigenvectors
2. Verify that $A\vec{v} = \lambda\vec{v}$ for each eigenpair
3. Visualize the eigenvectors
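
Before running any code, the eigenvalues can be worked out by hand from the characteristic equation, which gives a reference to compare the numerical result against. A short derivation for this particular matrix:

$$\det(A - \lambda I) = \begin{vmatrix} 3 - \lambda & 1 \\ 1 & 3 - \lambda \end{vmatrix} = (3 - \lambda)^2 - 1 = \lambda^2 - 6\lambda + 8 = (\lambda - 2)(\lambda - 4) = 0$$

so $\lambda = 4$ and $\lambda = 2$. Solving $(A - \lambda I)\vec{v} = 0$ for each value gives

$$\lambda = 4: \ \vec{v} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad \lambda = 2: \ \vec{v} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

NumPy may return the eigenpairs in a different order or with flipped signs; both are valid because any non-zero multiple of an eigenvector is still an eigenvector.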

In [None]:
# Exercise 1 Solution
A = np.array([[3, 1],
              [1, 3]])

print("=== Exercise 1 Solution ===")
print("Matrix A:")
print(A)

# 1. Calculate eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)

print("\n1. Eigenvalues:", eigenvalues)
print("   Eigenvectors:")
print(eigenvectors)

# 2. Verify
print("\n2. Verification:")
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    lam = eigenvalues[i]
    Av = A @ v
    lam_v = lam * v
    print(f"\n   Eigenpair {i+1}:")
    print(f"   A*v = {Av}")
    print(f"   λ*v = {lam_v}")
    print(f"   Equal? {np.allclose(Av, lam_v)}")

# 3. Visualize
fig, ax = plt.subplots(figsize=(8, 8))

# Plot eigenvectors as arrows from the origin
# (with scale_units='xy', scale=1 draws each unit eigenvector at its true
# length, so the arrows stay inside the ±2 axis limits)
for i in range(2):
    v = eigenvectors[:, i]
    ax.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1,
             color=f'C{i}', width=0.012, label=f'v{i+1} (λ={eigenvalues[i]:.2f})')

ax.set_xlim(-2, 2)
ax.set_ylim(-2, 2)
ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.set_title('Eigenvectors of A', fontsize=14, fontweight='bold')
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)
ax.axhline(0, color='k', linewidth=0.5)
ax.axvline(0, color='k', linewidth=0.5)
ax.set_aspect('equal')
plt.tight_layout()
plt.show()

### Exercise 2: PCA on Wine Dataset

Use PCA on the wine dataset:
1. Load and standardize the data
2. Apply PCA to reduce to 2 components
3. Visualize the data in PCA space
4. Calculate how much variance is explained

In [None]:
# Exercise 2 Solution
from sklearn.datasets import load_wine

print("=== Exercise 2 Solution ===")

# 1. Load and standardize
wine = load_wine()
X_wine = wine.data
y_wine = wine.target

print(f"Wine dataset shape: {X_wine.shape}")
print(f"Number of classes: {len(wine.target_names)}")
print(f"Classes: {wine.target_names}")

scaler = StandardScaler()
X_wine_scaled = scaler.fit_transform(X_wine)

# 2. Apply PCA
pca_wine = PCA(n_components=2)
X_wine_pca = pca_wine.fit_transform(X_wine_scaled)

print(f"\nPCA shape: {X_wine_pca.shape}")

# 3. Visualize
plt.figure(figsize=(10, 7))

colors = ['red', 'green', 'blue']
for target, color in zip([0, 1, 2], colors):
    indices = y_wine == target
    plt.scatter(X_wine_pca[indices, 0], X_wine_pca[indices, 1],
               c=color, label=wine.target_names[target], s=50, alpha=0.7, edgecolors='black')

plt.xlabel(f'PC1 ({pca_wine.explained_variance_ratio_[0]*100:.1f}% variance)', fontsize=12)
plt.ylabel(f'PC2 ({pca_wine.explained_variance_ratio_[1]*100:.1f}% variance)', fontsize=12)
plt.title('Wine Dataset - PCA Projection', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# 4. Variance explained
print("\n4. Variance explained:")
print(f"   PC1: {pca_wine.explained_variance_ratio_[0]*100:.2f}%")
print(f"   PC2: {pca_wine.explained_variance_ratio_[1]*100:.2f}%")
print(f"   Total: {np.sum(pca_wine.explained_variance_ratio_)*100:.2f}%")
print(f"\n   Reduced from {X_wine.shape[1]} to 2 dimensions while retaining")
print(f"   {np.sum(pca_wine.explained_variance_ratio_)*100:.2f}% of the variance!")
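
To interpret what each component represents, one option is to inspect the weights that `pca.components_` assigns to the original features. The sketch below is illustrative only: it refits PCA on the standardized wine data and tabulates those weights. The name `weights` is made up for this example.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

wine = load_wine()
X = StandardScaler().fit_transform(wine.data)
pca = PCA(n_components=2).fit(X)

# Rows = original features, columns = principal components
weights = pd.DataFrame(pca.components_.T,
                       index=wine.feature_names,
                       columns=["PC1", "PC2"])

# Features with the largest absolute weight contribute most to a component
top_pc1 = weights["PC1"].abs().sort_values(ascending=False).head(3)

print(weights.round(3))
print("\nFeatures contributing most to PC1:")
print(top_pc1.round(3))
```

Because the data was standardized first, the weights are comparable across features; the sign of a whole component is arbitrary, so only relative signs within a component carry meaning.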

## 7. Summary and Key Takeaways

In this module, you learned:

✅ **Eigenvalues and Eigenvectors**
- Special vectors that don't change direction under transformation
- $A\vec{v} = \lambda\vec{v}$
- Foundation for PCA and many ML algorithms

✅ **Singular Value Decomposition (SVD)**
- Decomposes any matrix: $A = U\Sigma V^T$
- Generalizes eigendecomposition
- Applications: dimensionality reduction, compression

✅ **Principal Component Analysis (PCA)**
- Finds directions of maximum variance
- Reduces dimensionality while preserving information
- Steps: center → covariance → eigenvectors → project

✅ **PCA Applications**
- Data visualization (high-D → 2D/3D)
- Feature extraction
- Noise reduction
- Data compression

✅ **Choosing Components**
- Variance explained threshold (e.g., 95%)
- Scree plot for visual inspection
- Cross-validation for optimal performance

### What's Next?

In **Module 06: Calculus Basics**, you'll learn:
- Limits and derivatives
- Chain rule and partial derivatives
- Gradient descent optimization
- Applications to neural networks

### Additional Resources

- [3Blue1Brown - Eigenvectors and Eigenvalues](https://www.youtube.com/watch?v=PFDu9oVAE-g)
- [StatQuest - PCA](https://www.youtube.com/watch?v=FgakZw6K1QQ)
- [Scikit-learn PCA Documentation](https://scikit-learn.org/stable/modules/decomposition.html#pca)
- [Stanford CS229 - PCA](http://cs229.stanford.edu/notes/cs229-notes10.pdf)

---

**Fantastic work!** You now understand the linear algebra behind dimensionality reduction and PCA - crucial for modern machine learning.

**Next**: Proceed to 