# 🧠 PCA (Principal Component Analysis)
**Teaching Assistant: Arshiya Doosti**
### Linear Algebra Explanation + Practical Exercises

## Overview
Principal Component Analysis (PCA) reduces the dimensionality of a dataset while retaining as much of its variance as possible. It does this by finding new orthogonal axes (principal components), each a linear combination of the original features, ordered by how much variance they capture.

**Steps:**
1. Standardize the data
2. Compute the covariance matrix
3. Find the eigenvalues and eigenvectors of the covariance matrix, and sort them by decreasing eigenvalue
4. Project the data onto principal components
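
In matrix notation, the four steps can be sketched as follows. The symbols ($X$, $Z$, $C$, $W_k$, $n$ samples, $d$ features) are notation introduced here for illustration, and the $1/(n-1)$ normalization is one common convention (some texts use $1/n$).

$$
\begin{aligned}
Z_{ij} &= \frac{X_{ij} - \mu_j}{\sigma_j} && \text{(standardize feature } j \text{ with its mean } \mu_j \text{ and standard deviation } \sigma_j\text{)} \\
C &= \frac{1}{n-1}\, Z^\top Z && \text{(covariance matrix, } d \times d\text{)} \\
C\, v_i &= \lambda_i v_i, \quad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d \ge 0 && \text{(eigenpairs, sorted)} \\
X_{\text{pca}} &= Z\, W_k, \quad W_k = [\, v_1 \;\; \cdots \;\; v_k \,] && \text{(projection onto the top } k \text{ components)}
\end{aligned}
$$

The second line relies on the columns of $Z$ having zero mean, which standardization guarantees.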

In [None]:
# Run this cell to import required libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

## 🏁 Step 1: Load and Preprocess Data
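Standardizing means rescaling every feature to zero mean and unit standard deviation, so that no feature dominates the variance just because of its units. The sketch below does this by hand with NumPy on a small made-up array (`X_toy` and `Z_manual` are illustrative names, not part of the exercise). It assumes the population standard deviation (`ddof=0`), which is also what `StandardScaler` uses.

```python
import numpy as np

# Hypothetical toy data: 5 samples, 2 features on very different scales
X_toy = np.array([[1.0, 100.0],
                  [2.0, 300.0],
                  [3.0, 200.0],
                  [4.0, 500.0],
                  [5.0, 400.0]])

# Z-score each column: subtract its mean, divide by its population std (ddof=0)
Z_manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

# After standardization every column has mean 0 and standard deviation 1
print(np.allclose(Z_manual.mean(axis=0), 0.0))
print(np.allclose(Z_manual.std(axis=0), 1.0))
```

The same two checks can be applied to `X_scaled` once the cell below is completed.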

In [None]:
# Load the Iris dataset and standardize the features
# TODO: Fill in the missing parts

# Load the data
data = load_iris()
X = data.data  # features
y = data.target  # labels

# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.__________  # <<< FILL HERE >>>

# Check the shape
print("Shape of X_scaled:", X_scaled.shape)

## 🧮 Step 2: Compute the Covariance Matrix
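For standardized data, the covariance matrix can be written directly as a matrix product, because every column already has zero mean. The sketch below builds it by hand on random toy data (all names are illustrative) and uses the `n - 1` denominator, which is an assumption about the convention. It also shows why the diagonal is `n / (n - 1)` rather than exactly 1 when the data was standardized with the population standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 3))   # hypothetical data: 50 samples, 3 features
Z = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)   # standardized, ddof=0

n = Z.shape[0]
C = Z.T @ Z / (n - 1)   # sample covariance; valid because columns of Z have zero mean

print(C.shape)                                # one row and one column per feature
print(np.allclose(C, C.T))                    # covariance matrices are symmetric
print(np.allclose(np.diag(C), n / (n - 1)))   # unit population variance, rescaled by n/(n-1)
```

For the Iris data the result should therefore be a symmetric 4×4 matrix.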

In [None]:
# TODO: Compute the covariance matrix of the standardized data
# HINT: np.cov treats each row as a variable by default (see its rowvar parameter)

cov_matrix = __________  # <<< FILL HERE >>>
print("Covariance matrix shape:", cov_matrix.shape)

## 🔍 Step 3: Perform Eigen Decomposition
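An eigenvector of the covariance matrix is a direction that the matrix only stretches, and the eigenvalue is the stretch factor, which equals the variance of the data along that direction. The sketch below checks this on a hypothetical 2×2 matrix whose eigenpairs were worked out by hand, then applies the same sorting used in the cell below and computes the share of variance per component. The matrix and all names are made up for illustration.

```python
import numpy as np

C = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # hypothetical symmetric, covariance-like matrix

# Eigenpairs derived by hand for this matrix (deliberately left unsorted)
eigenvalues = np.array([1.0, 3.0])
eigenvectors = np.array([[1.0, 1.0],
                         [-1.0, 1.0]]) / np.sqrt(2)   # each column is one eigenvector

# Every column v satisfies C @ v == lambda * v
print(np.allclose(C @ eigenvectors, eigenvectors * eigenvalues))

# Sort by decreasing eigenvalue and reorder the eigenvector columns to match
idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

# Fraction of the total variance captured by each component
explained = eigenvalues / eigenvalues.sum()
print(eigenvalues, explained)
```

Here the first component carries three quarters of the total variance. Because a covariance matrix is symmetric, its eigenvalues are real and its eigenvectors can be chosen orthonormal.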

In [None]:
# TODO: Find the eigenvalues and eigenvectors of the covariance matrix

eigenvalues, eigenvectors = __________  # <<< FILL HERE >>>

# Sort eigenvalues in descending order and reorder the eigenvector columns to match
idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

## 📐 Step 4: Project Data onto Top 2 Principal Components
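Projecting a sample onto a principal component means taking its dot product with that component's unit vector; the result is the sample's coordinate along the new axis. The sketch below does this for a single made-up sample and two hypothetical orthonormal directions (`v1`, `v2`, and `z` are illustrative, not computed from Iris). The exercise cell generalizes the idea to all samples at once.

```python
import numpy as np

# Hypothetical orthonormal directions in 3-D (not derived from real data)
v1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
v2 = np.array([0.0, 0.0, 1.0])

z = np.array([2.0, 0.0, 3.0])   # one made-up standardized sample

# Coordinate along each direction = dot product with the unit vector
coords = np.array([z @ v1, z @ v2])
print(coords)

# Mapping the 2-D coordinates back gives the closest point to z
# in the plane spanned by v1 and v2
z_approx = coords[0] * v1 + coords[1] * v2
print(z_approx, np.linalg.norm(z - z_approx))
```

The distance between `z` and `z_approx` is the information lost by keeping only two directions.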

In [None]:
# TODO: Project the data onto the top 2 principal components

W = eigenvectors[:, :2]  # first two columns = top 2 principal directions, shape (4, 2)
X_pca = __________  # <<< FILL HERE >>>

print("Shape of X_pca:", X_pca.shape)

## 📊 Step 5: Visualize the Results

In [None]:
# Plot the 2D PCA projection, one scatter series per class

plt.figure(figsize=(8, 6))
for label in np.unique(y):
    plt.scatter(X_pca[y == label, 0], X_pca[y == label, 1], label=data.target_names[label])

plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.title("PCA of Iris Dataset")
plt.legend()
plt.grid(True)
plt.show()

## 🧪 Exercises
Complete the following tasks to deepen your understanding:

### ✍️ Task 1: Use a Different Dataset
**Replace the Iris dataset with the Wine dataset** from `sklearn.datasets`, then repeat all of the PCA steps.

In [None]:
# TODO: Load the Wine dataset and perform PCA
# HINT: from sklearn.datasets import load_wine

# Your code here

### ✍️ Task 2: Implement PCA using SVD
Instead of the eigen decomposition of the covariance matrix, implement PCA by applying Singular Value Decomposition (SVD) directly to the standardized data matrix.
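
The two routes agree because the squared singular values of the centered data matrix, divided by `n - 1`, equal the eigenvalues of its covariance matrix, and the right singular vectors (the rows of `Vt`) are the principal directions. The sketch below checks only the eigenvalue relationship on random toy data. The names are illustrative, the data is centered but not scaled to keep the example short, and the `n - 1` denominator is an assumed convention.

```python
import numpy as np

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(40, 3))   # hypothetical data: 40 samples, 3 features
Z = X_toy - X_toy.mean(axis=0)     # centering is required for the relationship to hold
n = Z.shape[0]

# Route 1: eigenvalues of the covariance matrix, reversed to put the largest first
C = Z.T @ Z / (n - 1)
eigvals = np.linalg.eigvalsh(C)[::-1]

# Route 2: singular values of the centered data (np.linalg.svd returns them largest first)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
eigvals_from_svd = s**2 / (n - 1)

print(np.allclose(eigvals, eigvals_from_svd))
```

Working from the data matrix avoids forming the covariance matrix explicitly, which tends to be numerically more stable.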

In [None]:
# TODO: Perform PCA using SVD
# HINT: Use np.linalg.svd

# Your code here

### ✍️ Task 3: Compare with sklearn PCA
Fit `sklearn.decomposition.PCA` on the same standardized data and compare its projection with the one from your implementation.
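
Two correct implementations can still disagree in sign: if `v` is an eigenvector, so is `-v`, so any column of the projection may come out negated. The sketch below defines a small comparison helper, `same_up_to_sign` (a hypothetical name introduced here), and demonstrates it on random toy data by flipping one column of scikit-learn's own output to mimic a second implementation.

```python
import numpy as np
from sklearn.decomposition import PCA

def same_up_to_sign(A, B, tol=1e-8):
    """Return True if each column of A equals the matching column of B or its negation."""
    return all(
        np.allclose(a, b, atol=tol) or np.allclose(a, -b, atol=tol)
        for a, b in zip(A.T, B.T)
    )

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(30, 4))   # hypothetical data: 30 samples, 4 features

scores = PCA(n_components=2).fit_transform(X_toy)

# Mimic another implementation that chose the opposite sign for component 2
flipped = scores * np.array([1.0, -1.0])

print(np.allclose(scores, flipped))        # plain comparison fails
print(same_up_to_sign(scores, flipped))    # sign-aware comparison succeeds
```

Note that `PCA` centers the data but does not standardize it, so it should be given the same standardized matrix your implementation used.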

In [None]:
# TODO: Compare with sklearn PCA
from sklearn.decomposition import PCA

# Your code here