
# 🧠 Principal Component Analysis (PCA) – Lecture 4 (Contd.)

### 📘 Lecture Context:

* Week: 6
* Lecture: 4 (Continued from previous lecture)
* Focus: Derivation via Lagrange multipliers, the resulting PCA algorithm, and a worked example.

---

## 🔁 Recap of PCA Motivation

We are given a dataset $x_1, x_2, \dots, x_n \in \mathbb{R}^d$, and we aim to represent it in an $m$-dimensional subspace ($m < d$) so that the **reconstruction error** $\frac{1}{n}\sum_{i=1}^{n}\|x_i - \tilde{x}_i\|^2$, where $\tilde{x}_i$ is the reconstruction of $x_i$, is minimized.

---

## 🔍 Optimization Problem – A Simple Setup

### 🎯 Objective:

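Minimize the reconstruction error, which for an orthonormal basis $u_1, \dots, u_d$ of $\mathbb{R}^d$ can be written in terms of the discarded directions $u_{m+1}, \dots, u_d$ alone:

$$
J^* = \sum_{j=m+1}^{d} u_j^\top C u_j
$$

subject to:

$$
u_j^\top u_j = 1, \quad j = m+1, \dots, d
$$

This is a **constrained optimization problem**, and we solve it using **Lagrange multipliers**.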
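
To see that $J^*$ equals the average squared reconstruction error for *any* orthonormal basis (not only eigenvectors), here is a minimal NumPy sketch. The data, the sizes $n, d, m$, the seed, and the random basis are hypothetical; the reconstruction uses $\tilde{x}_i = \sum_{j \le m}(x_i^\top u_j)u_j + \sum_{j > m}(\bar{x}^\top u_j)u_j$ from the algorithm summary.

```python
import numpy as np

# Hypothetical toy data: n = 200 points in R^d with d = 4; keep m = 2 directions
rng = np.random.default_rng(0)
n, d, m = 200, 4, 2
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d)) + 3.0

x_bar = X.mean(axis=0)
Xc = X - x_bar
C = Xc.T @ Xc / n  # covariance with the 1/n convention used in these notes

# An arbitrary orthonormal basis u_1..u_d (columns of U); these are NOT eigenvectors
U, _ = np.linalg.qr(rng.normal(size=(d, d)))
U_keep, U_drop = U[:, :m], U[:, m:]

# Keep coordinates x_i^T u_j for j <= m; replace the rest with x_bar^T u_j
X_tilde = (X @ U_keep) @ U_keep.T + (x_bar @ U_drop) @ U_drop.T

# Average squared reconstruction error vs. the objective sum_{j>m} u_j^T C u_j
recon_error = np.mean(np.sum((X - X_tilde) ** 2, axis=1))
J = sum(U[:, j] @ C @ U[:, j] for j in range(m, d))
print(np.isclose(recon_error, J))  # True
```

The identity holds because $x_i - \tilde{x}_i = \sum_{j>m} \big((x_i - \bar{x})^\top u_j\big) u_j$, whose squared norm averages to $\sum_{j>m} u_j^\top C u_j$.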

---

## 🔧 Solving with Lagrangian

### 🔣 Formulate Lagrangian:

Consider first the simpler problem for a single direction $u$:

$$
\min \, u^\top C u \quad \text{subject to} \quad u^\top u = 1
$$

Define:

$$
L(u, \lambda) = u^\top C u - \lambda(u^\top u - 1)
$$

Differentiate $L$ with respect to $u$ (using $C^\top = C$) and set the gradient to zero:

$$
\nabla_u L = 2Cu - 2\lambda u = 0 \quad \Rightarrow \quad Cu = \lambda u
$$

This is the **eigenvalue equation**: $u$ is an eigenvector of $C$ and $\lambda$ is the corresponding eigenvalue. Moreover, at such a $u$ the objective equals the eigenvalue, since $u^\top C u = \lambda u^\top u = \lambda$.

✅ **Conclusion**:

* The minimum of $u^\top C u$ is attained at the **eigenvector corresponding to the smallest eigenvalue** of $C$.
* Likewise, the maximum of $u^\top C u$, which is the variance of the data projected onto $u$, is attained at the eigenvector of the **largest eigenvalue**.
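
A quick numerical check of this Lagrangian argument, sketched with NumPy on a hypothetical random symmetric positive semidefinite matrix standing in for $C$: at every eigenvector the gradient $2Cu - 2\lambda u$ vanishes and $u^\top C u = \lambda$, and for random unit vectors $u^\top C u$ stays between the smallest and largest eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))
C = A @ A.T / 5  # hypothetical symmetric PSD matrix standing in for a covariance

lams, U = np.linalg.eigh(C)  # ascending eigenvalues, orthonormal eigenvector columns

# At each eigenvector: grad_u L = 2Cu - 2*lam*u vanishes and u^T C u equals lam
for lam, u in zip(lams, U.T):
    grad = 2 * C @ u - 2 * lam * u
    assert np.allclose(grad, 0.0)
    assert np.isclose(u @ C @ u, lam)

# For random unit vectors, u^T C u lies between the smallest and largest eigenvalue
V = rng.normal(size=(1000, 5))
V /= np.linalg.norm(V, axis=1, keepdims=True)
quad = np.einsum("ij,jk,ik->i", V, C, V)  # quad[i] = V[i] @ C @ V[i]
print(quad.min() >= lams[0] - 1e-12, quad.max() <= lams[-1] + 1e-12)  # True True
```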

---

## 🧠 Properties of Covariance Matrix $C$

Let:

$$
C = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top
$$

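* $C$ is **real and symmetric**
* ⇒ All eigenvalues are **real**
* ⇒ Eigenvectors can be chosen **orthonormal**
* ⇒ There exists an orthonormal **basis** of eigenvectors $u_1, u_2, \dots, u_d$ of $\mathbb{R}^d$
* $C$ is also **positive semidefinite**, since $v^\top C v = \frac{1}{n}\sum_{i=1}^{n} \big(v^\top (x_i - \bar{x})\big)^2 \geq 0$ for every $v$, so all eigenvalues are **nonnegative**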

Let eigenvalues be ordered as:

$$
\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_d
$$
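
These properties can be checked numerically. The sketch below assumes NumPy and hypothetical random data; `np.linalg.eigh` is NumPy's eigensolver for symmetric matrices and returns eigenvalues in ascending order, so they are reversed to match $\lambda_1 \geq \dots \geq \lambda_d$. Because $C$ is positive semidefinite, its eigenvalues are also nonnegative (up to rounding).

```python
import numpy as np

# Hypothetical data: 50 points in R^3
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
n = X.shape[0]
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / n  # C = (1/n) sum (x_i - x_bar)(x_i - x_bar)^T

assert np.allclose(C, C.T)  # real and symmetric

# eigh returns real eigenvalues (ascending) and orthonormal eigenvector columns
lams, U = np.linalg.eigh(C)
order = np.argsort(lams)[::-1]  # reorder so lambda_1 >= ... >= lambda_d
lams, U = lams[order], U[:, order]

print(np.allclose(U.T @ U, np.eye(3)))  # orthonormal eigenvectors: True
print(np.all(lams >= -1e-12))           # nonnegative eigenvalues: True
```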

---

## 🧮 PCA Algorithm Summary

### ✍️ Given: Data $x_1, x_2, \dots, x_n \in \mathbb{R}^d$

### 📌 Steps:

1. **Compute Mean**:

   $$
   \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
   $$

2. **Compute Covariance Matrix**:

   $$
   C = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top
   $$

3. **Compute Eigenvalues and Eigenvectors** of $C$:

   $$
   \lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_d
   $$

   With corresponding eigenvectors:

   $$
   u_1, u_2, ..., u_d
   $$

4. **Select the top $m$ eigenvectors**:

   * These define the lower-dimensional subspace
   * $u_1, ..., u_m$ ← Top $m$ eigenvectors (largest eigenvalues)

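5. **Project and reconstruct** (the coordinates $x_i^\top u_1, \dots, x_i^\top u_m$ form the $m$-dimensional representation of $x_i$):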

   $$
   \tilde{x}_i = \sum_{j=1}^{m} (x_i^\top u_j) u_j + \sum_{j=m+1}^{d} (\bar{x}^\top u_j) u_j
   $$

   * If the data are centered, $\bar{x} = 0$ and the second sum vanishes:

     $$
     \tilde{x}_i = \sum_{j=1}^{m} (x_i^\top u_j) u_j
     $$

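6. **Minimum reconstruction error**:

   * The reconstruction error is minimized by choosing $u_{m+1}, \dots, u_d$ as the eigenvectors corresponding to the **smallest eigenvalues**. Since $u_j^\top C u_j = \lambda_j$ for each eigenvector, the minimum value is the sum of the discarded eigenvalues:

     $$
     J^* = \sum_{j=m+1}^{d} \lambda_j
     $$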
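
Steps 1 to 6 can be collected into one function. This is a minimal NumPy sketch, not a reference implementation: the name `pca`, its return values, and the test data are assumptions made for illustration. The last line checks step 6 numerically by comparing the average squared reconstruction error with the sum of the discarded eigenvalues.

```python
import numpy as np

def pca(X, m):
    """Hypothetical helper. X has shape (n, d); returns (lams, U, X_tilde)."""
    n, d = X.shape
    x_bar = X.mean(axis=0)                    # step 1: mean
    Xc = X - x_bar
    C = Xc.T @ Xc / n                         # step 2: covariance (1/n convention)
    lams, U = np.linalg.eigh(C)               # step 3: eigen-decomposition
    order = np.argsort(lams)[::-1]            # sort so lambda_1 >= ... >= lambda_d
    lams, U = lams[order], U[:, order]
    U_keep, U_drop = U[:, :m], U[:, m:]       # step 4: top-m eigenvectors
    # step 5: x_tilde_i = sum_{j<=m} (x_i^T u_j) u_j + sum_{j>m} (x_bar^T u_j) u_j
    X_tilde = (X @ U_keep) @ U_keep.T + (x_bar @ U_drop) @ U_drop.T
    return lams, U, X_tilde

# Hypothetical data with very different spreads along the 5 coordinate axes
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
lams, U, X_tilde = pca(X, m=2)

# step 6: average squared reconstruction error equals lambda_3 + lambda_4 + lambda_5
J_star = np.mean(np.sum((X - X_tilde) ** 2, axis=1))
print(np.isclose(J_star, lams[2:].sum()))  # True
```

Using `eigh` rather than a general eigensolver exploits the symmetry of $C$, which guarantees real eigenvalues and orthonormal eigenvectors.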

---

## ✅ Example: 2D to 1D PCA

### 🧾 Data Points:

$$
x_1 = (-1, -1), \quad x_2 = (0, 0), \quad x_3 = (1, 1)
$$

### Step 1: Centering the Data

* $\bar{x} = (0, 0)$, so the data are already centered

### Step 2: Covariance Matrix

$$
C = \frac{1}{3} \sum_{i=1}^{3} x_i x_i^\top
$$

Compute each:

$$
x_1 x_1^\top = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad
x_2 x_2^\top = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad
x_3 x_3^\top = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
$$

So:

$$
C = \frac{1}{3} \left( \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix} \right)
= \frac{2}{3} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
$$
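
The same covariance matrix can be computed with NumPy as a sum of outer products. In this sketch `np.cov` serves only as a cross-check; `bias=True` selects the $\frac{1}{n}$ normalization used in these notes instead of its default $\frac{1}{n-1}$.

```python
import numpy as np

X = np.array([[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]])  # x_1, x_2, x_3 as rows
x_bar = X.mean(axis=0)  # (0, 0): already centered

# C = (1/n) sum_i (x_i - x_bar)(x_i - x_bar)^T
C = sum(np.outer(x - x_bar, x - x_bar) for x in X) / len(X)

print(np.allclose(C, (2.0 / 3.0) * np.ones((2, 2))))          # True
print(np.allclose(C, np.cov(X, rowvar=False, bias=True)))     # True
```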

### Step 3: Eigenvalues and Eigenvectors

Characteristic equation:

$$
\det(C - \lambda I) = 0
\Rightarrow \begin{vmatrix} \frac{2}{3}-\lambda & \frac{2}{3} \\ \frac{2}{3} & \frac{2}{3}-\lambda \end{vmatrix} = 0
\Rightarrow \left(\frac{2}{3}-\lambda\right)^2 - \left(\frac{2}{3}\right)^2 = 0
\Rightarrow \lambda\left(\lambda - \frac{4}{3}\right) = 0
\Rightarrow \lambda = \frac{4}{3}, 0
$$

Eigenvectors:

* $u_1 = \frac{1}{\sqrt{2}}(1, 1)^\top$ for $\lambda_1 = \frac{4}{3}$
* $u_2 = \frac{1}{\sqrt{2}}(1, -1)^\top$ for $\lambda_2 = 0$
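
Step 3 can be verified with a short NumPy sketch. `np.poly` applied to a square matrix returns the coefficients of its characteristic polynomial, here $\lambda^2 - \frac{4}{3}\lambda$. Eigenvectors are defined only up to sign, so the check compares the absolute inner product of each computed eigenvector with the hand-derived one against 1.

```python
import numpy as np

C = (2.0 / 3.0) * np.array([[1.0, 1.0], [1.0, 1.0]])

print(np.allclose(np.poly(C), [1.0, -4.0 / 3.0, 0.0]))  # lambda^2 - (4/3) lambda: True

lams, U = np.linalg.eigh(C)       # ascending order
lams, U = lams[::-1], U[:, ::-1]  # reorder so lambda_1 >= lambda_2
print(np.allclose(lams, [4.0 / 3.0, 0.0]))  # True

# Hand-derived eigenvectors from the notes
u1 = np.array([1.0, 1.0]) / np.sqrt(2)
u2 = np.array([1.0, -1.0]) / np.sqrt(2)
# Same direction up to sign: |computed . hand-derived| == 1
print(np.isclose(abs(U[:, 0] @ u1), 1.0), np.isclose(abs(U[:, 1] @ u2), 1.0))  # True True
```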

### Step 4: Projection

For a 1D projection ($m = 1$), keep $u_1$, the eigenvector of the largest eigenvalue:

$$
\tilde{x}_i = (x_i^\top u_1) u_1
$$

Example:

* $x_1 = (-1, -1) \Rightarrow \tilde{x}_1 = -\sqrt{2} \cdot \frac{1}{\sqrt{2}} (1,1) = (-1, -1)$
* $x_2 = (0, 0) \Rightarrow \tilde{x}_2 = (0, 0)$
* $x_3 = (1, 1) \Rightarrow \tilde{x}_3 = (1, 1)$

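All three points already lie on the line spanned by $u_1$, so each reconstruction equals the original point. The variance along $u_2$ is $\lambda_2 = 0$, and reducing to one dimension loses **no information** in this case.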

### Step 5: Reconstruction Error

Since every reconstruction equals its original point (equivalently, $J^* = u_2^\top C u_2 = \lambda_2 = 0$):

$$
J^* = \frac{1}{3} \sum_{i=1}^3 \|x_i - \tilde{x}_i\|^2 = 0
$$
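
Steps 4 and 5 of the example as a NumPy sketch: project onto $u_1$, reconstruct, and compute $J^*$ with the $\frac{1}{n}$ average used above.

```python
import numpy as np

X = np.array([[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]])  # already centered
u1 = np.array([1.0, 1.0]) / np.sqrt(2)

coords = X @ u1                 # 1D coordinates x_i^T u_1
X_tilde = np.outer(coords, u1)  # reconstructions (x_i^T u_1) u_1
J_star = np.mean(np.sum((X - X_tilde) ** 2, axis=1))

print(np.allclose(coords, [-np.sqrt(2), 0.0, np.sqrt(2)]))  # True
print(np.allclose(X_tilde, X))  # True: every point lies on the line spanned by u_1
print(np.isclose(J_star, 0.0))  # True
```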

---

## 🧾 Final Summary

| Concept         | Explanation                                                            |
| --------------- | ---------------------------------------------------------------------- |
| **Goal**        | Reduce dimension while preserving structure (low reconstruction error) |
| **Method**      | Use eigen-decomposition of the covariance matrix                       |
| **Key Step**    | Choose the top-$m$ eigenvectors (max variance), project data           |
| **Mathematics** | Lagrange multipliers reduce the constrained problem to $Cu = \lambda u$ |
| **Error**       | Minimum reconstruction error is the sum of the discarded (smallest) eigenvalues $\lambda_{m+1} + \dots + \lambda_d$ |

---

## 🔜 What’s Next?

In the next lecture, PCA will be interpreted from the **variance maximization** perspective, offering a dual view:

* We've seen **minimizing reconstruction error**
* Next: **maximizing variance of projected data**