These notes cover the lecture on **PCA in Higher Dimensions** (Lecture 6, Week 6). The goal is to understand how **PCA can be implemented efficiently** when the **dimensionality (d)** is much greater than the **number of data points (n)**.

---

# 📘 PCA in Higher Dimensions — Detailed Notes

---

## 🔷 Objective

To **perform PCA efficiently** when:

* The data is high-dimensional (i.e., the feature space dimension `d` is very large).
* The number of data points `n` is **small** (i.e., `d >> n`).

---

## 🔶 Motivation

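In standard PCA, we compute eigenvectors of the **covariance matrix** $C \in \mathbb{R}^{d \times d}$.
But when **d is large**, this is computationally **expensive**: a dense eigendecomposition takes roughly $O(d^3)$ time, and the matrix alone needs $O(d^2)$ memory.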

The trick:
Reformulate the PCA problem to work with an **$n \times n$** matrix (much smaller), which is computationally easier.

---

## 🔶 PCA Recap

Given:

* $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n \in \mathbb{R}^d$
* Mean of data:

  $$
  \bar{\mathbf{x}} = \frac{1}{n} \sum_{i=1}^n \mathbf{x}_i
  $$

Define:

* Mean-centered data:

  $$
  \mathbf{A} = [\mathbf{x}_1 - \bar{\mathbf{x}}, \dots, \mathbf{x}_n - \bar{\mathbf{x}}] \in \mathbb{R}^{d \times n}
  $$

Covariance matrix:

$$
\mathbf{C} = \frac{1}{n} \sum_{i=1}^n (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T = \frac{1}{n} \mathbf{A} \mathbf{A}^T \in \mathbb{R}^{d \times d}
$$
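As a small illustration of the recap above, the following NumPy sketch builds the centered matrix $\mathbf{A}$ and checks that the sum-of-outer-products form of $\mathbf{C}$ matches $\frac{1}{n}\mathbf{A}\mathbf{A}^T$. The sizes `d`, `n` and the random data `X` are assumptions chosen only for demonstration; they are not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 8                      # illustrative sizes with d > n (assumed)
X = rng.normal(size=(d, n))       # columns are the data points x_1, ..., x_n

x_bar = X.mean(axis=1, keepdims=True)   # mean vector, shape (d, 1)
A = X - x_bar                           # mean-centered data, shape (d, n)

# Covariance written as a sum of outer products ...
C_sum = sum(np.outer(A[:, i], A[:, i]) for i in range(n)) / n
# ... and as a single matrix product.
C = (A @ A.T) / n

same = np.allclose(C_sum, C)      # both forms should agree
```

Storing the points as columns keeps the code aligned with the $d \times n$ convention used in these notes; many libraries store samples as rows instead, in which case the transposes swap.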

---

## 🔶 Problem with Naive PCA in High Dimensions

* $\mathbf{C} = \frac{1}{n} \mathbf{A} \mathbf{A}^T$ is a **$d \times d$** matrix.
* If **$d \gg n$**, computing eigenvectors/eigenvalues of $\mathbf{C}$ is **computationally expensive**.

**Key Insight**:
The **rank** of $\mathbf{C}$ is at most $n$, because it is a sum of $n$ rank-one matrices:

$$
\mathbf{C} = \frac{1}{n} \sum_{i=1}^n (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T
$$

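* Each term has rank at most 1 ⇒ the rank of the sum is at most $n$ (in fact at most $n - 1$, because the centered vectors sum to zero)
* So **at least $d - n$ eigenvalues are zero**
* It’s wasteful to compute $d$ eigenvectors when at most $n$ eigenvalues can be nonzero.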
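The rank argument can be checked numerically. This is a minimal sketch on assumed random data (the sizes and the `1e-10` threshold are illustrative choices): it counts how many eigenvalues of $\mathbf{C}$ are numerically nonzero and compares that with $n$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 40, 6                               # assumed sizes with d >> n
X = rng.normal(size=(d, n))
A = X - X.mean(axis=1, keepdims=True)      # centered data, shape (d, n)
C = (A @ A.T) / n                          # d x d covariance matrix

rank_C = np.linalg.matrix_rank(C)          # numerical rank of C
eigvals = np.linalg.eigvalsh(C)            # eigenvalues of the symmetric matrix C
num_nonzero = int(np.sum(eigvals > 1e-10)) # eigenvalues above a small threshold

print(f"d = {d}, n = {n}, rank(C) = {rank_C}, nonzero eigenvalues = {num_nonzero}")
```

Both counts stay at or below `n` no matter how large `d` is, which is what makes the smaller eigenproblem sufficient.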

---

## 🔶 Solution: Work with an $n \times n$ Matrix Instead

### Step 1: Define matrix A

Let:

$$
\mathbf{A} = [\mathbf{x}_1 - \bar{\mathbf{x}}, \dots, \mathbf{x}_n - \bar{\mathbf{x}}] \in \mathbb{R}^{d \times n}
$$

Then:

$$
\mathbf{C} = \frac{1}{n} \mathbf{A} \mathbf{A}^T \in \mathbb{R}^{d \times d}
$$

Let’s instead consider:

$$
\mathbf{C}' = \frac{1}{n} \mathbf{A}^T \mathbf{A} \in \mathbb{R}^{n \times n}
$$

---

## 🔶 Eigenvalue Trick

Let:

* $\lambda_i$ be a non-zero eigenvalue of $\mathbf{C} = \frac{1}{n} \mathbf{A} \mathbf{A}^T$
* $\mathbf{u}_i \in \mathbb{R}^d$ be the corresponding eigenvector

We claim:

$$
\lambda_i \text{ is also an eigenvalue of } \frac{1}{n} \mathbf{A}^T \mathbf{A}, \text{ and vice versa}
$$

### ✅ Proof Sketch:

Assume:

$$
\mathbf{C} \mathbf{u}_i = \lambda_i \mathbf{u}_i \Rightarrow \frac{1}{n} \mathbf{A} \mathbf{A}^T \mathbf{u}_i = \lambda_i \mathbf{u}_i
$$

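Left-multiply both sides by $\mathbf{A}^T$:

$$
\Rightarrow \frac{1}{n} \mathbf{A}^T \mathbf{A} \left( \mathbf{A}^T \mathbf{u}_i \right) = \lambda_i \left( \mathbf{A}^T \mathbf{u}_i \right)
$$

Note that $\mathbf{A}^T \mathbf{u}_i \neq \mathbf{0}$: otherwise $\lambda_i \mathbf{u}_i = \frac{1}{n} \mathbf{A} \mathbf{A}^T \mathbf{u}_i = \mathbf{0}$, contradicting $\lambda_i \neq 0$. So:

$$
\mathbf{A}^T \mathbf{u}_i \text{ is an eigenvector of } \frac{1}{n} \mathbf{A}^T \mathbf{A} \text{ with eigenvalue } \lambda_i
$$

Conversely, left-multiplying $\frac{1}{n} \mathbf{A}^T \mathbf{A} \mathbf{v}_i = \lambda_i \mathbf{v}_i$ by $\mathbf{A}$ gives $\frac{1}{n} \mathbf{A} \mathbf{A}^T (\mathbf{A} \mathbf{v}_i) = \lambda_i (\mathbf{A} \mathbf{v}_i)$.

Hence:

* If $\mathbf{v}_i$ is a unit-norm eigenvector of $\frac{1}{n} \mathbf{A}^T \mathbf{A}$ with eigenvalue $\lambda_i > 0$, then

  $$
  \mathbf{u}_i = \frac{1}{\sqrt{n \lambda_i}} \mathbf{A} \mathbf{v}_i
  $$

  is a unit-norm eigenvector of $\frac{1}{n} \mathbf{A} \mathbf{A}^T$, because $\|\mathbf{A} \mathbf{v}_i\|^2 = \mathbf{v}_i^T \mathbf{A}^T \mathbf{A} \mathbf{v}_i = n \lambda_i$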

This lets us **construct the principal components $\mathbf{u}_i$** from $\mathbf{v}_i$!
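The eigenvalue trick can be verified on a small example. The sketch below uses assumed random data and compares the leading eigenvalues of the $d \times d$ and $n \times n$ matrices, then maps the top eigenvector $\mathbf{v}$ of the small matrix to $\mathbf{A}\mathbf{v}$, rescales it to unit length, and measures how well it satisfies the eigenvector equation for $\mathbf{C}$.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 30, 5                                # assumed sizes
X = rng.normal(size=(d, n))
A = X - X.mean(axis=1, keepdims=True)

C_big = (A @ A.T) / n                       # d x d
C_small = (A.T @ A) / n                     # n x n

# eigh/eigvalsh return eigenvalues in ascending order; reverse to descending.
lam_big = np.linalg.eigvalsh(C_big)[::-1]
lam_small, V = np.linalg.eigh(C_small)
lam_small, V = lam_small[::-1], V[:, ::-1]

# Largest eigenvalue and its eigenvector from the small problem.
lam, v = lam_small[0], V[:, 0]

# Map back to R^d and rescale to unit length.
u = A @ v
u /= np.linalg.norm(u)

# Residual of the eigenvector equation C u = lam u for the big matrix.
residual = np.linalg.norm(C_big @ u - lam * u)
eigenvalues_match = np.allclose(lam_big[:n], lam_small)
```

Here `residual` should be at the level of floating-point round-off, and the top `n` eigenvalues of `C_big` should coincide with those of `C_small`.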

---

## 🔶 Final Procedure: PCA in High Dimensions

1. Compute **mean-centered data matrix** $\mathbf{A} \in \mathbb{R}^{d \times n}$

2. Form $\mathbf{C}' = \frac{1}{n} \mathbf{A}^T \mathbf{A} \in \mathbb{R}^{n \times n}$

3. Find the **top k eigenvalues** $\lambda_1 \geq \dots \geq \lambda_k > 0$ of $\mathbf{C}'$ and the corresponding unit-norm **eigenvectors** $\mathbf{v}_1, \dots, \mathbf{v}_k$

4. Compute the **top k PCA directions** in $\mathbb{R}^d$ as:

   $$
   \mathbf{u}_i = \frac{1}{\sqrt{n \lambda_i}} \mathbf{A} \mathbf{v}_i, \quad i = 1, \dots, k
   $$

5. These unit-norm $\mathbf{u}_i$'s are the **principal components** (columns of the projection matrix)
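The procedure above can be sketched in NumPy as follows. The function name `pca_high_dim`, the tolerance `tol`, and the example data are hypothetical choices for illustration, not part of the lecture. Each mapped vector $\mathbf{A}\mathbf{v}_i$ is divided by its own norm so the returned directions have unit length; for a unit-norm $\mathbf{v}_i$ that norm equals $\sqrt{n \lambda_i}$.

```python
import numpy as np

def pca_high_dim(X, k, tol=1e-12):
    """Top-k principal directions of X with shape (d, n), columns = samples.

    Solves the n x n eigenproblem instead of the d x d one.
    Returns (U, lam): U has shape (d, k), lam holds the k eigenvalues.
    """
    d, n = X.shape
    A = X - X.mean(axis=1, keepdims=True)       # step 1: center the data
    C_small = (A.T @ A) / n                     # step 2: n x n matrix
    lam, V = np.linalg.eigh(C_small)            # ascending eigenvalues
    lam, V = lam[::-1][:k], V[:, ::-1][:, :k]   # step 3: keep the top k
    if np.any(lam <= tol):
        raise ValueError("k exceeds the number of nonzero eigenvalues")
    U = A @ V                                   # step 4: map back to R^d
    U /= np.linalg.norm(U, axis=0)              # unit length per column
    return U, lam

# Example with far more features than samples (sizes are assumptions).
rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 10))
U, lam = pca_high_dim(X, k=3)
```

Because centered data has rank at most $n - 1$, this sketch raises an error when `k` asks for more directions than there are nonzero eigenvalues, rather than dividing by a vanishing norm.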

---

## 🔶 Key Learning Outcomes

✅ Reformulate the PCA eigenvector computation from a $d \times d$ matrix to an $n \times n$ matrix

✅ Use the **duality between** $\mathbf{A}^T \mathbf{A}$ and $\mathbf{A} \mathbf{A}^T$ for eigenvalue problems

✅ Achieve **exact same PCA projection** but with **lower computational cost**

✅ Especially useful when **features (d) are huge** but **samples (n) are few**

✅ Connection to **SVD**:

* This trick is the basis of PCA via **Singular Value Decomposition**:

  $$
  \mathbf{A} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T
  $$

  * Eigenvectors of $\mathbf{C} = \frac{1}{n} \mathbf{A} \mathbf{A}^T$ are columns of $\mathbf{U}$
  * Eigenvectors of $\frac{1}{n} \mathbf{A}^T \mathbf{A}$ are columns of $\mathbf{V}$
  * Singular values are $\sigma_i = \sqrt{n \lambda_i}$, i.e. $\lambda_i = \sigma_i^2 / n$
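The SVD connection can be checked with `np.linalg.svd`. This is a sketch on assumed random data: with the $\frac{1}{n}$ scaling used for the covariance in these notes, the eigenvalues are recovered from the singular values of $\mathbf{A}$ as $\sigma_i^2 / n$, and the leading columns of $\mathbf{U}$ and $\mathbf{V}$ satisfy the two eigenvector equations.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 200, 7                                   # assumed sizes
X = rng.normal(size=(d, n))
A = X - X.mean(axis=1, keepdims=True)

# Thin SVD: U is (d, n), s is (n,) in descending order, Vt is (n, n).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

lam_from_svd = s**2 / n                         # eigenvalues implied by the SVD
lam_direct = np.linalg.eigvalsh((A.T @ A) / n)[::-1]   # descending order

# Leading left singular vector against the d x d eigenproblem.
u1, v1, lam1 = U[:, 0], Vt[0], lam_from_svd[0]
res_u = np.linalg.norm((A @ A.T) / n @ u1 - lam1 * u1)
# Leading right singular vector against the n x n eigenproblem.
res_v = np.linalg.norm((A.T @ A) / n @ v1 - lam1 * v1)
```

In practice the thin SVD is often preferred over forming $\mathbf{A}^T \mathbf{A}$ explicitly, since squaring the matrix also squares its condition number.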

---

## 🔚 Summary

* **Goal**: Perform PCA efficiently in high-dimensional space
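* **Strategy**: Work with the $n \times n$ matrix $\frac{1}{n} \mathbf{A}^T \mathbf{A}$ instead of the $d \times d$ covariance matrix
* **Benefit**: Reduce the eigendecomposition cost from $O(d^3)$ to $O(n^3)$ when $d \gg n$, plus $O(n^2 d)$ to form $\mathbf{A}^T \mathbf{A}$ and $O(ndk)$ to map the top $k$ eigenvectors back to $\mathbb{R}^d$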
* **Application**: Widely used in text, genomics, and image processing, where features often outnumber samples