#### Problem statement

Given:

1) Each data point is a vector. For example, a point x = (x₁, x₂) can be seen as a vector in 2D space.
2) We have many such vectors (the dataset).

### Projection of one vector onto another (origin-based)

Given two vectors drawn from the origin:
- Vector **v** = (x₁, y₁)
- Vector **u** = (x₂, y₂)

We want the projection of **v** onto **u**.

#### Step 1: Dot products
v · u = x₁x₂ + y₁y₂  
u · u = x₂² + y₂²

#### Step 2: Projection formula
The projection of **v** onto **u** is:

projᵤ(v) = ( (v · u) / (u · u) ) · u

#### Step 3: Write it explicitly
projᵤ(v) = ( (x₁x₂ + y₁y₂) / (x₂² + y₂²) ) · (x₂, y₂)

= ( ((x₁x₂ + y₁y₂)x₂) / (x₂² + y₂²),
    ((x₁x₂ + y₁y₂)y₂) / (x₂² + y₂²) )

#### Interpretation
- This gives the **shadow of (x₁, y₁)** along the direction **(x₂, y₂)**.
- If v is perpendicular to u (v · u = 0) → projection is (0, 0).
- If v is parallel to u → projection equals v itself.
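
The projection formula and the two special cases above can be checked numerically. The following is a minimal sketch using NumPy; the helper name `project` and the sample vectors are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def project(v, u):
    """Vector projection of v onto u; u must be non-zero."""
    v = np.asarray(v, dtype=float)
    u = np.asarray(u, dtype=float)
    # proj_u(v) = ((v . u) / (u . u)) * u
    return (v @ u) / (u @ u) * u

p = project([3.0, 4.0], [1.0, 0.0])      # shadow of (3, 4) on the x-axis: (3, 0)
perp = project([0.0, 2.0], [1.0, 0.0])   # perpendicular vectors: (0, 0)
par = project([2.0, 4.0], [1.0, 2.0])    # parallel vectors: v itself, (2, 4)
```

Dividing by u · u means u does not have to be a unit vector here; the sections that follow assume a unit u, which makes that denominator equal to 1.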


### Variance of Projected Data (Same Example)

Assume we have n data points:
x₁, x₂, ..., xₙ  
Each point xᵢ is a vector in ℝ² (or ℝᵈ).

Let u be a unit vector (projection direction).

---

### Step 1: Projection gives scalar values

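Project each data point xᵢ onto u. Because u is a unit vector (u · u = 1), the projection is (xᵢ · u) u, so each point is described by a single coordinate along u: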

zᵢ = xᵢ · u

So after projection:
- Each vector → one scalar value
- Dataset becomes: z₁, z₂, ..., zₙ

---

### Step 2: Mean of projected values

μ_z = (1/n) Σ (xᵢ · u) = μ · u, where μ = (1/n) Σ xᵢ is the mean of the data points

---

### Step 3: Variance of projected data

Variance is defined as:

Var(z) = (1/n) Σ (zᵢ − μ_z)²

Substitute zᵢ = xᵢ · u:

Var(z) = (1/n) Σ [ (xᵢ · u − μ_z)² ]

This is the variance **along direction u**.
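
Steps 1 to 3 can be traced on a small example. This is a sketch only: the five-point dataset `X` and the direction `u` are made-up values chosen for illustration.

```python
import numpy as np

# Hypothetical toy dataset: 5 points in R^2, one point per row.
X = np.array([[2.0, 1.0],
              [3.0, 4.0],
              [5.0, 4.0],
              [6.0, 7.0],
              [9.0, 9.0]])

u = np.array([1.0, 1.0])
u = u / np.linalg.norm(u)          # normalise so that ||u|| = 1

z = X @ u                          # Step 1: z_i = x_i . u, one scalar per point
mu_z = z.mean()                    # Step 2: mean of the projected values
var_z = np.mean((z - mu_z) ** 2)   # Step 3: variance along u (1/n convention)
```

`np.var(z)` uses the same 1/n normalisation by default, so it should agree with `var_z`.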

---

### Step 4: PCA objective

We choose the direction u such that:

Var(z) is maximized

i.e.,

maximize  (1/n) Σ (xᵢ · u − μ_z)²  
subject to  ||u|| = 1

This optimal direction u becomes
the **first principal component (PC1)**.
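
In 2D, the objective can be explored by brute force: every unit vector has the form (cos t, sin t), so one can scan the angle t and keep the direction with the largest projected variance. The sketch below does this for an assumed toy dataset. The grid search is only for intuition; it is not how PCA is computed in practice.

```python
import numpy as np

# Hypothetical toy dataset: 5 points in R^2, one point per row.
X = np.array([[2.0, 1.0],
              [3.0, 4.0],
              [5.0, 4.0],
              [6.0, 7.0],
              [9.0, 9.0]])

def projected_variance(X, u):
    """Variance (1/n convention) of the data projected onto unit vector u."""
    z = X @ u
    return np.mean((z - z.mean()) ** 2)

# u and -u give the same variance, so angles in [0, pi) cover every direction.
angles = np.linspace(0.0, np.pi, 1800, endpoint=False)
candidates = np.column_stack([np.cos(angles), np.sin(angles)])   # all unit vectors
variances = np.array([projected_variance(X, u) for u in candidates])

best_u = candidates[np.argmax(variances)]   # approximate PC1 direction
best_var = variances.max()                  # approximate maximum variance
```

The result is only accurate up to the grid resolution (0.1 degrees here), and the sign of `best_u` is arbitrary.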


### Covariance (What & Why)

Covariance measures how two variables change together.

Given two features X and Y with n observations:

Cov(X, Y) = (1/n) Σ (xᵢ − μₓ)(yᵢ − μᵧ)

where:
- μₓ = mean of X
- μᵧ = mean of Y
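
As a quick illustration, the formula can be written out directly. The helper `covariance` and the sample values are assumptions made for this sketch.

```python
import numpy as np

def covariance(a, b):
    """Cov(a, b) with the 1/n convention used in these notes."""
    return np.mean((a - a.mean()) * (b - b.mean()))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # mean 3
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])   # mean 4

cov_xy = covariance(x, y)   # (4 + 0 + 0 + 0 + 2) / 5 = 1.2
```

NumPy's `np.cov(x, y, bias=True)[0, 1]` should give the same value; without `bias=True` it divides by n − 1 instead of n.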

---

### Interpretation

Cov(X, Y) > 0  
→ X and Y increase together (positive relationship)

Cov(X, Y) < 0  
→ One increases, the other decreases (negative relationship)

Cov(X, Y) = 0  
→ No linear relationship
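
The three cases can be reproduced with small constructed examples (all values below are illustrative assumptions). The last case shows why "zero covariance" only rules out a *linear* relationship: y = x² depends entirely on x, yet its covariance with x is zero.

```python
import numpy as np

def covariance(a, b):
    """Cov(a, b) with the 1/n convention used in these notes."""
    return np.mean((a - a.mean()) * (b - b.mean()))

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

cov_pos = covariance(x, 2 * x + 1)   # y rises with x        -> positive
cov_neg = covariance(x, -3 * x)      # y falls as x rises    -> negative
cov_zero = covariance(x, x ** 2)     # dependent, not linear -> zero
```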

---

### Relation to Variance

Variance is a special case of covariance:

Var(X) = Cov(X, X)

So:
- Variance → spread of one variable
- Covariance → joint spread of two variables

---

### Covariance Matrix

For a dataset with d features:

Σ =  
[ Cov(X₁, X₁)  Cov(X₁, X₂)  ...  Cov(X₁, X_d) ]  
[ Cov(X₂, X₁)  Cov(X₂, X₂)  ...  Cov(X₂, X_d) ]  
[ ...  ...  ...  ... ]  
[ Cov(X_d, X₁)  Cov(X_d, X₂)  ...  Cov(X_d, X_d) ]

- Diagonal elements → variances
- Off-diagonal elements → covariances
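
A minimal sketch of building the covariance matrix, assuming the data is stored with one point per row and one feature per column (the array `X` is a made-up example):

```python
import numpy as np

# Hypothetical toy dataset: n = 5 points, d = 2 features.
X = np.array([[2.0, 1.0],
              [3.0, 4.0],
              [5.0, 4.0],
              [6.0, 7.0],
              [9.0, 9.0]])

Xc = X - X.mean(axis=0)             # centre each feature (column)
Sigma = (Xc.T @ Xc) / X.shape[0]    # d x d matrix, 1/n convention

# np.cov treats rows as variables by default, hence rowvar=False;
# bias=True selects the 1/n normalisation instead of 1/(n - 1).
Sigma_np = np.cov(X, rowvar=False, bias=True)
```

The diagonal of `Sigma` holds the per-feature variances, and the matrix is symmetric because Cov(Xᵢ, Xⱼ) = Cov(Xⱼ, Xᵢ).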

---

### Why Covariance is Crucial for PCA

- PCA looks for directions where variance is maximized
- Variance along a direction depends on how features vary together
- Covariance matrix captures all variance + relationships at once

Key result:
Variance after projection onto direction u is:

Var(z) = uᵀ Σ u

PCA chooses u that maximizes this quantity.
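
The key result follows because zᵢ − μ_z = (xᵢ − μ) · u, where μ is the mean data point; squaring and averaging gives uᵀ Σ u. A numerical check on an assumed toy dataset and an arbitrarily chosen direction:

```python
import numpy as np

# Hypothetical toy dataset: 5 points in R^2, one point per row.
X = np.array([[2.0, 1.0],
              [3.0, 4.0],
              [5.0, 4.0],
              [6.0, 7.0],
              [9.0, 9.0]])

Sigma = np.cov(X, rowvar=False, bias=True)   # covariance matrix, 1/n convention

theta = 0.7                                   # arbitrary angle, chosen for illustration
u = np.array([np.cos(theta), np.sin(theta)])  # unit vector

var_direct = np.var(X @ u)        # project first, then take the variance
var_quadratic = u @ Sigma @ u     # u^T Sigma u, no projection needed
```

Both values should agree up to floating-point error, for any unit vector u.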

---

> Covariance tells us how features move together, and PCA uses this information to find directions of maximum variance.


> The covariance matrix summarizes how all features vary individually and together, and PCA uses its eigenvectors to find directions of maximum variance.

## Linear Transformations, Eigenvectors & Eigenvalues

### Linear Transformation

A linear transformation is a function T that maps vectors to vectors
while preserving vector addition and scalar multiplication:
T(a·x + b·w) = a·T(x) + b·T(w) for any vectors x, w and scalars a, b.

Mathematically:
y = A x

where:
- x is the input vector
- A is a matrix
- y is the transformed vector

In PCA:
- The covariance matrix Σ acts as a linear transformation
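- It stretches or compresses the space along certain fixed directions (its eigenvectors, introduced below), so most other vectors change direction as well as length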
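
A small sketch of y = A x, with an assumed 2×2 matrix and made-up vectors, including a check of the linearity property:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])          # illustrative matrix, not from the notes
x = np.array([1.0, 2.0])
w = np.array([-1.0, 0.5])

y = A @ x                            # (2*1 + 1*2, 1*1 + 3*2) = (4, 7)

# Linearity: transforming a combination equals combining the transforms.
lhs = A @ (2.0 * x + 3.0 * w)
rhs = 2.0 * (A @ x) + 3.0 * (A @ w)
```

`lhs` and `rhs` should be equal up to floating-point error, which is what makes a matrix product a linear transformation.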

---

### Eigenvectors and Eigenvalues

An eigenvector is a special non-zero vector that stays on its own line through the origin
after a linear transformation.

Only its magnitude changes (and it points the opposite way if the eigenvalue is negative).

Mathematically:
A v = λ v

where:
- v is an eigenvector
- λ (lambda) is the eigenvalue
- A is the transformation matrix

Meaning:
- The transformation scales v by λ
- v stays on the same line through the origin (same direction for λ > 0, reversed for λ < 0)
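
The defining equation A v = λ v can be verified numerically. The sketch below assumes a small symmetric matrix and uses `np.linalg.eigh`, which is intended for symmetric (Hermitian) matrices and returns eigenvalues in ascending order.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])           # illustrative symmetric matrix

# Columns of `eigenvectors` are unit-length eigenvectors of A.
eigenvalues, eigenvectors = np.linalg.eigh(A)   # eigenvalues: 1 and 3

# Check A v = lambda v for every eigenpair.
checks = [np.allclose(A @ v, lam * v)
          for lam, v in zip(eigenvalues, eigenvectors.T)]
```

For this matrix the eigenvectors lie along (1, −1) and (1, 1), up to sign and normalisation.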

---

### Geometric Interpretation

- Eigenvectors → directions (lines through the origin) that the transformation maps onto themselves
- Eigenvalues → amount of stretching or compression along that direction

If:
λ > 1 → stretching  
0 < λ < 1 → compression  
λ = 0 → collapse to zero  
λ < 0 → flip + scale  

For a covariance matrix, all eigenvalues are ≥ 0, so the flip case never occurs in PCA.

---

### Why Eigenvectors Matter in PCA

In PCA:
- A = covariance matrix Σ
- Eigenvectors of Σ → principal component directions
- Eigenvalues → variance captured along each direction

Key result:
Variance along a unit eigenvector v = corresponding eigenvalue λ (since vᵀ Σ v = λ vᵀ v = λ)

So:
- Largest eigenvalue → maximum variance
- Corresponding eigenvector → PC1
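
This can be checked on a toy dataset (the array `X` is an assumption for illustration): the variance of the data projected onto each eigenvector of Σ should match the corresponding eigenvalue.

```python
import numpy as np

# Hypothetical toy dataset: 5 points in R^2, one point per row.
X = np.array([[2.0, 1.0],
              [3.0, 4.0],
              [5.0, 4.0],
              [6.0, 7.0],
              [9.0, 9.0]])

Sigma = np.cov(X, rowvar=False, bias=True)        # 1/n convention
eigenvalues, eigenvectors = np.linalg.eigh(Sigma) # ascending eigenvalues

# Reorder so the largest eigenvalue (and its eigenvector) comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

pc1 = eigenvectors[:, 0]                          # direction of maximum variance
var_along = np.var(X @ eigenvectors, axis=0)      # variance along each eigenvector
```

`var_along` should equal `eigenvalues` up to floating-point error. The sign of each eigenvector is arbitrary, so `pc1` and `-pc1` describe the same component.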

---

Eigenvectors of the covariance matrix give the principal directions of the data,
and the eigenvalues tell how much variance lies along each of those directions.

---

### PCA Pipeline Connection

Data → Covariance Matrix → Eigenvectors → Projection → Dimensionality Reduction
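
The pipeline can be sketched end to end in a few lines. This is an illustrative implementation under stated assumptions, not a reference one: the function name `pca`, the synthetic data, and the explicit centring step (implicit in computing the covariance) are all choices made for this example.

```python
import numpy as np

def pca(X, k):
    """Reduce X (n points x d features) to k dimensions via the covariance matrix."""
    mu = X.mean(axis=0)
    Xc = X - mu                                   # centre the data
    Sigma = (Xc.T @ Xc) / X.shape[0]              # covariance matrix (1/n)
    eigenvalues, eigenvectors = np.linalg.eigh(Sigma)
    order = np.argsort(eigenvalues)[::-1][:k]     # indices of the k largest eigenvalues
    components = eigenvectors[:, order]           # d x k, columns are PC directions
    Z = Xc @ components                           # projection: n x k reduced data
    return Z, components, eigenvalues[order]

# Synthetic 3D data with very little spread along the third axis.
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.0, 0.1]])
X = rng.normal(size=(200, 3)) @ mixing

Z, components, variances = pca(X, k=2)            # 3 features -> 2 features
```

Each column of `Z` has variance equal to the matching entry of `variances`, and the columns of `components` are orthonormal.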


> Eigenvectors are directions that remain unchanged (up to scaling) under a linear transformation.

> The eigenvector corresponding to the largest eigenvalue of the covariance matrix
gives the direction along which the data has the maximum variance.


> So PCA comes down to an eigendecomposition of the covariance matrix:

[Read blog](https://www.visiondummy.com/2014/04/geometric-interpretation-covariance-matrix/#:~:text=covariance%20matrix%20captures%20the%20spread%20of%20N-dimensional%20data.&text=Figure%203.,is%20captured%20by%20the%20variance)

[Visualizing Linear Transformations](https://www.geogebra.org/m/YCZa8TAH)