# Principal Component Analysis (PCA)
#### PCA is a dimensionality reduction technique used to reduce the number of features while preserving as much information (variance) as possible.

### In simple words:
#### PCA converts many correlated features into fewer uncorrelated new features.

## Why Do We Need PCA?
### When:
<ul>
    <li>Dataset has many features</li>
    <li>Features are highly correlated</li>
    <li>Curse of dimensionality occurs</li>
    <li>Computation becomes expensive</li>
    <li>Overfitting risk increases</li>
</ul>

#### PCA helps by reducing dimensions.

---

## Core Idea of PCA
#### Instead of using original features:
```text
X1, X2, X3, X4
```

#### PCA creates:
```text
PC1, PC2, PC3, PC4
```
#### Where:
<ul>
    <li>PC1 → captures maximum variance</li>
    <li>PC2 → captures second highest variance</li>
    <li>PC3 → captures the third highest variance, and so on</li>
    <li>All PCs are mutually orthogonal, so the new features are uncorrelated</li>
</ul>
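
#### A minimal NumPy sketch of these two properties (decreasing variance, uncorrelated PCs). The data, the `mixing` matrix, and all variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 4 correlated features built from 2 hidden factors plus noise
factors = rng.normal(size=(500, 2))
mixing = np.array([[1.0, 0.8, 0.5, 0.1],
                   [0.0, 0.3, 0.9, 1.0]])
X = factors @ mixing + 0.1 * rng.normal(size=(500, 4))

# Center, then rotate onto the eigenvectors of the covariance matrix
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]              # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
pcs = Xc @ eigvecs                             # columns are PC1..PC4

pc_variances = pcs.var(axis=0, ddof=1)
pc_cov = np.cov(pcs, rowvar=False)
off_diagonal = pc_cov - np.diag(np.diag(pc_cov))

print(np.round(pc_variances, 3))               # decreasing from PC1 to PC4
print(np.abs(off_diagonal).max())              # close to 0: PCs are uncorrelated
```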

---

## Mathematical Idea (Conceptual)
<ol>
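    <li>Standardize the data (zero mean, unit variance per feature)</li>
    <li>Compute the covariance matrix of the standardized data</li>
    <li>Compute its eigenvalues and eigenvectors</li>
    <li>Choose the top k eigenvectors (those with the largest eigenvalues)</li>
    <li>Project the data onto them</li>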
</ol>
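
#### The five steps can be sketched in plain NumPy as follows. The function name `pca_from_scratch` and the toy data are assumptions for illustration, not a library API.

```python
import numpy as np

def pca_from_scratch(X, n_components):
    """Return (projected data, explained variance ratio) for a 2D array X."""
    # 1. Standardize: zero mean and unit variance per feature
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the standardized features
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigen-decomposition (eigh is meant for symmetric matrices)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort by eigenvalue, largest first, and keep the top components
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    # 5. Project the data onto the chosen eigenvectors
    projected = X_std @ components
    return projected, eigvals[order] / eigvals.sum()

rng = np.random.default_rng(42)
base = rng.normal(size=(200, 1))
# Hypothetical toy data: three noisy copies of one signal plus one pure-noise column
X = np.hstack([base + 0.1 * rng.normal(size=(200, 1)) for _ in range(3)]
              + [rng.normal(size=(200, 1))])

X_pca, ratio = pca_from_scratch(X, n_components=2)
print(X_pca.shape)   # (200, 2)
print(ratio)         # share of total variance for PC1 and PC2
```

#### In practice a library implementation is usually preferable; the sketch only makes each step visible.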

---

## Important Terms
## 1️⃣ Principal Components (PCs)
#### The new features obtained by projecting the data onto the eigenvectors.

## 2️⃣ Eigenvectors
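#### Eigenvectors of the covariance matrix. They give the directions of the new axes, ordered from the direction of maximum variance downward.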

## 3️⃣ Eigenvalues
#### Eigenvalues of the covariance matrix. Each one equals the amount of variance captured along its eigenvector.
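
#### How the three terms relate, written out. The symbols are assumptions for this sketch: $X_c$ is the centered data matrix with $n$ rows, $\Sigma$ its covariance matrix, $v_i$ a unit-length eigenvector, and $\lambda_i$ the matching eigenvalue.

$$
\Sigma = \frac{1}{n-1} X_c^\top X_c, \qquad
\Sigma v_i = \lambda_i v_i, \qquad
\operatorname{Var}(X_c v_i) = v_i^\top \Sigma v_i = \lambda_i
$$

#### So the i-th principal component is the column $X_c v_i$, and its variance is exactly $\lambda_i$.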

---

## Explained Variance
#### Each component captures some percentage of total variance.
### Example:
```text
Component    Explained Variance
PC1          60%
PC2          25%
PC3          10%
PC4           5%
```
#### You might keep only PC1 and PC2, which together retain 60% + 25% = 85% of the variance.
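
#### The percentages come from dividing each eigenvalue by the sum of all eigenvalues. A small sketch, using hypothetical eigenvalues picked to reproduce the example table:

```python
# Hypothetical eigenvalues chosen to match the example table
eigenvalues = [2.4, 1.0, 0.4, 0.2]

total = sum(eigenvalues)
ratios = [value / total for value in eigenvalues]

# Running total of the ratios, one entry per component
cumulative = []
running = 0.0
for r in ratios:
    running += r
    cumulative.append(running)

for i, (r, c) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PC{i}: {r:.0%} of variance, {c:.0%} cumulative")
```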

---

## In Python
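```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Example data (an assumption): any numeric feature matrix X works here
X = load_iris().data

# 1. Scale the data so every feature has zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# 2. Apply PCA and keep the first two principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Fraction of the total variance captured by each kept component
print(pca.explained_variance_ratio_)
```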

---

## Choosing Number of Components
### Method 1: Explained Variance
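```python
# A float between 0 and 1 is read as a target fraction of variance
pca = PCA(n_components=0.95)
```
#### Keeps the smallest number of components whose cumulative explained variance reaches at least 95%.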
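
#### The selection rule behind Method 1 can be sketched by hand. This illustrates the idea and is not scikit-learn's internal code; `components_for_variance` and the ratios are hypothetical.

```python
import numpy as np

def components_for_variance(explained_variance_ratio, threshold):
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    cumulative = np.cumsum(explained_variance_ratio)
    # argmax returns the first index where the condition is True
    return int(np.argmax(cumulative >= threshold)) + 1

# Hypothetical ratios, in the form returned by pca.explained_variance_ratio_
ratios = np.array([0.60, 0.25, 0.10, 0.05])
print(components_for_variance(ratios, 0.80))   # 2
print(components_for_variance(ratios, 0.90))   # 3
```

#### The sketch assumes the threshold is reachable; a threshold above the final cumulative value would need separate handling.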

### Method 2: Scree Plot
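#### Plot the explained variance of each component in decreasing order and look for the elbow, the point after which extra components add little. A cumulative explained variance curve is often plotted alongside it.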
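
#### A rough text-only stand-in for such a plot, assuming no plotting library is available. The helper `text_scree` and the ratios are hypothetical.

```python
def text_scree(explained_variance_ratio, width=40):
    """Return one text bar per component, scaled to the largest ratio."""
    largest = max(explained_variance_ratio)
    lines = []
    for i, ratio in enumerate(explained_variance_ratio, start=1):
        bar = "#" * round(width * ratio / largest)
        lines.append(f"PC{i} {bar} {ratio:.2f}")
    return lines

# Hypothetical ratios; look for where the bars stop shrinking quickly
for line in text_scree([0.60, 0.25, 0.10, 0.05]):
    print(line)
```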

---

## When to Use PCA?
<ul>
    <li>When features are correlated</li>
    <li>When dimensionality is high</li>
    <li>Before distance-based methods such as KNN or clustering</li>
    <li>For visualization (2D/3D)</li>
</ul>

## Important Points
<ul>
    <li>PCA is unsupervised (does not use target variable)</li>
    <li>PCA creates linear combinations of features</li>
    <li>PCA reduces interpretability</li>
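    <li>Scale the data before applying PCA when features use different units or ranges; otherwise large-valued features dominate the components</li>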
</ul>
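
#### Why scaling matters, as a small sketch. The two features (`income`, `rating`) and the helper `first_component` are made up for illustration.

```python
import numpy as np

def first_component(X):
    """Unit-length direction of maximum variance of X (rows are samples)."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    return eigvecs[:, np.argmax(eigvals)]

rng = np.random.default_rng(1)
# Hypothetical features: the same signal recorded in very different units
signal = rng.normal(size=300)
income = 20_000 * signal + 5_000 * rng.normal(size=300)   # large numeric range
rating = 0.5 * signal + 0.5 * rng.normal(size=300)        # small numeric range
X = np.column_stack([income, rating])

# Standardize each column to zero mean and unit variance
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

pc1_raw = first_component(X)
pc1_scaled = first_component(X_scaled)
print(np.round(np.abs(pc1_raw), 3))      # almost entirely the first feature
print(np.round(np.abs(pc1_scaled), 3))   # both features weigh the same
```

#### Without scaling, PC1 mostly reflects which feature has the largest numbers, not which direction is most informative.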

## What PCA Actually Does Geometrically
### Variance = Spread
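#### 1. Centers the data and rotates the coordinate system so the new axes (the PCs) point along the directions of greatest spread
#### 2. Projects the data onto these new axes
#### 3. If most of the variance lies along PC1, we can drop PC2 with little information loss.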
#### That’s dimensionality reduction.
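
#### A 2D sketch of this picture, assuming made-up data stretched along the diagonal. Dropping PC2 loses exactly the share of variance that PC2 held.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical 2D data stretched along the 45-degree line
t = rng.normal(scale=3.0, size=400)
noise = rng.normal(scale=0.3, size=400)
X = np.column_stack([t + noise, t - noise])
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

rotated = Xc @ eigvecs                     # same points in PC1/PC2 coordinates
pc1_only = rotated[:, :1]                  # drop PC2: 2D -> 1D
restored = pc1_only @ eigvecs[:, :1].T     # map back to the original axes

# Fraction of total squared spread lost by dropping PC2
lost = np.sum((Xc - restored) ** 2) / np.sum(Xc ** 2)
print(np.round(np.abs(eigvecs[:, 0]), 2))  # PC1 direction, roughly the diagonal
print(lost, eigvals[1] / eigvals.sum())    # the two numbers agree
```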