# 📜 Evaluation Metrics for Dimensionality Reduction (AI/ML/DL)

---

## 🔹 1. Classical ML Metrics

**Reconstruction Error (PCA, Autoencoders):**

$$
L = \|X - \hat{X}\|^2
$$

Measures how closely the reconstruction $\hat{X}$, decoded from the reduced representation, matches the original input $X$. Lower is better.
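
As a minimal sketch of the formula above, the squared error can be summed over every entry of $X$ and $\hat{X}$. The function name and the toy matrices are illustrative assumptions; in practice $\hat{X}$ would come from a fitted PCA or autoencoder.

```python
def reconstruction_error(X, X_hat):
    """Squared error ||X - X_hat||^2 summed over all samples and features."""
    return sum(
        (x - x_hat) ** 2
        for row, row_hat in zip(X, X_hat)
        for x, x_hat in zip(row, row_hat)
    )

# Toy data: 3 samples, 2 features (values are illustrative only).
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
X_hat = [[1.0, 2.5], [2.5, 4.0], [5.0, 6.0]]

total = reconstruction_error(X, X_hat)  # 0.5**2 + 0.5**2 = 0.5
per_sample = total / len(X)             # average error per sample
```

Dividing by the number of samples makes the value comparable across datasets of different sizes.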

---

**Explained Variance Ratio (PCA):**

$$
\text{Explained Variance} = \frac{\text{Var}(X_{\text{reduced}})}{\text{Var}(X)}
$$

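Fraction of total variance captured by the retained components. For PCA this equals the sum of the retained covariance eigenvalues divided by the sum of all eigenvalues.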
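
For PCA, the variance along each principal component is the corresponding eigenvalue of the covariance matrix, so the ratio can be sketched directly from a list of eigenvalues. The eigenvalues below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def explained_variance_ratio(eigenvalues, k):
    """Fraction of total variance captured by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)

eigenvalues = [6.0, 3.0, 1.0]  # hypothetical covariance eigenvalues

ratio_1 = explained_variance_ratio(eigenvalues, 1)  # 6 / 10
ratio_2 = explained_variance_ratio(eigenvalues, 2)  # 9 / 10
```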

---

**Stress (MDS):**

$$
\text{Stress} = \frac{\sum (d_{ij}^{HD} - d_{ij}^{LD})^2}{\sum (d_{ij}^{HD})^2}
$$

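Measures how well pairwise distances are preserved, where $d_{ij}^{HD}$ and $d_{ij}^{LD}$ are the distances between points $i$ and $j$ in the high- and low-dimensional spaces and both sums run over all pairs $i < j$. Lower is better; 0 means every distance is reproduced exactly.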
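
The stress ratio above can be sketched with Euclidean distances over all point pairs. Euclidean distance and the toy coordinates are assumptions made for illustration; MDS variants may use other dissimilarities.

```python
from itertools import combinations
from math import dist

def normalized_stress(X_hd, X_ld):
    """Sum of squared distance errors divided by sum of squared HD distances."""
    num = 0.0
    den = 0.0
    for i, j in combinations(range(len(X_hd)), 2):
        d_hd = dist(X_hd[i], X_hd[j])
        d_ld = dist(X_ld[i], X_ld[j])
        num += (d_hd - d_ld) ** 2
        den += d_hd ** 2
    return num / den

# Three 3-D points and a 2-D "embedding" that simply drops the last coordinate.
X_hd = [[0.0, 0.0, 0.0], [3.0, 0.0, 4.0], [6.0, 0.0, 0.0]]
X_ld = [[0.0, 0.0], [3.0, 0.0], [6.0, 0.0]]

stress = normalized_stress(X_hd, X_ld)  # (4 + 0 + 4) / (25 + 36 + 25) = 8 / 86
```

The double loop is quadratic in the number of points, which is why stress becomes costly on large datasets.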

---

## 🔹 2. Manifold Learning Metrics

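**Trustworthiness (Venna & Kaski, 2001):**  
Measures how many of each point's $k$ nearest neighbors in low-d space are also true neighbors in high-d. It penalizes false neighbors introduced by the embedding.

**Continuity:**  
Counterpart of trustworthiness → how many high-d neighbors remain neighbors in low-d. It penalizes true neighbors that the embedding pushes apart.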

**Mean Relative Rank Error (MRRE):**  
Quantifies how well neighborhood ranks are preserved.

**Local Continuity Meta-Criterion (LCMC):**  
Compares overlap of $k$-nearest neighbors between spaces.
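
A rank-based sketch of trustworthiness and continuity is given below. It assumes Euclidean distances, the usual normalization $\frac{2}{nk(2n - 3k - 1)}$ (valid for $k < n/2$), and ties broken by point index; the helper names `rank_matrix` and `_penalty` are hypothetical.

```python
from math import dist

def rank_matrix(X):
    """ranks[i][j] = position of j among i's neighbors (1 = nearest, ranks[i][i] = 0)."""
    n = len(X)
    ranks = []
    for i in range(n):
        # Sort key puts i itself first, then the other points by distance.
        order = sorted(range(n), key=lambda j: (j != i, dist(X[i], X[j])))
        row = [0] * n
        for r, j in enumerate(order):
            row[j] = r
        ranks.append(row)
    return ranks

def _penalty(ranks_ref, ranks_emb, k):
    """Sum of (reference rank - k) over points that are k-NN in emb but not in ref."""
    n = len(ranks_ref)
    total = 0
    for i in range(n):
        for j in range(n):
            if j != i and ranks_emb[i][j] <= k and ranks_ref[i][j] > k:
                total += ranks_ref[i][j] - k
    return total

def trustworthiness(X_hd, X_ld, k):
    """1 minus the normalized penalty for false neighbors in the LD space."""
    n = len(X_hd)
    norm = 2.0 / (n * k * (2 * n - 3 * k - 1))
    return 1.0 - norm * _penalty(rank_matrix(X_hd), rank_matrix(X_ld), k)

def continuity(X_hd, X_ld, k):
    """Same computation with the roles of the two spaces swapped."""
    return trustworthiness(X_ld, X_hd, k)

# Six points on a line in 2-D, a faithful 1-D embedding, and a scrambled one.
X_hd = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0], [5.0, 0.0]]
X_ld_good = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
X_ld_bad = [[0.0], [5.0], [1.0], [4.0], [2.0], [3.0]]

t_good = trustworthiness(X_hd, X_ld_good, k=2)
c_good = continuity(X_hd, X_ld_good, k=2)
t_bad = trustworthiness(X_hd, X_ld_bad, k=2)
```

Both scores lie in $[0, 1]$, and reporting them together shows whether an embedding invents neighbors, loses them, or both.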

---

## 🔹 3. Information-Theoretic Metrics

**KL Divergence (t-SNE):**

$$
KL(P^{HD} \parallel Q^{LD})
$$

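Measures the mismatch between the neighbor probability distributions $P^{HD}$ and $Q^{LD}$. t-SNE minimizes it, so lower values mean local neighborhoods are better preserved.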
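
As a small illustration, the divergence between two discrete distributions can be computed term by term. The probabilities below are made up; in t-SNE, $P$ and $Q$ would be the pairwise affinities computed in each space.

```python
from math import log

def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i); terms with p_i = 0 contribute 0."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]
q_same = [0.5, 0.25, 0.25]
q_flat = [1 / 3, 1 / 3, 1 / 3]

kl_same = kl_divergence(p, q_same)     # identical distributions give 0
kl_flat = kl_divergence(p, q_flat)     # positive when Q differs from P
kl_reverse = kl_divergence(q_flat, p)  # KL is not symmetric
```

The asymmetry matters: $KL(P \parallel Q)$ penalizes a small $q$ where $p$ is large much more heavily than the reverse case.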

---

**Cross-Entropy (UMAP):**  
UMAP's loss: the cross-entropy between the fuzzy neighbor graphs built in the high-d and low-d spaces.
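
The sketch below evaluates a binary cross-entropy over edge weights. It assumes `p` and `q` are already-computed membership strengths in $[0, 1]$ for the same list of edges; building the fuzzy graphs themselves is out of scope, and the clamping constant `eps` is an assumption added for numerical safety.

```python
from math import log

def fuzzy_cross_entropy(p, q, eps=1e-12):
    """-sum over edges of [p*log(q) + (1 - p)*log(1 - q)]."""
    total = 0.0
    for pij, qij in zip(p, q):
        qij = min(max(qij, eps), 1.0 - eps)  # keep log() finite
        total -= pij * log(qij) + (1.0 - pij) * log(1.0 - qij)
    return total

p = [1.0, 0.0, 0.5]        # hypothetical high-d edge strengths
q_match = [0.9, 0.1, 0.5]  # low-d strengths that agree with p
q_bad = [0.1, 0.9, 0.5]    # low-d strengths that contradict p

ce_match = fuzzy_cross_entropy(p, q_match)
ce_bad = fuzzy_cross_entropy(p, q_bad)
```

The first term rewards placing connected points close together, and the second rewards keeping unconnected points apart.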

---

**Mutual Information:**  
Measures shared structural information between original and reduced representations.

---

## 🔹 4. Deep Learning Metrics

**Autoencoder Losses:**  
Reconstruction terms (MSE or cross-entropy), plus a KL regularization term in VAEs and β-VAEs.  

**Clustering Performance (Downstream):**  
Apply clustering → measure **NMI, ARI, Purity**.  

**Classification Accuracy (Downstream):**  
Train classifier on embeddings → measure **Accuracy, F1**.  
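
For the clustering case, purity and ARI can be sketched from label lists alone. The sketch assumes `labels_pred` comes from some clustering run on the embeddings (not shown) and that ground-truth labels are available; the label values are illustrative.

```python
from collections import Counter
from math import comb

def purity(labels_true, labels_pred):
    """Fraction of points that belong to the majority class of their cluster."""
    clusters = {}
    for y, c in zip(labels_true, labels_pred):
        clusters.setdefault(c, Counter())[y] += 1
    return sum(max(cnt.values()) for cnt in clusters.values()) / len(labels_true)

def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting agreement corrected for chance (undefined if both are trivial)."""
    n = len(labels_true)
    sum_ij = sum(comb(v, 2) for v in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb(v, 2) for v in Counter(labels_true).values())
    sum_b = sum(comb(v, 2) for v in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

y_true = [0, 0, 0, 1, 1, 1]
y_perfect = [1, 1, 1, 0, 0, 0]  # same partition, different cluster names
y_noisy = [0, 0, 1, 1, 1, 1]    # one point assigned to the wrong cluster

purity_noisy = purity(y_true, y_noisy)           # (2 + 3) / 6
ari_perfect = adjusted_rand_index(y_true, y_perfect)
ari_noisy = adjusted_rand_index(y_true, y_noisy)
```

Both scores ignore cluster naming, so a relabeled but identical partition still scores 1.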

---

## 🔹 5. Visualization & Perceptual Metrics

**Neighborhood Preservation (NP@k):**  
Fraction of top-$k$ neighbors preserved.  

**Shepard Diagram Correlation:**  
Correlation of pairwise distances (HD vs LD).  

**Stress Per Point (SPP):**  
Local stress per datapoint.  
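
NP@k and the Shepard correlation can both be sketched from pairwise Euclidean distances. The choice of Pearson correlation, the helper names, and the toy points are assumptions for illustration; a rank correlation is a common alternative.

```python
from itertools import combinations
from math import dist, sqrt

def knn_indices(X, i, k):
    """Indices of the k nearest neighbors of point i (excluding i itself)."""
    others = [j for j in range(len(X)) if j != i]
    return set(sorted(others, key=lambda j: dist(X[i], X[j]))[:k])

def neighborhood_preservation(X_hd, X_ld, k):
    """Average fraction of each point's HD k-NN that are also LD k-NN."""
    n = len(X_hd)
    overlap = sum(
        len(knn_indices(X_hd, i, k) & knn_indices(X_ld, i, k)) for i in range(n)
    )
    return overlap / (n * k)

def shepard_correlation(X_hd, X_ld):
    """Pearson correlation between HD and LD pairwise distances."""
    pairs = list(combinations(range(len(X_hd)), 2))
    d_hd = [dist(X_hd[i], X_hd[j]) for i, j in pairs]
    d_ld = [dist(X_ld[i], X_ld[j]) for i, j in pairs]
    m_hd = sum(d_hd) / len(d_hd)
    m_ld = sum(d_ld) / len(d_ld)
    cov = sum((a - m_hd) * (b - m_ld) for a, b in zip(d_hd, d_ld))
    var_hd = sum((a - m_hd) ** 2 for a in d_hd)
    var_ld = sum((b - m_ld) ** 2 for b in d_ld)
    return cov / sqrt(var_hd * var_ld)

# The "embedding" is the original data scaled by 0.5, so structure is kept exactly.
X_hd = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [4.0, 4.0], [7.0, 1.0]]
X_ld = [[0.0, 0.0], [0.5, 0.0], [0.0, 1.0], [2.0, 2.0], [3.5, 0.5]]

np_at_2 = neighborhood_preservation(X_hd, X_ld, k=2)
corr = shepard_correlation(X_hd, X_ld)
```

A uniform rescaling leaves both scores at their maximum, which shows that neither metric is sensitive to the overall scale of the embedding.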

**Visual Inspection (plots):**  
Often used qualitatively (t-SNE, UMAP scatterplots).  

---

## ✅ Summary Families

- **Reconstruction metrics:** Error, explained variance → PCA, AEs.  
- **Distance preservation:** Stress, Shepard correlation → MDS.  
- **Neighborhood preservation:** Trustworthiness, Continuity, MRRE, LCMC → t-SNE, UMAP.  
- **Information-theoretic:** KL divergence (t-SNE), cross-entropy (UMAP).  
- **Task-based (downstream):** Classification (Accuracy, F1), Clustering (ARI, NMI).  


# 📊 Comparative Table: Dimensionality Reduction Evaluation Metrics (AI/ML/DL)

| Metric                          | Formula (simplified)                                                                 | Intuition                                | Pros                              | Cons                                    | When to Use                  |
|---------------------------------|--------------------------------------------------------------------------------------|------------------------------------------|-----------------------------------|-----------------------------------------|-------------------------------|
| **Reconstruction Error (PCA/AE)** | $$\lVert X - \hat{X} \rVert^2$$                                                       | Measures how well original data is reconstructed | Simple, intuitive                 | Doesn’t measure structure preservation | Autoencoders, PCA             |
| **Explained Variance Ratio (PCA)** | $$\frac{\text{Var}(X_{\text{reduced}})}{\text{Var}(X)}$$                              | How much variance is captured             | Clear interpretability             | Linear only                            | PCA, linear DR                |
| **Stress (MDS)**                | $$\frac{\sum (d_{ij}^{HD} - d_{ij}^{LD})^2}{\sum (d_{ij}^{HD})^2}$$                   | Preserves pairwise distances              | Good for distance-based DR         | Costly on large $n$                    | MDS, Isomap                   |
| **Shepard Diagram Correlation** | $$\text{Corr}(d^{HD}, d^{LD})$$                                                       | Correlation of distances                  | Intuitive                          | Single score hides where distortion occurs | Distance preservation tasks   |
| **Trustworthiness**             | Penalizes non-neighbors in LD space                                                   | How many LD neighbors are true HD neighbors | Captures local preservation        | Ignores global                         | t-SNE, UMAP embeddings        |
| **Continuity**                  | Reverse of trustworthiness                                                            | HD neighbors preserved in LD              | Complements trustworthiness        | Ignores false neighbors                | Local manifold DR             |
| **MRRE (Mean Relative Rank Error)** | Avg. rank mismatch between HD & LD neighbors                                         | Preserves rank order                      | Detailed neighborhood quality      | Harder to interpret                     | Nonlinear DR                  |
| **LCMC (Local Continuity Meta-Criterion)** | Overlap of $k$-NN sets                                                           | Captures neighborhood overlap             | Intuitive                          | Sensitive to $k$                        | t-SNE/UMAP evaluation         |
| **KL Divergence (t-SNE)**       | $$KL(P^{HD} \parallel Q^{LD})$$                                                       | Match neighbor distributions              | Optimized by t-SNE                 | Emphasizes local, not global           | t-SNE                         |
| **Cross-Entropy (UMAP)**        | $$-\sum \left[ p_{ij}\log q_{ij} + (1-p_{ij})\log(1-q_{ij}) \right]$$                 | Match fuzzy neighbor graphs               | Captures local & global            | Sensitive to parameters                | UMAP embeddings               |
| **Mutual Information (MI)**     | MI between HD & LD spaces                                                             | Shared structural info                    | Information-theoretic              | Computationally expensive              | DR preserving global info     |
| **Clustering Accuracy (ACC)**   | $$ACC = \max_{\pi} \frac{1}{n}\sum 1[y_i = \pi(c_i)]$$ (Hungarian matching)           | Aligns clusters with ground truth labels  | Direct interpretability            | Needs labels                           | AE/DEC embeddings             |
| **NMI / ARI / Purity (external)** | Standard clustering metrics on embeddings                                           | Evaluate latent clustering quality        | Well-standardized                  | Requires ground truth labels           | Downstream clustering tasks   |
| **Classification Accuracy (external)** | Classifier performance on reduced features                                       | Task-based evaluation                     | Practical                          | Needs labels                           | DR for supervised downstream  |
| **NP@k (Neighbor Preservation)** | Fraction of top-$k$ neighbors preserved                                               | Local neighborhood retention              | Easy to interpret                  | Needs $k$ choice                       | Embedding quality             |
| **Stress Per Point (SPP)**      | Localized stress per sample                                                           | Point-level quality metric                | Detailed per-point insight         | Less global interpretability           | DR error analysis             |
| **Visual Evaluation**           | Scatterplots (t-SNE/UMAP)                                                             | Intuitive cluster visualization           | Human-interpretable                | Subjective                             | Exploratory analysis           |
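
The clustering accuracy (ACC) row relies on finding the best one-to-one mapping $\pi$ from clusters to classes. The sketch below swaps Hungarian matching for a brute-force search over permutations, which returns the same optimum but is only practical for a handful of clusters; it also assumes there are no more clusters than classes.

```python
from itertools import permutations

def clustering_accuracy(labels_true, labels_pred):
    """Best accuracy over all one-to-one cluster-to-class mappings (brute force)."""
    classes = sorted(set(labels_true))
    clusters = sorted(set(labels_pred))
    best_hits = 0
    # Try every assignment of distinct classes to the clusters.
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(1 for y, c in zip(labels_true, labels_pred) if mapping[c] == y)
        best_hits = max(best_hits, hits)
    return best_hits / len(labels_true)

y_true = [0, 0, 0, 1, 1, 1, 2, 2]
y_pred = [2, 2, 1, 0, 0, 0, 1, 1]  # cluster ids are arbitrary names

acc = clustering_accuracy(y_true, y_pred)  # best mapping 2->0, 0->1, 1->2 gives 7/8
```

For larger numbers of clusters, an assignment solver such as the Hungarian algorithm avoids the factorial cost of enumerating permutations.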

---

## ✅ Key Insights

- **Reconstruction metrics** → Reconstruction error (PCA, Autoencoders).  
- **Variance/distance metrics** → Explained variance, Stress, Shepard correlation.  
- **Neighborhood metrics** → Trustworthiness, Continuity, MRRE, LCMC.  
- **Info-theoretic metrics** → KL (t-SNE), Cross-Entropy (UMAP), MI.  
- **Task-based metrics** → ACC, NMI, ARI, Classification Accuracy.  
- **Visualization metrics** → NP@k, SPP, scatterplots.  
