# 📜 Clustering Loss Functions in AI/ML/DL

---

## 🔹 1. Classical ML Clustering Losses

### a) Partitioning-Based Losses

**k-Means Loss (Distortion / Inertia):**

$$
L = \sum_{i=1}^n \|x_i - c_{z_i}\|^2
$$

- Minimizes the sum of squared Euclidean distances between each data point x_i and the centroid of its assigned cluster (z_i is the index of that cluster).  
✅ Simple, efficient  
❌ Sensitive to initialization, assumes spherical clusters  
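
As a minimal sketch of the distortion formula above, assuming NumPy and a toy 2-D dataset, the loss can be evaluated for a given set of centroids. The helper names `assign_to_nearest` and `kmeans_inertia` are illustrative and do not come from any particular library.

```python
import numpy as np

def assign_to_nearest(X, centroids):
    """Return z_i: the index of the closest centroid for every point."""
    # Squared Euclidean distance from every point to every centroid, shape (n, K).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans_inertia(X, centroids, labels):
    """L = sum_i ||x_i - c_{z_i}||^2 for the given assignment."""
    diffs = X - centroids[labels]
    return float(np.sum(diffs ** 2))

# Toy data (illustrative): two pairs of points around x=0 and x=10.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])

labels = assign_to_nearest(X, centroids)
loss = kmeans_inertia(X, centroids, labels)
print(labels, loss)
```

Every point here lies at distance 1 from its centroid, so the loss is the number of points.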

**k-Medoids Loss:**
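- Same form of objective as k-Means, but each cluster center is an actual data point (medoid) and the distance can be any dissimilarity measure, not only squared Euclidean.  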
✅ More robust to outliers  
❌ Computationally heavier  
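
The following sketch, assuming NumPy and a precomputed pairwise dissimilarity matrix, evaluates the k-Medoids loss and illustrates the robustness claim on a toy 1-D dataset with one outlier. The function name `kmedoids_loss` is hypothetical.

```python
import numpy as np

def kmedoids_loss(D, medoid_idx):
    """Sum over points of the dissimilarity to the nearest medoid.

    D          : (n, n) pairwise dissimilarity matrix
    medoid_idx : indices of the data points chosen as medoids
    """
    return float(D[:, medoid_idx].min(axis=1).sum())

# Toy 1-D data (illustrative): three inliers and one outlier at 100.
X = np.array([[0.0], [1.0], [2.0], [100.0]])
D = np.abs(X - X.T)  # absolute differences as the dissimilarity

# Brute-force search for the best single medoid.
losses = [kmedoids_loss(D, [i]) for i in range(len(X))]
best = int(np.argmin(losses))
print(best, losses[best], X.mean())
```

The selected medoid stays inside the inlier group, while the mean of the same data (25.75) is pulled far toward the outlier.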

---

### b) Density-Based Clustering (DBSCAN, OPTICS)

- No strict global loss.  
- Based on **density reachability**: clusters = dense regions separated by sparse areas.  
✅ Arbitrary-shaped clusters  
❌ No explicit objective function  
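
Although there is no loss to minimize, the density criterion itself is easy to sketch. The snippet below, assuming NumPy, marks core points: points with at least `min_samples` neighbors within radius `eps`. Counting the point itself as a neighbor is a convention assumed here, and `core_point_mask` is an illustrative name.

```python
import numpy as np

def core_point_mask(X, eps, min_samples):
    """Boolean mask of core points under the eps / min_samples density rule."""
    # Pairwise Euclidean distances, shape (n, n).
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    # Neighborhood size includes the point itself (assumed convention).
    neighbor_counts = (D <= eps).sum(axis=1)
    return neighbor_counts >= min_samples

# Toy data (illustrative): a tight group of three points and one isolated point.
X = np.array([[0.0, 0.0], [0.0, 0.5], [0.5, 0.0], [5.0, 5.0]])
mask = core_point_mask(X, eps=1.0, min_samples=3)
print(mask)
```

Clusters are then grown by linking core points that lie within `eps` of each other; the isolated point is left as noise.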

---

### c) Probabilistic Clustering

**Gaussian Mixture Models (GMMs):**

$$
L = -\sum_{i=1}^n \log \sum_{k=1}^K \pi_k \, \mathcal{N}(x_i \,|\, \mu_k, \Sigma_k)
$$

- Negative log-likelihood of the mixture distribution.  
✅ Captures **soft memberships**  
❌ Assumes Gaussian components  
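
A minimal NumPy sketch of the negative log-likelihood above, using the log-sum-exp trick for numerical stability. The function name `gmm_nll` and the toy parameters are assumptions made for illustration.

```python
import numpy as np

def gmm_nll(X, weights, means, covs):
    """Negative log-likelihood of X under a Gaussian mixture."""
    n, d = X.shape
    log_terms = np.empty((n, len(weights)))
    for k, (pi_k, mu_k, cov_k) in enumerate(zip(weights, means, covs)):
        diff = X - mu_k
        _, logdet = np.linalg.slogdet(cov_k)
        # Squared Mahalanobis distance of every point to component k.
        maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov_k, diff.T).T)
        # log(pi_k) + log N(x_i | mu_k, Sigma_k)
        log_terms[:, k] = np.log(pi_k) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
    # Stable log of the sum over components.
    m = log_terms.max(axis=1, keepdims=True)
    log_mix = m[:, 0] + np.log(np.exp(log_terms - m).sum(axis=1))
    return float(-log_mix.sum())

# Toy data (illustrative): two tight groups near (0, 0) and (5, 5).
X = np.array([[0.0, 0.0], [0.1, -0.1], [5.0, 5.0], [5.1, 4.9]])
weights = [0.5, 0.5]
covs = [np.eye(2), np.eye(2)]

nll_good = gmm_nll(X, weights, np.array([[0.0, 0.0], [5.0, 5.0]]), covs)
nll_bad = gmm_nll(X, weights, np.array([[2.5, 2.5], [2.5, 2.5]]), covs)
print(nll_good, nll_bad)
```

Component means placed on the two groups give a lower loss than means placed between them.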

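**EM Algorithm:**  
- Alternates between computing soft assignments (E-step) and maximizing the expected complete-data log-likelihood (M-step); each iteration never increases the GMM negative log-likelihood.  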
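
In symbols, one EM iteration for the GMM can be written in the standard textbook form below. The responsibilities $\gamma_{ik}$ and the parameter vector $\theta = (\pi, \mu, \Sigma)$ are notation introduced here for illustration.

$$
\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \,|\, \mu_k, \Sigma_k)}{\sum_{j=1}^K \pi_j \, \mathcal{N}(x_i \,|\, \mu_j, \Sigma_j)}
\qquad \text{(E-step)}
$$

$$
\theta^{\text{new}} = \arg\max_{\theta} \sum_{i=1}^n \sum_{k=1}^K \gamma_{ik} \left[ \log \pi_k + \log \mathcal{N}(x_i \,|\, \mu_k, \Sigma_k) \right]
\qquad \text{(M-step)}
$$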

---

### d) Spectral Clustering

**Normalized Cut Loss:**

$$
\text{Ncut}(A,B) = \frac{\text{cut}(A,B)}{\text{assoc}(A,V)} + \frac{\text{cut}(A,B)}{\text{assoc}(B,V)}
$$

- Minimize the weight of the edges cut between clusters, normalized by each cluster's total connection weight to the whole graph (assoc).  
✅ Good for graph-based clustering  
❌ Requires costly eigen-decomposition  
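
To make the formula concrete, the sketch below evaluates Ncut for a given two-way partition of a small weighted graph, assuming NumPy and a symmetric affinity matrix `W`. It only scores a partition; it does not perform the eigenvector-based search that spectral clustering uses. The function name `ncut` is illustrative.

```python
import numpy as np

def ncut(W, in_A):
    """Normalized cut of the partition (A, B) for a symmetric affinity matrix W."""
    in_A = np.asarray(in_A, dtype=bool)
    in_B = ~in_A
    cut = W[np.ix_(in_A, in_B)].sum()  # weight of edges crossing A -> B
    assoc_A = W[in_A, :].sum()         # total weight from A to all nodes
    assoc_B = W[in_B, :].sum()         # total weight from B to all nodes
    return float(cut / assoc_A + cut / assoc_B)

# Toy graph (illustrative): two triangles joined by one weak edge (2-3).
W = np.zeros((6, 6))
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1)]
for i, j, w in edges:
    W[i, j] = W[j, i] = w

good = ncut(W, [True, True, True, False, False, False])   # split at the weak edge
bad = ncut(W, [True, True, False, False, False, False])   # split through a triangle
print(good, bad)
```

Cutting only the weak bridge yields a much smaller Ncut than cutting through one of the triangles.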

---

## 🔹 2. Clustering Losses in Deep Learning

### a) Autoencoder-Based

**Deep Embedded Clustering (DEC, 2016):**

$$
L = KL(P \parallel Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}
$$

- KL divergence between the soft cluster assignments Q and a sharpened target distribution P derived from Q.  
✅ Joint feature learning + clustering  
❌ Sensitive to initialization  
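
The sketch below computes the DEC loss in NumPy for fixed embeddings, following the definitions in the DEC paper (a Student's t kernel for Q and a squared, frequency-normalized Q for P). Those definitions are not spelled out in the text above, and the encoder network and gradient updates are omitted. Function names and toy values are illustrative.

```python
import numpy as np

def soft_assignments(Z, centers, alpha=1.0):
    """q_ij: Student's t similarity between embedding z_i and cluster center j."""
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """p_ij: square q and divide by soft cluster frequency, then renormalize rows."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def dec_loss(p, q):
    """KL(P || Q) summed over all points and clusters."""
    return float(np.sum(p * np.log(p / q)))

# Toy embeddings (illustrative): two groups near x=0 and x=3.
Z = np.array([[0.0, 0.0], [0.2, 0.0], [3.0, 0.0], [3.2, 0.0]])
centers = np.array([[0.0, 0.0], [3.0, 0.0]])

q = soft_assignments(Z, centers)
p = target_distribution(q)
loss = dec_loss(p, q)
print(q.round(3), p.round(3), loss)
```

In this example P puts more mass on each point's dominant cluster than Q does, which is what pushes the embeddings toward confident assignments during training.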

**Denoising Autoencoder Clustering:**  
- Loss = **Reconstruction error + clustering objective**.  

---

### b) Generative Model-Based

**ClusterGAN (2018):**  
- Combines **GAN adversarial loss** + **clustering loss on latent codes**.  
✅ Joint generation + clustering  
❌ GAN instability  

**VAE + GMM (Variational Deep Embedding):**  
- Maximizes an ELBO: **VAE reconstruction term + KL divergence** that pulls the latent posterior toward a **GMM prior**, so cluster structure lives in the latent space.  

---

### c) Contrastive / Self-Supervised Clustering

**DeepCluster (2018):**  
- Loss = Cross-entropy between pseudo-labels (from k-Means on the current features) and the network's class predictions.  
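
A minimal NumPy sketch of the classification step: given pseudo-labels (assumed here to come from a k-Means run on the current features) and the network's logits, the loss is ordinary cross-entropy. The function name and toy logits are illustrative, and the clustering and network are not shown.

```python
import numpy as np

def pseudo_label_cross_entropy(logits, pseudo_labels):
    """Mean cross-entropy between softmax(logits) and integer pseudo-labels."""
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    rows = np.arange(len(pseudo_labels))
    return float(-log_probs[rows, pseudo_labels].mean())

# Toy logits (illustrative) that agree with the pseudo-labels.
logits = np.array([[5.0, 0.0, 0.0],
                   [0.0, 5.0, 0.0],
                   [0.0, 0.0, 5.0]])
pseudo_labels = np.array([0, 1, 2])

confident = pseudo_label_cross_entropy(logits, pseudo_labels)
print(confident)
```

Because the pseudo-labels are recomputed periodically, the targets of this loss change during training, which is one source of the label noise sensitivity noted for this method.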

**SwAV (2020):**  
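- Loss = **swapped-prediction cross-entropy**: cluster codes for one augmented view, computed with an **optimal transport (Sinkhorn-Knopp) assignment**, are predicted from another view of the same image.  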
- Enables **online clustering**.  
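
The following is a rough NumPy sketch of the two ingredients, under stated assumptions: `scores` are similarities between a batch of embeddings and a set of prototypes, the Sinkhorn-Knopp iterations produce balanced soft codes, and the loss predicts one view's codes from the other view's softmax. The values `eps=0.05`, `n_iters=3`, and `temperature=0.1` are illustrative choices, and the random scores stand in for a real network.

```python
import numpy as np

def sinkhorn(scores, eps=0.05, n_iters=3):
    """Balanced soft codes of shape (B, K) from a (B, K) score matrix."""
    Q = np.exp((scores - scores.max()) / eps).T  # (K, B); shift for stability
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=1, keepdims=True) * K  # each prototype gets mass 1/K
        Q /= Q.sum(axis=0, keepdims=True) * B  # each sample gets mass 1/B
    return (Q * B).T  # rows sum to 1

def log_softmax(x):
    shifted = x - x.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def swapped_prediction_loss(scores_a, scores_b, temperature=0.1):
    """Predict view A's codes from view B's scores, and vice versa."""
    q_a, q_b = sinkhorn(scores_a), sinkhorn(scores_b)
    log_p_a = log_softmax(scores_a / temperature)
    log_p_b = log_softmax(scores_b / temperature)
    loss_ab = -(q_a * log_p_b).sum(axis=1).mean()
    loss_ba = -(q_b * log_p_a).sum(axis=1).mean()
    return float(0.5 * (loss_ab + loss_ba))

# Stand-in scores (illustrative): 8 samples, 3 prototypes, two noisy views.
rng = np.random.default_rng(0)
scores_a = rng.uniform(-1.0, 1.0, size=(8, 3))
scores_b = scores_a + 0.1 * rng.normal(size=(8, 3))

codes = sinkhorn(scores_a)
loss = swapped_prediction_loss(scores_a, scores_b)
print(codes.round(2), loss)
```

Because the codes are computed per batch, no offline pass over the full dataset is needed, which is what makes the clustering online.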

---

## 🔹 3. Clustering Evaluation Losses

(Not training losses, but assessment metrics.)

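- **Silhouette Score:** per-point score in [-1, 1], higher is better; a is the mean distance to the other points in the same cluster and b is the mean distance to the points of the nearest other cluster.  
$$
s = \frac{b - a}{\max(a, b)}
$$

- **Davies–Bouldin Index:** average, over clusters, of each cluster's similarity to its most similar other cluster (lower is better).  
- **Adjusted Rand Index (ARI) / Mutual Information (MI):** measure agreement with ground-truth labels, so they require labeled data.  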
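
As an illustration of the silhouette formula, the NumPy sketch below computes per-point scores from scratch, taking a as the mean distance to the other points in the same cluster and b as the mean distance to the nearest other cluster. It assumes Euclidean distance and at least two clusters; singleton clusters are given a score of 0 by convention. The function name is illustrative.

```python
import numpy as np

def silhouette_samples(X, labels):
    """Per-point silhouette s = (b - a) / max(a, b)."""
    labels = np.asarray(labels)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: leave score at 0
        # a: mean distance to other members of the same cluster (self excluded).
        a = D[i, same].sum() / (same.sum() - 1)
        # b: smallest mean distance to any other cluster.
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores

# Toy data (illustrative): two well-separated pairs.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = silhouette_samples(X, [0, 0, 1, 1])  # labels match the geometry
bad = silhouette_samples(X, [0, 1, 0, 1])   # labels cut across the pairs
print(good.mean(), bad.mean())
```

The labeling that matches the geometry scores close to 1, while the mismatched labeling scores below 0.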

---

## ✅ Summary Families

**Classical ML:**
- k-Means → squared distance loss.  
- GMMs → log-likelihood.  
- Spectral clustering → graph cut objective.  

**Deep Learning:**
- DEC → KL divergence.  
- VAE+GMM → likelihood + KL.  
- ClusterGAN → adversarial + clustering.  
- DeepCluster, SwAV (self-supervised) → cross-entropy on pseudo-labels / swapped cluster assignments.  


# 📊 Comparative Table: Clustering Loss Functions (AI/ML/DL)

| Loss / Method                  | Formula (simplified)                                                                 | Intuition                                      | Pros                                     | Cons                                       | When to Use                                   |
|--------------------------------|---------------------------------------------------------------------------------------|-----------------------------------------------|------------------------------------------|--------------------------------------------|-----------------------------------------------|
| **k-Means (Distortion)**       | $$L = \sum_{i=1}^n \|x_i - c_{z_i}\|^2$$                                             | Minimize distance between points & centroids  | Simple, efficient                        | Assumes spherical clusters, sensitive init | General-purpose, large $$n$$, fast clustering |
| **k-Medoids**                  | $$L = \sum_{i=1}^n \|x_i - m_{z_i}\|$$                                               | Same as k-Means but center = actual datapoint | Robust to outliers                       | More expensive than k-Means                | Small datasets, robustness needed              |
| **GMM (Gaussian Mixtures)**    | $$L = -\sum_i \log \sum_k \pi_k \,\mathcal{N}(x_i \mid \mu_k,\Sigma_k)$$             | Fit Gaussian mixtures via NLL                 | Soft assignments, flexible                | Assumes Gaussian shape, expensive          | Probabilistic clustering, density estimation   |
| **Spectral Clustering (Ncut)** | $$\text{Ncut}(A,B) = \frac{\text{cut}(A,B)}{\text{assoc}(A,V)} + \frac{\text{cut}(A,B)}{\text{assoc}(B,V)}$$ | Minimize inter-cluster graph connections     | Handles non-convex clusters               | Requires eigen-decomposition               | Graph data, manifold / nonlinear clustering   |
| **DBSCAN / OPTICS**            | No explicit loss (density-based rule)                                                | Clusters = dense regions separated by sparse areas | Arbitrary shapes, noise-resilient    | Sensitive to density params                | Spatial data, irregular shapes                |
| **DEC (2016)**                 | $$L = KL(P \parallel Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$$          | KL divergence between soft & target assigns   | Joint rep. learning + clustering          | Sensitive to initialization                | Deep unsupervised clustering                  |
| **VAE + GMM (VaDE)**           | $$L = L_{\text{recon}} + KL\big(q(z,c \mid x) \parallel p(z,c)\big)$$                | VAE whose latent prior is a GMM               | Uncertainty-aware, probabilistic          | Complex training                           | Dimensionality reduction + clustering with generative models |
| **ClusterGAN (2018)**          | GAN loss + clustering penalty                                                        | Align latent GAN space with clusters          | Combines generation + clustering          | GAN instability                            | Generative + clustering tasks                 |
| **DeepCluster (2018)**         | Cross-Entropy between pseudo-labels & predictions                                   | Alt. between k-Means and CNN training         | Scales to large data                      | Sensitive to noise in labels               | Vision feature learning                       |
| **SwAV (2020)**                | Swapped-prediction cross-entropy with Optimal Transport (Sinkhorn) codes            | Aligns views by swapping assignments          | SSL + clustering jointly                  | Complex optimization                       | Large-scale SSL in vision                     |

---

## ✅ Insights

**Classical ML losses:**
- **k-Means** → distance-based, efficient for convex clusters.  
- **GMM** → probabilistic, soft memberships.  
- **Spectral** → graph-based, captures nonlinear manifolds.  
- **DBSCAN** → density-based, robust to noise/outliers.  

**Deep Learning clustering losses:**
- **DEC** → KL divergence, autoencoder-based clustering.  
- **VaDE** → VAE + GMM, probabilistic latent clustering.  
- **ClusterGAN** → GAN + clustering in latent space.  
- **DeepCluster / SwAV** → self-supervised clustering for representation learning.  

👉 **When to choose:**  
- **Scalable & simple:** k-Means.  
- **Probabilistic clusters:** GMM.  
- **Nonlinear manifolds / graphs:** Spectral.  
- **High-dimensional deep reps:** DEC, DeepCluster.  
- **Large-scale self-supervised pretraining:** SwAV, contrastive clustering.  
