# 📘 Unsupervised Machine Learning Algorithms with Theory & Math

Unsupervised learning is a type of machine learning in which **no labeled outputs are provided**. The model explores the structure of the data to find patterns, groupings, or lower-dimensional representations.

---

## 📌 Table of Contents

1. [K-Means Clustering](#1-k-means-clustering)
2. [Hierarchical Clustering](#2-hierarchical-clustering)
3. [Principal Component Analysis (PCA)](#3-principal-component-analysis-pca)
4. [DBSCAN (Density-Based Spatial Clustering)](#4-dbscan)
5. [t-SNE (t-Distributed Stochastic Neighbor Embedding)](#5-t-sne)
6. [Autoencoders (Neural Nets)](#6-autoencoders)
7. [Gaussian Mixture Models (GMM)](#7-gaussian-mixture-models-gmm)

---

## 1. K-Means Clustering

### 📌 Definition:
K-Means is a **centroid-based clustering algorithm** that partitions data into **K clusters**, minimizing the variance within each cluster.

### 📖 Theoretical Intuition:
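- Assignment step: assigns each data point to its nearest cluster centroid.
- Update step: recomputes each centroid as the mean of the points assigned to it.
- Repeats both steps until the assignments stop changing; the result is a local minimum that depends on the initial centroids.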

### 📐 Objective Function:
\[
J = \sum_{i=1}^{K} \sum_{x \in C_i} \|x - \mu_i\|^2
\]

Where:
- \( C_i \): Set of points assigned to cluster \( i \)
- \( \mu_i \): Centroid of cluster \( i \), i.e. the mean of the points in \( C_i \)
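
The assignment and update steps, together with the objective \( J \), can be sketched in a few lines of NumPy. This is a minimal illustration rather than a production implementation: the function name `kmeans`, the random initialization from data points, the `n_iters` cap, and the toy data are all assumptions made for the example.

```python
import numpy as np


def assign(X, centroids):
    """Return the index of the nearest centroid for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iters=100, seed=0):
    """Alternate assignment and update steps until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        labels = assign(X, centroids)
        # Update step: mean of assigned points (keep old centroid if cluster is empty).
        new_centroids = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    labels = assign(X, centroids)
    # Objective J: sum of squared distances to the assigned centroid.
    J = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, J


# Toy data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids, J = kmeans(X, k=2)
print(labels, J)
```

On data with less obvious structure, the result can change with the seed, which is why K-Means is often run several times and the run with the lowest \( J \) is kept.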

### ✅ Use Cases:
- Customer segmentation
- Image compression
- Document clustering

---

## 2. Hierarchical Clustering

### 📌 Definition:
Builds a hierarchy of clusters using a **bottom-up (agglomerative)** or **top-down (divisive)** approach.

### 📖 Theoretical Intuition:
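- Agglomerative: Each data point starts as its own cluster, and the two closest clusters are merged at every step.
- Divisive: All data points start in a single cluster, which is split recursively.
- Dendrograms are used to visualize the cluster hierarchy; cutting the tree at a chosen height yields a flat clustering.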

### 📐 Linkage Criteria:
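- **Single linkage**: Minimum distance between any two points, one from each cluster
- **Complete linkage**: Maximum distance between any two points, one from each cluster
- **Average linkage**: Average distance over all such pairs of points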
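
To make the three criteria concrete, the sketch below computes each of them for two small clusters using only the standard library. The function names and the toy clusters `A` and `B` are assumptions for illustration; Euclidean distance is assumed as the point-to-point metric.

```python
from itertools import product
from math import dist


def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance for every pair (p, q) with p in A and q in B."""
    return [dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    return max(pairwise_distances(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    d = pairwise_distances(cluster_a, cluster_b)
    return sum(d) / len(d)


# Hypothetical toy clusters.
A = [(0.0, 0.0), (0.0, 1.0)]
B = [(3.0, 0.0), (4.0, 0.0)]

single = single_linkage(A, B)
complete = complete_linkage(A, B)
average = average_linkage(A, B)
print(single, average, complete)
```

In an agglomerative run, the pair of clusters with the smallest linkage value is merged at each step, so the choice of criterion directly shapes the dendrogram.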

### ✅ Use Cases:
- Gene expression analysis
- Social network analysis

---

## 3. Principal Component Analysis (PCA)

### 📌 Definition:
PCA is a **dimensionality reduction** technique that projects data onto a new set of orthogonal axes (**principal components**), ordered by how much of the data's variance each one captures.

### 📖 Theoretical Intuition:
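- Applies an orthogonal linear transformation to the data.
- The first principal component accounts for the most variance; each later component captures the most remaining variance while staying orthogonal to the earlier ones.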

### 📐 Math Formulation:
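- Center the data by subtracting the mean of each feature from \( X \).
- Compute the covariance matrix \( \Sigma \) of the centered data, then its eigenvalues and eigenvectors.
- Project the centered data onto the leading eigenvectors:
\[
Z = X \cdot W
\]
Where:
- \( X \): mean-centered data matrix (one row per sample)
- \( W \): matrix whose columns are the \( k \) eigenvectors of \( \Sigma \) with the largest eigenvalues
- \( Z \): data expressed in the first \( k \) principal components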
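
The formulation above maps almost line for line onto NumPy. The following is a minimal sketch, assuming the data fits in memory and that `np.linalg.eigh` (for symmetric matrices) is an acceptable way to obtain the eigen decomposition; the function name `pca` and the toy data are illustrative only.

```python
import numpy as np


def pca(X, k):
    """Project X onto its top-k principal components."""
    # Center each feature so the covariance is computed around the mean.
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]
    W = eigvecs[:, order]          # columns are the top-k eigenvectors
    Z = X_centered @ W             # Z = X . W
    return Z, W, eigvals[order]


# Toy data lying exactly on the line y = 2x, so one component explains everything.
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
Z, W, variances = pca(X, k=1)
print(W.ravel(), variances)
```

The sign of each eigenvector is arbitrary, so the projected coordinates in `Z` may come out flipped between runs or libraries without changing the result in any meaningful way.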

### ✅ Use Cases:
- Data compression
- Noise reduction
- Visualization (2D or 3D projection)

---

## 4. DBSCAN

### 📌 Definition:
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups together points that lie **close to each other** in dense regions and labels points in low-density regions as noise.

### 📖 Theoretical Intuition:
- Doesn’t require specifying the number of clusters.
- Detects arbitrary-shaped clusters and noise.

### 📐 Parameters:
- \( \varepsilon \): Neighborhood radius around each point
- \( \text{MinPts} \): Minimum number of points within \( \varepsilon \) of a point for it to count as a core point of a dense region
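
How the two parameters interact is easiest to see in code. Below is a small, unoptimized sketch of the clustering loop in plain Python. It assumes Euclidean distance, counts a point as its own neighbor when comparing against `min_pts`, and uses a brute-force neighbor search; the names `region_query`, `dbscan`, and `NOISE` are chosen for this example.

```python
from math import dist

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    labels = [None] * len(points)   # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE       # may later be claimed as a border point
            continue
        cluster_id += 1             # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id      # border point: join, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point: expand
                queue.extend(j_neighbors)
    return labels


# Two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The isolated point ends up labeled `-1`, which is how this sketch marks noise. The brute-force search makes the whole procedure quadratic in the number of points, so practical implementations typically rely on a spatial index instead.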

### ✅ Use Cases:
- Outlier detection
- Spatial data (e.g., geolocation)

---

## 5. t-SNE

### 📌 Definition:
t-SNE (t-distributed Stochastic Neighbor Embedding) is a **non-linear dimensionality reduction** technique that is primarily used for **visualizing high-dimensional data**.

### 📖 Theoretical Intuition:
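- Converts pairwise Euclidean distances in the high-dimensional space into conditional probabilities that represent similarities.
- Minimizes the Kullback-Leibler divergence between the high-dimensional similarity distribution \( P \) and its low-dimensional counterpart \( Q \).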

### 📐 Optimization Goal:
\[
\text{KL}(P \| Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
\]

Where:
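- \( p_{ij} \): Similarity between points \( i \) and \( j \) in the original high-dimensional space
- \( q_{ij} \): Similarity between the same two points in the low-dimensional embedding, computed with a Student-t kernel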
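
The objective can be evaluated directly for a given embedding. The sketch below is a deliberately simplified illustration: it uses a single fixed Gaussian bandwidth `sigma` for \( P \), whereas t-SNE proper calibrates a per-point bandwidth from a perplexity setting and then symmetrizes the conditional probabilities. It only computes the KL divergence and does not perform the gradient-based optimization. All function names and the toy points are assumptions.

```python
import numpy as np


def squared_distances(X):
    """Matrix of squared Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return (diff ** 2).sum(axis=2)


def gaussian_similarities(X, sigma=1.0):
    """Simplified p_ij: Gaussian kernel with one shared bandwidth."""
    P = np.exp(-squared_distances(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)        # a point is not its own neighbor
    return P / P.sum()


def student_t_similarities(Y):
    """q_ij: Student-t kernel with one degree of freedom."""
    Q = 1.0 / (1.0 + squared_distances(Y))
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) summed over all pairs i != j."""
    mask = ~np.eye(len(P), dtype=bool)
    p, q = P[mask], Q[mask]
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


# Four 3-D points forming two tight pairs.
X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
              [5.0, 5.0, 5.0], [6.0, 5.0, 5.0]])
# One 2-D embedding that keeps the pairs together, one that splits them.
Y_good = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]])
Y_bad = np.array([[0.0, 0.0], [5.0, 5.0], [1.0, 0.0], [6.0, 5.0]])

P = gaussian_similarities(X)
kl_good = kl_divergence(P, student_t_similarities(Y_good))
kl_bad = kl_divergence(P, student_t_similarities(Y_bad))
print(kl_good, kl_bad)
```

The embedding that keeps neighbors together scores a much lower divergence, which is the quantity t-SNE pushes down by moving the low-dimensional points.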

### ✅ Use Cases:
- Data visualization (e.g., MNIST digits)
- Exploring data embeddings

---

## 6. Autoencoders

### 📌 Definition:
Autoencoders are **neural networks** that learn to compress and then reconstruct data, often used for **dimensionality reduction and anomaly detection**.

### 📖 Theoretical Intuition:
- Encoder compresses input to latent space
- Decoder reconstructs the input from compressed code

### 📐 Loss Function:
\[
L = \|X - \hat{X}\|^2
\]

Where:
- \( X \): Original input
- \( \hat{X} \): Reconstructed input
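
As a rough illustration of the encoder, decoder, and loss, here is a tiny linear autoencoder trained with plain gradient descent in NumPy. Everything specific in it is an assumption made for the example: the synthetic data, the 3-to-1-to-3 layer sizes, the absence of biases and activation functions, the learning rate, and the use of the mean of the squared error over samples. Real autoencoders usually add nonlinearities and are trained with a deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): 3-D points lying close to a 1-D line.
direction = np.array([1.0, 2.0, 2.0]) / 3.0
X = rng.normal(size=(200, 1)) * direction + 0.05 * rng.normal(size=(200, 3))

# Linear encoder (3 -> 1) and decoder (1 -> 3), small random initial weights.
W_enc = 0.1 * rng.normal(size=(3, 1))
W_dec = 0.1 * rng.normal(size=(1, 3))


def reconstruction_loss(X, W_enc, W_dec):
    """Mean over samples of ||x - x_hat||^2."""
    X_hat = (X @ W_enc) @ W_dec
    return float(((X - X_hat) ** 2).sum(axis=1).mean())


initial_loss = reconstruction_loss(X, W_enc, W_dec)
learning_rate = 0.05
for _ in range(500):
    Z = X @ W_enc                 # encoder: compress to the latent code
    X_hat = Z @ W_dec             # decoder: reconstruct the input
    residual = X_hat - X
    # Gradients of the mean squared reconstruction error.
    grad_dec = 2.0 * Z.T @ residual / len(X)
    grad_enc = 2.0 * X.T @ (residual @ W_dec.T) / len(X)
    W_enc -= learning_rate * grad_enc
    W_dec -= learning_rate * grad_dec
final_loss = reconstruction_loss(X, W_enc, W_dec)
print(initial_loss, final_loss)
```

For anomaly detection, the same per-sample reconstruction error can be used as a score: inputs that the trained model reconstructs poorly are candidates for anomalies.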

### ✅ Use Cases:
- Denoising
- Anomaly detection
- Representation learning

---

## 7. Gaussian Mixture Models (GMM)

### 📌 Definition:
GMM is a **probabilistic model** assuming all data points are generated from a mixture of several Gaussian distributions.

### 📖 Theoretical Intuition:
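- Uses the **Expectation-Maximization (EM)** algorithm to estimate parameters: the E-step computes how responsible each component is for each point, and the M-step re-estimates \( \pi_k \), \( \mu_k \), and \( \Sigma_k \) from those responsibilities.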

### 📐 Probability Density:
\[
P(x) = \sum_{k=1}^{K} \pi_k \cdot \mathcal{N}(x \mid \mu_k, \Sigma_k)
\]

Where:
- \( \pi_k \): Mixing coefficient of component \( k \), with \( \pi_k \ge 0 \) and \( \sum_{k=1}^{K} \pi_k = 1 \)
- \( \mu_k, \Sigma_k \): Mean and covariance of component \( k \)
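
The density can be evaluated by hand for a small case. The sketch below restricts itself to one dimension, so each covariance \( \Sigma_k \) reduces to a scalar variance; that restriction, the function names, and the example parameters are assumptions for illustration. It also computes the per-component posterior weights (often called responsibilities), which are what give GMM its soft cluster boundaries.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return exp(-(x - mean) ** 2 / (2.0 * var)) / sqrt(2.0 * pi * var)


def mixture_density(x, weights, means, variances):
    """P(x) = sum_k pi_k * N(x | mu_k, var_k)."""
    return sum(w * normal_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


def responsibilities(x, weights, means, variances):
    """Posterior probability of each component given x."""
    joint = [w * normal_pdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-component mixture.
weights = [0.5, 0.5]
means = [0.0, 4.0]
variances = [1.0, 1.0]

density_mid = mixture_density(2.0, weights, means, variances)
r_mid = responsibilities(2.0, weights, means, variances)
r_left = responsibilities(0.0, weights, means, variances)
print(density_mid, r_mid, r_left)
```

A point halfway between the two means is shared equally by both components, while a point sitting on one mean is assigned almost entirely to that component.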

### ✅ Use Cases:
- Speaker recognition
- Clustering with soft boundaries

---

## 📌 Summary Table

| Algorithm               | Purpose                        | Key Concept                     | Mathematical Tool                     |
|-------------------------|--------------------------------|---------------------------------|---------------------------------------|
| K-Means                 | Clustering                     | Minimize intra-cluster distance | Euclidean distance                    |
| Hierarchical Clustering | Clustering                     | Cluster tree structure          | Linkage + Dendrograms                 |
| PCA                     | Dimensionality Reduction       | Maximize variance               | Eigen decomposition of covariance     |
| DBSCAN                  | Clustering + Outlier Detection | Density-based clustering        | Epsilon + MinPts                      |
| t-SNE                   | Visualization                  | Preserve local similarity       | KL Divergence                         |
| Autoencoders            | Dimensionality Reduction       | Reconstruction learning         | Neural Networks + MSE Loss            |
| GMM                     | Soft Clustering                | Mixture of Gaussians            | EM Algorithm + Probability Estimation |

---

