## 1. Gaussian Mixture Models (GMM)

* **What is GMM?**
  A Gaussian Mixture Model is a probabilistic model that assumes the data is generated from a mixture of several Gaussian distributions (normal distributions) with unknown parameters.

* **Why use GMM?**
  Unlike K-Means, which assigns each data point to exactly one cluster (hard clustering), GMM allows for probabilistic (soft) assignment, meaning each point belongs to each cluster with some probability.

* **Mathematical Formulation:**
  Suppose you have $K$ clusters. The probability density function for a data point $x$ is:

  $$
  p(x) = \sum_{k=1}^K \pi_k \cdot \mathcal{N}(x|\mu_k, \Sigma_k)
  $$

  where:

  * $\pi_k$ is the mixing coefficient for cluster $k$, with $\sum_{k=1}^K \pi_k = 1$ and $\pi_k \geq 0$
  * $\mathcal{N}(x|\mu_k, \Sigma_k)$ is the Gaussian density with mean $\mu_k$ and covariance matrix $\Sigma_k$, evaluated at $x$

* **Parameters to estimate:**

  * Means $\mu_k$
  * Covariances $\Sigma_k$
  * Mixing weights $\pi_k$
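
To make the density formula concrete, here is a minimal sketch that evaluates $p(x)$ for a one-dimensional mixture. The 1-D restriction is a simplifying assumption (each $\Sigma_k$ reduces to a scalar variance), and the weights, means, and variances are hypothetical values chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Univariate normal density N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    """p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2)."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


# Hypothetical two-component mixture (illustrative values only).
weights = [0.3, 0.7]     # mixing coefficients pi_k, non-negative and summing to 1
means = [-2.0, 3.0]      # mu_k
variances = [1.0, 0.5]   # sigma_k^2, the 1-D stand-in for Sigma_k

density_at_zero = mixture_pdf(0.0, weights, means, variances)
print(f"p(0) = {density_at_zero:.6f}")
```

Because the mixing coefficients sum to 1 and each component is a valid density, the mixture itself integrates to 1.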

---

## 2. Expectation-Maximization (EM) Algorithm

* **Purpose:**
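  EM is an iterative method for finding maximum likelihood estimates of parameters in models with latent variables (hidden data), such as a GMM, where the cluster assignment of each point is hidden. It is guaranteed to reach only a local maximum of the likelihood, so the result depends on the initialization.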

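* **How does EM work in a GMM?**
  The cluster assignments (which Gaussian generated each point) are unknown latent variables. EM alternates between estimating these assignments under the current parameters (E-step) and re-estimating the parameters given those assignments (M-step).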

* **Steps:**

  **Initialization:**
  Start with initial guesses for $\pi_k$, $\mu_k$, and $\Sigma_k$.

  **E-step (Expectation):**
  Calculate the probability (responsibility) that cluster $k$ generated data point $x_i$:

  $$
  \gamma_{ik} = \frac{\pi_k \mathcal{N}(x_i|\mu_k, \Sigma_k)}{\sum_{j=1}^K \pi_j \mathcal{N}(x_i|\mu_j, \Sigma_j)}
  $$

  This is the "soft" assignment of point $i$ to cluster $k$.

  **M-step (Maximization):**
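  Update the parameters using these responsibilities. Here $N$ is the number of data points and $N_k$ is the effective number of points assigned to cluster $k$; the covariance update uses the newly computed $\mu_k$: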

  $$
  N_k = \sum_{i=1}^N \gamma_{ik}
  $$

  $$
  \pi_k = \frac{N_k}{N}
  $$

  $$
  \mu_k = \frac{1}{N_k} \sum_{i=1}^N \gamma_{ik} x_i
  $$

  $$
  \Sigma_k = \frac{1}{N_k} \sum_{i=1}^N \gamma_{ik} (x_i - \mu_k)(x_i - \mu_k)^T
  $$

* **Repeat** the E and M steps until convergence (the parameters stabilize or the log-likelihood $\sum_{i=1}^N \log p(x_i)$ stops increasing).
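
The steps above can be sketched in a few dozen lines. This is an illustrative implementation for one-dimensional data only, so each $\Sigma_k$ becomes a scalar variance. The function names (`e_step`, `m_step`, `fit_gmm_1d`), the initialization scheme, the variance floor, and the synthetic data are all assumptions made for this example, not part of the algorithm's definition.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Univariate normal density N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def e_step(data, weights, means, variances):
    """Compute responsibilities gamma[i][k] and the current log-likelihood."""
    gammas, log_lik = [], 0.0
    for x in data:
        joint = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)  # p(x_i) under the current parameters
        gammas.append([j / total for j in joint])
        log_lik += math.log(total)
    return gammas, log_lik


def m_step(data, gammas, n_components, var_floor=1e-6):
    """Re-estimate pi_k, mu_k and sigma_k^2 from the responsibilities."""
    weights, means, variances = [], [], []
    for k in range(n_components):
        n_k = sum(g[k] for g in gammas)  # effective cluster size N_k
        mu_k = sum(g[k] * x for g, x in zip(gammas, data)) / n_k
        var_k = sum(g[k] * (x - mu_k) ** 2 for g, x in zip(gammas, data)) / n_k
        weights.append(n_k / len(data))
        means.append(mu_k)
        # The floor is an added safeguard against a collapsing component.
        variances.append(max(var_k, var_floor))
    return weights, means, variances


def fit_gmm_1d(data, init_means, max_iter=200, tol=1e-8):
    """Run EM until the log-likelihood improves by less than tol."""
    k = len(init_means)
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    weights, means, variances = [1.0 / k] * k, list(init_means), [overall_var] * k
    history = []
    for _ in range(max_iter):
        gammas, log_lik = e_step(data, weights, means, variances)
        converged = bool(history) and log_lik - history[-1] < tol
        history.append(log_lik)
        if converged:
            break
        weights, means, variances = m_step(data, gammas, k)
    return weights, means, variances, history


# Synthetic data from two hypothetical Gaussians (random.gauss takes a std dev).
rng = random.Random(0)
data = [rng.gauss(-2.0, 1.0) for _ in range(150)]
data += [rng.gauss(3.0, 0.7) for _ in range(350)]

weights, means, variances, history = fit_gmm_1d(
    data, init_means=[min(data), max(data)]
)
print("weights:", weights)
print("means:", means)
print("variances:", variances)
```

The `history` list records the log-likelihood at each iteration, which should never decrease. For multivariate data the same structure applies, with the scalar variance replaced by a covariance matrix and the density by the multivariate normal.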

---

## 3. Soft Clustering

* **What is soft clustering?**
  In soft clustering, each data point has a probability of belonging to each cluster instead of being assigned to only one cluster. This contrasts with hard clustering (like K-Means).

* **Soft Clustering in GMM:**
  The responsibilities $\gamma_{ik}$ computed in the E-step are exactly these soft assignments: for each point $i$ they are non-negative and sum to 1 over the clusters $k$, so each point belongs to each cluster with some fractional responsibility (probability).

* **Benefits of Soft Clustering:**

  * Models uncertainty about cluster membership.
  * Handles overlapping clusters better than hard assignment.
  * Provides richer information about the structure of the data.
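
To illustrate the difference between soft and hard assignment, the sketch below computes responsibilities for three points under a hypothetical, already-fitted 1-D mixture and then collapses each to a hard label by taking the most probable cluster. The parameter values and the helper name `responsibilities` are assumptions for this example.

```python
import math


def gaussian_pdf(x, mean, var):
    """Univariate normal density N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """Soft assignment: gamma_k proportional to pi_k * N(x | mu_k, sigma_k^2)."""
    joint = [w * gaussian_pdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical fitted parameters: two equally weighted, unit-variance clusters.
weights = [0.5, 0.5]
means = [0.0, 4.0]
variances = [1.0, 1.0]

for x in (0.0, 2.0, 4.0):
    gamma = responsibilities(x, weights, means, variances)
    # Hard clustering keeps only the index of the largest responsibility.
    hard_label = max(range(len(gamma)), key=gamma.__getitem__)
    print(f"x={x}: soft={[round(g, 4) for g in gamma]}, hard={hard_label}")
```

The point at `x = 2.0` lies midway between the two means, so its responsibilities are split evenly. A hard label has to pick one cluster there and discards that ambiguity, which is the uncertainty soft clustering preserves.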

---

### Summary Table

| Concept                  | Description                                     | Key Idea                                              |
| ------------------------ | ----------------------------------------------- | ----------------------------------------------------- |
| Gaussian Mixture Model   | Data modeled as a mix of Gaussian distributions | Mixture of Gaussians                                  |
| Expectation-Maximization | Iterative algorithm to estimate parameters      | E-step (soft assignments), M-step (parameter updates) |
| Soft Clustering          | Probabilistic cluster assignment                | Membership probabilities instead of hard labels       |
