# Gaussian Mixture Models (GMM) Implementation from Scratch

A Gaussian mixture model (GMM) is a probabilistic model that assumes every data point is drawn from a mixture of a finite number of Gaussian distributions with unknown parameters. It is used for soft clustering and density estimation.

## Expectation-Maximization (EM) Algorithm
1. **Initialization**: Initialize the means $\mu_k$, covariances $\Sigma_k$, and mixing weights $\pi_k$.
2. **Expectation (E-step)**: Calculate the responsibility $\gamma_{nk}$ that component $k$ has for data point $x_n$, i.e. the posterior probability that $x_n$ was generated by component $k$ under the current parameters.
3. **Maximization (M-step)**: Re-estimate the parameters $\mu_k$, $\Sigma_k$, and $\pi_k$ using the current responsibilities.
4. **Convergence Check**: Repeat the E- and M-steps until the parameters or the log-likelihood stabilize.
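
The four steps above can be sketched in NumPy as follows. This is a minimal illustration, not the code inside `gmm_scratch`. The function names (`gaussian_pdf`, `e_step`, `m_step`, `fit_em`), the initialization (means drawn from random data points, covariances set to the data covariance), the `reg` term added to each covariance, and the log-likelihood stopping rule are all assumptions made for this example.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate normal density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    # Squared Mahalanobis distance, using solve() instead of an explicit inverse.
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

def e_step(X, weights, means, covs):
    """Return responsibilities gamma of shape (n, k) and the log-likelihood."""
    dens = np.column_stack([
        w * gaussian_pdf(X, m, c) for w, m, c in zip(weights, means, covs)
    ])
    total = dens.sum(axis=1, keepdims=True)
    return dens / total, float(np.log(total).sum())

def m_step(X, gamma, reg=1e-6):
    """Re-estimate weights, means and covariances from responsibilities."""
    n, d = X.shape
    Nk = gamma.sum(axis=0)                      # effective points per component
    weights = Nk / n
    means = (gamma.T @ X) / Nk[:, None]
    covs = []
    for k in range(gamma.shape[1]):
        diff = X - means[k]
        cov = (gamma[:, k, None] * diff).T @ diff / Nk[k]
        covs.append(cov + reg * np.eye(d))      # assumed jitter for stability
    return weights, means, np.array(covs)

def fit_em(X, n_components, n_iter=100, tol=1e-6, seed=0):
    """Run EM until the log-likelihood gain drops below tol."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumed initialization: random data points as means, shared data covariance.
    means = X[rng.choice(n, n_components, replace=False)]
    base_cov = np.atleast_2d(np.cov(X.T)) + 1e-6 * np.eye(d)
    covs = np.array([base_cov] * n_components)
    weights = np.full(n_components, 1.0 / n_components)
    prev, history = -np.inf, []
    for _ in range(n_iter):
        gamma, ll = e_step(X, weights, means, covs)
        history.append(ll)
        if ll - prev < tol:                     # convergence check
            break
        prev = ll
        weights, means, covs = m_step(X, gamma)
    return weights, means, covs, history
```

Calling `fit_em(X, 4)` on a two-dimensional array returns the fitted weights, means, and covariances together with the log-likelihood recorded at each iteration. A more robust version would evaluate the densities in log space to avoid underflow on points far from every component.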

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from gmm_scratch import GMM
from sklearn.datasets import make_blobs
%matplotlib inline

## Generating Multi-modal Data

In [None]:
X, y = make_blobs(n_samples=400, centers=4, cluster_std=0.60, random_state=0)
X = X[:, ::-1]  # flip axes for better visualization

plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], s=40)
plt.title("Input Data")
plt.show()

## Fitting GMM

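Soft clustering assigns each data point a responsibility for every component, so a point can belong partly to several clusters rather than to exactly one. The plot below colors each point by the single label that `predict` returns.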
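
To make the difference between soft and hard assignments concrete, here is a small one-dimensional sketch. The helper `responsibilities_1d` and the mixture parameters are hypothetical, chosen only for illustration, and taking the `argmax` of the responsibilities is an assumption about how a hard label is typically derived; it is not confirmed to be what `gmm_scratch` does.

```python
import numpy as np

def responsibilities_1d(x, weights, means, stds):
    """Posterior probability of each component for each scalar in x."""
    x = np.asarray(x, dtype=float)[:, None]            # shape (n, 1)
    z = (x - means) / stds                             # broadcasts to (n, k)
    dens = weights * np.exp(-0.5 * z ** 2) / (stds * np.sqrt(2 * np.pi))
    return dens / dens.sum(axis=1, keepdims=True)

# Hypothetical mixture: two equally weighted, unit-variance components.
weights = np.array([0.5, 0.5])
means = np.array([-2.0, 2.0])
stds = np.array([1.0, 1.0])

# One point at the first mean, one midway between the means, one near the second.
gamma = responsibilities_1d([-2.0, 0.0, 1.0], weights, means, stds)
hard = gamma.argmax(axis=1)                            # assumed hard-label rule

print(np.round(gamma, 3))   # soft assignment: one row per point
print(hard)                 # hard assignment: one label per point
```

By symmetry, the point at `0.0` is split evenly between the two components, while the other two points are assigned almost entirely to the nearer one. A hard label discards that distinction.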

In [None]:
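gmm = GMM(n_components=4)  # one component per blob generated above
gmm.fit(X)                 # estimate the parameters from the data
labels = gmm.predict(X)    # one component label per point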

plt.figure(figsize=(10, 8))
plt.scatter(X[:, 0], X[:, 1], c=labels, s=40, cmap='viridis', zorder=2)
plt.title("GMM Soft Clustering")

# Mark the fitted component means
plt.scatter(gmm.means[:, 0], gmm.means[:, 1], c='red', marker='X', s=200, label='Means')
plt.legend()
plt.show()