### Gaussian Mixture Models

- The k-means clustering model explored in the previous section is simple and relatively easy to understand, but its simplicity leads to practical challenges in its application. 
- In particular, the non-probabilistic nature of k-means and its use of simple distance-from-cluster-center to assign cluster membership lead to poor performance in many real-world situations.
- In this section we will take a look at `Gaussian mixture models (GMMs)`, which can be viewed as an extension of the ideas behind `k-means`, but can also be a powerful tool for estimation beyond simple clustering.

### So What is a Gaussian Mixture Model (GMM)?

A `Gaussian Mixture Model` is a probabilistic model that assumes your data is generated from a mixture of several Gaussian (normal) distributions with unknown parameters, each corresponding to a cluster. 

Unlike simple clustering methods that assign each data point to a single cluster, a GMM models cluster membership probabilistically, so the uncertainty of each assignment is explicit.

`Example`: A person 150 cm tall might be 80% likely to be a child and 20% likely to be an adult. A GMM reports both probabilities instead of forcing a single label.

This allows for more flexible cluster shapes as well as soft clustering, where data points can belong to multiple clusters with varying degrees of membership.
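The height example can be made concrete with a two-component mixture. The sketch below is illustrative only: the means, standard deviations, and mixing weights for the "child" and "adult" groups are made-up assumptions rather than fitted values, and `normal_pdf` and `membership_probabilities` are hypothetical helper names. The resulting probabilities depend entirely on those assumed parameters, so they will not necessarily match the 80%/20% split quoted above.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a 1D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def membership_probabilities(x, components):
    """Return P(component | x) for each component via Bayes' rule.

    `components` maps a label to (mixing_weight, mean, std).
    """
    weighted = {label: w * normal_pdf(x, mean, std)
                for label, (w, mean, std) in components.items()}
    total = sum(weighted.values())
    return {label: value / total for label, value in weighted.items()}

# Assumed (not fitted) parameters, heights in cm.
components = {
    "child": (0.5, 130.0, 15.0),
    "adult": (0.5, 170.0, 10.0),
}

probs = membership_probabilities(150.0, components)
for label, p in probs.items():
    print(f"{label}: {p:.2f}")
```

The probabilities always sum to 1, which is what distinguishes this soft assignment from a hard cluster label.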

### Key Concepts and Terminologies in GMM

#### 1. Gaussian Distribution

Also known as the `normal distribution`, this is the familiar bell-shaped curve. In one dimension it is fully defined by its mean (center) and variance (width); in higher dimensions the variance generalizes to a covariance matrix.
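For reference, the standard density formulas are written out here. The notation is introduced for this note and is not from the original text: $\mu$ is the mean, $\sigma^2$ the variance, $\boldsymbol{\Sigma}$ the covariance matrix, and $d$ the number of dimensions.

$$
\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
$$

$$
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)
$$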

#### 2. Mixture Model

- Instead of assuming data comes from one distribution, we assume it comes from a combination (mixture) of multiple Gaussians.
- Each Gaussian corresponds to a cluster.
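One way to build intuition for a mixture is to read it as a two-step recipe for generating data: first pick a Gaussian, then draw a point from it. The sketch below does this in one dimension using only the standard library. The function name `sample_mixture` and all parameter values are assumptions chosen for illustration.

```python
import random

def sample_mixture(n, weights, means, stds, seed=0):
    """Draw n points from a 1D Gaussian mixture.

    Returns (points, labels), where labels[i] is the index of the
    component that generated points[i].
    """
    rng = random.Random(seed)
    component_ids = list(range(len(weights)))
    points, labels = [], []
    for _ in range(n):
        # Step 1: pick a component with probability equal to its weight.
        k = rng.choices(component_ids, weights=weights)[0]
        # Step 2: draw the point from that component's Gaussian.
        points.append(rng.gauss(means[k], stds[k]))
        labels.append(k)
    return points, labels

# Assumed example parameters for three components.
weights = [0.5, 0.3, 0.2]
means = [0.0, 5.0, 10.0]
stds = [1.0, 0.5, 2.0]

points, labels = sample_mixture(10_000, weights, means, stds)
fractions = [labels.count(k) / len(labels) for k in range(len(weights))]
print([round(f, 2) for f in fractions])  # should be close to the weights
```

Fitting a GMM is the reverse problem: only `points` is observed, and the weights, means, spreads, and labels have to be inferred.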

#### 3. Components

- Each Gaussian in the mixture is called a `component`.
- If you set k=3, your model assumes there are 3 Gaussian components (clusters).

#### 4. Soft Assignment

- k-means makes a hard assignment: each point belongs to exactly one cluster.
- A GMM instead assigns each point a probability of belonging to each cluster.
- Example: A point could be 70% cluster A, 30% cluster B.

#### 5. Covariance Types

Covariance controls the shape of each Gaussian cluster.

- Spherical → same variance in all directions (circular blobs).
- Diagonal → axis-aligned ellipses.
- Full → ellipses of any orientation.
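To see how the three options differ, it can help to look at the covariance matrices themselves. The following is a minimal 2D sketch with hypothetical helper names (`spherical_cov`, `diagonal_cov`, `full_cov_2d`) and arbitrary example variances. The rotated case is built as R · diag(variances) · Rᵀ for a rotation matrix R.

```python
import math

def spherical_cov(variance, dim=2):
    """Same variance in every direction: variance * identity."""
    return [[variance if i == j else 0.0 for j in range(dim)]
            for i in range(dim)]

def diagonal_cov(variances):
    """One variance per axis, no correlation between axes."""
    dim = len(variances)
    return [[variances[i] if i == j else 0.0 for j in range(dim)]
            for i in range(dim)]

def full_cov_2d(var_major, var_minor, angle_rad):
    """2D ellipse with the given axis variances, rotated by angle_rad."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    xx = c * c * var_major + s * s * var_minor
    yy = s * s * var_major + c * c * var_minor
    xy = c * s * (var_major - var_minor)  # non-zero when the ellipse is tilted
    return [[xx, xy], [xy, yy]]

spherical = spherical_cov(2.0)
diagonal = diagonal_cov([4.0, 1.0])
full = full_cov_2d(4.0, 1.0, math.pi / 4)  # same ellipse, tilted 45 degrees

for name, cov in [("spherical", spherical), ("diagonal", diagonal), ("full", full)]:
    print(name, cov)
```

Only the full matrix has non-zero off-diagonal entries, which is what lets the ellipse tilt away from the axes. In scikit-learn's `GaussianMixture`, these shapes correspond to the `covariance_type` values `'spherical'`, `'diag'`, and `'full'`.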

#### 6. Expectation-Maximization (EM) Algorithm

EM is the optimization method used to fit GMMs. It alternates between two steps:

- `E-Step (Expectation)`: Calculate the probabilities (responsibilities) that each point belongs to each cluster, given the current parameters.
- `M-Step (Maximization)`: Update the means, covariances, and weights using those probabilities.

The two steps repeat until convergence, typically when the log-likelihood of the data stops improving.
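The EM loop can be sketched in one dimension with the standard library alone. This is a teaching sketch under stated assumptions, not a production implementation: the function name `fit_gmm_1d`, the quantile-based initialization, the variance floor, and the stopping tolerance are all choices made here for illustration, and the data is synthetic.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, k, n_iter=200, tol=1e-8):
    """Fit a k-component 1D Gaussian mixture with EM.

    Returns (weights, means, variances).
    """
    n = len(data)
    ordered = sorted(data)
    # Initialization (one simple choice): means at evenly spaced quantiles,
    # the overall variance for every component, uniform weights.
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf

    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        log_likelihood = 0.0
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
            log_likelihood += math.log(total)

        # M-step: re-estimate parameters from responsibility-weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor guards against collapse

        # Stop once the log-likelihood no longer improves noticeably.
        if abs(log_likelihood - prev_ll) < tol:
            break
        prev_ll = log_likelihood

    return weights, means, variances

# Synthetic data from two assumed Gaussians (means 0 and 6, std 1).
rng = random.Random(42)
data = ([rng.gauss(0.0, 1.0) for _ in range(300)]
        + [rng.gauss(6.0, 1.0) for _ in range(200)])

weights, means, variances = fit_gmm_1d(data, k=2)
for w, m, v in sorted(zip(weights, means, variances), key=lambda t: t[1]):
    print(f"weight={w:.2f}  mean={m:.2f}  var={v:.2f}")
```

On well-separated data like this, the recovered means and weights should land close to the values used to generate it. EM only finds a local optimum, so on harder data the result can depend on the initialization.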

### How it works

1. `Assumption`: Your data points come from k different Gaussian distributions (clusters).
2. Each cluster has:
    - A `mean` (center of the Gaussian).
    - A `covariance matrix` (shape/spread of the Gaussian).
    - A `mixing weight` (the fraction of the data expected to come from that cluster; the weights sum to 1).
3. GMM tries to learn these parameters using the Expectation-Maximization (EM) algorithm.
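In symbols, the three ingredients above combine into a single density, and EM searches for the parameters that make the observed data most likely. The notation is introduced here for illustration: $\pi_j$, $\boldsymbol{\mu}_j$, and $\boldsymbol{\Sigma}_j$ are the mixing weight, mean, and covariance matrix of component $j$, $\mathcal{N}$ is the Gaussian density, $k$ is the number of components, and $\mathbf{x}_1, \dots, \mathbf{x}_N$ are the data points.

$$
p(\mathbf{x}) = \sum_{j=1}^{k} \pi_j \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j), \qquad \pi_j \ge 0, \quad \sum_{j=1}^{k} \pi_j = 1
$$

$$
\log L(\pi, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \sum_{i=1}^{N} \log \sum_{j=1}^{k} \pi_j \, \mathcal{N}(\mathbf{x}_i \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)
$$

The sum inside the logarithm is why there is no closed-form solution and an iterative method is needed.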

### Advantages of GMM Over Traditional Clustering Techniques

- `Flexibility in Cluster Covariance`: GMM allows for clusters to have different shapes and sizes, adapting to the intrinsic distribution of the data rather than assuming all clusters are spherical (as in k-means).
- `Soft Clustering Capabilities`: Unlike hard clustering methods that assign each data point to a single cluster, GMM assigns a probability to each data point for belonging to each of the mixture components, allowing for a more nuanced understanding of data groupings.
- `Modeling Complex Distributions`: GMM can model complex distributions that are multimodal (having multiple peaks) by combining several Gaussians. This is a significant advantage in real-world data analysis, where assuming a single Gaussian (one peak for the whole dataset) is often insufficient.
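The multimodality point can be checked numerically. The sketch below evaluates a single Gaussian and a two-component mixture on a grid and counts the local maxima of each curve. The helper names (`mixture_pdf`, `count_peaks`) and all parameter values are assumptions chosen for illustration.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a 1D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def mixture_pdf(x, weights, means, stds):
    """Density of a 1D Gaussian mixture: weighted sum of component densities."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds))

def count_peaks(values):
    """Count strict local maxima in a sequence of values."""
    return sum(1 for i in range(1, len(values) - 1)
               if values[i - 1] < values[i] > values[i + 1])

xs = [-5 + 0.05 * i for i in range(301)]  # grid from -5 to 10

# One wide Gaussian versus two narrower ones covering the same range.
single = [normal_pdf(x, 2.5, 3.0) for x in xs]
mixture = [mixture_pdf(x, [0.5, 0.5], [0.0, 5.0], [1.0, 1.0]) for x in xs]

print("peaks in single Gaussian:", count_peaks(single))
print("peaks in mixture:", count_peaks(mixture))
```

With these parameters the single Gaussian has one peak and the mixture has two, one near each component mean. If the component means were moved close enough together, the two peaks would merge into one.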