# Clustering with Gaussian Mixture Models


###  Limitations of Simple IID Gaussian Models

So far, model inference could be solved analytically, but only because we made strong assumptions:
- IID sampling, $p(D) = \prod_n p(x_n)$
- Simple Gaussian (or multinomial) PDFs, $p(x_n) = \mathcal{N}(x_n|\mu,\Sigma)$

Simple Gaussian models with IID sampling have some limitations:
1. What if the PDF is **multi-modal** (or is not Gaussian in some other way)?
2. The covariance matrix $\Sigma$ has $D(D+1)/2$ parameters.
    - This quickly becomes **a very large number** for increasing dimension $D$.
3. Temporal signals are often **not IID**.
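
To make the growth of $D(D+1)/2$ concrete, here is a minimal sketch that counts the free parameters of a full (symmetric) covariance matrix for a few dimensions. The function name and the chosen dimensions are illustrative assumptions, not part of the course material.

```python
def full_covariance_param_count(dim: int) -> int:
    """Number of free parameters in a symmetric dim x dim covariance matrix."""
    # A symmetric matrix is determined by its diagonal and upper triangle.
    return dim * (dim + 1) // 2


# Illustrative dimensions only.
counts = {dim: full_covariance_param_count(dim) for dim in (2, 10, 100, 1000)}
for dim, count in counts.items():
    print(f"D = {dim:4d}: {count} free covariance parameters")
```

The count grows quadratically in $D$, so the number of observations needed for a reliable estimate grows with it.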



###  Towards More Flexible Models

-  What if the PDF is multi-modal (or is not Gaussian in some other way)?
  -   **Discrete latent** variable models (a.k.a. **mixture** models).
    
-  Covariance matrix $\Sigma$ has $D(D+1)/2$ parameters. This quickly becomes very large for increasing dimension $D$.
  -  **Continuous latent** variable models (a.k.a. **dimensionality reduction** models).
    
-  Temporal signals are often not IID.
  -  Introduce **Markov dependencies** and **latent state** variable models.
    





###  What if the Data are Not like This ...
\begin{center}\includegraphics[height=8cm]{./figures/fig-2-class-data}\end{center}


###  ... but like This
\begin{center}\includegraphics[height=8cm]{./figures/fig-unlabeled-data}\end{center}

###  Unobserved Classes

Consider again a set of observed data $D=\{x_1,\dotsc,x_N\}$.

- This time we suspect that there are unobserved class labels that would help explain (or predict) the data, e.g.,
  - the observed data are the color of living things; the unobserved classes are animals and plants.
  - observed are wheel sizes; unobserved categories are trucks and personal cars.
  - observed is an audio signal; unobserved classes include speech, music, traffic noise, etc.
    
Classification problems with unobserved classes are called **Clustering** problems. The learning algorithm needs to **discover the classes from the observed data**.


###  Latent Variable Model Specification
 
If the categories were observed as well, these data could be nicely modeled by the previously discussed generative classification framework.

-  Introduce the 1-of-$K$ variable $z = (z_1,\ldots,z_K)^T$ to represent the unobserved classes.
  - NB: in our notation, $y_k$ denotes observed targets and $z_k$ denotes unobserved (latent) class labels.
-  Use model assumptions **equivalent to those of linear generative classification**, except that now the class
    labels $z_k$ are not observed:
    
\begin{align}
p(x_n) &= \sum_{k=1}^K p(z_{nk}=1) \, p(x_n|z_{nk}=1)  \\
	&= \sum_{k=1}^K \pi_k \, \mathcal{N}\left(x_n|\mu_k,\Sigma_k \right)
\end{align}

This model is called a **Gaussian Mixture Model**.
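
The mixture density above, and the generative story behind it (first draw the class $z_n$, then draw $x_n$ from the selected Gaussian), can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the function names and the two-component parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Multivariate Gaussian density N(x | mean, cov) at a single point x."""
    dim = mean.shape[0]
    diff = x - mean
    # Solve cov @ v = diff instead of forming the explicit inverse.
    mahalanobis_sq = diff @ np.linalg.solve(cov, diff)
    log_norm = -0.5 * (dim * np.log(2.0 * np.pi) + np.linalg.slogdet(cov)[1])
    return np.exp(log_norm - 0.5 * mahalanobis_sq)


def gmm_pdf(x, weights, means, covs):
    """Mixture density p(x) = sum_k pi_k N(x | mu_k, Sigma_k)."""
    return sum(w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs))


def sample_gmm(rng, n_samples, weights, means, covs):
    """Ancestral sampling: draw the class z_n first, then x_n given z_n."""
    labels = rng.choice(len(weights), size=n_samples, p=weights)
    samples = np.stack([rng.multivariate_normal(means[k], covs[k]) for k in labels])
    return samples, labels


# Hypothetical two-component mixture in D = 2 dimensions.
weights = np.array([0.3, 0.7])                 # pi_k, must sum to 1
means = np.array([[0.0, 0.0], [4.0, 4.0]])     # mu_k
covs = np.array([np.eye(2), 2.0 * np.eye(2)])  # Sigma_k

rng = np.random.default_rng(0)
samples, labels = sample_gmm(rng, 500, weights, means, covs)
density_at_origin = gmm_pdf(np.zeros(2), weights, means, covs)
```

In a clustering problem only `samples` would be available; `labels` plays the role of the unobserved $z_n$ that the learning algorithm has to infer.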


###  Gaussian Mixture Models
GMMs are **universal approximators of densities**: given enough Gaussian components, a GMM can approximate any continuous density to arbitrary accuracy.

\begin{figure}
\begin{center}
\includegraphics[width=10.5cm]{./figures/fig-ZoubinG-GMM-universal-approximation}
\end{center}
The red curves show the (weighted) Gaussian components; the blue curve shows the resulting mixture density.
\end{figure}
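
The idea in the figure can be reproduced numerically. The following is a minimal one-dimensional sketch, with hypothetical mixture parameters, that builds the weighted components (the red curves) and their sum (the blue curve) on a grid, then checks that the result is a valid, multi-modal density.

```python
import numpy as np


def normal_pdf(x, mean, std):
    """Univariate Gaussian density N(x | mean, std^2)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


# Hypothetical mixture parameters, chosen only for illustration.
weights = np.array([0.5, 0.3, 0.2])
means = np.array([-3.0, 0.0, 4.0])
stds = np.array([0.7, 0.5, 1.0])

grid = np.linspace(-8.0, 10.0, 2001)

# Shape (K, len(grid)): each row is one weighted Gaussian component.
components = weights[:, None] * normal_pdf(grid[None, :], means[:, None], stds[:, None])
# Summing over components gives the mixture density on the grid.
mixture = components.sum(axis=0)

# Riemann-sum check that the mixture integrates to (approximately) one.
step = grid[1] - grid[0]
total_mass = mixture.sum() * step

# Count interior local maxima of the mixture density on the grid.
is_peak = (mixture[1:-1] > mixture[:-2]) & (mixture[1:-1] > mixture[2:])
n_modes = int(is_peak.sum())

print(f"total mass ~ {total_mass:.4f}, number of modes = {n_modes}")
```

With these well-separated components the mixture has three modes, something a single Gaussian cannot represent.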
