# The EM Algorithm for Gaussian Mixture Model (GMM)



## Finite Mixture Models

### Definition

We are given a data set $\mathbf{X} = \left\{ \boldsymbol{x}_1, \boldsymbol{x}_2, \cdots, \boldsymbol{x}_i, \cdots, \boldsymbol{x}_N \right\}$, where each $\boldsymbol{x}_i$ is a $D$-dimensional vector measurement. Assume that the points are generated independently and identically distributed (IID) from a probability density function $p(\boldsymbol{x})$.

We further assume that $p(\boldsymbol{x})$ is a finite mixture model, formed as a linear combination of $K$ more basic distributions (the components), such as Gaussians:

$$
p \left( \boldsymbol{x}_i \mid \boldsymbol{\Theta} \right) = \sum_{k=1}^{K} \underbrace{\alpha_k}_{p(z_{ik}=1)} \, \underbrace{p_k \left( \boldsymbol{x}_i \mid z_{ik}=1, \boldsymbol{\theta}_k \right)}_{\text{density of component } k}
$$

where $i \in [1,N]$, $k \in [1,K]$, and:

+ $p_k \left( \boldsymbol{x}_i \mid z_{ik}=1, \boldsymbol{\theta}_k \right)$ are the mixture components. In general, the components can be any distribution or density function, and need not all have the same functional form.
+ $\boldsymbol{z}_i = \left\{ z_{i1}, z_{i2}, \cdots, z_{ik}, \cdots, z_{iK} \right\}$ is a vector of indicator random variables identifying the mixture component that generated $\boldsymbol{x}_i$; that is, exactly one of the $z_{ik}$ is equal to 1 and the others are 0.
+ $\alpha_k = p(z_{ik} = 1)$ are the mixture weights, representing the probability that a randomly selected $\boldsymbol{x}_i$ was generated by component $k$, where $\alpha_k \ge 0$ and $\sum_{k=1}^{K} \alpha_k = 1$.
+ $\boldsymbol{\Theta} = \left\{ \alpha_1, \cdots, \alpha_K, \boldsymbol{\theta}_1, \cdots, \boldsymbol{\theta}_K \right\}$ is the complete set of parameters for a mixture model with $K$ components.
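
The generative reading of this definition (first draw the indicator, then draw the observation from the chosen component) can be sketched in a few lines. This is a minimal illustration, not part of the original notes: it assumes one-dimensional Gaussian components, and the names `sample_mixture`, `weights`, `means` and `stds` are hypothetical.

```python
import random

def sample_mixture(weights, means, stds, n, seed=0):
    """Draw n points from a 1-D Gaussian mixture by ancestral sampling.

    Assumption for illustration: every component is a univariate Gaussian.
    """
    rng = random.Random(seed)
    components = list(range(len(weights)))
    samples, labels = [], []
    for _ in range(n):
        # Draw the indicator: component k is selected with probability weights[k].
        k = rng.choices(components, weights=weights)[0]
        # Draw the observation from the selected component's density.
        samples.append(rng.gauss(means[k], stds[k]))
        labels.append(k)
    return samples, labels

weights = [0.3, 0.7]
samples, labels = sample_mixture(weights, means=[-2.0, 3.0], stds=[0.5, 1.0], n=5000)
frac0 = labels.count(0) / len(labels)
print(f"fraction of points drawn from component 0: {frac0:.3f}")
```

With enough samples, the fraction of points drawn from each component should be close to its mixture weight. In real data the `labels` are exactly what is unobserved.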

### Application

+ One general application is ***density estimation***: mixtures let us build complex models out of simple parts. For example, a mixture of $K$ multivariate Gaussians can have multiple modes (typically up to $K$), allowing us to model multimodal densities.
+ A second motivation for using mixture models arises when there is an ***underlying true categorical variable z*** that we cannot measure directly. A well-known example is pulling fish from a lake and measuring their weights $\boldsymbol{x}_i$, where there are known to be $K$ types of fish in the lake but the types were not recorded in the data we were given. In this situation the $\boldsymbol{z}_i$ correspond to an actual physical quantity that could have been measured but was not.
+ A third motivation, similar to the second, arises when we believe there might be $K$ underlying groups in the data, each characterized by different parameters, e.g., $K$ sets of customers that we wish to infer from purchasing data $\boldsymbol{x}_i$. This is often referred to as ***model-based clustering***: there is not necessarily any true underlying interpretation of the $\boldsymbol{z}_i$, so this tends to be more exploratory in nature than the second case.

## Gaussian Mixture Models

For $\boldsymbol{x}_i \in \mathbb{R}^D$, we define a Gaussian mixture model by making each of the $K$ components a multivariate Gaussian density with its own parameters $\boldsymbol{\theta}_k = \left\{ \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k \right\}$:

$$
\mathcal { N } (\boldsymbol{x}_i | \boldsymbol{\theta}_k ) = \frac { 1 } { ( 2 \pi ) ^ { D / 2 } } \frac { 1 } { | \boldsymbol { \Sigma }_k | ^ { 1 / 2 } } \exp \left\{ - \frac { 1 } { 2 } ( \boldsymbol{x}_i - \boldsymbol { \mu }_k ) ^ { \mathrm { T } } \boldsymbol { \Sigma }_k ^ { - 1 } ( \boldsymbol{x}_i - \boldsymbol { \mu }_k ) \right\}
$$

where $\boldsymbol{\mu}_k$ is a $D$-dimensional mean vector, $\boldsymbol{\Sigma}_k$ is a $D \times D$ covariance matrix, and $|\boldsymbol{\Sigma}_k|$ denotes the determinant of $\boldsymbol{\Sigma}_k$.
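
As a rough sketch of how this density can be evaluated, the code below computes its logarithm from a Cholesky factor of the covariance matrix, which avoids forming the inverse or the determinant explicitly. It uses only the standard library, and the function names `cholesky` and `gaussian_logpdf` are hypothetical; in practice a numerical library would normally do this work.

```python
import math

def cholesky(sigma):
    """Return lower-triangular L with L L^T = sigma.

    Assumes sigma is a symmetric positive-definite matrix (list of lists).
    """
    d = len(sigma)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sigma[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def gaussian_logpdf(x, mu, sigma):
    """Log of the multivariate Gaussian density N(x | mu, sigma)."""
    d = len(x)
    L = cholesky(sigma)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    # Solve L y = (x - mu) by forward substitution; y.y is then the
    # quadratic form (x - mu)^T sigma^{-1} (x - mu).
    y = []
    for i in range(d):
        y.append((diff[i] - sum(L[i][m] * y[m] for m in range(i))) / L[i][i])
    quad = sum(v * v for v in y)
    # log|sigma| = 2 * sum_i log L_ii
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + quad)

log_p = gaussian_logpdf([1.0, -0.5], mu=[0.0, 0.0], sigma=[[2.0, 0.6], [0.6, 1.0]])
print(f"log density: {log_p:.4f}")
```

Working with the log density rather than the density itself is a design choice: it keeps values in a numerically safe range when $D$ is large or $\boldsymbol{x}_i$ lies far from the mean.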

We therefore consider a superposition of $K$ Gaussian densities of the form

$$
\begin{align} p ( \boldsymbol{x}_i \mid \boldsymbol{\theta}) &= \sum_{k=1}^{K} \pi_k \, \mathcal{N} \left( \boldsymbol{x}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k \right)\\
 p ( \mathbf{X} \mid \boldsymbol{\theta}) &= \prod_{i=1}^{N} \sum_{k=1}^{K} \pi_k \, \mathcal{N} \left( \boldsymbol{x}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k \right)
\end{align}
$$

where $\mathbf{X} = \left\{ \boldsymbol{x}_1, \ldots, \boldsymbol{x}_N \right\}$, $\boldsymbol{\pi} = \left\{ \pi_1, \ldots, \pi_K \right\}$, $\boldsymbol{\mu} = \left\{ \boldsymbol{\mu}_1, \ldots, \boldsymbol{\mu}_K \right\}$, $\boldsymbol{\Sigma} = \left\{ \boldsymbol{\Sigma}_1, \ldots, \boldsymbol{\Sigma}_K \right\}$, and $\boldsymbol{\theta} = \left\{ \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma} \right\}$, with $i \in [1,N]$ and $k \in [1,K]$. The $\pi_k$ are the mixture weights (written $\alpha_k$ in the general definition) and satisfy $\sum_{k=1}^{K} \pi_k = 1$.

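Taking the logarithm of the likelihood gives $\ln p(\mathbf{X} \mid \boldsymbol{\theta}) = \sum_{i=1}^{N} \ln \sum_{k=1}^{K} \pi_k \, \mathcal{N} \left( \boldsymbol{x}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k \right)$. The situation is now much more complex than with a single Gaussian, because the summation over $k$ appears inside the logarithm. As a result, <u>the maximum likelihood estimate of the parameters no longer has a closed-form analytical solution</u>.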
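
Although the log-likelihood $\sum_{i} \ln \sum_{k} \pi_k \, \mathcal{N}(\boldsymbol{x}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$ cannot be maximized in closed form, it is straightforward to evaluate. The sketch below does so with the log-sum-exp trick to avoid underflow. It is an illustration under simplifying assumptions: the components are one-dimensional Gaussians, all weights are strictly positive, and the function names are hypothetical.

```python
import math

def normal_logpdf(x, mean, std):
    """Log density of a univariate Gaussian (1-D case assumed for brevity)."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def gmm_log_likelihood(data, weights, means, stds):
    """Sum over points of log( sum_k weights[k] * N(x | means[k], stds[k]) )."""
    total = 0.0
    for x in data:
        # log(pi_k) + log N(x | mu_k, sigma_k) for each component k
        terms = [math.log(w) + normal_logpdf(x, m, s)
                 for w, m, s in zip(weights, means, stds)]
        total += logsumexp(terms)
    return total

data = [-2.1, -1.9, 0.5, 2.8, 3.3]
ll = gmm_log_likelihood(data, weights=[0.3, 0.7], means=[-2.0, 3.0], stds=[0.5, 1.0])
print(f"log-likelihood: {ll:.4f}")
```

The inner sum over components is exactly the term that sits inside the logarithm, which is why the code cannot split the computation into independent per-component pieces.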



### The E-Step for Mixture Models

We introduce a latent variable $\mathbf{Z} = \left\{ \boldsymbol{z}_1, \ldots, \boldsymbol{z}_N \right\}$, where each $\boldsymbol{z}_i$ indicates which mixture component $\boldsymbol{x}_i$ belongs to. Recalling the EM algorithm in Equation (1), we need to define both $p ( \mathbf{X}, \mathbf{Z} \mid \boldsymbol{\theta} )$ and $p( \mathbf{Z} \mid \mathbf{X}, \boldsymbol{\theta}^{(t)})$:
$$
\begin{align}
p \left( z_{ik}=1 \mid \boldsymbol{\theta} \right) &= \pi_k \\
p \left( \boldsymbol{x}_i \mid z_{ik}=1, \boldsymbol{\theta} \right) &= \mathcal{N} \left( \boldsymbol{x}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k \right)\\
p \left( \boldsymbol{x}_i, z_{ik}=1 \mid \boldsymbol{\theta} \right) &= p \left( \boldsymbol{x}_i \mid z_{ik}=1, \boldsymbol{\theta} \right) p \left( z_{ik}=1 \mid \boldsymbol{\theta} \right) = \pi_k \, \mathcal{N} \left( \boldsymbol{x}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k \right)\\
p \left( \boldsymbol{x}_i \mid \boldsymbol{\theta} \right) &= \sum_{k=1}^{K} p \left( \boldsymbol{x}_i, z_{ik}=1 \mid \boldsymbol{\theta} \right) = \sum_{k=1}^{K} \pi_k \, \mathcal{N} \left( \boldsymbol{x}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k \right)\\
p \left( \mathbf{X}, \mathbf{Z} \mid \boldsymbol{\theta} \right) &= \prod_{i=1}^{N} p \left( \boldsymbol{x}_i, \boldsymbol{z}_i \mid \boldsymbol{\theta} \right) = \prod_{i=1}^{N} \prod_{k=1}^{K} \left[ \pi_k \, \mathcal{N} \left( \boldsymbol{x}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k \right) \right]^{z_{ik}}\\
p \left( \mathbf{Z} \mid \mathbf{X}, \boldsymbol{\theta} \right) &= \prod_{i=1}^{N} p \left( \boldsymbol{z}_i \mid \boldsymbol{x}_i, \boldsymbol{\theta} \right) = \prod_{i=1}^{N} \frac{p \left( \boldsymbol{x}_i, \boldsymbol{z}_i \mid \boldsymbol{\theta} \right)}{p \left( \boldsymbol{x}_i \mid \boldsymbol{\theta} \right)}\\
p \left( z_{ik}=1 \mid \boldsymbol{x}_i, \boldsymbol{\theta} \right) &= \frac{\pi_k \, \mathcal{N} \left( \boldsymbol{x}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k \right)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N} \left( \boldsymbol{x}_i \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j \right)}
\end{align}
$$

Here $z_{ik} = 1$ if component $k$ generated $\boldsymbol{x}_i$ and $z_{ik} = 0$ otherwise, so the exponent $z_{ik}$ in the joint density selects exactly one factor for each data point. Marginalizing the joint over $\boldsymbol{z}_i$ recovers the mixture density $p(\boldsymbol{x}_i \mid \boldsymbol{\theta})$.
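
The posterior probability that component $k$ generated point $i$, namely the weighted component density divided by the sum of the weighted densities over all components, is the quantity the E-step computes; it is often called the responsibility. The sketch below evaluates it for every point and component. It is an illustration under assumptions not made in the derivation: one-dimensional Gaussian components, strictly positive weights, and hypothetical names such as `responsibilities`.

```python
import math

def normal_logpdf(x, mean, std):
    """Log density of a univariate Gaussian (1-D case assumed for brevity)."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2

def responsibilities(data, weights, means, stds):
    """Return an N x K list of posteriors p(z_ik = 1 | x_i, theta)."""
    result = []
    for x in data:
        # Numerators in log space: log(pi_k) + log N(x | mu_k, sigma_k)
        log_joint = [math.log(w) + normal_logpdf(x, m, s)
                     for w, m, s in zip(weights, means, stds)]
        # Subtract the maximum before exponentiating to avoid underflow.
        mx = max(log_joint)
        unnorm = [math.exp(v - mx) for v in log_joint]
        total = sum(unnorm)
        # Dividing by the sum over components is the denominator of Bayes' rule.
        result.append([u / total for u in unnorm])
    return result

data = [-2.1, -1.9, 0.5, 2.8, 3.3]
gamma = responsibilities(data, weights=[0.3, 0.7], means=[-2.0, 3.0], stds=[0.5, 1.0])
for x, row in zip(data, gamma):
    print(x, [round(g, 4) for g in row])
```

Each row sums to 1, and points close to one component's mean receive almost all of their posterior mass from that component.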
