Gaussian Mixture Model
===
_Leo Lu_

Basic Rules
---
Joint and Conditional probability
$$p(A,B)=p(A\mid B)p(B)=p(B\mid A)p(A)$$

Bayes' rule
$$p(A\mid B)=\frac{p(A,B)}{p(B)}=\frac{p(B\mid A)p(A)}{p(B)}$$

If the events $A_i$ are mutually exclusive and exhaustive (exactly one of them occurs), then
$$p(B)=\sum_ip(B\mid A_i)p(A_i)$$
$$p(A_j\mid B)=\frac{p(B\mid A_j)p(A_j)}{\sum_ip(B\mid A_i)p(A_i)}$$
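
As a small numerical illustration of the total-probability and Bayes formulas above, here is a minimal sketch. The function name `posterior` and the numbers in `priors` and `likelihoods` are made up for this example.

```python
def posterior(priors, likelihoods):
    """Return p(A_j | B) for every j, given p(A_i) and p(B | A_i)."""
    # Law of total probability: p(B) = sum_i p(B | A_i) p(A_i)
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    # Bayes' rule applied to each event A_j
    return [l * p / evidence for l, p in zip(likelihoods, priors)]


# Hypothetical values: three mutually exclusive, exhaustive events
priors = [0.5, 0.3, 0.2]        # p(A_i), must sum to 1
likelihoods = [0.1, 0.4, 0.8]   # p(B | A_i)

post = posterior(priors, likelihoods)
print(post)       # posterior probabilities p(A_j | B)
print(sum(post))  # should be 1 up to rounding
```

The posteriors sum to one because the denominator is exactly the sum of the numerators over all $A_j$.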

Normal Distribution
---
Gaussian pdf (probability density function)
$$p(x\mid\theta)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
where $\theta = [\mu, \sigma^2]$
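
The density above translates directly into code. This is a minimal sketch using only the standard library; the name `gaussian_pdf` and the argument name `sigma2` (for $\sigma^2$) are choices made for this example.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    """Gaussian density p(x | mu, sigma^2); sigma2 is the variance, not the std."""
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    return norm * math.exp(-((x - mu) ** 2) / (2.0 * sigma2))


# The density peaks at x = mu and is symmetric around it
print(gaussian_pdf(0.0, 0.0, 1.0))
print(gaussian_pdf(1.0, 0.0, 1.0), gaussian_pdf(-1.0, 0.0, 1.0))
```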

Likelihood Function
---
Let $x_0,x_1,\dots,x_{N-1}$ be a set of independent observations drawn from a pdf parameterised by $\theta$. Assume that $\sigma^2$ is known and that $\mu$ is the unknown mean of the density.
$$
\begin{split}
L(X\mid\mu)=L(x_0,x_1,\dots,x_{N-1}\mid\mu)&=\prod^{N-1}_{i=0}p(x_i\mid \mu) \\
&=\frac{1}{(2\pi\sigma^2)^{\frac{N}{2}}}\exp\left(-\frac{1}{2\sigma^2}\sum^{N-1}_{i=0}(x_i-\mu)^2\right)
\end{split}
$$
With the observations $X$ fixed, $L(X\mid\mu)$ is a function of $\mu$ and is called the likelihood function.
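
To make the two forms of the likelihood concrete, the sketch below evaluates both the product of densities and the closed-form expression for a few candidate values of $\mu$. The data in `xs`, the value of `sigma2`, and all function names are hypothetical and chosen only for illustration.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    """Gaussian density with mean mu and variance sigma2."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def likelihood_product(xs, mu, sigma2):
    """L(X | mu) as the product of the individual densities."""
    return math.prod(gaussian_pdf(x, mu, sigma2) for x in xs)


def likelihood_closed_form(xs, mu, sigma2):
    """L(X | mu) using the closed-form expression with the summed squares."""
    n = len(xs)
    sq = sum((x - mu) ** 2 for x in xs)
    return (2.0 * math.pi * sigma2) ** (-n / 2.0) * math.exp(-sq / (2.0 * sigma2))


xs = [2.1, 1.9, 2.4, 2.0, 1.6]  # hypothetical observations
sigma2 = 0.25                   # variance assumed known

# Same data, different candidate means: the likelihood varies with mu
for mu in (1.5, 2.0, 2.5):
    print(mu, likelihood_product(xs, mu, sigma2), likelihood_closed_form(xs, mu, sigma2))
```

For long data sets the product underflows quickly, which is one practical reason to work with the log-likelihood instead.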

Maximum Likelihood Estimator
---
Given observations $x_0,x_1,\dots,x_{N-1}$ and a pdf parameterised by $\theta=(\theta_0, \theta_1, \dots)^T$,
we form the likelihood function $$L(X\mid\theta)=\prod^{N-1}_{i=0}p(x_i\mid\theta)$$
and define the maximum likelihood estimator as the parameter value that maximises it: $$\left(\hat\theta\right)_{MLE}=\arg\max_{\theta}L(X\mid\theta)$$

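For the Gaussian example above (known $\sigma^2$, unknown $\mu$), the estimator can be derived as follows. Because $\log$ is monotonically increasing, maximising $\log L$ is equivalent to maximising $L$.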
$$
\begin{split}
L(X\mid\mu)&=\frac{1}{(2\pi\sigma^2)^{\frac{N}{2}}}\exp\left(-\frac{1}{2\sigma^2}\sum^{N-1}_{i=0}(x_i-\mu)^2\right) \\
\log L(X\mid\mu)&=-\frac{N}{2}\log\left(2\pi\sigma^2\right)-\frac{1}{2\sigma^2}\sum^{N-1}_{i=0}(x_i-\mu)^2 \\
\frac{\partial\log L}{\partial\mu}&=\frac{1}{\sigma^2}\sum^{N-1}_{i=0}(x_i-\mu)
\end{split}
$$
Set $\frac{\partial\log L}{\partial\mu}=0$ to find the maximum (think about why this stationary point is a maximum):
$$
\begin{split}
\frac{1}{\sigma^2}\sum^{N-1}_{i=0}(x_i-\mu)&=0 \\
\sum^{N-1}_{i=0}(x_i-\mu)&=0 \\
\sum^{N-1}_{i=0}x_i-N\mu&=0 \\
\left(\hat\mu\right)_{MLE}&=\frac{1}{N}\sum^{N-1}_{i=0}x_i
\end{split}
$$
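
As a sanity check on this result, the sketch below compares the sample mean with a brute-force grid search over the log-likelihood. The data, the grid range, and the function names are assumptions made for this example.

```python
import math


def log_likelihood(xs, mu, sigma2):
    """log L(X | mu) for i.i.d. Gaussian data with known variance sigma2."""
    n = len(xs)
    sq = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - sq / (2.0 * sigma2)


xs = [2.1, 1.9, 2.4, 2.0, 1.6]  # hypothetical observations
sigma2 = 0.25                   # variance assumed known

# Closed-form MLE: the sample mean
mu_mle = sum(xs) / len(xs)

# Brute-force check: evaluate log L on a grid of candidate means in [0, 4]
grid = [i / 1000.0 for i in range(4001)]
mu_grid = max(grid, key=lambda m: log_likelihood(xs, m, sigma2))

print(mu_mle, mu_grid)  # the two should agree up to the grid spacing
```

Note that the value of `sigma2` changes the height of the log-likelihood but not the location of its maximum, which matches the derivation: $\sigma^2$ cancels when the derivative is set to zero.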

Bayesian Estimation
---
In Bayesian estimation, $\theta$ is treated as a random variable with pdf $p(\theta)$, called the prior. Bayes' rule then gives the posterior:
$$p(\theta\mid x)=\frac{p(x\mid\theta)p(\theta)}{p(x)}\propto p(x\mid\theta)p(\theta)$$

Assume that $\mu\sim N(\gamma,v^2)$, so
$$p(\mu)=\frac{1}{\sqrt{2\pi v^2}}\exp\left(-\frac{1}{2v^2}(\mu-\gamma)^2\right)$$
Then the posterior of $\mu$ can be computed as follows, dropping every factor that does not depend on $\mu$:
$$
\begin{split}
p(\mu\mid X)&\propto p(X\mid\mu)p(\mu) \\
&\propto \exp\left(-\frac{\sum^{N-1}_{i=0}(x_i-\mu)^2}{2\sigma^2}\right)\exp\left(-\frac{(\mu-\gamma)^2}{2v^2}\right) \\
&\propto \exp\left(-\frac{1}{2}\left[\left(\frac{N}{\sigma^2}+\frac{1}{v^2}\right)\mu^2-2\left(\frac{\sum^{N-1}_{i=0}x_i}{\sigma^2}+\frac{\gamma}{v^2}\right)\mu\right]\right) \\
&\propto \exp\left(-\frac{1}{2}\left(\frac{N}{\sigma^2}+\frac{1}{v^2}\right)\left(\mu-\frac{\frac{\sum^{N-1}_{i=0}x_i}{\sigma^2}+\frac{\gamma}{v^2}}{\frac{N}{\sigma^2}+\frac{1}{v^2}}\right)^2\right) \\
&\propto \exp\left(-\frac{Nv^2+\sigma^2}{2\sigma^2v^2}\left(\mu-\frac{N\bar{x}v^2+\sigma^2\gamma}{Nv^2+\sigma^2}\right)^2\right)
\end{split}
$$
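Here $\bar{x}=\frac{1}{N}\sum^{N-1}_{i=0}x_i$ is the sample mean. Set $\frac{\partial\log{p(\mu\mid X)}}{\partial\mu}=0$ to find the maximum of the posterior, which gives the maximum a posteriori (MAP) estimate: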
$$
\begin{split}
\frac{\partial\log{p(\mu\mid X)}}{\partial\mu}&=0 \\
\mu-\frac{N\bar{x}v^2+\sigma^2\gamma}{Nv^2+\sigma^2}&=0 \\
\left(\hat{\mu}\right)_{MAP}&=\frac{N\bar{x}v^2+\sigma^2\gamma}{Nv^2+\sigma^2}
\end{split}
$$
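
The MAP formula can be checked numerically in the same way as the MLE. The sketch below is only an illustration: the data, the prior parameters `gamma` and `v2` (for $v^2$), and the function names are all hypothetical.

```python
def map_mean(xs, sigma2, gamma, v2):
    """Closed-form MAP estimate of mu under the prior mu ~ N(gamma, v2)."""
    n = len(xs)
    xbar = sum(xs) / n
    return (n * xbar * v2 + sigma2 * gamma) / (n * v2 + sigma2)


def log_posterior_unnormalised(mu, xs, sigma2, gamma, v2):
    """log p(X | mu) + log p(mu), dropping terms that do not depend on mu."""
    data_term = -sum((x - mu) ** 2 for x in xs) / (2.0 * sigma2)
    prior_term = -((mu - gamma) ** 2) / (2.0 * v2)
    return data_term + prior_term


xs = [2.1, 1.9, 2.4, 2.0, 1.6]  # hypothetical observations
sigma2 = 0.25                   # known data variance
gamma, v2 = 0.0, 1.0            # hypothetical prior mean and prior variance

mu_map = map_mean(xs, sigma2, gamma, v2)

# Brute-force check: maximise the unnormalised log-posterior on a grid in [0, 4]
grid = [i / 1000.0 for i in range(4001)]
mu_grid = max(grid, key=lambda m: log_posterior_unnormalised(m, xs, sigma2, gamma, v2))
print(mu_map, mu_grid)

# Limiting behaviour: a very broad prior recovers the sample mean,
# a very narrow prior pins the estimate to the prior mean gamma
print(map_mean(xs, sigma2, gamma, 1e9), map_mean(xs, sigma2, gamma, 1e-9))
```

The estimate is a weighted average of the sample mean $\bar{x}$ and the prior mean $\gamma$, so it always lies between the two; more data (larger $N$) or a broader prior (larger $v^2$) pulls it towards $\bar{x}$.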