# Bayesian inference recap

Let's have a quick overview of the Bayesian methods for performing inference in models with latent variables, starting from full Bayesian inference and moving to cheaper approximations.

Given:

- Data $\pmb{X}$.
- Latent variables $\pmb{Z}$.
- Model parameters $\pmb{\theta}$.

The following methods are presented in descending order of computational cost, which also corresponds to a descending order of accuracy.

## Full inference

Full inference means computing the joint posterior over the latent variables and the parameters, which is usually prohibitively expensive, even for small models:

$$
P(\pmb{Z}, \pmb{\theta} | \pmb{X})
$$


## EM algorithm

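The EM algorithm replaces the posterior over the parameters with a point estimate $\pmb{\theta} = \pmb{\theta}_{MAP}$ (this reduces to the maximum-likelihood estimate when no prior is placed on $\pmb{\theta}$, as in the bound below).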

As a reminder, we define the variational lower bound on the marginal log-likelihood (i.e. the likelihood of the parameters with the latent variables marginalized out), which is the quantity we want to maximize:

$$
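\log \ P(\pmb{X} | \pmb{\theta}) \ge \mathcal{L}(\pmb{\theta}, \pmb{q}) = \mathbb{E}_{\pmb{q}(\pmb{Z})} \log \ \frac{P(\pmb{X}, \pmb{Z} | \pmb{\theta})}{\pmb{q}(\pmb{Z})} = \int \pmb{q}(\pmb{Z}) \log \ \frac{P(\pmb{X}, \pmb{Z} | \pmb{\theta})}{\pmb{q}(\pmb{Z})} d\pmb{Z}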
$$

### E-step

Maximizes the lower bound with respect to the variational distribution while keeping the parameters fixed, which is equivalent to minimizing the KL divergence between the variational distribution and the posterior:

$$
\pmb{q} = \underset{\pmb{q}}{\mathrm{argmax}} \ \mathcal{L}(\pmb{\theta}, \pmb{q}) \longleftrightarrow \underset{\pmb{q}}{\mathrm{argmin}} \ KL[\pmb{q}(\pmb{Z})||P(\pmb{Z}|\pmb{X}, \pmb{\theta})]
$$
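
To see why the two formulations coincide, recall the standard identity below, written in the notation of this section. For any $\pmb{q}$, the gap between the marginal log-likelihood and the lower bound is exactly the KL divergence to the posterior:

$$
\log \ P(\pmb{X} | \pmb{\theta}) = \mathcal{L}(\pmb{\theta}, \pmb{q}) + KL[\pmb{q}(\pmb{Z})||P(\pmb{Z}|\pmb{X}, \pmb{\theta})]
$$

The left-hand side does not depend on $\pmb{q}$, so maximizing $\mathcal{L}$ over $\pmb{q}$ is the same as minimizing the KL term. Since the KL divergence is non-negative and vanishes only when the two distributions match, the bound becomes tight at $\pmb{q}(\pmb{Z}) = P(\pmb{Z}|\pmb{X}, \pmb{\theta})$.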

### M-step

Maximizes the lower bound with respect to the parameters while keeping the variational distribution fixed, which is equivalent to maximizing the expected log of the joint distribution under the variational distribution:

$$
\pmb{\theta} = \underset{\pmb{\theta}}{\mathrm{argmax}} \ \mathcal{L}(\pmb{\theta}, \pmb{q}) \longleftrightarrow  \underset{\pmb{\theta}}{\mathrm{argmax}} \ \mathbb{E}_{\pmb{q}(\pmb{Z})} \log \ P(\pmb{X}, \pmb{Z}|\pmb{\theta})
$$
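
As a hedged illustration of the two steps, here is a minimal sketch of EM for a one-dimensional mixture of two Gaussians. The model, the synthetic data, the initialization, and all function names (`e_step`, `m_step`, `run_em`) are assumptions made for this example and do not come from the notes above. No prior is placed on the parameters, so the point estimate here is the maximum-likelihood one.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(xs, weights, means, variances):
    """Marginal log-likelihood log P(X | theta) of the mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in xs
    )


def e_step(xs, weights, means, variances):
    """Set q(Z) = P(Z | X, theta): one row of responsibilities per point."""
    resp = []
    for x in xs:
        joint = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    return resp


def m_step(xs, resp):
    """Maximize E_q log P(X, Z | theta); closed form for a Gaussian mixture."""
    n, k = len(xs), len(resp[0])
    weights, means, variances = [], [], []
    for c in range(k):
        nc = sum(r[c] for r in resp)  # effective number of points in component c
        mean = sum(r[c] * x for r, x in zip(resp, xs)) / nc
        var = sum(r[c] * (x - mean) ** 2 for r, x in zip(resp, xs)) / nc
        weights.append(nc / n)
        means.append(mean)
        variances.append(max(var, 1e-6))  # floor to avoid a collapsing component
    return weights, means, variances


def run_em(xs, n_iter=50):
    """Alternate E and M steps, recording log P(X | theta) after each update."""
    weights, means, variances = [0.5, 0.5], [min(xs), max(xs)], [1.0, 1.0]
    history = [log_likelihood(xs, weights, means, variances)]
    for _ in range(n_iter):
        resp = e_step(xs, weights, means, variances)
        weights, means, variances = m_step(xs, resp)
        history.append(log_likelihood(xs, weights, means, variances))
    return (weights, means, variances), history


# Synthetic data: two well-separated clusters (an assumption for the demo).
rng = random.Random(0)
data = ([rng.gauss(-2.0, 1.0) for _ in range(200)]
        + [rng.gauss(3.0, 1.0) for _ in range(200)])
(weights, means, variances), history = run_em(data)
```

Because the E-step makes the bound tight and the M-step then maximizes it, the values in `history` should never decrease from one iteration to the next, which is a convenient sanity check when implementing EM.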

## Variational EM

For many models, the posterior required in the E-step of the EM algorithm is still not feasible to compute exactly. Therefore, we approximate it by restricting the search to a variational family $\mathcal{Q}$.

### Variational E-step

$$
\pmb{q} = \underset{\pmb{q} \in \mathcal{Q}}{\mathrm{argmax}} \ \mathcal{L}(\pmb{\theta}, \pmb{q}) \longleftrightarrow \underset{\pmb{q} \in \mathcal{Q}}{\mathrm{argmin}} \ KL[\pmb{q}(\pmb{Z})||P(\pmb{Z}|\pmb{X}, \pmb{\theta})]
$$
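
One common choice of $\mathcal{Q}$ is the mean-field (fully factorized) family. It is assumed here purely for illustration, since nothing above fixes a particular family. The grouping of $\pmb{Z}$ into blocks $\pmb{Z}_i$ is part of that assumption:

$$
\pmb{q}(\pmb{Z}) = \prod_{i} q_i(\pmb{Z}_i), \qquad \log \ q_j(\pmb{Z}_j) = \mathbb{E}_{q_{i \ne j}} \log \ P(\pmb{X}, \pmb{Z} | \pmb{\theta}) + \mathrm{const}
$$

The second expression is the standard coordinate update for this family: each factor is updated in turn while the others are held fixed, and each update does not decrease $\mathcal{L}(\pmb{\theta}, \pmb{q})$.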

### Crisp EM

When even Variational EM cannot be applied, we can use Crisp EM, where both the parameters and the latent variables are approximated by point estimates:

$$
\begin{aligned}
\pmb{\theta} &= \pmb{\theta}_{MAP} \\
\pmb{Z} &= \pmb{Z}_{MAP}
\end{aligned}
$$
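
A familiar instance of this idea is a k-means-style procedure. The sketch below assumes a one-dimensional Gaussian mixture with equal weights and a shared, fixed variance, so that the most probable component for each point is simply the nearest mean. The model, the data, and the function names (`hard_e_step`, `hard_m_step`, `crisp_em`) are assumptions made for illustration, and no prior is used, so the parameter update is a maximum-likelihood one.

```python
import random


def hard_e_step(xs, means):
    """Point estimate of Z: assign each point to its most probable component.

    With equal weights and a shared variance this is the nearest mean.
    """
    return [min(range(len(means)), key=lambda c: (x - means[c]) ** 2) for x in xs]


def hard_m_step(xs, assignments, means):
    """Point estimate of the means given the hard assignments."""
    new_means = []
    for c in range(len(means)):
        members = [x for x, z in zip(xs, assignments) if z == c]
        # Keep the previous mean if a component received no points.
        new_means.append(sum(members) / len(members) if members else means[c])
    return new_means


def crisp_em(xs, means, max_iter=100):
    """Alternate hard assignments and mean updates until assignments stop changing."""
    assignments = hard_e_step(xs, means)
    for _ in range(max_iter):
        means = hard_m_step(xs, assignments, means)
        new_assignments = hard_e_step(xs, means)
        if new_assignments == assignments:
            break
        assignments = new_assignments
    return means, assignments


# Synthetic data: two well-separated clusters (an assumption for the demo).
rng = random.Random(1)
data = ([rng.gauss(-2.0, 1.0) for _ in range(200)]
        + [rng.gauss(3.0, 1.0) for _ in range(200)])
means, assignments = crisp_em(data, means=[min(data), max(data)])
```

Compared with the soft responsibilities of regular EM, the hard assignments discard all uncertainty about $\pmb{Z}$, which is what makes the method cheap and also what makes it the least accurate option in this overview.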