# Variational inference

For many interesting models the posterior distribution cannot be computed exactly, so we need approximate methods such as variational inference (VI). The goal is to infer the hidden variables given the observed ones.

Unlike MCMC, VI:
- Is deterministic.
- Makes it easy to gauge convergence.
- Typically requires only dozens of iterations.

VI also doesn't require conjugacy. Now let:

- $x$ be the observed variables.
- $z$ be the hidden variables.
- $\theta$ be the parameters in the model.

We want the posterior distribution:

$$p(z | x, \theta) = \frac{p(z, x | \theta)}{\int p(z, x | \theta) dz}$$

However, the normalizing integral $\int p(z, x | \theta) dz$ (the evidence) may be intractable.
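To make the cost of the normalizer concrete, here is a minimal sketch for a hypothetical toy model that is not part of these notes: $m$ binary hidden variables with independent Bernoulli priors and a single Gaussian observation. With discrete $z$ the integral becomes a sum, and normalizing by brute force needs one term per configuration, i.e. $2^m$ terms. The names `log_joint`, `exact_posterior`, `w` and `prior` are assumptions made for illustration.

```python
import itertools
import numpy as np

def log_joint(z, x, w, prior=0.3):
    """log p(z, x) for an assumed toy model:
    z_j ~ Bernoulli(prior) independently, x | z ~ Normal(w . z, 1)."""
    z = np.asarray(z, dtype=float)
    log_prior = np.sum(z * np.log(prior) + (1 - z) * np.log(1 - prior))
    mean = w @ z
    log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (x - mean) ** 2
    return log_prior + log_lik

def exact_posterior(x, w):
    """Normalize the joint by enumerating every configuration of z."""
    m = len(w)
    configs = list(itertools.product([0, 1], repeat=m))  # 2^m configurations
    log_p = np.array([log_joint(z, x, w) for z in configs])
    log_evidence = np.logaddexp.reduce(log_p)  # log of the sum over all z
    posterior = np.exp(log_p - log_evidence)   # p(z | x) for each configuration
    return configs, posterior, log_evidence

w = np.array([1.0, -0.5, 2.0, 0.7])
configs, posterior, log_evidence = exact_posterior(x=1.2, w=w)
print(len(configs), posterior.sum())
```

Each extra hidden variable doubles the number of terms, so this enumeration is only feasible for very small $m$.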

## Main idea

Create a variational distribution over the latent variables, i.e. $q(z | v)$, where $v$ are the variational parameters.

Find settings of $v$ such that $q$ is close to the posterior $p(z | x, \theta)$, i.e. we want $q(z|x)$ such that $KL(q(z|x) \,\|\, p(z|x))$ is small. Using the definition of conditional probability and of the KL divergence, it can be shown that:

$$\log p(x) = KL(q(z|x) \,\|\, p(z|x)) + E_{q(z|x)}[\log p(x|z)] - KL(q(z|x) \,\|\, p(z)) = KL(q(z|x) \,\|\, p(z|x)) + \text{ELBO}$$

where $\text{ELBO} = E_{q(z|x)}[\log p(x|z)] - KL(q(z|x) \,\|\, p(z))$ stands for Evidence Lower Bound. Because the KL divergence is non-negative, $\text{ELBO} \le \log p(x)$, hence the name. We note that $\log p(x)$ is independent of $q$, so the sum of the two terms is constant w.r.t. $q$, and maximizing the ELBO therefore minimizes $KL(q(z|x) \,\|\, p(z|x))$. This is our original goal!
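The decomposition can be checked numerically on a small discrete example, with the second KL term of the ELBO taken against the prior $p(z)$. The sketch below assumes a latent variable with `K` states and randomly drawn `prior`, `lik` and `q`; all of these names and values are illustrative and not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5  # number of states of the discrete latent variable z (assumed)

prior = rng.dirichlet(np.ones(K))      # p(z)
lik = rng.uniform(0.05, 1.0, size=K)   # p(x | z) for one fixed observation x
q = rng.dirichlet(np.ones(K))          # an arbitrary variational distribution q(z | x)

evidence = np.sum(prior * lik)         # p(x), a sum over z in the discrete case
posterior = prior * lik / evidence     # p(z | x)

def kl(a, b):
    """KL(a || b) for two discrete distributions with full support."""
    return np.sum(a * np.log(a / b))

elbo = np.sum(q * np.log(lik)) - kl(q, prior)  # E_q[log p(x|z)] - KL(q || p(z))
gap = kl(q, posterior)                         # KL(q || p(z|x))

# The two printed numbers should agree up to floating-point error.
print(np.log(evidence), gap + elbo)
```

Whatever `q` is drawn, `gap` is non-negative, so `elbo` never exceeds `np.log(evidence)`.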

## Mean field VI

One common assumption is that the variational distribution factorizes:

$$q(z_1, ..., z_m) = \prod_{j=1}^m q(z_j) $$

You may want to group some hidden variables together. The factorized family generally does not contain the true posterior, because the hidden variables are usually dependent given the data, but it can still provide a good approximation.
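A small sketch of why a factorized distribution cannot match a posterior with dependent variables. It assumes a made-up posterior table `p` over two binary hidden variables and searches a grid of fully factorized distributions $q_1(z_1) q_2(z_2)$; the table values and the grid resolution are illustrative choices.

```python
import numpy as np

# Hypothetical posterior p(z1, z2 | x) with strongly dependent variables.
p = np.array([[0.45, 0.05],
              [0.05, 0.45]])

def kl(a, b):
    """KL(a || b) for two discrete distributions with full support."""
    return np.sum(a * np.log(a / b))

# Search over factorized q(z1, z2) = q1(z1) * q2(z2) on a coarse grid.
grid = np.linspace(0.01, 0.99, 99)
best_kl, best_params = np.inf, None
for a in grid:          # a = q1(z1 = 0)
    for b in grid:      # b = q2(z2 = 0)
        q = np.outer([a, 1 - a], [b, 1 - b])
        value = kl(q, p)
        if value < best_kl:
            best_kl, best_params = value, (a, b)

print(best_kl, best_params)
```

The smallest divergence found stays well above zero: no setting of the two factors reproduces the dependence in `p`, even though the best factorized `q` may still be a usable summary.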

## General blueprint for VI

- Choose a variational family $q$.
- Derive the ELBO.
- Run coordinate ascent: update each factor $q_j$ in turn, holding the others fixed.
- Repeat until the ELBO converges.
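The blueprint can be sketched end to end on a case where everything is available in closed form. The example assumes the target posterior is a correlated bivariate Gaussian with known `mu` and `Sigma`, and uses a mean-field Gaussian $q(z_1) q(z_2)$. For a Gaussian target the coordinate updates of the means and the optimal factor variances $1/\Lambda_{jj}$ are standard results, where $\Lambda$ is the precision matrix. The numbers, the stopping tolerance and the function name `elbo` are illustrative assumptions.

```python
import numpy as np

# Step 1: choose q(z1, z2) = N(z1 | m[0], s[0]) * N(z2 | m[1], s[1]).
# Assumed target: a correlated Gaussian posterior with known mean and covariance.
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
Lam = np.linalg.inv(Sigma)  # precision matrix

def elbo(m, s):
    """Step 2: ELBO minus log p(x), which equals -KL(q || p) and is at most 0."""
    d = m - mu
    expected_quad = d @ Lam @ d + np.sum(np.diag(Lam) * s)
    _, logdet = np.linalg.slogdet(Sigma)
    expected_log_p = -np.log(2 * np.pi) - 0.5 * logdet - 0.5 * expected_quad
    entropy = np.sum(0.5 * np.log(2 * np.pi * np.e * s))
    return expected_log_p + entropy

m = np.array([5.0, 5.0])   # arbitrary initial means
s = 1.0 / np.diag(Lam)     # optimal factor variances for a Gaussian target
history = [elbo(m, s)]

for iteration in range(1000):
    # Step 3: update each factor in turn, holding the other one fixed.
    m[0] = mu[0] - Lam[0, 1] / Lam[0, 0] * (m[1] - mu[1])
    m[1] = mu[1] - Lam[1, 0] / Lam[1, 1] * (m[0] - mu[0])
    history.append(elbo(m, s))
    # Step 4: stop once the ELBO no longer improves.
    if history[-1] - history[-2] < 1e-10:
        break

print(m, s, np.diag(Sigma))
```

In this sketch the means converge to the true posterior mean, while the factor variances `s` come out smaller than the true marginal variances on the diagonal of `Sigma`: the factorized `q` ignores the correlation and is narrower than the true posterior.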