# Variational Bayesian Inference

In this notebook, we review variational Bayes, beginning with a technical introduction to the formalism and its derivation, followed by a Python implementation.
The material covered here draws on Blei et al., 2018, _Variational Inference: A Review for Statisticians_, and Chappell et al., 2016, _The FMRIB Variational Bayes Tutorial_.


## Technical Overview

### Problem Statement 

Like sampling methods, variational inference (VI) aims to approximate the distribution of model parameters given data, specifically in cases where an analytical treatment is intractable.

Consider the following example (from Blei et al., 2018): 

For latent variables $\mathbf{z} = z_{1:m}$ and observations $\mathbf{x} = x_{1:n}$, the posterior conditional density is given by:

$$ p(\mathbf{z} \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \mathbf{z}) \, p(\mathbf{z})}{p(\mathbf{x})} $$

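The denominator, $p(\mathbf{x})$, is the marginal density of the observations (also called the evidence). Its value is needed to compute the posterior, and it is obtained by integrating the latent variables out of the joint density:

$$ p(\mathbf{x}) = \int p(\mathbf{z}, \mathbf{x}) \, d\mathbf{z} $$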
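To make the role of the evidence concrete, here is a minimal sketch for a toy model in which the integral can still be evaluated on a grid. The model (a single latent $z \sim \mathcal{N}(0, 1)$ with observations $x_i \mid z \sim \mathcal{N}(z, 1)$), the data values, and all function names are assumptions chosen for illustration; they do not come from the references above.

```python
import math

def normal_pdf(value, mean, std):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * ((value - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def joint(z, xs):
    """p(z, x) = p(x | z) p(z) for the assumed toy model."""
    likelihood = math.prod(normal_pdf(x, z, 1.0) for x in xs)
    return likelihood * normal_pdf(z, 0.0, 1.0)

def trapezoid(ys, step):
    """Trapezoidal rule on an evenly spaced grid."""
    return step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

xs = [0.8, 1.3, 0.4, 1.1]                       # hypothetical observations
step = 0.01
grid = [-10.0 + step * i for i in range(2001)]  # z from -10 to 10

# Evidence: integrate the joint density over the latent variable.
joint_values = [joint(z, xs) for z in grid]
evidence = trapezoid(joint_values, step)

# Posterior: joint density divided by the evidence.
posterior = [j / evidence for j in joint_values]
posterior_mean = trapezoid([z * p for z, p in zip(grid, posterior)], step)

print(f"evidence p(x)  = {evidence:.6g}")
print(f"posterior mean = {posterior_mean:.4f}")
```

For this conjugate model the posterior is known in closed form, $\mathcal{N}\left(\sum_i x_i / (n + 1),\ 1 / (n + 1)\right)$, so the grid estimate of the mean can be checked against $3.6 / 5 = 0.72$. A grid of this kind needs a number of points that grows exponentially with the number of latent variables, which is why the approach does not scale beyond a handful of dimensions.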


This integral is often unavailable in closed form or too computationally expensive to evaluate. For similar reasons (notably the number of latent variables), sampling methods can be slow to converge.

VI aims to avoid this computational cost by recasting inference as an optimisation problem.

The process begins by positing a family of _approximate_ densities, $\mathfrak{D}$, over the latent variables $\mathbf{z}$. We then search for the member $q(\mathbf{z}) \in \mathfrak{D}$ that minimises the Kullback-Leibler (KL) divergence between the approximate density and the true posterior:

$$ q^{*}(\mathbf{z}) = \mathop{\mathrm{arg\,min}}_{q(\mathbf{z}) \in \mathfrak{D}} \mathrm{KL}\left(q(\mathbf{z}) \parallel p(\mathbf{z} \mid \mathbf{x})\right) $$
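The optimisation above can be illustrated with a deliberately naive sketch: take $\mathfrak{D}$ to be the univariate normal densities, evaluate the KL divergence numerically for each candidate mean and standard deviation, and keep the best one. The toy model ($z \sim \mathcal{N}(0, 1)$, $x_i \mid z \sim \mathcal{N}(z, 1)$), the data, the candidate grid, and the function names are all assumptions made for illustration. The posterior is normalised on a grid here, which is exactly the step VI is meant to avoid in realistic problems, and brute-force search stands in for a proper optimiser.

```python
import math

def log_normal_pdf(value, mean, std):
    """Log density of a univariate normal distribution."""
    return -0.5 * ((value - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))

xs = [0.8, 1.3, 0.4, 1.1]                     # hypothetical observations
step = 0.02
grid = [-6.0 + step * i for i in range(601)]  # z from -6 to 6

def log_joint(z):
    """log p(z, x) for the assumed toy model."""
    return log_normal_pdf(z, 0.0, 1.0) + sum(log_normal_pdf(x, z, 1.0) for x in xs)

# Normalise the joint on the grid to obtain the log posterior (toy problems only).
log_joint_values = [log_joint(z) for z in grid]
evidence = step * sum(math.exp(v) for v in log_joint_values)
log_posterior = [v - math.log(evidence) for v in log_joint_values]

def kl_divergence(mean, std):
    """Riemann-sum estimate of KL(q || p(z | x)) for q = N(mean, std^2)."""
    total = 0.0
    for z, log_p in zip(grid, log_posterior):
        log_q = log_normal_pdf(z, mean, std)
        total += math.exp(log_q) * (log_q - log_p)
    return step * total

# Candidate members of the family: means 0.00..1.48, standard deviations 0.10..1.00.
candidates = [(0.04 * i, 0.05 * j) for i in range(38) for j in range(2, 21)]
best_mean, best_std = min(candidates, key=lambda c: kl_divergence(*c))

print(f"q*(z) is approximately N({best_mean:.2f}, {best_std:.2f}^2)")
print(f"KL at the optimum: {kl_divergence(best_mean, best_std):.5f}")
```

Because the true posterior of this toy model is itself normal, with mean $0.72$ and standard deviation $\sqrt{1/5} \approx 0.447$, the search should select the candidate closest to those values and the KL divergence at the optimum should be close to zero. When the posterior lies outside $\mathfrak{D}$, the minimum KL divergence is strictly positive.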


### VI Computation 

The ELBO


$$\mathop{argmin}_{q(\mathbf{z}) \in \mathfrak{D}}$$