## Variational Bayesian methods
Variational Bayesian methods are a family of techniques for approximating intractable integrals arising in Bayesian inference and machine learning.


In variational inference, the posterior distribution over a set of unobserved variables ${\displaystyle \mathbf {Z} =\{Z_{1}\dots Z_{n}\}}$ given some data ${\displaystyle \mathbf {X} }$  is approximated by a so-called variational distribution, ${\displaystyle Q(\mathbf {Z} )}$:

${\displaystyle P(\mathbf {Z} \mid \mathbf {X} )\approx Q(\mathbf {Z} ).}$



The distribution ${\displaystyle Q(\mathbf {Z} )}$ is restricted to belong to a family of distributions of simpler form (e.g. a family of Gaussian distributions) than ${\displaystyle P(\mathbf {Z} \mid \mathbf {X} )}$, selected with the intention of making ${\displaystyle Q(\mathbf {Z} )}$ similar to the true posterior, ${\displaystyle P(\mathbf {Z} \mid \mathbf {X} )}$.

## Bayesian inference
Given some evidence $E$, we would like to know which of the competing hypotheses $H_1, H_2, \dots$ is most probable.

$\underbrace{P(H \mid E)}_{\text{posterior probability}} = \frac{\overbrace{P(E \mid H)}^{\text{likelihood}} \; \overbrace{P(H)}^{\text{prior probability}}}{\underbrace{P(E)}_{\text{evidence}}}$


- $H$ is any hypothesis whose probability may be affected by data (evidence). Often there are competing hypotheses, and the task is to determine which is the most probable.
- $P(H)$, the **prior** probability, is the estimate of the probability of the hypothesis $H$ before the data $E$ are observed.
- $E$, the **evidence**, corresponds to new data that were not used in computing the prior probability.
- $P(H\mid E)$, the **posterior probability**, is the probability of $H$ given $E$, i.e., after $E$ is observed.
- $P(E\mid H)$ is the probability of observing $E$ given $H$, and is called the **likelihood**.
- $P(E)$ is sometimes termed the **marginal likelihood** or "model evidence". This factor is the same for all possible hypotheses being considered, so it does not affect their relative probabilities.
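
As a rough illustration of the rule above, the sketch below computes posteriors for a small set of competing hypotheses. The function name `posterior` and the numbers are made up for the example, and it assumes the hypotheses are mutually exclusive and exhaustive, so that $P(E)$ can be obtained by summing $P(E \mid H_i) P(H_i)$ over them.

```python
def posterior(priors, likelihoods):
    """Return P(H_i | E) for each hypothesis, given P(H_i) and P(E | H_i).

    Assumes the hypotheses are mutually exclusive and exhaustive.
    """
    # Numerator of Bayes' rule for each hypothesis: likelihood * prior.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # P(E): the same normalizing constant for every hypothesis.
    evidence = sum(joint)
    return [j / evidence for j in joint]


# Hypothetical numbers: two equally likely hypotheses, where the evidence
# is four times more probable under H_1 than under H_2.
priors = [0.5, 0.5]
likelihoods = [0.8, 0.2]
post = posterior(priors, likelihoods)  # approximately [0.8, 0.2]
```

Because $P(E)$ is shared by all hypotheses, ranking them only requires comparing the products $P(E \mid H_i) P(H_i)$; the division matters only when actual probabilities are needed.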

## Variable elimination
## Inference in graphical models (Bayesian networks or Markov random fields)
### Marginal inference
What is the probability of a given variable in our model after we sum everything else out?

$p(y=1) = \sum_{x_1} \sum_{x_2} \cdots \sum_{x_n} p(y=1, x_1, x_2, \dotsc, x_n).$
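
A minimal sketch of this sum by brute-force enumeration is shown below. The model is hypothetical: two binary variables $x_1, x_2$ with made-up probabilities, and a joint that factorizes as $p(x_1)\,p(x_2)\,p(y \mid x_1, x_2)$. The cost grows as $2^n$ in the number of summed-out binary variables, which is why exact enumeration does not scale.

```python
from itertools import product

# Hypothetical model parameters (not from any real dataset).
P_X1 = {0: 0.7, 1: 0.3}
P_X2 = {0: 0.6, 1: 0.4}
# p(y = 1 | x1, x2) for every configuration of the parents.
P_Y1_GIVEN = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}


def joint(y, x1, x2):
    """Joint probability p(y, x1, x2) of the toy model."""
    p_y1 = P_Y1_GIVEN[(x1, x2)]
    p_y = p_y1 if y == 1 else 1.0 - p_y1
    return P_X1[x1] * P_X2[x2] * p_y


def marginal_y(y):
    """p(y), obtained by summing the joint over all values of x1 and x2."""
    return sum(joint(y, x1, x2) for x1, x2 in product((0, 1), repeat=2))


p_y1 = marginal_y(1)  # approximately 0.398
```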

### Maximum a posteriori (MAP) inference
What is the most likely assignment to the variables in the model (possibly conditioned on evidence)?
$\max_{x_1, \dotsc, x_n} p(y=1, x_1, \dotsc, x_n)$
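
The maximization can be sketched the same way, by scoring every assignment and keeping the best one. The helper `map_assignment` and the table `P_Y1_AND_X` are illustrative names with made-up values for $p(y=1, x_1, x_2)$ over two binary variables; this is exhaustive search, not an efficient MAP algorithm.

```python
from itertools import product


def map_assignment(score, n_vars):
    """Return the binary assignment (x_1, ..., x_n) that maximizes score."""
    return max(product((0, 1), repeat=n_vars), key=lambda xs: score(*xs))


# Hypothetical table of p(y = 1, x1, x2); values are made up for illustration.
P_Y1_AND_X = {(0, 0): 0.042, (0, 1): 0.140, (1, 0): 0.108, (1, 1): 0.108}

best = map_assignment(lambda x1, x2: P_Y1_AND_X[(x1, x2)], n_vars=2)
# best is (0, 1), the entry with the largest probability in the table.
```

Marginal inference replaces the `max` with a sum over the same set of assignments, which is why the two queries are often treated side by side.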

Refs: [1](https://ermongroup.github.io/cs228-notes/inference/ve/)

## Approximate solutions to the inference problem

- Variational methods, which formulate inference as an optimization problem. They take their name from the calculus of variations, which deals with optimizing functionals, i.e. functions that take other functions as arguments.
- Sampling methods, which produce answers by repeatedly generating random numbers from a distribution of interest.

For example, sampling can be used to approximate an expectation

$E_{x \sim p}[f(x)] = \sum_x f(x) p(x)$

by the Monte Carlo average

$E_{x \sim p}[f(x)] \approx I_T = \frac{1}{T} \sum_{t=1}^T f(x^t),$

where $x^1, \dotsc, x^T$ are samples drawn according to $p$.
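
The estimate $I_T$ can be sketched with the standard library alone. The example below is an illustration under simple assumptions: $p$ is a fair six-sided die, $f(x) = x$, and the names `monte_carlo_expectation` and `roll` are made up here. The exact expectation is 3.5, so the sample average should land close to that value for large $T$.

```python
import random


def monte_carlo_expectation(f, sample, T, rng):
    """Approximate E_{x ~ p}[f(x)] by averaging f over T samples from p."""
    return sum(f(sample(rng)) for _ in range(T)) / T


def roll(rng):
    """Draw one sample from p: a fair six-sided die (illustrative choice)."""
    return rng.randint(1, 6)


# Exact expectation, computed from the definition sum_x f(x) p(x).
exact = sum(x * (1 / 6) for x in range(1, 7))

# Monte Carlo estimate I_T with T = 100000 samples and a fixed seed.
rng = random.Random(0)
estimate = monte_carlo_expectation(lambda x: x, roll, 100_000, rng)
```

Increasing `T` shrinks the typical error of `estimate` at a rate of roughly $1/\sqrt{T}$.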

### Unbiased estimator