<a href="https://colab.research.google.com/github/alekriley/alekriley.github.io/blob/master/importance_sampling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Primer on the Basic Monte Carlo Method
We can approximate the expectation of a function of a random variable by averaging that function over samples drawn from the variable's distribution.
> $ E[f(x)]\approx\frac{1}{M}\sum_{m=1}^M f(x^{(m)})$ for samples $x^{(1)},x^{(2)},\dots,x^{(M)}$

In the same way we can approximate the variance of $f(x)$.
>  $ Var[f(x)]\approx\frac{1}{M-1}\sum_{m=1}^M [f(x^{(m)})-u_f]^2$ for samples $x^{(1)},x^{(2)},\dots,x^{(M)}$ with $u_f = \hat E[f(x)]$.
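A minimal NumPy sketch of both estimators. The choices $x \sim \mathcal{N}(0,1)$ and $f(x)=x^2$ are illustrative assumptions, picked because the exact answers are known ($E[f(x)]=1$, $Var[f(x)]=2$). Passing `ddof=1` to `var` gives the $\frac{1}{M-1}$ normalization used above.

```python
import numpy as np

# Plain Monte Carlo (illustrative setup, not from the text):
#   x ~ N(0, 1), f(x) = x**2, so E[f(x)] = 1 and Var[f(x)] = 2 exactly.
rng = np.random.default_rng(seed=0)

def f(x):
    return x ** 2

M = 100_000
samples = rng.standard_normal(M)      # x^(1), ..., x^(M) drawn from p(x)
values = f(samples)

mean_estimate = values.mean()         # (1/M) * sum_m f(x^(m))
var_estimate = values.var(ddof=1)     # 1/(M-1) * sum_m (f(x^(m)) - u_f)^2

print(f"E[f(x)]   ~ {mean_estimate:.4f}")
print(f"Var[f(x)] ~ {var_estimate:.4f}")
```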

# The Idea of Importance Sampling

Consider the expectation of an arbitrary function $f(x)$ of a random variable $x$.

> $$E[f(x)]=\int_x f(x)p(x)dx$$ 

where $p(x)$ is the probability density function of the random variable $x$; that is, the expectation is taken with respect to $p(x)$. Notice, however, that for any density $q(x)$ that is non-zero wherever $f(x)p(x)$ is non-zero, we can multiply and divide by $q(x)$:
> $$E[f(x)]=\int_x q(x)f(x)\frac{p(x)}{q(x)}dx=E_{q(x)}\left[f(x)\frac{p(x)}{q(x)}\right]$$

This expresses the expectation of $f(x)$ under $p(x)$ as an expectation under a different distribution $q(x)$. The ratios $\omega(x^{(i)})=\frac{p(x^{(i)})}{q(x^{(i)})}$ are typically referred to as _importance weights_.
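A minimal sketch of the resulting estimator: draw samples from $q(x)$, weight each $f(x^{(m)})$ by $\omega(x^{(m)})$, and average. The target $p=\mathcal{N}(0,1)$, the function $f(x)=x^2$ (exact answer 1), and the Laplace proposal are illustrative assumptions, not choices made in the text; the Laplace density is non-zero everywhere and has heavier tails than $p$, so the weights stay bounded.

```python
import numpy as np

# Importance sampling (illustrative choices, not from the text):
#   target   p(x) = N(0, 1)
#   function f(x) = x**2            (exact E_p[f(x)] = 1)
#   proposal q(x) = Laplace(0, 1)   (non-zero everywhere, heavier tails than p)
rng = np.random.default_rng(seed=1)

def p(x):
    return np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)

def q(x, scale=1.0):
    return np.exp(-np.abs(x) / scale) / (2.0 * scale)

def f(x):
    return x ** 2

M = 100_000
x = rng.laplace(loc=0.0, scale=1.0, size=M)  # samples come from q, not p
weights = p(x) / q(x)                        # importance weights w(x) = p(x) / q(x)
is_estimate = np.mean(f(x) * weights)        # (1/M) * sum_m f(x^(m)) w(x^(m))

print(f"IS estimate of E_p[x^2]: {is_estimate:.4f}")
print(f"mean weight (should be near 1): {weights.mean():.4f}")
```

Because $E_{q}[\omega(x)] = \int p(x)\,dx = 1$, the average weight is a cheap sanity check on a proposal.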

We might choose to do this for a few reasons:
> 1. $q(x)$ is easier to sample from.
> 2. We only know $p(x)$ up to a normalizing constant.
> 3. To reduce the variance of a Monte Carlo estimate.
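For reason (2), one standard approach is *self-normalized* importance sampling: compute weights with the unnormalized target $\tilde p(x)$ and divide by their sum, which cancels the unknown constant. The sketch below assumes $\tilde p(x)=e^{-x^2/2}$ (a standard normal missing its $1/\sqrt{2\pi}$), $f(x)=x^2$, and a fully normalized proposal $q=\mathcal{N}(0,2^2)$; all three are illustrative choices.

```python
import numpy as np

# Self-normalized importance sampling (illustrative choices, not from the text):
#   p_tilde(x) = exp(-x**2 / 2)   target known only up to a constant (here 1/sqrt(2*pi))
#   f(x)       = x**2             exact E_p[f(x)] = 1
#   q(x)       = N(0, 2**2)       proposal with a fully normalized density
rng = np.random.default_rng(seed=2)

def p_tilde(x):
    return np.exp(-0.5 * x ** 2)

def q(x, sigma=2.0):
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def f(x):
    return x ** 2

M = 100_000
x = rng.normal(loc=0.0, scale=2.0, size=M)
w_tilde = p_tilde(x) / q(x)       # unnormalized weights, off by the unknown constant
w = w_tilde / w_tilde.sum()       # dividing by the sum cancels that constant
snis_estimate = np.sum(w * f(x))

# The mean unnormalized weight estimates the missing constant, here sqrt(2*pi).
Z_estimate = w_tilde.mean()

print(f"self-normalized estimate of E_p[x^2]: {snis_estimate:.4f}")
print(f"estimated normalizer: {Z_estimate:.4f} (exact {np.sqrt(2 * np.pi):.4f})")
```

The self-normalized estimator is biased for finite $M$ but consistent, so the bias vanishes as the number of samples grows.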

The primary focus of the rest of the notebook is on (3) ***variance reduction*** and specifically how _Importance Sampling_ helps us achieve this goal.

Consider a theoretical example where $f(x)\geq0$ and
> $$q(x) = \frac{f(x)p(x)}{\int f(x')p(x')dx'}.$$

Then for every $x$ the quantity being averaged is
> $$f(x)\frac{p(x)}{q(x)} = \int f(x')p(x')dx' = E[f(x)],$$

a constant that does not depend on which sample is drawn from $q(x)$, so the estimator has ***zero variance***.
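To see the zero-variance property concretely, here is a toy case where the normalizer happens to be known. The setup is an illustrative assumption: $p(x)=e^{-x}$ on $x>0$ (an exponential distribution), $f(x)=x$, so $E_p[f(x)]=1$ and the ideal proposal $q(x)=x e^{-x}$ is a Gamma distribution with shape 2 and scale 1, which NumPy can sample directly.

```python
import numpy as np

# Toy zero-variance proposal (illustrative, not from the text):
#   p(x) = exp(-x) on x > 0 (Exponential(1)), f(x) = x >= 0, so E_p[f(x)] = 1.
#   q(x) = f(x) p(x) / E_p[f(x)] = x * exp(-x), a Gamma(shape=2, scale=1) density.
rng = np.random.default_rng(seed=3)

def p(x):
    return np.exp(-x)

def f(x):
    return x

def q_optimal(x):
    return x * np.exp(-x)  # already normalized only because E_p[f(x)] = 1

M = 10_000
x = rng.gamma(shape=2.0, scale=1.0, size=M)
values = f(x) * p(x) / q_optimal(x)  # every term equals E_p[f(x)]

print(values.min(), values.max(), values.var())
```

The catch is visible in `q_optimal`: writing it down required knowing that the normalizer $E_p[f(x)]$ equals 1, which is exactly the answer we set out to estimate.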

Constructing such a distribution is impossible in practice, because its normalizing constant $\int f(x')p(x')dx'$ is exactly the expectation we are trying to estimate. It nonetheless shows that by choosing a _good importance distribution_ $q(x)$ we can reduce the variance of our Monte Carlo estimate.

At this point it helps to think about what properties make a good _importance distribution_.
> 1. It should be easy to sample from.
> 2. Wherever $p(x)$ is non-zero, $q(x)$ should be non-zero.
> 3. It should be easy to compute $q(x)$ for all values of $x$.
> 4. The closer it is to being proportional to $|f(x)|p(x)$, the better.
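A rare-event probability shows why property 4 matters. The example below is an illustrative assumption: estimate $P(x>3)$ for $x\sim\mathcal{N}(0,1)$, i.e. $f(x)=\mathbb{1}\{x>3\}$. Plain Monte Carlo almost never lands where $f(x)p(x)$ is non-zero, while a proposal $q=\mathcal{N}(3,1)$ places its mass near that region. The exact value comes from the standard library's `math.erfc`.

```python
import math
import numpy as np

# Rare-event example (illustrative, not from the text):
#   p(x) = N(0, 1), f(x) = 1{x > 3}, target P(x > 3) ~ 1.35e-3.
#   Proposal q(x) = N(3, 1) concentrates samples where f(x) p(x) is non-zero.
rng = np.random.default_rng(seed=4)
M = 100_000
threshold = 3.0
exact = 0.5 * math.erfc(threshold / math.sqrt(2.0))

def normal_pdf(x, mu=0.0):
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

# Plain Monte Carlo: samples from p.
x_p = rng.standard_normal(M)
plain_values = (x_p > threshold).astype(float)

# Importance sampling: samples from q, each f(x) weighted by p(x) / q(x).
x_q = rng.normal(loc=threshold, scale=1.0, size=M)
is_values = (x_q > threshold) * normal_pdf(x_q) / normal_pdf(x_q, mu=threshold)

for name, vals in [("plain MC", plain_values), ("importance", is_values)]:
    print(f"{name:>10}: estimate={vals.mean():.6f}  per-sample var={vals.var(ddof=1):.2e}")
print(f"{'exact':>10}: {exact:.6f}")
```

For this setup the per-sample variance of the importance-sampling terms works out to roughly two orders of magnitude below plain Monte Carlo's $p(1-p)$, so the same accuracy needs far fewer samples.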

A point to watch out for is that the tails of the distribution matter. If $q(x)$ approaches zero significantly faster than $p(x)$, then the importance weights $\frac{p(x)}{q(x)}$ become very large in the tails (in theory they grow without bound), and the variance of your estimator will most likely increase, in some cases becoming infinite.
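A small sketch of the tail effect, under illustrative assumptions: target $p=\mathcal{N}(0,1)$, $f(x)=x^2$ (exact answer 1), and two Gaussian proposals. The narrow proposal $\mathcal{N}(0,0.5^2)$ has lighter tails than $p$, so $p/q$ is unbounded and the weight variance is infinite; the wide proposal $\mathcal{N}(0,2^2)$ has heavier tails, so $p/q\leq 2$.

```python
import numpy as np

# Tail comparison (illustrative, not from the text):
#   target p = N(0, 1), f(x) = x**2 (exact E_p[f(x)] = 1)
#   narrow q = N(0, 0.5**2): lighter tails than p, weights p/q unbounded
#   wide   q = N(0, 2**2):   heavier tails than p, weights p/q <= 2
rng = np.random.default_rng(seed=5)
M = 100_000

def normal_pdf(x, sigma):
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

results = {}
for name, sigma_q in [("narrow", 0.5), ("wide", 2.0)]:
    x = rng.normal(loc=0.0, scale=sigma_q, size=M)
    w = normal_pdf(x, 1.0) / normal_pdf(x, sigma_q)  # importance weights p/q
    results[name] = (np.mean(x ** 2 * w), w.max())
    print(f"{name:>6} q: estimate={results[name][0]:.4f}  max weight={results[name][1]:.1f}")
```

Rerunning with different seeds makes the difference visible: the wide proposal's estimate stays close to 1, while the narrow proposal's estimate is dominated by a handful of huge weights and can swing widely between runs.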
