```python
import numpy as np
```

# Importance Sampling 


The expectation value of a function $H({\bf x})$ is defined as

$$
\begin{align}
L = \int d{\bf x} H({\bf x}) f({\bf x; u}) \equiv \langle H({\bf x})\rangle_{f},
\end{align}
$$

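where $f({\bf x; u})$ is a probability density with parameter ${\bf u}$. Importance sampling rests on a simple rewrite of this expectation value: for any density $g({\bf x})$ that is nonzero wherever $H({\bf x}) f({\bf x; u}) \neq 0$,
\begin{align}
L &= \int d{\bf x}\, H({\bf x}) \frac{f({\bf x; u})}{g({\bf x})} g({\bf x}) \\
  &= \int d{\bf x}\, H({\bf x}) w({\bf x}) g({\bf x}) \\
  &\equiv \langle H({\bf x}) w({\bf x})\rangle_{g},
\end{align}
where $w({\bf x}) = f({\bf x; u})/g({\bf x})$ is the importance weight (likelihood ratio). In practice, $L$ is estimated by drawing ${\bf X}_1, \dots, {\bf X}_N$ from $g$ and averaging $H({\bf X}_i) w({\bf X}_i)$.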
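As a minimal sketch of this rewrite, the code below estimates $P(X > 3)$ for $X \sim N(0,1)$, i.e. $H(x) = 1\{x > 3\}$ and $f = N(0,1)$. These choices, and the proposal $g = N(3,1)$, are illustrative assumptions rather than part of the text. It computes the estimate once by plain Monte Carlo and once by importance sampling with weights $w(x) = f(x)/g(x)$, and prints the per-sample variance of each.

```python
import math
import numpy as np

def normal_pdf(x, mean=0.0, std=1.0):
    """Density of N(mean, std^2), evaluated elementwise."""
    z = (x - mean) / std
    return np.exp(-0.5 * z**2) / (std * math.sqrt(2.0 * math.pi))

def H(x, gamma=3.0):
    """Illustrative integrand: indicator of the event {x > gamma}."""
    return (x > gamma).astype(float)

rng = np.random.default_rng(0)
n = 100_000

# Plain Monte Carlo: sample from f = N(0, 1) and average H.
x_f = rng.normal(0.0, 1.0, size=n)
plain = H(x_f)

# Importance sampling: sample from g = N(3, 1) and average H * w with w = f / g.
x_g = rng.normal(3.0, 1.0, size=n)
w = normal_pdf(x_g, 0.0, 1.0) / normal_pdf(x_g, 3.0, 1.0)
weighted = H(x_g) * w

exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))  # P(X > 3) for X ~ N(0, 1)
print(f"exact      {exact:.6e}")
print(f"plain MC   {plain.mean():.6e}  per-sample var {plain.var():.3e}")
print(f"importance {weighted.mean():.6e}  per-sample var {weighted.var():.3e}")
```

Because $g$ puts most of its mass where $H$ is nonzero, most importance samples contribute to the sum, and the per-sample variance of the weighted estimator is far smaller than that of plain Monte Carlo in this setup.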

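The variance of a single weighted sample $w({\bf x})H({\bf x})$ drawn from $g$ is
\begin{align}
\sigma^2_L = \int d{\bf x} \left( w({\bf x})H({\bf x})-\mu \right)^2 g({\bf x}),
\end{align}
where $\mu = L$ is the expectation value of $H$ under $f$; the variance of the average over $N$ samples is $\sigma^2_L/N$. Treating $w$ as a free function (a heuristic step, since $w$ is tied to $g$ through $w = f/g$) and setting the functional derivative of the variance with respect to $w({\bf y})$ to zero, we get

\begin{align}
\frac{\delta\sigma^2_L}{\delta w({\bf y})} = 0 &= 2\int d{\bf x} \left( w({\bf x})H({\bf x})-\mu \right)H({\bf x})\,\delta({\bf x}-{\bf y})\, g({\bf x}) \\
&= 2\left( w({\bf y})H({\bf y})-\mu \right)H({\bf y})\, g({\bf y}) \\
&\Rightarrow \\
w({\bf y}) &= \frac{\mu}{H({\bf y})}.
\end{align}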

Since $w({\bf y}) = f({\bf y; u})/g({\bf y})$, solving for $g$ gives

\begin{align}
g^*({\bf y}) = \frac{H({\bf y})f({\bf y; u})}{\mu}.
\end{align}


This expression for $g^*$ is the optimal choice of the importance sampling density: if we want the best function $g({\bf y})$ for importance sampling, we must choose $g = g^*$. With this choice every weighted sample equals $w({\bf x})H({\bf x}) = \mu$, so the variance vanishes. For $g^*$ to be a valid probability density, $H$ must be nonnegative.
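To see the zero-variance property concretely, here is a small discrete sketch. The five-point distribution `f` and the values `H` are made up for illustration. The code builds $g^* = Hf/\mu$ from the exact $\mu$ (possible only because the example is tiny), samples from $g^*$, and checks that every weighted sample $w(x)H(x)$ equals $\mu$.

```python
import numpy as np

# Illustrative discrete example: f is a distribution on 5 points and H >= 0.
x = np.arange(5)
f = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
H = np.array([0.0, 1.0, 2.0, 5.0, 10.0])

mu = np.sum(H * f)        # exact expectation <H>_f
g_star = H * f / mu       # optimal importance density g* = H f / mu

rng = np.random.default_rng(1)
samples = rng.choice(x, size=1_000, p=g_star)   # points with g* = 0 are never drawn
# Weighted samples w(x) H(x) with w = f / g*; each one equals mu up to rounding.
weighted = H[samples] * f[samples] / g_star[samples]

print("mu:", mu)
print("mean of weighted samples:", weighted.mean())
print("variance of weighted samples:", weighted.var())
```

In realistic problems $\mu$ is unknown, so $g^*$ cannot be constructed this way.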


# Cross Entropy Estimation


Let us return to the problem of estimating the following integral

\begin{align}
L = \int d{\bf x} H({\bf x}) f({\bf x; u}) \equiv \langle H({\bf x})\rangle_{f}
\end{align}

Recall that the optimal importance sampling density is

\begin{align}
g^*({\bf x}) = \frac{H({\bf x})f({\bf x; u})}{\mu}.
\end{align}

Since $g^*$ depends on $\mu$, the very quantity we are trying to estimate, we cannot use it directly. Instead, we approximate $g^*({\bf x})$ by a member of our parametrized family $f({\bf x; v})$, choosing ${\bf v}$ to minimize the Kullback-Leibler divergence

\begin{align}
D(g^*, f(\cdot\,; {\bf v})) = \int d{\bf x}\, g^*({\bf x}) \ln g^*({\bf x}) - \int d{\bf x}\, g^*({\bf x}) \ln f({\bf x; v}).
\end{align}
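As a numerical sketch of this definition, the code below tabulates both terms on a grid for an assumed one-dimensional setup: $f(x; u) = N(u, 1)$ with $u = 0$, $H(x) = 1\{x > 2\}$, and the family $f(x; v) = N(v, 1)$. None of these choices come from the text. It then scans $v$ and reports where the divergence is smallest.

```python
import numpy as np

# Uniform grid on which all densities are tabulated; integrals become sums times dx.
x = np.linspace(-10.0, 15.0, 20_001)
dx = x[1] - x[0]

def normal_pdf(t, mean):
    """Density of N(mean, 1), evaluated elementwise."""
    return np.exp(-0.5 * (t - mean) ** 2) / np.sqrt(2.0 * np.pi)

# Assumed setup: f(x; u) = N(u, 1) with u = 0, and H(x) = 1{x > 2}.
u, gamma = 0.0, 2.0
H = (x > gamma).astype(float)
mu = np.sum(H * normal_pdf(x, u)) * dx
g_star = H * normal_pdf(x, u) / mu          # optimal density g* = H f / mu on the grid

# First term: int g* ln g*, using the convention 0 ln 0 = 0 outside the support of g*.
support = g_star > 0
first_term = np.sum(g_star[support] * np.log(g_star[support])) * dx

# Second integral: int g* ln f(x; v) for each candidate v; only this part depends on v.
v_grid = np.linspace(0.0, 5.0, 501)
cross = np.array([np.sum(g_star * np.log(normal_pdf(x, v))) * dx for v in v_grid])
kl = first_term - cross                     # D(g*, f(.; v)) for each v

print("argmin of KL divergence:", v_grid[np.argmin(kl)])
print("argmax of second term:  ", v_grid[np.argmax(cross)])
print("mean of g*:             ", np.sum(x * g_star) * dx)
```

Only the second integral varies with $v$, so the minimizer of the divergence and the maximizer of $\int g^* \ln f(x; v)\,dx$ are the same grid point. For this Gaussian family, that integral equals a constant minus $\tfrac12\int g^*(x)(x-v)^2\,dx$, so both land at the mean of $g^*$ (about 2.37 here).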

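The divergence $D(g^*, f(\cdot\,; {\bf v}))$ is nonnegative and vanishes only when $g^*$ and $f(\cdot\,; {\bf v})$ are equal. Its first term does not depend on ${\bf v}$, so minimizing the divergence with respect to ${\bf v}$ is equivalent to maximizing
\begin{align}
D({\bf v}) &= \int d{\bf x}\, g^*({\bf x}) \ln f({\bf x; v}) \\
 &= \frac{1}{\mu}\int d{\bf x}\, H({\bf x}) f({\bf x; u}) \ln f({\bf x; v}) \\
 &= \frac{1}{\mu}\int d{\bf x}\, H({\bf x}) \frac{f({\bf x; u})}{f({\bf x; w})} \ln f({\bf x; v})\, f({\bf x; w}) \\
 &= \frac{1}{\mu}\int d{\bf x}\, H({\bf x})\, w({\bf x; u, w}) \ln f({\bf x; v})\, f({\bf x; w}),
\end{align}
where $w({\bf x; u, w}) = f({\bf x; u})/f({\bf x; w})$ is the likelihood ratio between the nominal density $f({\bf x; u})$ and an auxiliary sampling density $f({\bf x; w})$ from the same family. Since $\mu > 0$ does not depend on ${\bf v}$, the prefactor $1/\mu$ does not affect the maximizer.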
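The last line is an expectation under $f({\bf x; w})$, so it can be replaced by a sample average: with ${\bf X}_1,\dots,{\bf X}_N$ drawn from $f({\bf x; w})$, one maximizes $\frac{1}{N}\sum_i H({\bf X}_i)\, w({\bf X}_i; {\bf u}, {\bf w}) \ln f({\bf X}_i; {\bf v})$ over ${\bf v}$. The sketch below does this for an assumed setup: $f(x; v) = N(v, 1)$, $u = 0$, and $H(x) = 1\{x > 4\}$, for which the maximizer is a weighted mean of the samples. The starting value $w = 4$, the sample size, and the three repetitions (each reusing the latest $v$ as the new $w$) are arbitrary choices. Practical cross-entropy schemes for rare events usually also raise the threshold in stages, which is not shown here.

```python
import math
import numpy as np

def normal_pdf(t, mean):
    """Density of N(mean, 1), evaluated elementwise."""
    return np.exp(-0.5 * (t - mean) ** 2) / math.sqrt(2.0 * math.pi)

# Assumed setup: f(x; u) = N(u, 1) with u = 0 and H(x) = 1{x > gamma}.
u, gamma = 0.0, 4.0
rng = np.random.default_rng(2)
n = 100_000

v = gamma  # assumed starting value for the sampling parameter w
for it in range(3):
    w_par = v
    x = rng.normal(w_par, 1.0, size=n)                           # X_i ~ f(x; w)
    hw = (x > gamma) * normal_pdf(x, u) / normal_pdf(x, w_par)   # H(X_i) w(X_i; u, w)
    # For the N(v, 1) family, sum_i hw_i ln f(X_i; v) is maximized by this weighted mean.
    v = np.sum(hw * x) / np.sum(hw)
    print(f"iteration {it}: v = {v:.4f}")

# Final importance-sampling estimate of L, sampling from the tuned density f(x; v).
x = rng.normal(v, 1.0, size=n)
estimate = np.mean((x > gamma) * normal_pdf(x, u) / normal_pdf(x, v))
exact = 0.5 * math.erfc(gamma / math.sqrt(2.0))   # P(X > 4) for X ~ N(0, 1)
print(f"estimate {estimate:.4e}   exact {exact:.4e}")
```

For this family, the parameter that maximizes $D({\bf v})$ is the mean of $g^*$, here $E[X \mid X > 4] \approx 4.23$. The iterations should settle near that value, and the final estimate can be checked against the exact tail probability computed with `math.erfc`.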


## Using Cross Entropy for Optimization

Having described in

