<a href="https://colab.research.google.com/github/anthonyhu25/Variance-Reduction-Metropolis/blob/main/Variance_Reduction_for_Metropolis_Hastings_Example_3_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import numpy as np
from numpy import random
from numpy import linalg
import math
import scipy
import scipy.stats
import matplotlib.pyplot as plt
from scipy.stats import rv_continuous, rv_discrete
from scipy.stats._distn_infrastructure import rv_frozen
from scipy.special import logsumexp
import warnings
import sys
import statistics
import pandas as pd
from IPython.display import display, Math, HTML

This notebook, like the other notebooks in this directory, implements examples from [this paper](https://arxiv.org/pdf/2203.02268).

# Example 3.1: Simulated Data Example: Gaussian Target


There are a couple of things I must note about the setup to this problem:
1. The coefficient-less estimator of $F$, $\mu_{n,G}(F):= \frac{1}{n}\sum_{i=0}^{n-1}[F(x_{i}) + \int \alpha(x_{i}, y)(G(y) - G(x_{i}))q(y|x_{i})dy]$, requires a specified function $G(x)$ and an evaluation of the integral inside the estimator.

To obtain a function $G$ that approximates $F$ (whose expectation under the target distribution, $\pi(F)$, is the quantity we want to estimate), we first need a Gaussian approximation of $\pi(x)$. We hope that $F_{\tilde{\pi}}$ is a good approximation of the ideal function $\tilde{F}$, which is also an estimate of $F$. For this estimation, we set $G(x) = G_{0}(L^{-1}(x-\mu))$, as defined further below.

To estimate the integral $\int \alpha(x_{i}, y)(G(y) - G(x_{i}))q(y|x_{i})dy$, we use the Monte Carlo estimate $\alpha(x_{i}, y_{i})(G(y_{i}) - G(x_{i}))$ with $y_{i} \sim q(y|x_{i})$. Because this Monte Carlo estimate of the integral can have a high variance, we further reduce it by adding the control variate terms $h(x_{i}, y_{i})$ and $E_{q(y|x_{i})}[h(x_{i},y)]$. Both terms are static control variates, and both depend on the Gaussian approximation of $\pi(x)$.

So, in practice, we compute the coefficient-less estimator of $F$ above with Monte Carlo estimates:

$\mu_{n, G}(F) := \frac{1}{n}\sum_{i=0}^{n-1}[F(x_{i}) + \alpha(x_{i}, y_{i})(G(y_{i}) - G(x_{i})) + h(x_{i}, y_{i}) - E_{q(y|x_{i})}[h(x_{i}, y)]]$

2. To obtain our static control variate $h(x_{i}, y)$, we first need Gaussian approximations of our target $\pi(x)$ and proposal $q(y|x)$, which we name $\tilde{\pi}(x)$ and $\tilde{q}(y|x)$ respectively, and the function $G(x)$. Then, we set $h(x,y)$ to be the product of the Metropolis-Hastings acceptance probability computed from $\tilde{\pi}(x)$ and $\tilde{q}(y|x)$, and the difference between $G(y)$ and $G(x)$. Formally,

$h(x,y) = \min(1, \tilde{r}(x,y))[G(y)-G(x)]$

where $\tilde{r}(x,y) = \frac{\tilde{\pi}(y)\tilde{q}(x|y)}{\tilde{\pi}(x)\tilde{q}(y|x)}$

We hope that the acceptance ratio computed from the Gaussian approximations is close to the true acceptance ratio of the target and proposal distributions.
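As a rough sketch of the control variate in item 2, the snippet below evaluates $h(x,y)$ for a Gaussian approximation $N(\mu, \Sigma)$ combined with a symmetric random-walk proposal, in which case the proposal terms cancel in the ratio. The names `log_r_tilde` and `h_static`, and the toy function `G`, are illustrative assumptions and not code from the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_r_tilde(x, y, mu, Sigma):
    """Log acceptance ratio under the Gaussian approximation N(mu, Sigma).

    Assumes a symmetric proposal, so the proposal densities cancel and
    only the approximate target densities remain.
    """
    approx = multivariate_normal(mean=mu, cov=Sigma)
    return approx.logpdf(y) - approx.logpdf(x)

def h_static(x, y, G, mu, Sigma):
    """h(x, y) = min(1, r_tilde(x, y)) * (G(y) - G(x))."""
    accept = np.exp(min(0.0, log_r_tilde(x, y, mu, Sigma)))
    return accept * (G(y) - G(x))

# Toy inputs (assumptions for illustration only).
d = 3
mu, Sigma = np.zeros(d), np.eye(d)
G = lambda z: z[0]                    # placeholder for the real G
x = np.array([1.0, 0.0, 0.0])
y = np.array([0.5, 0.0, 0.0])

h_xy = h_static(x, y, G, mu, Sigma)   # move towards the mode, so min(1, r_tilde) = 1
h_xx = h_static(x, x, G, mu, Sigma)   # no move, so G(y) - G(x) = 0
```

Working with the log ratio and clipping at zero before exponentiating avoids overflow when the proposed point is much more probable than the current one.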

Back to the beginning...

The paper begins by assuming a Markov transition kernel $P$ that leaves a target $\pi$ invariant (if the current state is distributed according to $\pi$, then after one step of the chain with kernel $P$ the state is still distributed according to $\pi$), a function $G(x)$, and the conditional next-step expectation $PG(x)$ of $G$ under the transition kernel $P$, given the current state $x$.

We can represent the conditional expectation $PG(x)$ as:

$PG(x) := \int P(x, dy)G(y) = G(x) + \int \alpha(x,y)(G(y) - G(x))q(y|x)dy$

where $\alpha(x,y)  = min(1, r(x,y))$, and $r(x,y) := \frac{\pi(y)q(x|y)}{\pi(x)q(y|x)}$
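The identity for $PG(x)$ can be checked numerically. The sketch below fixes a state $x$, simulates one Metropolis-Hastings step many times, and compares the average of $G(X_{1})$ with $G(x)$ plus the Monte Carlo average of $\alpha(x,y)(G(y)-G(x))$. The standard Gaussian target, the symmetric random-walk proposal, and the placeholder `G` are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(z):
    # Standard Gaussian log-density up to a constant (works row-wise).
    return -0.5 * np.sum(z**2, axis=-1)

def G(z):
    # Placeholder test function (an assumption, not the paper's G).
    return z[..., 0]

d = 2
c = 2.38 / np.sqrt(d)            # random-walk proposal scale
x = np.array([0.7, -0.3])        # fixed current state
n_draws = 200_000

# Proposals y ~ q(.|x) = N(x, c^2 I); symmetric, so r = pi(y) / pi(x).
y = x + c * rng.standard_normal((n_draws, d))
alpha = np.exp(np.minimum(0.0, log_target(y) - log_target(x)))

# Left-hand side: simulate one Metropolis-Hastings step and average G(X_1).
accepted = rng.random(n_draws) < alpha
x_next = np.where(accepted[:, None], y, x)
lhs = G(x_next).mean()

# Right-hand side: G(x) + E_q[alpha(x, y) (G(y) - G(x))].
rhs = G(x) + np.mean(alpha * (G(y) - G(x)))
```

The two quantities `lhs` and `rhs` should agree up to Monte Carlo error, since rejected proposals contribute $G(x)$ and accepted ones contribute $G(y)$.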

Suppose we have $n$ correlated samples from target density $\pi$. The estimator $\mu_{n,G}$ is unbiased:

$\mu_{n,G}(F) = \frac{1}{n}\sum_{i=1}^{n}[F(x_{i}) + PG(x_{i}) - G(x_{i})]$

We substitute $PG(x)$ into $\mu_{n,G}(F)$ and obtain:

$\mu_{n,G}(F) = \frac{1}{n}\sum_{i=1}^{n}[F(x_{i}) + \int \alpha(x_{i},y)(G(y)-G(x_{i}))q(y|x_{i})dy]$

Then, we approximate the integral $\int \alpha(x_{i},y)(G(y)-G(x_{i}))q(y|x_{i})dy$ using the single-sample Monte Carlo estimate $\alpha(x_{i},y_{i})(G(y_{i}) - G(x_{i}))$, where $y_{i} \sim q(y|x_{i})$.
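A minimal sketch of this single-sample version of the estimator follows, assuming a standard Gaussian target, a random-walk proposal, and that the proposal $y_{i}$ drawn by the sampler at step $i$ is reused in the correction term. The function name `rwm_with_corrections` and the placeholder `G` are hypothetical; the static control variate terms are deliberately left out here.

```python
import numpy as np

def rwm_with_corrections(n, d, G, rng):
    """Run random-walk Metropolis on N(0, I_d).

    Returns the visited states x_i and the correction terms
    alpha(x_i, y_i) * (G(y_i) - G(x_i)), where y_i is the proposal at step i.
    """
    c = 2.38 / np.sqrt(d)
    x = np.zeros(d)
    states = np.empty((n, d))
    corrections = np.empty(n)
    for i in range(n):
        y = x + c * rng.standard_normal(d)
        log_r = 0.5 * (x @ x - y @ y)          # log pi(y) - log pi(x)
        alpha = np.exp(min(0.0, log_r))
        states[i] = x
        corrections[i] = alpha * (G(y) - G(x))
        if rng.random() < alpha:
            x = y
    return states, corrections

rng = np.random.default_rng(1)
G = lambda z: z[0]                 # placeholder G (assumption)
states, corrections = rwm_with_corrections(n=20_000, d=2, G=G, rng=rng)

F_values = states[:, 0]            # F(x) = first coordinate
plain_estimate = F_values.mean()   # ordinary ergodic average
mu_nG = np.mean(F_values + corrections)
```

Both `plain_estimate` and `mu_nG` target the same expectation, which is zero for this target; how much the correction helps depends on how well $G$ is chosen.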

We also seek to reduce the variance of the unbiased estimate $\alpha(x_{i},y_{i})(G(y_{i}) - G(x_{i}))$ by adding the static control variate terms $h(x_{i}, y_{i})$ and $\mathbb{E}_{q(y|x_{i})}[h(x_{i},y)]$, which both depend on the Gaussian approximation $\tilde{\pi}(x) = N(x|\mu, \Sigma)$ of the target distribution $\pi(x)$.

So, the final estimator $\mu_{n,G}(F)$ becomes:

$\mu_{n,G}(F) = \frac{1}{n}\sum_{i=1}^{n}[F(x_{i}) + \alpha(x_{i},y_{i})(G(y_{i})-G(x_{i})) + h(x_{i}, y_{i}) - \mathbb{E}_{q(y|x_{i})}[h(x_{i},y)]]$

## How to construct the static control variates

So we need to construct $h(x_{i}, y_{i})$ and $\mathbb{E}_{q(y|x_{i})}[h(x_{i},y)]$ from the Gaussian approximation of the target density, $\tilde{\pi}(x) = N(x|\mu, \Sigma)$. Note that $h(x_{i}, y_{i})$ mirrors the term $\alpha(x_{i}, y_{i})(G(y_{i}) - G(x_{i}))$: it is the acceptance probability computed from the approximate target $\tilde{\pi}(x)$ and the corresponding proposal $\tilde{q}(y|x)$, multiplied by the difference in the function $G$. Formally, $h(x,y) = \min(1, \tilde{r}(x,y))[G(y)-G(x)]$.

To construct our other static control variate $\mathbb{E}_{q(y|x)}[h(x,y)]$, we note that we can reuse the construction of the original $PG(x)$ [here](https://arxiv.org/pdf/2203.02268#page=5)...

$\mathbb{E}_{q(y|x)}[h(x,y)] = \int h(x,y)q(y|x)dy = \int \min(1, \tilde{r}(x,y))[G(y)-G(x)]q(y|x)dy$

Since $G(x) = G_{0}(L^{-1}(x-\mu))$, we substitute this identity into the equation above. In essence, this performs the change of variables needed when going from the target/proposal to the Gaussian approximation of the target/proposal.

$ = \int \min(1, \tilde{r}(x,y))[G_{0}(L^{-1}(y-\mu))-G_{0}(L^{-1}(x-\mu))]q(y|x)dy$

The target distribution in this example is a $d$-variate standard Gaussian distribution $N(0_{d}, I_{d})$, with proposal distribution $q(y|x) = N(y|x, c^{2}I_{d})$, where $c^{2} = 2.38^{2}/d$ for the random walk Metropolis (RWM) case. We are interested in estimating the expected value of the first coordinate of the target, so $F(x) = x^{(1)}$.
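The setup just described can be written down in a few lines. The sketch below is only an illustration: the dimension `d = 10`, the chain length, and the helper names `log_target` and `F` are assumptions, and the empirical acceptance rate is computed purely as a sanity check on the proposal scale.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10
c = 2.38 / np.sqrt(d)              # proposal std, so that c^2 = 2.38^2 / d

def log_target(z):
    # log N(z | 0, I_d) up to an additive constant
    return -0.5 * z @ z

def F(z):
    # Quantity of interest: first coordinate (index 0 in Python)
    return z[0]

x = rng.standard_normal(d)         # start from an exact draw of the target
n_steps, n_accept = 5_000, 0
F_samples = np.empty(n_steps)
for i in range(n_steps):
    y = x + c * rng.standard_normal(d)           # y ~ N(x, c^2 I_d)
    if np.log(rng.random()) < log_target(y) - log_target(x):
        x, n_accept = y, n_accept + 1
    F_samples[i] = F(x)

acceptance_rate = n_accept / n_steps
```

Note that the mathematical index $x^{(1)}$ corresponds to `z[0]` in zero-based Python indexing.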

To begin, we need to construct our function $G(x)$ and its conditional expectation $PG(x) = \mathbb{E}_{x}[G(X_{1})] = \mathbb{E}_{x}[G(X_{1})|X_{0} = x]$. Note that we are given a function $G_{0}(x)$ and transform it to $G(x) = G_{0}(L^{-1}(x - \mu))$, where $L$ is the Cholesky factor of the covariance $\Sigma$ and $\mu$ is the mean of the Gaussian approximation of the target $\pi(x)$. We call this approximation $\tilde{\pi}(x) = N(x|\mu, \Sigma)$.
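The transformation $G(x) = G_{0}(L^{-1}(x - \mu))$ can be sketched with a Cholesky factorisation and a triangular solve. The helper name `make_G` and the stand-in `G0` below are hypothetical and serve only to show the whitening step; the actual $G_{0}$ is a different function.

```python
import numpy as np
from scipy.linalg import solve_triangular

def make_G(G0, mu, Sigma):
    """Return G(x) = G0(L^{-1} (x - mu)), with Sigma = L L^T and L lower triangular."""
    L = np.linalg.cholesky(Sigma)
    def G(x):
        # Solve L z = x - mu rather than forming L^{-1} explicitly.
        z = solve_triangular(L, np.asarray(x, dtype=float) - mu, lower=True)
        return G0(z)
    return G

G0 = lambda z: np.tanh(z[0])       # hypothetical stand-in for G_0

# Standard Gaussian target: mu = 0 and Sigma = I, so L = I and G equals G0.
d = 3
G_std = make_G(G0, np.zeros(d), np.eye(d))
x = np.array([0.4, -1.0, 2.0])

# A non-trivial approximation, to show the whitening step.
mu = np.array([1.0, 0.0, 0.0])
Sigma = np.diag([4.0, 1.0, 1.0])
G_shifted = make_G(G0, mu, Sigma)
```

With `mu` and `Sigma` as in the second case, the point `[3, 0, 0]` is mapped to `[1, 0, 0]` before `G0` is applied, since the first coordinate is shifted by 1 and scaled by the standard deviation 2.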

Since the target $N(0_{d}, I_{d})$ is already a standard Gaussian distribution, the $G(x)$ in this problem equals $G_{0}(x)$, which is defined as:

$G_{0}(x) = b_{0}(e^{b_{1}x^{(j)}} - e^{-b_{1}x^{(j)}}) \cdot e^{-b_{2}||x||^{2}} + c_{0}(e^{-c_{1}(x^{(j)} - c_{2})^{2}} - e^{-c_{1}(x^{(j)} + c_{2})^{2}}) \cdot e^{-c_{1} \sum_{j' \neq j}(x^{(j')})^{2}}$

Note that $j$ is the coordinate we are estimating through $F(x) = x^{(j)}$, so in this case $j = 1$. Also, $b_{0}, b_{1}, b_{2}, c_{0}, c_{1}, c_{2}$ are parameters used for the closed-form approximation of $\alpha_{g}(x)$.

In [None]:
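def G_0_x(dict_params, x, j=0):
    # Evaluate G_0 at one point x (1-D array); j is the 0-based coordinate used in F (assumed default 0).
    p, x = dict_params, np.asarray(x, dtype=float)
    xj, sq_norm = x[j], np.sum(x**2)
    exp_term = p["b0"] * (np.exp(p["b1"] * xj) - np.exp(-p["b1"] * xj)) * np.exp(-p["b2"] * sq_norm)
    bump_diff = np.exp(-p["c1"] * (xj - p["c2"])**2) - np.exp(-p["c1"] * (xj + p["c2"])**2)
    return exp_term + p["c0"] * bump_diff * np.exp(-p["c1"] * (sq_norm - xj**2))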

In [None]:
# Dictionary of parameters
## Given in example for RWM

rwm_params = dict(
    b0=8.7078,
    b1=0.2916,
    b2=0.0001,
    c0=-3.5619,
    c1=0.1131,
    c2=3.9162
)
