## Problem 2 [Variance Reduction Methods for Monte Carlo]
Use a total sample budget of $n=1000$ to obtain Monte Carlo estimates and sample MC estimate variances for the definite integrals in 2 dimensions $(x_1, x_2)$:  
$$
(a)\ exp(\sum _{i=1} ^2 5| x_i - 0.5|)\ \ \ for\ x_i\ in\ [0,1] \\ 
(b)\ cos(\pi + \sum _{i=1} ^2 5x_i)\ \ \ for\ x_i\ in\ [-1,1] \\
(c)\ |4x-2| \times |4y-2|\ \ \ for \ x,y \ in \ [0,1]
$$
Implement stratification and importance sampling (separately) in the Monte Carlo estimation procedures using the same sample budget $n=1000$. Compare the 3 different Monte Carlo integral estimates and their sample variances. Discuss the quality of the Monte Carlo estimates from each method.

In [1]:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
SAMPLE_BUDGET = 1000
SAMPLE_TIME = 100

In [2]:
def rv_gen(my_pdf,maxvalue):
    rv_list = []
    i = 0
    while i < SAMPLE_BUDGET:
        rp = np.array([np.random.rand(),np.random.rand()]).T
        if my_pdf(rp)/2 > np.random.rand():
            i = i + 1
            rv_list.append(rp)
    return rv_list

In [3]:
def my_func1(x):
    if min(x) < 0 or max(x) > 1:
        return 0
    return np.exp(np.sum(5*np.abs(x - 0.5)))

In [4]:
a_strat_list = []
for i in range(SAMPLE_TIME):
    x = np.random.rand(SAMPLE_BUDGET,2)
    y = map(lambda t: my_func1(t),x)
    a_strat_list.append(np.average(y))
print 'The stratification result for (a) is', np.average(a_strat_list)
print 'The sample variance is', np.var(a_strat_list)

The stratification result for (a) is 20.02710338
The sample variance is 0.503294639144


We choose  $f(x_1,x_2) = 2 \left| x_1 - 0.5 \right| +2\left| x_2 - 0.5 \right| $ for the importance sampling of (a).

In [5]:
def my_pdf1(x):
    if min(x) < 0 or max(x) > 1:
        return 0
    return np.sum(np.abs(x-0.5))*2

In [6]:
a_imp_list=[]
for i in range(SAMPLE_TIME):
    x = np.array(rv_gen(my_pdf1,2))
    y = map(lambda t: my_func1(t)/my_pdf1(t),x)
    a_imp_list.append(np.average(y))
print 'The importance sampling result for (a) is', np.average(a_imp_list)
print 'The sample variance is', np.var(a_imp_list)

The importance sampling result for (a) is 20.0375610739
The sample variance is 0.174852739554


In [7]:
def my_func2(x):
    if min(x) < -1 or max(x) > 1:
        return 0
    return np.cos(np.pi+5*np.sum(x))

In [8]:
b_strat_list = []
for i in range(SAMPLE_TIME):
    x = np.random.rand(SAMPLE_BUDGET,2)-0.5
    x = x*2
    y = map(lambda t: my_func2(t),x)
    b_strat_list.append(np.average(y)) 
print 'The stratification result for (b) is', np.average(b_strat_list)
print 'The sample variance is', np.var(b_strat_list)

The stratification result for (b) is -0.0365110301533
The sample variance is 0.000596480410898


We choose $f(x_1,x_2)=\frac{1+0.5cos(10x_1)}{2+\frac{sin(\pi+10)-sin(\pi - 10)}{20}}$ for the importance sampling.

In [9]:
def my_pdf2(x):
    if min(x) < -1 or max(x) > 1:
        return 0
    return (1+0.5*np.cos(np.pi+x[0]))/(2+1/20*(np.sin(np.pi+10)-np.sin(np.pi-10)))

In [10]:
b_imp_list=[]
for i in range(SAMPLE_TIME):
    x = np.array(rv_gen(my_pdf2,1.5/(2+1/20*(np.sin(np.pi+10)-np.sin(np.pi-10)))))
    y = map(lambda t: my_func2(t)/my_pdf2(t), x)
    b_imp_list.append(np.average(y))
print 'The importance sampling result for (b) is', np.average(b_imp_list)
print 'The sample variance is', np.var(b_imp_list)

The importance sampling result for (b) is -0.0687275170642
The sample variance is 0.00596109109111


In [11]:
def my_func3(x):
    if min(x) < 0 or max(x) > 1:
        return 0
    return np.abs(4*x[0] - 2) * np.abs(4*x[1] - 2)

In [12]:
c_strat_list = []
for i in range(SAMPLE_TIME):
    x = np.random.rand(SAMPLE_BUDGET,2)
    y = map(lambda t: my_func3(t),x)
    c_strat_list.append(np.average(y)) 
print 'The stratification result for (c) is', np.average(c_strat_list)
print 'The sample variance is', np.var(c_strat_list)

The stratification result for (c) is 1.00010701419
The sample variance is 0.000783141416052


We choose $f(x_1,x_2) = 4\left| x_1-0.5 \right| $ for the importance sampling.

In [13]:
def my_pdf3(x):
    if min(x) < 0 or max(x) > 1:
        return 0
    return 4 * np.abs(x[0]-0.5)

In [14]:
c_imp_list=[]
for i in range(SAMPLE_TIME):
    x = np.array(rv_gen(my_pdf3,2))
    y = map(lambda t: my_func3(t)/my_pdf3(t),x)
    zx = np.logical_or(x < 0,x > 1)
    zy = np.logical_or(zx[:,0],zx[:,1])
    y = y*np.logical_not(zy)
    c_imp_list.append(np.average(y))
print 'The importance sampling result for (c) is', np.average(c_imp_list)
print 'The sample variance is', np.var(c_imp_list)

The importance sampling result for (c) is 0.998955816599
The sample variance is 0.000329806340685


In [15]:
print '(a) stratification mean:',np.average(a_strat_list),'variance:', np.var(a_strat_list)
print '(a) importance mean:',np.average(a_imp_list),'variance:', np.var(a_imp_list)
print '(b) stratification mean:',np.average(b_strat_list),'variance:', np.var(b_strat_list)
print '(b) importance mean:',np.average(b_imp_list),'variance:', np.var(b_imp_list)
print '(c) stratification mean:',np.average(c_strat_list),'variance:', np.var(c_strat_list)
print '(c) importance mean:',np.average(c_imp_list),'variance:', np.var(c_imp_list)

(a) stratification mean: 20.02710338 variance: 0.503294639144
(a) importance mean: 20.0375610739 variance: 0.174852739554
(b) stratification mean: -0.0365110301533 variance: 0.000596480410898
(b) importance mean: -0.0687275170642 variance: 0.00596109109111
(c) stratification mean: 1.00010701419 variance: 0.000783141416052
(c) importance mean: 0.998955816599 variance: 0.000329806340685


The quality of importance sampling is better than the stratification sampling when we choose the right pdf function that close to $\left| f(x) \right|$. For (a),(c), the functions vary a lot in its own region, and it is easy to find a computation-friendly pdf for importance simpling. For (b), the function varies periodly in small value. In this case, the importance simpling may not as good as stratification. Because it is hard to find an easy pdf to represent the function.