# Simulation methods: Monte Carlo

In this notebook, we present a simple example of a Monte Carlo simulation. We will use the `numpy` package to generate random numbers and the `matplotlib` package to plot the results.
We will also use the `seaborn` package to make the plots look nicer and plot confidence intervals.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

## A first example

Let $X$ be a random variable taking values in $\{-1;0;1\}$ with probabilities:
1. $\mathbb P(X=-1)=\frac{1}{3}$.
2. $\mathbb P(X=0)=\frac{1}{6}$.
3. $\mathbb P(X=1)=\frac{1}{2}$.

We also consider $Y$, another random variable taking values in $\{-1;0;1\}$, such that $\mathbb P(Y=0)=\frac45$.

### Equating the expectations
We first want to have $\mathbb E[X]=\mathbb E[Y]$.
We are going to compute the values of $\mathbb P(Y=-1)$ and $\mathbb P(Y=1)$ for which this equality holds.
$$\mathbb E[X]=\mathbb E[Y]\iff \sum_{x\in\{-1;0;1\}}x\mathbb P(X=x)=\sum_{y\in\{-1;0;1\}}y\mathbb P(Y=y)\iff -\frac{1}{3}+\frac{1}{2}=\mathbb P(Y=1)-\mathbb P(Y=-1)$$
Then, we have: $$\mathbb P(Y=1)-\mathbb P(Y=-1)=\frac{1}{6}$$ and $$\mathbb P(Y=1)+\mathbb P(Y=-1)=\frac15$$
Therefore, we have:
$$\mathbb P(Y=1)=\frac{11}{60}\qquad \text{and}\qquad \mathbb P(Y=-1)=\frac{1}{60}$$
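
The two equations above can be checked with exact rational arithmetic. The following is a minimal sketch using Python's standard `fractions` module; the variable names (`p_x`, `p_y_plus`, `p_y_minus`) are chosen here for illustration and do not appear elsewhere in the notebook.

```python
from fractions import Fraction

# Distribution of X, as given in the text.
p_x = {-1: Fraction(1, 3), 0: Fraction(1, 6), 1: Fraction(1, 2)}
e_x = sum(x * p for x, p in p_x.items())  # E[X]

# Constraints on Y:
#   P(Y=1) - P(Y=-1) = E[X]
#   P(Y=1) + P(Y=-1) = 1 - P(Y=0)
p_y0 = Fraction(4, 5)
total = 1 - p_y0
p_y_plus = (total + e_x) / 2   # P(Y=1)
p_y_minus = (total - e_x) / 2  # P(Y=-1)

print(e_x, p_y_plus, p_y_minus)  # 1/6 11/60 1/60
```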

### Computing and comparing variances

We now want to compute the variance of $X$ and $Y$ and compare them.
To do so, we are going to use the following formula:
$$\mathbb V[X]=\mathbb E[X^2]-\mathbb E[X]^2$$
We have:
$$\mathbb E[X^2]=\frac{1}{3}+\frac{1}{2}=\frac{5}{6}$$
$$\mathbb E[X]=\frac12-\frac13=\frac{1}{6}$$
We need the same for $Y$:
$$\mathbb E[Y^2]=\frac{1}{60}+\frac{11}{60}=\frac{1}{5}$$
$$\mathbb E[Y]=-\frac{1}{60}+\frac{11}{60}=\frac{1}{6}$$
We can now compute the variances:
$$\mathbb V[X]=\frac{5}{6}-\left(\frac{1}{6}\right)^2=\frac{29}{36}$$
$$\mathbb V[Y]=\frac{1}{5}-\left(\frac{1}{6}\right)^2=\frac{31}{180}$$
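
As a sanity check on these values, the same formula $\mathbb E[X^2]-\mathbb E[X]^2$ can be evaluated exactly. This is only a sketch: the helper `moments` is a hypothetical name introduced here, and the distributions are copied from the text above.

```python
from fractions import Fraction

def moments(dist):
    """Return (mean, variance) of a finite distribution {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    second_moment = sum(v * v * p for v, p in dist.items())
    return mean, second_moment - mean ** 2

p_x = {-1: Fraction(1, 3), 0: Fraction(1, 6), 1: Fraction(1, 2)}
p_y = {-1: Fraction(1, 60), 0: Fraction(4, 5), 1: Fraction(11, 60)}

mean_x, var_x = moments(p_x)
mean_y, var_y = moments(p_y)
print(mean_x, var_x)  # 1/6 29/36
print(mean_y, var_y)  # 1/6 31/180
```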

### Experimental comparison
We now wish to compare the theoretical and experimental expectations and variances.

In [None]:
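def sampleX(n):
    """Draw n independent samples of X by inverting its CDF."""
    A=np.zeros(n)
    for i in range(n):
        u = np.random.random()  # uniform on [0, 1)
        if u <= 1/3:            # P(X=-1) = 1/3
            A[i] = -1
        elif u <= 1/2:          # P(X=0) = 1/2 - 1/3 = 1/6
            A[i] = 0
        else:                   # P(X=1) = 1/2
            A[i] = 1
    return A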

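def sampleY(n):
    """Draw n independent samples of Y by inverting its CDF."""
    A=np.zeros(n)
    for i in range(n):
        u = np.random.random()  # uniform on [0, 1)
        if u <= 1/60:           # P(Y=-1) = 1/60
            A[i] = -1
        elif u <= 1/60 + 4/5:   # P(Y=0) = 4/5
            A[i] = 0
        else:                   # P(Y=1) = 11/60
            A[i] = 1
    return A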
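
For large $n$, the explicit Python loop can become slow. One possible alternative, shown here only as a sketch, is to let NumPy draw all values at once with `Generator.choice`. The function names `sample_x_vectorized` and `sample_y_vectorized` and the seed are assumptions made for this example; they are not part of the notebook.

```python
import numpy as np

# Seed chosen arbitrarily so that the example is reproducible.
rng = np.random.default_rng(0)

def sample_x_vectorized(n, rng=rng):
    """Draw n samples of X in a single NumPy call."""
    return rng.choice([-1, 0, 1], size=n, p=[1/3, 1/6, 1/2])

def sample_y_vectorized(n, rng=rng):
    """Draw n samples of Y in a single NumPy call."""
    return rng.choice([-1, 0, 1], size=n, p=[1/60, 4/5, 11/60])

draws_x = sample_x_vectorized(200_000)
draws_y = sample_y_vectorized(200_000)
print(draws_x.mean(), draws_y.mean())  # both should be close to 1/6
```

The distribution is the same as in the loop-based versions; only the way the random numbers are generated differs.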

Using the previous functions, we can compute the sample means of $X$ and $Y$ for every sample size $n<N$ and compare them.

In [None]:
N=1000
X=[np.mean(sampleX(i)) for i in range(1,N)]
Y=[np.mean(sampleY(i)) for i in range(1,N)]
plt.figure(figsize=(10,7))
plt.scatter(range(1,N),X,label='X')
plt.scatter(range(1,N),Y,label='Y')
plt.plot(range(1,N),[1/6 for _ in range(1,N)],label='E', color='red')
plt.legend()
plt.show()

We notice that both sample means get close to the common expectation $\frac16$ as $n$ grows; only the values for very small $n$ are far off.
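
The cell above draws a fresh sample for every $n$. Another common way to visualise the same convergence, sketched below under the assumption that a single sample of size `n_max` is acceptable, is to compute the running mean of one sample with a cumulative sum. The names `n_max`, `draws` and `running_mean` are introduced for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
n_max = 1000

# One sample of X of size n_max (same distribution as sampleX).
draws = rng.choice([-1, 0, 1], size=n_max, p=[1/3, 1/6, 1/2])

# running_mean[k-1] is the mean of the first k draws.
running_mean = np.cumsum(draws) / np.arange(1, n_max + 1)
print(running_mean[-1])  # mean of the full sample
```

Consecutive values of the running mean are correlated, so the resulting curve is smoother than the scatter plot above, where each point comes from an independent sample.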

Now, we are going to compare the variances.

In [None]:
plt.figure(figsize=(10,7))
# np.var is the sample variance (ddof=0), i.e. np.std ** 2
plt.scatter(range(1,N),[np.var(sampleX(i)) for i in range(1,N)],label='X')
plt.plot(range(1,N),[29/36 for _ in range(1,N)],label='V[X]', color='red')
plt.scatter(range(1,N),[np.var(sampleY(i)) for i in range(1,N)],label='Y')
plt.plot(range(1,N),[31/180 for _ in range(1,N)],label='V[Y]', color='green')
plt.legend()
plt.show()

The two variables have the same expected value, but the variance of $X$ ($\frac{29}{36}\approx 0.81$) is much higher than the variance of $Y$ ($\frac{31}{180}\approx 0.17$).

### Confidence intervals
We now compute confidence intervals for both sample means and compare them.
We use a significance level of 0.05, that is, 95% confidence intervals.
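
For reference, the interval used in the following cell is the usual normal-approximation interval around the sample mean. The notation $\bar X_n$ (sample mean) and $\hat\sigma_n$ (sample standard deviation) is introduced here for convenience, and $1.96$ is approximately the $0.975$ quantile of the standard normal distribution:
$$\left[\bar X_n-1.96\,\frac{\hat\sigma_n}{\sqrt n}\;;\;\bar X_n+1.96\,\frac{\hat\sigma_n}{\sqrt n}\right]$$
Its half-width decreases like $\frac{1}{\sqrt n}$.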

In [None]:
plt.figure(figsize=(10,7))
plt.scatter(range(1,N),X,label='X', color='yellow')
confX=np.array([1.96 * np.std(sampleX(i)) / np.sqrt(i) for i in range(1,N)])
plt.title("Confidence interval for X")
plt.fill_between(range(1,N), X-confX, X+confX, color='b', alpha=.3,label='0.95 confidence interval')
plt.plot(range(1,N),[1/6 for _ in range(1,N)],label='E[X]', color='red')
plt.legend()
plt.show()

plt.figure(figsize=(10,7))
plt.title("Confidence interval for Y")
plt.scatter(range(1,N),Y,label='Y', color='yellow')
confY=np.array([1.96 * np.std(sampleY(i)) / np.sqrt(i) for i in range(1,N)])
plt.fill_between(range(1,N), Y-confY, Y+confY, color='b', alpha=.3,label='0.95 confidence interval')
plt.plot(range(1,N),[1/6 for _ in range(1,N)],label='E[Y]', color='red')
plt.legend()
plt.show()

We notice that the confidence intervals become narrower as $n$ increases.
Their width can therefore be used to bound the estimation error.

### Estimation error

Using Python, we are going to numerically find the value of $n$ for which the estimation error, measured here as the full width of the 95% confidence interval, falls below $0.001$.

In [None]:
n=100
threshold = 0.001
sample = sampleX(n)
conf = 1.96 * np.std(sample) / np.sqrt(n)
while conf * 2 > threshold:
    n += 1
    sample = sampleX(n)
    conf = 1.96 * np.std(sample) / np.sqrt(n)
print(f"Value of n for X: {n}")

n=100
sample = sampleY(n)
conf = 1.96 * np.std(sample) / np.sqrt(n)
while conf * 2 > threshold:
    n += 1
    sample = sampleY(n)
    conf = 1.96 * np.std(sample) / np.sqrt(n)
print(f"Value of n for Y: {n}")
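
The loops above redraw a sample of size $n$ at every step, which can take a very long time for such a small threshold. As a rough cross-check, and assuming the sample standard deviation is close to the theoretical one, the condition $2\times 1.96\,\sigma/\sqrt n\le 0.001$ can be solved for $n$ directly. The helper `required_n` below is a hypothetical name used for this sketch.

```python
import math

def required_n(variance, width, z=1.96):
    """Smallest n such that 2 * z * sqrt(variance / n) <= width."""
    return math.ceil((2 * z) ** 2 * variance / width ** 2)

threshold = 0.001
n_x = required_n(29 / 36, threshold)   # theoretical variance of X
n_y = required_n(31 / 180, threshold)  # theoretical variance of Y
print(f"Approximate n for X: {n_x}")
print(f"Approximate n for Y: {n_y}")
```

With these variances, the required sample sizes are in the millions (roughly 12 million for $X$ and under 3 million for $Y$), which shows how the lower variance of $Y$ translates into a smaller sample for the same precision.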

## Normal distribution and Monte Carlo simulation