# Random Variables and Distribution Simulation

In [None]:
%%html
<link rel="stylesheet" type="text/css" href="../styles/styles.css">

In [None]:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np
from scipy import stats
import math

## Distribution Simulation

This section shows how different random phenomena can be simulated starting from a uniform random variable on $[0, 1]$, $U \sim \mathcal{U}_{[0,1]}$.

To generate a number from the uniform distribution, we can use `random.random()`, which returns a float in $[0, 1)$, or `random.uniform(a, b)`, where `a` and `b` are the bounds of the interval on which the random variable is defined.

In [None]:
# pseudo-random floats: random() draws from [0, 1), uniform(a, b) from the interval between a and b
from random import random, uniform

In [None]:
# random number between 0 and 1
u1 = random()
print(u1)
u2 = uniform(a=0, b=1)
print(u2)
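
Because these numbers are pseudo-random, the sequence is fully determined by the generator's seed. A minimal sketch, assuming reproducible runs are wanted for the simulations in this notebook (the seed value 42 and the names `first_run` and `second_run` are arbitrary illustrations, not part of the original material):

```python
from random import random, seed

# Fixing the seed makes the pseudo-random sequence reproducible.
seed(42)  # 42 is an arbitrary, illustrative seed
first_run = [random() for _ in range(3)]

seed(42)  # resetting the same seed replays the same sequence
second_run = [random() for _ in range(3)]

print(first_run == second_run)  # True
```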

We will consider 2 cases: discrete and continuous.

### Discrete Distributions

Let $U \sim \mathcal{U}_{[0,1]}$ and let $X$ be a discrete random variable with values $\{x_0, x_1, \dots, x_n\}$, where $\mathbb{P}(X = x_i) = p_i$ and $\sum_{i}p_i = 1$.

To simulate the distribution of $X$, we divide the interval $[0,1]$ into sub-intervals such that the length of sub-interval $i$ is $p_i$:

$$X = \left\{\begin{array}{ll} x_0 & \text{if } U < p_0 \\ 
x_1 & \text{if } p_0 \leq U < p_0 + p_1 \\  
... \\
x_j & \text{if } \sum_{k=0}^{j-1}p_k \leq U < \sum_{k=0}^{j}p_k 
\end{array}\right.$$

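In other words:
$$X = x_j \text{ if } F(x_{j-1}) \leq U < F(x_j)$$

where $F(x)$ is the cumulative distribution function of $X$, with the convention $F(x_{-1}) = 0$.

This construction gives the required distribution: since $U$ is uniform on $[0, 1]$, the probability that it falls in an interval equals the length of that interval, so

$$\mathbb{P}(X=x_j) = \mathbb{P}\left(\sum_{k=0}^{j-1}p_k \leq U < \sum_{k=0}^{j}p_k \right) = p_j$$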
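
The interval construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simulate_discrete` and the example values and probabilities are hypothetical, and the search over cumulative sums uses the standard-library `bisect` module rather than anything prescribed by this notebook.

```python
from bisect import bisect_right
from itertools import accumulate
from random import random

def simulate_discrete(values, probs):
    """Draw one realization of X with P(X = values[i]) = probs[i]."""
    # Cumulative sums p_0, p_0 + p_1, ... are the right endpoints of the sub-intervals.
    cumulative = list(accumulate(probs))
    u = random()  # U uniform on [0, 1)
    # Number of right endpoints <= u, i.e. the index j of the sub-interval containing u.
    j = bisect_right(cumulative, u)
    # Guard against round-off when the probabilities sum to slightly less than 1.
    return values[min(j, len(values) - 1)]

# Illustrative distribution (hypothetical values and probabilities).
values = ["a", "b", "c"]
probs = [0.2, 0.5, 0.3]
sample = [simulate_discrete(values, probs) for _ in range(10_000)]
freq = {v: sample.count(v) / len(sample) for v in values}
print(freq)  # empirical frequencies, expected to be close to probs
```

Precomputing the cumulative sums once and reusing them would be cheaper when many draws are needed; they are recomputed here on every call only to keep the sketch short.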


#### Bernoulli Distribution

Let $X$ be a discrete random variable that follows the Bernoulli distribution with parameter $p$, $\mathcal{B}(p)$, e.g. a coin flip. We can then represent the random variable $X$ as follows:

$$X = \left\{\begin{array}{ll} 1 & \text{if } U < p \\ 
0 & \text{if } U \geq p\end{array}\right.$$

Then:

$$\mathbb{P}(\text{success}) = \mathbb{P}(X = 1) = \mathbb{P}(U < p) = p$$

Note that in this case, the interval $[0, 1]$ is divided into 2 parts: one of length $p$ and the other of length $1-p$. The value of $X$ is determined by the part in which the value of $U$ falls.

Write the function `bernoulli(p)` that calculates a value of $X$ that follows the Bernoulli distribution with parameter $p$.

```
def bernoulli(p=0.5):
    """
    Simulation of a Bernoulli with parameter p.
    
    Keyword arguments:
    p -- probability of success. Default, 0.5
    
    Return:
    binary value of Bernoulli distribution realization
    """
```

In [None]:
# ANSWER
def bernoulli(p=0.5):
    """
    Simulation of a Bernoulli with parameter p.
    
    Keyword arguments:
    p -- probability of success. Default, 0.5
    
    Return:
    binary value of Bernoulli distribution realization
    """
    
    return None
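
One possible way to fill in the stub, given as a minimal sketch rather than the reference answer. The name `bernoulli_sketch` is hypothetical and is chosen so that it does not overwrite the `bernoulli` function above; the parameter 0.3 in the check is an arbitrary example.

```python
from random import random

def bernoulli_sketch(p=0.5):
    """Return 1 with probability p and 0 otherwise."""
    # random() is uniform on [0, 1), so the event {U < p} has probability p.
    return 1 if random() < p else 0

# Quick empirical check: the proportion of ones should be close to p.
draws = [bernoulli_sketch(0.3) for _ in range(10_000)]
print(sum(draws) / len(draws))  # expected to be close to 0.3
```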


#### Coin Toss

Simulate 1000 tosses of a fair coin and plot the estimated probability of heads (the proportion of heads obtained so far) as a function of the number of tosses. Compare the result with the theoretical value $p = 0.5$.

```
def coin_toss(nsimu=1000, p=0.5):
    """
    Simulation of nsimu Bernoulli realizations (coin toss) with parameter p.
    
    
    Keyword arguments:
    nsimu -- number of simulations. Default, 1000
    p -- probability of success. Default, 0.5
    
    Return:
    the cumulative number of successes (starting with 0) over the nsimu runs, and the running proportion of successes
    """
```

To calculate the number of heads obtained after the first $i$ tosses, we can compute the cumulative sum with
[`numpy.cumsum()`](https://numpy.org/doc/stable/reference/generated/numpy.cumsum.html).

In [None]:
## SOLUTION
def coin_toss(nsimu=1000, p=0.5):
    """
    Simulation of nsimu Bernoulli realizations (coin toss) with parameter p.
    
    
    Keyword arguments:
    nsimu -- number of simulations. Default, 1000
    p -- probability of success. Default, 0.5
    
    Return:
    the cumulative number of successes (starting with 0) over the nsimu runs, and the running proportion of successes
    """
    
    return None, None

p = 0.5
cumsum_x, avg = coin_toss(p=p)  # pass p by keyword; positionally it would be read as nsimu
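
A possible implementation of the simulation, offered as a sketch under assumptions rather than as the expected solution. The names `coin_toss_sketch`, `counts`, and `proportions` are hypothetical, and the choice to return the proportions for 1 to `nsimu` tosses (the proportion is undefined after 0 tosses) is an interpretation of the docstring.

```python
import numpy as np
from random import random

def coin_toss_sketch(nsimu=1000, p=0.5):
    """Return cumulative head counts (starting with 0) and running proportions."""
    # One Bernoulli(p) draw per toss: 1 for heads, 0 for tails.
    tosses = np.array([1 if random() < p else 0 for _ in range(nsimu)])
    # cumsum_x[i] is the number of heads after i tosses, with cumsum_x[0] = 0.
    cumsum_x = np.concatenate(([0], np.cumsum(tosses)))
    # Proportion of heads after i = 1, ..., nsimu tosses.
    avg = cumsum_x[1:] / np.arange(1, nsimu + 1)
    return cumsum_x, avg

counts, proportions = coin_toss_sketch(nsimu=1000, p=0.5)
print(counts[-1], proportions[-1])  # total heads and final estimate of p
```

The running proportions can then be plotted against the toss index, with a horizontal line at $p$ for comparison.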

In [None]:
## SOLUTION
# visualization


**Conclusions / comments**: <span style="color: red;">YOUR COMMENT HERE</span>

### Continuous Distributions

#### Standard Normal Distribution

Among continuous distributions, we will focus on the case of the standard normal distribution, $\mathcal{N}(0, 1)$, with density function $\varphi(z) = \frac{e^{-\frac{z^2}{2}}}{\sqrt{2\pi}}$.

For simulation purposes, we will study the [Box-Muller method](https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform):
Let $U_1$ and $U_2$ be two independent random variables that follow the uniform distribution on $[0, 1]$. Then a pair of independent random variables $Z_1$ and $Z_2$
following the standard normal distribution can be generated according to the following transformation:

$$\left\{ \begin{array}{l} Z_1 = \sqrt{-2 \ln{U_1}} \cos(2\pi U_2) \\ Z_2 = \sqrt{-2 \ln{U_1}} \sin(2\pi U_2) \end{array} \right.$$


Write the function `normal_dist()` that generates 2 random variables that follow the standard normal distribution with the Box-Muller method.

```
def normal_dist():
    """
    Generates two random variables following the standard normal distribution with the Box-Muller method.
    
    Return:
    z1, z2 -- two random variables from the standard normal distribution
    """
```

Some tips:

- To calculate the square root of $x$, we can use the function `math.sqrt(x)`
- To calculate the log of $x$, we can use the function `math.log(x)`
- The number $\pi$ can be obtained as `math.pi`

In [None]:
import math

In [None]:
## SOLUTION
def normal_dist():
    """
    Generates two random variables following the standard normal distribution with the Box-Muller method.
    
    Return:
    z1, z2 -- two random variables from the standard normal distribution
    """
    
    return None, None
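
The transformation can be sketched as follows. This is an illustration, not the reference solution: the name `normal_dist_sketch` is hypothetical, and using `1 - random()` for $U_1$ is an added assumption that avoids evaluating `math.log(0)`, since `random()` can return exactly 0.

```python
import math
from random import random

def normal_dist_sketch():
    """Return two independent N(0, 1) draws via the Box-Muller transform."""
    # random() lies in [0, 1), so 1 - random() lies in (0, 1] and log() never sees 0.
    u1 = 1.0 - random()
    u2 = random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)

# Sanity check on a large sample: mean close to 0, variance close to 1.
z1_values = [normal_dist_sketch()[0] for _ in range(10_000)]
sample_mean = sum(z1_values) / len(z1_values)
sample_var = sum((z - sample_mean) ** 2 for z in z1_values) / len(z1_values)
print(sample_mean, sample_var)
```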

Generate 5000 pairs of random variables following the standard normal distribution.

In [None]:
## SOLUTION


Visualize the values of $Z_1$ as a histogram and overlay the density curve of the standard normal distribution. Comment on the result.

Some tips:

- To create a histogram, we can use the function `plt.hist()`
- To normalize the histogram so that its total area equals 1, which makes it comparable with a density curve, use the option `density=True` of `plt.hist()`, e.g. `plt.hist(z1, density=True)`
- Before plotting the standard normal distribution, you can create values to plot as follows:

In [None]:
from scipy.stats import norm # normal distribution (standard normal when mu=0 and sigma=1)
mu = 0
variance = 1
sigma = math.sqrt(variance)
x = np.linspace(mu - 4*sigma, mu + 4*sigma, 100)
y = norm.pdf(x, mu, sigma) # density function with parameters mu and sigma
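
To see what `density=True` does, here is a small sketch with `numpy.histogram`, which normalizes bar heights in the same way as `plt.hist`. The sample comes from `random.gauss`, used only as a stand-in for the Box-Muller output; the names `sample`, `heights`, `edges`, and `total_area` and the choice of 30 bins are illustrative assumptions.

```python
import numpy as np
from random import gauss

# Stand-in sample from N(0, 1); in the notebook this would be the Z_1 values.
sample = [gauss(0.0, 1.0) for _ in range(5000)]

# density=True rescales the bar heights so that the total bar area equals 1.
heights, edges = np.histogram(sample, bins=30, density=True)
total_area = float(np.sum(heights * np.diff(edges)))
print(round(total_area, 6))  # 1.0
```

With this normalization the bar heights are on the same scale as the values returned by `norm.pdf`, so the histogram and the density curve can be drawn on the same axes.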

In [None]:
## SOLUTION
# visualization


**Conclusions / comments**: <span style="color: red;">YOUR COMMENT HERE</span>