# First Lab Session: introduction to Monte Carlo methods

UE Computational Statistics - Aix-Marseille Université / Faculté des Sciences

Author: me

**Warning**
Remember to put your name in the header of the document.


Monte Carlo methods are a class of computational algorithms that rely on repeated random sampling to obtain numerical results. They are often used in statistics, numerical analysis, and mathematical finance. In this lab session, we introduce the basics of Monte Carlo simulation in Python, in particular with the `scipy.stats` module.

In [1]:
import numpy as np
import scipy
import scipy.stats as stats
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Part 1: Getting started with `scipy.stats`

First, you can check the version of `scipy` installed on your computer:

In [2]:
print(scipy.__version__)

1.14.1


You can then look at the documentation of the `scipy.stats` module: [https://docs.scipy.org/doc/scipy-1.15.0/reference/stats.html](https://docs.scipy.org/doc/scipy-1.15.0/reference/stats.html) and choose your version in the top right corner of the webpage.

This module provides a large number of probability distributions, which can be used to generate random numbers. For example, to generate 10000 random numbers from a normal distribution with mean 0 and standard deviation 1, you can use the following code:

In [3]:
data = stats.norm.rvs(size=10000, loc=0, scale=1)
print(f"Mean: {np.mean(data):.3f}, Standard deviation: {np.std(data):.3f}") 

Mean: 0.008, Standard deviation: 0.987


Each probability distribution in `scipy.stats` is a Python object whose class inherits from the `rv_continuous` or `rv_discrete` class. You can check the list of available distributions in the documentation.
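As a quick check of this statement, the sketch below inspects two distribution objects and "freezes" one of them, i.e. fixes its parameters once and for all. The choice of `stats.bernoulli` as the discrete example is only illustrative.

```python
import scipy.stats as stats

# `stats.norm` is a ready-made instance of a subclass of `rv_continuous`.
print(type(stats.norm).__name__)
print(isinstance(stats.norm, stats.rv_continuous))      # True
print(isinstance(stats.bernoulli, stats.rv_discrete))   # True

# Calling the object with parameters returns a "frozen" distribution,
# whose methods no longer need `loc` and `scale` to be repeated.
std_normal = stats.norm(loc=0, scale=1)
print(std_normal.mean(), std_normal.std())
```

A frozen distribution is convenient when the same parameters are reused across several method calls.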

### 1.1 Continuous distributions

Find, in the documentation of `rv_continuous`, which methods return:

- the probability density function,
- the cumulative distribution function,
- the quantile function,
- random numbers drawn from this distribution.

In [22]:
# Answer

All these methods are available in subclasses of `rv_continuous`. Which classes are used to represent the normal, exponential, Student, and chi-squared distributions?

In [23]:
# Answer

**Exercise:**

1. Compute the probability density function and the cumulative distribution function of the normal distribution with mean 0 and variance 4 at 0, 1, and 2.
2. Compute the quantile of order $0.95$ of the normal distribution with mean 0 and variance 4.
3. Generate 10000 random numbers from the normal distribution with mean 0 and variance 4 and plot the histogram of the data.
4. Same three questions with the exponential distribution with rate $\lambda=2$.
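A common pitfall in this exercise is the parametrization: in `scipy.stats`, `scale` is the standard deviation for the normal distribution and the inverse of the rate for the exponential distribution. The following sketch only checks the moments of the two frozen distributions; the variable names are illustrative.

```python
import numpy as np
import scipy.stats as stats

# Normal distribution: `scale` is the standard deviation, not the variance.
variance = 4.0
normal_var4 = stats.norm(loc=0, scale=np.sqrt(variance))
print(normal_var4.var())    # 4.0

# Exponential distribution: `scale` is 1 / rate, so rate 2 means scale 0.5.
rate = 2.0
expo_rate2 = stats.expon(scale=1 / rate)
print(expo_rate2.mean())    # 0.5
```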

In [24]:
# Answer

### 1.2 Discrete distributions

The `rv_discrete` class is used to represent discrete distributions. Which classes are used to represent the Poisson, binomial, and geometric distributions?

In [25]:
# Answer

Find, in the documentation of `rv_discrete`, which methods return:

- the probability mass function,
- the cumulative distribution function,
- the quantile function,
- random numbers drawn from this distribution.

In [26]:
# Answer

**Exercise:**

1. Compute the probability mass function and the cumulative distribution function of the Poisson distribution with rate $\lambda=2$ at 0, 1, and 2.
2. Compute the quantile of order $0.95$ of the Poisson distribution with rate $\lambda=2$.
3. Generate 10000 random numbers from the Poisson distribution with rate $\lambda=2$ and show the barplot of the data.
4. Same three questions with the binomial distribution with parameters $n=10$ and $p=0.25$.
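For the barplot, a discrete sample first has to be tabulated. The sketch below shows one possible way with `np.unique`, on a hypothetical example that is not part of the exercise (rolls of a fair die, simulated with `stats.randint`).

```python
import numpy as np
import scipy.stats as stats

# Hypothetical example: 10000 rolls of a fair six-sided die.
# `stats.randint(low, high)` is uniform on the integers low, ..., high - 1.
rolls = stats.randint(1, 7).rvs(size=10000, random_state=0)

# Distinct observed values and how often each one occurs.
values, counts = np.unique(rolls, return_counts=True)
frequencies = counts / rolls.size

for value, freq in zip(values, frequencies):
    print(f"{value}: {freq:.3f}")
```

The arrays `values` and `frequencies` can then be passed to `plt.bar(values, frequencies)`.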

In [27]:
# Answer

## Part 2: Law of large numbers and central limit theorem

We recall the following results:


**Theorem (Law of Large Numbers)**  
Let $(X_n)_{n\geq 1}$ be a sequence of i.i.d. random variables from a distribution $F$. Assume that $\mathbb{E}|X_1|<\infty$. Then, the sample mean $\bar{X}_n=\frac{1}{n}\sum_{i=1}^n X_i$ converges almost surely to the expectation of $X_1$, i.e., 
$$
\lim_{n\to\infty}\bar{X}_n = \mathbb{E}[X_1] \quad \text{a.s.}
$$
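To make the statement concrete, here is a minimal sketch that computes all the running means $\bar{X}_1,\ldots,\bar{X}_N$ in one pass with a cumulative sum. The uniform distribution, the seed, and the value of `N` are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded generator, for reproducibility
N = 10_000

# i.i.d. sample from the uniform distribution on [0, 1], so E[X_1] = 0.5.
x = rng.uniform(0.0, 1.0, size=N)

# running_mean[n - 1] is the sample mean of the first n observations.
running_mean = np.cumsum(x) / np.arange(1, N + 1)

# The running mean gets closer to 0.5 as n grows.
print(running_mean[[9, 99, 999, 9999]])
```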



**Theorem (Central Limit Theorem)**  
Let $(X_n)_{n\geq 1}$ be a sequence of i.i.d. random variables from a distribution $F$. Assume that $\mathbb{E}(X_1^2)<\infty$ and denote the mean by $\mu$ and the variance by $\sigma^2$. Then, the sample mean $\bar{X}_n=\frac{1}{n}\sum_{i=1}^n X_i$ converges in distribution up to a rescaling, or, more precisely,
$$
\sqrt{\frac{n}{\sigma^2}}(\bar{X}_n-\mu) \xrightarrow{d} \mathcal{N}(0, 1).
$$
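The rescaling in the theorem can be checked numerically. The sketch below standardizes many independent sample means and compares a few summary statistics with those of $\mathcal{N}(0,1)$. The Bernoulli distribution with parameter 0.3, the sample sizes, and the seed are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, n_rep = 0.3, 500, 2000        # Bernoulli parameter, sample size, replications

mu, sigma2 = p, p * (1 - p)         # mean and variance of a Bernoulli(p) variable

# Each row is one sample of size n; each row mean is one realization of the sample mean.
samples = rng.binomial(1, p, size=(n_rep, n))
z = np.sqrt(n / sigma2) * (samples.mean(axis=1) - mu)

# If the CLT approximation is good, z has mean close to 0, standard deviation
# close to 1, and about 95% of its values lie in [-1.96, 1.96].
print(z.mean(), z.std())
print(np.mean(np.abs(z) <= 1.96))
```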


### 2.1 Examples and counterexamples of the LLN

**Exercise:** Set $N=10^5$.

1. Generate $N$ random numbers from a normal distribution with mean 1 and variance 2. 
2. Save the sample mean for $n=1,\ldots,N$ into a numpy array.
3. Plot the sample mean as a function of $n$ in black, for $n=10001,\ldots,N$, and add a red horizontal line at the theoretical mean.
4. Do the same with a Cauchy distribution.
5. What is the difference between the two plots?


In [28]:
# Answer

### 2.2 Examples and counterexamples of the CLT

**Exercise:** Set $N=10^3$.

1. Generate $N$ sample means, each computed from a sample of size $n=200$ drawn from the uniform distribution on $[0,1]$.
2. Plot the histogram of the sample means. Add in red the density of the normal distribution that approximates the density of the sample mean according to the CLT if it exists.
3. Same question with a Student distribution with 2 degrees of freedom.
4. What do you observe?

In [32]:
# Answer

**Exercise:** Set $N=10^5$.

1. Generate $N$ random numbers from the Student distribution with 2 degrees of freedom. 
2. Save the sample mean for $n=1,\ldots,N$ into a numpy array.
3. Plot the sample mean as a function of $n$ in black, for $n=10001,\ldots,N$, and add a red horizontal line at the theoretical mean.
4. What is your conclusion, and what is the difference with the uniform distribution?

In [35]:
# Answer

## Part 3: Example of Monte Carlo Simulation

Let us now examine how a Monte Carlo approach can be used in a real problem, following an example from Dagpunar (2007) *Simulation and Monte Carlo: With applications in finance and MCMC*. 

A company owns $K$ skips that can be hired out. During the $n$-th day, $Y_n$ customers approach the company, each wishing to rent a single skip. We assume that $Y_1, Y_2,\ldots$ are independent random variables having a Poisson distribution with mean $\lambda$. If skips are available, they are let as "new hires"; otherwise, the company loses the business. An individual hire may last for several days: the probability that a skip currently on hire to an individual is returned the next day is $p$. Skips are always returned at the beginning of a day. The company charges $c$ per day for each skip hired out, as well as a fixed charge $f$ for each new hire. The $K$ skips have to be maintained irrespective of how many are on hire on a particular day: the company has to pay $m$ per day for each skip in its possession. 

Let 

- $X_n$ be the number of skips hired out on the $n$-th day,
- $H_n$ be the number of new hires on the $n$-th day.

For the numerical simulations, fix $\lambda = 5$, $p=0.2$, $X_0=0$ and consider $n$ from 1 to $n_\text{max}=100$.
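The dynamics described above can be sketched for a single day as follows. This is only one possible reading of the model: it assumes that the skips on hire are returned independently of each other, so that the number of returns is binomial, and that skips returned at the beginning of a day can be hired out again on that same day. The function name `one_day` and the value `K=3` are hypothetical.

```python
import numpy as np

def one_day(x_prev, K, lam, p, rng):
    """Simulate one day, given x_prev skips on hire the day before."""
    # Assumption: each skip on hire is returned independently with probability p.
    returned = rng.binomial(x_prev, p)
    still_out = x_prev - returned

    demand = rng.poisson(lam)                  # Y_n, customers of the day
    new_hires = min(demand, K - still_out)     # H_n, limited by the available skips
    x_new = still_out + new_hires              # X_n, skips on hire during the day
    return x_new, new_hires, demand

rng = np.random.default_rng(0)
x, h, y = one_day(x_prev=0, K=3, lam=5, p=0.2, rng=rng)
print(x, h, y)
```

With these conventions, $X_n$ always stays between 0 and $K$, and $H_n \leq Y_n$.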

1. Write a Python function `simulate` that simulates the number of skips hired out on each day and returns two vectors of length $n_\text{max}$: $X_1,X_2,\ldots$ and $H_1,H_2,\ldots$.
Parameters of this function should be

- `n_max` the number of days to simulate,
- `K` the number of skips,
- `lam` the mean $\lambda$ of the Poisson distribution (`lambda` is a reserved keyword in Python and cannot be used as a parameter name),
- `p` the probability that a skip is returned the next day,
- `x0` the number of skips hired out on day 0, i.e. $X_0$.

2. Write a Python function `compute` that computes the total profit of the company over $n_\text{max}$ days. Parameters of this function should be

- `n_max` the number of days to simulate,
- `X` the vector of skips hired out on each day,
- `Y` the vector of new hires on each day ($H_1, H_2, \ldots$ in the notation above),
- `c`, `f`, and `m` the charges and costs of the company as defined above.

3. Try all values of $K$ between 20 and 30 and plot the total profit of the company over $n_\text{max}$ days as a function of $K$.

4. (difficult, food for thought) Adapt the code of question 3 to use the same seed for the random number generator each time you call `simulate`. To do this, add the command `np.random.seed(0)` before each call to `simulate`. You can replace $0$ by another value. Compare the two plots. What do you observe? And why?
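As a reminder of what reseeding does, the sketch below draws Poisson samples with `scipy.stats` before and after resetting the global NumPy seed. It relies on the fact that `rvs` uses NumPy's global random state when no `random_state` argument is given; the sample size is arbitrary.

```python
import numpy as np
import scipy.stats as stats

np.random.seed(0)
first = stats.poisson.rvs(5, size=5)

np.random.seed(0)                      # reset the generator to the same state
second = stats.poisson.rvs(5, size=5)

print(np.array_equal(first, second))   # True: same seed, same draws

# Without reseeding, the generator simply continues its stream of numbers.
third = stats.poisson.rvs(5, size=5)
print(first, third)
```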

In [36]:
# Answer