# TP4 - Probability Distributions and Simulation

## Instructions

In this practical session, you will use Python (NumPy, SciPy, Matplotlib) to simulate probability distributions, compute empirical statistics, and compare them with the theoretical values. Each exercise includes simulation tasks and interpretation questions.

## Exercise 1 - Simulation of a Discrete Distribution and a Normal Distribution

Consider the following two random variables:

- A discrete random variable $X$ taking values $0, 1, 2$ with probabilities:

$$
P(X = 0) = 0.2, \quad P(X = 1) = 0.5, \quad P(X = 2) = 0.3
$$

- A continuous random variable $Z$ following a standard normal distribution:

$$
Z \sim \mathcal{N}(0, 1)
$$

1. Simulate $n = 10\,000$ realizations of $X$ and $Z$ with NumPy.

2. Estimate the empirical probabilities $\hat{P}(X = k)$ for $k = 0, 1, 2$ and compare them with the theoretical probabilities.

3. Estimate empirically the probability $P(-1 \leq Z \leq 1)$ and compare it with the theoretical probability $$P(-1 \leq Z \leq 1) = \Phi(1) - \Phi(-1),$$ where $\Phi$ is the cumulative distribution function of the standard normal distribution.

### Solution


In [None]:
import numpy as np
import scipy.stats as stats

# Setup
np.random.seed(42)
n = 10000

# 1. Simulate Discrete Variable X
# X takes values 0, 1, 2 with given probabilities
values_X = [0, 1, 2]
probs_X = [0.2, 0.5, 0.3]
X_sim = np.random.choice(values_X, size=n, p=probs_X)

# Simulate Continuous Variable Z ~ N(0,1)
# Using scipy.stats.norm.rvs instead of np.random.normal
Z_sim = stats.norm.rvs(loc=0, scale=1, size=n)


# 2. Analysis of Discrete Variable X
print("--- Analysis of X (Discrete) ---")
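# Count how many times each value appears.
# np.bincount with minlength returns one count per value 0, 1, 2,
# even if a value is never drawn, so the counts stay aligned with values_X.
counts = np.bincount(X_sim, minlength=len(values_X))
empirical_probs_X = counts / n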

# Compare empirical vs theoretical probabilities
for val, prob_emp, prob_theo in zip(values_X, empirical_probs_X, probs_X):
    print(f"P(X={val}): Empirical = {prob_emp:.4f} | Theoretical = {prob_theo:.4f}")


# 3. Analysis of Continuous Variable Z
print("\n--- Analysis of Z (Continuous) ---")
# Count how many Z values fall between -1 and 1
empirical_prob_Z = np.mean((Z_sim >= -1) & (Z_sim <= 1))

# Calculate theoretical probability using CDF
theoretical_prob_Z = stats.norm.cdf(1) - stats.norm.cdf(-1)

print("P(-1 <= Z <= 1):")
print(f"Empirical   = {empirical_prob_Z:.4f}")
print(f"Theoretical = {theoretical_prob_Z:.4f}")
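
To judge whether the empirical and theoretical values are "close enough", it can help to look at the Monte Carlo standard error of an estimated probability, $\sqrt{p(1-p)/n}$. The following is a minimal sketch under stated assumptions: the generator `rng`, its seed, and the variable names are illustrative choices, not part of the exercise.

```python
import numpy as np
from scipy import stats

# Assumptions: same sample size as the exercise; the seed is arbitrary.
rng = np.random.default_rng(0)
n = 10_000

z = rng.standard_normal(n)
p_hat = np.mean((z >= -1) & (z <= 1))            # empirical estimate
p_theo = stats.norm.cdf(1) - stats.norm.cdf(-1)  # about 0.6827

# Standard error of a proportion estimated from n independent draws
std_err = np.sqrt(p_theo * (1 - p_theo) / n)

print(f"Estimate       = {p_hat:.4f}")
print(f"Theoretical    = {p_theo:.4f}")
print(f"Standard error = {std_err:.4f}")
print(f"|error| / SE   = {abs(p_hat - p_theo) / std_err:.2f}")
```

With $n = 10\,000$ the standard error is roughly $0.005$, so a gap of a few thousandths between the empirical and theoretical values is what one would expect from sampling noise alone.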

---


## Exercise 2 - Empirical Mean and Density of a Normal Distribution

Let $X$ be a random variable following a normal distribution:

$$
X \sim \mathcal{N}(\mu, \sigma^2), \quad \mu = 2, \quad \sigma = 3
$$

1. Simulate $n = 5000$ realizations of $X$.

2. Compute the empirical mean $\bar{X}$ and the empirical variance, then compare them to the theoretical values $\mu = 2$ and $\sigma^2 = 9$.

3. Plot the normalized histogram of the simulated data and superimpose the theoretical density of the normal distribution $\mathcal{N}(2, 9)$.

### Solution


In [None]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Setup
np.random.seed(42)
n = 5000
mu = 2
sigma = 3

# 1. Simulate X ~ N(2, 9) using scipy
X_sim = stats.norm.rvs(loc=mu, scale=sigma, size=n)

# 2. Compute Empirical Statistics
empirical_mean = np.mean(X_sim)
# np.var divides by n by default (ddof=0); pass ddof=1 for the unbiased estimator
empirical_var = np.var(X_sim)
theoretical_var = sigma**2

print(f"Mean:     Empirical = {empirical_mean:.4f} | Theoretical = {mu}")
print(f"Variance: Empirical = {empirical_var:.4f} | Theoretical = {theoretical_var}")

# 3. Plot Histogram vs Theoretical Density
plt.figure(figsize=(10, 6))

# Histogram of simulated data
plt.hist(
    X_sim,
    bins=50,
    density=True,
    alpha=0.6,
    color="skyblue",
    edgecolor="black",
    label="Empirical Histogram",
)

# Theoretical density curve
x_vals = np.linspace(X_sim.min(), X_sim.max(), 1000)
pdf_vals = stats.norm.pdf(x_vals, loc=mu, scale=sigma)
plt.plot(
    x_vals,
    pdf_vals,
    "r-",
    linewidth=2.5,
    label=r"Theoretical Density $\mathcal{N}(2, 9)$",
)

plt.title("Histogram vs Theoretical Normal Density")
plt.xlabel("Value of X")
plt.ylabel("Density")
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
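
The empirical variance can be computed with two different normalizations: dividing by $n$ (NumPy's default, `ddof=0`) or by $n - 1$ (`ddof=1`, the unbiased estimator). The sketch below compares the two on a fresh sample; the generator `rng` and its seed are arbitrary assumptions made for illustration.

```python
import numpy as np

# Assumptions: same parameters as the exercise; the seed is arbitrary.
rng = np.random.default_rng(0)
n, mu, sigma = 5000, 2, 3
x = rng.normal(loc=mu, scale=sigma, size=n)

var_biased = np.var(x)            # divides by n (ddof=0, NumPy default)
var_unbiased = np.var(x, ddof=1)  # divides by n - 1

print(f"ddof=0: {var_biased:.4f}")
print(f"ddof=1: {var_unbiased:.4f}")
# The two estimators differ exactly by the factor n / (n - 1)
print(f"ratio : {var_unbiased / var_biased:.6f}")
```

For $n = 5000$ the two estimates differ by a factor of $5000/4999$, which is negligible compared with the sampling variability of the estimate itself.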

---


## Exercise 3 - Binomial Distribution: Simulation vs. Theory

Let $Y$ be a binomial random variable:

$$
Y \sim \mathcal{B}(n, p), \quad n=20, \quad p=0.4
$$

1. Recall the theoretical formulas for the expectation and variance of a binomial distribution: $$E[Y] = np, \quad \mathrm{Var}(Y) = np(1-p).$$

2. Simulate $N = 5\ 000$ realizations of $Y$.

3. Compute the empirical mean and empirical variance, and compare them with the theoretical values.

4. Estimate empirically the probabilities $\hat{P}(Y=k)$ for $k=0, 1, \dots, 20$.

5. Plot on the same graph:

- a bar chart of the empirical probabilities,
- the theoretical PMF of the binomial distribution.

Comment on the accuracy of the empirical approximation.

### Solution


In [None]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Setup
np.random.seed(42)
n = 20
p = 0.4
N = 5000

# 1. Theoretical Values
theoretical_mean = n * p
theoretical_var = n * p * (1 - p)

print("--- Theoretical Values ---")
print(f"E[Y] = {theoretical_mean}")
print(f"Var(Y) = {theoretical_var:.2f}")

# 2. Simulate Y ~ Binomial(20, 0.4) using scipy
Y_sim = stats.binom.rvs(n=n, p=p, size=N)

# 3. Empirical Statistics
empirical_mean = np.mean(Y_sim)
empirical_var = np.var(Y_sim)

print("\n--- Empirical Statistics ---")
print(f"Empirical Mean = {empirical_mean:.4f}")
print(f"Empirical Var  = {empirical_var:.4f}")

# 4 & 5. Plot Empirical vs Theoretical Probabilities
plt.figure(figsize=(10, 6))

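# Empirical probabilities (histogram)
# Bins have width 1 and are centered on the integers 0..n, so with
# density=True each bar height equals the relative frequency of that value.
bins = np.arange(-0.5, n + 1.5, 1)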
plt.hist(
    Y_sim,
    bins=bins,
    density=True,
    rwidth=0.8,
    color="skyblue",
    edgecolor="black",
    label="Empirical",
)

# Theoretical PMF
k_values = np.arange(0, n + 1)
theoretical_probs = stats.binom.pmf(k_values, n, p)
plt.plot(
    k_values,
    theoretical_probs,
    "ro-",
    linewidth=2,
    markersize=6,
    label="Theoretical PMF",
)

plt.title(f"Binomial Distribution: Simulation vs Theory (n={n}, p={p})")
plt.xlabel("Number of Successes (k)")
plt.ylabel("Probability P(Y=k)")
plt.xticks(np.arange(0, n + 1, 2))
plt.legend()
plt.grid(axis="y", alpha=0.3)
plt.show()
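
The empirical probabilities $\hat{P}(Y=k)$ can also be obtained as an explicit array, which makes it possible to quantify the accuracy of the approximation. The following is a minimal sketch, assuming a fresh sample drawn with an arbitrary seed; the names `emp_probs`, `max_dev`, and `se_mode` are illustrative.

```python
import numpy as np
from scipy import stats

# Assumptions: same parameters as the exercise; the seed is arbitrary.
rng = np.random.default_rng(0)
n, p, N = 20, 0.4, 5000
y = rng.binomial(n, p, size=N)

# minlength keeps one slot per k = 0..n, even for values never drawn
emp_probs = np.bincount(y, minlength=n + 1) / N
theo_probs = stats.binom.pmf(np.arange(n + 1), n, p)

# Largest gap between empirical and theoretical probabilities
max_dev = np.max(np.abs(emp_probs - theo_probs))
# Rough scale of the sampling error at the most likely value
se_mode = np.sqrt(theo_probs.max() * (1 - theo_probs.max()) / N)

print(f"Max |empirical - theoretical| = {max_dev:.4f}")
print(f"Standard error at the mode    = {se_mode:.4f}")
```

A maximum deviation of the same order as the standard error at the mode suggests that the discrepancies are consistent with sampling noise, which shrinks like $1/\sqrt{N}$ as the number of simulations grows.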

## Exercise 4 - Poisson Distribution

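A call center receives an average of $\lambda = 4$ calls per minute. Let $X$ denote the number of calls received in one minute, modeled as a Poisson random variable with parameter $\lambda$.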

1. Simulate $10000$ minutes of activity ($n = 10000$).
2. Calculate the **Empirical Mean** and **Empirical Variance**.
3. Verify that they are approximately equal (a property of the Poisson distribution, whose mean and variance both equal $\lambda$).
4. Calculate the theoretical probability of receiving exactly 0 calls in a minute ($P(X=0)$) and compare it to your simulation.

### Solution


In [None]:
import numpy as np
from scipy.stats import poisson

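# Setup
np.random.seed(42)  # fixed seed for reproducibility, as in the previous exercises
lam = 4  # Average rate (lambda)
n = 10000  # Number of simulations

# 1. Simulate
# generating n minutes of call counts
X_sim = poisson.rvs(mu=lam, size=n)

# 2 & 3. Empirical mean and variance (Mean = Variance property)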
emp_mean = np.mean(X_sim)
emp_var = np.var(X_sim)

print(f"Theoretical Lambda: {lam}")
print(f"Empirical Mean:     {emp_mean:.4f}")
print(f"Empirical Variance: {emp_var:.4f}")
# These two values should be very close!

# 4. Probability of 0 calls (Silence)
# Theoretical PMF: (lambda^0 * e^-lambda) / 0!
prob_zero_theo = poisson.pmf(0, mu=lam)
prob_zero_emp = np.mean(X_sim == 0)

print(f"\nP(X=0) Theoretical: {prob_zero_theo:.4f}")
print(f"P(X=0) Empirical:   {prob_zero_emp:.4f}")
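
For $k = 0$ the Poisson PMF reduces to $P(X=0) = e^{-\lambda}$, and the mean-equals-variance property can be summarized by the ratio of variance to mean (the dispersion index), which should be close to 1. The sketch below checks both on a fresh sample; the generator `rng`, its seed, and the variable names are assumptions made for illustration.

```python
import math

import numpy as np
from scipy.stats import poisson

# Assumptions: same parameters as the exercise; the seed is arbitrary.
lam, n = 4, 10_000
rng = np.random.default_rng(0)
x = rng.poisson(lam, size=n)

# Closed form for k = 0: lambda^0 * e^{-lambda} / 0! = e^{-lambda}
p0_closed = math.exp(-lam)          # about 0.0183
p0_scipy = poisson.pmf(0, mu=lam)   # same value via SciPy

# Dispersion index: variance / mean, close to 1 for Poisson data
dispersion = np.var(x) / np.mean(x)

print(f"e^(-lambda)      = {p0_closed:.4f}")
print(f"poisson.pmf(0)   = {p0_scipy:.4f}")
print(f"Dispersion index = {dispersion:.4f}")
```

A dispersion index well above 1 would indicate overdispersed counts, for which a plain Poisson model would be a poor fit.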