# 1 Laplace Distribution MLE Simulation

## 1.1 Mathematical Setup

### 1.1.1 True data generating process for a Laplace distribution with parameters $\mu$ (location) and $b$ (scale)

$$
X_i \sim \text{Laplace}(\mu, b)
$$


### 1.1.2 Probability Density Function
$$
p(x|\mu, b) = \frac{1}{2b} \exp \left(-\frac{|x-\mu|}{b}\right)
$$
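
As a quick sanity check on the density above, the following sketch (an illustration, not part of the original notebook) integrates it numerically with a simple Riemann sum. The parameter values, grid bounds and step size are arbitrary assumptions.

```python
import numpy as np

def laplace_pdf(x, mu, b):
    """Laplace density p(x | mu, b) = exp(-|x - mu| / b) / (2b)."""
    return np.exp(-np.abs(x - mu) / b) / (2 * b)

mu, b = 0.0, 2.0                                # example parameters (arbitrary)
x = np.linspace(mu - 60, mu + 60, 1_200_001)    # wide grid; the tails beyond it are negligible
dx = x[1] - x[0]

area = np.sum(laplace_pdf(x, mu, b)) * dx       # Riemann-sum approximation of the integral
peak = laplace_pdf(mu, mu, b)                   # density at the mode equals 1 / (2b)
print(area, peak)
```

The area should come out very close to 1, and the peak equals $1/(2b)$, so a larger scale $b$ gives a flatter, wider density.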

### 1.1.3 Observed data notation for $T$ samples
$$
\tilde{X}_i \sim p(x|\mu, b) \quad \text{for } i=1,\dots,T \\
\tilde{y}_i \sim p(y|\tilde{X}_i, \mu, b) \quad \text{for } i=1,\dots,T
$$
The tilde marks realizations drawn from the true distribution.

### 1.1.4 Candidate model specification using asterisk notation
$$
\begin{align}
y_i^* &= f^*(X_i^*|\theta^*) + \epsilon_i^* \\
\epsilon^*_i &\sim p^*(\epsilon^*|\theta) \\
y_i^* &\sim p^*(y^*|X^*_i, \theta^*)
\end{align}
$$


According to the script we would have $\epsilon^*_i \sim p^*(\epsilon^*|\theta)$, but that makes no sense to me. Why are we drawing from a Laplace distribution? Shouldn't we draw from the normal distribution?

I think there is an abuse of notation on page 16 of the script: aren't we using $p^*$ for multiple things?

### 1.1.5 Likelihood and Log-Likelihood functions
Given observed samples $\{\tilde{X}_i\}_{i=1}^T$

**Likelihood function**
The likelihood measures how plausible the observed data points are under our distribution and parameters.
$$
\tilde{\mathcal{L}}(\tilde{\theta}) = \prod_{i=1}^T \tilde{p}(\tilde{X}_i|\tilde{\theta})
$$
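
A small sketch of why the product form is awkward numerically: with many samples the product of densities underflows to zero in double precision, while the sum of log-densities stays finite. The seed, parameters and sample size below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)          # seed chosen arbitrarily for the illustration
mu, b, T = 0.0, 2.0, 1000
x = rng.laplace(mu, b, T)

densities = np.exp(-np.abs(x - mu) / b) / (2 * b)   # each value is at most 1 / (2b) = 0.25
likelihood = np.prod(densities)                     # product of 1000 small numbers underflows
log_likelihood = np.sum(np.log(densities))          # the sum of logs remains representable

print(likelihood, log_likelihood)
```

Since every factor is at most 0.25, the product is below $0.25^{1000}$, far smaller than the smallest positive float64, so it is stored as exactly 0.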


**Log-Likelihood function**
$$
\tilde{\ell}(\tilde{\theta}) = \sum_{i=1}^T \log \tilde{p}(\tilde{X}_i|\tilde{\theta})
$$
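
For the Laplace density the sum simplifies to $-T \log(2b) - \frac{1}{b}\sum_i |\tilde{X}_i - \mu|$. The sketch below compares this closed form with the term-by-term sum; the function name `laplace_log_likelihood`, the seed and the parameter values are assumptions made for the illustration.

```python
import numpy as np

def laplace_log_likelihood(x, mu, b):
    """Closed form: -T * log(2b) - sum(|x_i - mu|) / b."""
    x = np.asarray(x, dtype=float)
    return -x.size * np.log(2 * b) - np.sum(np.abs(x - mu)) / b

rng = np.random.default_rng(1)           # arbitrary seed for the illustration
x = rng.laplace(0.0, 2.0, 1000)

# Same quantity computed term by term from the density
direct = np.sum(np.log(np.exp(-np.abs(x - 0.0) / 2.0) / (2 * 2.0)))
closed = laplace_log_likelihood(x, 0.0, 2.0)
print(direct, closed)
```

Both values should agree up to floating-point rounding.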

REMINDER (natural logarithm): log 1 = 0, log 0.5 ≈ -0.69, log 0.2 ≈ -1.61, etc.

### 1.1.6 MLE Optimization Problem
$$
\hat \theta = \arg \max_{\tilde{\theta}} \sum_{i=1}^T \log \tilde{p}(\tilde{X}_i|\tilde{\theta}) \\
\hat \theta = \arg \max_{\tilde{\theta}} \tilde{\ell}(\tilde{\theta})
$$
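
For the Laplace density of Section 1.1.2 this problem has a well-known closed-form solution. A sketch of the derivation, assuming $\tilde\theta = (\mu, b)$ with $b > 0$:

$$
\begin{align}
\tilde{\ell}(\mu, b) &= -T \log(2b) - \frac{1}{b} \sum_{i=1}^T |\tilde{X}_i - \mu| \\
\hat\mu &= \arg\min_{\mu} \sum_{i=1}^T |\tilde{X}_i - \mu| = \operatorname{median}(\tilde{X}_1, \dots, \tilde{X}_T) \\
\frac{\partial \tilde{\ell}}{\partial b} &= -\frac{T}{b} + \frac{1}{b^2} \sum_{i=1}^T |\tilde{X}_i - \hat\mu| = 0 \quad\Rightarrow\quad \hat b = \frac{1}{T} \sum_{i=1}^T |\tilde{X}_i - \hat\mu|
\end{align}
$$

The location estimate is the sample median because, for any fixed $b$, maximizing $\tilde{\ell}$ over $\mu$ means minimizing the sum of absolute deviations. The scale estimate is then the mean absolute deviation from that median.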

### 1.1.7 Estimated Model specification after optimization
$$
\begin{align}
\hat\theta &= \arg \min_{\theta^*} L^*(\theta^*) \\
\hat y_i &= \hat f(X_i^*|\hat \theta) \\
\hat y_i &\sim \hat p(\hat y | X_i^*, \hat \theta)
\end{align}
$$

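All quantities are now evaluated at the estimated parameters $\hat\theta$ that minimize the loss function. With the negative log-likelihood as the loss, this is the same as maximizing the log-likelihood in 1.1.6.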

## 1.2 Implementation

### 1.2.1 Import

In [46]:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)  # call the function; assigning to it would not seed the generator
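
A minimal sketch of reproducible seeding, added as an illustration. The legacy `np.random.seed` has to be called as a function, and NumPy's `default_rng` generator is an alternative that avoids global state. The seed value 42 is taken from the cell above; the draw sizes and variable names are arbitrary.

```python
import numpy as np

# Legacy global seeding: call the function, do not assign to it
np.random.seed(42)
a = np.random.laplace(0, 2, 5)
np.random.seed(42)
b = np.random.laplace(0, 2, 5)

# Generator API: independent, explicitly seeded random streams
rng1 = np.random.default_rng(42)
rng2 = np.random.default_rng(42)
c = rng1.laplace(0, 2, 5)
d = rng2.laplace(0, 2, 5)

print(np.array_equal(a, b), np.array_equal(c, d))
```

Re-seeding with the same value reproduces the same draws in both styles.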


### 1.2.2 True Data generation process

In [None]:
def generate_laplace_data(mu, b, n, K: int):
    """ Generate K datasets of n samples from Laplace distribution with parameters mu and b.
        Returns an array of shape (K, n), one dataset per row.
    """
    # Draw directly from Laplace(mu, b), matching the data generating process in 1.1.1.
    # No extra noise is added: it would change the distribution that the MLE assumes.
    return np.random.laplace(mu, b, (K, n))

In [71]:
def laplace_pdf(x, mu, b):
    """ Probability density function of the Laplace distribution """
    return (1 / (2 * b)) * np.exp(-np.abs(x - mu) / b)

### 1.2.3 MLE Estimation 

In [None]:
def MLE_laplace_est(X):
    """ Compute MLE estimates for parameters of Laplace distribution given data X.
        X is a 2D array where each row is a dataset.
        Returns two 1D arrays: mu_hats and b_hats, containing the estimates for each dataset.
    """
    n = X.shape[1]

    # sample median of each dataset (row)
    mu_hats = np.median(X, axis=1)
    # mean absolute deviation from the median, computed row-wise via broadcasting
    b_hats = np.sum(np.abs(X - mu_hats[:, None]), axis=1) / n

    return mu_hats, b_hats
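
The closed-form estimates can be checked numerically. The sketch below, written for a single 1D dataset, evaluates the Laplace log-likelihood on a grid around the estimates and confirms that no grid point does better. The helper `log_lik`, the seed and the grid width are assumptions made for this illustration.

```python
import numpy as np

def log_lik(x, mu, b):
    """Laplace log-likelihood of a 1D sample x."""
    return -x.size * np.log(2 * b) - np.sum(np.abs(x - mu)) / b

rng = np.random.default_rng(2)            # arbitrary seed
x = rng.laplace(0.0, 2.0, 1001)           # odd sample size so the median is unique

mu_hat = np.median(x)
b_hat = np.mean(np.abs(x - mu_hat))
best = log_lik(x, mu_hat, b_hat)

# Evaluate the log-likelihood on a grid of nearby parameter values
mu_grid = mu_hat + np.linspace(-0.5, 0.5, 41)
b_grid = b_hat + np.linspace(-0.5, 0.5, 41)
grid_max = max(log_lik(x, m, s) for m in mu_grid for s in b_grid)

print(best, grid_max)
```

The grid contains the estimates themselves, so the two printed values should coincide up to rounding, with every other grid point giving a lower log-likelihood.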

### 1.2.4 Simulation Loop & Result Storage

With a simple for loop

In [None]:
# number of samples per experiment
n = 1000
# number of experiments
K = 500

# true parameters
mu = 0 # location
b = 2 # scale


X = np.zeros((K,n))
theta_hats = np.zeros((K,2))
for k in range(K):
    X[k] = np.random.laplace(mu, b, n)
    
    mu_hat = np.median(X[k])
    b_hat = np.mean(np.abs(X[k] - mu_hat))
    theta_hats[k] = (mu_hat, b_hat)

With vectorized operations

In [None]:
# number of samples per experiment
n = 1000
# number of experiments
K = 500

# true parameters
mu = 0 # location
b = 2 # scale

X = generate_laplace_data(mu, b, n, K)
y_true  = laplace_pdf(X, mu, b)
mu_hats, b_hats = MLE_laplace_est(X)
theta_hats = np.column_stack((mu_hats, b_hats))

mu_hats_mean = np.mean(mu_hats)
mu_hats_std = np.std(mu_hats)

b_hats_mean = np.mean(b_hats)
b_hats_std = np.std(b_hats)

y_hats = laplace_pdf(X, mu_hats_mean, b_hats_mean)

residuals = y_true - y_hats
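# mle error variance estimate: squared residuals summed over all K*n points, divided by n
# (this equals K times the mean squared residual, not a per-point average)
sigma2_hat = np.sum(residuals**2) / n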

### 1.2.5 Summary Stats and Visualization

In [90]:

print(f"Simulation settings:")
print(f"Number of samples per simulation (n): {n}")
print(f"Number of repetitions (K): {K}")
print(f"\nLaplace Distribution - True parameters:")
print(f"mu = {mu} (location)")
print(f"b = {b} (scale)")
print(f"\nLaplace Distribution - Estimated parameters (mean over {K} simulations):")
print(f"mu_hat = {mu_hats_mean:.8f} (location)")
print(f"b_hat = {b_hats_mean:.8f} (scale)")
print(f"\nStandard deviation of estimates over {K} simulations:")
print(f"std(mu_hat) = {mu_hats_std:.8f} (location)")
print(f"std(b_hat) = {b_hats_std:.8f} (scale)")
print(f"\nMLE error variance estimate (sigma^2_hat): {sigma2_hat:.8f}")

Simulation settings:
Number of samples per simulation (n): 1000
Number of repetitions (K): 500

Laplace Distribution - True parameters:
mu = 0 (location)
b = 2 (scale)

Laplace Distribution - Estimated parameters (mean over 500 simulations):
mu_hat = -0.00225510 (location)
b_hat = 1.99805678 (scale)

Standard deviation of estimates over 500 simulations:
std(mu_hat) = 0.06814306 (location)
std(b_hat) = 0.06004149 (scale)

MLE error variance estimate (sigma^2_hat): 0.00001873
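
The reported standard deviations can be compared with asymptotic theory for the Laplace MLE, under which both estimators have a standard deviation of approximately $b/\sqrt{n}$ (the Fisher information for each parameter is $1/b^2$). A short sketch of that calculation, using the simulation settings above:

```python
import numpy as np

n = 1000     # samples per simulation, as in the settings above
b = 2.0      # true scale parameter

# Asymptotic standard deviation of both mu_hat (sample median) and b_hat
asymptotic_std = b / np.sqrt(n)
print(f"b / sqrt(n) = {asymptotic_std:.8f}")
```

This gives roughly 0.0632, close to the simulated values of about 0.068 for `mu_hat` and 0.060 for `b_hat`.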


In [None]:
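def run_simulation(mu, b, n, K):
    """ Run K simulations of n samples each and return a (K, 2) array of (mu_hat, b_hat) estimates. """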
    X = generate_laplace_data(mu, b, n, K)
    mu_hats, b_hats = MLE_laplace_est(X)
    theta_hats = np.column_stack((mu_hats, b_hats))
    return theta_hats
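
One possible use of such a wrapper is to study how the spread of the estimates shrinks as the sample size grows. The sketch below is self-contained rather than calling the notebook functions; the helper `estimator_std`, the seed, and the chosen values of `n` and `K` are all assumptions for illustration.

```python
import numpy as np

def estimator_std(mu, b, n, K, rng):
    """Hypothetical helper: std of mu_hat and b_hat over K simulated datasets of size n."""
    X = rng.laplace(mu, b, (K, n))
    mu_hats = np.median(X, axis=1)
    b_hats = np.mean(np.abs(X - mu_hats[:, None]), axis=1)
    return np.std(mu_hats), np.std(b_hats)

rng = np.random.default_rng(3)            # arbitrary seed
results = {n: estimator_std(0.0, 2.0, n, 400, rng) for n in (100, 400, 1600)}

for n, (s_mu, s_b) in results.items():
    # b / sqrt(n) is the asymptotic reference value for both estimators
    print(f"n={n:5d}  std(mu_hat)={s_mu:.4f}  std(b_hat)={s_b:.4f}  b/sqrt(n)={2.0 / np.sqrt(n):.4f}")
```

Quadrupling `n` should roughly halve both standard deviations, in line with the $1/\sqrt{n}$ rate.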