#  Mean Reverting Process (MRP) — Notes

---

## 1. Motivation

Equity prices are often modeled using geometric Brownian motion, but this is *not* suitable for many commodities such as oil.  
Commodity prices tend to fluctuate randomly **around a long-term mean**, which requires a **mean-reverting** process.

The standard mean-reverting stochastic differential equation (SDE), which defines the Ornstein–Uhlenbeck process, is

$$
dX_t = \theta(\mu - X_t)\,dt + \sigma\, dB_t,
$$

where:

- $\theta > 0$: speed of mean reversion  
- $\mu$: long-run mean  
- $\sigma > 0$: volatility  
- $B_t$: standard Brownian motion  

When $X_t > \mu$: drift is negative (pulls $X_t$ down)  
When $X_t < \mu$: drift is positive (pulls $X_t$ up)

---

## 2. Euler–Maruyama Simulation (Discrete–Time MRP)

To simulate the SDE numerically at times  
$$
s,\ s+\Delta,\ s+2\Delta,\ \ldots,
$$  
we use:

$$
X_{s+(j+1)\Delta}
=
X_{s+j\Delta}
+
\theta(\mu - X_{s+j\Delta})\Delta
+
\sigma \delta_j \sqrt{\Delta},
$$

where each $\delta_j \sim N(0,1)$ independently.

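This yields fast Monte Carlo realizations, but only approximately: the one-step mean uses $1 - \theta\Delta$ in place of $e^{-\theta\Delta}$ and the one-step variance uses $\sigma^2\Delta$ in place of the exact value, so the scheme is accurate only when $\theta\Delta \ll 1$.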
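
The update rule above can be sketched in Python as follows. This is a minimal illustration rather than part of the original notes: the function name `euler_maruyama_mrp`, its arguments, and the parameter values in the usage line are all hypothetical, and a uniform step $\Delta$ is assumed.

```python
import numpy as np

def euler_maruyama_mrp(theta, mu, sigma, x0, delta, n_steps, rng=None):
    """Approximate path of dX = theta*(mu - X) dt + sigma dB on a uniform grid.

    Hypothetical helper for illustration; returns n_steps + 1 values.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(n_steps + 1)
    x[0] = x0
    # delta_j ~ N(0, 1), drawn independently for each step
    shocks = rng.standard_normal(n_steps)
    for j in range(n_steps):
        drift = theta * (mu - x[j]) * delta
        x[j + 1] = x[j] + drift + sigma * np.sqrt(delta) * shocks[j]
    return x

# Illustrative parameter values (assumptions, not taken from the notes)
path = euler_maruyama_mrp(theta=0.5, mu=2.0, sigma=0.3, x0=1.0,
                          delta=0.01, n_steps=1000,
                          rng=np.random.default_rng(0))
```

Passing an explicit `rng` keeps runs reproducible without touching NumPy's global random state.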

---

## 3. Exact Conditional Distribution of the MRP

Using Itô’s Lemma on

$$
Y_t = e^{\theta t}X_t,
$$

one can derive an explicit formula for $X_t$:

$$
X_t
=
e^{-\theta (t-s)}X_s
+ \mu(1 - e^{-\theta (t-s)})
+ \sigma e^{-\theta t}
\left(
 \int_s^t e^{\theta u} dB_u
\right).
$$
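
For completeness, the intermediate steps can be written out as follows (a sketch of the usual integrating-factor argument, using only the symbols defined above). Because $e^{\theta t}$ is deterministic, Itô's Lemma reduces to the product rule, and the terms in $X_t$ cancel:

$$
dY_t
= \theta e^{\theta t} X_t\,dt + e^{\theta t}\,dX_t
= \theta\mu e^{\theta t}\,dt + \sigma e^{\theta t}\,dB_t .
$$

Integrating from $s$ to $t$ gives

$$
e^{\theta t}X_t - e^{\theta s}X_s
= \mu\left(e^{\theta t} - e^{\theta s}\right)
+ \sigma\int_s^t e^{\theta u}\,dB_u ,
$$

and multiplying both sides by $e^{-\theta t}$ recovers the explicit formula for $X_t$.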

The stochastic integral term is **normally distributed**, with mean 0 and variance:

$$
\operatorname{Var}\left[
\sigma e^{-\theta t}
\int_s^t e^{\theta u} \, dB_u
\right]
=
\sigma^2
\frac{1 - e^{-2\theta(t-s)}}{2\theta}.
$$
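
This variance follows from the Itô isometry, since the integrand $e^{\theta u}$ is deterministic. Written out step by step (a sketch using the same symbols):

$$
\operatorname{Var}\left[
\sigma e^{-\theta t}\int_s^t e^{\theta u}\,dB_u
\right]
= \sigma^2 e^{-2\theta t}\int_s^t e^{2\theta u}\,du
= \sigma^2 e^{-2\theta t}\,\frac{e^{2\theta t} - e^{2\theta s}}{2\theta}
= \sigma^2\,\frac{1 - e^{-2\theta(t-s)}}{2\theta}.
$$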

Thus we obtain the key result:

### Conditional Distribution

For $t > s$,

- **Mean**
  $$
  m = e^{-\theta (t-s)} X_s + \mu(1 - e^{-\theta (t-s)})
  $$

- **Variance**
  $$
  v = \sigma^2 \frac{1 - e^{-2\theta(t-s)}}{2\theta}
  $$

Therefore,

$$
X_t \mid X_s \sim N(m, v).
$$
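
As a quick numerical illustration, the sketch below evaluates $m$ and $v$ and compares them with the one-step Euler–Maruyama moments $X_s + \theta(\mu - X_s)\Delta$ and $\sigma^2\Delta$. The helper name `mrp_conditional_moments` and the parameter values are assumptions made for this example only.

```python
import numpy as np

def mrp_conditional_moments(x_s, dt, theta, mu, sigma):
    """Exact mean m and variance v of X_t given X_s = x_s, with dt = t - s > 0."""
    w = np.exp(-theta * dt)
    m = w * x_s + mu * (1.0 - w)
    # -expm1(-2*theta*dt) equals 1 - exp(-2*theta*dt), computed stably for small dt
    v = sigma**2 * (-np.expm1(-2.0 * theta * dt)) / (2.0 * theta)
    return m, v

theta, mu, sigma, x_s = 0.5, 2.0, 0.3, 1.0   # illustrative values
for dt in (1.0, 0.1, 0.001):
    m, v = mrp_conditional_moments(x_s, dt, theta, mu, sigma)
    m_euler = x_s + theta * (mu - x_s) * dt   # Euler-Maruyama one-step mean
    v_euler = sigma**2 * dt                   # Euler-Maruyama one-step variance
    print(f"dt={dt:<6} m={m:.6f} (Euler {m_euler:.6f})  "
          f"v={v:.6f} (Euler {v_euler:.6f})")
```

The two sets of moments agree closely for small `dt` and drift apart as `dt` grows, which is why the exact distribution is preferable on coarse or uneven time grids.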

---

## 4. Generating a Realization at Arbitrary Time Points

Given a time grid  
$$
0 = t_0 < t_1 < \ldots < t_N,
$$  
and initial value $X_{t_0}$, we generate:

$$
X_{t_i} \sim N\!\left(
e^{-\theta \Delta t_i} X_{t_{i-1}}
+ \mu(1 - e^{-\theta \Delta t_i}),
\;\;
\sigma^2 \frac{1 - e^{-2\theta \Delta t_i}}{2\theta}
\right),
$$

where $\Delta t_i = t_i - t_{i-1}$.

---

## 5. Long–Run Behavior

Letting $t \to \infty$:

- **Mean**
  $$
  \lim_{t\to\infty} E[X_t \mid X_0] = \mu.
  $$

- **Variance**
  $$
  \lim_{t\to\infty} \operatorname{Var}(X_t \mid X_0) = \frac{\sigma^2}{2\theta}.
  $$

Thus the process approaches a stationary normal distribution:
$$
X_\infty \sim N\!\left(\mu,\; \frac{\sigma^2}{2\theta}\right).
$$
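
These limits can be checked by simulation. The sketch below is an illustration under assumed parameter values: it starts many independent paths far from $\mu$, advances them with the exact one-step transition, and compares the sample mean and variance at a late time with $\mu$ and $\sigma^2/(2\theta)$.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, mu, sigma = 0.5, 2.0, 0.3          # illustrative values, not from the notes
n_paths, dt, n_steps = 20_000, 0.5, 60    # final time t = 30

w = np.exp(-theta * dt)
sd = np.sqrt(sigma**2 * (1.0 - w**2) / (2.0 * theta))

x = np.full(n_paths, 10.0)                # start far from mu on purpose
for _ in range(n_steps):
    # exact one-step transition applied to all paths at once
    x = w * x + mu * (1.0 - w) + sd * rng.standard_normal(n_paths)

print("sample mean:", x.mean(), " target:", mu)
print("sample var :", x.var(), " target:", sigma**2 / (2.0 * theta))
```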

---

## 6. Covariance Structure

For $0 < s < t$, with the starting value $X_0$ treated as fixed (non-random):

$$
\operatorname{Cov}(X_s, X_t)
=
\frac{\sigma^2}{2\theta}
\left(
e^{-\theta(t-s)}
-
e^{-\theta(t+s)}
\right).
$$

This follows from writing $X_s$ and $X_t$ in stochastic-integral form and applying Itô isometry.
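
A small sketch of this formula in code, assuming the process starts from a fixed value at time 0. The function name `mrp_cov` and the parameter values are hypothetical; broadcasting is used to build the full covariance matrix for a set of times.

```python
import numpy as np

def mrp_cov(s, t, theta, sigma):
    """Cov(X_s, X_t) for the MRP started from a fixed value at time 0."""
    lo, hi = np.minimum(s, t), np.maximum(s, t)   # formula assumes lo <= hi
    decay_gap = np.exp(-theta * (hi - lo))
    decay_sum = np.exp(-theta * (hi + lo))
    return sigma**2 / (2.0 * theta) * (decay_gap - decay_sum)

theta, sigma = 0.5, 0.3                           # illustrative values
times = np.array([0.5, 1.0, 2.0, 4.0])
K = mrp_cov(times[:, None], times[None, :], theta, sigma)  # 4x4 covariance matrix
print(K)
```

On the diagonal, $s = t$ and the expression reduces to the conditional variance $\sigma^2 (1 - e^{-2\theta t})/(2\theta)$ given $X_0$.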

---

## 7. Likelihood Function for Observed MRP Data

Suppose we observe:

- Time points: $t_0, t_1, \ldots, t_N$  
- Values: $X_{t_0}, X_{t_1}, \ldots, X_{t_N}$  

Because the MRP is a **Markov process**, the joint density simplifies:

$$
f(x_{t_0}, \ldots, x_{t_N})
= f(x_{t_0})
\prod_{i=1}^N f(x_{t_i} \mid x_{t_{i-1}}),
$$

where  
$$
X_{t_i} \mid X_{t_{i-1}}
\sim N(m_i, v_i).
$$

Treating the first observation $x_{t_0}$ as fixed, the factor $f(x_{t_0})$ drops out and the (conditional) log-likelihood becomes:

$$
\ell(\mu,\sigma,\theta)
=
\sum_{i=1}^N
\left[
-\frac12 \log(2\pi v_i)
-\frac{(X_{t_i}-m_i)^2}{2v_i}
\right].
$$

---

## 8. Simplifying Notation

Define:

$$
\Delta t_i = t_i - t_{i-1},\qquad
w_i = e^{-\theta \Delta t_i}.
$$

Then:

- Mean  
  $$
  m_i = w_i X_{t_{i-1}} + \mu(1 - w_i)
  $$

- Variance  
  $$
  v_i = \sigma^2 \frac{1 - w_i^2}{2\theta}
  $$
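
With this notation the log-likelihood of Section 7 can be evaluated without an explicit loop. The following is a sketch under stated assumptions: the function name `mrp_loglik_vectorized` and the small data set are made up for illustration.

```python
import numpy as np

def mrp_loglik_vectorized(theta, mu, sigma, t, x):
    """Conditional log-likelihood of x[1:] given x[0], using w_i, m_i, v_i."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    w = np.exp(-theta * np.diff(t))               # w_i = exp(-theta * dt_i)
    m = w * x[:-1] + mu * (1.0 - w)               # m_i
    v = sigma**2 * (1.0 - w**2) / (2.0 * theta)   # v_i
    return np.sum(-0.5 * np.log(2.0 * np.pi * v) - (x[1:] - m) ** 2 / (2.0 * v))

t = np.array([0.0, 0.5, 1.5, 2.0])   # unevenly spaced, illustrative
x = np.array([1.0, 1.2, 1.9, 2.1])
ll = mrp_loglik_vectorized(theta=0.5, mu=2.0, sigma=0.3, t=t, x=x)
print(ll)
```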

---

## 9. Maximization Strategy

Maximizing $\ell(\mu,\sigma,\theta)$ reduces to a one-dimensional search, because $\mu$ and $\sigma$ have closed-form maximizers once $\theta$ is fixed:

### Step 1: For fixed $\theta$, the MLE for $\mu$ is

$$
\mu(\theta)
=
\frac{\sum_{i=1}^N (X_{t_i} - X_{t_{i-1}}w_i)/(1 + w_i)}
     {\sum_{i=1}^N (1 - w_i)/(1 + w_i)}.
$$
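
To see where this comes from (a sketch in the notation of Section 8): only the quadratic term of $\ell$ depends on $\mu$, with $X_{t_i} - m_i = X_{t_i} - w_i X_{t_{i-1}} - \mu(1 - w_i)$ and $1/v_i = 2\theta / \big(\sigma^2 (1 - w_i)(1 + w_i)\big)$. Setting the derivative to zero gives

$$
\frac{\partial \ell}{\partial \mu}
= \sum_{i=1}^N \frac{(1 - w_i)\left(X_{t_i} - w_i X_{t_{i-1}} - \mu(1 - w_i)\right)}{v_i}
= \frac{2\theta}{\sigma^2}
\sum_{i=1}^N \frac{X_{t_i} - w_i X_{t_{i-1}} - \mu(1 - w_i)}{1 + w_i}
= 0 .
$$

Solving for $\mu$ yields the ratio above. The factor $2\theta/\sigma^2$ cancels, so $\mu(\theta)$ does not depend on $\sigma$.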

### Step 2: For fixed $\theta$ and $\mu = \mu(\theta)$, the MLE for $\sigma$ is

$$
\sigma(\theta)
=
\sqrt{ \frac{2\theta}{N}
\sum_{i=1}^N
\frac{(X_{t_i} - \hat{m}_i)^2}{1 - w_i^2} },
\qquad
\hat{m}_i = w_i X_{t_{i-1}} + \mu(\theta)(1 - w_i).
$$

### Step 3: Perform a **grid search over $\theta$**

Evaluate:

$$
\ell(\mu(\theta),\sigma(\theta),\theta)
$$

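for $\theta$ in a fine grid (e.g. 0.00001 to 0.01),  
and choose the $\theta$ that maximizes it. If the maximizer falls on an endpoint of the grid, the grid is too narrow and should be widened.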

Finally output:

- $\theta$
- $\mu(\theta)$
- $\sigma(\theta)$
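
The three steps can be sketched compactly as below. This is an illustrative variant, not the notebook's implementation: the names `profile_loglik` and `grid_search_theta`, the `at_edge` flag, and the synthetic data are assumptions. It uses the fact that at $\sigma = \sigma(\theta)$ the quadratic term of $\ell$ sums to $N/2$.

```python
import numpy as np

def profile_loglik(theta, t, x):
    """Return (loglik, mu, sigma) with mu and sigma maximized out for this theta."""
    w = np.exp(-theta * np.diff(t))
    # Step 1: closed-form mu(theta)
    mu = np.sum((x[1:] - w * x[:-1]) / (1.0 + w)) / np.sum((1.0 - w) / (1.0 + w))
    resid = x[1:] - (w * x[:-1] + mu * (1.0 - w))
    c = (1.0 - w**2) / (2.0 * theta)            # v_i = sigma^2 * c_i
    # Step 2: closed-form sigma(theta)^2
    sigma2 = np.mean(resid**2 / c)
    # At sigma(theta) the quadratic term of the log-likelihood equals N/2
    ll = -0.5 * np.sum(np.log(2.0 * np.pi * sigma2 * c)) - 0.5 * len(resid)
    return ll, mu, np.sqrt(sigma2)

def grid_search_theta(t, x, theta_grid):
    """Step 3: pick the grid value of theta with the largest profile log-likelihood."""
    t, x = np.asarray(t, dtype=float), np.asarray(x, dtype=float)
    results = [profile_loglik(th, t, x) for th in theta_grid]
    k = int(np.argmax([r[0] for r in results]))
    ll, mu, sigma = results[k]
    at_edge = k in (0, len(theta_grid) - 1)     # True means: widen the grid
    return {"theta": theta_grid[k], "mu": mu, "sigma": sigma,
            "loglik": ll, "at_edge": at_edge}

# Synthetic check: exact simulation over a long horizon (illustrative values)
rng = np.random.default_rng(1)
theta0, mu0, sigma0 = 0.5, 2.0, 0.3
t = np.arange(0.0, 500.0, 0.5)
w0 = np.exp(-theta0 * 0.5)
sd0 = np.sqrt(sigma0**2 * (1.0 - w0**2) / (2.0 * theta0))
x = np.empty(len(t))
x[0] = mu0
for i in range(1, len(t)):
    x[i] = w0 * x[i - 1] + mu0 * (1.0 - w0) + sd0 * rng.standard_normal()

fit = grid_search_theta(t, x, np.linspace(0.05, 2.0, 400))
print(fit)
```

Reporting `at_edge` makes it visible when the chosen $\theta$ is simply the first or last grid point, in which case the result reflects the grid limits rather than the data.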

---

## 10. Parametric Bootstrap for Confidence Intervals

To quantify uncertainty in $(\theta, \mu, \sigma)$:

1. Fit the model to the observed data to get the estimates $(\hat\theta,\hat\mu,\hat\sigma)$.  
2. For $j = 1, \ldots, M$:  
   - simulate a dataset from the MRP on the same time grid, using the fitted parameters $(\hat\theta,\hat\mu,\hat\sigma)$  
   - refit to obtain $(\theta^{(j)}, \mu^{(j)}, \sigma^{(j)})$  
3. Use the 2.5% and 97.5% sample percentiles of the $M$ refitted values to form **95% bootstrap CIs**.

The resulting intervals take the form:

- CI for $\theta$:  
  $$
  (\theta_L,\; \theta_U)
  $$
- CI for $\mu$:  
  $$
  (\mu_L,\; \mu_U)
  $$
- CI for $\sigma$:  
  $$
  (\sigma_L,\; \sigma_U)
  $$
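
Step 3 amounts to reading off two percentiles per parameter. A minimal sketch, in which the helper name `percentile_ci` is hypothetical and the array of bootstrap values is a synthetic stand-in rather than real refitted estimates:

```python
import numpy as np

def percentile_ci(boot_estimates, level=0.95):
    """Percentile interval from an array of bootstrap estimates."""
    alpha = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(boot_estimates, [alpha, 100.0 - alpha])
    return lower, upper

# Stand-in for the M refitted values theta^(1), ..., theta^(M);
# in real use these come from step 2, not from a random number generator.
fake_theta_boot = np.random.default_rng(0).normal(loc=0.5, scale=0.05, size=1000)
theta_L, theta_U = percentile_ci(fake_theta_boot)
print(theta_L, theta_U)
```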

---

### **Problem 1: Simulation**
Write code to simulate the MRP at user-specified time points.

In [1]:
import numpy as np

def simulate_mrp(theta, mu, sigma, t_grid, X0):
    """
    Simulate the Mean Reverting Process (Ornstein–Uhlenbeck):
        dX_t = theta*(mu - X_t) dt + sigma dB_t

    using the exact conditional distribution.

    Parameters
    ----------
    theta : float
        Speed of mean reversion (>0)
    mu : float
        Long-run mean
    sigma : float
        Volatility parameter
    t_grid : array-like
        Sorted time points t0 < t1 < ... < tN
    X0 : float
        Initial value X(t0)

    Returns
    -------
    np.ndarray
        Simulated values [X(t0), ..., X(tN)]
    """

    t_grid = np.array(t_grid)
    N = len(t_grid)
    X = np.zeros(N)
    X[0] = X0

    for i in range(1, N):
        dt = t_grid[i] - t_grid[i-1]
        w = np.exp(-theta * dt)

        # Conditional mean and variance
        mean = w * X[i-1] + mu * (1 - w)
        var = (sigma**2) * (1 - w**2) / (2 * theta)

        # Draw from N(mean, var); np.random.normal expects the standard deviation
        X[i] = np.random.normal(mean, np.sqrt(var))

    return X

### **Problem 2: MLE Estimation**
Implement the likelihood-based MLE estimation described above.

In [2]:
import numpy as np

# Helper 1: Compute μ̂(θ)
def mu_hat(theta, t_grid, X):
    t_grid = np.asarray(t_grid)
    X = np.asarray(X)
    N = len(X) - 1

    num = 0.0
    den = 0.0

    for i in range(1, N+1):
        dt = t_grid[i] - t_grid[i-1]
        w = np.exp(-theta * dt)
        num += (X[i] - w * X[i-1]) / (1 + w)
        den += (1 - w) / (1 + w)

    return num / den

# Helper 2: Compute σ̂(θ)
def sigma_hat(theta, mu, t_grid, X):
    t_grid = np.asarray(t_grid)
    X = np.asarray(X)
    N = len(X) - 1

    ssum = 0.0
    for i in range(1, N+1):
        dt = t_grid[i] - t_grid[i-1]
        w = np.exp(-theta * dt)
        m = w * X[i-1] + mu * (1 - w)
        ssum += (X[i] - m)**2 / (1 - w**2)

    return np.sqrt((2*theta/N) * ssum)


# Helper 3: Exact Gaussian log-likelihood
def mrp_loglik(theta, mu, sigma, t_grid, X):
    t_grid = np.asarray(t_grid)
    X = np.asarray(X)
    N = len(X) - 1

    loglik = 0.0
    for i in range(1, N+1):
        dt = t_grid[i] - t_grid[i-1]
        w = np.exp(-theta * dt)

        m = w * X[i-1] + mu * (1 - w)
        v = sigma**2 * (1 - w**2) / (2*theta)

        loglik += -0.5 * (np.log(2*np.pi*v) + ((X[i] - m)**2)/v)

    return loglik


# Problem 2: MLE Estimation via θ-grid search
def fit_mrp_mle(t_grid, X, theta_grid):
    best_ll = -np.inf
    best = None

    for theta in theta_grid:
        if theta <= 0:
            continue

        mu = mu_hat(theta, t_grid, X)
        sigma = sigma_hat(theta, mu, t_grid, X)
        ll = mrp_loglik(theta, mu, sigma, t_grid, X)

        if ll > best_ll:
            best_ll = ll
            best = (theta, mu, sigma, ll)

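    # Guard: best stays None if no theta was positive or every log-likelihood was NaN
    if best is None:
        raise ValueError("theta_grid has no positive value with a finite log-likelihood")

    return {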
        "theta_hat": best[0],
        "mu_hat": best[1],
        "sigma_hat": best[2],
        "loglik": best[3],
    }

### **Problem 3: Testing**
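Generate synthetic MRP data and fit parameters for:
- $n = 5, 10, 20, 100$  

Estimates should become more accurate as $n$ increases. Note that with the time horizon $T$ held fixed, as in the run below, extra observations mainly sharpen $\sigma$; the accuracy of $\theta$ and $\mu$ is governed by the length of the observation window $T$ rather than by $n$.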

In [4]:
def problem3_test_estimation(
        theta_true=0.4,
        mu_true=2.0,
        sigma_true=0.7,
        T=10.0,
        n_values=(5, 10, 20, 100),
        seed=123
    ):
    """
    Problem 3:
    Test MRP parameter estimation accuracy for n = 5, 10, 20, 100.

    Parameters
    ----------
    theta_true, mu_true, sigma_true : floats
        True parameters used to simulate data.
    T : float
        Final time for simulation interval.
    n_values : tuple
        Different sample sizes to test.
    seed : int
        Random seed for reproducibility.
    """

    np.random.seed(seed)

    print("TRUE PARAMETERS:")
    print(f"theta = {theta_true}, mu = {mu_true}, sigma = {sigma_true}")
    print("\nEstimations for increasing sample size:\n")

    for n in n_values:
        t_grid = np.linspace(0, T, n + 1)
        X = simulate_mrp(theta_true, mu_true, sigma_true, t_grid, X0=1.0)
        # A theta_hat equal to 0.01 or 1.0 means the maximizer sits on the grid edge
        theta_grid = np.linspace(0.01, 1.0, 200)
        fit = fit_mrp_mle(t_grid, X, theta_grid)

        theta_hat = fit["theta_hat"]
        mu_hat_est = fit["mu_hat"]
        sigma_hat_est = fit["sigma_hat"]

        print(f"n = {n}")
        print(f"  theta_hat = {theta_hat:.4f}")
        print(f"  mu_hat    = {mu_hat_est:.4f}")
        print(f"  sigma_hat = {sigma_hat_est:.4f}")
        print("-" * 40)

In [5]:
problem3_test_estimation()

TRUE PARAMETERS:
theta = 0.4, mu = 2.0, sigma = 0.7

Estimations for increasing sample size:

n = 5
  theta_hat = 1.0000
  mu_hat    = 1.4968
  sigma_hat = 0.8546
----------------------------------------
n = 10
  theta_hat = 1.0000
  mu_hat    = 1.6893
  sigma_hat = 0.9092
----------------------------------------
n = 20
  theta_hat = 0.2289
  mu_hat    = 1.9155
  sigma_hat = 0.8669
----------------------------------------
n = 100
  theta_hat = 0.9801
  mu_hat    = 2.1917
  sigma_hat = 0.7599
----------------------------------------


### **Problem 4: WTI Crude Oil Data**
Fit MRP parameters to daily historical WTI crude oil prices (the code below models the log-price).

In [16]:
import numpy as np
import pandas as pd

def load_wti_data(csv_path):
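    """
    Load daily WTI crude oil prices from a CSV file.
    Uses the exact column name ' value' (with a leading space).

    Returns (t_grid, log_prices). The time grid is 0, 1, 2, ..., i.e. one
    unit per observation, so calendar gaps (weekends, holidays) are ignored
    and theta and sigma are per-observation quantities. Prices must be
    strictly positive for the logarithm to be defined.
    """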
    df = pd.read_csv(csv_path)
    prices = df[" value"].astype(float).values
    log_prices = np.log(prices)
    t_grid = np.arange(len(log_prices), dtype=float)
    return t_grid, log_prices

# Problem 4: Fit MRP to WTI log-price data
def problem4_fit_wti(csv_path, theta_min=0.0001, theta_max=0.02, n_theta=200):
    """
    Fit the MRP model to WTI crude oil log-prices using θ-grid search.
    """
    t_grid, X = load_wti_data(csv_path)
    theta_grid = np.linspace(theta_min, theta_max, n_theta)
    fit = fit_mrp_mle(t_grid, X, theta_grid)

    print("\n=== MRP Fit for WTI Crude Oil Log-Prices ===")
    print(f"theta_hat = {fit['theta_hat']}")
    print(f"mu_hat    = {fit['mu_hat']}")
    print(f"sigma_hat = {fit['sigma_hat']}")
    print(f"loglik    = {fit['loglik']}\n")

    return fit

In [17]:
fit_wti = problem4_fit_wti("DailyWTICrudeOilPrices.csv")


=== MRP Fit for WTI Crude Oil Log-Prices ===
theta_hat = 0.004
mu_hat    = 4.106191721309191
sigma_hat = 0.030629322365514623
loglik    = 5236.278623750432



### **Problem 5: Bootstrap CIs**
Compute 95% bootstrap confidence intervals using $M = 1000$ bootstrap samples (the run recorded below uses $M = 200$).

In [18]:
# Problem 5: Parametric Bootstrap
def bootstrap_mrp_params(csv_path,
                         M=200,          # number of bootstrap samples
                         theta_min=0.0001,
                         theta_max=0.02,
                         n_theta=200,
                         seed=123):
    """
    Perform parametric bootstrap to estimate confidence intervals
    for θ, μ, σ in the MRP model.
    """
    t_grid, X = load_wti_data(csv_path)
    theta_grid = np.linspace(theta_min, theta_max, n_theta)

    fit = fit_mrp_mle(t_grid, X, theta_grid)
    theta_hat = fit["theta_hat"]
    mu_hat = fit["mu_hat"]
    sigma_hat = fit["sigma_hat"]

    print("\n=== Original Fit (observed data) ===")
    print(f"theta_hat = {theta_hat}")
    print(f"mu_hat    = {mu_hat}")
    print(f"sigma_hat = {sigma_hat}\n")

    theta_boot = np.zeros(M)
    mu_boot = np.zeros(M)
    sigma_boot = np.zeros(M)
    np.random.seed(seed)

    # Step 2: Bootstrap Loop
    for j in range(M):
        X_sim = simulate_mrp(theta_hat, mu_hat, sigma_hat, t_grid, X[0])
        fit_sim = fit_mrp_mle(t_grid, X_sim, theta_grid)

        theta_boot[j] = fit_sim["theta_hat"]
        mu_boot[j] = fit_sim["mu_hat"]
        sigma_boot[j] = fit_sim["sigma_hat"]

        if (j + 1) % max(1, M // 10) == 0:
            print(f"Bootstrap {j+1}/{M} complete")

    CI_theta = np.percentile(theta_boot, [2.5, 97.5])
    CI_mu = np.percentile(mu_boot, [2.5, 97.5])
    CI_sigma = np.percentile(sigma_boot, [2.5, 97.5])

    print("\n=== Bootstrap 95% Confidence Intervals ===")
    print(f"theta CI  = {CI_theta}")
    print(f"mu CI     = {CI_mu}")
    print(f"sigma CI  = {CI_sigma}\n")

    return {
        "theta_hat": theta_hat,
        "mu_hat": mu_hat,
        "sigma_hat": sigma_hat,
        "theta_boot": theta_boot,
        "mu_boot": mu_boot,
        "sigma_boot": sigma_boot,
        "theta_CI": CI_theta,
        "mu_CI": CI_mu,
        "sigma_CI": CI_sigma,
    }

In [20]:
boot = bootstrap_mrp_params("DailyWTICrudeOilPrices.csv", M=200)


=== Original Fit (observed data) ===
theta_hat = 0.004
mu_hat    = 4.106191721309191
sigma_hat = 0.030629322365514623

Bootstrap 20/200 complete
Bootstrap 40/200 complete
Bootstrap 60/200 complete
Bootstrap 80/200 complete
Bootstrap 100/200 complete
Bootstrap 120/200 complete
Bootstrap 140/200 complete
Bootstrap 160/200 complete
Bootstrap 180/200 complete
Bootstrap 200/200 complete

=== Bootstrap 95% Confidence Intervals ===
theta CI  = [0.0023    0.0105025]
mu CI     = [3.81869141 4.44105895]
sigma CI  = [0.02978662 0.03152713]

