In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

plt.style.use('fivethirtyeight')
%matplotlib inline

### Data 145 Spring 2026
#### Adhikari/Fithian

# Worksheet 1 

## Homework 1 (due 5 PM Monday 1/26)
- Problem 1 (in the notebook; can be done based on Tuesday's lecture)
- Problem 2 (on paper)
- Problem 5 (on paper)

Problems 3 and 4 will be covered in section, along with some warm-ups.

### 1. The Power of the MLE and Delta Method

In this problem, you'll estimate a quantity related to earthquake frequency and discover how the MLE + delta method dramatically outperforms a naive approach.

**Setup:** Let $\lambda$ be the rate of M $\geq$ 4.0 mainshock earthquakes in California, measured in earthquakes per day. The number of earthquakes in a (non-leap) year is approximately $\text{Poisson}(365\lambda)$.

**Estimand:** We want to estimate the **90th percentile** of the annual earthquake count:
$$q(\lambda) = \text{90th percentile of } \text{Poisson}(365\lambda)$$

In other words, under this model the annual earthquake count is at most $q(\lambda)$ in roughly 90% of years.
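
To make the estimand concrete, here is a minimal sketch that evaluates the exact Poisson quantile for a made-up rate. The value `lam_example = 0.04` is purely hypothetical and is not an estimate from the earthquake data; `stats.poisson.ppf` and `stats.poisson.cdf` are SciPy's Poisson quantile and cdf functions.

```python
from scipy import stats

# Hypothetical rate in earthquakes per day (NOT estimated from the data)
lam_example = 0.04
mu_example = 365 * lam_example   # expected annual count under the model

# Exact 90th percentile of Poisson(365 * lam_example)
q_example = stats.poisson.ppf(0.90, mu_example)

# Because the Poisson is discrete, the cdf at the quantile is at least 0.90,
# while the cdf one count lower is below 0.90
cdf_at_q = stats.poisson.cdf(q_example, mu_example)
cdf_below_q = stats.poisson.cdf(q_example - 1, mu_example)
print(q_example, cdf_at_q, cdf_below_q)
```

The two cdf values bracket 0.90, which is why "90% of years" is only approximate for a discrete count.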

In [None]:
# Load the earthquake data
earthquakes = pd.read_csv('data/california_earthquakes_declustered.csv')
earthquakes['time'] = pd.to_datetime(earthquakes['time'], format='ISO8601')

# Filter to M >= 4.0 mainshocks
mainshocks = earthquakes[(earthquakes['is_mainshock']) & (earthquakes['mag'] >= 4.0)].copy()
mainshocks = mainshocks.sort_values('time').reset_index(drop=True)

# Compute interarrival times (in days)
interarrivals = mainshocks['time'].diff().dt.total_seconds() / (60 * 60 * 24)
interarrivals = interarrivals.dropna().values

n = len(interarrivals)
print(f"Number of interarrival times: {n}")
print(f"Mean interarrival time: {np.mean(interarrivals):.2f} days")

---
**(a) Gaussian Approximation for Poisson Quantiles**

For large $\mu$, the Poisson distribution is approximately Gaussian:
$$\text{Poisson}(\mu) \approx N(\mu, \mu)$$

Using this approximation, derive a formula for the 90th percentile of $\text{Poisson}(\mu)$ in terms of $\mu$.

[Use `stats.norm.ppf` to get the 90th percentile of the standard normal distribution.]

**Your answer:**

*[Write your derivation here]*



In [None]:
...

---
**(b) MLE for the 90th Percentile**

From the interarrival time data, we can estimate $\lambda$ using the MLE:
$$\hat{\lambda} = \frac{1}{\overline{X}_n}$$


- Compute $\hat{\lambda}$ from the data.
- Using your formula from Part **a**, compute the MLE-based estimate $\hat{q} = q(\hat{\lambda})$ for the 90th percentile of annual earthquake counts.

In [None]:
# You may use as many lines as you need

lambda_hat = ...

q_hat = ...

---
**(c) Delta Method for the Distribution of $\hat{q}$**

We want to predict the sampling distribution of our estimator $\hat{q} = q(\hat{\lambda})$.

Recall that for exponential interarrivals with rate $\lambda$:
- $\overline{X}_n$ has mean $1/\lambda$ and variance $1/(n\lambda^2)$
- $\hat{\lambda} = 1/\overline{X}_n$

By the delta method, $\hat{\lambda}$ is approximately normal with:
$$\hat{\lambda} \approx N\left(\lambda, \frac{\lambda^2}{n}\right)$$
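
The normal approximation for $\hat{\lambda}$ stated above can be checked by simulation. The sketch below uses synthetic exponential data; the rate `lam_true`, sample size `n_sim`, and number of repetitions `reps` are hypothetical values chosen only for illustration and have nothing to do with the earthquake data.

```python
import numpy as np

rng = np.random.default_rng(0)
lam_true = 0.5   # hypothetical rate, for illustration only
n_sim = 400      # hypothetical sample size
reps = 5000      # number of simulated samples

# Each row is one simulated sample of n_sim exponential interarrival times
samples = rng.exponential(scale=1 / lam_true, size=(reps, n_sim))
lam_hats = 1 / samples.mean(axis=1)   # MLE of the rate for each sample

empirical_sd = lam_hats.std()
delta_sd = lam_true / np.sqrt(n_sim)  # delta-method prediction: lambda / sqrt(n)
print(f"empirical SD: {empirical_sd:.4f}, delta-method SD: {delta_sd:.4f}")
```

If the approximation is reasonable at this sample size, the two printed values should be close.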

Apply the delta method again to find the approximate distribution of $\hat{q} = q(\hat{\lambda})$, in the following steps.

- Compute $q'(\lambda) = \frac{d}{d\lambda} q(\lambda)$
- Use the delta method to find $\text{Var}(\hat{q})$ in terms of $\lambda$ and $n$
- Plug in $\hat{\lambda}$ to get a numerical estimate of $\text{SD}(\hat{q})$

**Your answer:**

*[Write your derivation here]*



In [None]:
...

var_q_hat_estimate = ...
sd_q_hat_estimate = ...

---
**(d) Bootstrap Verification**

Check your delta method prediction by bootstrapping the interarrival times in the following steps.


- Bootstrap the interarrival times (resample with replacement)
- For each bootstrap sample, compute $\hat{\lambda}^*$ and then $\hat{q}^* = q(\hat{\lambda}^*)$
- Plot the histogram of $\hat{q}^*$ values and overlay your predicted normal distribution
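
As a reminder of the bootstrap mechanics, here is a minimal sketch on synthetic data that bootstraps only $\hat{\lambda}$, not $\hat{q}$. The array `fake_interarrivals` and the values `n_fake` and `n_boot` are hypothetical stand-ins, so this is an illustration of the resampling loop rather than a solution for the earthquake data.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-in for interarrival times (hypothetical rate of 0.5 per day)
fake_interarrivals = rng.exponential(scale=2.0, size=300)
n_fake = len(fake_interarrivals)
n_boot = 2000

boot_lambda = np.zeros(n_boot)
for b in range(n_boot):
    # Resample the observed values with replacement, keeping the sample size
    resample = rng.choice(fake_interarrivals, size=n_fake, replace=True)
    boot_lambda[b] = 1 / resample.mean()

lambda_hat_fake = 1 / fake_interarrivals.mean()
boot_sd = boot_lambda.std()
delta_sd_fake = lambda_hat_fake / np.sqrt(n_fake)   # delta-method SD for lambda-hat
print(f"bootstrap SD: {boot_sd:.4f}, delta-method SD: {delta_sd_fake:.4f}")
```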

In [None]:
n_bootstrap = 10000
np.random.seed(42)

boot_q = np.zeros(n_bootstrap)

...

In [None]:
# Plot the bootstrap distribution vs delta method prediction
fig, ax = plt.subplots(figsize=(10, 6))

# Bootstrap histogram
ax.hist(boot_q, bins=50, density=True, alpha=0.7, color='steelblue', 
        edgecolor='white', label='Bootstrap distribution')

# Delta method normal approximation
x_grid = np.linspace(boot_q.min(), boot_q.max(), 200)
delta_pdf = stats.norm.pdf(x_grid, q_hat, sd_q_hat_estimate)
ax.plot(x_grid, delta_pdf, '--', color='firebrick', linewidth=2.5,
        label=f'Delta method: N({q_hat:.1f}, {sd_q_hat_estimate:.2f}²)')

ax.axvline(q_hat, color='black', linestyle='--', linewidth=1.5, alpha=0.7,
           label=f'MLE estimate: q̂ = {q_hat:.1f}')

ax.set_xlabel('Estimated 90th Percentile (earthquakes/year)', fontsize=12)
ax.set_ylabel('Density', fontsize=12)
# Raw string so that \h in the LaTeX label is not treated as an escape sequence
ax.set_title(r'Bootstrap vs Delta Method Prediction for $\hat{q}$', fontsize=14, fontweight='bold')
ax.legend(fontsize=11)

plt.tight_layout()
plt.show()

print("The delta method nailed it!")

---
**(e) A Naive Alternative: Sample Quantile of Annual Counts**

Instead of using the MLE and delta method, we could try a more direct approach:

First, count the number of earthquakes in each calendar year. Then take the sample 90th percentile of these annual counts.

Use the steps below to execute this plan and examine its performance using the bootstrap. 
- Compute the annual earthquake counts for each year in the data.
- Compute the sample 90th percentile.
- Bootstrap this estimator by resampling *years* (not interarrivals).
- Compare the bootstrap standard deviation to that of the MLE-based estimator.

In [None]:
# Count earthquakes per year and compute sample 90th percentile

...

In [None]:
# Bootstrap the sample 90th percentile by resampling years
np.random.seed(42)
boot_q_naive = np.zeros(n_bootstrap)

...

In [None]:
# Compare the two bootstrap distributions
fig, ax = plt.subplots(figsize=(10, 6))

ax.hist(boot_q, bins=50, density=True, alpha=0.6, color='steelblue',
        label=f'MLE + Delta Method (SD = {np.std(boot_q):.2f})')
ax.hist(boot_q_naive, bins=30, density=True, alpha=0.6, color='coral',
        label=f'Sample 90th Percentile (SD = {np.std(boot_q_naive):.2f})')

ax.set_xlabel('Estimated 90th Percentile (earthquakes/year)', fontsize=12)
ax.set_ylabel('Density', fontsize=12)
ax.set_title('MLE-Based vs Naive Estimator', fontsize=14, fontweight='bold')
ax.legend(fontsize=11)

plt.tight_layout()
plt.show()

print("\nThe MLE-based estimator is MUCH more precise!")
print(f"Variance ratio ≈ {np.var(boot_q_naive) / np.var(boot_q):.0f}×")
print(f"\nThis is equivalent to having {np.var(boot_q_naive) / np.var(boot_q):.0f}× more data!")

---
**Summary**

**Why is the MLE-based estimator so much better?**

1. **The naive approach** uses only ~45 data points (one per year) and estimates a quantile, which is inherently noisy.

2. **The MLE-based approach** uses all ~600 interarrival times to estimate $\lambda$ precisely, then transforms via a known formula.

3. **The delta method** lets us predict the variance analytically — no simulation needed!

**The moral:** When you have a good parametric model, the MLE + delta method can dramatically outperform nonparametric approaches. The model lets you "borrow strength" from all of your data.

### 2. Convergence in Quadratic Mean

Let $X_1, X_2, \ldots$ and $X$ be defined on the same space. We say that $X_n$ *converges in quadratic mean* to $X$ if $E((X_n - X)^2) \to 0$ as $n \to \infty$. In other words, convergence in quadratic mean is the same as the mean squared difference going to $0$.
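
A familiar instance of this definition is the sample mean of i.i.d. draws, which converges in quadratic mean to $\mu$ because $E((\bar{X}_n - \mu)^2) = \sigma^2/n$. The sketch below estimates that mean squared difference by simulation for Uniform$(0, 1)$ draws; the sample sizes and the number of repetitions `reps` are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 0.5          # mean of Uniform(0, 1)
sigma2 = 1 / 12   # variance of Uniform(0, 1)
reps = 4000       # number of simulated samples per sample size

mse_by_n = {}
for n_obs in [10, 100, 1000]:
    # Sample means of `reps` independent samples of size n_obs
    xbar = rng.uniform(size=(reps, n_obs)).mean(axis=1)
    # Monte Carlo estimate of E((Xbar_n - mu)^2)
    mse_by_n[n_obs] = np.mean((xbar - mu) ** 2)
    print(n_obs, mse_by_n[n_obs], sigma2 / n_obs)
```

The estimated mean squared difference should shrink roughly like $\sigma^2/n$ as $n$ grows.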

Show that convergence in quadratic mean implies convergence in probability. It's a good idea to start by writing the event $|X_n - X| > \epsilon$ in terms of $(X_n - X)^2$. Then review the main ideas used to prove [Chebyshev's inequality](https://data140.org/textbook/content/chapter-12/bounds/#chebyshevs-inequality) and [Chernoff's bound](https://data140.org/textbook/content/chapter-19/chernoff-bound/#exponential-bounds-on-tails).

### 3. Convergence, or not
Let $U$ be uniform on the interval $(0, 1)$. For $n \ge 1$ define $X_n = nI(U \leq 1/n)$.

**(a)** What is the distribution of $X_n$? Sketch the cdf of $X_n$.

**(b)** Does $X_n$ converge in distribution? If so, to what? Justify your answer, and remember that to establish convergence in distribution you need to show convergence of the cdf sequence at all continuity points of the limit cdf.

**(c)** Does $X_n$ converge in probability? If so, to what? Justify your answer.

**(d)** Does the numerical sequence $E(X_n)$ converge? If so, to what?

**(e)** Does convergence in probability imply convergence in quadratic mean? Why or why not?

### 4. Sums and Convergence in Probability

Suppose $X_n \stackrel{P}{\to} X$ and $Y_n \stackrel{P}{\to} Y$. Show that $X_n + Y_n \stackrel{P}{\to} X+Y$, in the following steps.

**(a)** Find an upper bound for $\vert X_n + Y_n - (X+Y) \vert$ using $\vert X_n - X \vert$ and $\vert Y_n - Y \vert$.

**(b)** Fix $\epsilon > 0$. For the event $\vert X_n + Y_n - (X+Y) \vert > \epsilon$ to occur, what must $\vert X_n - X \vert$ and $\vert Y_n - Y \vert$ do? Answer this by filling in the blank with a phrase: "At least one of them must be $\underline{~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}$".

**(c)** Complete the argument.

### 5. Standardizing the Sample Mean

Let $X_1, X_2, \ldots$ be i.i.d. with mean $\mu$ and variance $\sigma^2$. Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$. The CLT says that the sequence $\displaystyle{\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}}$ converges in distribution to standard normal. This allows us to use $\bar{X}_n \pm 2\sigma/\sqrt{n}$ as an approximate 95% confidence interval for $\mu$.
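
The coverage of the interval $\bar{X}_n \pm 2\sigma/\sqrt{n}$ with known $\sigma$ can be checked by simulation. This is a minimal sketch assuming exponential data with mean 1 and SD 1; the distribution, `n_obs`, and `reps` are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 1.0, 1.0       # Exponential with scale 1 has mean 1 and SD 1
n_obs, reps = 200, 5000    # sample size and number of simulated samples

# Sample mean of each simulated sample
xbar = rng.exponential(scale=1.0, size=(reps, n_obs)).mean(axis=1)

# The interval xbar +/- 2*sigma/sqrt(n) covers mu when |xbar - mu| <= half_width
half_width = 2 * sigma / np.sqrt(n_obs)
coverage = np.mean(np.abs(xbar - mu) <= half_width)
print(f"Estimated coverage: {coverage:.3f}")
```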

In practice, we typically don't know $\sigma$. In the prerequisite classes we told you that it's fine to replace the unknown $\sigma$ with the SD of the sample. In this exercise you'll see why that's OK.

**(a) Preliminary fact about any list of numbers:** For numbers $x_1, x_2, \ldots, x_n$, let $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$. Show that

$$
\frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 ~ = ~ \frac{1}{n} \sum_{i=1}^n x_i^2 ~ - ~ \bar{x}^2
$$

Only do algebra if you must. It's better to identify the appropriate application of the familiar random variable fact $\text{Var}(X) = E(X^2) - (E(X))^2$.
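
Before proving the identity, it can help to see it hold numerically. The sketch below checks both sides on an arbitrary list of numbers; this is a sanity check, not a proof. Note that `np.var` uses the divisor $n$ by default, so it matches the left-hand side.

```python
import numpy as np

x = np.array([2.0, 3.0, 5.0, 7.0, 11.0])   # arbitrary example values

lhs = np.mean((x - x.mean()) ** 2)          # mean squared deviation from the average
rhs = np.mean(x ** 2) - x.mean() ** 2       # mean of squares minus square of the mean
print(lhs, rhs, np.var(x))
```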

**(b) Convergence of the plug-in estimator of $\sigma^2$:** Let $\hat{\sigma}_n^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2$ be the mean squared deviation from the sample average. This is the natural "plug-in" estimator of the underlying variance $\sigma^2$.

Show that $\hat{\sigma}_n^2$ converges in probability to a constant, and identify the constant. You'll need Part **a** and Problem 4 as well as facts from lecture.

**(c) Using the plug-in estimator of $\sigma^2$:** Typically, you don't know $\sigma^2$. But you can always calculate $\hat{\sigma}_n^2$. Does $\displaystyle{\frac{\bar{X}_n - \mu}{\hat{\sigma}_n/\sqrt{n}}}$ converge in distribution? If so, to what? Justify your answer carefully, and then use it to provide a formula for an approximate 95% confidence interval for $\mu$ when $n$ is large.

**(d) Using the "sample variance" instead:** The less natural estimator $S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X}_n)^2 = \frac{n}{n-1}\hat{\sigma}_n^2$ is called the *sample variance.* In your probability class you showed that $S_n^2$ is an unbiased estimator of $\sigma^2$. Does $\displaystyle{\frac{\bar{X}_n - \mu}{S_n/\sqrt{n}}}$ converge in distribution? If so, to what? Justify your answer carefully, and then use it to provide a formula for an approximate 95% confidence interval for $\mu$ when $n$ is large.