# Lecture Notes Toolkit - Fundamentals of Estimation (Ch. 5) + Random Variable Generation (Ch. 6)

This notebook is a **reusable toolkit** for the material in the provided notes (pages 88-101).
It mixes:
- short explanations (what a definition/theorem means in practice), and  
- **Python functions you can directly reuse** by swapping `X`, `y`, `n`, `alpha`, `a,b,M`, etc.

> Tip: Run the notebook top-to-bottom once. Then treat the function sections as your "library." ✅


In [None]:
# Core imports
import numpy as np
import math
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Optional, Iterable, List

# Optional plotting (safe to ignore if you don't need plots)
import matplotlib.pyplot as plt


## Chapter 5 - Fundamentals of Estimation

### 5.1-5.2 Point estimation (the practical view)

- **Data**: an i.i.d. sample `X = (X1, …, Xn)` from some unknown distribution (or a parametric family).
- **Statistic / estimator**: any function of the data. In code: a Python function `theta_hat = estimator(x)`.
- **Bias**: `E[theta_hat] - theta*` (usually unknown in real life).
- **Standard error (SE)**: `sqrt(Var(theta_hat))`.
- **MSE**: `E[(theta_hat - theta*)^2]` and the key identity:

\[ \mathrm{MSE} = \mathrm{Var}(\hat\theta) + (\mathrm{Bias})^2 \]

In practice, bias/SE/MSE are commonly assessed by **simulation** (if you know the data-generating model) or by **bootstrap** (if you only have data).
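
The decomposition above can be checked numerically. The sketch below is an illustration under assumed settings that are not from the notes (Normal data, and the biased "divide by n" variance estimator as `theta_hat`); it estimates bias, variance and MSE by simulation and compares `mse` with `variance + bias**2`.

```python
import numpy as np

# Assumed setup (illustrative only): Normal(0, sigma^2) data, estimating sigma^2
# with the biased "divide by n" variance estimator.
rng = np.random.default_rng(0)
sigma2_true = 4.0
n, n_sims = 20, 50_000

samples = rng.normal(0.0, np.sqrt(sigma2_true), size=(n_sims, n))
theta_hats = samples.var(axis=1, ddof=0)        # one estimate per simulated data set

bias = theta_hats.mean() - sigma2_true           # Monte Carlo bias
variance = theta_hats.var(ddof=0)                # Monte Carlo variance (SE squared)
mse = np.mean((theta_hats - sigma2_true) ** 2)   # Monte Carlo MSE

print(f"bias       = {bias:.4f}  (theory: {-sigma2_true / n:.4f})")
print(f"variance   = {variance:.4f}")
print(f"mse        = {mse:.4f}")
print(f"var+bias^2 = {variance + bias**2:.4f}")
```

With population-style moments (`ddof=0`) the identity holds exactly for the simulated values, not just approximately, because it is an algebraic identity for any set of numbers.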


In [None]:
# -----------------------------
# Generic estimation utilities
# -----------------------------

@dataclass
class EstimatorDiagnostics:
    """Holds Monte Carlo diagnostics for an estimator."""
    theta_star: float
    mean_hat: float
    bias: float
    se: float
    mse: float

def simulate_sampling_distribution(
    sampler: Callable[[int], np.ndarray],
    estimator: Callable[[np.ndarray], float],
    n: int,
    theta_star: float,
    n_sims: int = 10_000,
    seed: Optional[int] = 0,
) -> Tuple[np.ndarray, EstimatorDiagnostics]:
    """
    Monte Carlo approximation of the sampling distribution of an estimator.

    Parameters
    ----------
    sampler : function(n) -> sample array of length n
    estimator : function(sample) -> float
    n : sample size per simulation
    theta_star : true parameter value for the simulation model
    n_sims : number of Monte Carlo replications
    seed : if not None, seeds NumPy's legacy global RNG (np.random.seed) so that
        samplers calling np.random.* are reproducible. Samplers that own a
        Generator are unaffected by this value; pass seed=None for those.

    Returns
    -------
    theta_hats : array of estimated values across simulations
    diagnostics : EstimatorDiagnostics (mean_hat, bias, se, mse)
    """
    # Samplers take only `n`, so a Generator created here could never reach them.
    # Seeding the global RNG is the only way `seed` can influence such a sampler.
    if seed is not None:
        np.random.seed(seed)
    theta_hats = np.empty(n_sims, dtype=float)
    for s in range(n_sims):
        x = sampler(n)
        theta_hats[s] = estimator(x)

    mean_hat = float(np.mean(theta_hats))
    bias = mean_hat - float(theta_star)
    se = float(np.std(theta_hats, ddof=0))
    mse = float(np.mean((theta_hats - theta_star) ** 2))
    return theta_hats, EstimatorDiagnostics(theta_star, mean_hat, bias, se, mse)

def sampler_with_rng(sampler: Callable[[int], np.ndarray], n: int, rng: np.random.Generator) -> np.ndarray:
    """
    Thin pass-through for samplers that take only `n`.
    `rng` is accepted for interface symmetry but is not used: the sampler must
    manage its own randomness (its own Generator or NumPy's global RNG).
    If your sampler already owns an `rng`, you can ignore this and pass seed=None above.
    """
    return sampler(n)

def bootstrap_se(
    x: np.ndarray,
    estimator: Callable[[np.ndarray], float],
    n_boot: int = 2000,
    seed: Optional[int] = 0,
) -> float:
    """
    Bootstrap estimate of standard error of an estimator.
    Works when the true distribution/parameter is unknown.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    hats = np.empty(n_boot, dtype=float)
    for b in range(n_boot):
        xb = rng.choice(x, size=n, replace=True)
        hats[b] = estimator(xb)
    return float(np.std(hats, ddof=1))


### Example 5.7 + 5.11 + 5.16 - Bernoulli mean estimator

If `Xi ~ Bernoulli(theta*)`, the sample mean \(\bar X\) is:
- unbiased, and
- \(\mathrm{SE}(\bar X) = \sqrt{\theta^*(1-\theta^*)/n}\) → 0 as `n → ∞`, so it's consistent.
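
When `theta*` is unknown, the SE formula cannot be evaluated directly. A minimal sketch of two data-only alternatives follows: the plug-in SE (replace `theta*` by the sample mean) and a nonparametric bootstrap SE. The sample size, seed and `n_boot` are arbitrary illustrative choices, and `theta* = 0.3` is used only to generate the data.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x = rng.binomial(1, 0.3, size=n).astype(float)   # theta* = 0.3 only simulates the data

# Plug-in SE: substitute the sample mean for theta* in sqrt(theta*(1-theta*)/n).
x_bar = x.mean()
plug_in_se = np.sqrt(x_bar * (1 - x_bar) / n)

# Bootstrap SE: resample the data with replacement and recompute the mean each time.
n_boot = 4000
idx = rng.integers(0, n, size=(n_boot, n))       # row b holds the indices of resample b
boot_means = x[idx].mean(axis=1)
boot_se = boot_means.std(ddof=1)

print(f"plug-in SE   = {plug_in_se:.4f}")
print(f"bootstrap SE = {boot_se:.4f}")
```

For the sample mean the two numbers should agree closely, since the bootstrap variance of the mean is the empirical variance divided by `n`; the bootstrap becomes more useful for estimators with no closed-form SE.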

Below: compare theory vs simulation.


In [None]:
def bernoulli_sampler(theta: float, seed: Optional[int] = None) -> Callable[[int], np.ndarray]:
    rng = np.random.default_rng(seed)
    def _s(n: int) -> np.ndarray:
        return rng.binomial(1, theta, size=n).astype(float)
    return _s

def sample_mean(x: np.ndarray) -> float:
    return float(np.mean(x))

theta_star = 0.3
n = 50
sampler = bernoulli_sampler(theta_star, seed=123)
theta_hats, diag = simulate_sampling_distribution(
    sampler=sampler,
    estimator=sample_mean,
    n=n,
    theta_star=theta_star,
    n_sims=5000,
    seed=None,  # sampler has its own rng
)

theory_se = math.sqrt(theta_star * (1 - theta_star) / n)
diag, theory_se


In [None]:
# Quick visualization of the sampling distribution
plt.figure()
plt.hist(theta_hats, bins=40)
plt.title("Sampling distribution of sample mean (Bernoulli)")
plt.xlabel("theta_hat")
plt.ylabel("count")
plt.show()


### 5.14-5.15 - MSE = SE² + Bias², and a practical consistency check

If you can argue (or estimate) that **bias → 0** and **SE → 0** as `n` grows, then the estimator is consistent.
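
One standard way to fill in the reasoning (a Chebyshev-type argument; the notes may phrase it differently) is to apply Markov's inequality to \((\hat\theta_n - \theta^*)^2\). For any fixed \(\epsilon > 0\):

\[ P\big(|\hat\theta_n - \theta^*| \ge \epsilon\big) \le \frac{E[(\hat\theta_n - \theta^*)^2]}{\epsilon^2} = \frac{\mathrm{Var}(\hat\theta_n) + (\mathrm{Bias})^2}{\epsilon^2} \]

So if both SE and bias vanish, the right-hand side tends to 0 for every \(\epsilon\), which is convergence in probability. For the Bernoulli mean the bias is 0 and \(\mathrm{Var}(\bar X) = \theta^*(1-\theta^*)/n \le 1/(4n)\), giving the explicit bound \(P(|\bar X - \theta^*| \ge \epsilon) \le 1/(4n\epsilon^2)\).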

Below: a quick numeric "consistency curve" for the Bernoulli mean.


In [None]:
def theory_se_bernoulli_mean(theta: float, n: int) -> float:
    return math.sqrt(theta * (1 - theta) / n)

ns = [10, 20, 50, 100, 200, 500, 1000]
theory = [theory_se_bernoulli_mean(theta_star, k) for k in ns]

plt.figure()
plt.plot(ns, theory, marker='o')
plt.title("SE of Bernoulli sample mean shrinks as n grows")
plt.xlabel("n")
plt.ylabel("theoretical SE")
plt.xscale("log")
plt.show()


### 5.17-5.18 - Sub-Gaussian tails ⇒ MSE bound (a usable template)

The notes show: if a centered RV `Y` has a sub-Gaussian tail bound
\(P(|Y| \ge \epsilon) \le 2e^{-c_0\epsilon^2}\),
then you can bound the second moment `E[Y^2]` by a constant times `1/c0`.
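
A sketch of where the constant comes from, using the tail-integral formula for the expectation of a nonnegative random variable (the notes may obtain a slightly different constant):

\[ E[Y^2] = \int_0^\infty P(Y^2 \ge t)\,dt = \int_0^\infty P(|Y| \ge \sqrt{t})\,dt \le \int_0^\infty 2e^{-c_0 t}\,dt = \frac{2}{c_0} \]

As an example under one common convention (assumed here), a sample mean of i.i.d. \(\sigma\)-sub-Gaussian variables satisfies \(P(|\bar X - \mu| \ge \epsilon) \le 2e^{-n\epsilon^2/(2\sigma^2)}\), so \(c_0 = n/(2\sigma^2)\) and the bound reads \(\mathrm{MSE}(\bar X) \le 4\sigma^2/n\).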

A common way this appears in ML:
- If `X1,…,Xn` are i.i.d. sub-Gaussian with parameter `σ`, then the sample mean has
  \(\mathrm{MSE}(\bar X) \lesssim \sigma^2/n\).

Below is a **practical helper**: estimate MSE of sample mean from simulations, compare to a `sigma^2/n` rate.


In [None]:
def mse_of_sample_mean_gaussian(sigma: float, n: int, n_sims: int = 5000, seed: int = 0) -> float:
    rng = np.random.default_rng(seed)
    means = rng.normal(0, sigma, size=(n_sims, n)).mean(axis=1)
    return float(np.mean(means**2))  # theta* = 0

sigma = 2.0
for n in [10, 20, 50, 100, 200, 500]:
    mse = mse_of_sample_mean_gaussian(sigma, n, n_sims=20000, seed=1)
    print(f"n={n:4d}  empirical MSE={mse:.5f}  sigma^2/n={sigma**2/n:.5f}")


## 5.3 Non-parametric DF estimation

### Empirical distribution function (EDF / ECDF)

Given i.i.d. data `X1,…,Xn`, the empirical CDF is:
\[ \hat F_n(x) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{X_i \le x\} \]

Key properties from the notes:
- **Unbiased**: `E[Fn(x)] = F(x)` for each fixed `x`
- `Var(Fn(x)) = F(x)(1-F(x))/n`
- **Hoeffding**: for fixed `x`, `P(|Fn(x)-F(x)|>ε) ≤ 2 exp(-2nε²)`
- **DKW inequality** (uniform over x):  
  `P( sup_x |Fn(x)-F(x)| > ε ) ≤ 2 exp(-2nε²)`
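
The pointwise properties in this list can be checked by simulation. The sketch below assumes standard normal data and an arbitrary evaluation point `x0 = 0.5` (both illustrative choices), and compares the Monte Carlo mean and variance of `Fn(x0)` with `F(x0)` and `F(x0)(1-F(x0))/n`, then compares an exceedance frequency with the Hoeffding bound.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n, n_sims = 50, 40_000
x0 = 0.5                                    # fixed evaluation point (arbitrary choice)

# True CDF of N(0,1) at x0, computed from the error function.
F_x0 = 0.5 * (1.0 + math.erf(x0 / math.sqrt(2.0)))

samples = rng.normal(size=(n_sims, n))
Fn_x0 = (samples <= x0).mean(axis=1)        # ECDF value at x0, one per simulated sample

theory_var = F_x0 * (1 - F_x0) / n
print(f"F(x0)          = {F_x0:.4f}")
print(f"mean of Fn(x0) = {Fn_x0.mean():.4f}")
print(f"F(1-F)/n       = {theory_var:.6f}")
print(f"var of Fn(x0)  = {Fn_x0.var(ddof=0):.6f}")

# Hoeffding: empirical exceedance frequency vs. the bound 2*exp(-2*n*eps^2).
eps = 0.15
freq = np.mean(np.abs(Fn_x0 - F_x0) > eps)
bound = 2 * math.exp(-2 * n * eps**2)
print(f"P(|Fn-F| > {eps}) ~ {freq:.4f}, Hoeffding bound = {bound:.4f}")
```

The Hoeffding bound is typically loose at moderate `ε`: it ignores the actual variance `F(1-F)` and only uses that the indicators lie in `[0, 1]`.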

The DKW inequality gives an **easy confidence band**:
Choose `ε = sqrt(log(2/α) / (2n))` then  
`P( sup_x |Fn(x)-F(x)| ≤ ε ) ≥ 1-α`.
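
Inverting the half-width formula answers a planning question: how large must `n` be for a band of half-width `ε`? Solving `sqrt(log(2/α)/(2n)) ≤ ε` gives `n ≥ log(2/α)/(2ε²)`. The sketch below uses hypothetical helper names (`dkw_sample_size`, `dkw_half_width`) and adds a small coverage check on Uniform(0,1) data, where `F(x) = x`.

```python
import math
import numpy as np

def dkw_half_width(n: int, alpha: float = 0.05) -> float:
    """DKW half-width sqrt(log(2/alpha) / (2n))."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

def dkw_sample_size(eps: float, alpha: float = 0.05) -> int:
    """Smallest n whose DKW half-width is at most eps (hypothetical helper)."""
    return math.ceil(math.log(2.0 / alpha) / (2.0 * eps**2))

n_needed = dkw_sample_size(eps=0.05, alpha=0.05)
print(n_needed, dkw_half_width(n_needed))

# Coverage check on Uniform(0,1) data, where F(x) = x on [0, 1].
rng = np.random.default_rng(0)
n, n_sims, alpha = 100, 5000, 0.05
eps = dkw_half_width(n, alpha)
u = np.sort(rng.uniform(size=(n_sims, n)), axis=1)   # order statistics per row
i = np.arange(1, n + 1)
# sup_x |Fn(x) - x| is attained at an order statistic, from the right or the left.
sup_dist = np.maximum(np.abs(i / n - u), np.abs((i - 1) / n - u)).max(axis=1)
coverage = np.mean(sup_dist <= eps)
print(f"empirical coverage = {coverage:.3f} (target >= {1 - alpha})")
```

Note the `1/ε²` scaling: halving the band width requires roughly four times as many observations.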


In [None]:
# -----------------------------
# ECDF + DKW confidence bands
# -----------------------------

def ecdf(x_sample: np.ndarray) -> Callable[[np.ndarray], np.ndarray]:
    """Returns a callable Fhat(x) for the empirical CDF built from x_sample."""
    xs = np.sort(np.asarray(x_sample))
    n = xs.size

    def Fhat(x: np.ndarray) -> np.ndarray:
        x = np.asarray(x)
        return np.searchsorted(xs, x, side='right') / n

    return Fhat

def dkw_epsilon(n: int, alpha: float = 0.05) -> float:
    """DKW half-width for a (1-alpha) uniform confidence band."""
    if not (0 < alpha < 1):
        raise ValueError("alpha must be in (0,1).")
    return math.sqrt(math.log(2/alpha) / (2*n))

def ecdf_confidence_band(
    x_sample: np.ndarray,
    grid: np.ndarray,
    alpha: float = 0.05,
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Returns (Fn(grid), lower(grid), upper(grid)) where lower/upper use DKW band."""
    Fhat = ecdf(x_sample)
    Fn = Fhat(grid)
    eps = dkw_epsilon(len(x_sample), alpha=alpha)
    lower = np.clip(Fn - eps, 0.0, 1.0)
    upper = np.clip(Fn + eps, 0.0, 1.0)
    return Fn, lower, upper


In [None]:
# Demo: ECDF + DKW band on a known distribution (Normal)
rng = np.random.default_rng(0)
n = 200
x = rng.normal(loc=0.0, scale=1.0, size=n)

grid = np.linspace(-3, 3, 400)
Fn, lo, hi = ecdf_confidence_band(x, grid, alpha=0.05)

# True CDF for comparison (Normal), computed from the error function math.erf
def normal_cdf(z: np.ndarray, mu: float = 0.0, sigma: float = 1.0) -> np.ndarray:
    z = (np.asarray(z) - mu) / (sigma * math.sqrt(2))
    return 0.5 * (1 + np.vectorize(math.erf)(z))

Ftrue = normal_cdf(grid)

plt.figure()
plt.plot(grid, Fn, label="ECDF Fn")
plt.plot(grid, Ftrue, label="True F")
plt.plot(grid, lo, linestyle="--", label="DKW lower")
plt.plot(grid, hi, linestyle="--", label="DKW upper")
plt.title("ECDF with DKW (uniform) confidence band")
plt.xlabel("x")
plt.ylabel("CDF")
plt.legend()
plt.show()


### Extra: Supremum deviation (Kolmogorov-Smirnov style statistic)

The DKW inequality controls `sup_x |Fn(x)-F(x)|`. The empirical version of that sup distance is what the KS test uses.
Below is a small helper you can use whenever you have a known target CDF.


In [None]:
def sup_cdf_distance(sample: np.ndarray, cdf: Callable[[np.ndarray], np.ndarray]) -> float:
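    """
    Computes sup_x |Fn(x) - F(x)| by checking both sides of each ECDF jump.
    For a continuous `cdf` and a sample without ties, the supremum over R is
    attained at a sample point, approached from the right or from the left.
    """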
    xs = np.sort(np.asarray(sample))
    n = xs.size
    F = cdf(xs)

    Fn_right = np.arange(1, n+1) / n
    Fn_left = np.arange(0, n) / n

    d1 = np.max(np.abs(Fn_right - F))
    d2 = np.max(np.abs(Fn_left - F))
    return float(max(d1, d2))

d = sup_cdf_distance(x, lambda t: normal_cdf(t, 0.0, 1.0))
d


### Relative entropy (KL) risk viewpoint (practical helper)

The notes define a loss leading to KL divergence (up to constants). A common computational form is KL between two discrete distributions on a grid.
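
A tiny worked example helps fix the conventions (natural logarithm, order of arguments). The sketch below uses a stripped-down function `kl` with no zero handling, which is an assumption made for brevity, and two hand-picked distributions on two points. By hand, `KL(p||q) = 0.5·log(0.5/0.25) + 0.5·log(0.5/0.75) = 0.5·log(4/3)`.

```python
import numpy as np

def kl(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q) in nats for strictly positive probability vectors (no zero handling)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.5, 0.5])
q = np.array([0.25, 0.75])

print(kl(p, q))   # equals 0.5 * log(4/3)
print(kl(q, p))   # a different value: KL is not symmetric
print(kl(p, p))   # 0.0
```

The asymmetry matters in practice: `KL(p||q)` penalizes places where `p` has mass and `q` has little, so which distribution plays the role of "truth" should be decided before computing it.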


In [None]:
def kl_divergence_discrete(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
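    """
    KL(p||q) in nats for discrete distributions on the same grid.
    `p` and `q` must be nonnegative; both are renormalized to sum to 1 and then
    clipped below at `eps` to avoid log(0), so the result is approximate when
    either input contains zeros.
    """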
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / np.sum(p)
    q = q / np.sum(q)
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))


## Chapter 6 - Random Variable Generation

Computers are deterministic, so we generate **pseudo-random** sequences.
The notes start with **discrete uniform pseudo-randomness** over the finite set
`{0,1,…,M-1}` (so `M` is the number of states) and then study **congruential (linear) generators**:

\[ u_{i} = (a u_{i-1} + b) \bmod M \]

Key concepts:
- **Period**: how long before the sequence repeats.
- If the period is `M` (full period), then the sequence visits every state once per cycle ⇒ uniform frequencies.
- **Hull-Dobell theorem** gives conditions for full period.
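
To make "period" concrete, the sketch below traces two small generators by hand-sized iteration. The helper name `lcg_orbit` and the parameter choice `(5, 1, 8)` are illustrative assumptions; `(3, 0, 16)` is the example used elsewhere in this notebook.

```python
def lcg_orbit(a: int, b: int, M: int, seed: int) -> list:
    """Follow u -> (a*u + b) mod M from `seed` until a state repeats (hypothetical helper)."""
    seen, orbit = set(), []
    u = seed % M
    while u not in seen:
        seen.add(u)
        orbit.append(u)
        u = (a * u + b) % M
    return orbit

full = lcg_orbit(a=5, b=1, M=8, seed=0)     # visits all 8 states: full period
short = lcg_orbit(a=3, b=0, M=16, seed=1)   # gets stuck in a cycle of length 4

print(full)    # [0, 1, 6, 7, 4, 5, 2, 3]
print(short)   # [1, 3, 9, 11]
```

In both cases the orbit returns to the seed, so its length is the period. The second generator only ever produces 4 of the 16 possible values, so its frequencies cannot approach `1/16`.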


In [None]:
# -----------------------------
# Congruential generator (LCG)
# -----------------------------

def lcg(a: int, b: int, M: int, seed: int):
    """
    Linear congruential generator (LCG):
        u_{t+1} = (a*u_t + b) mod M
    Yields an infinite sequence of integers in {0,...,M-1}.
    """
    if M <= 0:
        raise ValueError("M must be positive.")
    u = seed % M
    while True:
        yield u
        u = (a * u + b) % M

def lcg_sequence(a: int, b: int, M: int, seed: int, n: int) -> np.ndarray:
    """Get first n values from LCG as a numpy array."""
    gen = lcg(a, b, M, seed)
    return np.fromiter((next(gen) for _ in range(n)), dtype=int, count=n)

def estimate_period(a: int, b: int, M: int, seed: int, max_steps: int = 5_000_000) -> int:
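    """
    Detect the period by finding the first return to the seed state.
    This assumes the seed lies on a cycle, which holds whenever gcd(a, M) = 1;
    otherwise the seed may never recur and a RuntimeError is raised.
    Intended for education/testing with small M.
    """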
    gen = lcg(a, b, M, seed)
    first = next(gen)
    for t in range(1, max_steps + 1):
        if next(gen) == first:
            return t
    raise RuntimeError("Period not found within max_steps; increase max_steps or use smaller M.")


### 6.1 - Checking "uniformly pseudorandom" on a finite set

Definition (informal): frequencies of each value should approach `1/M`.
Below: frequency checker + a small demo.


In [None]:
def frequency_table(seq: np.ndarray, M: int) -> np.ndarray:
    """Returns normalized frequencies for values 0..M-1 from seq."""
    counts = np.bincount(seq.astype(int), minlength=M)
    return counts / counts.sum()

def uniformity_score(freqs: np.ndarray) -> float:
    """One simple score: max absolute deviation from uniform frequency."""
    M = len(freqs)
    return float(np.max(np.abs(freqs - 1.0 / M)))

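# Demo: (3,0,16) example from notes. With b=0 and seed 1 the orbit is 1, 3, 9, 11
# (period 4), so only 4 of the 16 values occur and the frequencies are far from uniform.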
a, b, M, seed = 3, 0, 16, 1
seq = lcg_sequence(a, b, M, seed, n=200)
freqs = frequency_table(seq, M)
period = estimate_period(a, b, M, seed)
period, uniformity_score(freqs)


### Lemma 6.9 - Restricting a generator to a smaller range

Map `ui` in `{0,…,M-1}` to `vi` in `{0,…,K-1}` via:
\[ v_i = \lfloor (u_i / M) \cdot K \rfloor \]
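
A quick check of what the map does to one full cycle of `u` values, as a sketch with a hypothetical helper `restrict` that uses integer arithmetic. When `K` divides `M` every output value is hit equally often; otherwise the counts differ by at most one per cycle.

```python
import numpy as np

def restrict(u: np.ndarray, M: int, K: int) -> np.ndarray:
    """floor(u * K / M) computed with integer arithmetic (hypothetical helper)."""
    return (np.asarray(u, dtype=np.int64) * K) // M

one_cycle_16 = np.arange(16)   # one pass over {0,...,15}
one_cycle_10 = np.arange(10)   # one pass over {0,...,9}

counts_even = np.bincount(restrict(one_cycle_16, M=16, K=8), minlength=8)
counts_uneven = np.bincount(restrict(one_cycle_10, M=10, K=4), minlength=4)

print(counts_even)     # [2 2 2 2 2 2 2 2]
print(counts_uneven)   # [3 2 3 2]
```

The imbalance in the second case is small relative to `M/K` when `M` is much larger than `K`, which is the regime where this restriction is normally used.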

Reusable helper below.


In [None]:
def map_to_range(u: np.ndarray, M: int, K: int) -> np.ndarray:
    """Map integer sequence u in [0, M-1] to integers in [0, K-1] via floor(u*K/M)."""
    # Integer arithmetic: float division can land just below an integer and floor one too low.
    u = np.asarray(u, dtype=np.int64)
    return (u * K) // M

K = 8
v = map_to_range(seq, M=M, K=K)
freqs_v = frequency_table(v, K)
uniformity_score(freqs_v), freqs_v


### Hull-Dobell theorem (full period conditions)

LCG `(a,b,M)` has full period `M` iff:
1) `gcd(b, M) = 1`  
2) For every prime `p` dividing `M`, `p | (a-1)`  
3) If `4 | M` then `4 | (a-1)`
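
For small moduli the three conditions can be compared against brute force. The sketch below (helper names are illustrative, not from the notes) enumerates every multiplier `1 ≤ a < M` and increment `0 ≤ b < M` for `M = 2,…,32`, and records any triple where the conditions and the actual orbit of 0 disagree.

```python
import math

def unique_primes(n: int) -> list:
    """Distinct prime factors of n >= 2 by trial division."""
    out, p = [], 2
    while p * p <= n:
        if n % p == 0:
            out.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        out.append(n)
    return out

def hull_dobell(a: int, b: int, M: int) -> bool:
    """The three Hull-Dobell conditions as stated above."""
    return (
        math.gcd(b, M) == 1
        and all((a - 1) % p == 0 for p in unique_primes(M))
        and (M % 4 != 0 or (a - 1) % 4 == 0)
    )

def has_full_period(a: int, b: int, M: int) -> bool:
    """True if the orbit of 0 visits all M states and then returns to 0."""
    u, seen = 0, set()
    while u not in seen:
        seen.add(u)
        u = (a * u + b) % M
    return len(seen) == M and u == 0

mismatches = [
    (a, b, M)
    for M in range(2, 33)
    for a in range(1, M)
    for b in range(M)
    if hull_dobell(a, b, M) != has_full_period(a, b, M)
]
print(mismatches)   # an empty list means theorem and brute force agree on this range
```

Starting from 0 is enough here: if the orbit of one state covers all `M` states and closes up, the map is a single cycle, so every seed has period `M`.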

Reusable checker below.


In [None]:
def prime_factors(n: int) -> List[int]:
    """Return unique prime factors of n."""
    n = abs(int(n))
    factors = []
    if n < 2:
        return factors
    if n % 2 == 0:
        factors.append(2)
        while n % 2 == 0:
            n //= 2
    p = 3
    while p * p <= n:
        if n % p == 0:
            factors.append(p)
            while n % p == 0:
                n //= p
        p += 2
    if n > 1:
        factors.append(n)
    return factors

def hull_dobell_full_period(a: int, b: int, M: int) -> Dict[str, object]:
    """Checks Hull-Dobell conditions and returns which ones pass."""
    if M <= 0:
        raise ValueError("M must be positive.")
    cond1 = math.gcd(b, M) == 1
    primes = prime_factors(M)
    cond2 = all(((a - 1) % p == 0) for p in primes)
    cond3 = True if (M % 4 != 0) else ((a - 1) % 4 == 0)
    return {
        "gcd(b,M)=1": cond1,
        "p | (a-1) for all primes p|M": cond2,
        "if 4|M then 4|(a-1)": cond3,
        "ALL (full period expected)": (cond1 and cond2 and cond3),
        "prime_factors(M)": primes,
    }

hull_dobell_full_period(a=3, b=0, M=16)


### Lemma 6.11 - Long-run mean and variance for a uniform sequence on `{0,…,M-1}`

If `u_i` is uniformly pseudorandom on `{0,…,M-1}`, then (empirically):

- mean → `(M-1)/2`
- variance → `(M^2-1)/12`
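
Both limits are the moments of the discrete uniform distribution on `{0,…,M-1}`. A short derivation, using the standard sums of the first integers and their squares:

\[ \frac{1}{M}\sum_{k=0}^{M-1} k = \frac{M-1}{2}, \qquad \frac{1}{M}\sum_{k=0}^{M-1} k^2 = \frac{(M-1)(2M-1)}{6} \]

\[ \frac{(M-1)(2M-1)}{6} - \frac{(M-1)^2}{4} = \frac{(M-1)\,[\,2(2M-1) - 3(M-1)\,]}{12} = \frac{(M-1)(M+1)}{12} = \frac{M^2-1}{12} \]

For a full-period generator these values are attained exactly over each complete cycle of length `M`, so a run whose length is a multiple of `M` reproduces them without sampling error.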


In [None]:
def theoretical_discrete_uniform_moments(M: int) -> Tuple[float, float]:
    mean = (M - 1) / 2.0
    var = (M**2 - 1) / 12.0
    return mean, var

def empirical_moments(u: np.ndarray) -> Tuple[float, float]:
    u = np.asarray(u, dtype=float)
    return float(np.mean(u)), float(np.var(u, ddof=0))

# Demo on a full-period LCG example (common choice when M=2^k)
a2, b2, M2, seed2 = 5, 1, 2**10, 7
hull_dobell_full_period(a2, b2, M2)["ALL (full period expected)"]


In [None]:
u2 = lcg_sequence(a2, b2, M2, seed2, n=100000)
emp = empirical_moments(u2)
theo = theoretical_discrete_uniform_moments(M2)
emp, theo


In [None]:
plt.figure()
plt.hist(u2 / M2, bins=50)
plt.title("LCG values scaled to [0,1) (histogram)")
plt.xlabel("u/M")
plt.ylabel("count")
plt.show()


## Quick "function index" (copy/paste friendly)

### Estimation (Ch. 5)
- `simulate_sampling_distribution(sampler, estimator, n, theta_star, n_sims, seed)`
- `bootstrap_se(x, estimator, n_boot, seed)`
- `ecdf(x_sample)`
- `dkw_epsilon(n, alpha)`
- `ecdf_confidence_band(x_sample, grid, alpha)`
- `sup_cdf_distance(sample, cdf)`
- `kl_divergence_discrete(p, q)`

### Random generation (Ch. 6)
- `lcg(a,b,M,seed)` / `lcg_sequence(a,b,M,seed,n)`
- `estimate_period(a,b,M,seed)` (educational for small M)
- `frequency_table(seq,M)` / `uniformity_score(freqs)`
- `map_to_range(u, M, K)`
- `prime_factors(M)` / `hull_dobell_full_period(a,b,M)`
- `theoretical_discrete_uniform_moments(M)` / `empirical_moments(u)`
