# Lecture 4 Hands-On: Parameter Estimation

This notebook accompanies Lecture 4. We estimate parameters for Poisson processes, M/M/1 queues, and M/G/1 queues. We also build plug-in estimators for performance metrics and quantify uncertainty via asymptotic CIs and bootstrap.

## Objectives
- Poisson rate MLE from counts on [0, T]; CI and coverage.
- M/M/1: estimate $\lambda$ from arrivals and $\mu$ from service samples or busy-time exposure; plug-in $W_q, L$.
- M/G/1: estimate $\lambda$, service moments $(m_1, m_2)$; Pollaczek–Khinchine plug-in for $\mathbb E[W_q]$ with CI (delta or bootstrap).
- Discuss stability ($\rho < 1$) and near-critical sensitivity.

In [None]:
import numpy as np
import math
import matplotlib.pyplot as plt
from dataclasses import dataclass
from typing import Tuple

rng = np.random.default_rng(42)  # shared generator, seeded for reproducibility

try:
    from scipy.stats import norm
    def z_critical(alpha: float=0.05) -> float:
        # Compute critical z-value for two-sided CI
        return float(norm.ppf(1.0 - alpha/2.0))
except ImportError:
    from statistics import NormalDist
    def z_critical(alpha: float=0.05) -> float:
        # Fallback if scipy is unavailable: standard-library normal quantile, valid for any alpha
        return NormalDist().inv_cdf(1.0 - alpha/2.0)

def ci_two_sided(est: float, se: float, alpha: float=0.05) -> Tuple[float,float]:
    # Compute two-sided (1-alpha) CI
    z = z_critical(alpha)
    return est - z*se, est + z*se

## Part A — Poisson Rate Estimation

### What we'll do

- Model: homogeneous Poisson process observed on [0, T].
- Task: MLE of the rate (λ̂ = N(T)/T), derive a 95% CI, and check coverage by repetition.
- Takeaway: the count N(T) and the exposure T are sufficient; the exact variance of λ̂ is λ/T; the CI narrows like 1/√T, and its coverage approaches the nominal level as T grows.

In [None]:
def simulate_poisson_process(lam: float, T: float, rng=np.random.default_rng()):
    # Simulate Poisson process with rate lam up to time T
    t = 0.0
    times = []
    while True:
        t += rng.exponential(1.0/lam)
        if t > T:
            break
        times.append(t)
    return np.array(times)

def poisson_mle_from_counts(n_events: int, T: float):
    # MLE for Poisson rate lambda from n_events in time T
    lam_hat = n_events / T
    se = math.sqrt(max(lam_hat, 1e-12) / T)
    lo, hi = ci_two_sided(lam_hat, se)
    return lam_hat, (lo, hi)

In [None]:
# Single-run demo
lam_true, T = 0.8, 1000.0
times = simulate_poisson_process(lam_true, T, rng)
lam_hat, (lo, hi) = poisson_mle_from_counts(len(times), T)
print(f'True λ={lam_true:.3f},  n={len(times)},  T={T}')
print(f'λ̂={lam_hat:.4f},  95% CI=({lo:.4f}, {hi:.4f})')

In [None]:
# Coverage experiment (modest size for runtime)
def coverage_poisson(lam_true=0.8, T=200.0, reps=500, seed=123):
    r = np.random.default_rng(seed)
    cover = 0
    for _ in range(reps):
        n = len(simulate_poisson_process(lam_true, T, r))
        lam_hat, (lo, hi) = poisson_mle_from_counts(n, T)
        cover += (lo <= lam_true <= hi)
    return cover / reps

cov = coverage_poisson()
print(f'Empirical 95% CI coverage ≈ {100*cov:.1f}%')
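
Because the count N(T) is sufficient for λ, the coverage experiment can also be run without simulating event times. The sketch below is one way to do that, assuming it is acceptable to draw N(T) ~ Poisson(λT) directly; the helper name `coverage_from_counts` and the chosen horizons are illustrative and not part of the lecture code.

```python
import numpy as np

def coverage_from_counts(lam_true=0.8, T=200.0, reps=20000, z=1.959963984540054, seed=0):
    """Wald-interval coverage for a Poisson rate, drawing N(T) ~ Poisson(lam*T) directly."""
    r = np.random.default_rng(seed)
    n = r.poisson(lam_true * T, size=reps)          # counts are sufficient for lambda
    lam_hat = n / T
    se = np.sqrt(np.maximum(lam_hat, 1e-12) / T)    # same plug-in SE as poisson_mle_from_counts
    covered = (lam_hat - z * se <= lam_true) & (lam_true <= lam_hat + z * se)
    return float(covered.mean())

# Short horizons have few expected events, so the normal approximation is rougher there.
for T_demo in (5.0, 50.0, 500.0):
    print(f'T={T_demo:6.1f}: coverage approx {100 * coverage_from_counts(T=T_demo):.1f}%')
```

Vectorising over replications makes it cheap to use many more repetitions than the event-by-event loop, which reduces Monte Carlo noise in the coverage estimate.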


## Part B — M/M/1: Estimating $\lambda$ and $\mu$

### What we'll do

- Observation schemes: (A) i.i.d. inter-arrivals/services; (B) calendar-time aggregates ($N_A(T)$, $C(T)$, busy time $B(T)$).
- Estimation: $\hat \lambda$ from counts/exposure, $\hat \mu$ from completions per busy-time; plug into M/M/1 formulae.
- Diagnostics: compare plug-in $W_q$ to empirical mean wait; consider burn-in and near-critical sensitivity.
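
The plug-in step in the list above can be sketched as a small helper that evaluates the standard steady-state M/M/1 formulas at the estimated rates. The function name `mm1_plugin` and its dictionary return format are assumptions made for illustration; the experiment cell later in the notebook computes only $W_q$ inline.

```python
import math

def mm1_plugin(lam_hat: float, mu_hat: float) -> dict:
    """Steady-state M/M/1 metrics evaluated at estimated rates (hypothetical helper)."""
    rho = lam_hat / mu_hat
    if not (0.0 <= rho < 1.0):
        # The steady-state formulas only apply to a stable queue.
        return {'rho': rho, 'Wq': math.inf, 'W': math.inf, 'Lq': math.inf, 'L': math.inf}
    Wq = lam_hat / (mu_hat * (mu_hat - lam_hat))   # mean wait in queue
    W = Wq + 1.0 / mu_hat                          # mean time in system
    # Little's law converts mean times into mean numbers.
    return {'rho': rho, 'Wq': Wq, 'W': W, 'Lq': lam_hat * Wq, 'L': lam_hat * W}

# With lam=0.8 and mu=1.0 the formulas give Wq close to 4 and L close to 4.
print(mm1_plugin(0.8, 1.0))
```

Returning infinity when ρ̂ ≥ 1 is a design choice for this sketch; raising an error would be equally reasonable if an unstable estimate should stop the analysis.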

In [None]:

@dataclass
class MM1Stats:
    """Simulation statistics for a single-server queue.
    arrivals: count of arrivals with arrival_time <= T
    completions: count of completions by time T
    busy_time: total server busy time within [0, T]
    T: observation horizon
    service_samples: service times of every job that arrived by T (the queue is drained after T)
    waits: waiting times (Wq) of those same jobs, in order of service start
    """
    arrivals: int
    completions: int
    busy_time: float
    T: float
    service_samples: np.ndarray
    waits: np.ndarray

def simulate_MM1(T: float, lam: float, mu: float, rng=np.random.default_rng()):
    """Simulate an M/M/1 queue on [0, T] with exponential(λ) inter-arrivals and exponential(μ) services.
    Returns MM1Stats with waits (Wq) and service_samples recorded.
    """
    t = 0.0
    next_arrival = t + rng.exponential(1.0/lam)
    server_busy = False
    service_end = math.inf
    queue = []              # FIFO queue of arrival timestamps
    waits = []              # waiting times before service (Wq)
    service_samples = []    # service times actually started
    arrivals = 0
    completions = 0
    busy_time = 0.0
    busy_start = None

    def start_service(now):
        nonlocal server_busy, service_end, busy_start
        s = rng.exponential(1.0/mu)
        service_samples.append(s)
        server_busy = True
        busy_start = now
        return now + s

    while True:
        t_next = min(next_arrival, service_end)
        if t_next == math.inf:
            break
        t = t_next

        if next_arrival <= service_end:  # arrival
            if t <= T:
                arrivals += 1
                if server_busy:
                    queue.append(t)
                else:
                    waits.append(0.0)  # no waiting when server idle
                    service_end = start_service(t)
            next_arrival = t + rng.exponential(1.0/lam)
        else:  # completion
            if busy_start is not None and busy_start < T:
                busy_time += max(0.0, min(t, T) - busy_start)
                busy_start = None
            if t <= T:
                completions += 1
            if queue:
                a = queue.pop(0)
                waits.append(max(0.0, t - a))
                busy_start = t
                service_end = start_service(t)
            else:
                server_busy = False
                service_end = math.inf
        if next_arrival > T and not server_busy and t >= T:
            break

    if server_busy and busy_start is not None and busy_start < T:
        busy_time += max(0.0, T - busy_start)

    return MM1Stats(arrivals=arrivals, completions=completions, busy_time=busy_time, T=T,
                    service_samples=np.array(service_samples), waits=np.array(waits))

In [None]:
# M/M/1 experiment
# We simulate on [0, T], compute λ̂ from arrivals/exposure, μ̂ from completions per busy-time,
# and compare plug-in Wq to empirical mean wait (Wq).
lam_true, mu_true = 0.8, 1.0  # true parameters (λ, μ)  # ρ=0.8
T = 1000.0
stats = simulate_MM1(T, lam_true, mu_true, rng)
lam_hat = stats.arrivals / stats.T  # λ̂ from arrivals per unit time
mu_hat_busy = stats.completions / max(stats.busy_time, 1e-12)  # μ̂ from completions per busy-time
mu_hat_sample = len(stats.service_samples) / max(stats.service_samples.sum(), 1e-12)  # alternative μ̂ from service samples

# Asymptotic SEs
se_lam = math.sqrt(max(lam_hat, 1e-12) / stats.T)
se_mu_busy = math.sqrt(max(mu_hat_busy, 1e-12) / max(stats.busy_time, 1e-12))
se_mu_sample = mu_hat_sample / math.sqrt(max(len(stats.service_samples), 1))  # exponential services: SE(μ̂) ≈ μ̂/√n

def ci(x, se, alpha=0.05):
    lo, hi = ci_two_sided(x, se, alpha)
    return (lo, hi)

print(f'M/M/1 on [0,T], T={T}')
print(f'True λ={lam_true:.3f}, μ={mu_true:.3f};  observed arrivals={stats.arrivals}, completions={stats.completions}, busy_time={stats.busy_time:.1f}')
print(f'λ̂={lam_hat:.4f}  CI95={ci(lam_hat,se_lam)}')
print(f'μ̂ (busy-time) ={mu_hat_busy:.4f}  CI95={ci(mu_hat_busy,se_mu_busy)}')
print(f'μ̂ (from samples)={mu_hat_sample:.4f}  CI95={ci(mu_hat_sample,se_mu_sample)}')

# Plug-in performance
rho_hat = lam_hat / mu_hat_busy  # utilization estimate
Wq_hat = lam_hat / (mu_hat_busy * (mu_hat_busy - lam_hat))  # M/M/1 formula
Wq_true = lam_true / (mu_true * (mu_true - lam_true))
Wq_emp = stats.waits.mean() if stats.waits.size else 0.0  # empirical mean wait (Wq)
den = mu_hat_busy * max(mu_hat_busy - lam_hat, 1e-12)
dlam = (mu_hat_busy**2) / (den**2)
dmu  = -lam_hat * (2*mu_hat_busy - lam_hat) / (den**2)
var_lam = max(lam_hat, 1e-12) / stats.T
var_mu  = max(mu_hat_busy, 1e-12) / max(stats.busy_time, 1e-12)
var_wq  = dlam**2 * var_lam + dmu**2 * var_mu
se_wq   = math.sqrt(max(var_wq, 0.0))
wq_lo, wq_hi = ci_two_sided(Wq_hat, se_wq, alpha=0.05)
print(f'True Wq={Wq_true:.4f},  Plug-in Wq̂={Wq_hat:.4f} (CI95=[{wq_lo:.4f},{wq_hi:.4f}]),  Empirical avg wait≈{Wq_emp:.4f}')

# Plot waiting time histogram
plt.figure(figsize=(5.2,3.4))
plt.hist(stats.waits, bins=40, alpha=0.7, color='tab:blue', density=True)
plt.axvline(Wq_true, color='tab:green', lw=1.8, label=f'True Wq={Wq_true:.2f}')
plt.axvline(Wq_hat, color='tab:orange', lw=1.8, label=f'Plug-in Wq̂={Wq_hat:.2f}')
plt.axvline(Wq_emp, color='tab:red', lw=1.8, label=f'Empirical Wq={Wq_emp:.2f}')
plt.axvspan(wq_lo, wq_hi, color='tab:orange', alpha=0.15, label='Plug-in 95% CI')
plt.title('M/M/1 Waiting Time Histogram with Benchmarks')
plt.xlabel('Wq'); plt.ylabel('Density'); plt.grid(True, alpha=0.3); plt.legend()
plt.show()


In [None]:
# Burn-in analysis for M/M/1 (ignore the first 10% of jobs, in order of service start)
burn = int(0.10*len(stats.waits))
if stats.waits.size == 0:
    Wq_emp_all = float('nan')
    Wq_emp_tail = float('nan')
else:
    Wq_emp_all = float(np.mean(stats.waits))
    Wq_emp_tail = float(np.mean(stats.waits[burn:])) if len(stats.waits) > burn else Wq_emp_all
print(f'Empirical mean wait (all)   = {Wq_emp_all:.4f}')
print(f'Empirical mean wait (last 90%) = {Wq_emp_tail:.4f}')
print(f'Plug-in Wq (M/M/1, busy μ̂) = {Wq_hat:.4f}')


### Discussion

- Plug-in vs empirical can differ on finite runs; the difference shrinks with longer T and after discarding initial transients.
- As ρ→1, small errors in λ̂ or μ̂ are amplified in Wq because its derivatives grow like 1/(1−ρ)². Report uncertainty and check stability (ρ̂<1).
- Using busy-time exposure for μ̂ matches the MLE under Scheme B and is the natural estimator when only aggregates are available, since 1/mean(S) needs individual service times.
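
To make the near-critical sensitivity concrete, the sketch below compares the exact relative change in the M/M/1 $W_q$ caused by a small relative error in λ with the first-order prediction. For M/M/1 the elasticity $(\lambda / W_q)\,\partial W_q / \partial \lambda$ equals $1/(1-\rho)$. The 2% error and the helper names are assumptions chosen for illustration.

```python
def wq_mm1(lam: float, mu: float) -> float:
    """Mean wait in queue for a stable M/M/1 queue."""
    return lam / (mu * (mu - lam))

def rel_change(rho: float, rel_err: float, mu: float = 1.0) -> float:
    """Relative change in Wq when lambda is overestimated by a factor (1 + rel_err)."""
    lam = rho * mu
    return wq_mm1(lam * (1.0 + rel_err), mu) / wq_mm1(lam, mu) - 1.0

rel_err = 0.02  # hypothetical 2% overestimate of lambda
for rho in (0.5, 0.8, 0.9, 0.95):
    first_order = rel_err / (1.0 - rho)  # elasticity 1/(1 - rho) times the relative error
    print(f'rho={rho:.2f}: exact change {100 * rel_change(rho, rel_err):6.1f}%, '
          f'first-order {100 * first_order:5.1f}%')
```

The gap between the exact and first-order columns widens as ρ grows, which is a reminder that Delta-method intervals become less trustworthy close to saturation.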

## Part C — M/G/1: Moment Estimation and PK Plug-in

### What we'll do

- Estimate λ, m1, m2 from data; compute the PK plug-in Wq = λ m2 / (2(1−ρ)).
- Two CIs: Delta method (asymptotic) and input bootstrap (PK only).
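- Important: an input bootstrap (resample the inputs and push them through PK) is not a system bootstrap (resample the observed waits by busy cycle); the regenerative version is taken up in Part D.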
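
As a quick check on the PK formula quoted above, the sketch below evaluates it for three service distributions with mean 1 at λ = 0.7. With exponential service ($m_2 = 2 m_1^2$) it should reproduce the M/M/1 value $\lambda / (\mu(\mu - \lambda))$. The helper name `pk_wq` is an assumption for illustration.

```python
def pk_wq(lam: float, m1: float, m2: float) -> float:
    """Pollaczek-Khinchine mean wait: lam * m2 / (2 * (1 - rho)), with rho = lam * m1."""
    rho = lam * m1
    if rho >= 1.0:
        raise ValueError(f'unstable queue: rho = {rho:.3f} >= 1')
    return lam * m2 / (2.0 * (1.0 - rho))

lam = 0.7
print('exponential service (m2 = 2.0):', pk_wq(lam, 1.0, 2.0))
print('M/M/1 formula with mu = 1     :', lam / (1.0 * (1.0 - lam)))
print('Gamma(2, 0.5) service (m2=1.5):', pk_wq(lam, 1.0, 1.5))
print('deterministic service (m2=1.0):', pk_wq(lam, 1.0, 1.0))
```

At a fixed mean service time the wait is proportional to $m_2$, so reducing service variability lowers $W_q$ even when utilisation is unchanged.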

In [None]:

# Ensure simulate_MG1_gamma is defined (fallback if notebook executed out of order)
if 'simulate_MG1_gamma' not in globals():
    def simulate_MG1_gamma(T: float, lam: float, shape: float, scale: float, rng=np.random.default_rng()):
        t = 0.0
        next_arrival = t + rng.exponential(1.0/lam)
        server_busy = False
        service_end = math.inf
        queue = []
        waits = []
        service_samples = []
        arrivals = 0
        completions = 0
        busy_time = 0.0
        busy_start = None
        def start_service(now):
            nonlocal server_busy, service_end, busy_start
            s = rng.gamma(shape=shape, scale=scale)
            service_samples.append(s)
            server_busy = True
            busy_start = now
            return now + s
        while True:
            t_next = min(next_arrival, service_end)
            if t_next == math.inf:
                break
            t = t_next
            if next_arrival <= service_end:
                if t <= T:
                    arrivals += 1
                    if server_busy:
                        queue.append(t)
                    else:
                        waits.append(0.0)  # no wait when server idle
                        service_end = start_service(t)
                next_arrival = t + rng.exponential(1.0/lam)
            else:
                if busy_start is not None and busy_start < T:
                    busy_time += max(0.0, min(t, T) - busy_start)
                    busy_start = None
                if t <= T:
                    completions += 1
                if queue:
                    a = queue.pop(0)
                    waits.append(max(0.0, t - a))
                    busy_start = t
                    service_end = start_service(t)
                else:
                    server_busy = False
                    service_end = math.inf
            if next_arrival > T and not server_busy and t >= T:
                break
        if server_busy and busy_start is not None and busy_start < T:
            busy_time += max(0.0, T - busy_start)
        return MM1Stats(arrivals=arrivals, completions=completions, busy_time=busy_time, T=T,
                        service_samples=np.array(service_samples), waits=np.array(waits))


### Part C — M/G/1 with Gamma service: what we estimate

- We simulate M/G/1 with Poisson(λ) arrivals (exponential inter-arrival times) and Gamma(shape, scale) services; we record zero waits when service starts immediately.
- Estimate λ̂ from counts/exposure, m̂1, m̂2 from service samples; compute PK plug-in Wq̂ and its Delta-method CI.
- Compare with empirical mean wait; finite-horizon effects and variability can cause gaps that shrink with longer T and burn-in.
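
The Delta-method interval computed in the next cell rests on the gradient of the PK map. Write $g(\lambda, m_1, m_2) = \lambda m_2 / (2(1-\rho))$ with $\rho = \lambda m_1$. The partial derivatives are:

$$
\frac{\partial g}{\partial \lambda} = \frac{m_2}{2(1-\rho)^2}, \qquad
\frac{\partial g}{\partial m_1} = \frac{\lambda^2 m_2}{2(1-\rho)^2}, \qquad
\frac{\partial g}{\partial m_2} = \frac{\lambda}{2(1-\rho)}.
$$

Assuming, as the code does implicitly by including no cross term, that $\hat\lambda$ is independent of the service-moment estimates from $n$ service samples, the first-order variance is:

$$
\operatorname{Var}(\hat W_q) \approx
g_\lambda^2 \, \frac{\lambda}{T}
+ g_{m_1}^2 \, \frac{\operatorname{Var}(S)}{n}
+ g_{m_2}^2 \, \frac{\operatorname{Var}(S^2)}{n}
+ 2\, g_{m_1} g_{m_2} \, \frac{\operatorname{Cov}(S, S^2)}{n}.
$$

All quantities on the right are evaluated at the estimates $(\hat\lambda, \hat m_1, \hat m_2)$.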

In [None]:
# Choose Gamma service with mean 1 and variance 0.5 -> shape=2, scale=0.5
shape, scale = 2.0, 0.5
m1_true = shape*scale
m2_true = (shape*scale)**2 + shape*scale**2
lam_true = 0.7
rho_true = lam_true * m1_true
assert rho_true < 1
T = 20000.0
stats_mg1 = simulate_MG1_gamma(T, lam_true, shape, scale, rng)

lam_hat = stats_mg1.arrivals / stats_mg1.T
S = stats_mg1.service_samples
m1_hat = S.mean()
m2_hat = (S**2).mean()
rho_hat = lam_hat * m1_hat
Wq_true = lam_true * m2_true / (2*(1 - rho_true))
Wq_hat = lam_hat * m2_hat / (2*(1 - rho_hat))
Wq_emp = stats_mg1.waits.mean() if stats_mg1.waits.size else 0.0
# Delta-method CI for the PK plug-in g(λ, m1, m2); λ̂ is treated as independent of the service moments
nS = len(S)
g_lam = m2_hat / (2.0 * (1.0 - rho_hat)**2)
g_m1  = (lam_hat**2) * m2_hat / (2.0 * (1.0 - rho_hat)**2)
g_m2  = lam_hat / (2.0 * (1.0 - rho_hat))
var_lam = max(lam_hat, 1e-12) / stats_mg1.T
var_S   = float(np.var(S, ddof=1)) if nS>1 else 0.0
S2      = S**2
var_S2  = float(np.var(S2, ddof=1)) if nS>1 else 0.0
cov_S_S2 = float(np.cov(S, S2, ddof=1)[0,1]) if nS>1 else 0.0
var_m1 = var_S / max(nS,1)
var_m2 = var_S2 / max(nS,1)
cov_m1_m2 = cov_S_S2 / max(nS,1)
var_wq = (g_lam**2)*var_lam + (g_m1**2)*var_m1 + (g_m2**2)*var_m2 + 2*g_m1*g_m2*cov_m1_m2
se_wq = math.sqrt(max(var_wq, 0.0))
wq_lo_gamma, wq_hi_gamma = ci_two_sided(Wq_hat, se_wq, alpha=0.05)
print(f'Gamma service: m1_true={m1_true:.3f}, m2_true={m2_true:.3f},  ρ_true={rho_true:.3f}')
print(f'λ̂={lam_hat:.4f}, m1̂={m1_hat:.4f}, m2̂={m2_hat:.4f},  ρ̂={rho_hat:.4f}')
print(f'Wq_true={Wq_true:.4f},  Plug-in Wq̂={Wq_hat:.4f} (CI95=[{wq_lo_gamma:.4f},{wq_hi_gamma:.4f}]),  Empirical avg wait≈{Wq_emp:.4f}')

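# Input bootstrap CI for the PK plug-in: resamples service times only and holds λ̂ fixed,
# so it reflects uncertainty in (m1, m2) but not in the arrival rate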
def bootstrap_Wq(S, lam_hat, B=300, seed=123):
    r = np.random.default_rng(seed)
    n = len(S)
    out = np.empty(B)
    for b in range(B):
        Sb = r.choice(S, size=n, replace=True)
        m1b = Sb.mean()
        m2b = (Sb**2).mean()
        rhob = lam_hat * m1b
        out[b] = lam_hat * m2b / (2 * max(1e-9, (1 - rhob)))
    lo, hi = np.percentile(out, [2.5, 97.5])
    return (lo, hi), out

(lo, hi), samples = bootstrap_Wq(S, lam_hat)
print(f'Bootstrap 95% CI for PK plug-in Wq: ({lo:.4f}, {hi:.4f})')

plt.figure(figsize=(5.2,3.4))
plt.hist(samples, bins=40, alpha=0.75, color='tab:green', density=True)
plt.axvline(Wq_hat, color='k', lw=1.5, label='Plug-in Wq̂')
plt.axvline(Wq_emp, color='tab:red', lw=1.5, label='Empirical mean wait')
plt.title('Bootstrap distribution of Wq plug-in (M/G/1)')
plt.xlabel('Wq'); plt.ylabel('Density'); plt.grid(True, alpha=0.3); plt.legend()
plt.show()

# Wait-time histogram with PK plug-in CI band
plt.figure(figsize=(5.2,3.4))
plt.hist(stats_mg1.waits, bins=40, alpha=0.7, color='tab:purple', density=True)
plt.axvline(Wq_true, color='tab:green', lw=1.8, label=f'True Wq={Wq_true:.2f}')
plt.axvline(Wq_hat, color='tab:orange', lw=1.8, label=f'PK plug-in={Wq_hat:.2f}')
plt.axvline(Wq_emp, color='tab:red', lw=1.8, label=f'Empirical Wq={Wq_emp:.2f}')
plt.axvspan(wq_lo_gamma, wq_hi_gamma, color='tab:orange', alpha=0.15, label='PK plug-in 95% CI')
plt.title('M/G/1 (Gamma) Waiting Time Histogram with Benchmarks')
plt.xlabel('Wq'); plt.ylabel('Density'); plt.grid(True, alpha=0.3); plt.legend()
plt.show()
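
The input bootstrap in the cell above holds λ̂ fixed. One possible extension, sketched here under the added assumption that arrivals are Poisson, also redraws the arrival count as N* ~ Poisson(N) in each replicate so that arrival-rate uncertainty enters the interval. The function name and the synthetic stand-in data are illustrative and not part of the lecture code.

```python
import numpy as np

def bootstrap_pk_with_arrivals(S, n_arrivals, T, B=500, seed=0):
    """Input bootstrap for the PK plug-in that resamples services and redraws the arrival count.

    Assumption (not in the notebook): arrivals are Poisson, so each replicate
    uses N* ~ Poisson(n_arrivals) and lam* = N*/T.
    """
    r = np.random.default_rng(seed)
    S = np.asarray(S, dtype=float)
    out = np.empty(B)
    for b in range(B):
        lam_b = r.poisson(n_arrivals) / T                 # parametric redraw of the rate
        Sb = r.choice(S, size=S.size, replace=True)       # nonparametric redraw of services
        rho_b = lam_b * Sb.mean()
        out[b] = lam_b * np.mean(Sb**2) / (2.0 * max(1e-9, 1.0 - rho_b))
    lo, hi = np.percentile(out, [2.5, 97.5])
    return (float(lo), float(hi)), out

# Synthetic stand-in for the notebook's data: Gamma(2, 0.5) services, 14000 arrivals on [0, 20000].
demo_rng = np.random.default_rng(1)
S_demo = demo_rng.gamma(shape=2.0, scale=0.5, size=14000)
(ci_lo, ci_hi), _ = bootstrap_pk_with_arrivals(S_demo, n_arrivals=14000, T=20000.0)
print(f'95% CI with arrival-rate uncertainty: ({ci_lo:.4f}, {ci_hi:.4f})')
```

In the notebook's setting this would be called with `S`, `stats_mg1.arrivals` and `stats_mg1.T`; the resulting interval is typically wider than the services-only version because it carries an extra source of variation.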


In [None]:
# Burn-in analysis for M/G/1 (Gamma) using job order (approximate)
burn = int(0.10*len(stats_mg1.waits))
Wq_emp_all = stats_mg1.waits.mean() if stats_mg1.waits.size else float('nan')
Wq_emp_tail = stats_mg1.waits[burn:].mean() if stats_mg1.waits.size>burn else float('nan')
print(f'Empirical mean wait (all)   = {Wq_emp_all:.4f}')
print(f'Empirical mean wait (last 90%) = {Wq_emp_tail:.4f}')
print(f'PK plug-in Wq               = {Wq_hat:.4f}')


## Part D — Case Study: Identify and Fit a Queue Model from Data

Scenario

A single-agent support chat handles tickets during a business window. A triage tool and internal SOP enforce **bounded handling times** (SLA). The system logs for each ticket: arrival_time, service_time, start_service_time, completion_time, wait_time, system_time, and queue_len_at_arrival.
You are given the raw log `data/lecture4_case_study.csv`. The observation is contiguous (no gaps), though mild time-of-day effects may exist.

Goal

- From the data alone, **infer a plausible single-server queue model** (e.g., M/M/1, M/G/1 with a bounded or heavy-tailed family, or GI/G/1).
- Justify assumptions, quantify fit, and discuss limitations.

Hints

- Bounded service suggests a **Uniform or truncated** family may be reasonable; verify with histograms/ECDF and summary stats.
- Check whether arrivals are **Poisson-like**: approximately exponential inter-arrival times and a roughly constant rate over time (segment by time-of-day if needed).
- Compute **utilisation** $\hat \rho = \hat \lambda \hat m_1$ and ensure $\hat \rho<1$.
- Compare empirical mean wait to a model-based prediction (e.g., M/M/1 formula or M/G/1 PK if you choose that model).
- Use **regenerative bootstrap** (busy cycles) to form a system-level CI for mean wait; use **input bootstrap** only if you assume a PK plug-in.
- Iterate: refine the chosen service family (Uniform, Gamma, Lognormal, truncated variants), or question Poisson arrivals if rate varies strongly.
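
The regenerative bootstrap mentioned in the hints can be sketched as follows. It is a minimal version under two assumptions: waits are listed in arrival order, and a job with exactly zero wait found the server idle and therefore starts a new busy cycle. The function name and the synthetic data (waits generated by the Lindley recursion with Uniform(0.5, 1.5) services) are illustrative; with the case-study log the input would be the `wait_time` column.

```python
import numpy as np

def regenerative_bootstrap_mean_wait(waits, B=500, seed=0):
    """Percentile CI for the mean wait obtained by resampling whole busy cycles."""
    w = np.asarray(waits, dtype=float)
    starts = np.flatnonzero(w == 0.0)            # assumed regeneration points
    if starts.size < 2:
        raise ValueError('need at least two regeneration points')
    # Per-cycle totals; jobs before the first start and the last (possibly cut-off) cycle are dropped.
    sums = np.add.reduceat(w, starts)[:-1]       # total wait in each complete cycle
    counts = np.diff(starts)                     # number of jobs in each complete cycle
    n_cycles = counts.size
    r = np.random.default_rng(seed)
    idx = r.integers(0, n_cycles, size=(B, n_cycles))
    est = sums[idx].sum(axis=1) / counts[idx].sum(axis=1)   # ratio estimator per replicate
    point = sums.sum() / counts.sum()
    lo, hi = np.percentile(est, [2.5, 97.5])
    return float(point), (float(lo), float(hi))

# Synthetic single-server waits via the Lindley recursion (lambda = 0.7, mean service 1).
demo_rng = np.random.default_rng(7)
n_jobs = 10000
A = demo_rng.exponential(1.0 / 0.7, size=n_jobs)   # inter-arrival times
S = demo_rng.uniform(0.5, 1.5, size=n_jobs)        # bounded service times
W = np.zeros(n_jobs)
for i in range(1, n_jobs):
    W[i] = max(0.0, W[i - 1] + S[i - 1] - A[i])

point, (lo, hi) = regenerative_bootstrap_mean_wait(W)
print(f'mean wait = {point:.3f}, regenerative 95% CI = ({lo:.3f}, {hi:.3f})')
```

Resampling whole cycles keeps the dependence between successive waits inside each cycle intact, which is what separates this system-level interval from the input bootstrap. With logged data, zero waits may need a small tolerance rather than an exact comparison.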

Deliverables

- A concise write-up: your chosen model, evidence (plots/statistics), fitted parameters, and two CIs (input PK if relevant; regenerative).
- Short discussion of alternative models you considered and why they were rejected.


In [None]:
import os
import pandas as pd
CANDIDATES=['data/lecture4_case_study.csv']
DATA_PATH=next((p for p in CANDIDATES if os.path.exists(p)), None)
assert DATA_PATH, 'Missing case study CSV'
df=pd.read_csv(DATA_PATH)
print(f'Loaded dataset: {DATA_PATH} with {len(df)} rows.')

#### Checklist
- [ ] Nonnegative waits and start ≥ arrival
- [ ] Inter-arrival diagnostics (histograms/ECDF, stationarity)
- [ ] Service diagnostics (boundedness, candidate families)
- [ ] Estimated $\hat \lambda, \hat m_1, \hat m_2, \hat \rho$ with $\hat \rho <1$
- [ ] Model-based Wq vs empirical Wq with CIs
- [ ] Regenerative bootstrap CI for mean wait
- [ ] Clear justification of the chosen model
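
For the inter-arrival item on the checklist, two quick summary statistics can complement the plots. This is a sketch that assumes a homogeneous Poisson process is the reference model; the helper name `arrival_diagnostics`, the bin count, and the synthetic arrival times are illustrative. With the case-study log the input would be the `arrival_time` column.

```python
import numpy as np

def arrival_diagnostics(arrival_times, n_bins=20):
    """Quick Poisson-likeness checks (hypothetical helper).

    Returns (lam_hat, cv, dispersion):
      lam_hat    - arrivals per unit time over the observed span
      cv         - coefficient of variation of inter-arrival times (about 1 if exponential)
      dispersion - variance/mean of counts in equal-width time bins (about 1 if the rate is constant)
    """
    t = np.sort(np.asarray(arrival_times, dtype=float))
    gaps = np.diff(t)
    lam_hat = gaps.size / (t[-1] - t[0])
    cv = gaps.std(ddof=1) / gaps.mean()
    counts, _ = np.histogram(t, bins=n_bins)
    dispersion = counts.var(ddof=1) / counts.mean()
    return float(lam_hat), float(cv), float(dispersion)

# Synthetic Poisson arrivals with rate 0.7 as a stand-in for the logged arrival times.
demo_rng = np.random.default_rng(3)
t_demo = np.cumsum(demo_rng.exponential(1.0 / 0.7, size=5000))
lam_hat_demo, cv_demo, disp_demo = arrival_diagnostics(t_demo)
print(f'lam_hat={lam_hat_demo:.3f}, CV={cv_demo:.3f}, dispersion={disp_demo:.3f}')
```

A CV well away from 1 points toward a GI/G/1 model, while a dispersion index well above 1 suggests a time-varying rate, in which case segmenting by time-of-day is the natural next step.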
