# Lecture 6 — Model Selection and Optimisation

From fitted queueing models to simulation-based decisions with costs and confidence intervals.

## Learning Objectives
- Choose and test queue models against data/assumptions.
- Simulate G/G/c systems (plain Python) with optional abandonment.
- Show how structural misspecification (ignoring abandonment) biases conclusions.
- Run scenario analysis and simple cost-based comparisons.
- Practice designing robustness/diagnostics exercises.

## Notebook Roadmap
1. Define a toy queueing system (no external data).
2. Simulation helper for G/G/c with optional abandonment.
3. Baseline runs (no abandonment) and metrics.
4. Toy misspecification: true abandonment, naive model without abandonment.
5. Scenario exploration (servers, demand multipliers).
6. Optional cost comparison.
7. Exercises for students.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import heapq
from typing import Callable, Tuple

plt.style.use("seaborn-v0_8-darkgrid")
pd.set_option("display.precision", 4)

RNG = np.random.default_rng


## Toy System Definition

We simulate from a synthetic stationary system. Change parameters below to explore different behaviours.
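
A quick check of the implied load helps set expectations before simulating. The sketch below assumes the shifted-exponential service time $S = \delta + X$, $X \sim \mathrm{Exp}(\theta)$, and the base values used in this notebook; the helper name `offered_load` is illustrative and not part of the notebook code.

```python
def offered_load(lambda_rate: float, delta: float, theta: float, servers: int) -> dict:
    """Mean service time, offered load and utilisation for S = delta + Exp(theta).

    Hypothetical helper for illustration only.
    """
    mean_service = delta + 1.0 / theta   # E[S] = shift + exponential mean
    load = lambda_rate * mean_service    # offered load a = lambda * E[S]
    rho = load / servers                 # per-server utilisation; stable only if rho < 1
    return {"mean_service": mean_service, "offered_load": load, "rho": rho}


info = offered_load(lambda_rate=0.9, delta=0.2, theta=1.4, servers=2)
print({k: round(v, 3) for k, v in info.items()})
```

With the base values the utilisation is about 0.41, so the baseline system is comfortably stable; a single server at the same demand would sit at about 0.82.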

In [None]:
# Base toy parameters
BASE_PARAMS = {
    "lambda_rate": 0.9,   # arrival rate
    "delta": 0.2,         # service shift (deterministic part)
    "theta": 1.4,         # exponential tail rate
    "servers": 2,         # baseline server count
    "run_time": 2000.0,
    "warmup": 200.0,
}
print(BASE_PARAMS)


## Simulation Helper (Optional Abandonment)

Plain-Python discrete-event simulator for a G/G/c system with optional customer abandonment. Pass a patience sampler to enable reneging; leave as `None` for infinite patience.
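
The time-average metrics reported by the simulator (`lq_timeavg`, `utilisation`) are areas under piecewise-constant curves, accumulated only after the warmup and divided by the observation horizon. The stand-alone sketch below illustrates that idea on a tiny hand-made path; `time_average` is a hypothetical helper, not the simulator's own code.

```python
def time_average(event_times, levels, start, end):
    """Time-average of a piecewise-constant path over [start, end].

    levels[i] is the value held on [event_times[i], event_times[i + 1]);
    the last level is held until `end`. Illustrative helper only.
    """
    area = 0.0
    for i, level in enumerate(levels):
        seg_start = max(event_times[i], start)  # clip the segment at the warmup boundary
        next_time = event_times[i + 1] if i + 1 < len(event_times) else end
        seg_end = min(next_time, end)           # clip the segment at the end of the run
        if seg_end > seg_start:
            area += level * (seg_end - seg_start)
    return area / (end - start)


# Queue length is 0 on [0, 2), 1 on [2, 5), 3 on [5, 10); warmup ends at t = 4.
avg = time_average([0.0, 2.0, 5.0], [0, 1, 3], start=4.0, end=10.0)
print(avg)
```

Only the part of each segment that lies after the warmup contributes, which is what the `max(last_t, warmup)` clipping in the simulator achieves.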

Patience sampler
- Purpose: when provided, the simulator samples a patience duration for each arriving customer and computes a patience deadline = arrival_time + patience_duration. If the customer's service does not start before that deadline they abandon (reneging).
- Signature: a callable of form `patience_sampler(rng) -> float`, where `rng` is a numpy.random.Generator and the returned float is a non‑negative time (same time units as `run_time`, `warmup`, service times).
- Behaviour:
    - If `patience_sampler is None` customers have infinite patience (no abandonments).
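    - Abandonment is detected lazily: when a server frees, any queued customer whose deadline is earlier than the current time is counted as abandoned and removed from the queue. Until that moment the customer still occupies a queue slot, so `lq_timeavg` can overstate the true queue length when reneging is frequent.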
    - Abandoned customers are included in the `abandoned` metric; for reporting the simulator records their wait up to the deadline (service time = 0) so they contribute to wait/system summaries appropriately.
- Examples:
    - Exponential patience with mean 1.0: `patience_sampler = lambda rng: rng.exponential(1.0)`
    - Deterministic patience of 10 time units: `patience_sampler = lambda rng: 10.0`
- Notes:
    - Keep units consistent across arrival/service/patience times.
    - To approximate “very patient” customers use a large constant or `None` for true infinite patience.
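
The sampler signature described above can be wrapped in small factories, in the same spirit as a service-time sampler factory. This is a minimal sketch; the names `make_exp_patience` and `make_uniform_patience` are assumptions introduced here for illustration.

```python
import numpy as np
from typing import Callable

PatienceSampler = Callable[[np.random.Generator], float]


def make_exp_patience(mean: float) -> PatienceSampler:
    """Exponential patience with the given mean (hypothetical helper)."""
    if mean <= 0:
        raise ValueError("mean patience must be positive")

    def sampler(rng: np.random.Generator) -> float:
        return float(rng.exponential(mean))  # scale parameter = mean
    return sampler


def make_uniform_patience(low: float, high: float) -> PatienceSampler:
    """Patience uniformly distributed on [low, high) (hypothetical helper)."""
    if not 0 <= low < high:
        raise ValueError("need 0 <= low < high")

    def sampler(rng: np.random.Generator) -> float:
        return float(rng.uniform(low, high))
    return sampler


exp_patience = make_exp_patience(1.0)
uni_patience = make_uniform_patience(2.0, 4.0)

# A deadline is the arrival time plus one sampled patience duration.
rng = np.random.default_rng(0)
arrival_time = 12.5
deadline = arrival_time + exp_patience(rng)
print(f"arrival={arrival_time:.2f}, deadline={deadline:.2f}")
```

Either factory returns a callable that can be passed as `patience_sampler`; validating the parameters up front avoids silently generating negative patience values.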


In [None]:
from typing import Callable

def make_shifted_exp_sampler(delta: float, theta: float) -> Callable[[np.random.Generator], float]:
    """
    Factory function to create a sampler for Shifted Exponential distribution.
    Service time S = delta + X, where X ~ Exp(theta).
    
    Args:
        delta: Minimum service time (shift).
        theta: Rate parameter for the exponential component.
    """
    def sampler(rng: np.random.Generator) -> float:
        # rng.exponential(scale) uses scale = 1/rate
        return delta + rng.exponential(1.0 / theta)
    return sampler


def simulate_queue(
    lambda_rate: float,
    service_sampler: Callable[[np.random.Generator], float],
    c: int = 2,
    run_time: float = 2000.0,
    warmup: float = 200.0,
    seed: int | None = None,
    max_jobs: int = 200000,
    patience_sampler: Callable[[np.random.Generator], float] | None = None,
) -> dict:
    """
    Simulates a G/G/c queue with optional customer abandonment (reneging).
    
    This Discrete Event Simulation (DES) tracks the state of the queue over time,
    handling arrivals, service completions, and customer patience.

    Args:
        lambda_rate: Arrival rate (Poisson process).
        service_sampler: Function returning random service times.
        c: Number of servers.
        run_time: Total simulation duration.
        warmup: Time to run before collecting statistics (transient removal).
        seed: Random seed for reproducibility.
        max_jobs: Safety limit on total departures to prevent infinite loops.
        patience_sampler: Optional function returning random patience times.
                          If provided, customers leave the queue if wait > patience.
                          If None, customers have infinite patience.

    Returns:
        Dictionary containing performance metrics (mean wait, utilization, etc.).
    """
    rng = RNG(seed)
    t = 0.0         # Current simulation clock
    last_t = 0.0    # Time of the previous event (for integral calculation)
    
    # Generate the first arrival
    next_arrival = rng.exponential(1.0 / lambda_rate)
    
    # Pending completions: list of (completion_time, arrival_time, start_time, service_time)
    # kept sorted by completion_time, so index 0 is always the next completion.
    completions = [] 
    
    # Queue: List of (arrival_time, patience_deadline)
    # patience_deadline is the absolute time by which service MUST start to avoid abandonment.
    queue = []        
    
    # State variables
    busy = 0          # Number of servers currently busy
    area_queue = 0.0  # Time-integrated queue length (for Lq)
    area_busy = 0.0   # Time-integrated busy servers (for Utilization)
    wait_samples = [] # Collected wait times
    system_samples = [] # Collected system times (wait + service)
    departures = 0    # Count of served customers
    abandoned = 0     # Count of customers who reneged

    while t < run_time and departures < max_jobs:
        # --- 1. Determine Next Event ---
        # The next event is either a new arrival or a service completion.
        next_completion = completions[0][0] if completions else float('inf')
        next_event = min(next_arrival, next_completion)
        
        # Check if simulation time is over
        if next_event > run_time:
            # Add final slice of statistics up to run_time
            eff_start = max(last_t, warmup)
            if run_time > eff_start:
                dt = run_time - eff_start
                area_queue += len(queue) * dt
                area_busy += busy * dt
            break

        # --- 2. Update Statistics (Time Integration) ---
        # We calculate the area under the curve for queue length and busy servers.
        # Only accumulate stats if we are past the warmup period.
        if next_event > warmup:
            eff_start = max(last_t, warmup)
            dt = next_event - eff_start
            if dt > 0:
                area_queue += len(queue) * dt
                area_busy += busy * dt

        # Advance clock
        t = next_event
        last_t = next_event

        # --- 3. Process Event ---
        if next_arrival <= next_completion:
            # === ARRIVAL EVENT ===
            
            # Calculate patience deadline if the feature is enabled.
            # Deadline = Current Time + Sampled Patience Duration.
            patience_deadline = None
            if patience_sampler is not None:
                patience_deadline = t + patience_sampler(rng)
            
            if busy < c:
                # Server available: Start service immediately.
                busy += 1
                s = service_sampler(rng)
                completion_time = t + s
                completions.append((completion_time, t, t, s))
                completions.sort(key=lambda x: x[0])
            else:
                # All servers busy: Add to queue with deadline.
                queue.append((t, patience_deadline))
            
            # Schedule next arrival
            next_arrival = t + rng.exponential(1.0 / lambda_rate)
        else:
            # === COMPLETION EVENT ===
            
            # Free the server and record stats for the departing customer
            completion_time, arrival_time, start_time, service_time = completions.pop(0)
            busy -= 1
            
            if arrival_time >= warmup:
                wait = start_time - arrival_time
                system_time = wait + service_time
                wait_samples.append(wait)
                system_samples.append(system_time)
                departures += 1
            
            # Assign the newly free server to the next customer in queue.
            # CRITICAL: We must check for abandoned customers here.
            while queue and busy < c:
                arrival_time, patience_deadline = queue.pop(0) # FIFO
                
                # Check if the customer has already lost patience.
                # If patience_deadline < current time 't', they left before we could serve them.
                if patience_deadline is not None and patience_deadline < t:
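                    # Count and record abandonments only for post-warmup arrivals,
                    # mirroring how departures are counted, so the ratio
                    # abandoned / (departures + abandoned) is consistent.
                    # They waited until their deadline, then left (service time = 0).
                    if arrival_time >= warmup:
                        abandoned += 1
                        wait = patience_deadline - arrival_time
                        wait_samples.append(wait)
                        system_samples.append(wait)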

                    # This customer is gone; loop again to check the next person.
                    continue
                
                # Customer is still waiting: Start service.
                busy += 1
                s = service_sampler(rng)
                start_time = t
                completion_time = t + s
                completions.append((completion_time, arrival_time, start_time, s))
                completions.sort(key=lambda x: x[0])

    # --- 4. Finalize Metrics ---
    horizon = max(run_time - warmup, 1.0)
    return {
        "mean_wait": float(np.mean(wait_samples)) if wait_samples else np.nan,
        "mean_system": float(np.mean(system_samples)) if system_samples else np.nan,
        "lq_timeavg": area_queue / horizon,
        "utilisation": area_busy / (horizon * c),
        "departures": departures,
        "abandoned": abandoned,
    }


def run_replications(
    config: dict,
    n_rep: int = 20,
    run_time: float = 2000.0,
    warmup: float = 200.0,
    seed: int = 0,
    patience_sampler: Callable[[np.random.Generator], float] | None = None,
) -> pd.DataFrame:
    """
    Runs multiple independent replications of the simulation.
    Useful for generating confidence intervals and smoothing out stochastic noise.
    """
    sampler = make_shifted_exp_sampler(config["delta"], config["theta"])
    rows = []
    for r in range(n_rep):
        rows.append(
            simulate_queue(
                lambda_rate=config["lambda_rate"],
                service_sampler=sampler,
                c=config["servers"],
                run_time=run_time,
                warmup=warmup,
                seed=seed + r,
                patience_sampler=patience_sampler,
            )
        )
    return pd.DataFrame(rows)


## Baseline (No Abandonment)

Simulate the toy system with no abandonment and summarise performance.
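
A closed-form reference value is a useful sanity check on the simulated baseline. The sketch below computes the Erlang C waiting probability for an M/M/c queue and then applies an Allen-Cunneen style correction for non-exponential service. This is an approximation, not an exact result for the shifted-exponential system, and the function names are assumptions introduced here.

```python
import math


def erlang_c(c: int, a: float) -> float:
    """Probability that an arrival waits in an M/M/c queue with offered load a."""
    rho = a / c
    if rho >= 1.0:
        raise ValueError("unstable: offered load must be below the number of servers")
    head = sum(a**k / math.factorial(k) for k in range(c))
    tail = a**c / (math.factorial(c) * (1.0 - rho))
    return tail / (head + tail)


def approx_wq(lambda_rate: float, delta: float, theta: float, c: int) -> float:
    """Approximate mean wait for Poisson arrivals and S = delta + Exp(theta).

    Uses the M/M/c wait with the same mean service time, scaled by
    (1 + scv) / 2 (Allen-Cunneen style approximation; exact when c = 1).
    """
    mean_s = delta + 1.0 / theta
    scv = (1.0 / theta**2) / mean_s**2          # squared coefficient of variation of S
    a = lambda_rate * mean_s                    # offered load
    wq_mmc = erlang_c(c, a) * mean_s / (c - a)  # M/M/c mean wait in queue
    return wq_mmc * (1.0 + scv) / 2.0


print(f"approximate Wq: {approx_wq(0.9, 0.2, 1.4, 2):.3f}")
```

For the base parameters this gives a mean wait of roughly 0.15 time units. A simulated `mean_wait` far from that value would suggest a bug or a run that is too short, although small discrepancies are expected because the formula is only approximate for two servers.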

In [None]:
baseline_runs = run_replications(
    BASE_PARAMS, n_rep=30, run_time=BASE_PARAMS["run_time"],
    warmup=BASE_PARAMS["warmup"], seed=42, patience_sampler=None,
)

summary = baseline_runs.mean().to_dict()
print("Baseline metrics (no abandonment):")
for k, v in summary.items():
    print(f"  {k:12s} {v:8.4f}")



## Toy Misspecification: Ignoring Abandonment

Simulate data where customers have finite patience (some abandon) but then evaluate a model that assumes infinite patience. This illustrates how violating structural assumptions biases conclusions.


In [None]:
# --- 1. Ground Truth: Simulate a system where customers abandon ---
# We define a patience distribution (Exponential) and run one long simulation
# to get a sample path representing the "real world".
print("--- 1. Ground Truth Simulation (with Abandonment) ---")
patience_mean = 1.0
patience_sampler = lambda rng: rng.exponential(patience_mean)

# Run a single, longer simulation to represent the "true" system behavior
true_runs = run_replications(
    BASE_PARAMS, n_rep=1, run_time=5000.0, warmup=500.0, seed=999, patience_sampler=patience_sampler
)
true_metrics = true_runs.iloc[0]

# Calculate key metrics from this true system
total_arrivals_est = true_metrics.departures + true_metrics.abandoned
abandon_rate = true_metrics.abandoned / total_arrivals_est if total_arrivals_est > 0 else 0

print(f"True arrival rate (lambda): {BASE_PARAMS['lambda_rate']:.3f}")
print(f"Observed departures:        {true_metrics.departures:.0f}")
print(f"Observed abandonments:      {true_metrics.abandoned:.0f}")
print(f"Estimated abandon rate:     {abandon_rate:.2%}")
print(f"True mean wait (Wq):        {true_metrics.mean_wait:.4f}\n")


# --- 2. Naive Model: Analyst ignores abandonment ---
# The analyst only observes departures and incorrectly assumes this is the total arrival rate.
# They build a model with no abandonment, using a lower, incorrect arrival rate.
print("--- 2. Naive Model Simulation (Ignoring Abandonment) ---")
# The analyst underestimates the arrival rate by only counting completions.
naive_lambda = BASE_PARAMS["lambda_rate"] * (1 - abandon_rate)
naive_cfg = {**BASE_PARAMS, "lambda_rate": naive_lambda}

print(f"Naive assumption: Arrival rate = completion rate ≈ {naive_lambda:.3f}")
print("Simulating a no-abandonment model with this lower rate...")

# Run replications of the misspecified model
naive_runs = run_replications(
    naive_cfg, n_rep=30, run_time=2000.0, warmup=200.0, seed=1234, patience_sampler=None
)
naive_mean_wait = naive_runs["mean_wait"].mean()


# --- 3. Comparison ---
# The naive model sees a lower arrival rate but no reneging, so its predicted wait can
# differ from the true one in either direction; the signed error below shows which way.
print("\n--- 3. Comparison of Results ---")
print(f"True System Mean Wait:      {true_metrics.mean_wait:.4f}")
print(f"Naive Model Predicted Wait: {naive_mean_wait:.4f}")
print(f"Prediction Error:           {(naive_mean_wait - true_metrics.mean_wait) / true_metrics.mean_wait:.2%}")


### Interpreting the Misspecification Result
- In the true system, some customers abandon, so completions underestimate the real arrival rate and hide lost demand.
- A naive analyst who only sees completions and assumes infinite patience will think the system is less loaded and will mis-estimate waiting times.
- Recommendations based on the naive model are over-optimistic: they ignore both lost customers and the increased congestion that would appear if everyone actually stayed.
- Before adopting a no-abandonment model, always check the data for signs of reneging or timeouts.
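
One simple diagnostic for the last point is a flow-balance check: compare the number of arrivals logged in a window with the number of completed services. The sketch below uses made-up counts and a normal-approximation interval; `lost_demand_share` is a hypothetical helper, and real logs would also need care with jobs still in progress at the end of the window.

```python
import math


def lost_demand_share(n_arrivals: int, n_served: int, z: float = 1.96):
    """Share of arrivals that never completed service, with an approximate 95% CI.

    Illustrative only: treats each arrival as an independent Bernoulli trial.
    """
    if n_arrivals <= 0 or not 0 <= n_served <= n_arrivals:
        raise ValueError("need 0 <= n_served <= n_arrivals and n_arrivals > 0")
    p = 1.0 - n_served / n_arrivals
    half = z * math.sqrt(p * (1.0 - p) / n_arrivals)
    return p, max(0.0, p - half), min(1.0, p + half)


# Hypothetical counts, not taken from the simulation above.
share, lo, hi = lost_demand_share(n_arrivals=4000, n_served=3700)
print(f"lost share = {share:.3f}  (95% CI {lo:.3f} to {hi:.3f})")
```

An interval that clearly excludes zero is a warning that a no-abandonment model fitted to completions alone will understate demand.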


## Scenario Exploration (Synthetic)

This section runs a small, synthetic scenario study to illustrate how capacity (number of servers) and demand scale interact and to quantify the resulting performance and uncertainty.

What is done
- We sweep a small grid of configurations: servers ∈ {base−1, base, base+1, base+2} and demand multipliers ∈ {0.8, 1.0, 1.2}.
- For each configuration we run n_rep = 20 independent replications of the discrete‑event simulator using the run_time and warmup from BASE_PARAMS.
- For each replication we collect mean waiting time (Wq), time‑average queue length (Lq), and utilisation; the code then reports the sample mean and a 95% CI for each metric via the summarize(...) helper.

Why this is useful
- Capacity planning and robustness: the grid shows how small changes in demand or servers move the system between under‑loaded, well‑operating, and near‑saturation regimes.
- Detect regime shifts: when utilisation nears 1 the mean wait (and its uncertainty) typically explodes — this is important to spot before making operational decisions.
- Decision support under uncertainty: the CIs help assess whether differences between configurations are meaningful (statistical vs. simulation noise), and enable cost/benefit trade‑offs when paired with a cost model.

What to look for in the output
- scenarios (DataFrame): one row per configuration with columns servers, demand_mult, mean_wait, mean_wait_lo, mean_wait_hi, lq, lq_lo, lq_hi, util, util_lo, util_hi. Use these to compare performance and uncertainty across configs.
- cost_table (DataFrame, built in the next cell): maps each scenario to a simple cost rate, cost = c_server * servers + c_wait * λ * mean_wait, where λ is the scenario's arrival rate (base rate × demand_mult), so you can rank configurations by expected cost per unit time.
- Interpretation tips: very large mean_wait (and large CI) indicates a configuration that is effectively unstable for the chosen demand; tight CIs indicate reliable simulation estimates; compare cost_table to trade off staffing cost vs. customer waiting cost.

Use these results to choose feasible configurations (satisfying SLA or service‑level constraints) or to prioritise further experiments (e.g., finer demand grid, alternative cost weights, or exploring patience/abandonment).
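
When two configurations are simulated with the same replication seeds, replication r of one can be paired with replication r of the other, and a confidence interval on the per-replication difference is often sharper than comparing two separate intervals. The sketch below assumes a normal approximation and uses synthetic placeholder data, not output from the simulator; `paired_diff_ci` is a hypothetical helper.

```python
import numpy as np


def paired_diff_ci(a, b, z: float = 1.96):
    """Mean and approximate 95% CI of the paired differences a[r] - b[r]."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    if d.size < 2:
        raise ValueError("need at least two paired replications")
    m = d.mean()
    half = z * d.std(ddof=1) / np.sqrt(d.size)
    return m, m - half, m + half


# Synthetic placeholder data: a shared noise term mimics reused seeds.
rng = np.random.default_rng(0)
shared = rng.normal(0.0, 0.2, size=20)
wait_two_servers = 0.5 + shared + rng.normal(0.0, 0.02, size=20)
wait_three_servers = 0.3 + shared + rng.normal(0.0, 0.02, size=20)

mean_diff, lo, hi = paired_diff_ci(wait_two_servers, wait_three_servers)
print(f"mean difference = {mean_diff:.3f}  (95% CI {lo:.3f} to {hi:.3f})")
```

If the interval on the difference excludes zero, the gap between the two configurations is unlikely to be simulation noise alone.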

In [None]:
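def summarize(series: pd.Series) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) of a normal-approximation 95% CI across replications."""
    x = series.dropna().values
    m = x.mean()
    s = x.std(ddof=1)  # sample standard deviation across replications
    # 1.96 is the normal quantile; with about 20 replications a Student-t quantile
    # (about 2.09 for 19 degrees of freedom) would give slightly wider intervals.
    half = 1.96 * s / np.sqrt(len(x))
    return m, m - half, m + half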

base_c = BASE_PARAMS["servers"]
servers = sorted({max(1, base_c - 1), base_c, base_c + 1, base_c + 2})
demand_multipliers = [0.8, 1.0, 1.2]

rows = []
for c in servers:
    for alpha in demand_multipliers:
        cfg = {**BASE_PARAMS, "lambda_rate": BASE_PARAMS["lambda_rate"] * alpha, "servers": c}
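        # Seeds are reused across configurations, so replications are independent
        # within a configuration but not independent across configurations.
        runs = run_replications(cfg, n_rep=20, run_time=BASE_PARAMS['run_time'], warmup=BASE_PARAMS['warmup'], seed=100)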
        mw = summarize(runs["mean_wait"])
        lq = summarize(runs["lq_timeavg"])
        ut = summarize(runs["utilisation"])
        rows.append({
            "servers": c,
            "demand_mult": alpha,
            "mean_wait": mw[0], "mean_wait_lo": mw[1], "mean_wait_hi": mw[2],
            "lq": lq[0], "lq_lo": lq[1], "lq_hi": lq[2],
            "util": ut[0], "util_lo": ut[1], "util_hi": ut[2],
        })

scenarios = pd.DataFrame(rows)
scenarios


### Optional Cost Comparison

Translate performance into cost if desired; adjust weights as needed.
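
The cost used here is a rate per unit time: staffing cost plus waiting cost. Without abandonment, Little's law gives $L_q = \lambda W_q$, so the waiting term equals `c_wait` times the average number of jobs in the queue. The sketch below spells this out with hypothetical waits; `cost_rate` and the numbers in the loop are illustrative assumptions, not simulation results.

```python
def cost_rate(servers: int, lambda_rate: float, mean_wait: float,
              c_server: float = 1.0, c_wait: float = 5.0) -> float:
    """Cost per unit time = staffing cost + waiting cost (illustrative helper)."""
    lq = lambda_rate * mean_wait  # Little's law: average queue length
    return c_server * servers + c_wait * lq


# Hypothetical (servers, mean wait) pairs at an arrival rate of 0.9.
candidates = [(1, 4.0), (2, 0.15), (3, 0.02)]
costs = {c: cost_rate(c, 0.9, w) for c, w in candidates}
for c, value in costs.items():
    print(f"servers={c}  cost rate={value:.3f}")
print("cheapest:", min(costs, key=costs.get))
```

With these placeholder waits, one server is dominated by waiting cost and three servers by staffing cost, so the middle option is cheapest; different cost weights can move that balance.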

In [None]:
c_server = 1.0   # cost per server per unit time
c_wait = 5.0     # cost per unit waiting time per job

cost_rows = []
for _, row in scenarios.iterrows():
    lam = BASE_PARAMS["lambda_rate"] * row["demand_mult"]
    cost_mean = c_server * row["servers"] + c_wait * lam * row["mean_wait"]
    cost_lo = c_server * row["servers"] + c_wait * lam * row["mean_wait_lo"]
    cost_hi = c_server * row["servers"] + c_wait * lam * row["mean_wait_hi"]
    cost_rows.append({**row, "cost": cost_mean, "cost_lo": cost_lo, "cost_hi": cost_hi})

cost_table = pd.DataFrame(cost_rows)
cost_table.sort_values(["demand_mult", "cost"])


**Exercise:** How many servers would you choose?


## Exercises
- Modify `BASE_PARAMS` and observe how stability/utilisation change; detect when the system nears saturation.
- Extend the abandonment example: vary patience mean and quantify how the naive (no-abandonment) model mis-predicts waits.
- Try a heavier-tailed service sampler (e.g. lognormal) and compare CIs to the shifted-exponential baseline.
- Add a service-level constraint (e.g. target $P(W_q > w^*)$) and choose the cheapest feasible configuration.
- Document which diagnostics justify your chosen model and which behaviours the model cannot capture.
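
As a starting point for the heavier-tailed service exercise, a lognormal sampler can be matched to a target mean and squared coefficient of variation (SCV), which keeps the load unchanged while the tail varies. This is a minimal sketch; `make_lognormal_sampler` is a hypothetical helper and the SCV value of 2.0 is an arbitrary choice for illustration.

```python
import math
import numpy as np
from typing import Callable


def make_lognormal_sampler(mean: float, scv: float) -> Callable[[np.random.Generator], float]:
    """Lognormal service-time sampler with a given mean and SCV (illustrative helper).

    For a lognormal, mean = exp(mu + sigma^2 / 2) and SCV = exp(sigma^2) - 1.
    """
    if mean <= 0 or scv <= 0:
        raise ValueError("mean and scv must be positive")
    sigma2 = math.log(1.0 + scv)
    mu = math.log(mean) - 0.5 * sigma2
    sigma = math.sqrt(sigma2)

    def sampler(rng: np.random.Generator) -> float:
        return float(rng.lognormal(mean=mu, sigma=sigma))
    return sampler


# Same mean as the shifted-exponential baseline (0.2 + 1/1.4), but a larger SCV.
target_mean = 0.2 + 1.0 / 1.4
sampler = make_lognormal_sampler(mean=target_mean, scv=2.0)
rng = np.random.default_rng(0)
draws = np.array([sampler(rng) for _ in range(50_000)])
print(f"target mean = {target_mean:.3f}, sample mean = {draws.mean():.3f}")
```

The returned callable has the same `sampler(rng) -> float` shape as the service samplers used in the notebook, so it can be passed as `service_sampler` to compare confidence-interval widths at equal load.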