# Lecture 1 Hands-On: Poisson Warm-Up

This notebook pairs with Lecture 1. It starts with a plain NumPy simulation of Poisson counts, builds an M/M/1 queue simulator in plain Python, and then moves to a SimPy arrival process.

## Setup

In [None]:
import pathlib

import numpy as np

try:
    import simpy
except ImportError as exc:
    raise SystemExit("SimPy is required for this notebook. Install via 'pip install simpy'.") from exc

RNG = np.random.default_rng(42)
NOTEBOOK_DIR = pathlib.Path.cwd()


---
## Part 1 — Plain Python Warm-Up
Follow the checklist from the slides:
1. Set the hourly rate `lam` and scale it to a half-hour window.
2. Simulate `n_windows = 1_000` Poisson counts for the 30-minute window.
3. Report the sample mean and variance versus the theoretical value `lambda_window`.
4. Estimate `P(X >= 1)` empirically by counting non-zero draws, and `P(X = 0)` as its complement.
5. Summarise the takeaways in the Markdown cell below.
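
One possible way to work through steps 1 to 4 is sketched below. It is not the reference solution from the slides: the `demo_`-prefixed names and the separate seed are assumptions made for illustration, chosen so the sketch does not overwrite the notebook's own variables or consume draws from `RNG`.

```python
import numpy as np

# Hypothetical demo names; a separate generator keeps the notebook's RNG untouched.
demo_rng = np.random.default_rng(0)
demo_lam_per_hour = 4
demo_lambda_window = demo_lam_per_hour * 0.5  # the Poisson rate scales linearly with window length
demo_n_windows = 1_000

# One Poisson draw per half-hour window.
demo_counts = demo_rng.poisson(lam=demo_lambda_window, size=demo_n_windows)

demo_mean = demo_counts.mean()
demo_var = demo_counts.var(ddof=1)           # unbiased sample variance
demo_p_ge_one = (demo_counts >= 1).mean()    # fraction of windows with at least one event
demo_p_zero = (demo_counts == 0).mean()      # fraction of empty windows

print(demo_mean, demo_var, demo_p_ge_one, demo_p_zero)
```

The two probabilities sum to one by construction, which gives a quick consistency check on the counting logic.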

In [None]:
lam_per_hour = 4
lambda_window = lam_per_hour * 0.5  # TODO: confirm the scaling
n_windows = 1_000

# TODO: simulate Poisson counts and compute sample mean, variance, and probability of at least one event
counts = None
sample_mean = None
sample_var = None
p_ge_one = None
p_zero = None  

sample_mean, sample_var, p_ge_one, p_zero


### Theoretical Benchmarks

For a Poisson random variable $X \sim \mathrm{Poi}(\lambda)$ with $\lambda = 2$ (half-hour window):
- $\mathbb{E}[X] = \lambda$.
- $\operatorname{Var}(X) = \lambda$.
- $\mathbb{P}[X \ge 1] = 1 - e^{-\lambda}$.
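
The three identities above can be checked directly from the Poisson probability mass function. The sketch below is only a numerical sanity check under an assumed truncation at $k = 60$ (the tail beyond that is negligible for $\lambda = 2$); `poisson_pmf` is a helper defined here, not a library function.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poi(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam_demo = 2.0        # half-hour rate used in this notebook
ks = range(0, 60)     # assumed truncation point for the infinite sums
pmf = [poisson_pmf(k, lam_demo) for k in ks]

pmf_mean = sum(k * p for k, p in zip(ks, pmf))
pmf_var = sum((k - pmf_mean) ** 2 * p for k, p in zip(ks, pmf))
pmf_p_ge_one = 1.0 - pmf[0]   # complement of P(X = 0) = exp(-lam)

print(pmf_mean, pmf_var, pmf_p_ge_one)
```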

Fill in the following cell to compute the exact values numerically.

In [None]:
import math

exact_mean = None
exact_variance = None
exact_p_ge_one = None
exact_p_zero = None

exact_mean, exact_variance, exact_p_ge_one, exact_p_zero

### Compare Simulation vs. Theory

Compute absolute errors between your simulated statistics and the theoretical values above. Check whether the discrepancies are within the tolerance you expect for $n_\text{windows} = 1000$.

In [None]:
# TODO: replace `sample_mean`, `sample_var`, `p_ge_one` after computing them above
errors = {
    'mean_error': abs(sample_mean - exact_mean),
    'variance_error': abs(sample_var - exact_variance),
    'prob_error': abs(p_ge_one - exact_p_ge_one),
}
errors

### Reasonable Tolerance Bounds

Determine quantitative tolerances for each statistic. For example, one approach is to use a normal approximation or Chebyshev's inequality to bound the sampling error of the estimated mean, variance, and probability. Justify your choice and verify that the simulation output lies within your bounds.
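
As one illustration of the normal-approximation route, the sketch below turns standard errors into tolerances. The choice of a 3-sigma band (`z=3.0`) and the helper name `poisson_tolerances` are assumptions for this example, and the variance tolerance relies on the large-sample approximation $\operatorname{Var}(S^2) \approx (\mu_4 - \sigma^4)/n$ with the Poisson fourth central moment $\mu_4 = \lambda + 3\lambda^2$.

```python
import math

def poisson_tolerances(lam, n, z=3.0):
    """Approximate z-sigma tolerances for statistics of n i.i.d. Poi(lam) draws.

    Assumes a normal approximation (central limit theorem) for each estimator.
    """
    p = 1.0 - math.exp(-lam)                       # P(X >= 1)
    se_mean = math.sqrt(lam / n)                   # Var(X) = lam
    se_var = math.sqrt((lam + 2.0 * lam**2) / n)   # (mu_4 - sigma^4) / n with mu_4 = lam + 3 lam^2
    se_prob = math.sqrt(p * (1.0 - p) / n)         # standard error of a binomial proportion
    return {
        'mean_tolerance': z * se_mean,
        'variance_tolerance': z * se_var,
        'prob_tolerance': z * se_prob,
    }

demo_tolerances = poisson_tolerances(lam=2.0, n=1_000)
print(demo_tolerances)
```

A Chebyshev-based bound would use the same standard errors but a larger multiplier for the same coverage, since it makes no distributional assumption.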

In [None]:
# TODO: Define tolerances justified by your reasoning
mean_tolerance = None
variance_tolerance = None
prob_tolerance = None

checks = {
    'mean_within_tolerance': errors['mean_error'] <= mean_tolerance if mean_tolerance is not None else None,
    'variance_within_tolerance': errors['variance_error'] <= variance_tolerance if variance_tolerance is not None else None,
    'prob_within_tolerance': errors['prob_error'] <= prob_tolerance if prob_tolerance is not None else None,
}
checks

### Commentary

Briefly discuss whether the simulation agrees with theory given your tolerances. If it does not, refine your reasoning or increase the number of simulations until the comparison is satisfactory.

---
## Part 2 — M/M/1 Queue in Plain Python

We begin with an end-to-end view of the M/M/1 queue before touching SimPy.
- **Arrivals** follow a Poisson process with rate $\lambda$ (exponential inter-arrival times).
- **Service times** are i.i.d. exponential with rate $\mu$.
- A single server works first-come/first-served, with unlimited waiting room.

Key quantities to track: utilisation $\rho = \lambda/\mu$, waiting times $W_q$, sojourn times $W$, and queue length process $L(t)$. We'll implement a minimal discrete-event simulator using only basic Python + NumPy.

In [None]:
# Queue and simulation parameters (feel free to tweak)
lambda_rate = 4.0   # arrivals per hour
mu_rate = 6.0       # services per hour
sim_hours = 2.0
max_events = 5_000  # safety cap to avoid infinite loops

rho = lambda_rate / mu_rate
rho

### Task: Implement a Plain-Python Simulator
Steps to follow inside the skeleton below:
1. Generate exponential inter-arrival and service times using `rng.exponential` (remember rate vs. scale: NumPy's `scale` argument is the mean, i.e. `1/rate`).
2. Keep track of the next arrival time and when the server becomes free.
3. For each arrival, decide when service starts (max of arrival time and server-available time), then update departure time.
4. Record per-customer metrics: waiting time in queue, total time in system, and queue length just before arrival.
5. Stop when simulated time exceeds `sim_hours` or you hit the safety cap.
6. Return a dictionary with raw logs to analyse later.
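
The steps above can be sketched as follows. This is one possible implementation rather than the intended solution: the function name `mm1_sketch` and the dictionary keys are assumptions, and the logged occupancy is the number **in system** seen by each arrival (waiting plus in service), not only those waiting.

```python
import numpy as np

def mm1_sketch(lambda_rate, mu_rate, sim_hours, rng, max_events=5_000):
    """Illustrative FIFO M/M/1 simulation; returns per-customer logs."""
    arrivals, starts, departures, seen_on_arrival = [], [], [], []
    t = 0.0
    server_free_at = 0.0
    first_in_system = 0  # index of the earliest customer that may still be present

    for _ in range(max_events):
        # NumPy's exponential takes the scale (mean), which is 1 / rate.
        t += rng.exponential(1.0 / lambda_rate)
        if t > sim_hours:
            break
        # Departures are non-decreasing under FIFO, so a moving pointer
        # is enough to skip customers who have already left by time t.
        while first_in_system < len(departures) and departures[first_in_system] <= t:
            first_in_system += 1
        seen_on_arrival.append(len(departures) - first_in_system)

        start = max(t, server_free_at)               # wait if the server is busy
        server_free_at = start + rng.exponential(1.0 / mu_rate)
        arrivals.append(t)
        starts.append(start)
        departures.append(server_free_at)

    arrivals, starts, departures = map(np.array, (arrivals, starts, departures))
    return {
        'arrivals': arrivals,
        'service_starts': starts,
        'departures': departures,
        'waiting_times': starts - arrivals,
        'system_times': departures - arrivals,
        'in_system_on_arrival': np.array(seen_on_arrival),
    }

# Long horizon (an assumption for the demo) so the averages approach steady state.
demo_logs = mm1_sketch(4.0, 6.0, sim_hours=2_000.0,
                       rng=np.random.default_rng(1), max_events=50_000)
print(demo_logs['waiting_times'].mean(), demo_logs['system_times'].mean())
```

Customers who arrive before `sim_hours` are always served to completion in this sketch, even if they depart after the horizon; that is a design choice worth revisiting when comparing time averages.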

In [None]:
# TODO: complete the simulator

def simulate_mm1_basic(lambda_rate, mu_rate, sim_hours, rng, max_events=5_000):
    """Return a dict containing arrival/departure logs for an M/M/1 queue."""
    raise NotImplementedError

mm1_logs = simulate_mm1_basic(lambda_rate, mu_rate, sim_hours, RNG, max_events=max_events)
list(mm1_logs.keys())

### Theoretical Benchmarks for M/M/1
For $\rho = \lambda/\mu < 1$ the steady-state metrics are:
- $L = \dfrac{\rho}{1-\rho}$ (expected number in system)
- $L_q = \dfrac{\rho^2}{1-\rho}$ (expected number waiting)
- $W = \dfrac{1}{\mu - \lambda}$ (expected time in system)
- $W_q = \dfrac{\rho}{\mu - \lambda}$ (expected waiting time)
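
As a worked check with the parameters set earlier ($\lambda = 4$ and $\mu = 6$ per hour), the formulas give:

$$
\rho = \frac{4}{6} = \frac{2}{3}, \qquad
L = \frac{2/3}{1/3} = 2, \qquad
L_q = \frac{4/9}{1/3} = \frac{4}{3}, \qquad
W = \frac{1}{6 - 4} = \frac{1}{2}\ \text{h}, \qquad
W_q = \frac{2/3}{6 - 4} = \frac{1}{3}\ \text{h}.
$$

These values are consistent with Little's Law: $\lambda W = 4 \cdot \tfrac{1}{2} = 2 = L$ and $\lambda W_q = 4 \cdot \tfrac{1}{3} = \tfrac{4}{3} = L_q$.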

Compute them numerically below for the chosen parameters.

In [None]:
theoretical = {
    'L': rho / (1 - rho),
    'L_q': (rho**2) / (1 - rho),
    'W': 1 / (mu_rate - lambda_rate),
    'W_q': rho / (mu_rate - lambda_rate),
}
theoretical

### Analyse the Simulation Output
Using the raw logs, derive empirical estimates for the same metrics:
- Average number in system / queue (e.g., via time averaging or Little's Law).
- Sample means for waiting time and total time in system.
Then compare to the theoretical values above.
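
For the time-averaging route, a minimal sketch is shown below. It assumes the logs contain per-customer arrival and departure times as arrays (the actual keys in `mm1_logs` depend on your implementation), and the helper name `time_average_in_system` is made up for this example.

```python
import numpy as np

def time_average_in_system(arrivals, departures, horizon):
    """Time-average number in system over [0, horizon].

    The integral of L(t) equals the total time customers spend in the
    system, with each stay clipped to the observation window.
    """
    arrivals = np.asarray(arrivals, dtype=float)
    departures = np.asarray(departures, dtype=float)
    time_in_window = np.clip(departures, 0.0, horizon) - np.clip(arrivals, 0.0, horizon)
    return time_in_window.sum() / horizon

# Hand-checkable example: customer 1 is present on [0, 2], customer 2 on [1, 4].
L_demo = time_average_in_system([0.0, 1.0], [2.0, 4.0], horizon=4.0)
print(L_demo)  # (2 + 3) / 4 = 1.25
```

Passing service-start times in place of departure times gives the time-average number waiting, which is the empirical counterpart of $L_q$.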

In [None]:
# TODO: derive empirical metrics from mm1_logs
empirical = {
    'L_est': None,
    'L_q_est': None,
    'W_est': None,
    'W_q_est': None,
}
empirical

### Sanity Check
Compute absolute errors between empirical estimates and theory. Comment on whether the run length (`sim_hours`) and sample size are enough to match steady-state predictions, and what adjustments you would make if not.

In [None]:
# TODO: compare empirical metrics to theory (once filled)
if all(value is not None for value in empirical.values()):
    mm1_errors = {
        'L_error': abs(empirical['L_est'] - theoretical['L']),
        'L_q_error': abs(empirical['L_q_est'] - theoretical['L_q']),
        'W_error': abs(empirical['W_est'] - theoretical['W']),
        'W_q_error': abs(empirical['W_q_est'] - theoretical['W_q']),
    }
else:
    mm1_errors = None
mm1_errors

### Reflection
Summarise your findings: does the simple simulator agree with the closed-form M/M/1 results? Note any sources of discrepancy (finite horizon, warm-up bias, random fluctuation) and proposed fixes.

### From Analytical Models to Event Simulation
Parts 1 and 2 gave us the arrival distribution and queue behaviour using plain NumPy. We now carry those ingredients into SimPy so that the event scheduling matches the assumptions we've already validated.

### SimPy Primer
SimPy is a **discrete-event simulation** library. Instead of stepping through time in tiny increments, it keeps a priority queue of upcoming events and jumps directly to them. A few core ideas:
- The `Environment` manages simulated time and the event queue.
- Each **process** is a Python generator that `yield`s events such as `env.timeout(...)` or resource requests.
- When a process yields a timeout, SimPy schedules the next wake-up at the requested simulated time.
- All processes share the same clock, so we can model queues, resources, and stochastic arrivals consistently.

We'll start with a minimal example to see the mechanics before building the Poisson arrival stream.

In [38]:

import simpy

log = []

def ticker(env, interval):
    """Process that records a message every `interval` time units."""
    while True:
        yield env.timeout(interval)
        log.append({'time': env.now, 'event': 'tick'})

# Set up and run the environment
env = simpy.Environment()
env.process(ticker(env, interval=0.75))
env.run(until=3.0)

log


[{'time': 0.75, 'event': 'tick'},
 {'time': 1.5, 'event': 'tick'},
 {'time': 2.25, 'event': 'tick'}]

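Each iteration of `ticker` waits for `interval` simulated time units. When `env.run` finishes, the `log` shows that SimPy jumped directly between the event times 0.75, 1.5, and 2.25. The tick scheduled for 3.0 is not logged because `env.run(until=3.0)` stops as soon as the clock reaches 3.0, before the events scheduled at that time are processed. We'll reuse the same pattern in Part 3: a generator that yields exponential timeouts to produce Poisson arrivals.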

---
## Part 3 — SimPy Arrival Stream
Using the generator pattern from the primer, recreate the Poisson arrival process inside SimPy. Treat this as the event-driven counterpart to the NumPy simulations:
1. Reuse `lam_per_hour` from Part 1 for the arrival rate.
2. Implement `arrival_process(env, lam, rng, log)` that draws exponential inter-arrival times with rate `lam` (the continuous-time counterpart of the Poisson counts in Part 1) and records `env.now` for each arrival, just like the arrival timestamps you computed for M/M/1.
3. Write `simulate_arrivals` that seeds a SimPy environment, launches the process, and runs it for `simpy_duration_hours`.
4. After the run, analyse the timestamps to recover **both** the inter-arrival distribution and the half-hour counts. Compare these to the theoretical benchmarks from Part 1 and use the tolerances you developed there.

In [39]:

lam = lam_per_hour  # reuse the rate from Part 1
window_hours = 0.5
simpy_duration_hours = 2.0
num_windows = int(simpy_duration_hours / window_hours)
simpy_rng = RNG  # reuse the global RNG unless you prefer a fresh seed


In [40]:

def arrival_process(env, lam, rng, log):
    """Generate Poisson arrivals with rate ``lam`` and log event times."""
    while True:
        # TODO: draw an exponential inter-arrival time using `rng`
        inter_arrival = ...
        # TODO: yield a timeout so the environment advances by that amount
        yield ...
        # TODO: record the new time (`env.now`) in the log
        ...

def simulate_arrivals(lam, duration_hours, rng):
    """Run a SimPy environment and return arrival timestamps as a NumPy array."""
    env = simpy.Environment()
    timestamps = []
    # TODO: start the arrival process inside the environment
    ...
    # TODO: run the environment for the requested number of hours
    ...
    return np.array(timestamps)

# Run the (still incomplete) simulator once you fill in the TODOs above
arrival_log = simulate_arrivals(lam, simpy_duration_hours, simpy_rng)
arrival_log[:5]


array([0.165894  , 0.44090428, 0.92113621, 1.06361783, 1.10930455])

### Analyse the Arrival Log
- Compute inter-arrival samples and verify their mean/variance against the exponential theory ($1/\lambda$).
- Bucket arrivals into half-hour windows and compare empirical counts to the Part 1 metrics (mean, variance, $P(X\ge 1)$, etc.).
- Comment on whether the SimPy results respect the tolerances you set earlier.
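
As a cross-check that does not depend on SimPy, the same arrival stream can be built in plain NumPy by accumulating exponential gaps and bucketing the timestamps. The sketch below uses its own seed, horizon, and `demo_` names (all assumptions for illustration) and a much longer horizon than the notebook's two hours so that the window statistics are stable.

```python
import numpy as np

demo_rng = np.random.default_rng(7)   # hypothetical seed for this sketch
demo_lam = 4.0                        # arrivals per hour, as in Part 1
demo_window = 0.5                     # window length in hours
demo_n_windows = 2_000
demo_horizon = demo_n_windows * demo_window

# Draw about twice as many gaps as expected, then keep arrivals inside the horizon.
n_gaps = int(2 * demo_lam * demo_horizon)
gaps = demo_rng.exponential(1.0 / demo_lam, size=n_gaps)
arrival_times = np.cumsum(gaps)
assert arrival_times[-1] > demo_horizon, "increase n_gaps"
arrival_times = arrival_times[arrival_times <= demo_horizon]

# Exactly demo_n_windows bins covering [0, demo_horizon].
edges = np.linspace(0.0, demo_horizon, demo_n_windows + 1)
window_counts, _ = np.histogram(arrival_times, bins=edges)

print(window_counts.mean(), window_counts.var(ddof=1), (window_counts > 0).mean())
```

With only four half-hour windows in a two-hour SimPy run, the count statistics are very noisy, so a longer run is usually needed before the Part 1 tolerances become a meaningful test.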

In [None]:

# Analyse SimPy arrivals against Part 1 benchmarks
if arrival_log.size == 0:
    raise ValueError("Simulation produced no arrivals; increase simpy_duration_hours or check parameters.")

# Inter-arrival diagnostics (include the gap from time 0 to the first arrival)
inter_arrivals = np.diff(np.insert(arrival_log, 0, 0.0))
inter_summary = {
    'mean_inter_arrival': inter_arrivals.mean(),
    'var_inter_arrival': inter_arrivals.var(ddof=1) if inter_arrivals.size > 1 else np.nan,
    'theoretical_mean': 1 / lam,
    'theoretical_variance': 1 / (lam ** 2),
}

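# Window counts over half-hour buckets: exactly `num_windows` bins spanning
# [0, simpy_duration_hours], so no empty bin is added past the end of the run
bin_edges = np.linspace(0.0, num_windows * window_hours, num_windows + 1)
counts, _ = np.histogram(arrival_log, bins=bin_edges)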
count_summary = {
    'mean_count': counts.mean(),
    'var_count': counts.var(ddof=1) if counts.size > 1 else np.nan,
    'prob_ge_one': (counts > 0).mean(),
    'theoretical_mean': exact_mean,
    'theoretical_variance': exact_variance,
    'theoretical_prob_ge_one': exact_p_ge_one,
}

# Absolute errors relative to theory
count_errors = {
    'mean_error': abs(count_summary['mean_count'] - exact_mean),
    'variance_error': abs(count_summary['var_count'] - exact_variance) if not np.isnan(count_summary['var_count']) else np.nan,
    'prob_error': abs(count_summary['prob_ge_one'] - exact_p_ge_one),
}

# Compare against tolerances if they exist
tolerance_checks = {
    'mean_within_tolerance': None,
    'variance_within_tolerance': None,
    'prob_within_tolerance': None,
}
if 'mean_tolerance' in globals() and mean_tolerance is not None:
    tolerance_checks['mean_within_tolerance'] = count_errors['mean_error'] <= mean_tolerance
if 'variance_tolerance' in globals() and variance_tolerance is not None and not np.isnan(count_errors['variance_error']):
    tolerance_checks['variance_within_tolerance'] = count_errors['variance_error'] <= variance_tolerance
if 'prob_tolerance' in globals() and prob_tolerance is not None:
    tolerance_checks['prob_within_tolerance'] = count_errors['prob_error'] <= prob_tolerance

{
    'inter_arrival': inter_summary,
    'counts': count_summary,
    'errors': count_errors,
    'tolerances': tolerance_checks,
}


### Interpretation
The dictionary above reports how the SimPy arrival stream aligns with the analytic Poisson benchmarks:
- Inter-arrival mean/variance should be close to $1/\lambda$ and $1/\lambda^2$.
- Half-hour counts mirror the Part 1 simulation: mean $\approx \lambda_{0.5h}$, variance $\approx \lambda_{0.5h}$, and $P(X\ge 1)$ near $1-e^{-\lambda_{0.5h}}$.
- If the error terms sit within the tolerances you specified earlier, the event-driven simulation is consistent with the plain NumPy approach.
If they do not, increase the runtime (larger `simpy_duration_hours`) or revisit the tolerance justification.

### Stretch Goal
Replace the manual M/M/1 simulator with a SimPy version: inject the same arrivals into a SimPy `Resource`, add exponential services with rate `mu_rate`, and replicate the queueing metrics from Part 2. Compare the two implementations and explain any differences.

### Bonus: Full M/M/1 Queue in SimPy
Let’s replicate the plain-Python simulator using SimPy so you can see how a resource model mirrors the analytics:
1. Reuse `lambda_rate`, `mu_rate`, and the theoretical metrics from Part 2.
2. Create an arrival generator that yields exponential inter-arrival times and spawns a `customer` process for each arrival.
3. Model the single server with `simpy.Resource(capacity=1)`; each customer requests the server, draws an exponential service time, and releases the server when done.
4. Log arrival, service-start, and departure times to compute waiting-time statistics.
5. Compare the SimPy estimates to both the theoretical benchmarks and the plain-Python results.

In [None]:

import itertools

def simulate_mm1_simpy(lambda_rate, mu_rate, duration_hours, rng, max_customers=10_000):
    env = simpy.Environment()
    server = simpy.Resource(env, capacity=1)

    arrivals, service_starts, departures = [], [], []

    def customer(env, cust_id):
        arrival_time = env.now
        arrivals.append(arrival_time)
        # TODO: request the server resource (hint: use "with server.request() as req")
        ...
        # TODO: once you have the resource, log the service start time
        ...
        # TODO: draw an exponential service time and yield a timeout
        ...
        # TODO: append the departure time when service finishes
        ...

    def arrival_generator(env):
        for cust_id in itertools.count(1):
            # TODO: draw exponential inter-arrival times and advance the clock
            ...
            if env.now > duration_hours or cust_id > max_customers:
                break
            # TODO: launch a new customer process for each arrival
            ...

    # TODO: register the arrival generator with the environment and run it
    ...

    if not arrivals:
        raise ValueError("Simulation produced no customers; increase duration or arrival rate.")

    arrivals_arr = np.array(arrivals)
    service_arr = np.array(service_starts)
    departures_arr = np.array(departures)

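    # Customers still waiting or in service when the run ends have no
    # service-start or departure entry. Service is FIFO, so the first k
    # recorded starts/departures belong to the first k arrivals.
    waiting_times = service_arr - arrivals_arr[:service_arr.size]
    system_times = departures_arr - arrivals_arr[:departures_arr.size]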

    return {
        'arrivals': arrivals_arr,
        'service_starts': service_arr,
        'departures': departures_arr,
        'waiting_times': waiting_times,
        'system_times': system_times,
        'customers_served': departures_arr.size,
    }

simpy_rng_queue = np.random.default_rng(314)
mm1_simpy_logs = simulate_mm1_simpy(lambda_rate, mu_rate, sim_hours, simpy_rng_queue)
mm1_simpy_logs['customers_served']


In [None]:

# Compare SimPy queue metrics to theory (and to the plain-Python estimates)
waiting_times = mm1_simpy_logs['waiting_times']
system_times = mm1_simpy_logs['system_times']

# Sample means of the per-customer times
simpy_empirical = {
    'W_est': system_times.mean(),
    'W_q_est': waiting_times.mean(),
}

# Use Little's Law (L = lambda * W) to estimate L and L_q
def little_law(lambda_rate, mean_time):
    return lambda_rate * mean_time

simpy_empirical['L_est'] = little_law(lambda_rate, simpy_empirical['W_est'])
simpy_empirical['L_q_est'] = little_law(lambda_rate, simpy_empirical['W_q_est'])

simpy_errors = {
    'L_error': abs(simpy_empirical['L_est'] - theoretical['L']),
    'L_q_error': abs(simpy_empirical['L_q_est'] - theoretical['L_q']),
    'W_error': abs(simpy_empirical['W_est'] - theoretical['W']),
    'W_q_error': abs(simpy_empirical['W_q_est'] - theoretical['W_q']),
}

{'simpy_empirical': simpy_empirical, 'errors': simpy_errors}


The SimPy queue uses the same stochastic primitives but lets the environment manage event ordering for us. Check that:
- The empirical waiting times match the plain-Python simulation (subject to Monte Carlo noise).
- The Little’s Law estimates (`L` and `L_q`) are close to the theoretical values when utilisation $\rho < 1$.
- Any discrepancies can be explained by warm-up effects or short simulation horizons—try extending `sim_hours` to verify convergence.

### Wrap-Up
Use your findings to articulate how the three perspectives line up:
- Poisson theory (Part 1)
- Plain Python queueing (Part 2)
- SimPy event simulation (Part 3)

Note any mismatches and what diagnostic you'd run next to resolve them.