# Lecture 5 Evaluation Exercise — M/G/2 Case Study

This notebook is the **graded evaluation** for Lecture 5. Your task is to analyse a synthetic event log from a two‑server queue, reconstruct a plausible M/G/2 model, estimate its parameters, and assess its performance using the tools from Lectures 2–4.

You may work **individually or in groups of up to 3 students**.

---

### Submission

- When you have finished, save a copy of this notebook **with your group name in the filename**.
- The notebook must contain **all your code, figures, and answers**.
- Send the completed notebook **before 17h** by email to:  
  `sebastian.muller@univ-amu.fr`.

---

### Executability requirements

Your notebook must run **from top to bottom** in a fresh environment:

- Import (and, if necessary, install) **all required packages at the beginning** of the notebook.
- Before submitting, **restart the kernel and run all cells** to ensure the notebook executes without errors.

---

**Group name:**  
**Members:**


## Learning Objectives

- Work from raw event logs to a plausible $M/G/2$ queue model.
- Estimate arrival and service parameters (including uncertainty) from data.
- Compare empirical performance (mean waiting time, mean number in queue $L_q$, utilisation) to model-based predictions.
- Communicate modelling assumptions, diagnostics, and limitations.


## Scenario

A small ML-backed support system routes **complex tickets** to a pool of two identical agents.
All such tickets in the observation window enter this two-server system and are recorded in the log.

- Arrivals are approximately time-homogeneous during the observation window.
- Each complex ticket requires a **fixed overhead** (reading context, loading tools) plus an additional random processing time.
- The system can have at most two tickets in service simultaneously; extra tickets wait in a single FCFS queue.

For this evaluation, we treat the system as an $M/G/2$ queue: Poisson arrivals with unknown rate $\lambda$, i.i.d. service times with unknown distribution $G$, and two identical servers.
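
To make the model concrete, the following is a minimal simulation sketch of an FCFS two-server queue with Poisson arrivals. The function name `simulate_mg2`, the shifted-exponential service law, and all numerical values are illustrative assumptions; they are not the parameters behind the case-study log.

```python
import numpy as np

def simulate_mg2(lam, sample_service, n_jobs, rng):
    """Simulate n_jobs tickets through an FCFS two-server queue.

    lam            : Poisson arrival rate
    sample_service : callable (rng, n) -> array of n i.i.d. service times
    Returns arrays (arrival, start, completion).
    """
    arrival = np.cumsum(rng.exponential(1.0 / lam, size=n_jobs))
    service = np.asarray(sample_service(rng, n_jobs), dtype=float)
    free_at = np.zeros(2)            # time at which each agent next becomes idle
    start = np.empty(n_jobs)
    for i in range(n_jobs):
        k = int(np.argmin(free_at))  # FCFS: next ticket takes the first agent to free up
        start[i] = max(arrival[i], free_at[k])
        free_at[k] = start[i] + service[i]
    return arrival, start, start + service

# Illustrative parameters only (assumed, not estimated from the case-study log).
rng = np.random.default_rng(0)
shifted_exp = lambda rng, n: 0.5 + rng.exponential(1.0, size=n)  # overhead 0.5 + Exp(1)
arrival, start, completion = simulate_mg2(1.0, shifted_exp, 10_000, rng)
wait = start - arrival
print(f"simulated mean wait: {wait.mean():.3f}")
```

Running several independent replications with different seeds gives a simple way to attach uncertainty to a simulated mean wait.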


## Data Description

You are given a single contiguous observation window (no gaps) as a CSV file:

- File: `data/lecture5_mg2_case_study.csv`
- One row per completed ticket.
- Columns:
  - `arrival_time` (float): time the ticket enters the two-server system.
  - `queue_len_at_arrival` (int): number of tickets already in system (in service + waiting) just before arrival.
  - `service_time` (float): total service time once an agent starts working on the ticket.
  - `start_service_time` (float): time service begins.
  - `completion_time` (float): time service ends.
  - `wait_time` (float): $W_q = \text{start} - \text{arrival}$.
  - `system_time` (float): $W = \text{completion} - \text{arrival}$.

The log includes all jobs that **arrive** within the window; some jobs may complete after the final arrival time.
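
The column definitions above imply identities that can be checked mechanically. The sketch below uses a tiny hand-made log with the documented column names (the numbers are invented for illustration) and a hypothetical helper `check_log`. The identity `service_time = completion_time - start_service_time` is an assumption read off the column descriptions, not something the log is stated to guarantee.

```python
import numpy as np
import pandas as pd

# Toy three-ticket log using the documented column names (values invented for illustration).
toy = pd.DataFrame({
    "arrival_time":       [0.0, 0.4, 0.9],
    "start_service_time": [0.0, 0.4, 1.5],
    "service_time":       [1.5, 2.0, 1.0],
})
toy["completion_time"] = toy["start_service_time"] + toy["service_time"]
toy["wait_time"] = toy["start_service_time"] - toy["arrival_time"]
toy["system_time"] = toy["completion_time"] - toy["arrival_time"]

def check_log(df, tol=1e-9):
    """Return named boolean checks implied by the column definitions."""
    wait = df["start_service_time"] - df["arrival_time"]
    sojourn = df["completion_time"] - df["arrival_time"]
    busy = df["completion_time"] - df["start_service_time"]
    return {
        "wait_time = start - arrival": bool(np.allclose(df["wait_time"], wait, atol=tol)),
        "system_time = completion - arrival": bool(np.allclose(df["system_time"], sojourn, atol=tol)),
        # Assumed identity: service occupies the whole span from start to completion.
        "service_time = completion - start": bool(np.allclose(df["service_time"], busy, atol=tol)),
        "wait_time >= 0": bool((df["wait_time"] >= -tol).all()),
    }

checks = check_log(toy)
for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'FAILED'}")
```

The same helper can be applied to the loaded case-study DataFrame once the CSV has been read.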


## Your Tasks

Work in this notebook only; do not regenerate or modify the CSV.

1. **Data hygiene and basic checks**
   - Verify non-negativity (`wait_time >= 0`, `system_time >= service_time`).
   - Check temporal consistency (`start_service_time >= arrival_time`, `completion_time >= start_service_time`).
   - Inspect `queue_len_at_arrival` to confirm the system behaves like two parallel servers.
2. **Arrival process**
   - Compute inter-arrival times and explore whether a **homogeneous Poisson process** is reasonable (histograms, ECDF, log-tail).
   - Estimate the arrival rate $\hat\lambda$ and provide a 95\% CI using the methods from Lecture 4.
3. **Service-time distribution**
   - Explore the empirical distribution of `service_time` (histograms, ECDF, log-survival).
   - Look for evidence of a **positive lower bound** (fixed overhead) and an approximately **memoryless tail**.
   - Propose a simple parametric model (for example, constant offset + exponential) and estimate its parameters; report point estimates and a 95\% CI for the main rate parameter.
4. **Queue performance and utilisation**
   - Using your fitted model and the fact that there are two identical servers, estimate utilisation $\hat\rho = \hat\lambda \hat m_1 / 2$, where $\hat m_1$ is the estimated mean service time.
   - Compute the empirical mean waiting time $\overline{W}_q$ from the log and compare it to a model-based prediction obtained either via simulation or an approximation of $M/G/2$.
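   - Estimate the mean number of jobs waiting in queue $L_q$ from the data, e.g. from `queue_len_at_arrival` (under Poisson arrivals, an arriving ticket sees the steady-state distribution of the system) and/or from Little's Law $L_q \approx \hat\lambda\,\overline{W}_q$, and comment on consistency. Note that `queue_len_at_arrival` counts tickets in service as well as waiting, so the number waiting is $\max(n - 2, 0)$ for an observed value $n$.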
   - Quantify uncertainty for at least one performance metric (e.g., bootstrap CIs over regenerative cycles or over independent simulation replications).
5. **Short write-up** (in markdown cells)
   - Clearly state your chosen model (arrival process, service family) and justify it with plots/statistics.
   - Report parameter estimates and CIs.
   - Discuss how well your model explains the observed waiting-time distribution and what its limitations are.
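
As a rough illustration of tasks 2 to 4, the sketch below estimates an exponential rate with a normal-approximation 95% CI, fits a constant-offset-plus-exponential service model, and combines them into $\hat\rho$. The helper names, the synthetic data, and the choice of CI are assumptions made for this example; the interval taught in Lecture 4 may differ, and the numbers have nothing to do with the case-study log.

```python
import numpy as np

def exp_rate_with_ci(x, z=1.96):
    """Exponential-rate MLE 1/mean(x) with a normal-approximation 95% CI."""
    x = np.asarray(x, dtype=float)
    rate = 1.0 / x.mean()
    half_width = z * rate / np.sqrt(x.size)  # asymptotic s.e. of the MLE is rate / sqrt(n)
    return rate, (rate - half_width, rate + half_width)

def fit_shifted_exponential(s):
    """Fit S = offset + Exp(rate): offset = min(S), rate = 1 / (mean(S) - min(S))."""
    s = np.asarray(s, dtype=float)
    offset = s.min()
    rate, ci = exp_rate_with_ci(s - offset)
    return offset, rate, ci

# Synthetic stand-in data (assumed parameters, NOT the case-study log).
rng = np.random.default_rng(1)
inter_arrivals = rng.exponential(1 / 0.8, size=5000)
service = 0.5 + rng.exponential(1 / 1.25, size=5000)

lam_hat, lam_ci = exp_rate_with_ci(inter_arrivals)
offset_hat, mu_hat, mu_ci = fit_shifted_exponential(service)
m1_hat = offset_hat + 1.0 / mu_hat   # mean service time under the fitted model
rho_hat = lam_hat * m1_hat / 2.0     # utilisation with two identical servers

print(f"lambda_hat = {lam_hat:.3f}, CI = ({lam_ci[0]:.3f}, {lam_ci[1]:.3f})")
print(f"offset_hat = {offset_hat:.3f}, mu_hat = {mu_hat:.3f}, CI = ({mu_ci[0]:.3f}, {mu_ci[1]:.3f})")
print(f"rho_hat = {rho_hat:.3f}")
```

Using the sample minimum as the offset makes the fitted mean equal the sample mean, but the minimum is biased slightly upward as an estimate of the true overhead; that is worth mentioning as a limitation.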


### Checklist

- [ ] Basic sanity checks on the raw log.
- [ ] Estimated $\hat\lambda$ with CI and Poisson diagnostics.
- [ ] Service-time model chosen, fitted, and validated (at least one goodness-of-fit diagnostic).
- [ ] Estimated utilisation $\hat\rho$ with interpretation (stable or near-critical?).
- [ ] Estimated mean number in queue $L_q$ and related it to $\hat\lambda$ and $\overline{W}_q$.
- [ ] Empirical vs model-based mean waiting time with some notion of uncertainty.
- [ ] Clear, concise explanation of assumptions and limitations.
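
For the model-based mean waiting time, one common two-moment approximation scales the exact M/M/2 result by $(1 + c_s^2)/2$, where $c_s^2$ is the squared coefficient of variation of the service time. The sketch below implements that idea; the function names and parameter values are assumptions for illustration, and this is only one possible approximation, not necessarily the one covered in the lectures.

```python
def mm2_mean_wait(lam, m1):
    """Exact mean wait in queue for M/M/2 with arrival rate lam and mean service time m1."""
    rho = lam * m1 / 2.0
    if not 0.0 <= rho < 1.0:
        raise ValueError("need 0 <= rho < 1 for a stable queue")
    return m1 * rho**2 / (1.0 - rho**2)

def mg2_mean_wait_approx(lam, m1, scv):
    """Two-moment approximation: scale the M/M/2 wait by (1 + scv) / 2."""
    return 0.5 * (1.0 + scv) * mm2_mean_wait(lam, m1)

# Illustrative values (assumed): arrival rate 0.8, service = 0.5 + Exp(rate 1.25).
lam, offset, mu = 0.8, 0.5, 1.25
m1 = offset + 1.0 / mu             # mean service time
scv = (1.0 / mu**2) / m1**2        # variance / mean^2; the offset adds no variance
wq = mg2_mean_wait_approx(lam, m1, scv)
lq = lam * wq                      # Little's Law for the waiting line
print(f"approx W_q = {wq:.3f}, approx L_q = {lq:.3f}")
```

With `scv = 1` the approximation reduces to the exact M/M/2 value, which gives a quick sanity check before comparing against the empirical $\overline{W}_q$ or a simulation.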


In [None]:
import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# This style name exists in matplotlib >= 3.6; keep the default style otherwise.
if "seaborn-v0_8-darkgrid" in plt.style.available:
    plt.style.use("seaborn-v0_8-darkgrid")

# Locate the case-study CSV among the candidate paths.
CANDIDATES = ["data/lecture5_mg2_case_study.csv"]
DATA_PATH = next((p for p in CANDIDATES if os.path.exists(p)), None)
assert DATA_PATH, "Missing case study CSV (expected in data/lecture5_mg2_case_study.csv)"

df = pd.read_csv(DATA_PATH)
print(f"Loaded dataset: {DATA_PATH} with {len(df)} rows.")
df.head()
