# Weeks 12-13: Queuing Models

**Objective:** Understand the components of a queuing system, learn the notation for describing queues, analyze the fundamental M/M/1 model, and simulate a queue to verify its theoretical properties.

## Step 1: Build Intuition

We've all experienced queues: waiting in line at the grocery store, a coffee shop, or a bank; being on hold for customer service; or even waiting for a web page to load. Queuing theory is the mathematical study of waiting lines.

It helps us answer practical business questions:
- If we add another barista, how much will the average customer wait time decrease?
- What is the probability that a customer arriving at the bank will find more than 5 people already in line?
- How many servers do we need to ensure that 95% of customers are served within 3 minutes?

A queuing model is a stochastic process that describes the flow of customers through a service system.

## Step 2: Understand the Core Idea

A queuing system is characterized by three main components:

1.  **The Arrival Process:** How customers arrive. Are arrivals random or scheduled? What is the average rate?
2.  **The Service Process:** How long it takes to serve a customer. Is the service time constant or random?
3.  **The System Configuration:** How many servers are there? Is there a limit to how many people can wait in the queue?

### Kendall's Notation
We use a standard shorthand called **Kendall's Notation (A/B/s)** to describe a queue:
- **A:** The distribution of inter-arrival times.
- **B:** The distribution of service times.
- **s:** The number of parallel servers.

Common codes for A and B include:
- **M:** Markovian or Memoryless (meaning an Exponential distribution).
- **D:** Deterministic (constant time).
- **G:** General (any distribution).

The most fundamental queue is the **M/M/1 queue**: Poisson arrivals (Exponential inter-arrival times), Exponential service times, and 1 server.
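
The equivalence between "Poisson arrivals" and "Exponential inter-arrival times" can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `count_arrivals_per_hour`, the seed, and the rate of 20 per hour are illustrative choices. If the gaps between arrivals are Exponential with mean \(1/\lambda\), the number of arrivals per hour should have mean and variance both close to \(\lambda\), as a Poisson count does.

```python
import numpy as np

def count_arrivals_per_hour(lambda_rate, n_hours, rng):
    """Count arrivals in each one-hour window given Exponential inter-arrival times."""
    # Draw more gaps than needed so the arrival times comfortably cover n_hours.
    n_draws = int(lambda_rate * n_hours * 1.5) + 100
    gaps = rng.exponential(1.0 / lambda_rate, size=n_draws)
    arrival_times = np.cumsum(gaps)
    arrival_times = arrival_times[arrival_times < n_hours]
    # Bucket each arrival into the hour it falls in.
    return np.bincount(arrival_times.astype(int), minlength=n_hours)

rng = np.random.default_rng(0)  # hypothetical seed, for reproducibility
counts = count_arrivals_per_hour(lambda_rate=20.0, n_hours=10_000, rng=rng)
print(f"Mean arrivals per hour:     {counts.mean():.2f}")
print(f"Variance of arrivals/hour:  {counts.var():.2f}")
```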

## Step 3: Learn the Definitions and Formulas (for M/M/1)

For an M/M/1 queue, we define:
- **\(\lambda\):** The mean arrival rate (e.g., customers per hour). Arrivals follow a Poisson process, so inter-arrival times are Exponential with mean \(1/\lambda\).
- **\(\mu\):** The mean service rate (e.g., customers per hour one server can handle). This means service times are Exponential with mean \(1/\mu\).

**Traffic Intensity (\(\rho\))**
In a stable queue, this is the long-run proportion of time the server is busy (its utilization). Every performance metric below depends on it.
$$ \rho = \frac{\lambda}{\mu} $$
For the queue to be **stable** (i.e., for its length not to grow without bound), we must have \(\rho < 1\), which means \(\lambda < \mu\). The service rate must be strictly greater than the arrival rate.
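
The stability condition is worth checking in code before any formula is applied, since the M/M/1 formulas give meaningless (negative or infinite) values when \(\rho \ge 1\). A minimal sketch, where `traffic_intensity` is a hypothetical helper name:

```python
def traffic_intensity(lambda_rate, mu_rate):
    """Return rho = lambda / mu, raising an error if the M/M/1 queue is unstable."""
    if lambda_rate <= 0 or mu_rate <= 0:
        raise ValueError("Arrival and service rates must be positive.")
    rho = lambda_rate / mu_rate
    if rho >= 1:
        raise ValueError(
            f"Unstable queue: rho = {rho:.3f} >= 1 (arrivals outpace service)."
        )
    return rho

print(f"rho = {traffic_intensity(20.0, 30.0):.4f}")
```

Raising an error, rather than returning a value, is one possible design choice; it prevents downstream formulas from silently producing nonsense.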

--- 

**Key Performance Metrics for a Stable M/M/1 Queue:**

1.  **\(L\):** Average number of customers in the **system** (waiting + being served).
    $$ L = \frac{\rho}{1 - \rho} = \frac{\lambda}{\mu - \lambda} $$
2.  **\(L_q\):** Average number of customers in the **queue** (waiting).
    $$ L_q = \frac{\rho^2}{1 - \rho} = \frac{\lambda^2}{\mu(\mu - \lambda)} $$
3.  **\(W\):** Average time a customer spends in the **system** (waiting + service).
    $$ W = \frac{1}{\mu - \lambda} $$
4.  **\(W_q\):** Average time a customer spends in the **queue** (waiting).
    $$ W_q = \frac{\rho}{\mu - \lambda} = \frac{\lambda}{\mu(\mu - \lambda)} $$
5.  **\(P_n\):** Probability of having exactly \(n\) customers in the system.
    $$ P_n = (1 - \rho)\rho^n $$
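
Because \(P_n\) is a geometric distribution, tail probabilities have a simple closed form: \(P(N > n) = \rho^{n+1}\). This answers questions like the bank example in Step 1. The sketch below uses hypothetical function names and illustrative rates of \(\lambda = 20\) and \(\mu = 30\) per hour, and it counts customers in the whole system (waiting plus in service).

```python
def prob_n_in_system(rho, n):
    """P_n = (1 - rho) * rho**n for a stable M/M/1 queue."""
    return (1 - rho) * rho**n

def prob_more_than_n(rho, n):
    """P(N > n): the sum of P_k over k > n, which collapses to rho**(n + 1)."""
    return rho ** (n + 1)

rho = 20.0 / 30.0  # illustrative rates: lambda = 20, mu = 30 per hour
print(f"P(system is empty)       = {prob_n_in_system(rho, 0):.4f}")
print(f"P(more than 5 in system) = {prob_more_than_n(rho, 5):.4f}")

# Cross-check the closed form against a direct sum of P_0 .. P_5.
direct = 1 - sum(prob_n_in_system(rho, k) for k in range(6))
print(f"Direct sum check         = {direct:.4f}")
```

With these rates the tail probability is \((2/3)^6 = 64/729\), or about 8.8%.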

## Step 4: Apply and Practice

**Scenario:** A small coffee shop has one barista. Customers arrive at a rate of \(\lambda = 20\) per hour. The barista can serve customers at a rate of \(\mu = 30\) per hour.

### Part A: Theoretical Calculation
Let's use the formulas to analyze this M/M/1 queue.
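
Worked by hand first, as a reference for the code output. This is plain substitution of \(\lambda = 20\) and \(\mu = 30\) into the Step 3 formulas, with times converted from hours to minutes:

$$ \rho = \frac{20}{30} = \frac{2}{3}, \qquad L = \frac{2/3}{1 - 2/3} = 2, \qquad L_q = \frac{(2/3)^2}{1 - 2/3} = \frac{4}{3} $$

$$ W = \frac{1}{30 - 20} = 0.1 \text{ hours} = 6 \text{ minutes}, \qquad W_q = \frac{2/3}{30 - 20} = \frac{1}{15} \text{ hours} = 4 \text{ minutes} $$

Two quick consistency checks: \(W - W_q = 2\) minutes, which is the mean service time \(1/\mu\), and \(\lambda W = 20 \times 0.1 = 2 = L\), the relationship known as Little's Law.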

In [None]:
import numpy as np
import matplotlib.pyplot as plt

plt.style.use('seaborn-v0_8-whitegrid')

# --- Parameters ---
lambda_rate = 20.0  # Arrival rate (customers/hour)
mu_rate = 30.0      # Service rate (customers/hour)

# --- Calculations ---
rho = lambda_rate / mu_rate

L = rho / (1 - rho)
Lq = rho**2 / (1 - rho)
W = 1 / (mu_rate - lambda_rate)  # in hours
Wq = rho / (mu_rate - lambda_rate) # in hours

print("--- M/M/1 Queue Analysis ---")
print(f"Arrival Rate (lambda): {lambda_rate}/hour")
print(f"Service Rate (mu): {mu_rate}/hour")
print(f"Traffic Intensity (rho): {rho:.4f}")
print("\n--- Performance Metrics ---")
print(f"Avg # of customers in system (L): {L:.4f}")
print(f"Avg # of customers in queue (Lq): {Lq:.4f}")
print(f"Avg time in system (W): {W * 60:.4f} minutes")
print(f"Avg time in queue (Wq): {Wq * 60:.4f} minutes")

### Part B: Verifying with a Discrete-Event Simulation

The formulas are powerful, but building a simulation gives a deeper understanding of how the system dynamics lead to these results. We will simulate a large number of customers passing through the coffee shop, track each one's waiting and service time, and check whether the simulated averages match the theory.

In [None]:
def simulate_mm1_queue(lambda_rate, mu_rate, max_customers):
    """Simulates an M/M/1 queue and returns performance metrics."""
    # State variables
    current_time = 0.0
    server_busy_until = 0.0
    
    # Data collection
    total_wait_time = 0.0
    total_system_time = 0.0
    
    for _ in range(max_customers):
        # 1. Generate next arrival
        # Inter-arrival time is Exponential(lambda)
        inter_arrival_time = np.random.exponential(1.0 / lambda_rate)
        current_time += inter_arrival_time
        arrival_time = current_time
        
        # 2. Calculate wait time
        # The customer can start service either now or when the server is free
        start_service_time = max(arrival_time, server_busy_until)
        wait_time = start_service_time - arrival_time
        total_wait_time += wait_time
        
        # 3. Generate service time and update server status
        # Service time is Exponential(mu)
        service_time = np.random.exponential(1.0 / mu_rate)
        departure_time = start_service_time + service_time
        server_busy_until = departure_time
        
        # 4. Calculate system time
        system_time = departure_time - arrival_time
        total_system_time += system_time
        
    # Calculate averages
    avg_Wq = total_wait_time / max_customers
    avg_W = total_system_time / max_customers
    
    # Use Little's Law (L = lambda * W, Lq = lambda * Wq) to convert average times into average counts
    avg_L = lambda_rate * avg_W
    avg_Lq = lambda_rate * avg_Wq
    
    return avg_L, avg_Lq, avg_W, avg_Wq

# --- Simulation Parameters ---
N_CUSTOMERS = 100000 # Simulate a large number of customers for convergence

# Run the simulation
sim_L, sim_Lq, sim_W, sim_Wq = simulate_mm1_queue(lambda_rate, mu_rate, N_CUSTOMERS)

print("\n--- Simulation Results ---")
print(f"Simulated Avg # in system (L): {sim_L:.4f} (Theoretical: {L:.4f})")
print(f"Simulated Avg # in queue (Lq): {sim_Lq:.4f} (Theoretical: {Lq:.4f})")
print(f"Simulated Avg time in system (W): {sim_W * 60:.4f} minutes (Theoretical: {W * 60:.4f})")
print(f"Simulated Avg time in queue (Wq): {sim_Wq * 60:.4f} minutes (Theoretical: {Wq * 60:.4f})")

**Interpretation:**

With 100,000 customers, the simulated averages should land close to the theoretical values predicted by the M/M/1 formulas, which supports both the theory and the simulation model. Note that only \(W\) and \(W_q\) are measured directly; \(L\) and \(L_q\) are computed from them through Little's Law, so they are not independent checks. The run shows how random arrivals and random service times still produce predictable long-run averages.
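
The simulation above obtains \(L\) from the nominal \(\lambda\) through Little's Law. As a separate check, \(L\) can be estimated directly as the time average of \(N(t)\), the number of customers in the system. The sketch below is one way to do that, assuming NumPy; the function name `time_average_L` and the seed are hypothetical. It merges arrivals and departures into a single event list and integrates \(N(t)\) over time.

```python
import numpy as np

def time_average_L(lambda_rate, mu_rate, n_customers, rng):
    """Estimate L as the time average of N(t), the number in the system."""
    arrivals = np.cumsum(rng.exponential(1.0 / lambda_rate, size=n_customers))
    services = rng.exponential(1.0 / mu_rate, size=n_customers)

    # Single server, first-come-first-served: same recursion as the main simulation.
    departures = np.empty(n_customers)
    server_free_at = 0.0
    for i in range(n_customers):
        start = max(arrivals[i], server_free_at)
        server_free_at = start + services[i]
        departures[i] = server_free_at

    # Merge arrivals (+1) and departures (-1) into one time-ordered event list.
    times = np.concatenate([arrivals, departures])
    steps = np.concatenate([np.ones(n_customers), -np.ones(n_customers)])
    order = np.argsort(times)
    times, steps = times[order], steps[order]

    n_in_system = np.cumsum(steps)   # N(t) just after each event
    durations = np.diff(times)       # time until the next event
    area = np.sum(n_in_system[:-1] * durations)
    # The system is empty before the first arrival, so the horizon starts at 0.
    return area / times[-1]

rng = np.random.default_rng(42)  # hypothetical seed, for reproducibility
L_hat = time_average_L(20.0, 30.0, 100_000, rng)
print(f"Time-average number in system: {L_hat:.4f} (theory: 2.0000)")
```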

### Visualizing the Impact of Traffic Intensity (ρ)

Let's see how the average wait time explodes as the arrival rate (\(\lambda\)) gets closer to the service rate (\(\mu\)), causing \(\rho\) to approach 1.

In [None]:
rho_values = np.linspace(0.01, 0.99, 100)
mu_fixed = 30.0

# W = 1 / (mu - lambda) = 1 / (mu - rho*mu) = 1 / (mu * (1-rho))
W_values = 1 / (mu_fixed * (1 - rho_values))

plt.figure(figsize=(12, 6))
plt.plot(rho_values, W_values * 60) # Convert to minutes
# Raw strings keep '\r' in '\rho' from being read as a carriage return
plt.title(r'Average Time in System (W) vs. Traffic Intensity ($\rho$)')
plt.xlabel(r'Traffic Intensity ($\rho = \lambda/\mu$)')
plt.ylabel('Average Time in System (minutes)')
plt.grid(True)
plt.show()

This plot is crucial for any operations manager. It shows that as the system gets closer to full utilization (\(\rho \to 1\)), waiting times do not increase linearly: they grow in proportion to \(1/(1 - \rho)\) and become unbounded as \(\rho\) approaches 1. Moving from 80% to 95% utilization multiplies the average time in system by four.
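
To put numbers on the curve, the sketch below tabulates \(L\) and \(W\) at a few utilization levels, using the coffee shop's service rate of \(\mu = 30\) per hour. The helper name `mm1_congestion` and the chosen levels are illustrative.

```python
def mm1_congestion(rho, mu_rate):
    """Return (L, W in minutes) for a stable M/M/1 queue at utilization rho."""
    if not 0 <= rho < 1:
        raise ValueError("rho must satisfy 0 <= rho < 1 for a stable queue.")
    avg_in_system = rho / (1 - rho)
    avg_minutes = 60.0 / (mu_rate * (1 - rho))
    return avg_in_system, avg_minutes

print(f"{'rho':>6} {'L':>8} {'W (min)':>10}")
for rho_level in (0.50, 0.80, 0.90, 0.95, 0.99):
    avg_in_system, avg_minutes = mm1_congestion(rho_level, mu_rate=30.0)
    print(f"{rho_level:>6.2f} {avg_in_system:>8.2f} {avg_minutes:>10.2f}")
```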

## Summary & Next Steps

In this notebook, we've dived into the world of queuing theory:
1.  Queues are described by their **arrival process**, **service process**, and **number of servers** (A/B/s).
2.  The **M/M/1 queue** is the most fundamental model, with Poisson arrivals and Exponential service times.
3.  The system's behavior is governed by the **traffic intensity \(\rho = \lambda/\mu\)**, which must be less than 1 for stability.
4.  We stated the key performance metrics (L, Lq, W, Wq), computed them for a worked scenario, and checked them against a simulation.

In **Week 14**, we will look at **Birth-and-Death Processes**, which provide a general framework for modeling systems where the state (e.g., number of customers) only increases or decreases by one at a time. The M/M/1 queue is a classic example of a birth-and-death process.