# Chapter 2. Poisson Processes

__Dave's Notes__

* Example 2.1: I adjusted the problem set-up as the rates for a manicure and haircut were initially unclear to me.
* Example 2.3: This is a motivating example of compound Poisson processes in the book, but I provided a solution utilizing theorem 2.10, parts (i) and (iii).
* Example 2.4: This is a motivating example of compound Poisson processes in the book, but I provided a solution utilizing theorem 2.10, parts (i) and (ii).
* Example 2.9: This is a motivating example of nonhomogeneous Poisson thinning in the book, but I provided a solution utilizing theorem 2.12.
* Example 2.10: I corrected the typo in the first word of the problem set-up ("Mean" $\rightarrow$ "Men"). Additionally, I altered the second-to-last sentence in the problem set-up to state that women stay in the store for a mean of $\frac{1}{4}$ hour, since Dr. Durrett's solution assumes that the mean time in the store is $\frac{1}{2}$ and $\frac{1}{4}$ hour for men and women, respectively. The mean time in the store for men is explicitly stated in the problem set-up, but I could not figure out a logical way to deduce that the mean time in the store for women is $\frac{1}{4}$ hour, since $X \sim \mathcal{U}_{[0, 60]}$ (in minutes) $\implies \mathbb{E}[X] = 0.5 \cdot (0 + 60) = 30$ minutes $= \frac{1}{2}$ hour.
* Example 2.12: I think Dr. Durrett either made a typo or an error when computing the numerator in the textbook solution.
* Example 2.14 Part (d): I am either missing something critical or Dr. Durrett made an error when computing the variance for the number of customers arriving at Bojangles. Specifically, I think Dr. Durrett's solution for the variance actually corresponds to the expectation of the random variable squared, which I believe must be true for the following reasons.
  * All of the terms that correspond to the number of passengers in a truck or car are squared.
  * Since all trucks have exactly one passenger, the variance for the number of passengers per truck must be zero.

In [1]:
# Import packages.
import numpy as np
from scipy.integrate import quad
from scipy.special import comb
from scipy.stats import binom
from scipy.stats import poisson

## Example 2.1 (Exponential Races) 

Anne and Betty enter a beauty parlor simultaneously, Anne to get a manicure and Betty to get a haircut. Suppose the time for a manicure and haircut is exponentially distributed with mean 20 and 30 minutes, respectively.

__(a)__ What is the probability Anne gets done first?

__(b)__ What is the expected amount of time until Anne and Betty are both done?

In [2]:
# Define the competing rates for Anne and Betty.
lambda_A = 1 / 20  # Anne's rate in minutes^{-1}.
lambda_B = 1 / 30  # Betty's rate in minutes^{-1}.


### Part (a): Probability Anne finishes first. ###

# Equation (2.8).
prob_A_first = lambda_A / (lambda_A + lambda_B)
# Print the result.
print(f"Part (a): Probability Anne finishes first = {prob_A_first}")


### Part (b): Expected total time until both are done. ###

# Mean time until first service completion.
mean_first_completion = 1 / (lambda_A + lambda_B)  # In minutes.
# Expected remaining time given Anne or Betty finishes first.
# First term: probability Anne finishes first times expected waiting time for Betty to finish her haircut.
# Second term: probability Betty finishes first times expected waiting time for Anne to finish her manicure.
expected_remaining_time = (prob_A_first * (1 / lambda_B)) + (
    (1 - prob_A_first) * (1 / lambda_A)
)
# Total expected time for Anne and Betty to finish their services.
total_expected_time = mean_first_completion + expected_remaining_time
# Print the result.
print(
    f"Part (b): Expected total time until both are done = {total_expected_time} minutes"
)

Part (a): Probability Anne finishes first = 0.6
Part (b): Expected total time until both are done = 38.0 minutes
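As a cross-check on part (b), the expected time until both are done is the expected maximum of two independent exponentials, which can also be written as $\mathbb{E}[\max(A, B)] = \mathbb{E}[A] + \mathbb{E}[B] - \mathbb{E}[\min(A, B)]$. The sketch below is only a sanity check under that identity; the names `mean_min` and `mean_max` are illustrative.

```python
# Cross-check for Example 2.1 using E[max(A, B)] = E[A] + E[B] - E[min(A, B)].
lambda_A = 1 / 20  # Anne's rate in minutes^{-1}.
lambda_B = 1 / 30  # Betty's rate in minutes^{-1}.
# The minimum of two independent exponentials is exponential with the summed rate.
mean_min = 1 / (lambda_A + lambda_B)
# Expected time until both services are complete.
mean_max = (1 / lambda_A) + (1 / lambda_B) - mean_min
print(f"E[max(A, B)] = {mean_max:.1f} minutes")
```

This should agree with the 38 minutes obtained by conditioning on who finishes first.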


## Example 2.2 (Exponential Races)

Ron, Sue, and Ted arrive at the beginning of a professor's office hours. The amount of time they will stay is exponentially distributed with means of $1, \frac{1}{2}$, and $\frac{1}{3}$ hour.

__(a)__ For each student find the probability they are the last student left. 

__(b)__ What is the expected time until all three students are gone?

In [3]:
# Define the departure rates (per hour), i.e., the reciprocals of the mean stay times.
lambda_R = 1  # Ron's rate (1/hr).
lambda_S = 2  # Sue's rate (2/hr).
lambda_T = 3  # Ted's rate (3/hr).


### Part (a): Probability that each student is last. ###

# Total rate when all three are present.
total_rate = lambda_R + lambda_S + lambda_T
# Probabilities of each student leaving first.
prob_R_first = lambda_R / total_rate
prob_S_first = lambda_S / total_rate
prob_T_first = lambda_T / total_rate
# Conditional probabilities of each student being the last to leave from each pair of students.
prob_R_last_from_RS = lambda_S / (lambda_R + lambda_S)
prob_R_last_from_RT = lambda_T / (lambda_R + lambda_T)
prob_S_last_from_RS = lambda_R / (lambda_R + lambda_S)
prob_S_last_from_ST = lambda_T / (lambda_S + lambda_T)
prob_T_last_from_RT = lambda_R / (lambda_R + lambda_T)
prob_T_last_from_ST = lambda_S / (lambda_S + lambda_T)
# Probability of each student being the last to leave.
prob_R_last = prob_S_first * prob_R_last_from_RT + prob_T_first * prob_R_last_from_RS
prob_S_last = prob_R_first * prob_S_last_from_ST + prob_T_first * prob_S_last_from_RS
prob_T_last = prob_R_first * prob_T_last_from_ST + prob_S_first * prob_T_last_from_RT
# Print the results.
print(f"Part (a): Probability Ron (R) is last = {prob_R_last:.3f}")
print(f"Part (a): Probability Sue (S) is last = {prob_S_last:.3f}")
print(f"Part (a): Probability Ted (T) is last = {prob_T_last:.3f}")


### Part (b): Expected time until all three students are gone. ###

# Expected waiting times until a student leaves.
mean_time_first_leave = 1 / total_rate
mean_time_RS_leave = 1 / (lambda_R + lambda_S)
mean_time_RT_leave = 1 / (lambda_R + lambda_T)
mean_time_ST_leave = 1 / (lambda_S + lambda_T)
# Expected time until all three students are gone.
expected_total_time = (
    mean_time_first_leave
    + prob_T_first * mean_time_RS_leave
    + prob_S_first * mean_time_RT_leave
    + prob_R_first * mean_time_ST_leave
    + prob_R_last * (1 / lambda_R)
    + prob_S_last * (1 / lambda_S)
    + prob_T_last * (1 / lambda_T)
)
# Print the result.
print(
    f"Part (b): Expected total time until all students are gone: {expected_total_time:.4f} hours"
)

Part (a): Probability Ron (R) is last = 0.583
Part (a): Probability Sue (S) is last = 0.267
Part (a): Probability Ted (T) is last = 0.150
Part (b): Expected total time until all students are gone: 1.2167 hours
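The same inclusion-exclusion idea gives an independent check of part (b): the expected maximum of independent exponentials is an alternating sum of expected minima over all non-empty subsets, and each minimum is exponential with the summed rate. A minimal sketch, assuming the three stay times are independent (the dictionary `rates` is an illustrative name):

```python
from itertools import combinations

# Departure rates per hour for Ron, Sue, and Ted.
rates = {"R": 1, "S": 2, "T": 3}
# E[max] = sum over non-empty subsets of (-1)^(|subset| + 1) / (sum of the subset's rates).
expected_max = 0.0
for size in range(1, len(rates) + 1):
    sign = (-1) ** (size + 1)
    for subset in combinations(rates.values(), size):
        expected_max += sign / sum(subset)
print(f"E[time until all three are gone] = {expected_max:.4f} hours")
```

The result, $\frac{73}{60}$ hours, should match the value computed by conditioning on the order of departures.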


## Example 2.3 (Compound Poisson Process)

Consider the McDonald's restaurant on Route 13 in the southern part of Ithaca. By arguments in the last section, it is not unreasonable to assume that between 12:00 and 1:00 cars arrive according to a Poisson process with rate $\lambda$. Let $Y_{i}$ be the number of people in the $i^{th}$ vehicle. There might be some correlation between the number of people in the car and the arrival time, but for a first approximation it seems reasonable to assume that the $Y_{i}$ are i.i.d. and independent of the Poisson process of arrival time.

### Solution

The arrival of cars is modeled as a Poisson process with rate $\lambda$. If $\lambda$ is given in units of cars per hour, then the number of cars arriving between 12:00 and 1:00 (denoted by $N$) follows a Poisson distribution:
$$
\mathbb{P}\left( N = n \right) = e^{- \lambda} \frac{\lambda^{n}}{n!} \; .
$$ 
It is assumed that $Y_{i}$ are identically distributed, independent of each other, and independent of the Poisson process governing the car arrivals. This means $Y_{i}$ are i.i.d. with some distribution. For simplicity, let's assume $Y_{i}$ also follows a Poisson distribution with rate $\mu$. The total number of people arriving at the restaurant between 12:00 and 1:00 can be expressed as:
$$
S = \sum_{i = 1}^{N} Y_{i} \; ,
$$
where $N$ is Poisson($\lambda$) and each $Y_{i}$ is i.i.d. Poisson($\mu$). Because $S$ is a random sum of i.i.d. terms that are independent of $N$, it follows a compound Poisson distribution, which in general is not itself Poisson. By theorem 2.10, the expected value of $S$ is the product of the expected values of $N$ and $Y_{i}$:
$$
\mathbb{E}\left[ S \right] = \mathbb{E}\left[ N \right] \cdot \mathbb{E}\left[ Y_{i} \right] = \lambda \mu \; .
$$
Similarly, since $N$ is Poisson (so $\mathbb{E}\left[ N \right] = \mathbb{V}\left[ N \right] = \lambda$), the variance of $S$ reduces to the product of $\lambda$ and the second moment of $Y_{i}$:
$$
\mathbb{V}\left[ S \right] = \mathbb{E}\left[ N \right] \cdot \mathbb{V}\left[ Y_{i} \right] + \mathbb{V}\left[ N \right] \left( \mathbb{E}\left[ Y_{i} \right] \right)^{2} = \lambda \left(\mathbb{V}\left[ Y_{i} \right] +  \left( \mathbb{E}\left[ Y_{i} \right] \right)^{2} \right) = \lambda \left(\mu + \mu^{2} \right) \; ,
$$
reflecting the scaling of individual term effects by the rate of the Poisson process. Thus, $S$ has mean $\lambda \mu$ and variance $\lambda \left(\mu + \mu^{2} \right)$. Because the variance exceeds the mean whenever $\mu > 0$, $S$ is not Poisson distributed, even though both $N$ and the $Y_{i}$ are.

In [4]:
# Initialize random seed.
np.random.seed(37)  # https://grossack.site/2023/11/08/37-median.html
# Initialize rates.
lambda_ = 20  # Average number of cars per hour.
mu = 1.5  # Average number of people per car.
# Initialize lists to store the simulated replicates.
Ns = []
Y_is = []
Ss = []
# Run the simulation one million times (this took me ~15 seconds).
for _ in range(1_000_000):
    # Simulate the number of cars arriving in one hour.
    N = np.random.poisson(lambda_)
    # Simulate the number of people in each car.
    Y_i = np.random.poisson(mu, N)
    # Calculate the total number of people arriving in one hour.
    S = np.sum(Y_i)
    # Update the simulated replicate lists.
    Ns.append(N)
    Y_is.extend(Y_i)
    Ss.append(S)
# Compare the analytical solutions to the simulation results.
print(
    f"Number of cars: E_{{N}} = {lambda_} vs mu_{{N}} = {np.mean(Ns):.4f} & V_{{N}} = {lambda_} vs sigma_squared_{{N}} = {np.var(Ns):.4f}"
)
print(
    f"People in each car: E_{{Y_i}} = {mu} vs mu_{{Y_i}} = {np.mean(Y_is):.4f} & V_{{Y_i}} = {mu} vs sigma_squared_{{Y_i}} = {np.var(Y_is):.4f}"
)
print(
    f"Total number of people: E_{{S}} = {lambda_ * mu} vs mu_{{S}} = {np.mean(Ss):.4f} & V_{{S}} = {lambda_ * (mu + mu**2)} vs sigma_squared_{{S}} = {np.var(Ss):.4f}"
)

Number of cars: E_{N} = 20 vs mu_{N} = 19.9951 & V_{N} = 20 vs sigma_squared_{N} = 19.9991
People in each car: E_{Y_i} = 1.5 vs mu_{Y_i} = 1.5000 & V_{Y_i} = 1.5 vs sigma_squared_{Y_i} = 1.5003
Total number of people: E_{S} = 30.0 vs mu_{S} = 29.9918 & V_{S} = 75.0 vs sigma_squared_{S} = 75.0999
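One way to probe whether $S$ could itself be Poisson is to compare two quantities that a Poisson($\lambda \mu$) variable would pin down: the variance-to-mean ratio (exactly 1 for a Poisson) and $\mathbb{P}(S = 0)$. Conditioning on $N$ and using $\mathbb{P}(Y_{i} = 0) = e^{-\mu}$ gives $\mathbb{P}(S = 0) = e^{-\lambda (1 - e^{-\mu})}$. The sketch below assumes the same illustrative parameters as the simulation.

```python
import numpy as np
from scipy.stats import poisson

lambda_ = 20  # Average number of cars per hour.
mu = 1.5  # Average number of people per car.
# P(S = 0) for the compound sum: sum over n of P(N = n) * exp(-mu)^n.
prob_zero_compound = np.exp(-lambda_ * (1 - np.exp(-mu)))
# P(S = 0) if S were Poisson with the same mean lambda * mu.
prob_zero_poisson = poisson.pmf(0, lambda_ * mu)
# Variance-to-mean ratio of S (equals 1 for any Poisson distribution).
dispersion = (lambda_ * (mu + mu**2)) / (lambda_ * mu)
print(f"P(S = 0), compound Poisson: {prob_zero_compound:.3e}")
print(f"P(S = 0), Poisson(lambda * mu): {prob_zero_poisson:.3e}")
print(f"Variance-to-mean ratio of S: {dispersion:.2f}")
```

The ratio is $1 + \mu$, which is consistent with the simulated variance of roughly 75 against a mean of roughly 30, so $S$ is compound Poisson rather than Poisson.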


## Example 2.4 (Compound Poisson Process)

Messages arrive at a computer to be transmitted across the Internet. If we imagine a large number of users writing emails on their laptops (or tablets or smart phones) then the arrival times of messages can be modeled by a Poisson process. If we let $Y_{i}$ be the size of the $i^{th}$ message, then again it is reasonable to assume $Y_{1}, Y_{2}, \ldots$ are i.i.d. and independent of the Poisson process of arrival times.

### Solution

The arrival of messages is modeled as a Poisson process with rate $\lambda$, which could be given in units like messages per hour. If we consider a specific time window, such as one hour, the number of messages $N$ that arrive during this period is Poisson distributed:
$$
\mathbb{P}\left( N = n \right) = e^{- \lambda} \frac{\lambda^{n}}{n!} \; .
$$ 
It is assumed that $Y_{i}$ are identically distributed, independent of each other, and independent of the Poisson process governing the message arrivals. This means $Y_{i}$ are i.i.d. with some distribution. For simplicity, let's assume $Y_{i}$ follows a log-normal distribution with parameters $\mu$ and $\sigma$ (the mean and standard deviation of the underlying normal distribution) to better reflect the variability and skewness typically observed in message sizes. The total size of the messages in a one-hour interval can be expressed as:
$$
S = \sum_{i = 1}^{N} Y_{i} \; ,
$$
where $N$ is Poisson($\lambda$) and each $Y_{i}$ is i.i.d. Similar to the previous example the expected value and variance of $S$ are:
$$
\mathbb{E}\left[ S \right] = \mathbb{E}\left[ N \right] \cdot \mathbb{E}\left[ Y_{i} \right] = \lambda e^{\mu + \frac{\sigma^{2}}{2}} \; ,
$$
$$
\mathbb{V}\left[ S \right] = \mathbb{E}\left[ N \right] \cdot \mathbb{V}\left[ Y_{i} \right] + \mathbb{V}\left[ N \right] \left( \mathbb{E}\left[ Y_{i} \right] \right)^{2} = \lambda \left( \left( e^{\sigma^{2}} - 1 \right) e^{2 \mu + \sigma^{2}} \right) + \lambda \left( e^{\mu + \frac{\sigma^{2}}{2}} \right)^{2} = \lambda e^{2 \mu +  2\sigma^{2}} \; .
$$

In [5]:
# Initialize random seed.
np.random.seed(37)  # https://grossack.site/2023/11/08/37-median.html
# Initialize rates.
lambda_ = 10  # Average number of messages per hour.
mu = 1  # Mean of the log for the log-normal distribution (log scale).
sigma = 0.5  # Standard deviation of the log (log scale).
# Initialize lists to store the simulated replicates.
Ns = []
Y_is = []
Ss = []
# Run the simulation one million times (this took me ~12 seconds).
for _ in range(1_000_000):
    # Simulate the number of messages arriving in one hour.
    N = np.random.poisson(lambda_)
    # Simulate the sizes of each message.
    Y_i = np.random.lognormal(mu, sigma, N)
    # Calculate the total size of messages in one hour.
    S = np.sum(Y_i)
    # Update the simulated replicate lists.
    Ns.append(N)
    Y_is.extend(Y_i)
    Ss.append(S)
# Compare the analytical solutions to the simulation results.
print(
    f"Number of messages: E_{{N}} = {lambda_} vs mu_{{N}} = {np.mean(Ns):.4f} & V_{{N}} = {lambda_} vs sigma_squared_{{N}} = {np.var(Ns):.4f}"
)
print(
    f"Sizes of messages: E_{{Y_i}} = {np.exp(mu + ((sigma**2) / 2)):.4f} vs mu_{{Y_i}} = {np.mean(Y_is):.4f} & V_{{Y_i}} = {((np.exp(sigma**2) - 1) * np.exp((2 * mu) + (sigma**2))):.4f} vs sigma_squared_{{Y_i}} = {np.var(Y_is):.4f}"
)
print(
    f"Total size of messages: E_{{S}} = {(lambda_ * np.exp(mu + ((sigma**2) / 2))):.4f} vs mu_{{S}} = {np.mean(Ss):.4f} & V_{{S}} = {(lambda_ * np.exp((2 * mu) + (2 * (sigma**2)))):.4f} vs sigma_squared_{{S}} = {np.var(Ss):.4f}"
)

Number of messages: E_{N} = 10 vs mu_{N} = 10.0019 & V_{N} = 10 vs sigma_squared_{N} = 9.9921
Sizes of messages: E_{Y_i} = 3.0802 vs mu_{Y_i} = 3.0804 & V_{Y_i} = 2.6948 vs sigma_squared_{Y_i} = 2.6968
Total size of messages: E_{S} = 30.8022 vs mu_{S} = 30.8100 & V_{S} = 121.8249 vs sigma_squared_{S} = 121.7755
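The closed-form log-normal moments used above can be cross-checked against SciPy's own parameterization. This is a sketch under the assumption that `scipy.stats.lognorm` with shape `s = sigma` and `scale = exp(mu)` corresponds to the log-normal described in the solution; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import lognorm

lambda_ = 10  # Average number of messages per hour.
mu = 1  # Mean of the underlying normal distribution.
sigma = 0.5  # Standard deviation of the underlying normal distribution.
# SciPy's log-normal takes the shape s = sigma and scale = exp(mu).
message_size = lognorm(s=sigma, scale=np.exp(mu))
mean_Y = message_size.mean()
var_Y = message_size.var()
# Compound Poisson moments: E[S] = lambda * E[Y] and V[S] = lambda * E[Y^2].
mean_S = lambda_ * mean_Y
var_S = lambda_ * (var_Y + mean_Y**2)
print(f"E[S] = {mean_S:.4f}, V[S] = {var_S:.4f}")
```

Both values should agree with $\lambda e^{\mu + \sigma^{2} / 2}$ and $\lambda e^{2 \mu + 2 \sigma^{2}}$.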


## Example 2.5 (Compound Poisson Process)

Suppose that the number of customers at a liquor store in a day has a Poisson distribution with mean 81 and that each customer spends an average of $\$ 8$ with a standard deviation of $\$ 6$.

__(a)__ Find the liquor store's expected daily revenue.

__(b)__ Find the variance of liquor store's daily revenue.

__(c)__ Find the standard deviation of liquor store's daily revenue.

In [6]:
# Initialize the liquor store's parameters.
mean_daily_customers = 81
mean_expenditure = 8
std_expenditure = 6


### Part (a): The liquor store's expected daily revenue. ###

# Theorem (2.10).
exp_daily_revenue = mean_daily_customers * mean_expenditure
# Print the result.
print(f"Part (a):  The liquor store's expected daily revenue = ${exp_daily_revenue}")


### Part (b): The variance of liquor store's daily revenue. ###

# Theorem (2.10).
var_daily_revenue = mean_daily_customers * (
    (std_expenditure**2) + (mean_expenditure**2)
)
# Print the result.
print(f"Part (b): The variance of liquor store's daily revenue = ${var_daily_revenue}")


### Part (c): The standard deviation of liquor store's daily revenue. ###
std_daily_revenue = np.sqrt(var_daily_revenue)
# Print the result.
print(
    f"Part (c): The standard deviation of liquor store's daily revenue = ${std_daily_revenue}"
)

Part (a):  The liquor store's expected daily revenue = $648
Part (b): The variance of liquor store's daily revenue = $8100
Part (c): The standard deviation of liquor store's daily revenue = $90.0
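Theorem 2.10 only needs the mean and standard deviation of a single purchase, so the answers above hold for any spending distribution with those two moments. As an illustration, the sketch below simulates daily revenue under the assumption (not made in the problem) that each purchase is gamma distributed with mean 8 and standard deviation 6; the seed and replicate count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(37)
mean_daily_customers = 81
mean_expenditure = 8
std_expenditure = 6
# Assumed gamma distribution matched to the stated mean and standard deviation.
shape = (mean_expenditure / std_expenditure) ** 2
scale = std_expenditure**2 / mean_expenditure
# Simulate the number of customers on each day.
days = 50_000
customers = rng.poisson(mean_daily_customers, size=days)
# Simulate every purchase at once and sum the purchases belonging to each day.
purchases = rng.gamma(shape, scale, size=customers.sum())
day_index = np.repeat(np.arange(days), customers)
revenue = np.bincount(day_index, weights=purchases, minlength=days)
print(f"Simulated mean = {revenue.mean():.2f}, variance = {revenue.var():.2f}")
```

The simulated mean and variance should land close to 648 and 8100, up to Monte Carlo error.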


## Example 2.6 (Thinning)

Ellen catches fish at times of a Poisson process with rate 2 per hour. $40 \%$ of the fish are salmon, while $60 \%$ of the fish are trout. What is the probability she will catch exactly 1 salmon and 2 trout if she fishes for 2.5 hours? 

In [7]:
# Initialize the parameters.
t = 2.5  # Time in hours.
lambda_f = 2  # Mean number of fish caught per hour.
# Compute the rates for salmon and trout.
lambda_s = lambda_f * 0.4 * t  # Mean number of salmon caught in 2.5 hours.
lambda_t = lambda_f * 0.6 * t  # Mean number of trout caught in 2.5 hours.
# Compute the probability of catching exactly 1 salmon and 2 trout.
prob_salmon = poisson.pmf(1, lambda_s)
prob_trout = poisson.pmf(2, lambda_t)
# Compute the joint probability.
joint_prob = prob_salmon * prob_trout
# Print the result.
print(f"Probability of catching exactly 1 salmon and 2 trout: {joint_prob:.4f}")

Probability of catching exactly 1 salmon and 2 trout: 0.0606
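The thinning result can be checked from the other direction: first ask for exactly 3 fish in total in 2.5 hours (Poisson with mean 5), then ask that exactly 1 of those 3 is a salmon (binomial with $p = 0.4$). A minimal sketch of that equivalent computation:

```python
from scipy.stats import binom, poisson

t = 2.5  # Time in hours.
lambda_f = 2  # Mean number of fish caught per hour.
# Probability of catching exactly 3 fish in total.
prob_three_fish = poisson.pmf(3, lambda_f * t)
# Probability that exactly 1 of the 3 fish is a salmon.
prob_one_salmon_of_three = binom.pmf(1, 3, 0.4)
# Joint probability of 1 salmon and 2 trout.
joint_prob_check = prob_three_fish * prob_one_salmon_of_three
print(f"Probability via total count and binomial split: {joint_prob_check:.4f}")
```

This should reproduce the product of the two independent Poisson probabilities.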


## Example 2.7 (Thinning)

Two copy editors read a 300-page manuscript. The first found 100 typos, the second found 120, and their lists contain 80 errors in common. Suppose that the author's typos follow a Poisson process with some unknown rate $\lambda$ per page, while the two copy editors catch errors with unknown probabilities of success $p_{1}$ and $p_{2}$. The goal in this problem is to find estimates of $\lambda$, $p_{1}$, $p_{2}$, and the number of undiscovered typos.

Let $X_{0}$ be the number of typos that neither found. Let $X_{1}$ and $X_{2}$ be the number of typos found only by editor 1 and only by editor 2, respectively, and let $X_{3}$ be the number of typos found by both. If we let $\mu=300 \lambda$, then the $X_{i}$ are independent Poisson with means:
$$
\mu\left(1-p_{1}\right)\left(1-p_{2}\right), \quad \mu p_{1}\left(1-p_{2}\right), \quad \mu\left(1-p_{1}\right) p_{2}, \quad \mu p_{1} p_{2} \; .
$$

In [8]:
# Initialize the observed number of typos.
X_1 = 20  # Number of typos found only by the first editor (100 - 80).
X_2 = 40  # Number of typos found only by the second editor (120 - 80).
X_3 = 80  # Number of typos found by both editors (80).
# Estimate the probability that each editor catches a typo.
prob_1 = X_3 / (X_2 + X_3)
prob_2 = X_3 / (X_1 + X_3)
# Compute the total number of expected typos.
mu = X_3 / (prob_1 * prob_2)
# Compute the rate of typos per page.
lambda_ = mu / 300  # 300-page manuscript.
# Compute the number of undiscovered typos.
X_0 = mu - (X_1 + X_2 + X_3)
# Print the results.
print(f"The Author's typo rate per page: lambda = {lambda_:.4f}")
print(f"Copy Editor 1's typo detection probability: p_{{1}} = {prob_1:.4f}")
print(f"Copy Editor 2's typo detection probability: p_{{2}} = {prob_2:.4f}")
print(f"Number of undiscovered typos: X_0 = {X_0}")

The Author's typo rate per page: lambda = 0.5000
Copy Editor 1's typo detection probability: p_{1} = 0.6667
Copy Editor 2's typo detection probability: p_{2} = 0.8000
Number of undiscovered typos: X_0 = 10.0
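Substituting the two probability estimates into $\mu = X_{3} / (p_{1} p_{2})$ collapses to the product of the two editors' counts divided by the overlap, which is the same form as the Lincoln-Petersen capture-recapture estimator. The sketch below recomputes the estimates directly from the raw counts; the variable names are illustrative.

```python
found_1 = 100  # Typos found by the first editor.
found_2 = 120  # Typos found by the second editor.
found_both = 80  # Typos found by both editors.
pages = 300  # Length of the manuscript.
# Estimated total number of typos: (found by 1) * (found by 2) / (found by both).
total_typos = found_1 * found_2 / found_both
# Number of distinct typos found by at least one editor.
found_any = found_1 + found_2 - found_both
# Estimated number of undiscovered typos and typo rate per page.
undiscovered = total_typos - found_any
typo_rate = total_typos / pages
print(f"Total = {total_typos}, undiscovered = {undiscovered}, rate = {typo_rate}")
```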


## Example 2.8 (Thinning - Poissonization)

This example illustrates Poissonization: the fact that some combinatorial probability problems become much easier when the number of objects is not fixed but has a Poisson distribution. Suppose that a Poisson number of Duke students with mean 2263 will show up to watch the next women's basketball game. What is the probability that, for each of the 365 days of the year, there is at least one person in the crowd who has that birthday (pretend February 29th does not exist)?

In [9]:
# Initialize the parameters.
mean_students_at_game = 2263
days_in_a_year = 365
# Compute the expected number of students at the game whose birthday falls on one given day.
mean_birthdays_at_game = mean_students_at_game / days_in_a_year
# Compute the probability that no student at the game has a birthday on that given day.
prob_no_birthday = np.exp(-mean_birthdays_at_game)
# Compute the probability that at least one student at the game has a birthday on that given day.
prob_birthday = 1 - prob_no_birthday
# Compute the probability that all 365 days are covered (the daily counts are independent by Poisson thinning).
prob_birthday_all = prob_birthday**days_in_a_year
# Print the result.
print(
    f"Probability that for all of the 365 days there is at least one student at the game that has a birthday: {prob_birthday_all:.4f}"
)

Probability that for all of the 365 days there is at least one student at the game that has a birthday: 0.4764
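The independence of the 365 daily counts is what makes the closed form so short, and it is easy to check by simulation. The sketch below draws the per-day counts as independent Poisson variables with mean $2263 / 365$ and estimates the probability that every day is covered; the seed and the number of replicates are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(37)
mean_students_at_game = 2263
days_in_a_year = 365
replicates = 20_000
# By Poisson thinning, the number of students with each birthday is independent Poisson.
counts = rng.poisson(
    mean_students_at_game / days_in_a_year, size=(replicates, days_in_a_year)
)
# A replicate succeeds when every day has at least one student with that birthday.
all_days_covered = (counts >= 1).all(axis=1)
estimate = all_days_covered.mean()
print(f"Simulated probability that all 365 birthdays are present: {estimate:.4f}")
```

The estimate should sit near the analytical value, up to Monte Carlo error on the order of a few thousandths.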


## Example 2.9 (Nonhomogeneous Thinning - M/G/$\infty$ Queue)

As one walks around the Duke campus it seems that every student is talking on their smartphone. The argument for arrivals at the ATM implies that the beginnings of calls follow a Poisson process. As for the calls themselves, while many people on the telephone show a lack of memory, there is no reason to suppose that the duration of a call has an exponential distribution, so we use a general distribution function $G$ with $G(0)=0$ and mean $\mu$. Suppose that the system starts empty at time 0.

### Solution

The probability that a call started at time $s$ is still ongoing at time $t$ is given by $1 - G(t-s)$, where $G(t-s)$ is the cumulative distribution function (CDF) of the call duration that describes the probability of a call ending by time $t-s$. To find the mean number of calls in progress at any given time $t$, we integrate the rate of calls still ongoing at each prior time $s$, from the start of the observation period $s = 0$ up to time $t$:
$$
\lambda \int_{s=0}^{t} (1-G(t-s)) \, ds \; .
$$
Substituting $r = t - s$ gives $ds = -dr$, and the limits $s = 0$ and $s = t$ become $r = t$ and $r = 0$, respectively:
$$
\lambda \int_{r=t}^{0} (1-G(r)) \, (-dr) = \lambda \int_{r=0}^{t} (1-G(r)) \, dr \; .
$$
This integral, $\lambda \int_{r=0}^{t} (1-G(r)) \, dr$, represents the expected number of calls still in progress at time $t$, where $\lambda$ scales the integral by the rate of new calls.

In [10]:
# Define the CDF of the call duration, G(x), for a uniform distribution where call durations are between 0 and 1 hour.
def G(x):
    if 0 <= x <= 1:
        return x / 1
    elif x > 1:
        return 1
    else:
        return 0


# Define the integrand 1 - G(x), the probability that a call lasts longer than x.
def integrand(x):
    return 1 - G(x)


# Define the arrival rate of calls.
lambda_ = 10  # Number of new calls per hour
# Initialize the time interval.
s = 0  # Start time.
t = 0.5  # End time of 30 minutes.
r = t - s
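# Compute the integrals.
# Integrate 1 - G(t - u) over the call start times u in [s, t].
integral_s_to_t, _ = quad(lambda u: integrand(t - u), s, t)
# After substituting r = t - u, integrate 1 - G(r) from r down to 0 (the sign is flipped when printing).
integral_r_to_0, _ = quad(integrand, r, 0)
# Same integral with the limits in increasing order.
integral_r_to_t, _ = quad(integrand, 0, t)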
# Print the results.
print(
    f"lambda * int_{{s = 0}}^{{t = 0.5}} (1 - G(t - s)) ds = {lambda_ * integral_s_to_t:.4f}"
)
print(
    f"lambda * int_{{r = (t - s)}}^{{0}} (1 - G(r)) (-dr) = {lambda_ * integral_r_to_0 * -1:.4f}"
)
print(f"lambda * int_{{r = 0}}^{{t}} (1 - G(r)) dr = {lambda_ * integral_r_to_t:.4f}")

lambda * int_{s = 0}^{t = 0.5} (1 - G(t - s)) ds = 3.7500
lambda * int_{r = (t - s)}^{0} (1 - G(r)) (-dr) = 3.7500
lambda * int_{r = 0}^{t} (1 - G(r)) dr = 3.7500
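Because $\int_{0}^{\infty} (1 - G(r)) \, dr$ equals the mean call duration, the expected number of calls in progress should approach $\lambda$ times that mean as $t$ grows. The sketch below illustrates this under the assumption (chosen only for illustration) that call durations are exponential with a mean of 0.25 hours, so that $1 - G(r) = e^{-r / 0.25}$.

```python
import numpy as np
from scipy.integrate import quad

lambda_ = 10  # Number of new calls per hour.
mean_duration = 0.25  # Assumed mean call duration in hours.


# Survival function 1 - G(r) for an exponential call duration.
def survival(r):
    return np.exp(-r / mean_duration)


# Expected number of calls in progress at t = 0.5 hours.
t = 0.5
integral_to_t, _ = quad(survival, 0, t)
mean_calls_at_t = lambda_ * integral_to_t
# Expected number of calls in progress in the long run.
integral_to_inf, _ = quad(survival, 0, np.inf)
mean_calls_equilibrium = lambda_ * integral_to_inf
print(f"Mean calls in progress at t = {t}: {mean_calls_at_t:.4f}")
print(f"Mean calls in progress in equilibrium: {mean_calls_equilibrium:.4f}")
```

The long-run value should match $\lambda$ times the mean duration, here $10 \cdot 0.25 = 2.5$.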


## Example 2.10 (Nonhomogeneous Thinning)

Customers arrive at a sporting goods store at rate 10 per hour. $60 \%$ of the customers are men and $40 \%$ are women. Men stay in the store for an amount of time that is exponential with mean $\frac{1}{2}$ hour. Women stay for an amount of time that is uniformly distributed with mean $\frac{1}{4}$ hour. What is the probability in equilibrium that there are four men and two women in the store?

In [11]:
# Initialize the parameters.
lambda_customers = 10
percent_m = 0.6
percent_w = 0.4
mean_duration_m = 0.5
mean_duration_w = 0.25
# Compute the arrival rates (per hour) of men and women by Poisson thinning.
N_m = lambda_customers * percent_m
N_w = lambda_customers * percent_w
# Compute the equilibrium mean number of men and women in the store (arrival rate times mean stay).
lambda_m = N_m * mean_duration_m
lambda_w = N_w * mean_duration_w
# Compute the probabilities of exactly four men and exactly two women being in the store in equilibrium.
prob_equilibrium_m = poisson.pmf(4, lambda_m)
prob_equilibrium_w = poisson.pmf(2, lambda_w)
# Compute the joint probability.
joint_prob = prob_equilibrium_m * prob_equilibrium_w
# Print the result.
print(
    f"Probability in equilibrium that there are four men and two women in the store: {joint_prob:.4f}"
)

Probability in equilibrium that there are four men and two women in the store: 0.0309
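Written out, the computation in the cell above is the product of two independent Poisson probabilities, with equilibrium means $6 \cdot \frac{1}{2} = 3$ for men and $4 \cdot \frac{1}{4} = 1$ for women:
$$
e^{-3} \frac{3^{4}}{4!} \cdot e^{-1} \frac{1^{2}}{2!} = \frac{81}{24} \cdot \frac{1}{2} \cdot e^{-4} = \frac{27}{16} e^{-4} \approx 0.0309 \; .
$$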


## Example 2.11 (Nonhomogeneous Thinning)

People arrive at a puzzle exhibit according to a Poisson process with rate 2 per minute. The exhibit has enough copies of the puzzle so everyone at the exhibit can have one to play with. Suppose the puzzle takes an amount of time to solve that is uniform on $(0,10)$ minutes. 

__(a)__ What is the distribution of the number of people working on the puzzle in equilibrium?

__(b)__ What is the probability that there are three people at the exhibit working on puzzles, one that has been working more than four minutes, and two less than four minutes?

### Part (a) Solution

The problem states that the time it takes to solve a puzzle is $X \sim$ Uniform($a = 0, b = 10$); therefore, the probability that a person who arrived $x$ minutes ago is still working on a puzzle is, for $0 \leq x \leq 10$:
$$
1 - F_{X}\left( x \right) = 1 - \frac{x - a}{b - a} = 1 - \frac{x}{10} \; .
$$
Given that arrivals are a Poisson process with rate 2, by Poisson thinning the number of people who are still working on the puzzle at any given time in equilibrium is Poisson with mean:
$$
2 \int_{0}^{10} \left( 1 - \frac{x}{10} \right) dx = 2 \left[x - \frac{x^{2}}{20} \right]_{0}^{10} = 2 \left(10 - 5 \right) = 10 \; .
$$

In [12]:
# Define the function to integrate: 1 - (x / 10), the probability that a person who arrived x minutes ago is still working.
def integrand(x):
    return 1 - (x / 10)


# Initialize the arrival rate.
lambda_ = 2
# Initialize the mean number of people who are still working on the puzzle at any given time in equilibrium.
mean_at_equilibrium = 10  # From part (a).
# Compute the mean number of people who have been working on the puzzle more than 4 minutes.
mean_more_than_four_mins, _ = quad(
    integrand, 4, 10
)  # By Poisson thinning \int_{4}^{10} 1 - (x / 10) dx = 1.8.
mean_more_than_four_mins *= lambda_
# Compute the mean number of people who have been working on the puzzle less than 4 minutes.
mean_less_than_four_mins = mean_at_equilibrium - mean_more_than_four_mins
# Compute the probability that there are three people at the exhibit working on puzzles, one that has been working more than four minutes, and two less than four minutes.
prob_more_than_four_mins = poisson.pmf(1, mean_more_than_four_mins)
prob_less_than_four_mins = poisson.pmf(2, mean_less_than_four_mins)
# Compute the joint probability.
joint_prob = prob_more_than_four_mins * prob_less_than_four_mins
# Print the result.
print(
    f"Part (b): Probability that there are three people at the exhibit working on puzzles, one that has been working more than four minutes, and two less than four minutes = {joint_prob:.4f}"
)

Part (b): Probability that there are three people at the exhibit working on puzzles, one that has been working more than four minutes, and two less than four minutes = 0.0033
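The mean for people who have been working less than four minutes can also be obtained by integrating over arrival ages in $[0, 4]$ directly rather than by subtraction, which doubles as a check that the two groups add up to the part (a) mean of 10. A minimal sketch (the function name `still_working` is illustrative):

```python
from scipy.integrate import quad
from scipy.stats import poisson

lambda_ = 2  # Arrivals per minute.


# Probability that a person who arrived x minutes ago is still working.
def still_working(x):
    return 1 - (x / 10)


# Mean number of people who have been working less than / more than 4 minutes.
mean_less, _ = quad(still_working, 0, 4)
mean_more, _ = quad(still_working, 4, 10)
mean_less *= lambda_
mean_more *= lambda_
# Joint probability of one person over 4 minutes and two people under 4 minutes.
joint_prob_check = poisson.pmf(1, mean_more) * poisson.pmf(2, mean_less)
print(f"Means: {mean_less:.1f} (< 4 min), {mean_more:.1f} (> 4 min)")
print(f"Joint probability: {joint_prob_check:.4f}")
```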


## Example 2.12 (Superposition - A Poisson Race)

Given a Poisson process of red arrivals with rate $\lambda$ and an independent Poisson process of green arrivals with rate $\mu$, what is the probability that we will get 6 red arrivals before a total of 4 green ones?

### Solution

The key to this problem is to first note that the event in question (getting 6 red arrivals before a total of 4 green ones) is equivalent to having at least 6 red arrivals among the first 9 arrivals in the combined process. One can view the combined arrivals as a single Poisson process with rate $\lambda + \mu$, where each arrival is independently classified as red with probability $p = \frac{\lambda}{\lambda + \mu}$ or green with probability $1 - p$. In other words, the color of each arrival is determined by a Bernoulli trial with success probability $p$. Therefore, the number of red arrivals in the first 9 total arrivals can be modeled as a binomial distribution:
$$
X \sim \text{Binomial}\left(n = 9, p = \frac{\lambda}{\lambda + \mu} \right) \; .
$$
Since the event requires at least 6 reds among the first 9 arrivals, we sum the binomial probabilities over $k = 6, \ldots, 9$, so the probability of interest is:
$$
\sum_{k = 6}^{9} \binom{9}{k} \left( \frac{\lambda}{\lambda + \mu} \right)^{k} \left( 1 - \frac{\lambda}{\lambda + \mu} \right)^{9 - k} \; .
$$

#### Note

I believe Dr. Durrett has made a typo in his computation where he states: "_for simplicity suppose that $\lambda=\mu$ so $p=\frac{1}{2}$._" In the text he says this expression becomes:
$$
\frac{1}{512} \cdot \sum_{k=6}^{9} \binom{9}{k} = \frac{1 + 9 + (9 \cdot 8) / 2 + (9 \cdot 8 \cdot 7) / 3!}{512}= \frac{140}{512} \approx 0.273 \; ,
$$
but _I believe_ the numerator of the summation should actually be 10 less than what it is in the text, which is validated in the code below:
$$
\frac{1}{512} \cdot \sum_{k=6}^{9} \binom{9}{k} = \frac{1 + 9 + (9 \cdot 8) / 2 + (9 \cdot 8 \cdot 7) / 3!}{512} = \frac{1 + 9 + 36 + 84}{512} = \frac{130}{512} \approx 0.2539 \; .
$$

In [13]:
# Initialize the parameters assuming lambda = mu and p = 1 / 2.
n = 9  # Total number of arrivals.
p = 0.5  # Probability of an arrival being red.
# Compute the probability that we will get 6 red arrivals before a total of 4 green ones.
prob_at_least_six_reds_before_four_greens = sum(
    binom.pmf(k, n, p) for k in range(6, 10)
)
# Print the result.
print(
    f"Probability that we will get 6 red arrivals before a total of 4 green ones: {prob_at_least_six_reds_before_four_greens:.4f}"
)

Probability that we will get 6 red arrivals before a total of 4 green ones: 0.2539
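The same probability can be reached without the "first 9 arrivals" argument: 6 reds arrive before 4 greens exactly when at most 3 greens occur before the 6th red, and the number of greens before the 6th red is negative binomial. The sketch below uses `scipy.stats.nbinom`, which counts failures before a fixed number of successes, and also recomputes the numerator with exact integer arithmetic.

```python
from math import comb

from scipy.stats import nbinom

p = 0.5  # Probability of an arrival being red (lambda = mu).
# Number of greens (failures) before the 6th red (success) is negative binomial.
prob_via_nbinom = nbinom.cdf(3, 6, p)
# Exact numerator of the binomial sum over k = 6, ..., 9.
numerator = sum(comb(9, k) for k in range(6, 10))
print(f"Negative binomial route: {prob_via_nbinom:.4f}")
print(f"Binomial numerator: {numerator} out of {2**9}")
```

Both routes should give $\frac{130}{512}$.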


## Example 2.13 (Conditioning)

Consider a Poisson process with rate $\lambda$. What are the probabilities of observing $k$ arrivals by time 1, given that a total of 4 arrivals are observed over an interval of length 3?


In [14]:
# Initialize the parameters.
n = 4
t = 3
s = 1
p = s / t
# Compute the conditional probabilities.
conditional_probs = [binom.pmf(k, n, p) for k in range(n + 1)]
# For every conditional probability.
for k, prob in enumerate(conditional_probs):
    # Print the result.
    print(f"P(N(1) = {k}|N(3) = 4) = {prob:.4f}")

P(N(1) = 0|N(3) = 4) = 0.1975
P(N(1) = 1|N(3) = 4) = 0.3951
P(N(1) = 2|N(3) = 4) = 0.2963
P(N(1) = 3|N(3) = 4) = 0.0988
P(N(1) = 4|N(3) = 4) = 0.0123
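The binomial form above does not depend on $\lambda$, which can be confirmed by computing the conditional probability directly from the definition, using the independence of the counts on $[0, 1]$ and $(1, 3]$. A minimal sketch, with arbitrary test values of $\lambda$ and an illustrative helper `conditional_prob`:

```python
from scipy.stats import binom, poisson


# P(N(s) = k | N(t) = n) computed from independent Poisson increments.
def conditional_prob(k, lam, n=4, s=1, t=3):
    joint = poisson.pmf(k, lam * s) * poisson.pmf(n - k, lam * (t - s))
    return joint / poisson.pmf(n, lam * t)


# The result should be the same for every rate.
for lam in (0.5, 2.0, 10.0):
    probs = [conditional_prob(k, lam) for k in range(5)]
    print(f"lambda = {lam}: " + ", ".join(f"{prob:.4f}" for prob in probs))
```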


## Example 2.14 (Conditioning)

Trucks and cars on highway US 421 arrive according to Poisson processes with rates 40 and 100 per hour, respectively. $\frac{1}{8}$ of the trucks and $\frac{1}{10}$ of the cars get off on exit 257 to go to the Bojangles' in Yadkinville. 

__(a)__ Find the probability that exactly 6 trucks arrive at Bojangles between noon and 1PM. 

__(b)__ Given that there were 6 truck arrivals at Bojangles between noon and 1PM, what is the probability that exactly two arrived between 12:20 and 12:40?

__(c)__ If we start watching at noon, what is the probability that four cars arrive at Bojangles before two trucks do?

__(d)__ Suppose that all trucks have 1 passenger while $30 \%$ of the cars have 1 passenger, $50 \%$ have 2, and $20 \%$ have 4. Find the mean and standard deviation of the number of customers that arrive at Bojangles in one hour.

#### Note

I must be missing something for part (d). My derivation of the expected number of customers matches Dr. Durrett's solution, but my derivation of the variance is very different. I am hesitant to claim Dr. Durrett is wrong, as he indubitably knows way more about statistics than I do, so here is my derivation with justifications.


Let $\lambda = 40$ and $\mu = 100$ be the rates for the Poisson processes where trucks and cars appear on highway US 421 per hour respectively. Given that $\frac{1}{8}$ of the trucks and $\frac{1}{10}$ of the cars get off on exit 257 to go to the Bojangles' in Yadkinville, let $\lambda_{b}$ and $\mu_{b}$ be the thinned Poisson rates of trucks and cars arriving at Bojangles per hour, respectively. By Poisson thinning these rates are:
\begin{align*}
&\lambda_{b} = \lambda \cdot \frac{1}{8} = \frac{40}{8} = 5 \; , \\
\\
&\mu_{b} = \mu \cdot \frac{1}{10} = \frac{100}{10} = 10 \; .
\end{align*}
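As a sketch of how parts (a) through (c) follow from these thinned rates, the closed forms below match the quantities computed in the code for this example. The symbols $N_{b}(t)$, for the number of trucks that reach Bojangles by $t$ hours after noon, and $p$, for the probability that a given Bojangles arrival is a car, are introduced here only for illustration. Part (a) uses the Poisson mass function with mean $\lambda_{b} \cdot 1 = 5$. Part (b) uses the fact that, given six arrivals in the hour, each one falls in the 20-minute window independently with probability $\frac{20}{60} = \frac{1}{3}$. Part (c) uses the fact that four cars arrive before two trucks exactly when at least four of the first five arrivals are cars.
\begin{align*}
&\mathbb{P}\left( N_{b}(1) = 6 \right) = e^{-5} \cdot \frac{5^{6}}{6!} \approx 0.1462 \; , \\
\\
&\mathbb{P}\left( N_{b}\left( \tfrac{2}{3} \right) - N_{b}\left( \tfrac{1}{3} \right) = 2 \mid N_{b}(1) = 6 \right) = \binom{6}{2} \left( \frac{1}{3} \right)^{2} \left( \frac{2}{3} \right)^{4} = \frac{80}{243} \approx 0.3292 \; , \\
\\
&p = \frac{\mu_{b}}{\lambda_{b} + \mu_{b}} = \frac{10}{15} = \frac{2}{3} \; , \quad \binom{5}{4} p^{4} (1 - p) + p^{5} = \frac{80}{243} + \frac{32}{243} = \frac{112}{243} \approx 0.4609 \; .
\end{align*}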
The problem states that all trucks have 1 passenger while $30 \%$ of the cars have 1 passenger, $50 \%$ have 2, and $20 \%$ have 4. Let $N_{t}$ and $N_{c}$ be random variables that represent the number of passengers riding in a single truck and car, respectively. We first compute the expected values and variances of $N_{t}$ and $N_{c}$. Since all trucks have exactly 1 passenger, finding the expectation and variance of $N_{t}$ is trivial:
\begin{align*}
&\mathbb{E}\left[ N_{t} \right] = 1 \; , \\
\\
&\mathbb{E}\left[ N_{t}^{2} \right] = 1^{2} = 1 \; , \\
\\
&\mathbb{V}\left[ N_{t} \right] = \mathbb{E}\left[ N_{t}^{2} \right] - \left( \mathbb{E}\left[ N_{t} \right] \right)^{2} = 1 - 1^{2} = 0 \; .
\end{align*}
Computing the expectation and variance of $N_{c}$ is straightforward. By the definition of expectation (first equation), the law of the unconscious statistician (second equation), and the definition of variance (third equation) we get:
\begin{align*}
&\mathbb{E}\left[ N_{c} \right] = 0.3 \cdot 1 + 0.5 \cdot 2 + 0.2 \cdot 4 =  0.3 + 1 + 0.8 = 2.1 \; , \\
\\
&\mathbb{E}\left[ N_{c}^{2} \right] = 0.3 \cdot 1^{2} + 0.5 \cdot 2^{2} + 0.2 \cdot 4^{2} = 0.3 + 2 + 3.2 = 5.5 \; , \\
\\
&\mathbb{V}\left[ N_{c} \right] = \mathbb{E}\left[ N_{c}^{2} \right] - \left( \mathbb{E}\left[ N_{c} \right] \right)^{2} = 5.5 - 2.1^{2} =  5.5 - 4.41 = 1.09 \; .
\end{align*}
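Now, let $C$ be the number of customers that arrive at Bojangles in one hour, and let $A_{t} \sim \text{Poisson}(\lambda_{b})$ and $A_{c} \sim \text{Poisson}(\mu_{b})$ be the numbers of trucks and cars that arrive in that hour. Then $C$ is the sum of two independent random sums: $A_{t}$ i.i.d. copies of $N_{t}$ plus $A_{c}$ i.i.d. copies of $N_{c}$. Because the vehicle counts are themselves random, theorem 2.10 part (ii) gives each sum a variance of $\mathbb{E}\left[ A \right] \cdot \mathbb{V}\left[ N \right] + \mathbb{V}\left[ A \right] \cdot \left( \mathbb{E}\left[ N \right] \right)^{2}$, and since a Poisson count has mean and variance both equal to its rate, this collapses to the rate times $\mathbb{E}\left[ N^{2} \right]$ (part (iii)). Using $\lambda_{b} \cdot \mathbb{V}\left[ N_{t} \right] + \mu_{b} \cdot \mathbb{V}\left[ N_{c} \right] = 10.9$ alone would treat the vehicle counts as fixed. By theorem 2.10 part (i) and the linearity of expectation (first equation), theorem 2.10 part (iii) and the additivity of variance for independent random variables (second equation), and the definition of standard deviation (third equation), the expected value, variance, and standard deviation of $C$ are:
\begin{align*}
&\mathbb{E}\left[ C \right] = \lambda_{b} \cdot \mathbb{E}\left[ N_{t} \right] + \mu_{b} \cdot \mathbb{E}\left[ N_{c} \right] = 5 \cdot 1 + 10 \cdot 2.1  = 5 + 21 = 26 \; , \\
\\
&\mathbb{V}\left[ C \right] = \lambda_{b} \cdot \mathbb{E}\left[ N_{t}^{2} \right] + \mu_{b} \cdot \mathbb{E}\left[ N_{c}^{2} \right] = 5 \cdot 1 + 10 \cdot 5.5 = 5 + 55 = 60 \; , \\
\\
&\mathbb{SD}\left[ C \right] = \sqrt{\mathbb{V}\left[ C \right]} = \sqrt{60} \approx 7.7460 \; .
\end{align*}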
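
A quick Monte Carlo run offers an independent check on the mean and standard deviation of $C$. The sketch below is illustrative only: the function name `customers_in_one_hour`, the seed, and the 20,000-hour sample size are assumptions made here, not part of the textbook solution. It uses only the Python standard library and builds each Poisson arrival stream from exponential interarrival times with the thinned rates $\lambda_{b} = 5$ and $\mu_{b} = 10$. The benchmarks it prints are the compound Poisson values, rate times $\mathbb{E}[Y]$ and rate times $\mathbb{E}[Y^{2}]$, from theorem 2.10 parts (i) and (iii).

```python
import math
import random
import statistics


def customers_in_one_hour(rng, truck_rate=5.0, car_rate=10.0):
    """Simulate one hour of Bojangles arrivals and return the customer count."""

    def arrivals(rate):
        # Count the exponential interarrival times that fit inside one hour.
        count, clock = 0, rng.expovariate(rate)
        while clock < 1.0:
            count += 1
            clock += rng.expovariate(rate)
        return count

    trucks = arrivals(truck_rate)
    cars = arrivals(car_rate)
    # Every truck carries 1 passenger; each car carries 1, 2, or 4.
    car_passengers = rng.choices([1, 2, 4], weights=[0.3, 0.5, 0.2], k=cars)
    return trucks + sum(car_passengers)


rng = random.Random(0)  # Fixed seed (an arbitrary choice) for repeatability.
samples = [customers_in_one_hour(rng) for _ in range(20_000)]
sim_mean = statistics.fmean(samples)
sim_sd = statistics.pstdev(samples)

# Compound Poisson benchmarks: rate * E[Y] for the mean, rate * E[Y^2] for the variance.
theory_mean = 5 * 1 + 10 * 2.1
theory_sd = math.sqrt(5 * 1 + 10 * 5.5)

print(f"Simulated mean = {sim_mean:.2f}, theoretical mean = {theory_mean:.2f}")
print(f"Simulated SD   = {sim_sd:.2f}, theoretical SD   = {theory_sd:.2f}")
```

Sampling noise moves the simulated values slightly when the seed changes, but they should land close to $26$ and $\sqrt{60} \approx 7.75$, since the hour-to-hour variation in the number of vehicles contributes to the spread alongside the variation in passengers per vehicle.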

In [15]:
# Initialize the parameters.
lambda_ = 40  # Truck arrival rate (trucks per hour).
mu = 100  # Car arrival rate (cars per hour).
prop_trucks_bojangles = 1 / 8  # Proportion of trucks that go to Bojangles.
prop_cars_bojangles = 1 / 10  # Proportion of cars that go to Bojangles.


### Part (a): Probability that exactly six trucks arrive at Bojangles between noon and 1PM. ###

# Compute the rate of trucks arriving at Bojangles.
lambda_bojangles = lambda_ * prop_trucks_bojangles
# Compute the probability that exactly six trucks arrive at Bojangles between noon and 1PM.
prob_six_trucks = poisson.pmf(6, lambda_bojangles)
# Print the result.
print(
    f"Part (a): Probability that exactly six trucks arrive at Bojangles between noon and 1PM = {prob_six_trucks:.4f}"
)


### Part (b): Probability that exactly two trucks arrived between 12:20 and 12:40 given that six trucks arrived at Bojangles between noon and 1PM. ###

# Initialize parameters for the conditional distribution of trucks arriving at Bojangles.
n_trucks = 6  # Number of trucks that arrived at Bojangles between noon and 1PM.
m_trucks = 2  # Number of trucks that arrived at Bojangles between 12:20 and 12:40.
t_trucks = 60  # Time in minutes between noon and 1PM.
s_trucks = 20  # Time in minutes between 12:20 and 12:40.
# Compute the conditional probability that exactly two trucks arrived between 12:20 and 12:40 given that six trucks arrived at Bojangles between noon and 1PM.
cond_prob_two_trucks = binom.pmf(m_trucks, n_trucks, s_trucks / t_trucks)
# Print the result.
print(
    f"Part (b): Probability that exactly two trucks arrived between 12:20 and 12:40 given that six trucks arrived at Bojangles between noon and 1PM = {cond_prob_two_trucks:.4f}"
)


### Part (c): Probability that four cars arrive at Bojangles before two trucks do. ###
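## Note: Four cars arrive before two trucks exactly when at least four of the first five vehicles are cars. There are two scenarios to consider: (1) exactly four of the first five vehicles are cars, and (2) all of the first five vehicles are cars. ##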

# Compute the rate of cars arriving at Bojangles—we computed this for trucks in part (a).
mu_bojangles = mu * prop_cars_bojangles
# Compute the probabilities of vehicle identities in the combined Bojangles arrival process.
prob_car_bojangles = mu_bojangles / (lambda_bojangles + mu_bojangles)
prob_truck_bojangles = 1 - prob_car_bojangles
# Compute the probability of exactly four cars and one truck.
prob_four_cars_one_truck = (
    comb(5, 4, exact=True) * (prob_car_bojangles**4) * prob_truck_bojangles
)
# Compute the probability of all five vehicles being cars.
prob_five_cars = prob_car_bojangles**5
# Compute the total probability that at least four cars arrive before two trucks do.
prob_four_cars_before_two_trucks = prob_four_cars_one_truck + prob_five_cars
# Print the result.
print(
    f"Part (c): Probability that four cars arrive at Bojangles before two trucks do = {prob_four_cars_before_two_trucks:.4f}"
)


### Part (d): Find the mean and standard deviation of the number of customers that arrive at Bojangles in one hour. ###

# Initialize the number of passengers per truck.
n_truck_passengers = 1
# Initialize the number of passengers per car.
n_car_passengers = {
    1: 0.3,
    2: 0.5,
    4: 0.2,
}  # Key: number of passengers, Value: probability.
# Since all trucks have one passenger, initialize the expected value and variance of the number of passengers per truck.
exp_truck_passengers = n_truck_passengers
var_truck_passengers = (n_truck_passengers**2) - (exp_truck_passengers**2)
# Compute the expected value and variance of the number of passengers per car.
exp_car_passengers = sum(k * v for k, v in n_car_passengers.items())
var_car_passengers = sum((k**2) * v for k, v in n_car_passengers.items()) - (
    exp_car_passengers**2
)
# Compute the expected value and standard deviation of the number of customers that arrive at Bojangles in one hour.
# Since the vehicle counts are Poisson, each compound sum has variance rate * E[Y^2] = rate * (Var[Y] + E[Y]^2) (theorem 2.10, part (iii)).
exp_customers = (lambda_bojangles * exp_truck_passengers) + (
    mu_bojangles * exp_car_passengers
)
std_customers = np.sqrt(
    lambda_bojangles * (var_truck_passengers + exp_truck_passengers**2)
    + mu_bojangles * (var_car_passengers + exp_car_passengers**2)
)
# Print the results.
print(
    f"Part (d): Expected number of customers that arrive at Bojangles in one hour = {exp_customers:.4f}"
)
print(
    f"Part (d): Standard deviation of the number of customers that arrive at Bojangles in one hour = {std_customers:.4f}"
)

Part (a): Probability that exactly six trucks arrive at Bojangles between noon and 1PM = 0.1462
Part (b): Probability that exactly two trucks arrived between 12:20 and 12:40 given that six trucks arrived at Bojangles between noon and 1PM = 0.3292
Part (c): Probability that four cars arrive at Bojangles before two trucks do = 0.4609
Part (d): Expected number of customers that arrive at Bojangles in one hour = 26.0000
Part (d): Standard deviation of the number of customers that arrive at Bojangles in one hour = 7.7460
