## A Fun and Easy Problem about Geyser Eruptions

### Carolyn P. Johnston, March 6, 2016

Husband and Grown Children went off for a drive somewhere, and when they came back, they were arguing about how to solve the following problem: 

*There are 3 geysers: A, B, and C. Geyser A erupts every 2 hours; B, every 4 hours; and C, every 6 hours. The geysers erupt independently (as far as we know).*  

*You arrive at the geysers, with no knowledge of when any of them last erupted. What is the probability that A will erupt before both B and C?*

I first tried to work out an analytic solution, and then used a simulation to check my work. After several tries at calculating the integral and getting results that didn't match the simulation's (the integral was different every time -- not a good sign), I finally calculated the integral correctly.

I'd like to be able to blame Probability or at least Calculus, but Arithmetic was actually at fault. 

### The analytic solution

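The time until the next eruption of A, $t_A$, is modeled as $U_A$ = Uniform$[0,2]$; $t_B$, as $U_B$ = Uniform$[0,4]$; and $t_C$, as $U_C$ = Uniform$[0,6]$. Each geyser erupts on a fixed period and we arrive at an unknown point in its cycle, so the wait is equally likely to be anywhere between zero and one full period. The three waiting times are assumed to be independent.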

Since the waiting times are independent, their joint density is the product of the individual pdfs:

![alt text](Geyser-eq1.png "Equation 1")

<!-- (originally) $$P(t_A < a, t_B < b, t_C < c) = \int_0^a \int_0^b\int_0^c U_A(a)U_B(b)U_C(c) \cdot da \cdot db \cdot dc.$$ -->

The probability we are interested in is that of the event that $t_A < t_B$ and $t_A < t_C$. This is given by the integral: 

![alt text](Geyser-eq2.png "Equation 2")

<!-- (originally)  $$\int_0^2 \int_a^4\int_a^6 U_A(a)\cdot U_B(b)\cdot U_C(c) \cdot da \cdot db \cdot dc = $$ -->

<!-- (originally)  $$\frac{1}{48} \int_0^2 \left\{\int_a^4 db \cdot \int_a^6 dc\right\} \cdot da = \frac{1}{48} \int_0^2 (4-a)\cdot(6-a) \cdot da = \frac{1}{48} \cdot [\frac{56}{3} + 12] = 0.63888.....$$ -->
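
Since Arithmetic was the culprit, it may be worth letting R evaluate the integral as well. The following is a minimal sketch, not part of the original notebook: it uses base R's `integrate` on the integrand $(4-a)(6-a)/48$ from the derivation above, and compares it with $23/36$, which is $\frac{1}{48}\left[\frac{56}{3} + 12\right]$ reduced to lowest terms. The names `integrand`, `numeric_result`, and `exact_result` are my own choices for illustration.

```r
# Integrand: density of t_A (1/2) times P(t_B > a) = (4 - a)/4
# times P(t_C > a) = (6 - a)/6, which is (4 - a) * (6 - a) / 48 on [0, 2].
integrand <- function(a) (4 - a) * (6 - a) / 48

# Numerical quadrature over the support of t_A
numeric_result <- integrate(integrand, lower = 0, upper = 2)$value

# Closed-form value: (1/48) * (56/3 + 12) = 92/144 = 23/36
exact_result <- 23 / 36

cat(sprintf("numeric: %.6f  exact: %.6f\n", numeric_result, exact_result))
```

Both values print as 0.638889, which agrees with the 0.63888... obtained by hand.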


### The simulation in R

We'll simulate N draws from these distributions, where N is a Really Large Number, and look at the frequency of the event of interest (A erupting before both B and C). 

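The value you get from the simulation with N=100,000 is typically within a couple of thousandths of the closed-form solution. To get any closer than that, the computer has to work a little: the sampling error shrinks only like $1/\sqrt{N}$, so each additional decimal place of accuracy costs about 100 times as many draws.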
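
To put a rough number on "a couple of thousandths", here is a small sketch of the standard error of a simulated frequency, $\sqrt{p(1-p)/N}$. It assumes the closed-form value $p = 23/36 \approx 0.63888$ from the analytic section. The variable names and the target precision of 0.0001 are illustrative assumptions, not something from the original notebook.

```r
# Closed-form probability from the analytic section (0.63888...)
p_exact <- 23 / 36

# Standard error of the simulated frequency with N independent draws
N <- 100000
std_error <- sqrt(p_exact * (1 - p_exact) / N)
cat(sprintf("standard error at N = %d: %.4f\n", N, std_error))

# Draws needed to bring the standard error down to a hypothetical target
target_error <- 0.0001
N_needed <- ceiling(p_exact * (1 - p_exact) / target_error^2)
cat(sprintf("draws needed for a standard error of %.4f: %.0f\n",
            target_error, N_needed))
```

The standard error at N=100,000 comes out at about 0.0015, and pushing it down to 0.0001 takes roughly 23 million draws.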


```r
# simulate N observations of the time till the geyser erupts
# using runif

N <- 100000

time_till_A_erupts <- runif(N, min = 0, max = 2)
time_till_B_erupts <- runif(N, min = 0, max = 4)
time_till_C_erupts <- runif(N, min = 0, max = 6)

# calculate the desired frequencies
eventsA <- (time_till_A_erupts < time_till_B_erupts & time_till_A_erupts < time_till_C_erupts)
frequencyA <- mean(eventsA)
sprintf("The frequency of A erupting before B and C is: %f", frequencyA)
```
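
For anyone who wants to see how the agreement improves with N, the simulation can be wrapped in a function and given a fixed seed so the runs are repeatable. This is only a sketch: the function name `simulate_geysers`, the seed value, and the sample sizes are my own assumptions, and the comparison value $23/36$ is the closed-form answer from the analytic section.

```r
# Hypothetical helper: returns the frequency of "A erupts before both B and C"
# in N simulated arrivals. The seed is fixed only to make runs repeatable.
simulate_geysers <- function(N, seed = 42) {
  set.seed(seed)
  t_A <- runif(N, min = 0, max = 2)
  t_B <- runif(N, min = 0, max = 4)
  t_C <- runif(N, min = 0, max = 6)
  mean(t_A < t_B & t_A < t_C)
}

p_exact <- 23 / 36  # closed-form answer, 0.63888...

# Compare the simulated frequency with the exact value for growing N
for (N in c(1e3, 1e5, 1e6)) {
  p_hat <- simulate_geysers(N)
  cat(sprintf("N = %8.0f  frequency = %.5f  error = %+.5f\n",
              N, p_hat, p_hat - p_exact))
}
```

The exact errors depend on the seed, but they should shrink roughly by a factor of 10 for every 100-fold increase in N.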
