In [0]:
#@title Imports
!pip install -q symbulate
from symbulate import *

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [0]:
#@title Define Plotting Functions

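def plot_continuous_function(f, xlim=(0, 1), xlabel="", ylabel=""):
  """Plot the function f as a smooth curve over the interval xlim."""
  # Evaluate f on a fine grid of 1000 points so the curve looks continuous.
  xs = np.linspace(xlim[0], xlim[1], 1000)
  ys = [f(x) for x in xs]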
  plt.plot(xs, ys, "-")
  plt.xlabel(xlabel, fontsize=18)
  plt.ylabel(ylabel, fontsize=18)
  plt.xlim(*xlim)

# Continuous Distributions

## Motivation: Time-to-Event Data

Suppose we wish to model random events (called **arrivals**) over time. For example, we might be interested in when radioactive particles hit a Geiger counter or when customers arrive at a bank. 

Run the cell below to simulate some random arrivals.

In [0]:
#@title Simulate Random Arrivals

plt.figure(figsize=(8, 1))
# Draw 5 arrival times uniformly at random on (0, 3) and mark each with a red "x".
for t in 3 * np.random.rand(5):
  plt.plot([t], [0], 'rx', markersize=10)
plt.xlim(0, 3)
plt.xlabel("Time ($t$)", fontsize=16)
  
ax = plt.gca()
ax.yaxis.set_visible(False)
ax.spines['bottom'].set_position('center')
ax.spines['left'].set_color('none')
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')

## A Model for Time-to-Event Data

- Chop up time into short intervals of length $\Delta t = 1/m$, where $m$ is a large number. 
- Each short interval contains either 0 arrivals or 1 arrival. (The intervals are so short that we treat 2 or more arrivals in the same interval as impossible.)
- The probability of an arrival on any one short interval is small, $\lambda / m$. This means that the rate of arrivals is constant. For example, in 1 second, there are $m$ intervals, so the rate of arrivals is 
$$ m \cdot \frac{\lambda}{m} = \lambda \text{ arrivals per second}. $$
- An arrival in one interval does not change the probability of an arrival in any other interval, so these intervals can be modeled as random draws (with replacement) from a box.
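
The assumptions above can be simulated directly. The following is a minimal sketch, not part of the original notebook: the helper name `simulate_arrival_times` and its parameters (`rate`, `t_max`, `m`, `rng`) are made up for illustration. It runs one independent trial per short interval, each with success probability $\lambda / m$, and reports the left endpoint of every interval that contains an arrival.

```python
import numpy as np

def simulate_arrival_times(rate, t_max, m, rng):
    """Simulate the discretized model on (0, t_max) with m short intervals per unit time.

    Hypothetical helper for illustration only.
    """
    n_intervals = int(round(m * t_max))
    # One independent trial per short interval, each with probability rate / m.
    has_arrival = rng.random(n_intervals) < rate / m
    # Report each arrival at the left endpoint of its short interval.
    return np.flatnonzero(has_arrival) / m

rng = np.random.default_rng(0)
arrival_times = simulate_arrival_times(rate=2.3, t_max=3.0, m=10_000, rng=rng)
print(arrival_times)
```

With a rate of $2.3$ arrivals per second over $3$ seconds, repeated runs should average about $2.3 \cdot 3 = 6.9$ arrivals.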

How many arrivals are there between times $s$ and $t$? There are $m(t - s)$ short intervals between $s$ and $t$, each of which has a probability $\lambda / m$ of containing an arrival. So the exact distribution is 

$$ \text{number of arrivals in $(s, t)$} \sim \text{Binomial}(n=m(t - s), p=\lambda/m). $$ 

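But we learned that when $n$ is large and $p$ is small, as they are here, a binomial distribution is well approximated by a Poisson distribution with the same mean, $\mu = np = \lambda(t - s)$. So the number of arrivals in $(s, t)$ is approximately 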

$$\text{Poisson}(\mu=\lambda(t - s)). $$ 

For this reason, the model above is called the **Poisson process**.
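
To see how close the approximation is, we can compare the two p.m.f.s numerically. This is a rough sketch using only the standard library; the helper names `binomial_pmf` and `poisson_pmf` and the choice $m = 10{,}000$ are assumptions made for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return exp(-mu) * mu**k / factorial(k)

rate, s, t = 2.3, 1.0, 3.0   # arrivals per second, start time, end time
m = 10_000                   # short intervals per second (illustrative choice)
n = round(m * (t - s))       # number of short intervals between s and t

for k in range(6):
    print(k, binomial_pmf(k, n, rate / m), poisson_pmf(k, rate * (t - s)))
```

The two columns should agree to several decimal places, and the agreement should improve as $m$ grows.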

For example, consider a Poisson process with a rate of $2.3$ arrivals per second. The number of arrivals between $1$ and $3$ seconds is a random variable whose p.m.f. looks like...

In [0]:
# YOUR CODE HERE
Poisson(2.3 * (3 - 1)).plot()

## The Time of the First Arrival

We've seen that the _number_ of arrivals in an interval follows a Poisson distribution. What about the _time_ of the first arrival? This is a very different kind of random variable. Unlike a Poisson random variable, which takes values in the set $\{ 0, 1, 2, 3, \ldots \}$, the _time_ of the first arrival can be any real number in the interval $(0, \infty)$.

- Distributions like the binomial, Poisson, hypergeometric, and negative binomial, where the possible outcomes are discrete (usually integers), are called **discrete**.
- The distribution of the time of the first arrival, where the possible outcomes are all the real numbers in an interval, is called **continuous**.

Let's explore some properties of continuous random variables.

### Probability of an Exact Value

What is the probability that the time of the first arrival $T_1$ happens at exactly time $t$?

$$ P(T_1 = t) $$

In the discretized model, this is the event that the first arrival occurs in short interval number $x = mt$. In other words, we are looking for the number of trials until the first arrival. The number of trials $X$ follows a $\text{NegativeBinomial}(r=1, p)$ distribution, with p.m.f. 
$$ p[x] = p (1 - p)^{x-1}. $$
In our case, $p = \lambda / m$ and $x = mt$, so plugging in these values for $p$ and $x$, we obtain 

\begin{align}
P(T_1 = t) &= \frac{\lambda}{m} \left(1 - \frac{\lambda}{m}\right)^{mt - 1} \\
&\approx \lambda e^{-\lambda t} \Delta t & \text{when $m$ is very large.}
\end{align}

Because $\Delta t = 1/m \to 0$, the probability that $T_1$ is _exactly_ equal to $t$ is 0. This makes sense---there are so many possible times it could be; it is practically impossible that it would be _exactly_ equal to $t$.
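
The approximation can be checked numerically. The sketch below is an illustration rather than part of the original derivation: the function names `first_arrival_pmf` and `density_times_dt` and the values $\lambda = 2.3$, $t = 1$ are arbitrary choices. It compares the exact probability with $\lambda e^{-\lambda t} \Delta t$ for increasing $m$.

```python
from math import exp

def first_arrival_pmf(rate, t, m):
    """Exact probability that the first arrival lands in short interval number m * t."""
    p = rate / m
    x = round(m * t)
    return p * (1 - p) ** (x - 1)

def density_times_dt(rate, t, m):
    """The approximation rate * exp(-rate * t) * dt, with dt = 1 / m."""
    return rate * exp(-rate * t) / m

for m in (10, 100, 1_000, 10_000):
    exact = first_arrival_pmf(2.3, 1.0, m)
    approx = density_times_dt(2.3, 1.0, m)
    print(f"m={m:>6}  exact={exact:.3e}  approx={approx:.3e}  ratio={exact / approx:.5f}")
```

As $m$ grows, the ratio should approach 1 while both probabilities shrink toward 0.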

### Probability of a Range of Values

But there are other sensible probabilities we can calculate. For example, what is the probability that the first arrival happens between time $a$ and $b$?

\begin{align}
P(a < T_1 < b) &\approx \sum_{t=a}^b \lambda e^{-\lambda t} \Delta t \\
&\to \int_a^b \lambda e^{-\lambda t}\,dt & \text{as $\Delta t \to 0$.}
\end{align}
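
Here is a small numerical sketch of that limit. The helper names `riemann_sum_probability` and `exact_probability` and the example values are assumptions for illustration. The closed form used for comparison comes from evaluating the integral directly: $\int_a^b \lambda e^{-\lambda t}\,dt = e^{-\lambda a} - e^{-\lambda b}$.

```python
import numpy as np

def riemann_sum_probability(rate, a, b, m):
    """Sum rate * exp(-rate * t) * dt over the short intervals between a and b."""
    dt = 1 / m
    # Left endpoints of the short intervals that lie between a and b.
    ts = a + dt * np.arange(round(m * (b - a)))
    return np.sum(rate * np.exp(-rate * ts) * dt)

def exact_probability(rate, a, b):
    """Closed form of the integral of rate * exp(-rate * t) from a to b."""
    return np.exp(-rate * a) - np.exp(-rate * b)

for m in (10, 100, 1_000, 10_000):
    print(m, riemann_sum_probability(2.3, 0.5, 1.5, m), exact_probability(2.3, 0.5, 1.5))
```

The sums should settle toward the closed-form value as $m$ increases.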

### The Probability Density Function

It turns out that the _integrand_ (the expression inside the integral),

$$ p(t) = \lambda e^{-\lambda t}, $$

completely describes the distribution of a continuous random variable.
It is called the **probability density function** (p.d.f.). Let's take a look at this function.

In [0]:
#@title Graph the p.d.f. above.

Exponential(0.5).plot(xlim=(0, 5))
Exponential(1.).plot(xlim=(0, 5))
Exponential(2.).plot(xlim=(0, 5))

plt.xlabel("Time (t)")
plt.legend([r"$\lambda = 0.5$", r"$\lambda = 1.$", r"$\lambda = 2.$"])


The p.d.f. $p(x)$ is the function you integrate to get probabilities for continuous random variables. If we want to know the probability that the random variable falls in some set $B$, then we integrate the p.d.f. $p(x)$ over $B$:

$$ P(X \in B) = \int_B p(x)\,dx. $$

**Caution:** The values of the p.d.f. are not probabilities. So $p(2.1)$ is _not_ the probability that the random variable is exactly equal to 2.1 (because the probability a continuous random variable is exactly equal to a particular value is 0, as we saw above!).

You can see this another way. By the definition above, the probability that a continuous random variable $X$ is exactly equal to $a$ is the integral from $a$ to $a$ of $p(x)$, which is necessarily zero.

$$ P(X = a) = \int_a^a p(x)\,dx = 0. $$
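
A quick numerical illustration of the caution above, written as a sketch with made-up helper names (`exponential_pdf`, `integrate_pdf`) and a simple trapezoid rule: for $\lambda = 2$, the density at $0$ is larger than 1, so it cannot be a probability, yet the density still integrates to about 1, and the integral over a single point is 0.

```python
import numpy as np

def exponential_pdf(x, rate):
    """The p.d.f. rate * exp(-rate * x), for x >= 0."""
    return rate * np.exp(-rate * x)

def integrate_pdf(rate, a, b, n_points=100_001):
    """Approximate the integral of the p.d.f. from a to b with the trapezoid rule."""
    xs = np.linspace(a, b, n_points)
    ys = exponential_pdf(xs, rate)
    return np.sum((ys[1:] + ys[:-1]) / 2 * np.diff(xs))

print(exponential_pdf(0.0, rate=2.0))   # a density value larger than 1
print(integrate_pdf(2.0, 0.0, 20.0))    # total probability, very close to 1
print(integrate_pdf(2.0, 2.1, 2.1))     # integral from 2.1 to 2.1
```

The upper limit of 20 stands in for $\infty$ here; the probability beyond it is negligible for this rate.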

## Exponential Distribution

The continuous distribution that we derived above, with p.d.f. 

$$ p(x) = \lambda e^{-\lambda x}, \quad x \geq 0, $$

is a named distribution. It is the $\text{Exponential}(\lambda)$ distribution.

In [0]:
X = RV(Exponential(1.5))
xs = X.sim(10000)
xs

In [0]:
xs.plot()
Exponential(1.5).plot()

## Exercises

### Exercise 1

Suppose that customers arrive at a bank according to a Poisson process with rate $\lambda = 0.8$ per minute.

What is the probability that the first customer arrives after 3 minutes? (_Hint:_ There are two ways to answer this question---using the exponential distribution or using the Poisson distribution. Try both and make sure your answers match!)


### Exercise 2

Suppose that we first choose a random number $R$ from an $\text{Exponential}(\lambda = 1.5)$ distribution and then draw a circle with radius $R$. What is the probability the _area_ of the circle is greater than $2$?