In [0]:
#@title Imports
!pip install -q symbulate
from symbulate import *

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [0]:
#@title Define Plotting Functions

def plot_continuous_function(f, xlim=(0, 1), xlabel="", ylabel="", ax=None):
  """Plot f(x) on a grid of 1000 points spanning xlim (used below to draw c.d.f.s)."""
  xs = np.linspace(xlim[0], xlim[1], 1000)
  ys = [f(x) for x in xs]
  if ax is None:
    ax = plt.gca()
  ax.plot(xs, ys, "-")
  ax.set_xlabel(xlabel, fontsize=18)
  ax.set_ylabel(ylabel, fontsize=18)
  ax.set_xlim(*xlim)

# Cumulative Distribution Function

The **cumulative distribution function** (or **c.d.f.**) of a random variable $X$ returns the probability that $X$ is less than or equal to $x$:


$$ F(x) \overset{\text{def}}{=} P(X \leq x) $$
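One way to make the definition concrete is to estimate $F(x)$ from observed values: the fraction of observations that are at most $x$. The sketch below is plain Python rather than Symbulate; the helper name `empirical_cdf` and the toy sample `[3, 1, 4, 1, 5]` are illustrative assumptions, not part of the Symbulate API.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return a function x -> fraction of `sample` values that are <= x.

    Hypothetical helper for illustration: it estimates F(x) = P(X <= x)
    by the proportion of observations at or below x.
    """
    ordered = sorted(sample)
    n = len(ordered)

    def F(x):
        # bisect_right gives the number of sorted values that are <= x
        return bisect_right(ordered, x) / n

    return F


F = empirical_cdf([3, 1, 4, 1, 5])
print(F(0), F(1), F(3.5), F(10))  # 0.0 0.4 0.6 1.0
```

Sorting once and using binary search keeps each evaluation of `F` cheap, which matters when the sample comes from many simulated draws.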

For example, suppose we toss a fair coin repeatedly until it lands heads. What is the probability that it takes at most 4 tosses? Summing the p.m.f. and evaluating the c.d.f. give the same answer:

In [0]:
(NegativeBinomial(r=1, p=.5).pmf([0, 1, 2, 3, 4]).sum(),
 NegativeBinomial(r=1, p=.5).cdf(4))
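The exact answer is $1 - (1/2)^4 = 15/16 = 0.9375$, since the coin needs more than 4 tosses only if the first 4 tosses are all tails. As a sanity check, the tosses can also be simulated directly. This sketch uses Python's standard `random` module instead of Symbulate; the helper `tosses_until_heads`, the seed, and the number of simulations are illustrative choices.

```python
import random


def tosses_until_heads(rng, p=0.5):
    """Toss a coin with P(heads) = p until it lands heads; return the number of tosses."""
    tosses = 1
    while rng.random() >= p:  # rng.random() < p counts as heads
        tosses += 1
    return tosses


rng = random.Random(2024)  # fixed seed so the run is reproducible
n_sims = 100_000
draws = [tosses_until_heads(rng) for _ in range(n_sims)]

# Empirical estimate of F(4) = P(X <= 4)
estimate = sum(d <= 4 for d in draws) / n_sims
exact = 1 - 0.5 ** 4  # 1 - P(first 4 tosses are all tails)
print(estimate, exact)
```

With 100,000 simulations the estimate typically lands within about 0.002 of 0.9375.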

Both discrete and continuous distributions have c.d.f.s. For example, suppose we simulate $R$ from an $\text{Exponential}(\lambda=1.5)$ distribution and then draw a circle with radius $R$. We can calculate the probability that the area is greater than 2 in two ways: by integrating the p.d.f. of $R$, or by using the c.d.f. of $R$:

\begin{align}
P(\pi R^2 > 2) &= P(R > \sqrt{2/\pi}) = \int_{\sqrt{2 / \pi}}^\infty 1.5 e^{-1.5 r}\,dr & \text{(integrating the p.d.f.)} \\
&= 1 - F(\sqrt{2/\pi}) & \text{(using the c.d.f. of $R$)}
\end{align}

In [0]:
# P(pi R^2 > 2) = 1 - F(sqrt(2 / pi))
1 - Exponential(1.5).cdf(np.sqrt(2 / np.pi))
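The same probability can be checked by simulating the story directly: draw $R$, compute the area $\pi R^2$, and count how often it exceeds 2. This stdlib sketch assumes that `random.expovariate(lam)` takes the rate $\lambda$ as its argument (so its mean is $1/\lambda$), matching the $\text{Exponential}(\lambda=1.5)$ parameterization used here. The closed form $1 - F(r) = e^{-\lambda r}$ follows from the exponential c.d.f.

```python
import math
import random

lam = 1.5
threshold = math.sqrt(2 / math.pi)  # pi R^2 > 2  <=>  R > sqrt(2 / pi)

# Closed form: 1 - F(r) = e^{-lam * r} for the Exponential(lam) c.d.f.
exact = math.exp(-lam * threshold)

# Monte Carlo: random.expovariate(lam) draws from an Exponential with rate lam
rng = random.Random(7)  # fixed seed for reproducibility
n_sims = 100_000
hits = sum(math.pi * rng.expovariate(lam) ** 2 > 2 for _ in range(n_sims))
estimate = hits / n_sims

print(round(exact, 4), estimate)
```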

## Visualizing the c.d.f.

For a discrete random variable with p.m.f. $p[x]$, we calculate the c.d.f. $F(x)$ by summing all the probabilities up to (and including) $x$:

$$ F(x) = \sum_{t\leq x} p[t]. $$
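The summation can be written out directly. The sketch below assumes the p.m.f. of the number of tosses until the first heads, $p[t] = (1-p)^{t-1}p$ for $t = 1, 2, 3, \ldots$ (the $\text{NegativeBinomial}(r=1, p)$ case from the coin example), and uses exact `Fraction` arithmetic; the helper names `geometric_pmf` and `discrete_cdf` are hypothetical.

```python
from fractions import Fraction


def geometric_pmf(t, p=Fraction(1, 2)):
    """p[t] for the number of tosses until the first heads (t = 1, 2, 3, ...)."""
    if t < 1:
        return Fraction(0)
    return (1 - p) ** (t - 1) * p


def discrete_cdf(pmf, x, support_start=0):
    """F(x) = sum of pmf(t) over integers t with support_start <= t <= x."""
    total = Fraction(0)
    t = support_start
    while t <= x:  # the dummy variable t runs over possible values up to x
        total += pmf(t)
        t += 1
    return total


print(discrete_cdf(geometric_pmf, 4))  # 15/16
```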

(Note that we use the dummy variable $t$ inside the summation.)

Let's graph the p.m.f. and c.d.f. of the $\text{NegativeBinomial}(r=1, p=0.5)$ distribution above.

In [0]:
#@title Graphing the PMF vs. the CDF

fig, axes = plt.subplots(2, 1, figsize=(6, 12))

NegativeBinomial(r=1, p=0.5).plot(ax=axes[0])
plot_continuous_function(NegativeBinomial(r=1, p=0.5).cdf,
                         xlim=(0, 10),
                         ax=axes[1])

axes[0].set_ylabel("PMF", fontsize=16)
axes[1].set_ylabel("CDF", fontsize=16)
axes[1].set_xlabel("$x$", fontsize=16);

In general, the c.d.f. of a discrete random variable is a step function. That's because if a random variable $X$ can only take the values $\{0, 1, 2, 3, \ldots\}$, then $P(X \leq 2.1)$ and $P(X \leq 2.7)$ are equal: both are $p[0] + p[1] + p[2]$, since no value strictly between 2 and 3 is a possible outcome.
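For the coin example the flat stretches are easy to see in closed form. If $X$ is the number of tosses until the first heads, then $X \leq x$ exactly when $X \leq \lfloor x \rfloor$, and $X > n$ means the first $n$ tosses were all tails, so $F(x) = 1 - (1-p)^{\lfloor x \rfloor}$ for $x \geq 1$. Here $p[0] = 0$ because at least one toss is needed. The function name below is a hypothetical helper in plain Python.

```python
import math


def cdf_tosses_until_heads(x, p=0.5):
    """F(x) = P(X <= x) for X = number of tosses until the first heads.

    X <= x exactly when X <= floor(x), and X > n means the first n tosses
    were all tails, so F(x) = 1 - (1 - p) ** floor(x) for x >= 1.
    """
    if x < 1:
        return 0.0
    return 1 - (1 - p) ** math.floor(x)


# Flat between integers: 2.1 and 2.7 give the same value, p[1] + p[2] = 0.75
print(cdf_tosses_until_heads(2.1), cdf_tosses_until_heads(2.7))
# Jump at 3 of size p[3] = 0.125
print(cdf_tosses_until_heads(3) - cdf_tosses_until_heads(2.999))
```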

What about a continuous random variable? To calculate $P(X \leq x)$, we have to integrate the p.d.f. up to $x$:

$$ F(x) = \int_{-\infty}^x p(t)\,dt. $$

For example, for the $\text{Exponential}(\lambda)$ distribution, the c.d.f. is 

$$ F(x) = \int_{0}^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x}, \qquad x \geq 0. $$

(Check the math for yourself.) Note that the lower limit of the integral is $0$ (rather than $-\infty$) because $p(t) = 0$ when $t < 0$ for the exponential distribution; for the same reason, $F(x) = 0$ when $x < 0$.
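A numerical check of this integral: approximate $\int_0^x p(t)\,dt$ with the midpoint rule and compare it with $1 - e^{-\lambda x}$. The midpoint rule is a swapped-in numerical technique chosen for simplicity, and the helper names `exponential_pdf` and `cdf_by_integration` are hypothetical; this is plain Python, not Symbulate.

```python
import math


def exponential_pdf(t, lam=1.5):
    """p(t) = lam * e^{-lam t} for t >= 0, and 0 for t < 0."""
    return lam * math.exp(-lam * t) if t >= 0 else 0.0


def cdf_by_integration(pdf, x, lower=0.0, n=10_000):
    """Approximate F(x) = integral of pdf(t) dt from `lower` to x (midpoint rule).

    Assumes pdf(t) = 0 below `lower`, which holds for the exponential with lower = 0.
    """
    if x <= lower:
        return 0.0
    h = (x - lower) / n
    # Sum pdf at the midpoint of each of the n subintervals, times the width h
    return h * sum(pdf(lower + (i + 0.5) * h) for i in range(n))


for x in [0.5, 1.0, 2.0]:
    numeric = cdf_by_integration(exponential_pdf, x)
    closed_form = 1 - math.exp(-1.5 * x)
    print(x, round(numeric, 6), round(closed_form, 6))
```

The two columns should agree to the printed precision, since the midpoint rule's error shrinks with the square of the step width.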

Let's also graph the p.d.f. and c.d.f. of the $\text{Exponential}(\lambda=1.5)$ distribution.

In [0]:
#@title Graphing the PDF vs. the CDF

fig, axes = plt.subplots(2, 1, figsize=(6, 12))

Exponential(1.5).plot(ax=axes[0], xlim=(-0.5, 5))
plot_continuous_function(Exponential(1.5).cdf,
                         xlim=(-0.5, 5),
                         ax=axes[1])
axes[0].set_xlim(-0.5, 5)

axes[0].set_ylabel("PDF", fontsize=16)
axes[1].set_ylabel("CDF", fontsize=16)
axes[1].set_xlabel("$x$", fontsize=16);

In summary, here are the main properties of the c.d.f.:

- The c.d.f. never decreases as you move from left to right (i.e., as $x$ increases).
- As $x \to -\infty$, $F(x) \to 0$.
- As $x \to \infty$, $F(x) \to 1$.

Think about why these properties make intuitive sense.
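The three properties can also be spot-checked numerically for the $\text{Exponential}(\lambda=1.5)$ c.d.f. derived above. This is only an illustration on a finite grid and at a few far-out points, not a proof; the grid range and the helper name `exponential_cdf` are illustrative assumptions.

```python
import math


def exponential_cdf(x, lam=1.5):
    """Exponential(lam) c.d.f.: 0 for x < 0, 1 - e^{-lam x} for x >= 0."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0


xs = [-10 + 0.01 * i for i in range(3001)]  # grid from -10 to 20
values = [exponential_cdf(x) for x in xs]

# Property 1: never decreases from left to right
print(all(a <= b for a, b in zip(values, values[1:])))
# Properties 2 and 3: values far to the left and far to the right
print(exponential_cdf(-1e6), exponential_cdf(50))  # 0.0 1.0
```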