## Geometric Distribution

The **geometric distribution** describes the number of trials needed to get the first success in repeated Bernoulli trials. Suppose we run independent trials, each resulting in one of two possible outcomes, labelled success and failure. The probability of success, P(success) = $p$, stays constant from trial to trial, and $X$ represents the number of trials needed to get the first success.

For the first success to occur on the $x_{th}$ trial:

- The first $x - 1$ trials must be failures.
- The $x_{th}$ trial must be a success.

This gives us the probability mass function of the geometric distribution:

\begin{align}
P(X = x) = (1 - p)^{x - 1}p
\end{align}
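As a quick check, the formula can be evaluated directly. The sketch below is a minimal illustration; the function name `geometric_pmf` is our own choice and not part of any library.

```python
def geometric_pmf(x, p):
    """P(X = x): first success on trial x, with success probability p."""
    if x < 1:
        return 0.0
    # x - 1 failures followed by a single success
    return (1 - p) ** (x - 1) * p


# probabilities of the first success landing on trials 1 through 5 when p = 0.3
for x in range(1, 6):
    print(x, geometric_pmf(x, 0.3))
```

The printed probabilities shrink by a factor of $1 - p$ at each step, since every additional trial requires one more failure before the success.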

For the geometric distribution, the smallest value $X$ can take is 1, but there is no upper bound.
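Even though the support is unbounded, the probabilities still add up to 1, because they form a geometric series whose partial sum over $x = 1, \dots, n$ is $1 - (1 - p)^n$. The following sketch compares the two numerically; the helper name `geometric_partial_sum` is hypothetical and used only for illustration.

```python
def geometric_partial_sum(n, p):
    """Sum of P(X = x) for x = 1..n, i.e. P(X <= n)."""
    return sum((1 - p) ** (x - 1) * p for x in range(1, n + 1))


p = 0.3
for n in (1, 5, 20, 100):
    closed_form = 1 - (1 - p) ** n
    # the two columns should agree up to floating point error
    print(n, geometric_partial_sum(n, p), closed_form)
```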

> Example: In a large population of adults, 30% have received CPR training. If adults from this population are randomly selected, what is the probability that the 6th person sampled is the first who has received CPR training?

```python
from scipy.stats import geom

# geometric distribution with success probability 0.3;
# probability that the first success occurs on the 6th trial
geom(p=0.3).pmf(k=6)
```

    0.05042099999999998
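One way to sanity-check this number is a small Monte Carlo simulation using only the standard library. This is a rough sketch: the helper `trials_until_first_success`, the seed, and the number of runs are all arbitrary choices made for illustration.

```python
import random


def trials_until_first_success(p, rng):
    """Run Bernoulli(p) trials and return the index of the first success."""
    trials = 1
    while rng.random() >= p:  # rng.random() < p counts as a success
        trials += 1
    return trials


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
n_runs = 100_000
hits = sum(trials_until_first_success(0.3, rng) == 6 for _ in range(n_runs))
estimate = hits / n_runs
print(estimate)  # should land near 0.0504, up to sampling noise
```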

## Negative Binomial Distribution

The **negative binomial distribution** generalizes the geometric distribution. It describes the number of trials needed to get the $r_{th}$ success. It should not be confused with the binomial distribution, which describes the number of successes in a fixed number of independent Bernoulli trials.

We again denote the probability of success by P(success) = $p$, which stays constant from trial to trial, and let $X$ represent the number of trials needed to get the $r_{th}$ success. For the $r_{th}$ success to occur on the $x_{th}$ trial:

- The first $x - 1$ trials must contain exactly $r - 1$ successes. Using the binomial distribution's formula, the probability of this is

\begin{align}
P(r - 1 \text{ successes in the first } x - 1 \text{ trials}) = \binom{x - 1}{r - 1} p^{r - 1} \left( 1 - p \right)^{(x - 1) - (r - 1)}
\end{align}

- The last trial must be a success. Multiplying the expression above by $p$ and simplifying gives the probability mass function

\begin{align}
P(X = x) = \binom{x - 1}{r - 1} p^{r} \left( 1 - p \right)^{x - r}
\end{align}
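The formula translates almost directly into code. The sketch below uses `math.comb` for the binomial coefficient; the function name `negative_binomial_pmf` is our own and is only meant as an illustration of the formula above.

```python
from math import comb


def negative_binomial_pmf(x, r, p):
    """P(X = x): the r-th success occurs on trial x."""
    if x < r:
        return 0.0
    # r - 1 successes somewhere in the first x - 1 trials, then a success
    return comb(x - 1, r - 1) * p ** r * (1 - p) ** (x - r)


# with r = 1 the formula collapses to the geometric pmf (1 - p)^(x - 1) * p
print(negative_binomial_pmf(6, 1, 0.3), (1 - 0.3) ** 5 * 0.3)
```

Setting $r = 1$ makes the binomial coefficient equal to 1, which is why the geometric distribution is a special case.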

For the negative binomial distribution, the smallest value $X$ can take is $r$, but there is no upper bound.

> Example: A person conducting telephone surveys must get 3 more completed surveys before their job is finished. On each randomly dialed number, there is a 9% chance of reaching an adult who will complete the survey. What is the probability that the 3rd completed survey occurs on the 10th call?

```python
from scipy.stats import nbinom

# note that scipy's negative binomial distribution counts
# the number of failures k before the n-th success:
# choose(k+n-1, n-1) * p**n * (1-p)**k
# thus k is the number of failures, 10 - 3 = 7
nbinom(p=0.09, n=3).pmf(k=7)
```

    0.013561876192013217
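To see why passing `k = 10 - 3` is the right conversion, it can help to evaluate both parametrizations by hand. The sketch below uses only `math.comb`; the variable names `by_trials` and `by_failures` are ours, and the failures-based expression is copied from the code comment above with `n = r`.

```python
from math import comb

r, p = 3, 0.09  # successes required and success probability from the example
x = 10          # trial on which the r-th success should occur
k = x - r       # number of failures, the quantity scipy's nbinom counts

# trials-based form used in this section
by_trials = comb(x - 1, r - 1) * p ** r * (1 - p) ** (x - r)
# failures-based form quoted in the code comment above (with n = r)
by_failures = comb(k + r - 1, r - 1) * p ** r * (1 - p) ** k

print(k, by_trials, by_failures)
```

Substituting $k = x - r$ turns $\binom{k + r - 1}{r - 1}$ into $\binom{x - 1}{r - 1}$, so the two expressions describe the same probability.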

# Reference

- [Youtube: An Introduction to the Geometric Distribution](https://www.youtube.com/watch?v=zq9Oz82iHf0)
- [Youtube: Introduction to the Negative Binomial Distribution](https://www.youtube.com/watch?v=BPlmjp2ymxw)
- [Notes: Eberly College of Science STAT 414/415: Geometric and Negative Binomial Distributions](https://onlinecourses.science.psu.edu/stat414/node/55)