## __Discrete Probability Distributions__

---

<br>

Author:      Tyler J. Brough <br>
Last Update: February 16, 2022 <br>

<br>

This notebook is based in part on the following source:

* _Chapter 3: Special Discrete Distributions_ of _Introduction to Probability and Mathematical Statistics_

---

<br>

In [1]:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

In [2]:
plt.rcParams['figure.figsize'] = [10, 8]

<br>

### __Bernoulli Distribution__

<hr style="border:2px solid black"> </hr>

A random variable that takes only the values $0$ or $1$ is known as a __Bernoulli variable__, and an experiment with only two types of outcomes is called a __Bernoulli trial__. In particular, if an experiment can result only in "success" ($E$) or "failure" ($E^{\prime}$), then the corresponding Bernoulli variable is

<br>

$$
X(e) = \begin{cases} 
        1 & \text{  if } e \in E \\
        0 & \text{  if } e \in E^{\prime}
       \end{cases} 
$$

<br>

The probability function of the random variable $X$ is given by $f(0) = 1 - \theta$ and $f(1) = \theta$, where $\theta = P(E)$ is the probability of success. The corresponding distribution is known as a __Bernoulli distribution__, and its pmf can be expressed as 

<br>

$$
f(x) = \theta^{x} (1 - \theta)^{1-x} \quad x = 0, 1
$$

<br>

We will use the notation $X \sim BIN(1, \theta)$ to denote a Bernoulli random variable.
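As a quick check, the closed-form pmf can be compared with SciPy. The sketch below uses an illustrative value $\theta = 0.3$ (assumed, not from the source) and evaluates $\theta^{x}(1-\theta)^{1-x}$ at $x = 0, 1$ alongside `stats.bernoulli` and `stats.binom(1, θ)`. All three should agree, which is why $BIN(1, \theta)$ is a natural notation for a Bernoulli variable.

```python
import numpy as np
from scipy import stats

theta = 0.3              # illustrative success probability (assumed, not from the text)
x = np.array([0, 1])     # the two possible values of a Bernoulli variable

# Closed-form Bernoulli pmf: theta^x * (1 - theta)^(1 - x)
formula = theta**x * (1 - theta)**(1 - x)

# SciPy's Bernoulli distribution and the equivalent binomial with one trial
bern = stats.bernoulli(theta).pmf(x)
binom1 = stats.binom(1, theta).pmf(x)

print(formula)                                          # P(X = 0), P(X = 1)
print(np.allclose(formula, bern), np.allclose(formula, binom1))
```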

<br>

<hr style="border:2px solid black"> </hr>

<br>

#### __Example: Bernoulli__ 

<br>
<br>

Consider rolls of a four-sided die. A bet is placed that a 1 will occur on a single roll of the die.
Thus, $E = \{1\}$, $E^{\prime} = \{2, 3, 4\}$. 

<br>

If the die is fair, each of the four faces is equally likely, and only one of them is in $E$, so counting gives $\theta = P(E) = \frac{1}{4}$. 
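The counting argument can be made explicit by writing $X(e)$ as an indicator function on the sample space $\{1, 2, 3, 4\}$. This sketch assumes a fair die (each face equally likely), which the example implies but does not state; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4]   # faces of a four-sided die
E = {1}                       # the event "a 1 occurs" (success)

def X(e):
    """Bernoulli variable: 1 if outcome e is in E, otherwise 0."""
    return 1 if e in E else 0

# With a fair die (assumed), every face has probability 1/4, so
# theta = P(X = 1) = (faces with X = 1) / (total faces)
theta = Fraction(sum(X(e) for e in sample_space), len(sample_space))
print(theta)  # 1/4
```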


<br>
<br>
<br>

<br>

### __Binomial Distribution__

<hr style="border:2px solid black"> </hr>

A more general experiment consists of a sequence of __independent Bernoulli trials__, where the quantity of 
interest is the number of successes in a fixed number of trials. 

<br>

This leads to a more general __binomial distribution__. In a sequence of $n$ independent Bernoulli trials with probability
of success $\theta$ on each trial, let $X$ represent the number of successes. The discrete pmf of $X$ is given by

<br>

$$
b(x; n, \theta) = {n \choose x} \theta^{x} (1 - \theta)^{n-x} \quad x = 0, 1, \ldots, n
$$

<br>

For the event $[X = x]$ to occur, the $n$ trials must produce some arrangement of $x$ successes ($E$) and $n - x$ failures ($E^{\prime}$). The number of such arrangements is the __binomial coefficient__ ${n \choose x}$, and, because the trials are independent, each arrangement occurs with probability $\theta^{x} (1 - \theta)^{n - x}$: the product of $x$ factors of $\theta = P(E)$ and $n - x$ factors of $1 - \theta = P(E^{\prime})$.
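The counting argument can be checked by brute force for small $n$: enumerate every success/failure sequence of length $n$, keep those with exactly $x$ successes, and add up their individual probabilities. The values $n = 5$, $x = 2$, $\theta = 0.3$ are illustrative choices (assumed, not from the source).

```python
from itertools import product
from math import comb, prod, isclose
from scipy import stats

n, x, theta = 5, 2, 0.3  # illustrative values (assumed, not from the text)

# All length-n sequences of trials (1 = success E, 0 = failure E') with exactly x successes
sequences = [s for s in product([0, 1], repeat=n) if sum(s) == x]

def seq_prob(s):
    """Probability of one sequence: by independence, multiply theta for each
    success and (1 - theta) for each failure."""
    return prod(theta if outcome == 1 else 1 - theta for outcome in s)

total = sum(seq_prob(s) for s in sequences)

print(len(sequences), comb(n, x))                     # 10 10
print(isclose(total, stats.binom(n, theta).pmf(x)))   # True
```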

<br>

We use the notation $X \sim BIN(n, \theta)$ to denote a binomial random variable.

<br>

<hr style="border:2px solid black"> </hr>

<br>

#### __Example: Binomial__ 

<br>
<br>

A student answers 20 true-false questions by guessing at random, so each answer is correct with probability $0.5$. What is the probability of getting $100\%$ (20 correct) on the test? What is the probability of getting $80\%$ (16 correct)? 



In [3]:
stats.binom(20, 0.5).pmf(20)

9.5367431640625e-07

In [4]:
stats.binom(20, 0.5).pmf(16)

0.004620552062988271
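With $\theta = 0.5$, every answer sequence has probability $(1/2)^{20}$, so both answers above have simple closed forms: $1/2^{20}$ for a perfect score and ${20 \choose 16}/2^{20}$ for $80\%$. A short check in exact arithmetic using the standard library's `math.comb` and `fractions.Fraction`:

```python
from fractions import Fraction
from math import comb, isclose
from scipy import stats

n = 20                                   # number of true-false questions
p_all = Fraction(comb(n, 20), 2**n)      # P(X = 20): every answer correct
p_80 = Fraction(comb(n, 16), 2**n)       # P(X = 16): 80% of 20 questions

print(p_all, p_80)                       # 1/1048576 4845/1048576

# Agreement with the SciPy results above
print(isclose(float(p_all), stats.binom(n, 0.5).pmf(20)))   # True
print(isclose(float(p_80), stats.binom(n, 0.5).pmf(16)))    # True
```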

<br>

What is the probability of getting 5 or fewer correct?

<br>

In [5]:
stats.binom(20, 0.5).pmf(0) + stats.binom(20, 0.5).pmf(1) + stats.binom(20, 0.5).pmf(2)  \
    + stats.binom(20, 0.5).pmf(3) + stats.binom(20, 0.5).pmf(4) + stats.binom(20, 0.5).pmf(5)

0.02069473266601554

In [6]:
result = 0.0

for x in range(0, 6):
    result += stats.binom(20, 0.5).pmf(x)
        
result

0.02069473266601554

In [7]:
stats.binom(20, 0.5).cdf(5)

0.020694732666015625

In [9]:
# P(X >= 11) = 1 - P(X <= 10)
1.0 - stats.binom(20, 0.5).cdf(10)

0.4119014739990232
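Two shortcuts from SciPy's frozen-distribution interface may be convenient here: `pmf` accepts an array, so the sum for $P(X \le 5)$ fits on one line, and the survival function `sf(k)` returns $P(X > k) = 1 - F(k)$ directly, which can be more accurate than `1 - cdf(k)` when the upper tail is very small.

```python
import numpy as np
from scipy import stats

X = stats.binom(20, 0.5)   # number correct when guessing on 20 true-false questions

# P(X <= 5): vectorized pmf summed over x = 0, 1, ..., 5
p_le_5 = X.pmf(np.arange(6)).sum()

# P(X >= 11) = P(X > 10): survival function
p_ge_11 = X.sf(10)

print(p_le_5, X.cdf(5))
print(p_ge_11, 1.0 - X.cdf(10))
```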

<br>
<br>
<br>

<br>

### __Geometric Distribution__

<hr style="border:2px solid black"> </hr>

Consider again a sequence of independent Bernoulli trials with probability of success $\theta = P(E)$. In the case of the binomial distribution, the number of 
trials was a fixed number $n$, and the variable of interest was the number of successes. Now we consider the number of trials required to achieve a specified 
number of successes. 

<br>

If we denote the number of trials required to obtain the _first_ success by the random variable $X$, then the discrete pmf of $X$ is given by

<br>

$$
g(x; \theta) = \theta (1 - \theta)^{x-1} \quad x = 1, 2, 3, \ldots
$$

<br>

The distribution of $X$ is known as the __geometric distribution__. We will use the notation $X \sim GEO(\theta)$ to denote a geometric random variable.
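SciPy's `stats.geom` uses the same convention as $g(x; \theta)$: its support starts at $x = 1$ and counts the trial on which the first success occurs. The sketch below checks the formula against SciPy for an illustrative $\theta = 0.25$ (assumed, not from the source) and confirms that the probabilities over a long stretch of the support sum to essentially 1.

```python
import numpy as np
from scipy import stats

theta = 0.25                 # illustrative success probability (assumed)
x = np.arange(1, 11)         # first ten values of the support x = 1, 2, 3, ...

# Geometric pmf: failures on the first x - 1 trials, then a success on trial x
formula = theta * (1 - theta)**(x - 1)
print(np.allclose(formula, stats.geom(theta).pmf(x)))   # True

# The pmf over x = 1, ..., 200 should sum to essentially 1
total = stats.geom(theta).pmf(np.arange(1, 201)).sum()
print(total)
```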

<br>

<hr style="border:2px solid black"> </hr>

<br>

#### __Example: Geometric__

<br>
<br>

Research has shown that 4 out of 10 fifth graders in Utah can locate the state of Colorado on a map. What is the probability that the fifth student you sample is the first one who can locate Colorado on the map? 

<br>

In [10]:
stats.geom

[0;31mSignature:[0m       [0mstats[0m[0;34m.[0m[0mgeom[0m[0;34m([0m[0;34m*[0m[0margs[0m[0;34m,[0m [0;34m**[0m[0mkwds[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mType:[0m            geom_gen
[0;31mString form:[0m     <scipy.stats._discrete_distns.geom_gen object at 0x7fc69089f5e0>
[0;31mFile:[0m            /opt/anaconda3/lib/python3.8/site-packages/scipy/stats/_discrete_distns.py
[0;31mDocstring:[0m      
A geometric discrete random variable.

As an instance of the `rv_discrete` class, `geom` object inherits from it
a collection of generic methods (see below for the full list),
and completes them with details specific for this particular distribution.

Methods
-------
rvs(p, loc=0, size=1, random_state=None)
    Random variates.
pmf(k, p, loc=0)
    Probability mass function.
logpmf(k, p, loc=0)
    Log of the probability mass function.
cdf(k, p, loc=0)
    Cumulative distribution function.
logcdf(k, p, loc=0)
    Log of the cumulative distribution functio
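Reading the question as "the fifth student sampled is the first who can locate Colorado," the answer is $P(X = 5)$ for $X \sim GEO(0.4)$: four failures followed by a success, $g(5; 0.4) = 0.4 \times 0.6^{4} = 0.05184$. Under the other possible reading (five students who cannot, then one who can) the answer would be $P(X = 6) = 0.4 \times 0.6^{5} = 0.031104$. A sketch computing both with `stats.geom` and the formula:

```python
from scipy import stats

theta = 0.4                      # 4 out of 10 students can locate Colorado
X = stats.geom(theta)            # trial number of the first success

p5 = X.pmf(5)                    # first success on the 5th student sampled
p6 = X.pmf(6)                    # 5 failures, then a success on the 6th

print(p5, theta * (1 - theta)**4)
print(p6, theta * (1 - theta)**5)
```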

<br>

### __Negative Binomial Distribution__

<hr style="border:2px solid black"> </hr>

In repeated independent Bernoulli trials with probability of success $\theta$ on each trial, let $X$ denote the number of trials required to obtain $r$ successes. Then the probability distribution of $X$
is the __negative binomial distribution__ with discrete pmf given by

<br>

$$
f(x; r, \theta) = {x-1 \choose r-1} \theta^{r} (1 - \theta)^{x-r} \quad x = r, r + 1, \ldots
$$

<br>

We will use the notation $X \sim NB(r, \theta)$ to denote a negative binomial random variable.
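One caution when using SciPy: `stats.nbinom(n, p)` counts the number of *failures* before the `n`-th success, with support starting at 0, whereas $X$ here counts *trials*. Since $X$ equals the number of failures plus $r$, the pmf above corresponds to `stats.nbinom(r, theta).pmf(x - r)`. The sketch below checks this shift against the formula for illustrative values $r = 3$, $\theta = 0.4$ (assumed, not from the source).

```python
import numpy as np
from math import comb
from scipy import stats

r, theta = 3, 0.4              # illustrative parameters (assumed)
x = np.arange(r, r + 10)       # support of X starts at x = r

# Closed-form pmf: the first r - 1 successes fall somewhere in the first
# x - 1 trials, and trial x must be the r-th success
formula = np.array([
    comb(int(k) - 1, r - 1) * theta**r * (1 - theta)**(int(k) - r) for k in x
])

# SciPy counts failures (x - r), so shift the argument by r
scipy_pmf = stats.nbinom(r, theta).pmf(x - r)

print(np.allclose(formula, scipy_pmf))   # True
```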

<br>

<hr style="border:2px solid black"> </hr>

<br>

#### __Example: Negative Binomial__

<br>
<br>


<br>
<br>
<br>

<br>

### __Poisson Distribution__

<hr style="border:2px solid black"> </hr>

A discrete random variable $X$ is said to have a __Poisson distribution__ with parameter $\lambda > 0$ if it has the discrete pmf of the form

<br>

$$
f(x; \lambda) = \frac{e^{-\lambda} \lambda^{x}}{x!} \quad x = 0, 1, 2, \ldots
$$

<br>

We will use the notation $X \sim POI(\lambda)$ to denote a Poisson random variable. 
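The Poisson pmf can likewise be checked against `stats.poisson`. The value $\lambda = 2.5$ below is illustrative (assumed, not from the source). The sketch also confirms numerically that the probabilities sum to 1, which follows from the series $\sum_{x=0}^{\infty} \lambda^{x}/x! = e^{\lambda}$.

```python
import numpy as np
from math import exp, factorial
from scipy import stats

lam = 2.5                      # illustrative parameter, lambda > 0 (assumed)
x = np.arange(0, 15)

# Closed-form pmf: e^{-lambda} * lambda^x / x!
formula = np.array([exp(-lam) * lam**int(k) / factorial(int(k)) for k in x])
print(np.allclose(formula, stats.poisson(lam).pmf(x)))   # True

# Probabilities over x = 0, ..., 99 sum to essentially 1
total = stats.poisson(lam).pmf(np.arange(100)).sum()
print(total)
```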

<br>

<hr style="border:2px solid black"> </hr>

<br>

#### __Example: Poisson__

<br>
<br>


<br>
<br>
<br>