In [4]:
from scipy import stats

# Probability Distributions

## Definitions

- $E[X] = \sum_{i \in S}{P(X=i) \cdot i}$, where $S$ is the event space
- $Var[X] = E[X^2] - (E[X])^2$
- $\binom{n}{k} = \frac{n!}{k!(n-k)!}$
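
The definitions above can be checked by hand on a small example. This is a minimal sketch using only the standard library; the helper names `expectation` and `variance` and the toy PMF are illustrative assumptions, not part of `scipy`.

```python
from fractions import Fraction
from math import comb

def expectation(pmf):
    """E[X]: sum of P(X=i) * i over every outcome i."""
    return sum(prob * outcome for outcome, prob in pmf.items())

def variance(pmf):
    """Var[X] = E[X^2] - (E[X])^2."""
    second_moment = sum(prob * outcome**2 for outcome, prob in pmf.items())
    return second_moment - expectation(pmf) ** 2

# Made-up PMF on {0, 1, 2}; exact fractions avoid rounding noise
toy_pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
toy_mean = expectation(toy_pmf)
toy_var = variance(toy_pmf)
print(f"E[X] = {toy_mean}  Var[X] = {toy_var}")

# Binomial coefficient: ways to choose k=2 items out of n=5
pairs = comb(5, 2)
print(f"C(5, 2) = {pairs}")
```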

## Bernoulli Distribution

- A coin flip, weighted by parameter $p$
    - Event space is $\{0, 1\}$
    - Only one trial (repeated trials give a Binomial Distribution)
- $P(X=1) = p$, and its complement $P(X=0) = 1-p$
- PMF is $f(k; p) = p^k (1-p)^{(1-k)}$ for $k \in \{0, 1\}$
- Mean (expected value) is $E[X] = p$
- Variance is $Var[X] = pq = p(1-p)$
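
To see where the mean and variance come from, the PMF can be written out directly and plugged into the definitions of $E[X]$ and $Var[X]$. This is a rough sketch without `scipy`; the function name `bernoulli_pmf` and the choice $p = 0.75$ are assumptions made for illustration.

```python
def bernoulli_pmf(k, p):
    """P(X=k) for a single weighted coin flip; zero outside {0, 1}."""
    if k not in (0, 1):
        return 0.0
    return p**k * (1 - p) ** (1 - k)

p = 0.75
pmf = {k: bernoulli_pmf(k, p) for k in (0, 1)}

# Apply the definitions of expected value and variance
bern_mean = sum(k * prob for k, prob in pmf.items())
bern_var = sum(k**2 * prob for k, prob in pmf.items()) - bern_mean**2

print(f"P(X=1) = {pmf[1]}  P(X=0) = {pmf[0]}")
print(f"Mean: {bern_mean}  Variance: {bern_var}")
```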

In [20]:
rv = stats.bernoulli(p=0.75)
mean, var = rv.stats("mv")
print(f"(p=0.75)  Mean: {mean}  Variance: {var}")

(p=0.75)  Mean: 0.75  Variance: 0.1875


In [15]:
prob_1 = rv.pmf(1)
prob_0 = rv.pmf(0)
print(f"P(X==1) = {prob_1}  ;  P(X==0) = {prob_0}")

P(X==1) = 0.75  ;  P(X==0) = 0.25


## Binomial Distribution

- Repeated coin flips
- Parameterized by $p$, the weight of the coin, and $n$, the number of flips
    - $n \in \mathbb{N}$
    - $p \in [0, 1]$
- PMF is $P(X=k ; n,p) = \binom{n}{k}p^k(1-p)^{n-k}$
    - Factor 2 is the probability of $k$ successes
    - Factor 3 is the probability of $(n-k)$ failures
    - Factor 1 counts the number of ways those can be distributed
- Mean is $E[X] = np$
- Variance is $Var[X] = np(1-p)$ (i.e. $n$ times the variance of a single Bernoulli trial)
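
The three factors of the PMF can be spelled out in code. This is a minimal sketch using only the standard library; `binomial_pmf` is a hypothetical helper name, and $n = 100$, $p = 0.5$ are example values. Summing over every $k$ should recover a total probability of 1, a mean of $np$, and a variance of $np(1-p)$, up to floating-point error.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X=k) for n flips of a coin with weight p."""
    ways = comb(n, k)               # number of arrangements of k successes
    successes = p**k                # probability of k successes
    failures = (1 - p) ** (n - k)   # probability of (n-k) failures
    return ways * successes * failures

n, p = 100, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)
binom_mean = sum(k * prob for k, prob in enumerate(pmf))
binom_var = sum(k**2 * prob for k, prob in enumerate(pmf)) - binom_mean**2

print(f"Total: {total:.6f}  Mean: {binom_mean:.6f}  Variance: {binom_var:.6f}")
```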

In [30]:
rv = stats.binom(n=100, p=0.5)
mean, var = rv.stats("mv")
print(f"(n=100,  p=0.5)   Mean: {mean}  Variance: {var}    ")

rv = stats.binom(n=1000, p=0.5)
mean, var = rv.stats("mv")
print(f"(n=1000, p=0.5)   Mean: {mean}  Variance: {var}  # Increase n by 10x ==> mean, var up by 10x")

rv = stats.binom(n=100, p=0.75)
mean, var = rv.stats("mv")
print(f"(n=100,  p=0.75)  Mean: {mean}  Variance: {var}   # p higher than 0.5 ==> mean goes up, variance goes down")

(n=100,  p=0.5)   Mean: 50.0  Variance: 25.0    
(n=1000, p=0.5)   Mean: 500.0  Variance: 250.0  # Increase n by 10x ==> mean, var up by 10x
(n=100,  p=0.75)  Mean: 75.0  Variance: 18.75   # p higher than 0.5 ==> mean goes up, variance goes down


The probability that a motorcycle will change lanes is 80%.  Suppose a random sample of 16 motorcycles is observed.  Find the probability that at least one motorcycle will change lanes.

Source: https://www.youtube.com/watch?v=ftXp6t2znlY

Strategy one: Analytic

The probability that at least one motorcycle changes lanes is the complement of the probability that exactly 0 change lanes.  The problem can thus be simplified by finding $1 - P(X=0)$.

$n = 16$

$p = 0.8$

$k = 0$

$P(X=0) = \binom{16}{0}(0.8)^{0}(0.2)^{16} = 1 \cdot 1 \cdot (0.2)^{16} \approx 6.5536 \times 10^{-12}$

$P(X>0) = 1 - 6.5536 \times 10^{-12} \approx 0.9999999999934$
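
The hand calculation can be reproduced with plain Python arithmetic. The sketch below assumes the same $n = 16$ and $p = 0.8$; the variable names are illustrative. It also cross-checks the complement trick by summing $P(X=k)$ over $k = 1, \dots, 16$ directly.

```python
from math import comb

n, p = 16, 0.8

# P(X=0): no motorcycle changes lanes
p_none = comb(n, 0) * p**0 * (1 - p) ** n

# Complement: at least one motorcycle changes lanes
p_at_least_one = 1 - p_none

# Cross-check by adding up P(X=k) for every k >= 1
p_summed = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(1, n + 1))

print(f"P(X=0)  = {p_none:.4e}")
print(f"P(X>=1) = {p_at_least_one:.13f}")
```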

Strategy two: Numeric

In [38]:
rv = stats.binom(n=16, p=0.80)
prob_0 = rv.cdf(0)
print(f"The probability of at least one Motorcycle changing lanes is {1-prob_0}")

The probability of at least one Motorcycle changing lanes is 0.9999999999934464


## Discrete Uniform Distribution

- One throw of a fair die
- All events in the space occur with equal probability
- Often parameterized by just $n \in \mathbb{N}$, the number of events, with outcomes $\{1, \dots, n\}$; more generally by integer endpoints $a \le b$, with $n = b - a + 1$
- PMF is $P(X=x; n) = \frac{1}{n}$
- Mean is $E[X] = \frac{n + 1}{2}$ or, in terms of the endpoints, $\frac{a + b}{2}$
- Variance is $Var[X] = \frac{n^2-1}{12}$
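
The closed-form mean and variance can be compared against a brute-force pass over the outcomes. This is a sketch using exact fractions; the helper names and the endpoints `a` and `b` (smallest and largest outcome, both included) are assumptions for illustration. Shifting the range changes the mean but not the variance, since the variance depends only on the number of outcomes.

```python
from fractions import Fraction

def uniform_mean(a, b):
    """Closed form (a + b) / 2 for outcomes a, a+1, ..., b."""
    return Fraction(a + b, 2)

def uniform_variance(a, b):
    """Closed form (n^2 - 1) / 12 with n = b - a + 1 outcomes."""
    n = b - a + 1
    return Fraction(n**2 - 1, 12)

def brute_force(a, b):
    """Mean and variance straight from the definitions."""
    outcomes = range(a, b + 1)
    n = len(outcomes)
    mean = Fraction(sum(outcomes), n)
    var = Fraction(sum(x**2 for x in outcomes), n) - mean**2
    return mean, var

die_mean, die_var = brute_force(1, 6)              # fair six-sided die
shifted_mean, shifted_var = brute_force(10, 15)    # same n, shifted range

print(f"Die:     mean {die_mean}, variance {die_var}")
print(f"Shifted: mean {shifted_mean}, variance {shifted_var}")
```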

In [46]:
# stats.randint excludes `high`, so a six-sided die is low=1, high=7
rv = stats.randint(low=1, high=7)
mean, var = rv.stats("mv")
print(f"(n=6)   Mean: {mean}   Variance: {var:.4f}")

# Sixty equally likely outcomes, 1 through 60
rv = stats.randint(low=1, high=61)
mean, var = rv.stats("mv")
print(f"(n=60)  Mean: {mean}  Variance: {var:.4f}  # Increase n by 10x ==> mean up by ~10x, variance up by ~100x")

(n=6)   Mean: 3.5   Variance: 2.9167
(n=60)  Mean: 30.5  Variance: 299.9167  # Increase n by 10x ==> mean up by ~10x, variance up by ~100x


## Multinomial Distribution

- Repeated throws of a die
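- With $k$ possible outcomes (e.g. $k = 6$ for a die), $n$ trials, and outcome probabilities $p_1, \dots, p_k$, the PMF is $P(x_1, x_2, \dots, x_k; n, p_1, p_2, \dots, p_k) = M p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$
    - $M$ is the multinomial coefficient $\frac{n!}{x_1! x_2! \cdots x_k!}$
    - $x_i$ is the number of times outcome $i$ occurs, so $\sum_{i=1}^{k} x_i = n$
- Mean of each count is $E[X_i] = np_i$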
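
The multinomial PMF can be evaluated by hand for a small case. This is a minimal standard-library sketch; `multinomial_pmf` is a hypothetical helper, and the scenario (six throws of a fair die, each face appearing exactly once) is a made-up example.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(X_1=x_1, ..., X_k=x_k) for n = sum(counts) trials."""
    n = sum(counts)
    # Multinomial coefficient n! / (x_1! * x_2! * ... * x_k!)
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)
    return coefficient * prod(p**x for p, x in zip(probs, counts))

# Six throws of a fair die, every face showing up exactly once
counts = [1, 1, 1, 1, 1, 1]
probs = [1 / 6] * 6
prob_each_once = multinomial_pmf(counts, probs)

# Expected count of each face: n * p_i
expected_counts = [sum(counts) * p for p in probs]

print(f"P(each face once) = {prob_each_once:.6f}")
print(f"Expected counts   = {expected_counts}")
```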

Sample problems: https://stattrek.com/probability-distributions/multinomial.aspx