In [1]:
import numpy as np
import pandas as pd
from scipy.stats import binom, bernoulli, geom, poisson
from IPython.display import display, Latex

# Reference

> [Unit: Random Variables](https://www.khanacademy.org/math/statistics-probability/random-variables-stats-library)

---

# Binomial random variables

> [Binomial Distribution](https://en.wikipedia.org/wiki/Binomial_distribution)

- **Each trial can be classified as a success or a failure.**
- **Number of trials is fixed.**
- **Independent**:
    - 10% rule: When sampling without replacement from a finite population, trials can be treated as independent if the sample size is $\leq 10\%$ of the population size.
    - Finite population: When sampling with replacement, trials are independent.
    - Infinite population: Trials are independent.
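
The 10% rule can be illustrated numerically. The following is a minimal sketch, with hypothetical numbers chosen for illustration (a population of 800 with 10% successes, and a sample of 4), that compares the exact without-replacement probability from `scipy.stats.hypergeom` with the binomial approximation:

```python
from scipy.stats import binom, hypergeom

# Hypothetical numbers for illustration: a population of 800 containing
# 80 "successes" (10%), and a sample of 4 drawn without replacement.
population, successes, sample = 800, 80, 4
k = 2

# Exact probability without replacement (hypergeometric).
exact = hypergeom.pmf(k, population, successes, sample)
# Binomial approximation that treats the draws as independent.
approx = binom.pmf(k, sample, successes / population)

print(f"hypergeometric: {exact:.4f}")
print(f"binomial:       {approx:.4f}")
```

Because the sample is far below 10% of the population, the two values should agree closely.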

---

## Identifying binomial variables

> [Binomial probability (basic)](https://www.khanacademy.org/math/statistics-probability/random-variables-stats-library/binomial-random-variables/a/binomial-probability-basic)

---

### Example 1

Based on previous data, an electronics manufacturer knows that $2\%$ of its computer processors are defective. Suppose the manufacturer randomly selects these processors until one is found with a defect. Let $D$ represent the number of processors it takes to find the first one that is defective. Assume that defective processors are independent.

**Is $D$ a binomial variable? Why or why not?**

There is no fixed number of trials, so $D$ is not a binomial variable.

---

### Example 2

Mia takes a random sample of $10$ of her coworkers and asks them each how many pets they have. Assume that their results are independent, and let $X$ represent the average number of pets in the sample.

**Is $X$ a binomial variable? Why or why not?**

Each trial isn't being classified as a success or failure, so $X$ is not a binomial variable.

Explain:

True. Each person is reporting a number that isn't being categorized as a success or a failure.

---

## Binomial probability formula

> [Probability mass function](https://en.wikipedia.org/wiki/Probability_mass_function)<br>
> In [probability](https://en.wikipedia.org/wiki/Probability_theory "Probability theory") and [statistics](https://en.wikipedia.org/wiki/Statistics "Statistics"), a **probability mass function** is a function that gives the probability that a [discrete random variable](https://en.wikipedia.org/wiki/Discrete_random_variable "Discrete random variable") is exactly equal to some value.[[1]](https://en.wikipedia.org/wiki/Probability_mass_function#cite_note-1) Sometimes it is also known as the discrete density function. The probability mass function is often the primary means of defining a [discrete probability distribution](https://en.wikipedia.org/wiki/Discrete_probability_distribution "Discrete probability distribution"), and such functions exist for either [scalar](https://en.wikipedia.org/wiki/Scalar_variable "Scalar variable") or [multivariate random variables](https://en.wikipedia.org/wiki/Multivariate_random_variable "Multivariate random variable") whose [domain](https://en.wikipedia.org/wiki/Domain_of_a_function "Domain of a function") is discrete.<br>
> A probability mass function differs from a [probability density function](https://en.wikipedia.org/wiki/Probability_density_function) (PDF) in that the latter is associated with continuous rather than discrete random variables. A PDF must be [integrated](https://en.wikipedia.org/wiki/Integration_(mathematics) "Integration (mathematics)") over an interval to yield a probability.[[2]](https://en.wikipedia.org/wiki/Probability_mass_function#cite_note-:0-2)<br>
> The value of the random variable having the largest probability mass is called the [mode](https://en.wikipedia.org/wiki/Mode_(statistics) "Mode (statistics)").

> [Cumulative distribution function](https://en.wikipedia.org/wiki/Cumulative_distribution_function)<br>
> In [probability theory](https://en.wikipedia.org/wiki/Probability_theory "Probability theory") and [statistics](https://en.wikipedia.org/wiki/Statistics "Statistics"), the **cumulative distribution function** (**CDF**) of a real-valued [random variable](https://en.wikipedia.org/wiki/Random_variable "Random variable") ${\displaystyle X}$, or just **distribution function** of ${\displaystyle X}$, evaluated at ${\displaystyle x}$, is the probability that ${\displaystyle X}$ will take a value less than or equal to ${\displaystyle x}$.[[1]](https://en.wikipedia.org/wiki/Cumulative_distribution_function#cite_note-1)<br>
> In the case of a scalar [continuous distribution](https://en.wikipedia.org/wiki/Continuous_distribution "Continuous distribution"), it gives the area under the [probability density function](https://en.wikipedia.org/wiki/Probability_density_function "Probability density function") from minus infinity to ${\displaystyle x}$. Cumulative distribution functions are also used to specify the distribution of [multivariate random variables](https://en.wikipedia.org/wiki/Multivariate_random_variable "Multivariate random variable").

- PMF (Probability mass function): $\displaystyle f(k; n, p) = \text{Pr}(k; n, p) = \text{Pr}(X = k) = {n \choose k}p^{k}q^{n-k}$, where $q = 1 - p$
- CDF (Cumulative distribution function): $\displaystyle F(k; n, p) = \text{Pr}(X \leq k) = \sum^{\lfloor k \rfloor}_{i=0} {n \choose i}p^{i}q^{n-i}$
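
As a sanity check, the two formulas above can be written out by hand and compared with `scipy.stats.binom`. This is a minimal sketch, assuming $q = 1 - p$; the helper names `binom_pmf` and `binom_cdf` and the values of `n` and `p` are illustrative, not part of the original notes:

```python
from math import comb
from scipy.stats import binom

def binom_pmf(k, n, p):
    """P(X = k) computed directly from the formula, with q = 1 - p."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k) as the sum of PMF terms from i = 0 to floor(k), for k >= 0."""
    return sum(binom_pmf(i, n, p) for i in range(int(k) + 1))

n, p = 5, 0.6  # illustrative values
for k in range(n + 1):
    # The hand-written formulas should match scipy to floating-point precision.
    assert abs(binom_pmf(k, n, p) - binom.pmf(k, n, p)) < 1e-12
    assert abs(binom_cdf(k, n, p) - binom.cdf(k, n, p)) < 1e-12
```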

---

### Example 1

Heather has a weighted coin that has a $60\%$ chance of landing on heads each time it is flipped. She is going to flip the coin $5$ times.

**Which of the following would find the probability of Heather getting exactly $3$ heads in $5$ flips of her weighted coin?**

$\displaystyle P(H = 3) = {5 \choose 3}(0.6)^{3}(0.4)^{2}$
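
The expression can also be evaluated numerically. A minimal sketch using only the standard library, with `scipy.stats.binom` as a cross-check:

```python
from math import comb
from scipy.stats import binom

# Evaluate C(5, 3) * 0.6^3 * 0.4^2 term by term.
by_hand = comb(5, 3) * 0.6**3 * 0.4**2
# Cross-check against scipy's binomial PMF.
via_scipy = binom.pmf(3, 5, 0.6)

print(round(by_hand, 4), round(via_scipy, 4))
```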

---

## Calculating binomial probability

---

### Example 1: PMF

A small college has $800$ students, $10\%$ of whom are left-handed. Suppose they take an SRS (Simple Random Sample) of $4$ students. Let $X=$ the number of left-handed students in the sample.

**What is the probability that exactly $2$ of the $4$ students are left-handed?**  
_You may round your answer to the nearest hundredth._

In [2]:
k, n, p = 2, 4, 0.1
precision = 2
prob = binom.pmf(k, n, p)
prob = round(prob, precision)
display(Latex(f'$P(X=2) = {prob}$'))

<IPython.core.display.Latex object>

---

### Example 2: CDF for left tail

Aja's favorite cereal is running a promotion that says 1-in-4 boxes of the cereal contain a prize. Suppose that Aja is going to buy $5$ boxes of this cereal, and let $X$ represent the number of prizes she wins in these boxes. Assume that these boxes represent a random sample, and assume that prizes are independent between boxes.

**What is the probability that she wins at most $1$ prize in the $5$ boxes?**  
_You may round your answer to the nearest hundredth._

In [3]:
k, n, p = 1, 5, 1/4
precision = 2
prob = binom.cdf(k, n, p)
prob = round(prob, precision)
display(Latex(rf'$P(X \leq 1) = {prob}$'))  # raw f-string keeps the LaTeX backslash intact

<IPython.core.display.Latex object>

---

### Example 3: CDF for left tail

Ira ran out of time while taking a multiple-choice test and plans to guess on the last $6$ questions. Each question has $4$ possible choices, one of which is correct. Let $X=$ the number of answers Ira correctly guesses in the last $6$ questions.

**What is the probability that he answers fewer than $2$ questions correctly in the last $6$ questions?**  
_You may round your answer to the nearest hundredth._

In [4]:
k, n, p = 2-1, 6, 1/4 # "fewer than 2" means X <= 1, so k is 2 - 1
precision = 2
prob = binom.cdf(k, n, p)
prob = round(prob, precision)
display(Latex(f'$P(X < 2) = {prob}$'))

<IPython.core.display.Latex object>

---

### Example 4: CDF for right tail

Layla has a coin that has a $60\%$ chance of showing heads each time it is flipped. She is going to flip the coin $5$ times. Let $X$ represent the number of heads she gets.

**What is the probability that she gets more than $3$ heads?**  
_You may round your answer to the nearest hundredth._

In [5]:
k, n, p = 3, 5, 0.6
precision = 2
prob = 1 - binom.cdf(k, n, p)
prob = round(prob, precision)
display(Latex(f'$P(X > 3) = {prob}$'))

<IPython.core.display.Latex object>

---

### Example 5: CDF for right tail

Marta makes $90\%$ of the free throws she attempts. She is going to shoot $3$ free throws. Assume that the results of free throws are independent from each other. Let $X$ represent the number of free throws she makes.

**Find the probability that Marta makes at least $2$ of the $3$ free throws.**  
_You may round your answer to the nearest hundredth._

In [6]:
k, n, p = 2-1, 3, 0.9 # "at least 2" means X > 1, so k is 2 - 1
precision = 2
prob = 1 - binom.cdf(k, n, p)
prob = round(prob, precision)
display(Latex(rf'$P(X \geq 2) = {prob}$'))

<IPython.core.display.Latex object>
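
The right-tail examples compute `1 - binom.cdf(k, n, p)`. As a sketch of two equivalent routes, `scipy.stats.binom` also exposes the survival function `sf`, which returns $P(X > k)$ directly, and the same tail can be obtained by summing PMF terms. The values below are taken from Example 5:

```python
from scipy.stats import binom

n, p = 3, 0.9  # values from Example 5

# "At least 2" is the same event as "more than 1".
via_cdf = 1 - binom.cdf(1, n, p)
via_sf = binom.sf(1, n, p)
# Summing the PMF over k = 2 and k = 3 gives the same tail directly.
via_pmf = binom.pmf(2, n, p) + binom.pmf(3, n, p)

print(round(via_cdf, 3), round(via_sf, 3), round(via_pmf, 3))
```

All three values should agree up to floating-point error.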

---

# Binomial mean and standard deviation formulas

- Mean: $E[X] = np$
- Variance: $Var[X] = npq$, where $q = 1 - p$
- Standard deviation: $\sigma_X = \sqrt{npq}$
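
A minimal sketch that checks these formulas against `scipy.stats.binom`, taking the standard deviation as the square root of the variance. The values of `n` and `p` are illustrative:

```python
from math import sqrt
from scipy.stats import binom

n, p = 100, 1/4  # illustrative values
q = 1 - p

mean = n * p            # E[X] = np
var = n * p * q         # Var[X] = npq
std = sqrt(var)         # standard deviation is the square root of the variance

# The closed forms should match scipy's results.
assert abs(mean - binom.mean(n, p)) < 1e-12
assert abs(var - binom.var(n, p)) < 1e-12
assert abs(std - binom.std(n, p)) < 1e-12

print(mean, round(std, 1))
```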

---

## Mean and standard deviation formulas of a binomial variable

---

### Example 1

A large fast-food chain runs a promotion where 1-in-4 boxes of french fries include a coupon for a free box of french fries. Suppose that some location sells $100$ of these boxes of fries per day. Let $X=$ the number of coupons won per day.

**Find the mean and standard deviation of $X$.**  
_You may round your answers to the nearest tenth._

In [7]:
n, p = 100, 1/4
precision = 1
# Raw f-strings keep the LaTeX backslashes intact
display(Latex(rf'$\mu_X = {binom.mean(n, p)}$'))
display(Latex(rf'$\sigma_X = {round(binom.std(n, p), precision)}$'))

<IPython.core.display.Latex object>

<IPython.core.display.Latex object>

---

### Example 2

A college has over $5{,}000$ students, $10\%$ of whom are left-handed. A new lecture hall is being planned that will seat $300$ students, and college officials want to be sure there are enough left-handed desks. Suppose we randomly select groups of $300$ students from this college.

**What are the mean and standard deviation of the number of left-handed students in each group of $300$?**  
_You may round your answers to the nearest tenth._

In [8]:
n, p = 300, 0.1
precision = 1
# Raw f-strings keep the LaTeX backslashes intact
display(Latex(rf'$\mu_X = {binom.mean(n, p)}$'))
display(Latex(rf'$\sigma_X = {round(binom.std(n, p), precision)}$'))

<IPython.core.display.Latex object>

<IPython.core.display.Latex object>

---

## [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution)

In [probability theory](https://en.wikipedia.org/wiki/Probability_theory "Probability theory") and [statistics](https://en.wikipedia.org/wiki/Statistics "Statistics"), the **Bernoulli distribution**, named after Swiss mathematician [Jacob Bernoulli](https://en.wikipedia.org/wiki/Jacob_Bernoulli "Jacob Bernoulli"),[[1]](https://en.wikipedia.org/wiki/Bernoulli_distribution#cite_note-1) is the [discrete probability distribution](https://en.wikipedia.org/wiki/Discrete_probability_distribution "Discrete probability distribution") of a [random variable](https://en.wikipedia.org/wiki/Random_variable "Random variable") which takes the value 1 with probability ${\displaystyle p}$ and the value 0 with probability ${\displaystyle q=1-p}$.

The Bernoulli distribution is a special case of the [binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution "Binomial distribution") where a single trial is conducted (so $n$ would be $1$ for such a binomial distribution). It is also a special case of the **two-point distribution**, for which the possible outcomes need not be $0$ and $1$.

- PMF: $\displaystyle f(k;p) = \begin{cases} p & \text{if } k = 1, \\ q = 1 - p & \text{if } k = 0,\end{cases}$ or equivalently $\displaystyle f(k;p) = p^k(1 - p)^{1-k} \text{ for } k \in \{0, 1\}$
- Mean: $\displaystyle E[X] = p$
- Variance: $\displaystyle Var[X] = pq$
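
The compact form of the PMF and the special-case relationship with the binomial distribution can be checked numerically. A minimal sketch, where `bernoulli_pmf` is a hypothetical helper and `p = 0.6` is an illustrative value:

```python
from scipy.stats import bernoulli, binom

p = 0.6  # illustrative success probability

def bernoulli_pmf(k, p):
    """Compact form p^k (1 - p)^(1 - k), valid for k in {0, 1}."""
    return p**k * (1 - p)**(1 - k)

for k in (0, 1):
    assert abs(bernoulli_pmf(k, p) - bernoulli.pmf(k, p)) < 1e-12
    # A Bernoulli(p) variable coincides with a Binomial(n=1, p) variable.
    assert abs(bernoulli.pmf(k, p) - binom.pmf(k, 1, p)) < 1e-12

mean, var = bernoulli.mean(p), bernoulli.var(p)
print(mean, round(var, 2))
```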

---

### Example 1

Assume unfavorable rating is $40\%$ and favorable rating is $60\%$ for the current president.

Let's say favorable is $1$ and unfavorable is $0$.

In [9]:
p = 0.6
precision = 2
# Raw f-strings keep the LaTeX backslashes intact
display(Latex(rf'$\mu_X = {bernoulli.mean(p)}$'))
display(Latex(rf'$\sigma_X = {round(bernoulli.std(p), precision)}$'))

<IPython.core.display.Latex object>

<IPython.core.display.Latex object>

---

# Geometric random variables

> [Geometric Distribution](https://en.wikipedia.org/wiki/Geometric_distribution)

In [probability theory](https://en.wikipedia.org/wiki/Probability_theory "Probability theory") and [statistics](https://en.wikipedia.org/wiki/Statistics "Statistics"), the **geometric distribution** is either one of two [discrete probability distributions](https://en.wikipedia.org/wiki/Discrete_probability_distribution "Discrete probability distribution"):

-   The probability distribution of the number _X_ of [Bernoulli trials](https://en.wikipedia.org/wiki/Bernoulli_trial "Bernoulli trial") needed to get one success, supported on the set ${\displaystyle \{1,2,3,\ldots \}}$;
-   The probability distribution of the number _Y_ = _X_ − 1 of failures before the first success, supported on the set ${\displaystyle \{0,1,2,\ldots \}}$.

Which of these is called "the" geometric distribution is a matter of convention and convenience.

These two different geometric distributions should not be confused with each other. Often, the name _shifted_ geometric distribution is adopted for the former one (distribution of the number _X_); however, to avoid ambiguity, it is considered wise to indicate which is intended, by mentioning the support explicitly.

The geometric distribution gives the probability that the first occurrence of success requires _k_ independent trials, each with success probability _p_. If the probability of success on each trial is _p_, then the probability that the $k^{\text{th}}$ trial (out of _k_ trials) is the first success is

${\displaystyle \Pr(X=k)=(1-p)^{k-1}p}$
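
A minimal sketch that evaluates this formula by hand and compares it with `scipy.stats.geom`, which uses the first convention (support $\{1, 2, 3, \ldots\}$). The helper `geom_pmf` and the value of `p` are illustrative. The second convention (failures before the first success) can be obtained by shifting with `loc=-1`:

```python
from scipy.stats import geom

def geom_pmf(k, p):
    """P(X = k) = (1 - p)^(k - 1) * p for k = 1, 2, 3, ..."""
    return (1 - p)**(k - 1) * p

p = 0.02  # illustrative success probability
for k in range(1, 11):
    assert abs(geom_pmf(k, p) - geom.pmf(k, p)) < 1e-12

# Shifted version: Y = X - 1 counts failures before the first success,
# so P(Y = 0) equals P(X = 1) = p.
p_zero_failures = geom.pmf(0, p, loc=-1)
assert abs(p_zero_failures - p) < 1e-12
```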

---

## Binomial vs. geometric random variables

---

### Example 1

Alma wrote a phone app. She has noticed that $3\%$ of the users who download her app upgrade it the same day. Let $N$ be the number of customers who download Alma's app until one upgrades it on the same day. Assume that users' upgrade decisions are independent.

**What type of variable is $N$?**

Geometric

Explain:

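A binomial setting has a set number of trials, and the variable in question is the _number of successes_ that occur in those trials. Here the number of trials is not fixed: $N$ counts the _number of trials_ until the first success, so $N$ is a geometric variable.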

---

## Geometric probability

---

### Example 1: PMF

An online retailer determines that people click on a particular advertisement $2\%$ of the times it appears. Let $V$ be the number of times the advertisement appears to get the first person to click on it. Assume each click is independent.

**Find the probability that a person first clicks on the advertisement the $8^{\text{th}}$  time it appears.**  
_You may round your answer to the nearest hundredth._

In [10]:
k, p = 8, 0.02
precision = 2
prob = geom.pmf(k, p)
prob = round(prob, precision)
display(Latex(f'$P(V = {k}) = {prob}$'))

<IPython.core.display.Latex object>

---

### Example 2: PMF

Milani's teacher draws students names at random, calls on the student, and replaces the name so that students know they should always be prepared to respond. There are $20$ students in Milani's class. Let $X$ be the number of names it takes for the teacher to draw Milani's name.

**Find the probability that the teacher first draws Milani's name as the $7^{\text{th}}$  name.**  
_You may round your answer to the nearest hundredth._

In [11]:
k, p = 7, 1/20
precision = 2
prob = geom.pmf(k, p)
prob = round(prob, precision)
display(Latex(f'$P(X = {k}) = {prob}$'))

<IPython.core.display.Latex object>

---

### Example 3: PMF

Augustus draws tickets one at a time for a raffle. The person named on the ticket must be present to win, but $30\%$ of the $750$ raffle tickets have the names of people who are no longer present. Let $T$ be the number of tickets Augustus needs to draw to find a winner who is present.

**Find the probability that Augustus first draws the name of someone present on the $3^{\text{rd}}$ ticket.**  
_You may round your answer to the nearest hundredth._

In [12]:
k, p = 3, 1-0.3
precision = 2
prob = geom.pmf(k, p)
prob = round(prob, precision)
display(Latex(f'$P(T = {k}) = {prob}$'))

<IPython.core.display.Latex object>

---

## Cumulative geometric probability

---

### Example 1: CDF for left tail

An airline offers a survey for its passengers to complete after every flight. Each time a passenger completes the survey, there is a $2\%$ chance they will win a discounted price on their next flight. Assume that winners are selected at random, and the results of the surveys are independent.

Zaylee has numerous trips planned with this airline, and she'll always complete each survey in hopes of winning. Let $N$ be the number of surveys Zaylee completes until she wins for the first time.

**Find the probability that it takes Zaylee $3$ surveys or less to win for the first time.**  
_You may round your answer to the nearest hundredth._

In [13]:
k, p = 3, 0.02
precision = 2
prob = geom.cdf(k, p)
prob = round(prob, precision)
display(Latex(rf'$P(N \leq {k}) = {prob}$'))  # raw f-string keeps the LaTeX backslash intact

<IPython.core.display.Latex object>

---

### Example 2: CDF for left tail

Lilyana runs a cake decorating business, for which $10\%$ of her orders come over the telephone. Let $C$ be the number of cake orders Lilyana receives in a month until she first gets an order over the telephone. Assume the method of placing each cake order is independent.

**Find the probability that it takes fewer than $5$ orders for Lilyana to get her first telephone order of the month.**  
_You may round your answer to the nearest hundredth._

In [14]:
k, p = 5-1, 0.1 # notice the condition is fewer than 5
precision = 2
prob = geom.cdf(k, p)
prob = round(prob, precision)
display(Latex(rf'$P(C \leq {k}) = {prob}$'))  # raw f-string keeps the LaTeX backslash intact

<IPython.core.display.Latex object>

---

### Example 3: CDF for right tail

Jeremiah makes $25\%$ of the three-point shots he attempts. For a warm up, Jeremiah likes to shoot three-point shots until he makes one. Let $M$ be the number of shots it takes Jeremiah to make his first three-point shot. Assume that the results of each shot are independent.

**Find the probability that it takes Jeremiah more than $6$ attempts to make his first shot.**  
_You may round your answer to the nearest hundredth._

In [15]:
k, p = 6, 0.25
precision = 2
prob = round(geom.sf(k, p), precision)
display(Latex(f'$P(M > {k}) = {prob}$'))

<IPython.core.display.Latex object>

---

### Example 4: CDF for right tail

Anand knows from experience that if he does not review a new vocabulary word he has learned, he has a $70\%$ chance of forgetting it each day. Let $D$ be the number of days Anand goes without reviewing a word until he forgets it.

**Find the probability that it takes Anand $4$ or more days to forget the word.**  
_You may round your answer to the nearest hundredth._

In [16]:
k, p = 4-1, 0.7 # notice the condition is 4 or more
precision = 2
prob = round(geom.sf(k, p), precision)
display(Latex(f'$P(D > {k})={prob}$'))

<IPython.core.display.Latex object>
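
The cumulative geometric probabilities above also have a simple closed form: the first success takes more than $k$ trials exactly when the first $k$ trials all fail, so $P(X > k) = (1-p)^k$ and $P(X \leq k) = 1 - (1-p)^k$. A minimal sketch that checks this against `geom.sf` and `geom.cdf`, using the values from Example 3:

```python
from scipy.stats import geom

k, p = 6, 0.25  # values from Example 3

# More than k trials are needed exactly when the first k trials all fail.
tail = (1 - p)**k

assert abs(tail - geom.sf(k, p)) < 1e-12
assert abs((1 - tail) - geom.cdf(k, p)) < 1e-12

print(round(tail, 2))
```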

---

# Poisson random variables

> [Poisson distribution](https://en.wikipedia.org/wiki/Poisson_distribution)

In [probability theory](https://en.wikipedia.org/wiki/Probability_theory "Probability theory") and [statistics](https://en.wikipedia.org/wiki/Statistics "Statistics"), the **Poisson distribution** is a [discrete probability distribution](https://en.wikipedia.org/wiki/Discrete_probability_distribution "Discrete probability distribution") that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and [independently](https://en.wikipedia.org/wiki/Statistical_independence "Statistical independence") of the time since the last event.

---

## Examples that violate the Poisson assumptions

The number of students who arrive at the [student union](https://en.wikipedia.org/wiki/Student_center "Student center") per minute will likely not follow a Poisson distribution, because the rate is not constant (low rate during class time, high rate between class times) and the arrivals of individual students are not independent (students tend to come in groups).

The number of magnitude 5 earthquakes per year in a country may not follow a Poisson distribution if one large earthquake increases the probability of aftershocks of similar magnitude.

Examples in which at least one event is guaranteed are not Poisson distributed; but may be modeled using a [zero-truncated Poisson distribution](https://en.wikipedia.org/wiki/Zero-truncated_Poisson_distribution "Zero-truncated Poisson distribution").

Count distributions in which the number of intervals with zero events is higher than predicted by a Poisson model may be modeled using a [zero-inflated model](https://en.wikipedia.org/wiki/Zero-inflated_model "Zero-inflated model").

---

## Poisson probability

PMF: $\displaystyle f(k;\lambda) = \text{Pr}(X=k) = \frac{\lambda^{k}e^{-\lambda}}{k!}$ for $k \geq 0$

where

-   k is the number of occurrences $\displaystyle k=0,1,2\dots$
-   e is [Euler's number](https://en.wikipedia.org/wiki/E_(mathematical_constant) "E (mathematical constant)") $\displaystyle e=2.71828...$
-   ! is the [factorial](https://en.wikipedia.org/wiki/Factorial "Factorial") function.

The positive [real number](https://en.wikipedia.org/wiki/Real_number "Real number") λ is equal to the [expected value](https://en.wikipedia.org/wiki/Expected_value "Expected value") of X and also to its [variance](https://en.wikipedia.org/wiki/Variance "Variance").

$\displaystyle \lambda =\operatorname {E} (X)=\operatorname {Var} (X)$
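
A minimal sketch that evaluates the PMF formula by hand and compares it with `scipy.stats.poisson`, where the rate $\lambda$ is passed as the parameter `mu`. The helper `poisson_pmf` and the value of `lam` are illustrative:

```python
from math import exp, factorial
from scipy.stats import poisson

def poisson_pmf(k, lam):
    """P(X = k) = lam^k * e^(-lam) / k! for k = 0, 1, 2, ..."""
    return lam**k * exp(-lam) / factorial(k)

lam = 1.8  # illustrative rate
for k in range(10):
    assert abs(poisson_pmf(k, lam) - poisson.pmf(k, lam)) < 1e-12

# The mean and the variance are both equal to lambda.
assert abs(poisson.mean(lam) - lam) < 1e-12
assert abs(poisson.var(lam) - lam) < 1e-12
```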

---

### Example 1

Suppose that astronomers estimate that large meteorites (above a certain size) hit the earth on average once every $100$ years ($\lambda = \text{1 event per 100 years}$), and that the number of meteorite hits follows a Poisson distribution. What is the probability of $k = 0$ meteorite hits in the next $100$ years?

In [17]:
k, mu = 0, 1
precision = 2
prob = poisson.pmf(k, mu)
display(Latex(f"$P(X = {k}) = {round(prob, precision)}$"))

<IPython.core.display.Latex object>

---

## Cumulative Poisson probability

---

### Example 1

> [Lecture 5: The Poisson distribution](https://www.stats.ox.ac.uk/~filippi/Teaching/psychology_humanscience_2015/lecture5.pdf)

Births in a hospital occur randomly at an average rate of $1.8$ births per hour. What is the probability of observing exactly $4$ births in a given hour at the hospital?

Let $X = \text{No. of births in a given hour}$

In [18]:
k, mu = 4, 1.8
precision = 2
prob = poisson.pmf(k, mu)
display(Latex(f"$P(X = {k}) = {round(prob, precision)}$"))

<IPython.core.display.Latex object>

What about the probability of observing at least $2$ births in a given hour at the hospital?

In [19]:
k, mu = 2-1, 1.8 # "at least 2" means X > 1, so k is 2 - 1
precision = 2
prob = 1 - poisson.cdf(k, mu)
display(Latex(f"$P(X > {k}) = {round(prob, precision)}$"))

<IPython.core.display.Latex object>
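
The right-tail Poisson probability can be cross-checked in two ways. This is a sketch using the births example ($\lambda = 1.8$): by hand, $P(X \geq 2) = 1 - P(X = 0) - P(X = 1) = 1 - e^{-\lambda}(1 + \lambda)$, and `poisson.sf(1, mu)` returns $P(X > 1)$ directly:

```python
from math import exp
from scipy.stats import poisson

mu = 1.8  # average births per hour, from the example above

# P(X >= 2) = 1 - P(X = 0) - P(X = 1) = 1 - e^(-mu) * (1 + mu)
by_hand = 1 - exp(-mu) * (1 + mu)
# The survival function gives P(X > 1), which is the same event.
via_sf = poisson.sf(1, mu)

assert abs(by_hand - via_sf) < 1e-12
print(round(by_hand, 2))
```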