In [4]:
# Run the cell by pressing 'shift' + 'enter'

from scipy import stats
from functions.interact import *
from functions.utils import *

### Table of Contents:
- Discrete Probability Distribution
- Interacting with Distribution Parameters
    - Bernoulli
    - Binomial
    - Poisson
    - Geometric
- Conclusion

# Discrete Probability Distribution

In statistics, a discrete probability distribution describes a random variable that can take on only distinct, countable values (e.g., the number of heads in 20 coin flips). At their core, probability distributions are just functions, similar to the ones you've seen in calculus:

$$ f(x) = x^2 $$

These functions, however, are defined by their parameters, which in turn determine properties such as the mean and variance.

When describing a random variable in terms of its distribution, we usually specify what kind of distribution it follows along with its parameters, such as its mean and standard deviation. Take, for example, the normal distribution below (a continuous distribution, used here only to illustrate the notation):

$$ X \sim \mathcal{N}(\mu, \sigma^2) $$

Using the information above, we would specify that our variable follows a Normal distribution with mean µ and standard deviation (SD) σ.
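As a small sketch of this idea, SciPy's `stats.norm` takes the mean as `loc` and the standard deviation as `scale`. The values μ = 0 and σ = 1 below are illustrative assumptions, not values taken from this notebook.

```python
from scipy import stats

# Illustrative parameter values (assumed for this sketch)
mu, sigma = 0.0, 1.0

# "Freeze" a normal distribution with mean mu and standard deviation sigma
normal_rv = stats.norm(loc=mu, scale=sigma)

print(normal_rv.mean())  # mean of the distribution
print(normal_rv.std())   # standard deviation of the distribution
```

Once the distribution type and its parameters are fixed, quantities such as the mean and standard deviation follow from them.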

## Understanding Probability Mass Function (PMF) 

Before we dive into the distributions below, it is important to understand the mechanics behind them. When dealing with discrete probability, we use the probability mass function (PMF): a function that gives the probability that a discrete random variable is exactly equal to some distinct value.

You may be wondering how this differs from the Probability Density Function (PDF). Unlike the PMF, which deals with discrete random variables, the PDF deals with continuous random variables. To quote a response from [Stack Exchange](https://math.stackexchange.com/questions/23293/probability-density-function-vs-probability-mass-function#comment50446_23294):

*Think of the discrete distribution as having a mass at each point, where the probability of that point is how much of the total mass is there. Then the continuous case is linear density, where the mass is spread over an interval.*
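To make the "mass at each point" picture concrete, here is a minimal sketch using a fair six-sided die. The die is an illustrative assumption and does not appear elsewhere in this notebook. Each face carries the same share of the total mass, and the shares add up to 1.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face carries 1/6 of the total "mass"
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Probability that the roll is exactly 3
print(die_pmf[3])

# The masses over all possible values add up to exactly 1
total_mass = sum(die_pmf.values())
print(total_mass)
```

Using `Fraction` keeps the arithmetic exact, so the total is exactly 1 rather than a floating-point approximation.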

# Interacting with Distribution Parameters

## Bernoulli

A Bernoulli distribution is a discrete probability distribution for a single-trial experiment (e.g., one flip of a coin, where we ask about the probability of getting a head) with only two possible outcomes:
- Success (1) with probability p
- Failure (0) with probability 1 - p

The probability mass function for the Bernoulli distribution is:

$$
f(k; p) = \begin{cases}
    1 - p, & \text{if } k = 0 \\
    p, & \text{if } k = 1
\end{cases}
$$

$\text{for } k \in \{0,1\}, \quad 0 \leq p \leq 1.$

Note: The Bernoulli function only takes a single parameter p which represents the probability of success. Below, think of Bernoulli(p) as P(Success).
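As a quick check of the two cases in the PMF, the sketch below evaluates it with SciPy's `stats.bernoulli`. The value p = 0.3 is an assumption chosen only for illustration.

```python
from scipy import stats

p = 0.3  # assumed probability of success, chosen for illustration

prob_failure = stats.bernoulli.pmf(0, p)  # P(X = 0) = 1 - p
prob_success = stats.bernoulli.pmf(1, p)  # P(X = 1) = p

print(prob_failure, prob_success)
```

The two probabilities always sum to 1, since success and failure are the only possible outcomes.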

In [5]:
bernoulli() # bernoulli(p)


## Binomial

Now say your friend decides to flip a coin 20 times because they want to know how many heads they'll get. This sequence of independent trials is known as a Bernoulli process. To obtain the number of successes (the number of heads in 20 flips), we look at the Binomial distribution. The Binomial function takes two parameters:
- n - number of trials
- p - probability of success in each trial

The probability mass function for the Binomial distribution is:

$$ 
f(k, n, p) = P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}
$$

where 
$$
\binom{n}{k} = \frac{n!}{k!(n-k)!}
$$

is the binomial coefficient, which counts the number of ways to choose the positions of the k successes among the n trials. Note: k is not a parameter; it is a value that the random variable X (the number of successes observed in n trials) can take (below, it is X=x).
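The formula can be translated directly into code. The sketch below computes the probability of exactly 10 heads in 20 flips and compares it with SciPy's `stats.binom.pmf`. The fair coin (p = 0.5) and the choice k = 10 are assumptions made for illustration.

```python
from math import comb
from scipy import stats

n, p = 20, 0.5  # 20 flips of a fair coin (p = 0.5 is an assumption)
k = 10          # number of heads we ask about

# Direct translation of the formula: C(n, k) * p^k * (1 - p)^(n - k)
pmf_by_formula = comb(n, k) * p**k * (1 - p)**(n - k)

# The same quantity from SciPy
pmf_by_scipy = stats.binom.pmf(k, n, p)

print(pmf_by_formula, pmf_by_scipy)
```

Both values should agree up to floating-point rounding, which is a handy sanity check when implementing a PMF by hand.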

In [6]:
binomial() # binomial(n, p)


## Poisson

You work at a busy hospital emergency room where ambulances arrive at an average rate of 5 per hour. Two things you notice are that each ambulance's arrival has no effect on when the next one will arrive (independent) and that the intervals between arrivals are not evenly spaced or predictable (random). A Poisson distribution can model the probability of the hospital receiving exactly 7 ambulances in an hour. It can also give the likelihood of having 2 or fewer arrivals in a given timeframe, which helps staff prepare and come up with effective strategies for delegating shifts.

As suggested in the example above, the Poisson distribution gives the probability of a given number of events occurring in a fixed interval of time.

It takes only one parameter, lambda (λ), which is the average number of events occurring in an interval of time or space. In our hospital scenario, λ = 5. The probability mass function is given by:

$$
f(k; \lambda) = \Pr(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}
$$

where k is the number of occurrences, e is [Euler's Number](https://en.wikipedia.org/wiki/E_(mathematical_constant)), and k! is the factorial of k.
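The two questions from the hospital scenario (exactly 7 arrivals, and 2 or fewer arrivals, in an hour with λ = 5) can be answered with SciPy's `stats.poisson`. This is a minimal sketch rather than part of the notebook's interactive helpers. The variable names are illustrative.

```python
from scipy import stats

lam = 5  # average ambulance arrivals per hour, from the scenario above

prob_exactly_7 = stats.poisson.pmf(7, lam)  # P(X = 7)
prob_at_most_2 = stats.poisson.cdf(2, lam)  # P(X <= 2)

print(round(prob_exactly_7, 4))  # 0.1044
print(round(prob_at_most_2, 4))  # 0.1247
```

Note that "2 or fewer" is a cumulative probability, so it uses the CDF, which sums the PMF over k = 0, 1, 2.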

In [7]:
poisson()


## Geometric

Now, imagine you work as a customer service specialist and your team receives thousands of calls a day. Your manager shared the team's performance data: the probability of an agent solving a single customer's refund issue on the first try is 44%. You know that not every call can be successfully resolved because some require multiple interactions. 

In the scenario above, a geometric distribution can help you model the probability that the first success occurs on exactly the k-th call. In other words, each call (event) is a Bernoulli trial for a single customer. Some questions you might ask are:
- What is the probability that your coworker Elizabeth will resolve an issue on the 3rd attempt?
- Given p = 0.50, how many attempts would it likely take to solve a customer's issue at a 30% probability?

By using the Geometric Probability Mass Function (PMF), we can predict how often issues require multiple calls in order to reduce resolution time for certain cases.  

$$
P(X = k) = (1 - p)^{k-1} p, \quad k = 1, 2, 3, \dots
$$

where P(X=k) is the probability that the first success occurs on the k-th trial.
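As a worked example, the first question above (resolving the issue on the 3rd attempt) can be computed from the formula with p = 0.44 and k = 3. The sketch also compares the result with SciPy's `stats.geom`, which counts trials starting from k = 1.

```python
from scipy import stats

p = 0.44  # probability of resolving the issue on any single call (from the scenario)
k = 3     # first success on the 3rd attempt

# Two failures followed by one success: (1 - p)^(k - 1) * p
pmf_by_formula = (1 - p) ** (k - 1) * p

# The same quantity from SciPy
pmf_by_scipy = stats.geom.pmf(k, p)

print(round(pmf_by_formula, 4))  # 0.138
```

So there is roughly a 13.8% chance that the issue is first resolved on the third call.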

### Memorylessness Property

The probability of needing n more trials to reach a success does not depend on how many failures have already occurred, because the trials are independent and each has the same probability of success.
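In symbols, this property reads $P(X > m + n \mid X > m) = P(X > n)$. The sketch below checks it numerically, using the fact that $P(X > k) = (1 - p)^k$ (the first k trials are all failures). The values of m and n and the helper `prob_more_than` are illustrative assumptions.

```python
p = 0.44     # per-trial success probability, as in the scenario above
m, n = 4, 3  # m failures already seen, n more trials (illustrative values)

def prob_more_than(k, p):
    """P(X > k): the first k trials are all failures."""
    return (1 - p) ** k

# P(X > m + n | X > m) = P(X > m + n) / P(X > m)
conditional = prob_more_than(m + n, p) / prob_more_than(m, p)
unconditional = prob_more_than(n, p)

print(conditional, unconditional)
```

The two printed values agree up to floating-point rounding, whatever m is: past failures do not change the outlook.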

And lastly, each call attempt (trial) is for a single customer. If we were instead tracking multiple different customers calling about the same issue, each call would be a new and independent event, and the Geometric distribution would not apply.

The Geometric Distribution has one parameter p, which is the probability of success in a single trial.

In [8]:
geometric()


# Conclusion

In this notebook, you learned about various scenarios in which we can employ the Bernoulli, Binomial, Poisson, or Geometric distribution. There are several others you can use depending on what you are trying to model or predict. The probability mass function gives the probability that a discrete random variable is exactly equal to some value (e.g., using Poisson to check the probability that a restaurant gets 12 customers in the next hour). You also interacted with the parameters of each function to see how the distribution behaves.

Congratulations on making it to the end! 