# Probability Distribution

“Probability theory is nothing more than common sense reduced to calculation.”
― Pierre Simon Laplace, 1819

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats

### Lesson Objectives:
- Understand and recognize these distributions
- Understand the parameters needed to generate each distribution
- Given a distribution, calculate probabilities for a given value of the random variable

<hr style="border:2px solid gray">

### Probability Distributions: Mathematical functions that we can use to model real-world processes

- Random variable: the unknown quantity whose value depends on the outcome of a random process
- A probability distribution is a list of all of the possible outcomes of a random variable along with their corresponding probability values.
    - Defines the probability of every outcome in the sample space (in our experiment)
    - Sample space: the set of all possible outcomes. For a six-sided die, there are 6 outcomes.
    - Sum of all probabilities in the sample space = 1 (seems trivial, but incredibly powerful)
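
As a minimal sketch of the definition above, a fair six-sided die can be written out as a table of outcomes and probabilities. The names `outcomes` and `probabilities` are illustrative, and the die is assumed to be fair.

```python
import numpy as np

# Sample space of a six-sided die (assumed fair for this illustration)
outcomes = np.arange(1, 7)

# Every outcome is equally likely, so each gets probability 1/6
probabilities = np.full(outcomes.size, 1 / outcomes.size)

# The probabilities over the whole sample space sum to 1.
# Compare with a tolerance, since floating-point sums can be slightly off.
total = probabilities.sum()
sums_to_one = np.isclose(total, 1.0)

print(dict(zip(outcomes.tolist(), probabilities.tolist())))
print(sums_to_one)
```

Because the total is 1, the probability of any event can also be found from its complement, for example P(roll is not 3) = 1 - P(roll is 3).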

<b>Discrete Distribution Examples</b>:

- Number of customer complaints
- Number of calls received in a call-center per hour
- Number of food trucks at Travis Park in a day

<b>Continuous Distribution Examples</b>:

- Height
- Temperature
- Employee salaries

<hr style="border:2px solid black">

## Types of Distributions:
1. Uniform distribution
2. Normal distribution
3. Binomial distribution
4. Poisson distribution

#### scipy distribution object: What can we calculate from a distribution?
- value -> probability
    - pmf: probability at a particular value of the random variable (discrete distributions only)
    - pdf: probability density at a particular value of the random variable (continuous distributions only); this is a density, not a probability
    - cdf: cumulative probability that the random variable is less than or equal to a value
    - sf: probability that the random variable is greater than a value (1 - cdf)
- probability -> value
    - ppf: inverse of cdf; the value at or below which the given probability lies
    - isf: inverse of sf; the value above which the given probability lies
- rvs: generate random values
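
The sketch below shows how these methods relate to one another. It uses a standard normal distribution purely as an illustration; the variable names and the choice of `x = 1.0` are assumptions made for the example.

```python
import numpy as np
from scipy import stats

# Standard normal, chosen only as an illustrative continuous distribution
dist = stats.norm(0, 1)
x = 1.0

p_below = dist.cdf(x)  # P(X <= x)
p_above = dist.sf(x)   # P(X > x)

# cdf and sf are complements of each other
complement_ok = np.isclose(p_below + p_above, 1.0)

# ppf inverts cdf, and isf inverts sf: both recover the original value
x_from_ppf = dist.ppf(p_below)
x_from_isf = dist.isf(p_above)

# rvs draws random samples (seeded here so the draw is repeatable)
samples = dist.rvs(size=5, random_state=0)

print(p_below, p_above, complement_ok)
print(x_from_ppf, x_from_isf)
print(samples)
```

For a discrete distribution the same pattern holds, with `pmf` taking the place of `pdf`.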

<hr style="border:1.5px solid gray">

### 1. Uniform Distribution
- Examples:
    - rolling a die
    - flipping a coin
    - lucky draw


#### Working in Scipy stats module

In [None]:
# create a scipy object for the underlying distribution

# die_roll is assigned a stats object that picks a random
# integer between 1 and 6 (the upper bound, 7, is exclusive)
die_roll = stats.randint(1, 7)

In [None]:
# die_roll is just a stats object
die_roll

In [None]:
# to run simulations, we can use rvs to generate random values
# a single positional argument of 10 represents ten die rolls
die_roll.rvs(10)

In [None]:
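# the probabilities of all six outcomes sum to 1
# (compare with a tolerance, because floating-point sums can be slightly off)
np.isclose(1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6, 1)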

In [None]:
1/6

In [None]:
# the actual object
die_roll

In [None]:
# roll random values 10,000 times, see how often it hits 3
(die_roll.rvs(10_000) == 3).mean()

In [None]:
# What is the probability of rolling a 3?
# theoretically: use pmf
# we want the probability of one specific value (3),
# and this is a discrete distribution because
# the only possible values are 1, 2, 3, 4, 5, or 6
die_roll.pmf(3)

In [None]:
# let's introduce directionality
# What is the probability of rolling a 3 or less?
die_roll.cdf(3)

In [None]:
# simulate: the share of rolls below 4 (3 or less) should be close to cdf(3)
(die_roll.rvs(10_000_000) < 4).mean()

In [None]:
# ppf: the inverse of cdf
# given a probability, calculate the value of
# the random variable

# at or below what value do the lowest 50% of rolls fall?
die_roll.ppf(.5)

In [None]:
# What is the likelihood that we
# roll a value higher than 4?

# strictly "greater than" calls for the survival function (sf)
die_roll.sf(4)

In [None]:
# What is the likelihood that we
# roll a value of 4 or greater?

# sf is strictly "greater than", so P(X >= 4) = P(X > 3)
die_roll.sf(3)

In [None]:
# P(roll > 4) = P(5) + P(6)
1/6 + 1/6

In [None]:
# There is a 1/3 chance that a die
# roll will be higher than what value?
# the "greater than" direction points to the survival function,
# but we have a probability and want a value:
# use the inverse survival function (isf)
die_roll.isf(1/3)

<hr style="border:.5px solid black">

### 2. Normal Distribution
- Bell shaped
- Most observations are closer to the mean
- Common in nature 
- 2 parameters
    - mean (μ)
    - std dev (σ)
- Examples
    - Height
    - time a flight takes from point A to B
    - manufacturing

Suppose that a store's daily sales are normally distributed with a mean of 12,000 dollars and a standard deviation of 2,000 dollars.
- What is the probability density at sales of exactly 10,000 dollars on a certain day? (For a continuous distribution, the probability of any single exact value is 0, so we look at the pdf.)
- What is the probability that sales are 10,000 dollars or less on a certain day?
- What is the probability that sales are greater than 15,000 dollars on a certain day?
- How much would the daily sales have to be to be in the top 10% of all days?
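
One way to work through these questions, sketched with `stats.norm` and the mean and standard deviation given above. The variable names are illustrative. Note that for a continuous distribution `pdf` returns a density rather than a probability.

```python
from scipy import stats

# Daily sales: normal with mean 12,000 and standard deviation 2,000
sales = stats.norm(12_000, 2_000)

# Density at exactly 10,000 dollars (a density, not a probability)
density_at_10k = sales.pdf(10_000)

# P(sales <= 10,000): cumulative distribution function
p_10k_or_less = sales.cdf(10_000)

# P(sales > 15,000): survival function
p_above_15k = sales.sf(15_000)

# Sales level exceeded on only 10% of days: inverse survival function
top_10_cutoff = sales.isf(0.10)

print(density_at_10k)
print(p_10k_or_less)
print(p_above_15k)
print(top_10_cutoff)
```

Under these assumptions, sales are 10,000 dollars or less on roughly 16% of days, exceed 15,000 dollars on roughly 7% of days, and a day needs about 14,563 dollars in sales to land in the top 10%.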

<hr style="border:.5px solid black">

### 3. Binomial Distribution

<hr style="border:.5px solid black">

### 4. Poisson Distribution

<hr style="border:1.5px solid black">

#### Helpful Links:
[Probability distributions](https://en.wikipedia.org/wiki/List_of_probability_distributions) 
<br>
[More Probability distributions](https://www.kdnuggets.com/2020/02/probability-distributions-data-science.html)