# Chapter 2 - Probability

*Probability* is how strongly we believe an event will happen, often expressed as a
percentage. **Remember** that probability quantifies predictions of events yet to happen, whereas
likelihood measures the frequency of events that have already occurred.

## Joint Probabilities

A joint probability is the probability that two events both happen. Think of it as an AND operator.

Rather than generating all possible combinations and counting the ones of interest to
us, we can use multiplication as a shortcut to find the joint probability. This
is known as the `product rule`. In this form it assumes A and B are independent, meaning
one event does not affect the other:

P(A AND B) = P(A) * P(B)
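
As a quick check of the product rule, the sketch below counts combinations by brute force and compares the result with the multiplication shortcut. The coin-and-die scenario is an illustrative assumption (it is not from the original notes), and the two events are independent.

```python
from fractions import Fraction
from itertools import product

# Hypothetical events: A = coin lands heads, B = six-sided die shows 6
coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Enumerate every equally likely (coin, die) combination and count the match
outcomes = list(product(coin, die))
matches = [o for o in outcomes if o == ("H", 6)]
p_by_counting = Fraction(len(matches), len(outcomes))

# Product rule shortcut for independent events
p_by_product = Fraction(1, 2) * Fraction(1, 6)

print(p_by_counting, p_by_product)  # 1/12 1/12
```

Both approaches agree, but the shortcut avoids building the list of combinations.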

## Union Probabilities

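An OR operation on probabilities is known as a union probability. We add the two
probabilities, then subtract the joint probability so that outcomes where both events
occur are not counted twice:

P(A OR B) = P(A) + P(B) - P(A AND B)

If A and B are independent, this becomes P(A) + P(B) - (P(A) * P(B)).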
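
The union formula can be verified by brute-force counting. This is a minimal sketch using an assumed coin-and-die scenario (heads OR a six), chosen only for illustration; the events are independent, so the joint term is a simple product.

```python
from fractions import Fraction
from itertools import product

# All equally likely (coin, die) combinations
outcomes = list(product(["H", "T"], range(1, 7)))

# Count outcomes where the coin is heads OR the die shows 6
heads_or_six = [(c, d) for c, d in outcomes if c == "H" or d == 6]
p_by_counting = Fraction(len(heads_or_six), len(outcomes))

# Union rule: subtract the joint probability to avoid double counting ("H", 6)
p_heads = Fraction(1, 2)
p_six = Fraction(1, 6)
p_by_rule = p_heads + p_six - p_heads * p_six

print(p_by_counting, p_by_rule)  # 7/12 7/12
```

Without the subtraction, the result would be 8/12, because the ("H", 6) outcome would be counted once for heads and once for six.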

## Conditional Probability and Bayes’ Theorem

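P(A|B) is the conditional probability of A given that B has occurred. Bayes’ Theorem lets us flip a known P(B|A) into P(A|B):

P(A|B) = (P(B|A) * P(A)) / P(B)

Example using Bayes’ Theorem in Python: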

In [2]:
p_coffee_drinker = .65
p_cancer = .005
p_coffee_drinker_given_cancer = .85
p_cancer_given_coffee_drinker = (p_coffee_drinker_given_cancer * p_cancer) / p_coffee_drinker
# prints 0.006538461538461539
print(p_cancer_given_coffee_drinker)

0.006538461538461539
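
One way to see why the formula works is to count people instead of multiplying probabilities. The sketch below assumes a hypothetical population of 100,000 (a number picked only to give whole-person counts) and uses the same three inputs as the cell above.

```python
from fractions import Fraction

population = 100_000  # hypothetical population size, not from the notes

# Head counts implied by the three given probabilities
cancer = population * Fraction(5, 1000)           # P(cancer) = 0.005
coffee_drinkers = population * Fraction(65, 100)  # P(coffee drinker) = 0.65
cancer_and_coffee = cancer * Fraction(85, 100)    # P(coffee drinker | cancer) = 0.85

# Share of coffee drinkers who have cancer
p_cancer_given_coffee_drinker = cancer_and_coffee / coffee_drinkers

print(cancer, coffee_drinkers, cancer_and_coffee)  # 500 65000 425
print(float(p_cancer_given_coffee_drinker))
```

Of the 65,000 coffee drinkers, 425 have cancer, which is the same ratio of roughly 0.65% that Bayes’ Theorem produces.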


## Binomial Distribution

The binomial distribution measures how likely it is that exactly k successes occur in n independent trials, given a probability p of success on each trial.

### Example Using SciPy for the binomial distribution

In [3]:
from scipy.stats import binom

In [4]:
n = 10
p = .9

for k in range(n+1):
    probability = binom.pmf(k, n, p)
    print(f"k: {k}, probability: {probability}")


k: 0, probability: 9.999999999999977e-11
k: 1, probability: 8.999999999999978e-09
k: 2, probability: 3.6449999999999943e-07
k: 3, probability: 8.747999999999991e-06
k: 4, probability: 0.00013778099999999974
k: 5, probability: 0.0014880347999999982
k: 6, probability: 0.011160260999999996
k: 7, probability: 0.05739562799999997
k: 8, probability: 0.1937102444999998
k: 9, probability: 0.38742048899999976
k: 10, probability: 0.3486784401000001
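
The values above can be reproduced without SciPy from the binomial formula, C(n, k) * p^k * (1 - p)^(n - k). This is a minimal sketch; `binomial_pmf` is a hypothetical helper name, not part of any library.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    # comb(n, k) counts the ways to arrange k successes among n trials
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 10
p = 0.9
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(round(probabilities[8], 4))    # 0.1937
print(round(sum(probabilities), 6))  # 1.0
```

Because the outcomes k = 0 through k = n cover every possibility, the probabilities sum to 1.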


## Beta Distribution

The beta distribution is a type of probability distribution, which means the area under
the entire curve is 1.0, or 100%. To find a probability, we need to find the area within
a range. For example, to evaluate the probability that 8 successes out of 10 trials
reflect an underlying success rate of 90% or higher, we need to find the area between 0.9 and 1.0.

We can use SciPy to implement the beta distribution. Every continuous probability
distribution has a cumulative distribution function (CDF), which calculates the area
up to a given x-value.

In [5]:
from scipy.stats import beta

In [6]:
# Calculating the area up to 90% (0.0 to 0.90)
a = 8  # number of successes
b = 2  # number of failures

p = beta.cdf(.9, a, b)

In [7]:
print(p)

0.7748409780000002


So according to our calculation, there is a 77.48% chance the underlying probability
of success is 90% or less.
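
To make the "area under the curve" idea concrete, the sketch below approximates the same area by summing thin rectangles under the beta density (a midpoint rule). This is an illustration only, not how SciPy computes the CDF; `beta_pdf` and `beta_cdf_approx` are hypothetical helper names.

```python
from math import gamma

def beta_pdf(x, a, b):
    """Height of the beta curve at x."""
    norm = gamma(a + b) / (gamma(a) * gamma(b))  # scales total area to 1.0
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)

def beta_cdf_approx(x, a, b, steps=100_000):
    """Approximate the area from 0 to x with a midpoint Riemann sum."""
    width = x / steps
    return sum(beta_pdf((i + 0.5) * width, a, b) for i in range(steps)) * width

area = beta_cdf_approx(0.9, 8, 2)
print(round(area, 4))  # 0.7748
```

The approximation agrees with `beta.cdf(.9, 8, 2)` to several decimal places.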

How do we calculate the probability of the success rate being 90% or more? The CDF only gives the area to the left of a value, so we subtract that area from 1.0:

In [8]:
a = 8
b = 2
p = 1 - beta.cdf(.9, a, b)

In [9]:
print(p)

0.22515902199999982


This means that with 8 successes out of 10 engine tests, there is only a 22.5% chance the
underlying success rate is 90% or greater.

### A beta distribution with more trials

In [10]:
a = 30
b = 6

p = 1 - beta.cdf(.9, a, b)
print(p)

0.13163577484183697


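With 30 successes and 6 failures, our probability of meeting the 90% success rate minimum has decreased from 22.5% to 13.16%. More trials narrow the distribution around the observed rate of about 83% (30/36), which is below 90%.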

### What if I want to find the probability my underlying rate of success is between 80% and 90%?

In [11]:
a = 8
b = 2

p = beta.cdf(.9, a, b) - beta.cdf(.8, a, b)
print(p)

0.33863336199999994


The beta distribution is a fascinating tool to measure the probability of an event
occurring versus not occurring, based on a limited set of observations. It allows us to
reason about probabilities of probabilities, and we can update it as we get new data.
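
Updating the distribution amounts to adding new successes to `a` and new failures to `b`. The sketch below assumes a hypothetical second batch of 22 successes and 4 failures, which takes the earlier 8/2 example to the 30/6 example. To stay free of dependencies it swaps `beta.cdf` for a binomial-sum identity that is valid only when `a` and `b` are whole numbers; `beta_cdf_int` is a hypothetical helper name.

```python
from math import comb

def beta_cdf_int(x, a, b):
    """Beta CDF for whole-number a and b, via a sum of binomial terms."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))

# Starting point: 8 successes, 2 failures
a, b = 8, 2
before = 1 - beta_cdf_int(0.9, a, b)

# Hypothetical new batch of test results
new_successes, new_failures = 22, 4
a += new_successes
b += new_failures
after = 1 - beta_cdf_int(0.9, a, b)

print(round(before, 4), round(after, 4))  # 0.2252 0.1316
```

For non-integer parameters, `scipy.stats.beta.cdf` remains the appropriate tool.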

## Exercises

1. There is a 30% chance of rain today, and a 40% chance your umbrella order will arrive on time. You are eager to walk in the rain today and cannot do so without either! What is the probability it will rain AND your umbrella will arrive?

2. There is a 30% chance of rain today, and a 40% chance your umbrella order will arrive on time. You will be able to run errands only if it does not rain or your umbrella arrives. What is the probability it will not rain OR your umbrella arrives?

3. There is a 30% chance of rain today, and a 40% chance your umbrella order will arrive on time. However, you found out if it rains there is only a 20% chance your umbrella will arrive on time. What is the probability it will rain AND your umbrella will arrive on time?

4. You have 137 passengers booked on a flight from Las Vegas to Dallas. However, it is Las Vegas on a Sunday morning and you estimate each passenger is 40% likely to not show up. You are trying to figure out how many seats to overbook so the plane does not fly empty. How likely is it at least 50 passengers will not show up?

5. You flipped a coin 19 times and got heads 15 times and tails 4 times. Do you think this coin has any good probability of being fair? Why or why not?

In [17]:
# 1.
chance_of_rain = .3
chance_umbrella_arrives_on_time = .4

chance_both = chance_of_rain * chance_umbrella_arrives_on_time
print(chance_both)  # joint probability P(A and B) = P(A) * P(B)

# 2.
chance_of_rain = .3
chance_not_rain = 1 - chance_of_rain
chance_umbrella_arrives_on_time = .4

chance_not_rain_or_umbrella = chance_not_rain + chance_umbrella_arrives_on_time - (chance_not_rain * chance_umbrella_arrives_on_time)
print(round(chance_not_rain_or_umbrella, 2))  # union probability P(A or B) = P(A) + P(B) - P(A and B)

# 3.
chance_of_rain = .3 # P(A)
chance_umbrella_arrives_on_time = .4 # P(B)
chance_umbrella_arrives_on_time_given_rain = .2 # P(B|A)
chance_rain_and_umbrella_on_time = chance_of_rain * chance_umbrella_arrives_on_time_given_rain # P(A and B)

print(chance_rain_and_umbrella_on_time)  # joint conditional probability P(A and B) = P(A) * P(B|A)

# 4.
num_passengers = 137
prob_no_show = .4
# successes = no-shows, trials = passengers; sum P(k) for k = 50 through 137 inclusive
prob_at_least_fifty_no_show = sum(binom.pmf(k, num_passengers, prob_no_show) for k in range(50, num_passengers + 1))
print(prob_at_least_fifty_no_show) # Binomial Distribution 

# 5.
coin_flips = 19
num_heads = 15
num_tails = 4

# Area up to 0.5: probability the underlying heads rate is 50% or less
p = beta.cdf(.5, num_heads, num_tails)
print(f"I do not think this coin has a good probability ({p}) of being fair (just like master duel >:c)")


0.12
0.82
0.06
0.8220955881474251
I do not think this coin has a good probability (0.0037689208984375) of being fair (just like master duel >:c)


## Extra Credit

There is a 25% chance of snow today, and a 35% chance your package will arrive on time. However, if it snows, there is only a 15% chance your package will arrive on time.

What is the probability it will snow AND your package will arrive on time?

In [16]:
p_snow = .25 # P(A)
p_package_on_time = .35 # P(B)
p_package_on_time_given_snow = .15 # P(B | A)

# joint conditional probability P(A and B) = P(A) * P(B|A)
p_snow_and_package_on_time = p_snow * p_package_on_time_given_snow
print(p_snow_and_package_on_time)