# Bayes' Theorem
DA Probability & Statistics • Lesson 3

## Topics to Cover
- Motivating question - Together
- Bayes intuition for the 2 event case - NJ
- Updating priors - Shreyas
    - Loop through various priors to demonstrate how the posterior probabilities will update
    - What is the consequence of updating priors?
    - Controversy around priors being subjective: [Prosecutor's Fallacy](https://towardsdatascience.com/the-prosecutors-fallacy-cb0da4e9c039)
- Add a partitioning event and have the students calculate the posterior based on that - Shreyas
- Bayes proof for the generalized case - NJ
- Generalized case example - Together

## Motivating Question 🤔
> Tell me something that I thought I could answer using what Ravi taught last lesson but in reality is hard to do without Bayes Theorem

- That should explain why calculating a probability in one direction is harder than calculating it in the other
- **OR** position it as hypothesis and evidence where only one direction is what is interesting to look at

$$
\begin{align}
P(\text{at least one shipment is air}) &= \frac{3}{4}\\
P(\text{the second shipment is air}) &= \frac{1}{2}
\end{align}
$$
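One way these two numbers could arise, sketched under an assumption that is not stated above: there are two shipments, and each is independently air or ocean with equal probability. Enumerating the four equally likely outcomes reproduces both values and shows that conditioning on each event gives a different answer to "are both shipments air?". The helper `prob` and the event names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Assumption: two shipments, each independently "air" or "ocean" with
# probability 1/2, so the four ordered outcomes are equally likely.
outcomes = list(product(["air", "ocean"], repeat=2))

def prob(event, given=lambda o: True):
    """Naive-definition probability of `event` among outcomes where `given` holds."""
    conditioned = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in conditioned if event(o)), len(conditioned))

at_least_one_air = lambda o: "air" in o
second_is_air = lambda o: o[1] == "air"
both_air = lambda o: o == ("air", "air")

print(prob(at_least_one_air))            # 3/4
print(prob(second_is_air))               # 1/2
print(prob(both_air, at_least_one_air))  # 1/3
print(prob(both_air, second_is_air))     # 1/2
```

The two pieces of evidence sound similar, yet they move the probability of "both air" to different places, which is the kind of question Bayes' Theorem is built to answer.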

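- Posterior Probability: the probability of a hypothesis $A$ after the evidence $B$ has been taken into account, $P(A \mid B)$.
- Conditional Probability: the probability of one event given that another has occurred, for example $P(B \mid A)$.
- Prior Probability: the probability of the hypothesis before any evidence is seen, $P(A)$.
- Marginal Probability: the overall probability of the evidence, not conditioned on anything, $P(B)$.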

## The Law of Total Probability 📜

Now we need to connect conditional and unconditional probabilities. We do this with **the Law of Total Probability** (LOTP). 

<br>

Once we understand this, we will have all the required tools to prove the all-important identity: **Bayes' Theorem**, which we derive later in this lesson. 

<br>

You'll also have the tools to deal with conditioning on multiple events/pieces of information since the concepts translate generally.



**The Law of Total Probability** is an incredibly useful problem solving tool. Formally stated, it says:

$$
\text{If }A_1, \ldots, A_n \text{ is a partition of the sample space }S \text{, then }P(B) = \sum_{i=1}^{n}{P(B \mid A_i)P(A_i)}.
$$

But this is likely better illustrated with a picture:

![Partition of B by A](./LOTP.png)
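The same idea in numbers, as a minimal sketch: the counts below are hypothetical (they are not the lesson's data), and the three modes stand in for a partition $A_1, A_2, A_3$ of the sample space. The code computes $P(\text{TPEB})$ directly and again through LOTP so the two can be compared.

```python
from fractions import Fraction

# Hypothetical shipment counts keyed by (mode, tradelane); not the lesson data.
counts = {
    ("Air", "TPEB"): 10, ("Air", "Other"): 30,
    ("Ocean", "TPEB"): 45, ("Ocean", "Other"): 15,
    ("Truck", "TPEB"): 5, ("Truck", "Other"): 95,
}
total = sum(counts.values())
modes = sorted({mode for mode, _ in counts})  # the partition A_1, ..., A_n

def count_mode(mode):
    """Number of shipments in a given mode (one cell of the partition)."""
    return sum(n for (m, _), n in counts.items() if m == mode)

def p_mode(mode):
    """P(A_i): share of all shipments that use this mode."""
    return Fraction(count_mode(mode), total)

def p_lane_given_mode(lane, mode):
    """P(B | A_i): share of this mode's shipments on the given tradelane."""
    return Fraction(counts[(mode, lane)], count_mode(mode))

# Direct calculation of P(B), with B = "shipment is on TPEB"
p_tpeb_direct = Fraction(
    sum(n for (_, lane), n in counts.items() if lane == "TPEB"), total
)

# LOTP: sum over the partition of P(B | A_i) * P(A_i)
p_tpeb_lotp = sum(p_lane_given_mode("TPEB", m) * p_mode(m) for m in modes)

print(p_tpeb_direct, p_tpeb_lotp)  # 3/10 3/10
```

Exact fractions are used here so the two results match exactly rather than up to floating-point error.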

Okay, your turn to practice!

**Question**: 

> What's $P(\text{TPEB})$?

Partition the data and use LOTP so you can calculate it. Check against the data directly.

In [9]:
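## Demonstrate LOTP on our data; start with tradelane_mode_xt
## Assumption: tradelane_mode_xt is a tradelane-by-mode crosstab of counts
## with a column labeled 'Air' and no margin totals.

# This is the denominator to convert cardinality of sets to probabilities
# (per the Naive Definition of Probability)
S = tradelane_mode_xt.sum().sum()

# Show that p_TPEB_by_LOTP == p_TPEB
p_TPEB = tradelane_mode_xt.loc['TPEB', :].sum() / S

# Counts needed for the partition {Air, not Air}
n_Air = tradelane_mode_xt.loc[:, 'Air'].sum()
n_TPEB = tradelane_mode_xt.loc['TPEB', :].sum()
n_TPEB_and_Air = tradelane_mode_xt.loc['TPEB', 'Air']

p_Air = n_Air / S

p_not_Air = 1 - p_Air

p_TPEB_given_Air = n_TPEB_and_Air / n_Air

p_TPEB_given_not_Air = (n_TPEB - n_TPEB_and_Air) / (S - n_Air)

# LOTP: P(TPEB) = P(TPEB|Air)P(Air) + P(TPEB|not Air)P(not Air)
p_TPEB_by_LOTP = p_TPEB_given_Air * p_Air + p_TPEB_given_not_Air * p_not_Air

# Check if our answer is right
print(f"Our Answer: {p_TPEB_by_LOTP:.5%}")
print(f"Expected Answer: {p_TPEB:.5%}")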

## Conditional Probability!  

Let's build up the intuition behind Bayes' Theorem.

From last time we know:

The probability that two events $A$ and $B$ both happen, $P(A \cap B)$, is the probability of $A$, $P(A)$, times the probability of $B$ given that $A$ has occurred, $P(B \mid A)$:

$P(A \cap B) = P(A)P(B \mid A)$

On the other hand, the probability of $A$ and $B$ is also equal to the probability of $B$ times the probability of $A$ given $B$:

$P(A \cap B) = P(B)P(A \mid B)$

Equating the two yields:

$P(B)P(A \mid B) = P(A)P(B \mid A)$

and thus, provided $P(B) > 0$,

$P(A \mid B) = \frac{P(A) P(B \mid A)} {P(B)}$
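A quick numeric check of this identity, as a sketch on a made-up example (one roll of a fair six-sided die, which is not part of the lesson data). Here $A$ is "the roll is even" and $B$ is "the roll is greater than 4", so $P(A \mid B)$ and $P(B \mid A)$ differ and the direction of conditioning matters.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
rolls = set(range(1, 7))
A = {r for r in rolls if r % 2 == 0}  # even roll: {2, 4, 6}
B = {r for r in rolls if r > 4}       # roll greater than 4: {5, 6}

def p(event):
    """Naive-definition probability: favorable outcomes over all outcomes."""
    return Fraction(len(event), len(rolls))

# Conditional probabilities straight from the definition P(X | Y) = P(X and Y) / P(Y)
p_a_given_b = p(A & B) / p(B)
p_b_given_a = p(A & B) / p(A)

# The same quantity obtained by "flipping" P(B | A) with Bayes' Theorem
p_a_given_b_bayes = p(A) * p_b_given_a / p(B)

print(p_b_given_a)        # 1/3
print(p_a_given_b)        # 1/2
print(p_a_given_b_bayes)  # 1/2
```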

The result we have just derived is due to the Reverend [Thomas Bayes](https://en.wikipedia.org/wiki/Thomas_Bayes) (1701-1761). It solved what was called an "inverse probability" problem: given new data, how should you update the chances you had found earlier? Though Bayes lived three centuries ago, his rule is widely used today in machine learning.

In [10]:
# calculate the probability using Bayes Theorem
# Handy Function we can use!!

# calculate P(A|B) given P(A), P(B|A), P(B|not A)
def bayes_theorem(p_a, p_b_given_a, p_b_given_not_a):
    # calculate P(not A)
    not_a = 1 - p_a
    # calculate P(B)
    p_b = p_b_given_a * p_a + p_b_given_not_a * not_a
    # calculate P(A|B)
    p_a_given_b = (p_b_given_a * p_a) / p_b
    return p_a_given_b
 
# P(A)
p_a = 0.0002
# P(B|A)
p_b_given_a = 0.85
# P(B|not A)
p_b_given_not_a = 0.05
# calculate P(A|B)
result = bayes_theorem(p_a, p_b_given_a, p_b_given_not_a)
# summarize
print('P(A|B) = %.3f%%' % (result * 100))

P(A|B) = 0.339%
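To see how the posterior responds to the prior, here is a minimal sketch that loops over several priors while holding $P(B \mid A) = 0.85$ and $P(B \mid \text{not } A) = 0.05$ fixed, as in the cell above. Only the first prior (0.0002) comes from that cell; the others are arbitrary values chosen for illustration. The function is repeated so the snippet runs on its own.

```python
def bayes_theorem(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A)."""
    # P(B) by the Law of Total Probability over the partition {A, not A}
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return (p_b_given_a * p_a) / p_b

# Likelihoods held fixed, as in the cell above
p_b_given_a = 0.85
p_b_given_not_a = 0.05

# 0.0002 is the prior used above; the remaining priors are illustrative assumptions
priors = [0.0002, 0.001, 0.01, 0.1, 0.5]
posteriors = [bayes_theorem(p, p_b_given_a, p_b_given_not_a) for p in priors]

for prior, posterior in zip(priors, posteriors):
    print(f"P(A) = {prior:.4%}  ->  P(A|B) = {posterior:.3%}")
```

With the same evidence, a larger prior always gives a larger posterior, which is why the choice of prior matters so much when $A$ is rare.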


### Bayes' Rule for the General Case ###
In general, if the entire outcome space can be partitioned into events $A_1, A_2, \ldots , A_n$, and $B$ is an event of positive probability, then for each $i$,

$$
\begin{align*}
P(A_i \mid B) &= \frac{P(A_i \cap B)}{P(B)} ~~~~ \text{(division rule)} \\ \\
&= \frac{P(A_i \cap B)}{\sum_{j=1}^n P(A_j \cap B)} ~~~~ \text{(the }A_j\text{'s partition the whole space)} \\ \\
&= \frac{P(A_i)P(B \mid A_i)}{\sum_{j=1}^n P(A_j)P(B \mid A_j)} ~~~~
\text{(multiplication rule)}
\end{align*}
$$

This calculation is an application of the division rule in a setting where the events $A_1, A_2, \ldots , A_n$ can be thought of as the results of an "earlier" stage of an experiment and $B$ the result of a "later" stage. The calculation allows us to find "backwards in time" conditional chances of an earlier event given a later one, by writing the chance in terms of the "forwards in time" conditional chances of the later event given the earlier ones.
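The general formula translates almost line for line into code. The sketch below is one possible implementation, not a canonical one: `bayes_general` is a name introduced here, and the three-event numbers are hypothetical. The two-event call reuses the values from the earlier cell, treating $A$ and "not $A$" as the partition.

```python
def bayes_general(priors, likelihoods):
    """Return [P(A_i | B)] from priors [P(A_i)] and likelihoods [P(B | A_i)]."""
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per prior")
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1 (the A_i partition the space)")
    # Multiplication rule: P(A_i and B) = P(A_i) * P(B | A_i)
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # The A_i partition the space, so P(B) is the sum of the joint terms
    p_b = sum(joint)
    if p_b == 0:
        raise ValueError("B must have positive probability")
    # Division rule: P(A_i | B) = P(A_i and B) / P(B)
    return [j / p_b for j in joint]

# Two-event case with the numbers from the earlier cell: partition {A, not A}
two_event = bayes_general([0.0002, 0.9998], [0.85, 0.05])
print(f"P(A|B) = {two_event[0]:.3%}")

# Hypothetical three-event partition
three_event = bayes_general([0.2, 0.3, 0.5], [0.25, 0.75, 0.05])
print([round(x, 4) for x in three_event])
```

The posteriors always sum to 1, because dividing by $P(B)$ renormalizes the joint probabilities over the partition.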