# Bayes' Theorem
DA Probability & Statistics • Lesson 3

- NJ's Part -- Shreyas, you Can copy cell by cell 

In [1]:
# Import dependencies 
from custom.db_utils import get_connection
import pandas as pd

# Object typing: aliases for use in type hints
PandasSeries = pd.Series
PandasDataFrame = pd.DataFrame

# Data viz
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import seaborn as sns
import numpy as np

In [2]:
# Get the database connection and cursor objects
conn, cur = get_connection()

# Use a context manager to open and close connection and files
with conn:
    
    # Open the query.sql file
    with open('query.sql', 'r') as q:

        # Save contents of query.sql as string
        query_str = q.read()
    
    # Use the read_sql method to get the data from Snowflake into a 
    # Pandas dataframe
    df = pd.read_sql(query_str, conn)
    
    # Make all the columns lowercase
    df.columns = map(str.lower, df.columns)

# Preview the data
df.sample(3)

| | shipment_id | accepted_quote_at | tradelane | mode |
|---|---|---|---|---|
| 78654 | 787169 | 2020-05-12 00:00:00+00:00 | TPEB | Ocean |
| 114341 | 483506 | 2019-01-23 00:00:00+00:00 | TPEB | Ocean |
| 49296 | 613201 | 2019-09-05 00:00:00+00:00 | TPEB | Ocean |


In [3]:
# Isolate data to be used
tradelane_mode_df = df[['tradelane', 'mode']]
tradelane_mode_df

# Call Crosstab function from last time to get sums of tradelane and mode pairs.
tradelane_mode_xt = pd.crosstab(index=tradelane_mode_df['tradelane'], 
                               columns=tradelane_mode_df['mode'])

# Binary Classification with Bayes

Let's introduce this with an example (motivated from a lesson at UC Berkeley):

Let's say we know that:
- 60% of shipments are Ocean and the remaining 40% are Air
- 50% of Ocean shipments are on TPEB
- 80% of Air shipments are on TPEB


Now suppose I pick a shipment at random. Can you classify the shipment as Air or Ocean? We can do this by predicting which is more likely to happen. 



<b> You probably guessed ocean ... </b>

The shipment is picked at random and so you know that the chance that the shipment is Ocean is 60%. That's greater than the 40% chance of being an Air shipment, so you would classify the shipment as Ocean.

The tradelane percentages don't help yet, because we haven't been told which tradelane the shipment is on. All we can use is the overall proportion of each mode.

We have a pretty simple classifier! 
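As a minimal sketch of this prior-only classifier (the dictionary name `priors` is just an illustrative choice, and the numbers are the ones stated in the example above):

```python
# Prior probabilities of each mode, taken from the example above
priors = {"Ocean": 0.60, "Air": 0.40}

# With no other information, classify as the mode with the largest prior
prediction = max(priors, key=priors.get)
print(prediction)  # Ocean
```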

But now suppose I give you some additional information about the shipment that was picked:

<b>The shipment was on TPEB.</b>

Would this knowledge change your classification?

<b>Updating the Prediction Based on New Information </b>

Now that we know the shipment is on TPEB, it becomes important to look at the relation between tradelane and mode. It's still true that more shipments are Ocean than Air. But it's also true that a much higher percentage of Air shipments than of Ocean shipments are on TPEB (80% versus 50%). Our classification has to take both of these observations into account.

To visualize this, we will use a table that consists of one row for each of 100 shipments whose mode and tradelane have the same proportions as given in the data.


In [4]:
mode = np.array(['Ocean']*60 + ['Air']*40)
tradelane = np.array(['Not TPEB']*30+['TPEB']*30+['Not TPEB']*8+['TPEB']*32)
df = {'Mode':mode,'Tradelane':tradelane}
df = pd.DataFrame(df, columns=['Mode','Tradelane'])

df.head()

| | Mode | Tradelane |
|---|---|---|
| 0 | Ocean | Not TPEB |
| 1 | Ocean | Not TPEB |
| 2 | Ocean | Not TPEB |
| 3 | Ocean | Not TPEB |
| 4 | Ocean | Not TPEB |


In [5]:
pd.crosstab(index=df['Tradelane'], 
                               columns=df['Mode'])

| Tradelane \ Mode | Air | Ocean |
|---|---|---|
| Not TPEB | 8 | 30 |
| TPEB | 32 | 30 |


The total count is 100 shipments, of which 60 are Ocean and 40 are Air. Among the 60 Ocean shipments, 50% are on TPEB and 50% are not. Among the 40 Air shipments, 80% are on TPEB and 20% are not.

Coming back to the example, we have to pick which column the shipment is most likely to be in. When we knew nothing more about the shipment, it could have been in any of the four cells, and it was more likely to be in the second column (Ocean) because that column contains more shipments.

But now we know that the shipment is on TPEB, so the space of possible outcomes has shrunk: the shipment can only be in one of the two TPEB cells.

There are 62 shipments in those cells, and 32 out of the 62 are Air. That's more than half, even though not by much. 

So, in the light of the new information about the tradelane, <b> we have to update our prediction and now classify the shipment as Air. </b>
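The same reasoning can be sketched in plain Python by rebuilding the 100 hypothetical shipments and keeping only the TPEB ones. The variable names here are illustrative and the counts are those from the table above; this is one way to do the counting, not the only one.

```python
from collections import Counter

# Rebuild the 100 hypothetical shipments as (mode, tradelane) pairs
shipments = (
    [("Ocean", "Not TPEB")] * 30 + [("Ocean", "TPEB")] * 30
    + [("Air", "Not TPEB")] * 8 + [("Air", "TPEB")] * 32
)

# Conditioning on TPEB means keeping only the TPEB shipments
tpeb_modes = [mode for mode, lane in shipments if lane == "TPEB"]
counts = Counter(tpeb_modes)
total = len(tpeb_modes)

# Share of each mode among the TPEB shipments
posterior = {mode: n / total for mode, n in counts.items()}
for mode, p in posterior.items():
    print(f"P({mode} | TPEB) = {p:.3f}")

# Classify as the mode with the largest share among TPEB shipments
print(max(posterior, key=posterior.get))  # Air
```

Restricting to the TPEB shipments before counting is exactly the "smaller space of possible outcomes" described above.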


The method that we have just used above is due to the Reverend [Thomas Bayes](https://en.wikipedia.org/wiki/Thomas_Bayes) (1701-1761). His method solved what was called an "inverse probability" problem: given new data, how can you update chances you had found earlier? Though Bayes lived three centuries ago, his method is widely used now in machine learning.

## Conditional Probability!  

Let's mathematically build up the intuition behind Bayes Theorem.

- Let's say Event A is the shipment being on Air.
- Let's say Event B is the shipment being on TPEB.

<b>From last time we know: </b>

The probability of two events $A$ and $B$ both happening, $P(A \cap B)$, is the probability of $A$, $P(A)$, times the probability of $B$ given that $A$ has occurred, $P(B \mid A)$.

$P(A \cap B) = P(A)P(B \mid A)$

On the other hand, the probability of $A$ and $B$ is also equal to the probability of $B$ times the probability of $A$ given $B$.

$P(A \cap B) = P(B)P(A \mid B)$

Equating the two yields:

$P(B)P(A \mid B) = P(A)P(B \mid A)$

and thus

$P(A \mid B) = \frac{P(A) P(B \mid A)} {P(B)}$
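
As a worked check with the example's numbers, take $A$ = Air and $B$ = TPEB. Here $P(\text{TPEB}) = 0.62$ is read off the table of 100 shipments, where 62 are on TPEB:

$$
P(\text{Air} \mid \text{TPEB}) = \frac{P(\text{Air})\,P(\text{TPEB} \mid \text{Air})}{P(\text{TPEB})} = \frac{0.40 \times 0.80}{0.62} = \frac{0.32}{0.62} \approx 0.516
$$

This agrees with the 32 out of 62 counted directly from the table.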

## The Law of Total Probability 📜

Now we need to connect conditional and unconditional probabilities. We do this with **the Law of Total Probability** (LOTP). 


You'll also have the tools to deal with conditioning on multiple events/pieces of information since the concepts translate generally.



**The Law of Total Probability** is an incredibly useful problem solving tool. Formally stated, it says:

$$
\text{If }A_1, \ldots, A_n \text{ is a partition of the sample space }S \text{, then }P(B) = \sum_{i=1}^{n}{P(B \mid A_i)P(A_i)}.
$$

But this is likely better illustrated with a picture:

![Partition of B by A](./LOTP.png)
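
The formula can also be sketched in a few lines of Python. The helper name `total_probability` is hypothetical, and the inputs are the round numbers from the Air/Ocean example earlier in the lesson rather than the real shipment data.

```python
def total_probability(priors, likelihoods):
    """Return P(B) = sum_i P(B | A_i) * P(A_i) for a partition A_1..A_n."""
    # A partition's probabilities must cover the whole sample space
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1 for a partition")
    return sum(p_b_given_a * p_a for p_a, p_b_given_a in zip(priors, likelihoods))


# Partition by mode: A_1 = Ocean, A_2 = Air
p_mode = [0.60, 0.40]
# P(TPEB | Ocean), P(TPEB | Air)
p_tpeb_given_mode = [0.50, 0.80]

p_tpeb = total_probability(p_mode, p_tpeb_given_mode)
print(f"P(TPEB) = {p_tpeb:.2f}")  # P(TPEB) = 0.62
```

The result matches the 62 TPEB shipments out of 100 in the example table.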

Okay, your turn to practice!

**Question**: 

> What's $P(\text{TPEB})$?

Partition the data and use LOTP so you can calculate it. Check against the data directly.

In [12]:
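# Demonstrate LOTP on our data, starting from tradelane_mode_xt
# (the tradelane-by-mode crosstab built in an earlier cell)

# This is the denominator to convert cardinality of sets to probabilities
# (per the Naive Definition of Probability)
S = tradelane_mode_xt.sum().sum()

# Direct calculation: share of all shipments that are on TPEB
p_TPEB = tradelane_mode_xt.loc['TPEB', :].sum() / S

# Partition the sample space into Air and not Air
n_Air = tradelane_mode_xt.loc[:, 'Air'].sum()
p_Air = n_Air / S
p_not_Air = 1 - p_Air

# Conditional probabilities of TPEB within each part of the partition
n_TPEB_Air = tradelane_mode_xt.loc['TPEB', 'Air']
p_TPEB_given_Air = n_TPEB_Air / n_Air
p_TPEB_given_not_Air = (tradelane_mode_xt.loc['TPEB', :].sum() - n_TPEB_Air) / (S - n_Air)

# LOTP: P(TPEB) = P(TPEB | Air) P(Air) + P(TPEB | not Air) P(not Air)
p_TPEB_by_LOTP = p_TPEB_given_Air * p_Air + p_TPEB_given_not_Air * p_not_Air

# Check that p_TPEB_by_LOTP == p_TPEB
print(f"Our Answer: {p_TPEB_by_LOTP:.5%}")
print(f"Expected Answer: {p_TPEB:.5%}")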

# Appendix Below

In [13]:
#maybe calculate P(air|tpeb) and p(ocean|tpeb) to drive home example

In [14]:
# calculate the probability using Bayes Theorem
# Handy Function we can use!!

# calculate P(A|B) given P(A), P(B|A), P(B|not A)
def bayes_theorem(p_a, p_b_given_a, p_b_given_not_a):
    # calculate P(not A)
    not_a = 1 - p_a
    # calculate P(B)
    p_b = p_b_given_a * p_a + p_b_given_not_a * not_a
    # calculate P(A|B)
    p_a_given_b = (p_b_given_a * p_a) / p_b
    return p_a_given_b
 
# P(A)
p_a = 0.0002
# P(B|A)
p_b_given_a = 0.85
# P(B|not A)
p_b_given_not_a = 0.05
# calculate P(A|B)
result = bayes_theorem(p_a, p_b_given_a, p_b_given_not_a)
# summarize
print('P(A|B) = %.3f%%' % (result * 100))

P(A|B) = 0.339%
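
To tie the helper back to the running example, here is a sketch that calls the same function with the shipment numbers from earlier in the lesson (A = Air, B = TPEB, so "not A" is Ocean). The function body is repeated so the snippet runs on its own.

```python
def bayes_theorem(p_a, p_b_given_a, p_b_given_not_a):
    # P(not A)
    not_a = 1 - p_a
    # P(B) by the Law of Total Probability
    p_b = p_b_given_a * p_a + p_b_given_not_a * not_a
    # P(A|B) by Bayes' Theorem
    return (p_b_given_a * p_a) / p_b


# P(Air) = 0.40, P(TPEB | Air) = 0.80, P(TPEB | Ocean) = 0.50
p_air_given_tpeb = bayes_theorem(0.40, 0.80, 0.50)
print(f"P(Air | TPEB) = {p_air_given_tpeb:.3f}")  # P(Air | TPEB) = 0.516
```

This is the same 32/62 found by counting cells in the table of 100 shipments.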


### Bayes' Rule in the General Case ###
In general, if the entire outcome space can be partitioned into events $A_1, A_2, \ldots, A_n$, and $B$ is an event of positive probability, then for each $i$,

$$
\begin{align*}
P(A_i \mid B) &= \frac{P(A_iB)}{P(B)} ~~~~ \text{(division rule)} \\ \\
&= \frac{P(A_iB)}{\sum_{j=1}^n P(A_j B)} ~~~~ \text{(the }A_j\text{'s partition the whole space)} \\ \\
&= \frac{P(A_i)P(B \mid A_i)}{\sum_{j=1}^n P(A_j)P(B \mid A_j)} ~~~~
\text{(multiplication rule)}
\end{align*}
$$

This calculation is an application of the division rule in a setting where the events $A_1, A_2, \ldots , A_n$ can be thought of as the results of an "earlier" stage of an experiment and $B$ the result of a "later" stage. The calculation allows us to find "backwards in time" conditional chances of an earlier event given a later one, by writing the chance in terms of the "forwards in time" conditional chances of the later event given the earlier ones.
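
A minimal sketch of the general formula, assuming the partition is given as two parallel lists: the "forwards in time" inputs $P(A_j)$ and $P(B \mid A_j)$. The function name `bayes_posterior` is hypothetical, and the inputs reuse the two-mode shipment example, though the same code handles any number of events in the partition.

```python
def bayes_posterior(priors, likelihoods):
    """Return [P(A_1 | B), ..., P(A_n | B)] for a partition A_1..A_n."""
    # Multiplication rule: P(A_j and B) = P(A_j) * P(B | A_j)
    joint = [p_a * p_b_given_a for p_a, p_b_given_a in zip(priors, likelihoods)]
    # The A_j's partition the space, so P(B) is the sum of the joint terms
    p_b = sum(joint)
    if p_b <= 0:
        raise ValueError("B must have positive probability")
    # Division rule: P(A_j | B) = P(A_j and B) / P(B)
    return [j / p_b for j in joint]


modes = ["Ocean", "Air"]
priors = [0.60, 0.40]             # P(Ocean), P(Air)
likelihoods = [0.50, 0.80]        # P(TPEB | Ocean), P(TPEB | Air)

posteriors = bayes_posterior(priors, likelihoods)
for mode, p in zip(modes, posteriors):
    print(f"P({mode} | TPEB) = {p:.3f}")
```

Because every posterior is divided by the same $P(B)$, the results always sum to 1 across the partition.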