# Bayes Theorem
DA Probability & Statistics Learning Series • Lesson 3

<!-- <img src="https://imgs.xkcd.com/comics/conditionals.png" align="center"/> -->

Welcome, again!

*See **#da_prob_stat** for discussion during and after this tutorial.*

## Goals
- 
- 
- 

In [81]:
# Import dependencies 
import sys
sys.path.append('../custom/')
sys.path.append('../Lesson_2/')
from db_utils import get_connection, get_data
import pandas as pd

# Object typing
from typing import TypeVar
PandasSeries = TypeVar('pd.core.series.Series')
PandasDataFrame = TypeVar('pd.core.frame.DataFrame')

# Data viz
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import seaborn as sns
import numpy as np

## Motivating Context 🤔

Between 1996 and 1998, Sally Clark, a British woman, lost her two infant children under similarly mysterious circumstances. The defence argued that the children had died of *SIDS* (sudden infant death syndrome). However, the prosecution claimed that the probability of two children from an affluent family, like Clark's, both dying of SIDS was 1 in 73 million. They used this to *prove* that she *must* be guilty of killing her children; in 1999, Clark was, in fact, convicted of murdering them.

**TODO:** How would you structure this as events?

> Sally Clark is guilty of killing her two children **because** it is *so* unlikely for them to have died mysteriously if she were innocent.

Sounds like a pretty reasonable statement, right? It's inaccurate, though!

**TODO:** What do you think makes this case funky?

The prosecution argued the following:

$$
\begin{split}
P(\text{innocence | evidence}) & = P(\text{evidence | innocence}) \\
P(\text{I | E}) & = P(\text{E | I})
\end{split}
$$

As we saw above, in plain English, that comparison sounds pretty normal. Mathematically though, it's really not.

From our lesson on conditional probability, we can infer:

$$
\begin{split}
P(I | E) & = \frac{P(I \cap E)}{P(E)} \\
\implies P(I \cap E) & = P(I | E)P(E)
\end{split}
$$

**and the other way too...**

$$
\begin{split}
P(E | I) & = \frac{P(I \cap E)}{P(I)} \\
\implies P(I \cap E) & = P(E | I)P(I)
\end{split}
$$

**thus**

$$
\begin{split}
P(I | E)P(E) & = P(E | I)P(I) \\ \\
\implies P(I | E) & = \frac{P(E | I)P(I)}{P(E)} \\ \\
\implies P(I | E) & \neq P(E | I) \quad \text{(in general)}
\end{split}
$$

**CONCLUSIONS:**
- Equating a conditional probability with its inverse is incorrect in general.
- The prosecution's argument was flawed. It ignored the base rate of guilt, i.e. how rarely a mother actually kills her two infants, which makes the prior probability of innocence, $P(I)$, very high.
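
To see how far apart a conditional and its inverse can be, here is a minimal sketch. Every number in it is a made-up assumption chosen for illustration, not a figure from the Clark case. $P(E)$ is obtained by splitting on "innocent" and "not innocent".

```python
# Hypothetical inputs, chosen only to illustrate the formula
p_E_given_I = 0.001      # P(E | I): the evidence is rare if innocent
p_E_given_not_I = 0.5    # P(E | not I): the evidence is common if guilty
p_I = 0.9999             # P(I): prior (base rate) of innocence

# P(E) = P(E | I)P(I) + P(E | not I)P(not I)
p_E = p_E_given_I * p_I + p_E_given_not_I * (1 - p_I)

# Bayes: P(I | E) = P(E | I)P(I) / P(E)
p_I_given_E = p_E_given_I * p_I / p_E

print(f"P(E | I) = {p_E_given_I:.4f}")
print(f"P(I | E) = {p_I_given_E:.4f}")
```

With these assumed inputs, $P(E | I)$ is tiny while $P(I | E)$ stays large, because the high base rate of innocence dominates.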

The Sally Clark case was a statistical blunder. But the mistakes made by the prosecution in this case aren't exclusive to the courtroom.

A more trivial version of the [**Prosecutor's Fallacy**](https://towardsdatascience.com/the-prosecutors-fallacy-cb0da4e9c039) is saying:

>My partner hasn't texted me in 3 hours - they probably don't love me.

  - $P(\text{don't love you | late text}) \neq P(\text{late text | don't love you})$
  - The latter would understandably be very high, but the former is likely low: they are probably just busy or fed up with sheltering in place with you. Relax a little 😉

<font color='blue'>Bayes gives us a way to correctly relate conditional probabilities to each other</font>

## Intro to Bayes

In [79]:
import os
os.path.exists('query.sql')

False

In [75]:
# Get the database connection and cursor objects
conn, cur = get_connection()

# Use a context manager to open and close connection and files
with conn:
    
    # Open the query.sql file
    with open('query.sql', 'r') as q:

        # Save contents of query.sql as string
        query_str = q.read()
    
    # Use the read_sql method to get the data from Snowflake into a 
    # Pandas dataframe
    df = pd.read_sql(query_str, conn)
    
    # Make all the columns lowercase
    df.columns = map(str.lower, df.columns)

# Preview the data
df.sample(3)

FileNotFoundError: [Errno 2] No such file or directory: 'query.sql'

In [3]:
# Isolate data to be used
tradelane_mode_df = df[['tradelane', 'mode']]
tradelane_mode_df

# Call Crosstab function from last time to get sums of tradelane and mode pairs.
tradelane_mode_xt = pd.crosstab(index=tradelane_mode_df['tradelane'], 
                               columns=tradelane_mode_df['mode'])

## Binary Classification with Bayes

Let's introduce this with an example (motivated from a lesson at UC Berkeley):

Let's say we know that:
- 60% of shipments are Ocean and the remaining 40% are Air
- 50% of Ocean shipments are on TPEB
- 80% of Air shipments are on TPEB


Now suppose I pick a shipment at random. Can you classify the shipment as Air or Ocean? We can do this by predicting which is more likely to happen. 



<b> You probably guessed ocean ... </b>

The shipment is picked at random and so you know that the chance that the shipment is Ocean is 60%. That's greater than the 40% chance of being an Air shipment, so you would classify the shipment as Ocean.

The information about the tradelane is irrelevant here, because we don't yet know which tradelane this particular shipment is on; the proportions of mode are all we have to go on.

We have a pretty simple classifier! 

But now suppose I give you some additional information about the shipment that was picked:

<b>The Shipment was on TPEB. </b>

Would this knowledge change your classification?

<b>Updating the Prediction Based on New Information </b>

Now that we know the shipment is on TPEB, it becomes important to look at the relation between tradelane and mode. It's still true that more shipments are Ocean than Air. But it's also true that among the Air shipments, a much higher percentage are on TPEB than among the Ocean shipments. Our classification has to take both of these observations into account.

To visualize this, we will use a table that consists of one row for each of 100 shipments whose mode and tradelane have the same proportions as given in the data.


In [4]:
mode = np.array(['Ocean']*60 + ['Air']*40)
tradelane = np.array(['Not TPEB']*30+['TPEB']*30+['Not TPEB']*8+['TPEB']*32)
df = {'Mode':mode,'Tradelane':tradelane}
df = pd.DataFrame(df, columns=['Mode','Tradelane'])

df.head()

|   | Mode | Tradelane |
|---|---|---|
| 0 | Ocean | Not TPEB |
| 1 | Ocean | Not TPEB |
| 2 | Ocean | Not TPEB |
| 3 | Ocean | Not TPEB |
| 4 | Ocean | Not TPEB |


In [5]:
pd.crosstab(index=df['Tradelane'], 
                               columns=df['Mode'])

| Tradelane | Air | Ocean |
|---|---|---|
| Not TPEB | 8 | 30 |
| TPEB | 32 | 30 |


The total count is 100 shipments, of which 60 are Ocean and 40 are Air. Among the 60 Ocean shipments, 50% are on each of the two tradelane options. Among the 40 Air shipments, 20% are not on TPEB and 80% are.

Coming back to the example, we have to pick which column (mode) the shipment is most likely to be in. When we knew nothing more about the shipment, it could have been in any of the four cells, and was therefore more likely to be in the second column (Ocean) because that column contains more shipments.

But now we know that the shipment is on TPEB, so the space of possible outcomes has shrunk: the shipment can only be in one of the two TPEB cells.

There are 62 shipments in those cells, and 32 out of the 62 are Air. That's more than half, even though not by much. 

So, in the light of the new information about the tradelane, <b> we have to update our prediction and now classify the shipment as Air. </b>
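
The counting argument above can be reproduced directly from the 100-shipment table. This is a minimal sketch that rebuilds the same toy data under a new name, `shipments` (an illustrative name, not one used elsewhere in the lesson), and lets `pd.crosstab` normalize each row so that the cells read as $P(\text{Mode} \mid \text{Tradelane})$.

```python
import numpy as np
import pandas as pd

# Rebuild the 100-shipment table used above
mode = np.array(['Ocean'] * 60 + ['Air'] * 40)
tradelane = np.array(['Not TPEB'] * 30 + ['TPEB'] * 30
                     + ['Not TPEB'] * 8 + ['TPEB'] * 32)
shipments = pd.DataFrame({'Mode': mode, 'Tradelane': tradelane})

# normalize='index' divides each row by its row total,
# so each row holds P(Mode | Tradelane)
mode_given_tradelane = pd.crosstab(index=shipments['Tradelane'],
                                   columns=shipments['Mode'],
                                   normalize='index')

# 32 of the 62 TPEB shipments are Air
p_air_given_tpeb = mode_given_tradelane.loc['TPEB', 'Air']
print(f"P(Air | TPEB) = {p_air_given_tpeb:.3f}")
```

Restricting attention to the TPEB row is exactly the "shrinking of the outcome space" described above.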


The method that we have just used above is due to the Reverend [Thomas Bayes](https://en.wikipedia.org/wiki/Thomas_Bayes) (1701-1761). His method solved what was called an "inverse probability" problem: given new data, how can you update chances you had found earlier? Though Bayes lived three centuries ago, his method is widely used now in machine learning.

## Conditional Probability!  

Let's mathematically build up the intuition behind Bayes Theorem.

- Let's say event $A$ is the shipment being Air.
- Let's say event $B$ is the shipment being on TPEB.

<b>From last time we know: </b>

The probability of two events A and B happening, $P(A \cap B)$ , is the probability of $A$, $P(A)$, times the probability of B given that A has occurred, $P(B \mid A)$. 

$$
P(A \cap B) = P(A)P(B \mid A)
$$

On the other hand, the probability of A and B is also equal to the probability
of B times the probability of A given B.

$$
P(A \cap B) = P(B)P(A \mid B)
$$

Equating the two yields:

$$
P(B)P(A \mid B) = P(A)P(B \mid A)
$$

and thus

$$
P(A \mid B) = \frac{P(A) P(B \mid A)} {P(B)}
$$
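
As a quick sanity check, the formula can be evaluated with the shipment numbers from the example, where $A$ is "Air" and $B$ is "TPEB". The variable names below are illustrative; the value of $P(B)$ is read off the toy table, in which 62 of the 100 shipments are on TPEB.

```python
p_air = 0.4              # P(A): share of shipments that are Air
p_tpeb_given_air = 0.8   # P(B | A): share of Air shipments on TPEB
p_tpeb = 0.62            # P(B): 62 of the 100 toy shipments are on TPEB

# Bayes Theorem: P(A | B) = P(A) P(B | A) / P(B)
p_air_given_tpeb = p_air * p_tpeb_given_air / p_tpeb
print(f"P(Air | TPEB) = {p_air_given_tpeb:.3f}")
```

This agrees with the count-based answer of 32 out of 62 from the table.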

## The Law of Total Probability 📜

Now we need to connect conditional and unconditional probabilities. We do this with **the Law of Total Probability** (LOTP). 


You'll also have the tools to deal with conditioning on multiple events/pieces of information since the concepts translate generally.



**The Law of Total Probability** is an incredibly useful problem solving tool. Formally stated, it says:

$$
\text{If }A_1,...,A_n \text{ is a partition of the sample space }S \text{, then }P(B) = \sum_{i=1}^{n}{P(B|A_i)P(A_i)}.
$$

But this is likely better illustrated with a picture:

![Partition of B by A](./LOTP.png)
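
The statement can also be sketched in a few lines of code. The helper name `total_probability` is hypothetical (it is not part of this lesson's utilities), and the inputs are the toy percentages from the Air/Ocean example, with the partition $A_1$ = Ocean, $A_2$ = Air.

```python
def total_probability(p_b_given_parts, p_parts):
    """Return P(B) = sum_i P(B | A_i) P(A_i) for a partition A_1..A_n."""
    # A partition must cover the whole sample space
    if abs(sum(p_parts) - 1.0) > 1e-9:
        raise ValueError("partition probabilities must sum to 1")
    return sum(p_b * p_a for p_b, p_a in zip(p_b_given_parts, p_parts))

# P(TPEB | Ocean) = 0.5, P(TPEB | Air) = 0.8; P(Ocean) = 0.6, P(Air) = 0.4
p_tpeb = total_probability(p_b_given_parts=[0.5, 0.8], p_parts=[0.6, 0.4])
print(f"P(TPEB) = {p_tpeb:.2f}")
```

The result matches the 62 TPEB shipments out of 100 in the toy table.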

Okay, your turn to practice!

**Question**: 

> What's $P(\text{TPEB})$?

Partition the data and use LOTP so you can calculate it. Check against the data directly.

In [None]:
## TODO: Demonstrate LOTP on our data; start with tradelane_mode_xt
df = pd.read_sql(query_str, conn)

# This is the denominator to convert cardinality of sets to probabilities
# (per the Naive Definition of Probability)
S = tradelane_mode_xt.sum().sum()

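# Show that p_TPEB_by_LOTP == p_TPEB
p_TPEB = tradelane_mode_xt.loc['TPEB',:].sum()/S

p_Air = tradelane_mode_xt.loc[:,'Air'].sum()/S

p_not_Air = 1 - p_Air

# P(TPEB | Air): TPEB Air shipments as a share of all Air shipments
p_TPEB_given_Air = tradelane_mode_xt.loc['TPEB','Air']/tradelane_mode_xt.loc[:,'Air'].sum()

# P(TPEB | not Air): the same ratio over every non-Air column
not_Air_xt = tradelane_mode_xt.drop(columns='Air')
p_TPEB_given_not_Air = not_Air_xt.loc['TPEB',:].sum()/not_Air_xt.sum().sum()

# LOTP with the partition {Air, not Air}
p_TPEB_by_LOTP = p_TPEB_given_Air*p_Air + p_TPEB_given_not_Air*p_not_Air

# Check if our answer is right
print(f"Our Answer: {p_TPEB_by_LOTP:.5%}")
print(f"Expected Answer: {p_TPEB:.5%}")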

$$
\begin{split}
P(A|B) = \frac{P(B|A)P(A)}{P(B)}
\end{split}
$$

- **Marginal Probability**: Unconditional probability of an event. eg. $P(A)$ and $P(B)$
- **Prior Probability**: A type of marginal probability relevant to Bayes Theorem. It is the probability of an event based on our prior beliefs about that event. Here, $P(A)$ is the prior. The posterior is the updated probability of the prior given new information from event $B$.
- **Posterior Probability**: The conditional probability we are trying to compute - $P(A|B)$
- **Likelihood**: This is the inverse of the conditional probability we are trying to compute. If we are interested in $P(A|B)$, likelihood refers to $P(B|A)$

## Updating Priors ☝🏽

Bayes Theorem, and conditional probability in general, can be framed in multiple ways:
- $P(A|B)$ = Probability that event $A$ occurs *given* that event $B$ has occurred
- $P(A|E)$ = Probability of hypothesis $A$ being true *given* certain evidence $E$
- $P(A|B)$ = Updated probability of event $A$ occurring *given* additional information from event $B$

In each of these scenarios, we:
1. take an event $A$ that has some probability of occurring in isolation, $P(A)$
2. update that probability to a new probability, $P(A|B)$, based on new information obtained from another event $B$ occurring

We've seen that Bayes Theorem helps us relate conditional probabilities to one another. Now let's look at how it can be leveraged to update our prior beliefs about certain events.

### Ravi Goes to Disneyland 🏰

Ravi is excited that Disneyland has finally opened up and he has taken the day off to go to LA and enjoy Goofy's Sky School after months of waiting. However, for Ravi to be able to enter Disneyland, the management needs to be certain that he doesn't have COVID-19; he needs to be tested.

Nervous about his fate, Ravi goes to the Disneyland clinic and waits for the doctor. When the doctor arrives, she makes an initial assessment of him. She comments that since Ravi looks asymptomatic and generally quite energized, he probably does not have COVID. However, he took a flight to LA, and that increases the chance that he might have contracted it recently. She thinks for a few minutes, notes down some stuff, and concludes that Ravi's prior (*pre-test*) probability of having COVID is:

$P(C) = 0.3 = 30\%$

The doctor's test, **Test X**, is 90% reliable: if someone has COVID, the test comes back positive 90% of the time, and if they don't, it comes back negative 90% of the time. The false positive and false negative rates are therefore both 10%.

Ravi is administered the test and his results come out **negative**! If $N_X$ is the event that Ravi tests negative with **Test X**, then

$$
\begin{split}
P(C | N_X) & = \frac{P(N_X | C)P(C)}{P(N_X)} \\
           & = \frac{P(N_X | C)P(C)}{P(N_X | C)P(C) + P(N_X | C^c)P(C^c)} \\
           & = \frac{(0.1)(0.3)}{(0.1)(0.3) + (0.9)(0.7)} \\
           & \approx 0.045 = 4.5\%
\end{split}
$$

The posterior probability of Ravi having COVID given a negative test result is 4.5% - a significant decrease from the prior probability of 30% that the doctor assumed when Ravi came in. In this way, we've 'updated our priors'.

Unfortunately, even a test with 90% reliability can't confirm that Ravi does not have COVID for sure. A 4.5% chance is above the **strict threshold of 2%** that Disney is enforcing on its visitors and Ravi doesn't meet that cut. Sorry bud ☹️

In [71]:
# TODO
# What if the doctor felt differently during her initial evaluation of Ravi's health? 
# Or what if Ravi was both corrupt and desperate and offered the doctor $1000 for a 'better' initial evaluation?
# Would that alter the posterior probability?
# Loop through a few values of P(C) using the compute_posterior function to see for yourself!

# calculate P(A|B) when provided P(A), P(B|A), P(B|not A)
def compute_posterior(p_a, p_b_given_a, p_b_given_not_a):
    # calculate P(not A)
    not_a = 1 - p_a
    # calculate P(B)
    p_b = p_b_given_a * p_a + p_b_given_not_a * not_a
    # calculate P(A|B)
    p_a_given_b = (p_b_given_a * p_a) / p_b
    return p_a_given_b

# P(Nx|C) = P(Tests Negative given Has COVID)
p_Nx_given_C = 0.1

# P(Nx|not C) = P(Tests Negative given Does Not Have COVID)
p_Nx_given_not_C = 0.9

print('Question 1\n---')
# Calculate P(C|Nx) for various values of P(C)
for c in [0.01, 0.05, 0.1, 0.4]:
    print('If P(C) = {0}, then P(C | Nx) = {1:.2%}'.format(c, compute_posterior(c, p_Nx_given_C, p_Nx_given_not_C)))

print('\nQuestion 2\n---')
# Use a while loop, stepping P(C) down from 0.3, to find the first value of P(C) for which P(C|Nx) is less than 1%
c = 0.3
while c > 0:
    p_C_given_Nx = compute_posterior(c, p_Nx_given_C, p_Nx_given_not_C)
    if p_C_given_Nx < 0.01:
        print(f'P(C | Nx) is less than 1% when P(C) = {c:.2}')
        break
    c -= 0.01
else:
    print('Given the effectiveness of the test, there is no prior probability of COVID that can bring the posterior probability less than 1%')

Question 1
---
If P(C) = 0.01, then P(C | Nx) = 0.11%
If P(C) = 0.05, then P(C | Nx) = 0.58%
If P(C) = 0.1, then P(C | Nx) = 1.22%
If P(C) = 0.4, then P(C | Nx) = 6.90%

Question 2
---
P(C | Nx) is less than 1% when P(C) = 0.08


## Multiple Conditions

Fortunately, bribing the doc isn't necessarily Ravi's only solution to entering Disneyland.

The doctor sympathizes with Ravi and says that she might be able to help him out. She tells Ravi about a second test, **Test Y**, made by a different drug company than **Test X**. It isn't as good as **Test X**, with a reliability of just 70%, but it might just be what Ravi needs.

**TODO:** What is the doctor's latest assessment of Ravi's probability of having COVID? i.e. what is the new prior $F(C)$? 

The doctor administers **Test Y** and Ravi tests negative again!

In [70]:
# What is the probability that Ravi has COVID after the results of Test Y?

# F(C) = New prior probability that Ravi has COVID-19
f_C = 0.045

# P(Ny|C) = P(Tests Negative given Has COVID)
p_Ny_given_C = 0.3

# P(Ny|not C) = P(Tests Negative given Does Not Have COVID)
p_Ny_given_not_C = 0.7

print('With F(C) = {0:.2%}, P(C | Ny) = {1:.2%}'.format(f_C, compute_posterior(f_C, p_Ny_given_C, p_Ny_given_not_C)))

With F(C) = 4.50%, P(C | Ny) = 1.98%
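
Updating twice in a row can be compared with a single update on both test results at once. The sketch below redefines `compute_posterior` so that it runs on its own, and it assumes that the two test results are conditionally independent given Ravi's true COVID status. The lesson does not state that assumption; it is added here so the joint likelihoods can be written as products.

```python
def compute_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """P(A | B) from P(A), P(B | A), and P(B | not A)."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

prior = 0.3  # the doctor's original pre-test probability

# Sequential: update on Test X, then reuse that posterior
# as the prior for Test Y (no rounding in between)
after_x = compute_posterior(prior, 0.1, 0.9)
after_y = compute_posterior(after_x, 0.3, 0.7)

# Joint: one update on "both tests negative", with
# P(Nx and Ny | C) = 0.1 * 0.3 and P(Nx and Ny | not C) = 0.9 * 0.7
# (this product form relies on the conditional-independence assumption)
joint = compute_posterior(prior, 0.1 * 0.3, 0.9 * 0.7)

print(f"Sequential update: {after_y:.4%}")
print(f"Joint update:      {joint:.4%}")
```

Both routes give the same posterior. It comes out slightly higher than the 1.98% in the cell above because that cell starts from the rounded prior 0.045 rather than the unrounded value of about 0.0455.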
