# Bayes' Theorem

## $$P(A\mid B)=\frac {P(B\mid A) \cdot P(A)}{P(B)}$$

## Terminology

- $P(A)$ : 
    - The probability of an event irrespective of the outcomes of other random variables is called the ***marginal probability***.
    - In reference to Bayes' Theorem, this is known as the ***prior probability***.

- $P(A|B)$ :
    - The probability of one (or more) event(s) given the occurrence of another event is called the ***conditional probability***.
    - In reference to Bayes' Theorem, this is known as the ***posterior probability***.

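- $P(B|A)$ : the ***likelihood***, the probability of observing $B$ given that $A$ is true.

- $P(B)$ : the ***evidence***, the marginal probability of observing $B$ at all.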

This allows us to restate the theorem as

$$
\textrm{Posterior} = \frac{\textrm{Likelihood}\cdot\textrm{Prior}}{\textrm{Evidence}}
$$

The numerator, $P(B\mid A) \cdot P(A)$, is a **joint probability**.


- A joint probability is the probability of two (or more) events occurring together.
    - It is written $P(A,B)$ or $P(A \cap B)$, and it equals $P(A|B)\cdot P(B)$.
    - So, in the theorem, the numerator is $P(B,A)$ or $P(B \cap A)$, which equals $P(B|A)\cdot P(A)$.
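
Setting the two factorizations of the same joint probability equal to each other is enough to derive the theorem, since $P(A \cap B) = P(B \cap A)$:

$$
P(A\mid B)\cdot P(B) = P(A \cap B) = P(B\mid A)\cdot P(A)
\quad\Longrightarrow\quad
P(A\mid B)=\frac{P(B\mid A)\cdot P(A)}{P(B)} \quad (\textrm{for } P(B) > 0)
$$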

### Examples

_____

- What is the probability that there is rain given that there are clouds?



_____

- What is the probability that there is fire given that there is smoke?


_____

- What is the probability that you have cancer given that you tested positive?



_____
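
One way to set up the three questions above, taking $A$ as the thing we want to know and $B$ as the thing we observed:

$$
P(\textrm{rain}\mid\textrm{clouds})=\frac{P(\textrm{clouds}\mid\textrm{rain})\cdot P(\textrm{rain})}{P(\textrm{clouds})}
$$

$$
P(\textrm{fire}\mid\textrm{smoke})=\frac{P(\textrm{smoke}\mid\textrm{fire})\cdot P(\textrm{fire})}{P(\textrm{smoke})}
$$

$$
P(\textrm{cancer}\mid\textrm{positive})=\frac{P(\textrm{positive}\mid\textrm{cancer})\cdot P(\textrm{cancer})}{P(\textrm{positive})}
$$
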

Yes, you do just need to remember which piece is which...

<center><img src='https://imgs.xkcd.com/comics/modified_bayes_theorem_2x.png' width=500></center>

[Image Source: XKCD](https://xkcd.com/2059/)

(for the record, $P(C)$ in this example is always very low)

_____


### Bayes' Theorem with...  Legos?

Will Kurt, who writes the [Count Bayesie blog](https://www.countbayesie.com/) and is the author of [_Bayesian Statistics the Fun Way_](https://nostarch.com/learnbayes), uses Legos to derive Bayes' Theorem. Let's take a look: https://www.countbayesie.com/blog/2015/2/18/bayes-theorem-with-lego

### How About Bayes' Theorem with Waterfalls?

[This great resource by Arbital](https://arbital.com/p/bayes_rule/?l=1zq) will let you go into all kinds of detail about the intuition behind Bayes' Theorem.

We can skip straight to their one-pager: https://arbital.com/p/bayes_rule/?l=693

## Example: 1984 Congressional Voting Data

Let's do an example. Here's the real theorem again for reference:

## $$P(A\mid B)=\frac {P(B\mid A) \cdot P(A)}{P(B)}$$

Data source: [Congressional Quarterly Almanac, 98th Congress, 2nd session 1984](https://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records)

A member of Congress voted 'No' on providing aid to El Salvador. Given that 61% of the members were Democrats, that 74.9% of Democrats voted 'No' on the proposal, and that only 4.8% of Republicans voted 'No', what is the conditional probability that this individual is a Democrat?

1. Which probability are we trying to find?

    - P(Dem|no)
    
2. Based on that, what other pieces do we need?

    - P(no|Dem): .749
    - P(Dem): .61
    - P(no): $(.749 \cdot .61) + (.048 \cdot .39) = .47561$, using P(no|Rep) = .048 and P(Rep) = 1 - .61 = .39
    
3. Result?

    - $P(\textrm{Dem}\mid\textrm{no}) = \frac{.749 \cdot .61}{.47561} \approx .9606$


In [2]:
(.749 * .61)/.47561

0.960640020184605
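
The same arithmetic can be wrapped in a small helper. This is a minimal sketch, not part of the original notebook: the function name `posterior` and its parameter names are made up for illustration, and it assumes only two groups (here, Democrats and Republicans), so that the evidence follows from the law of total probability.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """Return P(A|B) given P(A), P(B|A), and P(B|not A)."""
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence


# Numbers from the El Salvador aid question above
p_dem_given_no = posterior(prior=0.61, likelihood=0.749, likelihood_if_not=0.048)
print(round(p_dem_given_no, 4))  # 0.9606
```

Passing `likelihood_if_not` instead of the evidence keeps the caller from having to compute P(no) by hand.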

Since we have the actual data, we can compute this more exactly:

In [3]:
# Imports, then grab and explore the data
import pandas as pd

In [5]:
df = pd.read_csv("data/clean_house-votes-84.csv")

In [6]:
df.head()

| | Class Name | handicapped-infants | water-project-cost-sharing | adoption-of-the-budget-resolution | physician-fee-freeze | el-salvador-aid | religious-groups-in-schools | anti-satellite-test-ban | aid-to-nicaraguan-contras | mx-missile | immigration | synfuels-corporation-cutback | education-spending | superfund-right-to-sue | crime | duty-free-exports | export-administration-act-south-africa |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | republican | n | y | n | y | y | y | n | n | n | y | ? | y | y | y | n | y |
| 1 | republican | n | y | n | y | y | y | n | n | n | n | n | y | y | y | n | ? |
| 2 | democrat | ? | y | y | ? | y | y | n | n | n | n | y | n | y | y | n | n |
| 3 | democrat | n | y | y | n | ? | y | n | n | n | n | y | n | y | n | n | y |
| 4 | democrat | y | y | y | n | y | y | n | n | n | n | y | ? | y | y | y | y |


In [7]:
# Note the parentheses: a bare `df.info` only prints the bound-method repr.
# The frame has 435 rows (index 0 to 434).
df.info()

Let's find these pieces exactly!

In [8]:
# Grab just the data for the el-salvador-aid vote
vote = df[["Class Name", "el-salvador-aid"]]

In [9]:
vote.head()

| | Class Name | el-salvador-aid |
|---|---|---|
| 0 | republican | y |
| 1 | republican | y |
| 2 | democrat | y |
| 3 | democrat | ? |
| 4 | democrat | y |


In [15]:
P_dem = vote['Class Name'].value_counts(normalize=True)['democrat']
P_dem

0.6137931034482759

In [18]:
# P(no): share of all rows with an 'n' vote ('?' entries stay in the denominator)
P_no = vote['el-salvador-aid'].value_counts(normalize=True)['n']
P_no

0.4781609195402299

In [21]:
# P(no|Dem)
# Given Dems, what's the likelihood that they voted no?
dems = vote.loc[vote['Class Name'] == 'democrat']

In [25]:
p_no_dem = dems['el-salvador-aid'].value_counts(normalize=True)['n']
p_no_dem

0.7490636704119851

In [26]:
# Now the math: P(Dem|no) = P(no|Dem) * P(Dem) / P(no)
(p_no_dem * P_dem) / P_no

0.9615384615384617

Or, more directly, keep only the 'No' voters and look at their party split:

In [28]:
vote_no  = vote.loc[vote['el-salvador-aid'] == 'n']
vote_no['Class Name'].value_counts(normalize=True)

democrat      0.961538
republican    0.038462
Name: Class Name, dtype: float64
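
The two routes agree because the Bayes calculation reduces to a ratio of counts. Write $n_{\textrm{Dem,no}}$ for the number of Democrats who voted 'No', $n_{\textrm{Dem}}$ for the number of Democrats, $n_{\textrm{no}}$ for the number of 'No' votes, and $N$ for the total number of rows. These symbols are introduced here only for illustration. Then:

$$
P(\textrm{Dem}\mid\textrm{no}) = \frac{\frac{n_{\textrm{Dem,no}}}{n_{\textrm{Dem}}}\cdot\frac{n_{\textrm{Dem}}}{N}}{\frac{n_{\textrm{no}}}{N}} = \frac{n_{\textrm{Dem,no}}}{n_{\textrm{no}}}
$$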

### Bonus: MLE? MAP?

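If we have time, we can also chat about two other seemingly random pieces of this curriculum topic: Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP) estimation. Both are ways to estimate parameters from data: MLE picks the parameter value that maximizes the likelihood, while MAP maximizes the posterior and so also takes the prior into account.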

For this, let's go back to Will Kurt: 

> "When we start learning probability we often are told the probability of an event and from there try to estimate the likelihood of various outcomes. In reality the inverse is much more common: we have data about the outcomes but don't really know what the true probability of the event is. Trying to figure out this missing parameter is referred to as Parameter Estimation."

-- https://www.countbayesie.com/blog/2015/4/4/parameter-estimation-the-pdf-cdf-and-quantile-function

Also: https://www.countbayesie.com/blog/2015/4/4/parameter-estimation-adding-bayesian-priors
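
To make the difference concrete, here is a minimal sketch for estimating a single probability (say, the chance of a 'No' vote) from a count of outcomes. It is not taken from the linked posts: the function names, the made-up data (7 out of 10), and the Beta(2, 2) prior are all assumptions chosen for illustration.

```python
def mle(successes, trials):
    """Maximum likelihood estimate of a Bernoulli probability."""
    return successes / trials


def map_estimate(successes, trials, alpha, beta):
    """Posterior mode under a Beta(alpha, beta) prior.

    The posterior is Beta(alpha + successes, beta + trials - successes);
    this mode formula holds when both posterior parameters exceed 1.
    """
    return (successes + alpha - 1) / (trials + alpha + beta - 2)


# Hypothetical data: 7 "successes" in 10 trials
k, n = 7, 10
print(mle(k, n))                           # 0.7
print(round(map_estimate(k, n, 2, 2), 4))  # 0.6667
```

With a flat Beta(1, 1) prior the MAP estimate equals the MLE, which is one way to see that the prior is the only thing separating the two.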