# Bayes' Formula

Bayes' Formula describes the probability of an event, based on prior knowledge of conditions that might be related to the event. In this notebook we use it to compute $P(E|Pos)$: the probability that E. coli is present in a sample, given that the test yields a positive result.

## True/false positives and negatives

To understand and use Bayes' Formula, we first need to explain the possible outcomes of a diagnostic test.
Let's start with an example from Ott & Longnecker:

Suppose a meat inspector must decide whether a randomly selected meat sample contains E. coli bacteria. 

For this test we have the following data:
- two sets of 10,000 samples each
- the first set contains E. coli bacteria
- in the second set, all traces of E. coli have been removed

```python
# First import the required library
import pandas as pd

# Build the table of test results per sample set; we call it meatdata
meatdata = pd.DataFrame({'Diagnostic Test Result': ['positive', 'negative', 'total'],
                         'Meat with E (E)': [9500, 500, 10000],
                         'Meat without E (NE)': [100, 9900, 10000]})
meatdata
```

| Diagnostic Test Result | Meat with E (E) | Meat without E (NE) |
|---|---|---|
| positive | 9500 | 100 |
| negative | 500 | 9900 |
| total | 10000 | 10000 |


- True positive rate (sensitivity) = $P(Pos|E)$ = $\frac{9{,}500}{10{,}000}$ = .95
- False positive rate = $P(Pos|NE)$ = $\frac{100}{10{,}000}$ = .01
- True negative rate (specificity) = $P(Neg|NE)$ = $\frac{9{,}900}{10{,}000}$ = .99
- False negative rate = $P(Neg|E)$ = $\frac{500}{10{,}000}$ = .05
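The four rates above can be reproduced with a few lines of Python. This is a minimal sketch that stores the counts in plain dictionaries instead of the `meatdata` DataFrame, so it runs without pandas; the variable names are illustrative, not part of the original notebook.

```python
# Counts from the meat-inspection table (Ott & Longnecker example)
with_e = {"positive": 9500, "negative": 500}      # samples that contain E. coli (E)
without_e = {"positive": 100, "negative": 9900}   # samples without E. coli (NE)

total_e = sum(with_e.values())       # 10,000 samples with E
total_ne = sum(without_e.values())   # 10,000 samples without E

sensitivity = with_e["positive"] / total_e              # P(Pos|E)
false_positive_rate = without_e["positive"] / total_ne  # P(Pos|NE)
specificity = without_e["negative"] / total_ne          # P(Neg|NE)
false_negative_rate = with_e["negative"] / total_e      # P(Neg|E)

print(f"P(Pos|E)  = {sensitivity:.2f}")
print(f"P(Pos|NE) = {false_positive_rate:.2f}")
print(f"P(Neg|NE) = {specificity:.2f}")
print(f"P(Neg|E)  = {false_negative_rate:.2f}")
```

Note that each pair of rates for the same set sums to 1: $P(Pos|E) + P(Neg|E) = 1$ and $P(Pos|NE) + P(Neg|NE) = 1$.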

## Assignment:
Use Bayes' Formula to compute $P(E|Pos)$, given that the prior rate of E. coli in the meat is $P(E)$ = 4.5%.

## Solution:

Now that we have the data and the prior rate, and have computed the true positive, false positive, true negative and false negative rates, we can fill in all the details of the original formula:

$P(A|B)$ = $\frac{P(B|A)P(A)}{{P(B|A)P(A)} + {P(B|{\bar A})P({\bar A})}}$

Let's begin:

$P(E|Pos)$ = $\frac {P(Pos|E)P(E)} {{P(Pos|E)P(E)} + {P(Pos|NE)P(NE)}}$

Where:
- $P(Pos|E)$ = .95
- $P(E)$ = rate = .045 (4.5%)
- $P(Pos|NE)$ = .01
- $P(NE)$ = 1 - .045 = .955

so:

$P(E|Pos)$ = $\frac{(.95)\cdot(.045)}{(.95)\cdot(.045) + (.01)\cdot(1 - .045)}$

$P(E|Pos)$ = $\frac {0.04275} {({0.04275}) + ({0.00955})}$ = .817
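The same arithmetic can be wrapped in a small helper. This is a sketch for the two-state case only (E versus NE); the function name `posterior_given_positive` is hypothetical and chosen for illustration.

```python
def posterior_given_positive(sensitivity, false_positive_rate, prior):
    """Return P(E|Pos) via Bayes' Formula for two states (E and NE)."""
    true_pos = sensitivity * prior                 # P(Pos|E) P(E)
    false_pos = false_positive_rate * (1 - prior)  # P(Pos|NE) P(NE)
    return true_pos / (true_pos + false_pos)

p_e_given_pos = posterior_given_positive(sensitivity=0.95,
                                         false_positive_rate=0.01,
                                         prior=0.045)
print(round(p_e_given_pos, 3))  # 0.817
```

Passing the prior as a parameter makes it easy to see how strongly the result depends on $P(E)$: the test characteristics stay the same, but the posterior changes with the prevalence.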

## Conclusion:
Given a positive test result, there is an 81.7% probability that the sample actually contains E. coli.
Conversely, 18.3% of positive results are false positives: the test indicates E. coli although the sample does not contain it.
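The conclusion can also be read in terms of counts. This sketch assumes a hypothetical batch of 10,000 samples with the stated 4.5% prior rate and works with expected counts (no randomness); the batch size is an arbitrary choice and does not affect the resulting proportions.

```python
batch = 10_000               # hypothetical batch size (assumption; any size gives the same ratios)
prior = 0.045                # P(E)
sensitivity = 0.95           # P(Pos|E)
false_positive_rate = 0.01   # P(Pos|NE)

contaminated = batch * prior                    # expected samples with E. coli (about 450)
clean = batch - contaminated                    # expected samples without (about 9,550)
true_positives = contaminated * sensitivity     # about 427.5
false_positives = clean * false_positive_rate   # about 95.5
all_positives = true_positives + false_positives

share_true = true_positives / all_positives     # P(E|Pos)
share_false = false_positives / all_positives   # P(NE|Pos)
print(f"{share_true:.1%} of positives contain E. coli, {share_false:.1%} do not")
```

Even with a 99% specificity, the clean samples vastly outnumber the contaminated ones, so their 1% false positives make up a noticeable share of all positive results.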

## Extra:
This was a simple example with only two possible states of a sample: E and NE.
When there are more than two mutually exclusive states $A_1, A_2, \dots, A_k$, the formula becomes:

$P(A_i|B_j)$ = $\frac{P(B_j|A_i)P(A_i)}{P(B_j|A_1)P(A_1) + P(B_j|A_2)P(A_2) + \dots + P(B_j|A_k)P(A_k)}$

This can be written compactly as:

$P(A_i|B_j)$ = $\frac{P(B_j|A_i)P(A_i)}{\sum_{m=1}^{k} P(B_j|A_m)P(A_m)}$

Here $A_1, \dots, A_k$ are $k$ mutually exclusive and exhaustive underlying events, and $B_j$ is the observed outcome (such as a positive test result).
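The general formula translates directly into code: multiply each prior by its likelihood, then divide by the sum over all $k$ events. This is a minimal sketch; the function name `bayes_posteriors` is hypothetical, and it assumes the events are mutually exclusive and exhaustive (so the priors sum to 1).

```python
def bayes_posteriors(priors, likelihoods):
    """Return [P(A_i | B_j) for each i], given P(A_i) and P(B_j | A_i).

    Assumes the A_i are mutually exclusive and exhaustive.
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    joint = [p * lik for p, lik in zip(priors, likelihoods)]  # P(B_j|A_i) P(A_i)
    evidence = sum(joint)  # denominator: sum over all k events
    return [j / evidence for j in joint]

# Two-state check against the meat example: A_1 = E, A_2 = NE, B_j = Pos
print([round(p, 3) for p in bayes_posteriors([0.045, 0.955], [0.95, 0.01])])  # [0.817, 0.183]
```

With $k = 2$ the function reproduces the earlier result, and the same call works unchanged for any number of states.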