## Understanding conditional probability and Bayes' Theorem

This note walks through how to apply conditional probability and Bayes' theorem to a sample dataset. It is largely based on the tutorial provided here: https://machinelearningmastery.com/bayes-theorem-for-machine-learning/

For this example we'll take some data from Wave 33 of the BEIS public attitude survey (https://www.gov.uk/government/statistics/beis-public-attitudes-tracker-wave-33) and look at whether someone supported or opposed a policy, broken down by gender. For the purposes of this exercise we ignore those who answered "don't know", and we combine "strongly support" with "support" and "strongly oppose" with "oppose".

The data used here is taken from a question on whether respondents support or oppose the use of renewable energy to provide electricity, fuel and heat, together with the gender of the respondent.

The breakdown is as follows.

In [2]:
# note: each count adds the "strongly" and plain responses, hence the additions
male_support = 408 + 352
male_oppose = 4 + 17
female_support = 293 + 458
female_oppose = 5 + 18

n = male_support + male_oppose + female_support + female_oppose
n

1555

### Marginal probability
Let's start with marginal probability. This is the probability of an event irrespective of the outcome of other variables. It is usually written as:

P(A)

So here, if we wanted to calculate the probability of being female (assuming our study is representative) it might be something like:

In [3]:
total_females = female_support + female_oppose
p_female = total_females / n
round(p_female, 3)

0.498

### Joint probability

This is the probability of two or more events occurring simultaneously. The joint probability is often noted as just the outcomes and written as follows:

P(A and B) or P(A, B).

So in this example we might look at the joint probability that someone is both male and supports renewable energy:

P(male AND supports renewables)

In [4]:
round(male_support / n, 3)

0.489
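As a sketch (the dictionary layout and variable names here are illustrative, not part of the original analysis), the same counts can be turned into a full joint probability table. The four joint probabilities sum to 1, and summing over one variable recovers the marginal probability from the previous section.

```python
# Counts copied from the survey breakdown above
counts = {
    ("male", "support"): 408 + 352,
    ("male", "oppose"): 4 + 17,
    ("female", "support"): 293 + 458,
    ("female", "oppose"): 5 + 18,
}
n = sum(counts.values())

# Joint probability of each (gender, opinion) pair
joint = {pair: count / n for pair, count in counts.items()}

# Marginal probability of being female: sum the joint over both opinions
p_female = sum(p for (gender, _), p in joint.items() if gender == "female")

print(round(joint[("male", "support")], 3))  # 0.489
print(round(p_female, 3))                    # 0.498
print(round(sum(joint.values()), 3))         # 1.0
```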

### Conditional probability 

This is the probability of one event given the occurrence of another event. This is often noted as

P(A given B) or P(A | B).

In this example it might be the probability of being male given that you support renewables.

The conditional probability can be calculated as follows:

P(A | B) = P(A, B) / P(B)

P(Male given that support renewables) = P(Male and Supports renewables) / P(supports renewables)



In [5]:
total_support = male_support + female_support
male_given_support = male_support / (total_support)

round(male_given_support, 4)

0.503

Note we can also divide both the numerator and the denominator by n (so each term becomes a probability rather than a count), but it doesn't change the outcome.

In [6]:
male_given_support = (male_support / n) / ((total_support)/n)

round(male_given_support, 4)

0.503

We can also calculate the joint probability using the conditional probability:
    
P(A, B) = P(A | B) * P(B)

P(Male and Supports renewables) = (Probability of male given Supports renewables) * Probability of supporting renewables

In [7]:
joint_prob = male_given_support * (total_support / n)
round(joint_prob, 4)

0.4887

### Calculating conditional probability from the other conditional probability; Bayes' Theorem

Unlike joint probabilities, conditional probabilities are not symmetrical:

P(A | B) != P(B | A)
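A quick check with the survey counts illustrates the asymmetry (a sketch; the variable names are illustrative). The probability of being male given support is close to a half, while the probability of support given being male is close to one.

```python
male_support = 408 + 352
male_oppose = 4 + 17
female_support = 293 + 458

# P(male | supports renewables): restrict to supporters, then count males
p_male_given_support = male_support / (male_support + female_support)

# P(supports renewables | male): restrict to males, then count supporters
p_support_given_male = male_support / (male_support + male_oppose)

print(round(p_male_given_support, 3))  # 0.503
print(round(p_support_given_male, 3))  # 0.973
```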

But we can use one conditional probability to derive the other:

P(A | B) = P(B | A) * P(A) / P(B)

And

P(B | A) = P(A | B) * P(B) / P(A)

This is known as Bayes' Theorem and is useful for calculating the conditional probability when the joint probability is unknown.

A quick note here: although it is written P(A | B), this is equivalent to the probability of A and B given B. Sometimes it's useful to write out the whole thing, as it helps in understanding that you are referencing the same data. Therefore in some of the examples below you might see I have written the formula out in full.

In our example here we calculate the conditional probability of being male given support for renewable energy, using the conditional probability of supporting renewables given being male.

**P(male and support renewable given supports renewables) = P(supports renewables and is male | is male) * P(male) / P(Supports renewables)**

In [8]:
total_males = male_support + male_oppose
male_given_support = ((male_support / total_males) * (total_males / n)) / (total_support / n)
round(male_given_support, 3)

0.503

To make this calculation concrete, as we noted above, we can use the conditional probability to work out the joint probability. All the below are equivalent:

P(A, B) = P(A | B) * P(B) = P(B | A) * P(A)

Which can therefore simplify to

P(A | B) * P(B) = P(B | A) * P(A)

This equality means we can derive Bayes' formula by dividing both sides by P(B):

P(A|B) = P(B|A)P(A)/P(B)
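The equality can be checked numerically with the survey counts. This is only a sketch with illustrative variable names, taking A as "male" and B as "supports renewables".

```python
import math

male_support = 408 + 352
male_oppose = 4 + 17
female_support = 293 + 458
female_oppose = 5 + 18
n = male_support + male_oppose + female_support + female_oppose

p_male = (male_support + male_oppose) / n                              # P(A)
p_support = (male_support + female_support) / n                        # P(B)
p_male_given_support = male_support / (male_support + female_support)  # P(A | B)
p_support_given_male = male_support / (male_support + male_oppose)     # P(B | A)

# Both products should equal the joint probability P(A, B)
joint_via_b = p_male_given_support * p_support
joint_via_a = p_support_given_male * p_male

print(math.isclose(joint_via_b, joint_via_a))       # True
print(math.isclose(joint_via_b, male_support / n))  # True
```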


#### Bayes' Theorem

We often do not have direct access to the denominator, P(B), but we can calculate it from the conditional probabilities we do have:

P(B) = P(B | A) * P(A) + P(B | not A) * P(not A)

Therefore the whole formula becomes:

P(A | B) = P(B | A) * P(A) / (P(B | A) * P(A) + P(B | not A) * P(not A))

This can also be thought of in terms of the following:

Posterior probability = Likelihood * Prior probability / Evidence

Where:

P(A | B) = Posterior = Your updated probability or belief following the evidence

P(B | A) = Likelihood = The likelihood of observing the given phenomenon

P(A) = Prior = Prior probability or belief before new evidence is introduced.

P(B) = Evidence = The probability of the evidence across all possible hypotheses (normalising factor)
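The four terms map onto a small helper function. This is a minimal sketch: `bayes_posterior` and its parameter names are hypothetical, and the example values are the survey counts with A as "male" and B as "supports renewables".

```python
def bayes_posterior(prior, likelihood, likelihood_if_not):
    """Return P(A | B) from P(A), P(B | A) and P(B | not A)."""
    # Evidence P(B), expanded over A and not A
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence


prior = (408 + 352 + 4 + 17) / 1555                      # P(male)
likelihood = (408 + 352) / (408 + 352 + 4 + 17)          # P(support | male)
likelihood_if_not = (293 + 458) / (293 + 458 + 5 + 18)   # P(support | not male)

print(round(bayes_posterior(prior, likelihood, likelihood_if_not), 3))  # 0.503
```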

For our purposes this can be written out as:

``P(male and support renewable given supports renewables) = 
P(support renewables and is male given is male) * P(male) / 
(P(support renewables and is male given is male) * P(male) + P(supports renewables and is NOT male given is NOT male) * P(Not male))``

To derive those complement values we simply do the following:

P(not A) = 1 - P(A) = 1 - P(male)

And

P(B | not A) = 1 - P(not B | not A), that is, P(supports renewables and is NOT male given is NOT male) = 1 - P(does NOT support renewables given NOT male)

We can plug this in below as follows

In [18]:
p_support_given_male = male_support / total_males
p_support_given_not_male = (total_support - male_support) / (n - total_males)
p_male = total_males / n
evidence = p_support_given_male * p_male + p_support_given_not_male * (1 - p_male)
male_given_support = (p_support_given_male * p_male) / evidence
round(male_given_support, 3)  # which results in

0.503

The result is the same as when we had all of the data and information above.
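The reason the two routes agree is that the expanded denominator is just P(supports renewables) rebuilt from its two conditional pieces. A short sketch (variable names are illustrative) confirms this with the survey counts:

```python
import math

male_support = 408 + 352
male_oppose = 4 + 17
female_support = 293 + 458
female_oppose = 5 + 18
total_males = male_support + male_oppose
total_females = female_support + female_oppose
n = total_males + total_females

# P(support) read directly from the counts
p_support_direct = (male_support + female_support) / n

# P(support) = P(support | male) * P(male) + P(support | not male) * P(not male)
p_support_expanded = (
    (male_support / total_males) * (total_males / n)
    + (female_support / total_females) * (total_females / n)
)

print(math.isclose(p_support_direct, p_support_expanded))  # True
print(round(p_support_direct, 3))                          # 0.972
```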

You'll note we get the same probability if we subtract from 1 the proportion of females who oppose (given that they are female); this is the same as the proportion who support the measure given that they are not male.
The equivalent of:
1 - P(not B | not A) = 1 - P(Not support and not male given not male)

In [107]:
# 1 - P(not support | not male) versus P(support | not male)
print(1 - female_oppose / total_females,
      (total_support - male_support) / (n - total_males))

0.9702842377260982 0.9702842377260982


### Bayes' theorem in practice

#### The coin toss question

This has fast become a classic tech job interview question and we can use Bayes' theorem to derive the answer.

***You have a bag of 100 coins and one of them is biased towards heads. You randomly take out one of those coins and toss it 3 times, and each time it lands on heads.
What's the probability you have chosen the biased coin?***

We only have a very limited amount of information here but we can use Bayes' theorem as follows.

First let's start with writing out the question as a formula.

  ```P(Coin is biased and 3 heads | 3 heads) = P(3 heads and the coin is biased | coin is biased) * P(biased coin) / P(3 heads)```
  
  ```P(A | B) = P(3 heads and the coin is biased | coin is biased) * P(biased coin) / (P(3 heads and the coin is biased | coin is biased) * P(biased coin) + P(3 heads | coin is NOT biased) * P(coin NOT biased))```

Let's plug the values into the formula, assuming the biased coin always lands on heads:

P(B | A) = P(3 heads given coin is biased) = 1

P(A) = P(Choosing a biased coin) = 1 / 100

P(B | not A) = P(3 heads given coin is NOT biased) = 0.5 ^ 3 = 1/8

P(not A) = P(Choosing a fair coin) = 99 / 100

Therefore:

  ```P(Coin is biased and 3 heads | 3 heads) = (1 * 1/100) / ((1 * 1/100) + (1/8 * 99/100))```



In [124]:
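p_heads_given_biased = 1  # assumes the biased coin always lands on heads
p_biased = 1 / 100  # one coin in the bag of 100 is biased
p_heads_given_fair = 0.5 ** 3  # three heads in a row from a fair coin

p_coin_bias = (p_heads_given_biased * p_biased) / (
    (p_heads_given_biased * p_biased) + (p_heads_given_fair * (1 - p_biased))
)

# therefore the probability is
round(p_coin_bias, 4)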

0.0748

There is therefore roughly a **7.5%** chance you picked the biased coin.
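To see how the answer moves with more evidence, the calculation can be wrapped in a function of the number of consecutive heads. This is a sketch only: `p_biased_given_heads` is a hypothetical helper, and it keeps the assumption that the biased coin always lands on heads.

```python
def p_biased_given_heads(n_heads, n_coins=100):
    """P(biased | n_heads heads in a row), assuming the biased coin always lands heads."""
    prior = 1 / n_coins                 # P(biased)
    likelihood_fair = 0.5 ** n_heads    # P(n_heads heads | fair coin)
    return prior / (prior + likelihood_fair * (1 - prior))


for k in (3, 6, 7, 10):
    print(k, round(p_biased_given_heads(k), 4))
```

Under these assumptions the posterior only passes 50% once seven heads in a row have been observed.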

#### The Covid test question

Now we are more comfortable with Bayes' theorem we can start applying it to see how likely it is that someone has covid following a positive lateral flow test result.

A Cochrane study found that on average the lateral flow test sensitivity (its ability to detect true positives) was 78% (https://www.bmj.com/content/372/bmj.n823).

The ONS have previously estimated that around 2% of the population (1 in 50) had covid.

So our question here is, given a positive result from a lateral flow test what is the likelihood that an individual has covid?

Our formula will therefore look something like this:

P(has covid | positive test) = P(positive test | has covid) * P(has covid) / (P(positive test | has covid) * P(has covid) + P(positive test | NOT covid) * P(NOT covid))

To solve this we need one final piece of information: how good the test is at identifying true negatives. This is referred to as the specificity and in this case the same study suggests it was on average 97%.

Recall from above that where we don't know P(B | not A) directly, we can derive it from its complement:

P(B | not A) = 1 - P(NOT B | NOT A)

P(Positive test | NOT Covid) = 1 - P(NOT Positive | NOT Covid) = 1 - (True Negative rate) = 1 - 0.97 = 0.03



In [29]:
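test_positive = 0.78  # sensitivity: P(positive test | has covid)
p_covid = 0.02  # prior: P(has covid)
p_pos_not_covid = 1 - 0.97  # false positive rate: 1 - specificity
p_not_covid = 1 - p_covid

# posterior P(has covid | positive test); note this overwrites the prior
p_covid = (test_positive * p_covid) / ((test_positive * p_covid) + (p_pos_not_covid * p_not_covid))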

round(p_covid, 3)

0.347

Therefore the probability that the individual has covid, given that the test has returned a positive result, is actually only about 35%.
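The same result can be reached by counting people rather than multiplying probabilities. The sketch below assumes a hypothetical population of 10,000 (the population size is an illustration, not a figure from the study) and applies the prevalence, sensitivity and specificity quoted above.

```python
population = 10_000   # hypothetical population size, for illustration only
prevalence = 0.02     # P(has covid)
sensitivity = 0.78    # P(positive test | has covid)
specificity = 0.97    # P(negative test | NOT covid)

with_covid = population * prevalence        # 200 people
without_covid = population - with_covid     # 9,800 people

true_positives = with_covid * sensitivity              # about 156 people
false_positives = without_covid * (1 - specificity)    # about 294 people

# Of everyone who tests positive, what share actually has covid?
p_covid_given_positive = true_positives / (true_positives + false_positives)
print(round(p_covid_given_positive, 3))  # 0.347
```

False positives outnumber true positives here because the 3% false positive rate applies to a much larger group than the 78% sensitivity does.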

#### Updating our priors

A key part of applying Bayes' theorem is updating our priors as we get new evidence. Now that we have done the covid test, our prior for having covid has increased to **35%** or more accurately 0.347.

Therefore if we were to take a second covid test and get a positive result (assuming the two test results are independent given whether or not we have covid):

In [30]:
# update the complement of the prior to match the new prior
p_not_covid = 1 - p_covid

p_covid = (test_positive * p_covid) / ((test_positive * p_covid) + (p_pos_not_covid * p_not_covid))

round(p_covid, 3)


0.932

So following a second test, our new posterior probability that we have covid has increased to **93%**. If we did a third test and also got a positive result, the probability would be even higher.
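Repeated updating can be sketched as a loop in which each posterior becomes the next prior. The `update` function is a hypothetical helper; it recomputes the complement `1 - prior` at every step and assumes each test result is independent of the others given whether or not the person has covid, which may not hold for real repeat tests.

```python
def update(prior, sensitivity=0.78, specificity=0.97):
    """Return P(has covid | positive test) for a given prior P(has covid)."""
    false_positive_rate = 1 - specificity
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


p_covid = 0.02  # starting prior from the ONS estimate
for test_number in range(1, 4):
    p_covid = update(p_covid)  # the posterior becomes the prior for the next test
    print(test_number, round(p_covid, 3))
```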