# Probability theory

<b>Probability is how strongly we believe an event will happen (usually expressed as a percentage)</b>

For example:
* What is the probability that patient X, arriving at A&E, will require a bed?
* Will patient Y cancel their appointment?
* What is the likelihood patient Z has O negative blood type?

<div class="alert alert-block alert-warning">
<b>Probability theory is a difficult and unintuitive area of statistics!</b>
</div>

## What is the probability of someone having a certain blood type?

In [1]:
# Reference: https://www.blood.co.uk/why-give-blood/blood-types/
blood_type_data = {
  "Blood Type": ['O positive', 'O negative', 'A positive', 'A negative', 'B positive',
                 'B negative', 'AB positive', 'AB negative'],
  "Prevalence": [0.35, 0.13, 0.3, 0.08, 0.08, 0.02, 0.02, 0.01]
}
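As a quick sanity check on the table, the sketch below turns the two lists into a lookup and adds up the proportions. The names `prevalence` and `total` are illustrative and not part of the original notebook. The figures sum to 0.99 rather than exactly 1, presumably because the published values are rounded.

```python
# Illustrative lookup built from the blood type table above
blood_types = ['O positive', 'O negative', 'A positive', 'A negative',
               'B positive', 'B negative', 'AB positive', 'AB negative']
proportions = [0.35, 0.13, 0.3, 0.08, 0.08, 0.02, 0.02, 0.01]

# Map each blood type to its proportion of the population
prevalence = dict(zip(blood_types, proportions))

print(prevalence['O positive'])  # 0.35

# The proportions should add up to (roughly) 1, since everyone has exactly one type
total = sum(prevalence.values())
print(round(total, 2))  # 0.99
```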

Using the definition above, and assuming the population of Somerset has the same proportions, let's consider the following event:
* Randomly selecting someone in Somerset and finding they are O positive

Mathematically, this is written as:
<br>
$P(X) = 0.35$, where X is the event of interest (person selected being O positive).

There are a number of rules of probability. We'll cover some of these rules and see how they can be applied in the context of selecting patients from Somerset and observing their blood type.

<div class="alert alert-block alert-info">
* <b>Rule 1</b> * The probability of an event must be between 0 and 1 (inclusive).

- 0 = Impossible event
- 1 = Certain event
</div>

$P(X) = 0.35$ is in agreement with Rule 1.

Note that $P(X) = -0.35$ or $P(X) = 1.35$ are not valid probabilities!

What is the probability that someone selected at random is ***not*** O positive?

<div class="alert alert-block alert-info">
* <b>Rule 2 (Complement rule)</b> * The probability of an event NOT happening is 1 minus the probability of the event happening.
</div>

By applying Rule 2, we see that $P(\text{not } X) = 1 - 0.35 = 0.65$ (65%).
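Rules 1 and 2 can be sketched in a few lines of Python. The helper name `complement` is hypothetical (it is not part of the notebook); it rejects values outside the range allowed by Rule 1 and otherwise returns 1 minus the input.

```python
# Probability of event X (selected person is O positive), taken from the table above
p_o_positive = 0.35

def complement(p):
    """Return the probability of an event NOT happening (Rule 2)."""
    if not 0 <= p <= 1:
        # Rule 1: values such as -0.35 or 1.35 are not valid probabilities
        raise ValueError("a probability must lie between 0 and 1")
    return 1 - p

p_not_o_positive = complement(p_o_positive)
print(round(p_not_o_positive, 2))  # 0.65
```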

Next, we randomly select two individuals from Somerset.

<b>What is the probability they are ***both*** O positive?</b>

<div class="alert alert-block alert-info">
* <b>Rule 3 (Product rule)</b> * Multiply probabilities to get the overall probability of a sequence of *INDEPENDENT EVENTS* (i.e., the outcome of one event does not affect the other).
</div>

Note that this is equivalent to the intersection in set theory (or the AND operator).

Let's consider the event Y "Randomly picking two individuals from Somerset and them both being O positive"

By applying Rule 3, $P(Y) = 0.35 \times 0.35 = 0.1225 \approx 0.12$
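The same multiplication can be written in Python. This is a minimal sketch that assumes each selection is independent with a 35% chance of being O positive; the function name `prob_all` is illustrative only.

```python
import math

p_o_positive = 0.35

def prob_all(probabilities):
    """Rule 3: multiply the probabilities of independent events."""
    return math.prod(probabilities)

# Two independent selections, both O positive
p_both = prob_all([p_o_positive, p_o_positive])
print(round(p_both, 4))  # 0.1225
```

The same helper extends to longer sequences, for example `prob_all([0.35] * 3)` for three people who are all O positive.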

<b>What is the probability that at least 1 of them is O positive?</b>

<div class="alert alert-block alert-info">
* <b>Rule 4 (Union probabilities)</b> * Sum probabilities to get the overall probability of any one of a set of *MUTUALLY EXCLUSIVE EVENTS* occurring (i.e., the events can't occur at the same time).
</div>

Note that this is equivalent to the union in set theory (or the OR operator).

Let's consider the event Z "Randomly picking two individuals from Somerset and finding that at least one of them is O positive"

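By Rule 4 (or by considering the probability tree), we sum the three mutually exclusive outcomes in which at least one person is O positive: $P(Z) = 0.12 + 0.23 + 0.23 = 0.58$. <br>
(Note: we could equally have used Rule 2, since the only remaining outcome is that neither person is O positive, i.e., $P(Z) = 1 - 0.65 \times 0.65 = 1 - 0.42 = 0.58$)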
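The branches of the two-person probability tree can also be enumerated in code. The sketch below is illustrative (the variable names are not from the notebook): it builds all four branches with Rule 3, then applies Rule 4 by summing the branches that contain at least one O positive person, and compares the result with the Rule 2 shortcut.

```python
from itertools import product

# Probability of each outcome for a single randomly selected person
p = {'O positive': 0.35, 'NOT O positive': 0.65}

# All four branches of the two-person tree, each with its probability (Rule 3)
tree = {
    (first, second): p[first] * p[second]
    for first, second in product(p, repeat=2)
}

# Rule 4: the branches are mutually exclusive, so we can sum the ones we want
p_at_least_one = sum(prob for pair, prob in tree.items() if 'O positive' in pair)

# Rule 2: the complement of "neither person is O positive"
p_via_complement = 1 - tree[('NOT O positive', 'NOT O positive')]

print(round(p_at_least_one, 4))    # 0.5775
print(round(p_via_complement, 4))  # 0.5775
```

The unrounded value is 0.5775, which matches the 0.58 obtained above from the rounded branch probabilities.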

Let's see if we can verify Rules 3 & 4 by using simulation in Python!
<br>
To do this, we will make use of the ```random``` module from the standard library (read up on the documentation here: https://docs.python.org/3/library/random.html)

In [2]:
import random
# We will write a function for randomly selecting patients (each has a 35% chance of being O positive).

def select_patients(n):
    outcome = []  # Initialise empty list to store each patient's blood type
    for _ in range(n):
        blood_type = random.randint(1, 100)  # Randomly pick a whole number between 1 and 100 (inclusive)
        if blood_type <= 35:  # Therefore 35% of the time, the patient will be O positive
            outcome.append('O positive')
        else:  # The other 65% of the time, the patient will have a different blood type
            outcome.append('NOT O positive')
    return outcome

In [3]:
# Let's run this function and randomly select 2 patients:
select_patients(2)

['NOT O positive', 'O positive']
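As an aside, the same idea extends to all eight blood types. The sketch below is an alternative, not the notebook's method: it uses `random.choices` with the proportions from the table as weights. The function name `select_patients_all_types` and the optional `seed` argument are hypothetical additions for illustration.

```python
import random

# Blood types and their proportions, copied from the table earlier in the notebook
blood_types = ['O positive', 'O negative', 'A positive', 'A negative',
               'B positive', 'B negative', 'AB positive', 'AB negative']
weights = [0.35, 0.13, 0.3, 0.08, 0.08, 0.02, 0.02, 0.01]

def select_patients_all_types(n, seed=None):
    """Randomly draw n blood types, weighted by how common each type is."""
    rng = random.Random(seed)  # A fixed seed makes the draw repeatable
    return rng.choices(blood_types, weights=weights, k=n)

print(select_patients_all_types(5, seed=42))
```

`random.choices` samples with replacement and does not require the weights to sum to exactly 1, so the rounded proportions can be used directly.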

### Task (20 minutes)

<b>1. Verifying Rule 3</b>

Using the ```select_patients``` function, can you write some code that approximates the proportion of times that, when two people are selected, BOTH are O positive? Is this in agreement with Rule 3 and the above probability tree diagram?

HINT: Try writing a for loop and keep a count of the number of times we get ```['O positive', 'O positive']```.

<b>2. Verifying Rule 4 (if you have time)</b>

Modify your code from the previous task to estimate, through simulation, the probability that at least 1 of the two people selected is O positive. In this case, we are interested in the outcomes ```['O positive', 'O positive'], ['NOT O positive', 'O positive'], ['O positive', 'NOT O positive']```

HINT: Try writing a for loop and keep a count of the number of times we get ```['NOT O positive', 'NOT O positive']```, then apply Rule 2.

In [4]:
#Put your code here

In [5]:
## ANSWER
runs = 5000
counter_blood_types = 0
for x in range(runs):
    outcome = select_patients(2)
    if outcome == ['O positive', 'O positive']:
        counter_blood_types += 1
print(counter_blood_types/runs)

0.1216


In [6]:
## ANSWER
runs = 10000
counter_blood_types = 0
for x in range(runs):
    outcome = select_patients(2)
    if outcome == ['NOT O positive', 'NOT O positive']:
        counter_blood_types += 1
print(1 - counter_blood_types/runs)

0.5763


### Conditional probability and Bayes' Theorem

Conditional probability is defined as the likelihood of an event or outcome occurring, given that another event or outcome has occurred. For example, the probability that a patient has a particular disease given the presence of a specific set of symptoms, or given the result of a diagnostic test.

<div class="alert alert-block alert-info">
Bayes' Theorem is given by the following mathematical formula:

### $P(A|B)=\frac{P(B|A)P(A)}{P(B)}$
where:<br>
* $P(A|B)$ = Probability of event A happening *given* B has happened.<br>
* $P(B|A)$ = Probability of event B happening *given* A has happened.<br>
* $P(A)$ = Probability of event A happening.<br>
* $P(B)$ = Probability of event B happening.<br>
</div>

Let's use a very simple example of a fire alarm going off while working.

A fire alarm is defined as "a device making a loud noise that gives warning of a fire.", but how often have you heard an alarm and not left the building? This is due to prior experience! Let's use Bayes' Theorem to demonstrate this.

* prob_fire = 0.0001 # Very unlikely for there to be an actual fire (this is a made-up figure)
* prob_alarm = 0.01 # Probability of the alarm going off, for any reason
* prob_alarm_given_fire = 0.95 # The alarm should be very good at detecting fires
* prob_fire_given_alarm = ?

In [7]:
# We can plug these numbers into Bayes' Theorem:

prob_fire_given_alarm = (0.95*0.0001)/0.01
print(prob_fire_given_alarm)

0.0095


So even though the alarm is very good at detecting fires, the probability that there is actually a fire when the alarm sounds is low (under 1%), since fires occur so rarely!
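One way to build intuition for this result is to count cases rather than multiply probabilities. The sketch below is illustrative only: the `bayes` helper and the hypothetical figure of 1,000,000 working days are assumptions added here, while the three probabilities are the made-up figures from the example above.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' Theorem: return P(A|B) from P(B|A), P(A) and P(B)."""
    return p_b_given_a * p_a / p_b

prob_fire = 0.0001
prob_alarm = 0.01
prob_alarm_given_fire = 0.95

prob_fire_given_alarm = bayes(prob_alarm_given_fire, prob_fire, prob_alarm)

# The same calculation as counts over a hypothetical 1,000,000 working days
days = 1_000_000
fires = round(days * prob_fire)                       # 100 days with a fire
alarms = round(days * prob_alarm)                     # 10,000 days with an alarm
alarms_with_fire = round(fires * prob_alarm_given_fire)  # 95 fires that set off the alarm

# Of the 10,000 alarms, only 95 coincide with a real fire
print(alarms_with_fire / alarms)  # 0.0095
```

Seen this way, the vast majority of alarms (9,905 out of 10,000) go off without a fire, which is why the alarm alone is weak evidence.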