# MCBDD 2022 Module I Offline Activities
Jitao David Zhang

Offline activities and the Google Form to submit: https://forms.gle/Upe54w9upH8JFPsg6

In [None]:
## prepare for visualizations
%matplotlib inline

import matplotlib.pyplot as plt
import numpy

plt.style.available
plt.style.use('ggplot')

## Task 1

### Question

The company *Fränzi and Friends* developed a new at-home quick test for SARS-CoV-2, which is pending review by the regulatory agency. The test has been shown to have a sensitivity of 99% and a specificity of 99%. Suppose that Fred uses the test by Fränzi and Friends and the result is positive. Assume that 5% of the population is in fact infected. What is your guess about the probability that Fred is indeed infected?

### Analysis

We let $I$ denote an event of *infection* and $H$ an event of non-infection (healthy). Then $p(I)$ indicates the probability that someone from the population that we are studying is infected, and $p(H)$ the probability of someone being healthy. Let us assume that someone is either infected or healthy, which means that $p(I)+p(H)=1$.

Similarly, $P$ indicates a positive test and $N$ indicates a negative test. Let us assume that a test is either positive or negative, so that $p(P)+p(N)=1$.

Furthermore, we can define the following terms:

* $p(P|I)$ indicates the *conditional probability* of someone receiving a positive test *given that* she is infected, which is the sensitivity of the test - 99% in our example above.
* $p(N|H)$ indicates the *conditional probability* of someone receiving a negative test *given that* she is healthy, which is the specificity of the test - 99% in our example above.

We let $p(P,I)$ indicate the probability that someone is both infected and tested positive. What is this probability? It is **not** $p(P) \times p(I)$, because an infection and a positive test are definitely not independent of each other. Instead, we can think of the event that someone is both infected and tested positive as a two-step event: first, someone must be infected ($p(I)$); second, given that she is infected, she is tested positive ($p(P|I)$). Since both steps must happen, we have $p(P,I)=p(I)p(P|I)$, *i.e.* the probability of being infected ($p(I)$) multiplied by the probability of being tested positive given that the person is infected ($p(P|I)$). This is known as the *chain rule* of probability.
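
As a quick numeric illustration of the chain rule, the sketch below multiplies the two values stated in the question. The variable names and the population size of 10,000 are illustrative assumptions, not part of the original exercise.

```python
# Values stated in the question.
p_infected = 0.05            # p(I), share of the population that is infected
p_pos_given_infected = 0.99  # p(P|I), sensitivity of the test

# Chain rule: p(P, I) = p(I) * p(P|I)
p_pos_and_infected = p_infected * p_pos_given_infected

# The same product read as a head count in a hypothetical population.
population = 10_000  # illustrative size, not from the original
infected = population * p_infected
infected_and_positive = infected * p_pos_given_infected

print(f"p(P, I) = {p_pos_and_infected:.4f}")
print(f"{infected_and_positive:.0f} of {population} people are infected and test positive")
```

Reading the product as a head count makes the two steps visible: first 500 of the 10,000 people are infected, then 99% of those 500 test positive.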

We can also switch the two steps to arrive at $p(P,I)$: first, someone must be tested positive ($p(P)$); second, given that she is tested positive, she is infected ($p(I|P)$). Or equivalently, $p(P,I)=p(P)p(I|P)$. If you find this less intuitive, think of a quick test which is followed by a more accurate but laborious PCR test.

We reach an interesting equation:

$$ p(P)p(I|P)=p(I)p(P|I) $$

Or equivalently,

$$ p(I|P) = \frac{p(I)p(P|I)}{p(P)} $$

This equation is known as the *Bayes theorem*. 
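
The identity can be checked on any 2x2 table of counts. The sketch below uses a small, purely hypothetical table of 200 people (these counts are not from the exercise) and exact fractions, so that both sides of the equation can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical counts for 200 people (illustrative, not from the original).
counts = {
    ("I", "P"): 30,   # infected and tested positive
    ("I", "N"): 10,   # infected and tested negative
    ("H", "P"): 20,   # healthy and tested positive
    ("H", "N"): 140,  # healthy and tested negative
}
total = sum(counts.values())
n_infected = counts[("I", "P")] + counts[("I", "N")]
n_positive = counts[("I", "P")] + counts[("H", "P")]

p_I = Fraction(n_infected, total)                    # p(I)
p_P = Fraction(n_positive, total)                    # p(P)
p_P_given_I = Fraction(counts[("I", "P")], n_infected)  # p(P|I)
p_I_given_P = Fraction(counts[("I", "P")], n_positive)  # p(I|P), read off the table

# Right-hand side of the theorem: p(I) p(P|I) / p(P)
bayes_rhs = p_I * p_P_given_I / p_P

print(p_I_given_P, bayes_rhs)
```

Both printed values agree, because each side is just the count of infected and positive people divided by the count of positive people.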

Given that any person who is tested positive is either infected or healthy, we can write $p(P)=p(P,I)+p(P,H)=p(P|I)p(I)+p(P|H)p(H)$, or equivalently $p(P)=p(I|P)p(P)+p(H|P)p(P)$, thanks to the chain rule. We therefore also call $p(P)$ a *marginal probability*: it sums the joint probabilities over all possible infection states, each written as a *conditional probability* multiplied by the probability of its condition.
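
The sum over states can be sketched as a small helper. The function name `marginal` and the dictionary layout are illustrative choices. The value used for $p(P|H)$ is an assumption: it is taken as one minus the stated specificity, which holds if every test is either positive or negative.

```python
def marginal(priors, likelihoods):
    """Law of total probability: sum prior * likelihood over all states."""
    return sum(priors[state] * likelihoods[state] for state in priors)


# p(I) is stated in the question; p(H) follows from p(I) + p(H) = 1.
priors = {"I": 0.05, "H": 0.95}

# p(P|I) is the stated sensitivity.
# p(P|H) is ASSUMED to be 1 - specificity (a test is either positive or negative).
p_pos_given = {"I": 0.99, "H": 1 - 0.99}

p_pos = marginal(priors, p_pos_given)
print(f"p(P) = {p_pos:.3f}")
```

Writing the helper over a dictionary of states keeps the same code usable if the population were split into more than two groups.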

Having gained the ability of writing down the marginal probability, we can rewrite the Bayes theorem as

$$ p(I|P) = \frac{p(I)p(P|I)}{p(I)p(P|I)+p(H)p(P|H)} $$
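
The expanded form translates directly into a small function. This is a minimal sketch: the function name, argument names and the example inputs are hypothetical and chosen only to exercise the formula.

```python
def posterior_infected(p_i, p_pos_given_i, p_pos_given_h):
    """Return p(I|P) using the expanded form of Bayes theorem.

    p_i:            p(I), prior probability of infection
    p_pos_given_i:  p(P|I), probability of a positive test if infected
    p_pos_given_h:  p(P|H), probability of a positive test if healthy
    """
    p_h = 1 - p_i  # assumes everyone is either infected or healthy
    numerator = p_i * p_pos_given_i
    p_pos = numerator + p_h * p_pos_given_h  # marginal probability p(P)
    if p_pos == 0:
        raise ValueError("p(P) is zero, so p(I|P) is undefined")
    return numerator / p_pos


# Hypothetical inputs, not the values from the question.
example = posterior_infected(p_i=0.2, p_pos_given_i=0.75, p_pos_given_h=0.125)
print(f"p(I|P) = {example:.2f}")
```

The function only needs the three quantities on the right side of the equation, which is why the remaining work is to find a value for each of them.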

Now we are ready to tackle the original question: following the notation above, we can translate the question into $p(I|P)$, the probability of Fred being infected given that he is tested positive. Then we just need to get values for each symbol on the right side of the equation:

1. The question states that 5% of the population is in fact infected, namely $p(I)=0.05$. Since we assume that anyone is either healthy or infected, we have $p(H)=0.95$.
2. $p(P|I)=0.99$
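3. We are only left with $p(P|H)$, *i.e.* the probability that a healthy person is tested positive, which we do not know yet.
    1. Using Bayes theorem again, we can re-write it as $p(P|H)=\frac{p(H|P)p(P)}{p(H)}$. However, this introduces $p(H|P)$ and $p(P)$, neither of which we know, so it does not help directly.
    2. Instead, we can use the specificity $p(N|H)=0.99$: since a healthy person is tested either positive or negative, $p(P|H)=1-p(N|H)=0.01$.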
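
The values collected in the list can be plugged into the expanded Bayes theorem. The sketch below assumes that a test is either positive or negative, so that $p(P|H)$ is one minus the specificity. The variable names are illustrative.

```python
# Values stated in the question.
p_i = 0.05            # p(I), prevalence
p_pos_given_i = 0.99  # p(P|I), sensitivity
specificity = 0.99    # p(N|H)

# Derived values.
p_h = 1 - p_i                    # p(H), everyone is infected or healthy
p_pos_given_h = 1 - specificity  # p(P|H), ASSUMES a test is positive or negative

# Marginal probability of a positive test, then Bayes theorem.
p_pos = p_i * p_pos_given_i + p_h * p_pos_given_h
p_i_given_pos = p_i * p_pos_given_i / p_pos

print(f"p(P)   = {p_pos:.3f}")
print(f"p(I|P) = {p_i_given_pos:.3f}")
```

Under these assumptions $p(I|P)$ comes out at roughly 0.84, noticeably below the 99% sensitivity, because healthy people far outnumber infected ones and therefore contribute a non-negligible share of the positive tests.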

$p(I|P) = \frac{p(P|I)p(I)}{p(P)}$

It follows that $p(H)=1-0.05=0.95$. Given the sensitivity $p(P|I)=0.99$, we can derive the *false positive rate* $p(P|H)