# Question 3: Bayes Law (28 points)

At the end of the last problem, we plugged numbers into the Bayes law formula to verify that it worked. In this problem, we will try to gain a bit more intuition, and use Bayes law to understand the sensitivity/specificity tradeoff.

Here again is the derivation of Bayes law.

First, this is the definition of conditional probability: $P(A \mid B) = P(A \mathbin{and} B) \mathbin{/} P(B)$.

Equivalently: $P(A \mathbin{and} B) = P(A \mid B) \cdot P(B)$.

Because $P(A \mathbin{and} B) = P(B \mathbin{and} A)$ (the order of the two events does not matter), it must be the case that $P(A \mid B) \cdot P(B) = P(B \mid A) \cdot P(A)$.

Dividing by $P(B)$ then gives Bayes law:
$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$
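
As a quick numeric sanity check of the derivation, the sketch below uses a small made-up set of joint counts (the numbers and variable names are hypothetical, chosen only for illustration) and confirms that both sides of Bayes law agree. Exact fractions are used so that the comparison is not affected by floating-point rounding.

```python
from fractions import Fraction

# Hypothetical counts for two events A and B among 100 trials (illustrative only).
n_total = 100
n_a = 30          # trials where A happened
n_b = 40          # trials where B happened
n_a_and_b = 12    # trials where both A and B happened

p_a = Fraction(n_a, n_total)
p_b = Fraction(n_b, n_total)
p_a_and_b = Fraction(n_a_and_b, n_total)

# Definition of conditional probability, applied in both directions.
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# Right-hand side of Bayes law: P(B | A) * P(A) / P(B)
bayes_rhs = p_b_given_a * p_a / p_b

print(p_a_given_b, bayes_rhs)  # both print as 3/10
```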

In class, we talked about a test that was positive for 99% of samples that actually had the activity we were testing for (i.e. "positive controls"). The test was negative for 95% of the "inactive" samples (i.e. those without the activity, or "negative controls"). The first number is called "sensitivity": the fraction of active samples that are detected by the test. The second number is "specificity": the fraction of inactive samples that are correctly *not* detected by the test.

We will use the following language throughout: a test can be "positive" or "negative", and a sample can be "active" or "inactive". So a "false positive" is when the test is positive, but the sample is inactive. (To save space, in notation we will use $+$ and $-$ for positive and negative test results, respectively.)

In the space below, write a definition (just in words, no need for formulas) of sensitivity and specificity using the language of conditional probability. I.e. "sensitivity is the probability that ..., given that ...". 

**Note**: in a markdown cell (the text entry cells, like below), to start a new paragraph you need to put a blank line. So separate your two answers by an empty line.

YOUR ANSWER HERE

Now, in words, write out what this probability statement is: $P(\mathrm{active}\mid+)$.

YOUR ANSWER HERE

$P(\mathrm{active}\mid+)$ is also known as the "positive predictive value". This is the value we really want to know!

Test performance can often be written in a $2\times 2$ table. Here is how our test above would look with 100 positive control ("active") and 100 negative control ("inactive") compounds:

|             | active | inactive |
|-------------|--------|----------|
|$\mathbf{+}$ |   99   |     5    |
|$\mathbf{-}$ |   1    |     95   |
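
The sensitivity and specificity quoted earlier can be read straight off the columns of this table. The short sketch below does that arithmetic; the variable names are only illustrative and are not the ones the graded cells expect.

```python
# Counts copied from the 2x2 table above.
true_pos, false_pos = 99, 5    # row "+": active column, inactive column
false_neg, true_neg = 1, 95    # row "-": active column, inactive column

# Each rate is computed within a single column (one kind of sample).
sens_from_table = true_pos / (true_pos + false_neg)   # fraction of active samples testing positive
spec_from_table = true_neg / (true_neg + false_pos)   # fraction of inactive samples testing negative

print(sens_from_table, spec_from_table)
```

Note that both rates condition on what the sample really is (a column of the table), not on what the test reported (a row).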

For the table above, what is the positive predictive value? Define the variable `ppv` as a fraction of two numbers:

In [None]:
# YOUR ANSWER HERE

In [None]:
assert 0.951 < ppv < 0.952

If we screen 2000 total samples, consisting of 100 active samples (i.e. true apoptosis-inducing drugs) and 1900 inactive samples, what is the total number of true positive tests and false positive tests we will get? Adding these together gives the total number of positives. From these, you can calculate $P(+)$ and $P(+\mathbin{and} \mathrm{active})$.

Using the `sensitivity` and `specificity` variables, calculate `num_true_positive`, `num_false_positive`, `num_positive`, `p_positive` and `p_positive_and_active`. Define these variables in terms of one another, and the variables defined for you below.

In [None]:
sensitivity = 0.99
specificity = 0.95
num_active = 100
num_inactive = 1900
total_compounds = num_active + num_inactive
# YOUR ANSWER HERE
print('true positives:', num_true_positive)
print('false positives:', num_false_positive)
print('total positive tests:', num_positive)
print('percent of tests positive:', p_positive*100)
print('percent of (tests positive) and (active):', p_positive_and_active*100)

In [None]:
assert 0.096 < p_positive < 0.098
assert 0.049 < p_positive_and_active < 0.05

What is $P(\mathrm{active}\mid+)$ (i.e. the positive predictive value) in this case? Calculate the answer in two ways.

First, just use what conditional probability means: of the total number of positive results (which you calculated above), how many are from active compounds? Define `ppv1` as this fraction.

Second, use the mathematical definition: 
$$P(\mathrm{active}\mid+) = \frac{P(+\mathbin{and} \mathrm{active})}{P(+)}$$

Using your values for $P(+)$ and $P(+\mathbin{and} \mathrm{active})$ from the previous question, calculate $P(\mathrm{active}\mid+)$, as `ppv2`. Make sure you can see how it simplifies into the same fraction as above.

In [None]:
# YOUR ANSWER HERE

In [None]:
assert 0.5 < ppv1 < 0.52
assert 0.5 < ppv2 < 0.52

So, in the above scenario, about 50% of the positive tests will be for truly active compounds.

Now, write out a $2\times 2$ table for testing 200 active and 49800 inactive samples, using the above sensitivity and specificity. If you write it as:
```markdown
| | active | inactive |
|-|--------|----------|
|+|   xx   |    yy    |
|-|   aa   |    bb    |
```
Then it will turn into a nifty table when you "run" the cell with shift-enter:

| | active | inactive |
|-|--------|----------|
|+|   xx   |    yy    |
|-|   aa   |    bb    |


(**Note**: it won't format right if there is a line of text right above the table. Make sure the table is at the beginning of the cell, or separated by a blank line.)


YOUR ANSWER HERE

Now write the positive predictive value in terms of the numbers in the above table (i.e. the fraction of total positive results that come from active compounds). Store the result in a variable `ppv`. 

In [None]:
# YOUR ANSWER HERE
print('positive predictive value (percent):', ppv*100)

In [None]:
assert 0.073 < ppv < 0.074

Next, let's try to write out a direct formula for the positive predictive value, i.e. $P(\mathrm{active}\mid+)$.

From Bayes law, we have: 
$$P(\mathrm{active}\mid+) = \frac{P(+\mid \mathrm{active})\cdot P(\mathrm{active})}{P(+)}$$

Now, how can we find $P(+)$, the probability of a positive test? Above, we defined the number of positive results simply as the sum of the true positives and the false positives. Similarly, the probability of a positive test is simply the probability of a true positive *or* a false positive. Because there's no overlap between true and false positives, we can just say $P(+) = P(\textrm{true positive}) + P(\textrm{false positive})$.

What is $P(\textrm{true positive})$? Well, a true positive is an active compound that tested positive. In other words, $\mathrm{active} \mathbin{and} +$. So $P(\textrm{true positive}) = P(\mathrm{active} \mathbin{and} +)$.

So, starting with Bayes law above, we have so far:
\begin{align}
P(\mathrm{active}\mid+) &= \frac{P(+\mid \mathrm{active})\cdot P(\mathrm{active})}{P(+)}\\
&= \frac{P(+\mid \mathrm{active})\cdot P(\mathrm{active})}{P(\textrm{true positive}) + P(\textrm{false positive})}\\
&= \frac{P(+\mid \mathrm{active})\cdot P(\mathrm{active})}{P(\mathrm{active} \mathbin{and} +) + P(\mathrm{inactive} \mathbin{and} +)}
\end{align}

Now, all we need is to come up with a formula for $P(\mathrm{active} \mathbin{and} +)$ and $P(\mathrm{inactive} \mathbin{and} +)$.

How did you calculate $P(\mathrm{active} \mathbin{and} +)$ in the question above? It was pretty straightforward: first, you calculated the number of true positive tests as the sensitivity of the test multiplied by the number of active compounds. (Similarly, the number of false positive tests is $(1 - \mathrm{specificity})$ multiplied by the number of inactive compounds.)

Dividing the number of true positives by the total number of compounds tested gave the *probability* of a true positive test. In other words:
$$P(\mathrm{active} \mathbin{and} +) = \frac{\mathrm{sensitivity} \cdot \mathrm{total}(\mathrm{active})}{\textrm{total compounds}}$$

Of course,
$$\frac{\mathrm{total}(\mathrm{active})}{\textrm{total compounds}} = P(\mathrm{active})$$

So: $P(\mathrm{active} \mathbin{and} +) = \mathrm{sensitivity} \cdot P(\mathrm{active})$.

If we substitute  the formal definition of sensitivity into the above, we get: $P(\mathrm{active} \mathbin{and} +) = P(+\mid\mathrm{active}) \cdot P(\mathrm{active})$.

This is, of course, our definition of conditional probability again. It all makes intuitive sense, and mathematical sense!
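
The false positive term follows the same pattern, with one extra step. An inactive sample that tests positive is exactly one that the test failed to call negative, so its probability is one minus the specificity:
\begin{align}
P(\mathrm{inactive} \mathbin{and} +) &= P(+\mid\mathrm{inactive}) \cdot P(\mathrm{inactive})\\
P(+\mid\mathrm{inactive}) &= 1 - P(-\mid\mathrm{inactive}) = 1 - \mathrm{specificity}
\end{align}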

Plugging everything in together:
$$P(\mathrm{active}\mid+) = \frac{P(+\mid \mathrm{active})\cdot P(\mathrm{active})}{P(+ \mid \mathrm{active})\cdot P(\mathrm{active}) + P(+ \mid \mathrm{inactive})\cdot P(\mathrm{inactive})}$$

Or:
$$P(\mathrm{active}\mid+) = \frac{\mathrm{sensitivity}\cdot P(\mathrm{active})}{\mathrm{sensitivity}\cdot P(\mathrm{active}) + (1 - \mathrm{specificity})\cdot P(\mathrm{inactive})}$$
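
To see how this formula behaves, here is a minimal sketch that wraps it in a helper function. The function name `positive_predictive_value`, its argument names, and the test parameters (90% sensitive, 80% specific) are all hypothetical, chosen for illustration rather than taken from the problem.

```python
def positive_predictive_value(sens, spec, prevalence):
    """Return P(active | +) given sensitivity, specificity, and P(active)."""
    p_true_pos = sens * prevalence                 # P(active and +)
    p_false_pos = (1 - spec) * (1 - prevalence)    # P(inactive and +)
    return p_true_pos / (p_true_pos + p_false_pos)

# Hypothetical test: 90% sensitive, 80% specific, at three made-up prevalences.
for prevalence in (0.5, 0.1, 0.01):
    ppv_example = positive_predictive_value(0.9, 0.8, prevalence)
    print(f"P(active) = {prevalence:<5} PPV = {ppv_example:.3f}")
```

With the test itself held fixed, the PPV drops as active samples become rarer, which is the same pattern seen in the worked tables above.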


What are $P(\mathrm{active})$ and $P(\mathrm{inactive})$ in this collection of 200 active and 49800 inactive samples?

Write out calculations in Python for $P(\mathrm{active})$, $P(\mathrm{inactive})$, and $P(\mathrm{active}\mid+)$. Store the results in the variables `p_active`, `p_inactive`, and `ppv`, respectively. Use the variables `p_active`,  `p_inactive`, `sensitivity` and `specificity` in your PPV calculation. 

In [None]:
sensitivity = 0.99
specificity = 0.95
# YOUR ANSWER HERE
print(p_active)
print(p_inactive)
print(ppv)

In [None]:
# test your answers
assert p_active == 4/1000
assert p_inactive == 1-p_active
assert 0.073 < ppv < 0.074 

So both via the formula and by direct inspection of the $2\times2$ table, we saw a PPV of 0.0736. This means that only 7.36% of positive assay results come from truly active compounds.

If validating a screen hit takes 10 days, it will take 1000 days to validate 100 hits, of which 7 are probably real. That works out to ~136 days for every real hit. No good!

As we mentioned in class, sensitivity and specificity usually trade off against one another. We could make our test more stringent, so that our specificity goes to 99%, say. But that might reduce the sensitivity to 95%. 

Copy your PPV calculation from above into the next cell and change the sensitivity and specificity to 95% and 99%, respectively. Print the PPV.

In [None]:
# YOUR ANSWER HERE
print(ppv)

In [None]:
assert 0.27 < ppv < 0.28

Wow, almost 28% of the positive hits are real!

In the original case, we got 198 true positives and 2490 false positives. With a 95% sensitive, 99% specific test, how many true and false positives do you get?

Using the variables `sensitivity` and `specificity` that you re-defined above, calculate the total number of true and false positives, storing them in the variables `true_positives` and `false_positives`. (Sometimes "sensitivity" is called the "true positive rate" and specificity is called the "true negative rate". Now you should see why...)

In [None]:
# YOUR ANSWER HERE
print(true_positives)
print(false_positives)

In [None]:
assert int(true_positives) == 190
assert int(false_positives) == 498

That's not bad! Switching 99% sensitive / 95% specific for the one that is 95% sensitive / 99% specific means missing out on eight active compounds (going from 198 to 190 true positives), but also excluding 1992 false positives (going from 2490 to only 498).

Unfortunately, many times sensitivity and specificity don't trade off symmetrically. A more realistic scenario might be 99% specific, 80% sensitive. Copy your answer from above and modify to calculate the ppv and number of true and false positives in this case.

In [None]:
# YOUR ANSWER HERE
print(ppv)
print(true_positives)
print(false_positives)

In [None]:
assert 0.24 < ppv < 0.25
assert int(true_positives) == 160
assert int(false_positives) == 498

The positive predictive value didn't change much! But what did change? Write your answer below.

Often a drug company's goal isn't just to find one active compound, but to find many, because so many candidate drugs fail to work out for reasons having nothing to do with activity. (For example, unwanted side effects.) Why is a test with poor sensitivity bad in this case? Write your answer below.

YOUR ANSWER HERE

Sometimes false positives and false negatives occur just because of completely random factors. If we assume that errors are completely independent, what is the probability of an inactive compound testing positive on two tests in a row? What is the probability of an active compound testing positive on two tests in a row?

Calculate these probabilities, using the test parameters of 99% sensitivity and 95% specificity. Save the results in variables `p_two_false_pos` and `p_two_true_pos`.

In [None]:
sensitivity = 0.99
specificity = 0.95
# YOUR ANSWER HERE
print(p_two_false_pos)
print(p_two_true_pos)

In [None]:
assert int(p_two_false_pos * 10000) == 25
assert int(p_two_true_pos * 100) == 98

Let's say our testing procedure is just to repeat the individual test twice, and only score the test as positive if **both** repeats score positive. What are the overall sensitivity and specificity of this double test? Define `sensitivity` and `specificity` in terms of `p_two_false_pos` and `p_two_true_pos`, and then calculate the PPV, number of false positives, and number of true positives as before.

In [None]:
# YOUR ANSWER HERE
print('sensitivity:', sensitivity)
print('specificity:', specificity)
print('ppv:', ppv)
print('# true positives:', true_positives)
print('# false positives:', false_positives)

In [None]:
assert 0.61 < ppv < 0.62
assert int(true_positives) == 196
assert int(false_positives) == 124

This is great! Re-testing is a great way to get rid of random errors. We missed a few true active compounds though -- the true positives went from 198 in the original case to 196. For an 80% sensitive test, though, things would be much worse: double-testing would go from 80% sensitive to 64% sensitive.
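
The same reasoning extends to any number of repeats. The sketch below is one way to write it, assuming (as above) that errors on each repeat are completely independent; the helper name `repeated_test` and the 90% specificity are hypothetical, while the 80% sensitivity is the example mentioned in the paragraph above.

```python
def repeated_test(sens, spec, n_repeats=2):
    """Sensitivity and specificity when a sample is scored positive only if
    all n_repeats independent repeats come back positive."""
    combined_sens = sens ** n_repeats              # an active sample must be detected every time
    combined_spec = 1 - (1 - spec) ** n_repeats    # an inactive sample slips through only if every repeat errs
    return combined_sens, combined_spec

# Hypothetical single test: 80% sensitive, 90% specific.
for n in (1, 2, 3):
    sens_n, spec_n = repeated_test(0.80, 0.90, n)
    print(f"{n} repeat(s): sensitivity = {sens_n:.3f}, specificity = {spec_n:.3f}")
```

Each extra repeat buys specificity at the cost of sensitivity, so the trade-off is much less attractive when the single test is not very sensitive to begin with.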

However, re-testing is not a panacea. What other kinds of errors will re-testing not eliminate? Give your answer in the context of drug screening: why might a compound that doesn't cause apoptosis  cause a colorimetric assay (e.g. turning the well blue / not blue) to repeatedly give a false positive?

YOUR ANSWER HERE