# Bayes' rule

I decided to write this after watching a video from 3b1b. You can find the video [here](https://youtu.be/lG4VkPoG3ko).

The video introduces Bayes' rule and applies it to the relationship between the accuracy of a cancer test and how well a positive result predicts the disease.

Bayes' rule is one of the first things one learns in a probability theory class. It looks like this:

$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B|A)P(A)}{P(B)}$.
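As a quick sanity check, the two forms of the formula can be compared on a toy example. The sketch below uses made-up counts for two events A and B (the numbers are purely illustrative assumptions, not taken from the video) and exact fractions so that rounding does not get in the way.

```python
from fractions import Fraction

# Hypothetical counts over 100 equally likely outcomes (illustrative only).
total = 100
count_a = 30        # outcomes where A happens
count_b = 40        # outcomes where B happens
count_a_and_b = 12  # outcomes where both A and B happen

p_a = Fraction(count_a, total)
p_b = Fraction(count_b, total)
p_a_and_b = Fraction(count_a_and_b, total)

# Conditional probabilities straight from the definition.
p_a_given_b = p_a_and_b / p_b  # P(A|B) = P(A ∩ B) / P(B)
p_b_given_a = p_a_and_b / p_a  # P(B|A) = P(A ∩ B) / P(A)

# Bayes' rule: P(A|B) rebuilt from P(B|A), P(A) and P(B).
p_a_given_b_bayes = p_b_given_a * p_a / p_b

print(p_a_given_b, p_a_given_b_bayes)  # 3/10 3/10
```

Both routes give the same value, which is all the formula claims.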

Bayes' rule helps answer questions of the following kind:
- What is the probability of rain today *given* that it rained yesterday?
- What is the probability of a car being Italian *given* that it breaks every two months?

And, based on the video from 3b1b:
- What is the probability of having cancer *given* that the test result is positive?

So, let's take a look!

## Setting up the problem

Firstly, it is useful to introduce some terms that are going to appear quite often:
- **prevalence rate**, a.k.a. how many people in the population have breast cancer?
- **sensitivity**, a.k.a. *true positive rate*, a.k.a. how many of the people who have breast cancer are identified as positive by the test?
- **specificity**, a.k.a. *true negative rate*, a.k.a. how many of the people who DO NOT have breast cancer are identified as negative by the test?
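To make the three terms concrete, here is a minimal sketch that computes them from raw counts. The counts are invented for illustration (a hypothetical group of 250 people) and the variable names are my own.

```python
# Hypothetical test results for 250 people (made-up numbers for illustration).
true_positives = 45    # have cancer, test positive
false_negatives = 5    # have cancer, test negative
true_negatives = 180   # no cancer, test negative
false_positives = 20   # no cancer, test positive

with_cancer = true_positives + false_negatives
without_cancer = true_negatives + false_positives
population = with_cancer + without_cancer

prevalence = with_cancer / population          # share of people with cancer
sensitivity = true_positives / with_cancer     # true positive rate
specificity = true_negatives / without_cancer  # true negative rate

print(prevalence, sensitivity, specificity)  # 0.2 0.9 0.9
```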

Now, let's take a look at the problem presented in the video:

It is assumed that 1% of the population has breast cancer. It is also known that, given that a person has breast cancer, the test is going to be positive 90% of the time. At the same time, it is known that, given that a person DOES NOT have breast cancer, the test is going to be negative 91% of the time.

Now, the question is: given that someone tested positive, what is the chance that they actually have cancer?

And the answer is... close to 10%.

Let's see how to get there.

## Actually solving the problem

What we are trying to find is the following: $P(cancer|positive)$.

Let's write it using Bayes' rule:

$P(cancer|positive) = \frac{P(cancer \cap positive)}{P(positive)} = \frac{P(positive|cancer)P(cancer)}{P(positive)}$.

Here we can already see two of the terms we have defined before:

- $P(cancer)$ is the prevalence rate
- $P(positive|cancer)$ is the sensitivity

The question is: how to deal with $P(positive)$?

Well, intuitively there are two ways someone can test positive: they have cancer and test positive, or they do not have cancer and test positive. In *math* terms:

$P(positive) = P(positive \cap no\, cancer) + P(positive \cap cancer)$

To find the two probabilities, we can use the definition of conditional probability (the same identity that Bayes' rule is built on):

$P(positive \cap no\, cancer) = P(positive|no\, cancer)P(no\, cancer)$, where $P(positive|no\, cancer) = 1-specificity$ and $P(no\, cancer) = 1-prevalence$.

Similarly:

$P(positive \cap cancer) = P(positive|cancer)P(cancer)$
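With the numbers from the problem statement, the two terms and their sum can be worked out directly. This is just a small sketch of the arithmetic; the variable names are my own.

```python
# Values from the problem statement above.
prevalence = 0.01
sensitivity = 0.9
specificity = 0.91

# P(positive ∩ cancer) = P(positive|cancer) * P(cancer)
p_positive_and_cancer = sensitivity * prevalence

# P(positive ∩ no cancer) = P(positive|no cancer) * P(no cancer)
p_positive_and_no_cancer = (1 - specificity) * (1 - prevalence)

# P(positive) is the sum of the two disjoint cases.
p_positive = p_positive_and_cancer + p_positive_and_no_cancer

print(round(p_positive_and_cancer, 4))     # 0.009
print(round(p_positive_and_no_cancer, 4))  # 0.0891
print(round(p_positive, 4))                # 0.0981
```

So only about 10% of all tests come back positive, and most of that share comes from people without cancer.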

Wrapping it up, we get:

$P(cancer|positive) = \frac{P(positive|cancer)P(cancer)}{P(positive|no\, cancer)P(no\, cancer)+P(positive|cancer)P(cancer)}$

Let's see some results:

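```python
def cancer_positive(prevalence, sensitivity, specificity):
    """Return P(cancer|positive) for the given test characteristics."""
    true_positive = sensitivity * prevalence               # P(positive ∩ cancer)
    false_positive = (1 - specificity) * (1 - prevalence)  # P(positive ∩ no cancer)
    return true_positive / (false_positive + true_positive)


prevalence = 0.01   # 1% of the population has breast cancer
sensitivity = 0.9   # P(positive|cancer)
specificity = 0.91  # P(negative|no cancer)

print(round(cancer_positive(prevalence, sensitivity, specificity), 2))
```

```
0.09
```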


This indicates that even though the test has a relatively high accuracy, the very low prevalence of the disease means that its ability to predict whether someone is actually sick is relatively low.
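One way to build intuition for this is to think in counts rather than probabilities. The sketch below assumes a hypothetical group of 1,000 people (the group size is my own choice) and applies the rates from the problem statement to get the expected number of true and false positives.

```python
# Expected counts in a hypothetical group of 1,000 people,
# using the rates from the problem statement.
population = 1000
prevalence = 0.01
sensitivity = 0.9
specificity = 0.91

with_cancer = population * prevalence                 # 10 people
without_cancer = population - with_cancer             # 990 people

true_positives = with_cancer * sensitivity            # about 9 people
false_positives = without_cancer * (1 - specificity)  # about 89 people

# Share of positive tests that belong to people who really have cancer.
share_with_cancer = true_positives / (true_positives + false_positives)

print(round(true_positives), round(false_positives), round(share_with_cancer, 2))
# 9 89 0.09
```

Roughly 9 people with cancer test positive, but so do about 89 healthy people, simply because the healthy group is so much larger.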

## How does the result change when the parameters vary?