## Bayesian Example

**In this example we illustrate how Bayes' Theorem can be applied to a concrete problem.**<br>
<br>
Let's say there is a medical test, for example taking a blood sample in order to diagnose a disease like cancer based on tumor markers in the blood (see e.g. https://www.cancer.gov/news-events/cancer-currents-blog/2020/cancerseek-blood-test-detect-early-cancer).<br>
<br>
For one particular test, the probability of a positive test result (positive here means that tumor markers are found), **given** that the patient is sick, is 95%. Let $S$ denote the sick state, $H$ the healthy state, $+$ a positive test result and $-$ a negative test result. We can then formalize the above statement:<br>
<br>
The probability of a positive test result **given** that the patient is sick is 95%: $P(+|S) = 0.95$<br>
<br>

Now, a patient took the test and the **test result is positive**. What is the probability that the patient is indeed sick?<br>
First, we should understand that the answer **is not** $P(+|S)$! The probability we are asking for is actually:<br>
<br>
The probability that the patient is sick given there is a positive test result: $P(S|+)$<br>
<br>
And in general $P(S|+)\neq P(+|S)$.

How do we calculate this probability? We recall Bayes' Theorem:<br>
<br>
$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$<br>
<br>

and just apply it to our problem:<br>
<br>
$P(S|+) = \frac{P(+|S)P(S)}{P(+)}$
<br>

Here, $P(S)$ is the prior probability. It is the probability that the patient is sick in the first place, **before** we even thought about taking the test. $P(S)$ is called the *prevalence* and can be looked up in studies. $P(S)$ is usually quite low for most cancer types, but it can be higher for diseases like Alzheimer's, depending on the age of the patient.<br>
Since we are testing for a specific disease, $P(S)$ can be looked up. In our case, let's say the value is $P(S) = 0.0001$.

The next step is to deal with $P(+)$, the probability of getting a positive test result. Here, we have to apply **marginalization**. There are two paths that lead to that result: either the patient is healthy and the test result is just a **false positive** (which tends to become more likely the more sensitive the test is), or the patient is indeed sick. Hence:<br>
<br>
$P(+) = P(+|S)P(S)\,+\,P(+|H)P(H)$<br>
<br>
where $P(H)\,+\,P(S) = 1$
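
The marginalization step can be checked numerically. The following is a minimal sketch, not part of the original lecture: the function name `marginal_positive` is hypothetical, and the false positive rate of $0.01$ is the value adopted in the next paragraph.

```python
def marginal_positive(p_pos_given_sick, p_pos_given_healthy, p_sick):
    """Return P(+) = P(+|S)P(S) + P(+|H)P(H), using P(H) = 1 - P(S)."""
    p_healthy = 1.0 - p_sick
    return p_pos_given_sick * p_sick + p_pos_given_healthy * p_healthy


# Values from the text: P(+|S) = 0.95, P(+|H) = 0.01, P(S) = 0.0001
p_positive = marginal_positive(0.95, 0.01, 0.0001)
print(f"P(+) = {p_positive:.6f}")  # P(+) = 0.010094
```

Almost all of $P(+)$ comes from the false positive path ($0.009999$ out of $0.010094$), because healthy patients vastly outnumber sick ones.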

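The value for $P(+|H)$, the **false positive rate**, is usually also known from studies. It equals one minus the test's specificity and plays the same role as the significance level $\alpha$ (the threshold to which a p-value is compared) in hypothesis testing. That threshold is a matter of convention; here we take $P(+|H) = 0.01$.<br>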
Now, we have all the variables we need:

<br>
$P(S|+) = \frac{P(+|S)P(S)}{P(+|S)P(S)\,+\,P(+|H)P(H)} = \frac{1}{1+ \frac{P(+|H)P(H)}{P(+|S)P(S)}} = \frac{1}{1+ \frac{P(+|H)\left[1-P(S)\right]}{P(+|S)P(S)}}$
<br>

Let's do the math:<br>
<br>
$P(S|+) = \frac{1}{1+ \frac{0.01 \cdot (1 - 0.0001)}{0.95 \cdot 0.0001}} \approx 0.0094 \approx 1\%$
<br>
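
The arithmetic above can be reproduced with a few lines of Python. This is a sketch under the values stated in the text; the function name `posterior_sick` is hypothetical and chosen only for illustration.

```python
def posterior_sick(p_pos_given_sick, p_pos_given_healthy, p_sick):
    """Return P(S|+) via Bayes' Theorem with a marginalized denominator."""
    numerator = p_pos_given_sick * p_sick                      # P(+|S) P(S)
    evidence = numerator + p_pos_given_healthy * (1 - p_sick)  # P(+)
    return numerator / evidence


# Values from the text: P(+|S) = 0.95, P(+|H) = 0.01, P(S) = 0.0001
p_sick_given_pos = posterior_sick(0.95, 0.01, 0.0001)
print(f"P(S|+) = {p_sick_given_pos:.2%}")  # P(S|+) = 0.94%
```

The result, roughly $0.94\%$, agrees with the rounded value of $1\%$ obtained by hand.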

That is good news! Even though the test might be very accurate (high $P(+|S)$), a positive test result does not mean that the patient is likely to be sick. Looking at the above equation, we see that $P(S|+)$ is so low because the prior $P(S)$ is very small. That is the mathematical reason why screening for rare diseases without a specific indication has little diagnostic value, even if the procedure itself is harmless.<br>
However, when there is a particular reason for testing, e.g. a symptom, $P(S)$ changes. Say you have an obese elderly person with back pain: $P(S)$ for, say, pancreatic cancer increases by orders of magnitude. We have some **biased prior knowledge** - the symptom. Now, with that symptom, we could in principle return to Bayes' Theorem and apply it again to the last equation (see next lecture).
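
To get a feeling for how strongly the result depends on the prior, the posterior can be evaluated for several values of $P(S)$. This is only an illustrative sketch: the priors larger than $0.0001$ are hypothetical and are not taken from any study, and the function name `posterior_sick` is an assumption made for this example.

```python
def posterior_sick(p_pos_given_sick, p_pos_given_healthy, p_sick):
    """Return P(S|+) via Bayes' Theorem with a marginalized denominator."""
    numerator = p_pos_given_sick * p_sick
    return numerator / (numerator + p_pos_given_healthy * (1 - p_sick))


# Test characteristics from the text; all priors except 1e-4 are hypothetical.
priors = [1e-4, 1e-3, 1e-2, 1e-1]
posteriors = [posterior_sick(0.95, 0.01, p) for p in priors]

for prior, post in zip(priors, posteriors):
    print(f"P(S) = {prior:<7g} ->  P(S|+) = {post:.1%}")
```

With the test characteristics unchanged, raising the prior from $10^{-4}$ to $10^{-2}$ lifts the posterior from below $1\%$ to almost $50\%$. The same positive result therefore carries far more weight for a patient who already has a reason to be tested.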