# Bayes' Theorem - Blood Test

> This document is written in *R*.
>
> ***GitHub***: https://github.com/czs108

## Background

> In a certain clinic **0.15** of the patients have a certain *virus*. Suppose a blood test is carried out on a patient.
>
> If the patient has got the virus the test will turn out *positive* with probability **0.95**. If the patient does *not* have the virus the test will turn out *positive* with probability **0.02**.

## Question A

> Calculate the probability that a patient will have a *positive* test.

Give the events the following labels:

\begin{align}
V &= \text{The patient has got the virus.} \\
P &= \text{The outcome of the test is positive.}
\end{align}

According to the background:

\begin{align}
P(V) &= 0.15 \\
P(P \mid V) &= 0.95 \\
P(P \mid \overline{V}) &= 0.02
\end{align}

A test can turn out *positive* in two mutually exclusive ways: the patient has the virus, or the patient does *not* have it. By the law of total probability:

\begin{equation}
\begin{split}
P(P) &= P(V \cap P) + P(\overline{V} \cap P) \\
    &= P(V) \cdot P(P \mid V) + P(\overline{V}) \cdot P(P \mid \overline{V}) \\
    &= 0.15 \times 0.95 + 0.85 \times 0.02 \\
    &= 0.1595
\end{split}
\end{equation}
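The same arithmetic can be reproduced in *R*. This is only a minimal sketch: the variable names are illustrative assumptions, and the numbers come from the background above.

```r
# Probabilities taken from the Background section.
p_virus <- 0.15        # P(V)
p_pos_virus <- 0.95    # P(P | V)
p_pos_healthy <- 0.02  # P(P | not V)

# Law of total probability: add up the two disjoint cases.
p_pos <- p_virus * p_pos_virus + (1 - p_virus) * p_pos_healthy
print(p_pos)
```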

## Question B

> If the test is *positive* then:

### Question 1

> What is the probability that the patient has the virus?

\begin{equation}
\begin{split}
P(V \mid P) &= \frac{P(P \mid V) \cdot P(V)}{P(P)} \\
    &= \frac{0.95 \times 0.15}{0.1595} \\
    &= 0.8934
\end{split}
\end{equation}

### Question 2

> What is the probability that the patient does *not* have the virus?

\begin{equation}
\begin{split}
P(\overline{V} \mid P)
    &= 1 - P(V \mid P) \\
    &= 0.1066
\end{split}
\end{equation}
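One way to sanity-check these two posteriors is to think in expected head counts rather than probabilities. The sketch below assumes a hypothetical cohort of 100000 patients; the cohort size and variable names are illustrative and do not affect the ratio.

```r
n <- 100000                          # hypothetical cohort size (assumption)
infected <- n * 0.15                 # patients expected to carry the virus
true_pos <- infected * 0.95          # infected patients who test positive
false_pos <- (n - infected) * 0.02   # healthy patients who test positive

# Among all positive tests, the share that belongs to infected patients.
p_virus_given_pos <- true_pos / (true_pos + false_pos)
round(c(virus = p_virus_given_pos, no_virus = 1 - p_virus_given_pos), 4)
```

Roughly 14250 of about 15950 expected positives come from infected patients, which is where the ratio comes from.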

## Question C

> If the test is *negative* then:

### Question 1

> What is the probability that the patient has the virus?

\begin{equation}
\begin{split}
P(V \mid \overline{P}) &= \frac{P(\overline{P} \mid V) \cdot P(V)}{P(\overline{P})} \\
    &= \frac{(1 - 0.95) \times 0.15}{1 - 0.1595} \\
    &= 0.0089
\end{split}
\end{equation}

### Question 2

> What is the probability that the patient does *not* have the virus?

\begin{equation}
\begin{split}
P(\overline{V} \mid \overline{P})
    &= 1 - P(V \mid \overline{P}) \\
    &= 0.9911
\end{split}
\end{equation}
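The posteriors for both a *positive* and a *negative* result can also be read off a single joint probability table. The following is a hedged sketch; the object names (`prior`, `joint`, `post`) are assumptions made for illustration.

```r
prior <- c(virus = 0.15, healthy = 0.85)   # P(V) and P(not V)
p_pos <- c(virus = 0.95, healthy = 0.02)   # P(P | status)

# Joint probabilities P(status and result): rows = status, columns = result.
joint <- cbind(positive = prior * p_pos,
               negative = prior * (1 - p_pos))

# Dividing each column by its total conditions on the test result.
post <- prop.table(joint, margin = 2)
round(post, 4)
```

Each column of `post` sums to 1, and the `negative` column holds the two answers to this question.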

## Question D

> Write some *R* code which simulates the possible outcomes of a blood test. You can use the following line of example code:
>
> ```r
> if (runif(1) <= 0.15)
> ```
>
> The *R* command `runif(1)` generates a random number between the values **0.0** and **1.0**.
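>
> As a minimal illustration of this rule (the names `n` and `draws` and the fixed seed are assumptions, used only to make the sketch repeatable), the share of draws at or below **0.15** should come out close to **0.15**:
>
> ```r
> set.seed(42)                    # assumption: fixed seed for repeatability
> n <- 100000
> draws <- runif(n)               # n uniform random numbers
> share <- mean(draws <= 0.15)    # fraction of draws at or below 0.15
> print(share)
> ```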
>
> If this random number is less than or equal to **0.15** then we can say that the patient has the virus, otherwise the patient does *not* have the virus.
>
> In a similar way, you can decide if the test is *positive* or *negative*.
>
> Then run this code **100000** times.

In [1]:
# Simulate the infection status of `count` patients (TRUE = infected).
Virus <- function(count) {
    # runif(count) draws `count` uniform numbers at once;
    # each patient is infected with probability 0.15.
    return (runif(count) < 0.15)
}

# Simulate one blood test per patient, given the true infection status.
Test <- function(virus) {
    # Chance of a positive result: 0.95 if infected, 0.02 otherwise.
    p.pos <- ifelse(virus, 0.95, 0.02)
    return (runif(length(virus)) < p.pos)
}

count <- 100000
virus <- Virus(count)
tests <- Test(virus)

### Question 1

> How often does the test turn out *positive*?

In [2]:
sum(tests) / count

### Question 2
 
> How often does the patient have the virus if the test is *positive*?

In [3]:
sum(tests & virus) / sum(tests)
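
To judge whether a simulated frequency agrees with the exact value from Question A, it helps to know how much random variation to expect. The sketch below is self-contained and reruns a vectorized simulation; the fixed seed and the names `estimate`, `std_error` and `z` are assumptions made for illustration.

```r
set.seed(1)   # assumption: fixed only so the sketch is repeatable
n <- 100000
virus <- runif(n) < 0.15
tests <- runif(n) < ifelse(virus, 0.95, 0.02)

estimate <- mean(tests)               # simulated P(P)
exact <- 0.15 * 0.95 + 0.85 * 0.02    # exact P(P) from Question A

# Standard error of a proportion estimated from n independent trials.
std_error <- sqrt(exact * (1 - exact) / n)

# Distance between simulation and theory, in standard errors.
z <- (estimate - exact) / std_error
print(c(estimate = estimate, std_error = std_error, z = z))
```

With 100000 runs the standard error is a little over 0.001, so simulated values within a few thousandths of the exact answer are consistent with it.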

## Question E

> Modify the code to include a *2nd* blood test on the patient. You can assume that the *2nd* test is unaffected by the *1st* test.
>
> Then run this code **100000** times.

In [4]:
tests.1 <- tests
tests.2 <- Test(virus)

### Question 1

> How often do you get two *positive* tests?

The two tests are *conditionally independent* given the patient's infection status. For a patient with the virus (and likewise for a patient without it):

\begin{equation}
P(P_{1} \cap P_{2} \mid V) = P(P_{1} \mid V) \cdot P(P_{2} \mid V)
\end{equation}

\begin{equation}
\begin{split}
P(P_{1} \cap P_{2})
    &= P(P_{1} \cap P_{2} \mid V) \cdot P(V) + P(P_{1} \cap P_{2} \mid \overline{V}) \cdot P(\overline{V}) \\
    &= P(P_{1} \mid V) \cdot P(P_{2} \mid V) \cdot P(V)
    + P(P_{1} \mid \overline{V}) \cdot P(P_{2} \mid \overline{V}) \cdot P(\overline{V}) \\
    &= 0.95 \times 0.95 \times 0.15 + 0.02 \times 0.02 \times 0.85 \\
    &= 0.1357
\end{split}
\end{equation}
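The derivation above extends to any sequence of test results. As a sketch under the same conditional-independence assumption, the hypothetical helper `pattern_prob` (not part of the original code) multiplies the per-test likelihoods within each infection status and then mixes the two cases.

```r
# Probability of observing a given pattern of results (TRUE = positive).
# `pattern_prob`, `sens` and `fpr` are illustrative names (assumptions).
pattern_prob <- function(results, p_virus = 0.15, sens = 0.95, fpr = 0.02) {
  lik_virus <- prod(ifelse(results, sens, 1 - sens))   # P(pattern | V)
  lik_healthy <- prod(ifelse(results, fpr, 1 - fpr))   # P(pattern | not V)
  p_virus * lik_virus + (1 - p_virus) * lik_healthy
}

two_pos <- pattern_prob(c(TRUE, TRUE))
print(two_pos)
```

Calling `pattern_prob(TRUE)` with a single result reduces to the one-test probability from Question A.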

In [5]:
allpos <- sum(tests.1 & tests.2)

allpos / count

### Question 2

> If you get two *positive* tests, how often does the patient have the virus?

\begin{equation}
\begin{split}
P(V \mid P_{1} \cap P_{2})
    &= \frac{P(P_{1} \cap P_{2} \mid V) \cdot P(V)}{P(P_{1} \cap P_{2})} \\
    &= \frac{0.95 \times 0.95 \times 0.15}{0.95 \times 0.95 \times 0.15 + 0.02 \times 0.02 \times 0.85} \\
    &= 0.9975
\end{split}
\end{equation}
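Because the tests are conditionally independent, the same posterior can be reached one test at a time: the posterior after the *1st* test serves as the prior for the *2nd*. The helper `bayes_update` below is a hypothetical name introduced for this sketch.

```r
# One Bayes update of P(V) for a single test result.
# `bayes_update`, `sens` and `fpr` are illustrative names (assumptions).
bayes_update <- function(prior, positive, sens = 0.95, fpr = 0.02) {
  lik_virus <- if (positive) sens else 1 - sens    # P(result | V)
  lik_healthy <- if (positive) fpr else 1 - fpr    # P(result | not V)
  prior * lik_virus / (prior * lik_virus + (1 - prior) * lik_healthy)
}

after_first <- bayes_update(0.15, positive = TRUE)
after_second <- bayes_update(after_first, positive = TRUE)
print(c(after_first = after_first, after_second = after_second))
```

The value of `after_second` should agree with the two-test calculation above to within rounding.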

In [6]:
sum(virus & tests.1 & tests.2) / allpos

### Question 3

> If the *1st* test is *positive* and the *2nd* is *negative*, how often does the patient have the virus?

According to *conditional independence*:

\begin{equation}
P(P_{1} \cap \overline{P_{2}} \mid V) = P(P_{1} \mid V) \cdot P(\overline{P_{2}} \mid V)
\end{equation}

Then:

\begin{equation}
\begin{split}
P(P_{1} \cap \overline{P_{2}})
    &= P(P_{1} \cap \overline{P_{2}} \mid V) \cdot P(V) + P(P_{1} \cap \overline{P_{2}} \mid \overline{V}) \cdot P(\overline{V}) \\
    &= P(P_{1} \mid V) \cdot P(\overline{P_{2}} \mid V) \cdot P(V)
    + P(P_{1} \mid \overline{V}) \cdot P(\overline{P_{2}} \mid \overline{V}) \cdot P(\overline{V}) \\
    &= 0.95 \times 0.05 \times 0.15 + 0.02 \times 0.98 \times 0.85 \\
    &= 0.0238
\end{split}
\end{equation}

\begin{equation}
\begin{split}
P(V \mid P_{1} \cap \overline{P_{2}})
    &= \frac{P(P_{1} \cap \overline{P_{2}} \mid V) \cdot P(V)}{P(P_{1} \cap \overline{P_{2}})} \\
    &= \frac{0.95 \times 0.05 \times 0.15}{0.95 \times 0.05 \times 0.15 + 0.02 \times 0.98 \times 0.85} \\
    &= 0.2996
\end{split}
\end{equation}
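An equivalent way to arrive at this posterior is the odds form of Bayes' theorem, where each test result multiplies the prior odds by its likelihood ratio. This is a sketch only; the variable names are assumptions.

```r
prior_odds <- 0.15 / 0.85           # odds of virus before any test
lr_pos <- 0.95 / 0.02               # likelihood ratio of a positive result
lr_neg <- (1 - 0.95) / (1 - 0.02)   # likelihood ratio of a negative result

# Conditional independence lets the likelihood ratios simply multiply.
post_odds <- prior_odds * lr_pos * lr_neg
post_prob <- post_odds / (1 + post_odds)   # convert odds back to probability
print(post_prob)
```

The product `lr_pos * lr_neg` is greater than 1, so one *positive* followed by one *negative* still leaves the virus more likely than the prior of 0.15.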

In [7]:
sum(virus & tests.1 & !tests.2) / sum(tests.1 & !tests.2)