# Exercise 05.03

## Problem:

**[Purpose: To see a hands-on example of data-order invariance.]**

Consider again the disease and diagnostic test of the previous two exercises.

(A) Suppose that a person selected at random from the population gets the test and it comes back negative. Compute the probability that the person has the disease.

(B) The person then gets re-tested, and on the second test the result is positive. Compute the probability that the person has the disease. How does the result compare with your answer to Exercise 5.1?

## Solution:

#### A.

The following conditional probabilities of the test result, given disease presence, are provided:

\begin{align*}
p(\text{T} \, = \, \text{positive} \, | \, \text{D} \, = \, \text{positive}) &= 0.99 \\
p(\text{T} \, = \, \text{negative} \, | \, \text{D} \, = \, \text{positive}) &= 0.01 \\
p(\text{T} \, = \, \text{positive} \, | \, \text{D} \, = \, \text{negative}) &= 0.05 \\
p(\text{T} \, = \, \text{negative} \, | \, \text{D} \, = \, \text{negative}) &= 0.95 \\
\end{align*}

They are stored in the following R variables, together with the background (prior) probability of the disease:

In [4]:
# probability of T = + given D = +
pTPos_DPos = 0.99

# probability of T = - given D = +
pTNeg_DPos = 0.01

# probability of T = + given D = -
pTPos_DNeg = 0.05

# probability of T = - given D = -
pTNeg_DNeg = 0.95

# background probability of D = +
pDPos = 0.001

# background probability of D = -
pDNeg = 1 - pDPos

Table 5.4 gives the following information as the joint distribution of the test results and disease presence:

<table>
    <tr>
        <th></th>
        <th>Disease Present</th>
        <th>Disease Absent</th>
        <th>Marginal (test result)</th>
    </tr>
    <tr>
        <td>Positive Test</td>
        <td>0.00099</td>
        <td>0.04995</td>
        <td><b>0.05094</b></td>
    </tr>
    <tr>
        <td>Negative Test</td>
        <td>0.00001</td>
        <td>0.94905</td>
        <td><b>0.94906</b></td>
    </tr>
    <tr>
        <td>Marginal (disease presence)</td>
        <td><b>0.00100</b></td>
        <td><b>0.99900</b></td>
        <td><b>1.00000</b></td>
    </tr>
</table>
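As a rough cross-check, the joint probabilities can be recomputed by multiplying each conditional probability by the matching prior. The sketch below restates the values from the problem so that it runs on its own; the matrix name `joint` and its row and column labels are chosen here for illustration and are not part of the original solution.

```r
# Conditional probabilities of the test result given disease status
pTPos_DPos <- 0.99
pTNeg_DPos <- 0.01
pTPos_DNeg <- 0.05
pTNeg_DNeg <- 0.95

# Background (prior) probabilities of disease presence and absence
pDPos <- 0.001
pDNeg <- 1 - pDPos

# Joint probability = p(T | D) * p(D); rows are test results, columns disease status
# (the name `joint` and the labels below are illustrative assumptions)
joint <- matrix(
  c(pTPos_DPos * pDPos, pTPos_DNeg * pDNeg,
    pTNeg_DPos * pDPos, pTNeg_DNeg * pDNeg),
  nrow = 2, byrow = TRUE,
  dimnames = list(test = c("positive", "negative"),
                  disease = c("present", "absent"))
)

# Append the row and column sums, which are the marginal probabilities
print(addmargins(joint))
```

The `Sum` row and column printed by `addmargins` should correspond to the marginals in the table, up to rounding.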

The probability that an individual has the disease given that the test was negative is given by the following equation:

\begin{equation}
p(\text{D} \, = \, \text{positive} \, | \, \text{T} \, = \, \text{negative}) = \frac{p(\text{T} \, = \, \text{negative}, \, \text{D} \, = \, \text{positive})}{p(\text{T} \, = \, \text{negative})}
\end{equation}

Substituting the values from the Negative Test row of the table, the calculation can be performed in R:

In [2]:
pDPos_TNeg = 0.00001/0.94906
print(pDPos_TNeg)

[1] 1.053674e-05
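The same posterior can also be obtained from the conditional probabilities and the prior, without copying numbers from the table. This is only a sketch: it redefines the relevant values so that it is self-contained, and the intermediate name `pTNeg` is introduced here rather than taken from the original.

```r
# Values restated from the problem so the block runs on its own
pTNeg_DPos <- 0.01   # p(T = - | D = +)
pTNeg_DNeg <- 0.95   # p(T = - | D = -)
pDPos <- 0.001       # prior p(D = +)
pDNeg <- 1 - pDPos   # prior p(D = -)

# Marginal probability of a negative test:
# p(T = -) = p(T = -, D = +) + p(T = -, D = -)
pTNeg <- pTNeg_DPos * pDPos + pTNeg_DNeg * pDNeg

# Bayes' rule: p(D = + | T = -) = p(T = - | D = +) * p(D = +) / p(T = -)
pDPos_TNeg <- pTNeg_DPos * pDPos / pTNeg
print(pDPos_TNeg)
```

Computing from the variables avoids depending on the rounding of the table entries, and the printed value should agree with the one obtained from the table.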


There is an almost negligible probability, roughly 1 in 95,000, that a person who tests negative has the disease.

#### B.

Take the $p(\text{D} \, = \, \text{positive} \, | \, \text{T} \, = \, \text{negative})$ that was just calculated as the new value of $p(\text{D} \, = \, \text{positive})$. That is, the posterior becomes the next prior. Now:

\begin{equation*}
p(\text{D} \, = \, \text{negative}) = 1 - p(\text{D} \, = \, \text{positive})
\end{equation*}
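Written out with Bayes' rule, the update for the positive re-test takes the form below. Here $p(\text{D} \, = \, \text{positive})$ and $p(\text{D} \, = \, \text{negative})$ denote the updated prior values, and the two test results are assumed to be conditionally independent given disease status, so the same test characteristics apply to the re-test. The numerical values are rounded.

\begin{equation*}
p(\text{D} \, = \, \text{positive} \, | \, \text{T} \, = \, \text{positive}) = \frac{p(\text{T} \, = \, \text{positive} \, | \, \text{D} \, = \, \text{positive}) \, p(\text{D} \, = \, \text{positive})}{p(\text{T} \, = \, \text{positive} \, | \, \text{D} \, = \, \text{positive}) \, p(\text{D} \, = \, \text{positive}) + p(\text{T} \, = \, \text{positive} \, | \, \text{D} \, = \, \text{negative}) \, p(\text{D} \, = \, \text{negative})}
\end{equation*}

\begin{equation*}
\approx \frac{0.99 \times 1.0537 \times 10^{-5}}{0.99 \times 1.0537 \times 10^{-5} + 0.05 \times 0.9999895} \approx 2.09 \times 10^{-4}
\end{equation*}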

The new posterior probability that the person has the disease, given that the re-test was positive, can be computed in R:

In [6]:
# set the posterior as the new prior
pDPos2 = pDPos_TNeg
pDNeg2 = 1 - pDPos2

# p(T = +) is the sum of p(T = +, D = +) and p(T = +, D = -)
pTPos2 = pTPos_DPos*pDPos2 + pTPos_DNeg*pDNeg2

pDPos_TPos2 = pTPos_DPos*pDPos2/pTPos2
print(pDPos_TPos2)

[1] 0.0002085862


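Unsurprisingly, this result is the same as the one from Exercise 5.1, where the same two test results arrived in the opposite order (positive first, then negative). The posterior depends on which results were observed, not on the order in which they were observed.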
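As a rough check of the data-order invariance named in the purpose statement, the sketch below applies the same two test results in both orders and compares the final posteriors. The helper `update_posterior` is a hypothetical function written for this illustration, not part of the original solution, and it assumes the two tests are conditionally independent given disease status.

```r
# Hypothetical helper (an assumption for illustration): one Bayesian update of p(D = +).
#   prior_pos : current p(D = +)
#   lik_pos   : p(observed result | D = +)
#   lik_neg   : p(observed result | D = -)
update_posterior <- function(prior_pos, lik_pos, lik_neg) {
  lik_pos * prior_pos / (lik_pos * prior_pos + lik_neg * (1 - prior_pos))
}

pDPos <- 0.001  # background probability of disease

# Order used in this exercise: negative test first, then positive test
neg_then_pos <- update_posterior(update_posterior(pDPos, 0.01, 0.95), 0.99, 0.05)

# Opposite order: positive test first, then negative test
pos_then_neg <- update_posterior(update_posterior(pDPos, 0.99, 0.05), 0.01, 0.95)

print(c(neg_then_pos, pos_then_neg))

# TRUE when the two orders agree up to floating-point tolerance
print(isTRUE(all.equal(neg_then_pos, pos_then_neg)))
```

Both orders multiply the prior by the same two likelihood terms, so the final posterior differs only by floating-point rounding.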