# Exercise 05.02

## Problem:

**[Purpose: Getting an intuition for the previous results by using “natural frequency” and “Markov” representations]**

(A) Suppose that the population consists of 100,000 people. Compute how many people would be expected to fall into each cell of Table 5.4. To compute the expected frequency of people in a cell, just multiply the cell probability by the size of the population. To get you started, a few of the cells of the frequency table are filled in the textbook. Notice the frequencies on the lower margin of the table. They indicate that out of 100,000 people, only 100 have the disease, while 99,900 do not have the disease. These marginal frequencies instantiate the prior probability that p(θ = +) = 0.001. Notice also the cell frequencies in the column θ = +, which indicate that of 100 people with the disease, 99 have a positive test result and 1 has a negative test result. These cell frequencies instantiate the hit rate of 0.99. Your job for this part of the exercise is to fill in the frequencies of the remaining cells of the table.

(B) Take a good look at the frequencies in the table you just computed for the previous part. These are the so-called “natural frequencies” of the events, as opposed to the somewhat unintuitive expression in terms of conditional probabilities (Gigerenzer & Hoffrage, 1995). From the cell frequencies alone, determine the proportion of people who have the disease, given that their test result is positive. Before computing the exact answer arithmetically, first give a rough intuitive answer merely by looking at the relative frequencies in the row T = +. Does your intuitive answer match the intuitive answer you provided when originally reading about Table 5.4? Probably not. Your intuitive answer here is probably much closer to the correct answer. Now compute the exact answer arithmetically. It should match the result from applying Bayes’ rule in Table 5.4.

(C) Now we’ll consider a related representation of the probabilities in terms of natural frequencies, which is especially useful when we accumulate more data. This type of representation is called a “Markov” representation by Krauss, Martignon, and Hoffrage (1999). Suppose now we start with a population of N = 10,000,000 people. We expect 99.9% of them (i.e., 9,990,000) not to have the disease, and just 0.1% (i.e., 10,000) to have the disease. Now consider how many people we expect to test positive. Of the 10,000 people who have the disease, 99% (i.e., 9,900) will be expected to test positive. Of the 9,990,000 people who do not have the disease, 5% (i.e., 499,500) will be expected to test positive. Now consider re-testing everyone who has tested positive on the first test. How many of them are expected to show a negative result on the retest? Use the diagram in the textbook to compute your answer. When computing the frequencies for the empty boxes above, be careful to use the proper conditional probabilities!

(D) Use the diagram in the previous part to answer this: What proportion of people, who test positive at first and then negative on retest, actually have the disease? In other words, of the total number of people at the bottom of the diagram in the previous part (those are the people who tested positive then negative), what proportion of them are in the left branch of the tree? How does the result compare with your answer to Exercise 5.1?

## Solution:

#### A.

Table 5.4 is copied here:

<table>
    <tr>
        <th></th>
        <th>Disease Present</th>
        <th>Disease Absent</th>
        <th>Marginal (test result)</th>
    </tr>
    <tr>
        <td>Positive Test</td>
        <td>0.00099</td>
        <td>0.04995</td>
        <td><b>0.05094</b></td>
    </tr>
    <tr>
        <td>Negative Test</td>
        <td>0.00001</td>
        <td>0.94905</td>
        <td><b>0.94906</b></td>
    </tr>
    <tr>
        <td>Marginal (disease presence)</td>
        <td><b>0.00100</b></td>
        <td><b>0.99900</b></td>
        <td><b>1.00000</b></td>
    </tr>
</table>

To calculate the expected cell frequencies, multiply every probability in the table by the population size, N = 100,000. The calculations are done in R:

In [3]:
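# Joint and marginal probabilities from Table 5.4, entered row by row.
# Rows: positive test, negative test, marginal. Columns: disease present, disease absent, marginal.
pJointDens = matrix(
                    c(0.00099, 0.04995, 0.05094,
                      0.00001, 0.94905, 0.94906,
                      0.00100, 0.99900, 1.00000)
                    , nrow = 3, ncol = 3, byrow = TRUE)

N = 100000  # population size

# Expected frequency in each cell = cell probability * population size
pCellFreq = pJointDens*N
print(pCellFreq)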

     [,1]  [,2]   [,3]
[1,]   99  4995   5094
[2,]    1 94905  94906
[3,]  100 99900 100000
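
As a cross-check, the same frequency table can be derived from the three quantities behind Table 5.4 instead of typing in the joint probabilities. The following is a minimal sketch, assuming the prior of 0.001, the hit rate of 0.99 and the false alarm rate of 0.05 stated in the exercise. The variable names (`pDisease`, `hitRate`, `falseAlarmRate`, `freqTable`) are illustrative and not taken from the textbook.

```r
# Assumed inputs, taken from the exercise text
N <- 100000              # population size
pDisease <- 0.001        # prior p(theta = +)
hitRate <- 0.99          # p(T = + | theta = +)
falseAlarmRate <- 0.05   # p(T = + | theta = -)

# Marginal frequencies of disease present / absent
nDisease <- N * pDisease
nHealthy <- N * (1 - pDisease)

# Cell frequencies: rows = test result, columns = disease status
freqTable <- matrix(
  c(nDisease * hitRate,       nHealthy * falseAlarmRate,
    nDisease * (1 - hitRate), nHealthy * (1 - falseAlarmRate)),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("T=+", "T=-"), c("D=+", "D=-"))
)

# Append the marginal column (row sums) and marginal row (column sums)
freqTable <- cbind(freqTable, Marginal = rowSums(freqTable))
freqTable <- rbind(freqTable, Marginal = colSums(freqTable))

print(round(freqTable))
```

Rounded to whole people, this should reproduce the frequencies printed above, which confirms that the table is fully determined by the prior and the two test characteristics.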


#### B.

The hit rate of 0.99 makes it tempting to guess that about 99% of the people who test positive have the disease. The frequencies in the row T = + say otherwise: of the 5,094 people who test positive, only 99 have the disease while 4,995 do not. That is roughly 100 out of 5,000, so the intuitive answer read off the natural frequencies is about 2%.

The exact proportion is calculated in R:

In [4]:
propDPos_TPos = pCellFreq[1, 1]/pCellFreq[1, 3]
print(propDPos_TPos)

[1] 0.01943463


This value matches the result of applying Bayes' rule to the probabilities in Table 5.4, as also calculated in ex. 05.01.
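
To make the link to Bayes' rule explicit, note that the population size cancels when the ratio of frequencies is rewritten in terms of the probabilities of Table 5.4. The notation below follows the $\text{T}$ and $\text{D}$ symbols used in part C of this solution, with $N = 100{,}000$:

\begin{equation*}
p(\text{D} \, = \, \text{positive} \, | \, \text{T} \, = \, \text{positive}) = \frac{99}{99 + 4995} = \frac{0.00099 \times N}{0.05094 \times N} = \frac{p(\text{T} \, = \, \text{positive} \, | \, \text{D} \, = \, \text{positive}) \times p(\text{D} \, = \, \text{positive})}{p(\text{T} \, = \, \text{positive})} \approx 0.0194
\end{equation*}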

#### C.

Given that $N = 10{,}000{,}000$ and that 0.1% of the population has the disease (99.9% does not), we expect $p(\text{D} \, = \, \text{positive}) \times N = 10{,}000$ people with the disease and $p(\text{D} \, = \, \text{negative}) \times N = 9{,}990{,}000$ people without it.

The number of people with the disease who test positive is:

\begin{equation*}
p(\text{T} \, = \, \text{positive} \, | \, \text{D} \, = \, \text{positive}) \times p(\text{D} \, = \, \text{positive}) \times N = 0.99 \times 10000 = 9900
\end{equation*}

Of those, the number who test negative on the retest is:

\begin{equation*}
p(\text{T} \, = \, \text{negative} \, | \, \text{D} \, = \, \text{positive}) \times p(\text{T} \, = \, \text{positive} \, | \, \text{D} \, = \, \text{positive}) \times p(\text{D} \, = \, \text{positive}) \times N = 0.01 \times 9900 = 99
\end{equation*}

The number of people without the disease who test positive is:

\begin{equation*}
p(\text{T} \, = \, \text{positive} \, | \, \text{D} \, = \, \text{negative}) \times p(\text{D} \, = \, \text{negative}) \times N = 0.05 \times 9990000 = 499500
\end{equation*}

Of those, the number who test negative on the retest is:

\begin{equation*}
p(\text{T} \, = \, \text{negative} \, | \, \text{D} \, = \, \text{negative}) \times p(\text{T} \, = \, \text{positive} \, | \, \text{D} \, = \, \text{negative}) \times p(\text{D} \, = \, \text{negative}) \times N = 0.95 \times 499500 = 474525
\end{equation*}
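
The four steps above can be collected into one short calculation, which also gives the total the problem asks for: the expected number of people who test positive and then negative. This is a sketch under the assumption, implicit in the equations above, that the retest has the same hit rate and false alarm rate as the first test and is independent of it given disease status. The variable names are illustrative.

```r
N <- 10000000            # population size
pDisease <- 0.001        # p(D = positive)
hitRate <- 0.99          # p(T = positive | D = positive)
falseAlarmRate <- 0.05   # p(T = positive | D = negative)

# Level 1 of the tree: split the population by disease status
nDisease <- N * pDisease
nHealthy <- N * (1 - pDisease)

# Level 2: first test comes back positive
nDiseasePos <- nDisease * hitRate
nHealthyPos <- nHealthy * falseAlarmRate

# Level 3: retest comes back negative, among those who tested positive first
nDiseasePosNeg <- nDiseasePos * (1 - hitRate)
nHealthyPosNeg <- nHealthyPos * (1 - falseAlarmRate)

# Total expected to test positive and then negative
nPosNeg <- nDiseasePosNeg + nHealthyPosNeg
print(round(c(nDiseasePosNeg, nHealthyPosNeg, nPosNeg)))
```

With the numbers from the exercise, the two branches add up to 99 + 474,525 = 474,624 people who are expected to test positive and then negative.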

#### D.

Of the people who test positive and then negative on the retest, the proportion who actually have the disease is:

In [6]:
print(99/(474525 + 99))

[1] 0.0002085862


This is the same answer as in ex. 05.01, as expected.
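
The agreement is not a coincidence: dividing the two branch frequencies is the same arithmetic as applying Bayes' rule twice, once per test result. The sketch below illustrates this with a small helper, `updatePosterior`, which is a hypothetical name introduced here and not taken from ex. 05.01. It again assumes that both tests share the hit rate of 0.99 and the false alarm rate of 0.05.

```r
# Hypothetical helper: one application of Bayes' rule for a binary disease state.
# prior      = p(D = positive) before seeing the test result
# likDisease = p(observed result | D = positive)
# likHealthy = p(observed result | D = negative)
updatePosterior <- function(prior, likDisease, likHealthy) {
  (likDisease * prior) / (likDisease * prior + likHealthy * (1 - prior))
}

hitRate <- 0.99          # p(T = positive | D = positive)
falseAlarmRate <- 0.05   # p(T = positive | D = negative)

# First test positive, starting from the prior of 0.001
postAfterPos <- updatePosterior(0.001, hitRate, falseAlarmRate)

# Retest negative, using the previous posterior as the new prior
postAfterPosNeg <- updatePosterior(postAfterPos, 1 - hitRate, 1 - falseAlarmRate)

# Compare with the natural-frequency ratio from the tree
print(c(postAfterPosNeg, 99 / (474525 + 99)))
```

The two printed values should agree up to floating-point rounding, since the population size cancels out of the frequency ratio.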