<div style="width: 38.5%;">
    <p><strong>City College of San Francisco</strong></p>
    <hr>
    <p>MATH 108 - Foundations of Data Science</p>
</div>

# Lecture 36: Updating Predictions

Associated Textbook Sections: [18.0 - 18.2](https://inferentialthinking.com/chapters/18/Updating_Predictions.html)

<h2>Set Up the Notebook</h2>

In [None]:
from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
import warnings
warnings.simplefilter('ignore')

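def create_population(prior_disease_prob, n):
    """Return a pivot table counting n people by disease status (rows) and test result (columns).

    Assumes a 100% true positive rate and a 5% false positive rate.
    """
    # Number of people with and without the disease
    disease = round(n * prior_disease_prob)
    no_disease = round(n * (1 - prior_disease_prob))

    status = np.array(['Disease'] * disease + ['No disease'] * no_disease)
    # Everyone with the disease tests positive; 5% of the rest also test positive
    result = np.array(['Test +'] * disease +
                      ['Test +'] * round(no_disease * 0.05) +
                      ['Test -'] * round(no_disease * 0.95))

    t = Table().with_columns(
        'Status', status,
        'Test Result', result
    )
    return t.pivot('Test Result', 'Status')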

## Decisions

### Decisions Under Uncertainty

[*Interpretation by Physicians of Clinical Laboratory Results (1978)*](https://www.nejm.org/doi/full/10.1056/nejm197811022991808)

> We asked 20 house officers, 20 fourth-year medical students and 20 attending physicians, selected in 67 consecutive hallway encounters at four Harvard Medical School teaching hospitals, the following question:
>
> **If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming that you know nothing about the person's symptoms or signs?**
>
> Eleven of 60 participants, or 18%, gave the correct answer. These participants included four of 20 fourth-year students, three of 20 residents in internal medicine and four of 20 attending physicians. The most common answer, given by 27, was that \[the chance that a person found to have a positive result actually has the disease\] was 95%.

## Conditional Probability

### Scenario 1

* Scenario:
    * Class consists of second years (60%) and third years (40%)
    * 50% of the second years have declared their major
    * 80% of the third years have declared their major
* Pick one student at random.
* Which is more likely: Second year or Third year?

*Response: ...*

### Scenario 2

* Slightly different scenario:
    * Class consists of second years (60%) and third years (40%)
    * 50% of the second years have declared their major
    * 80% of the third years have declared their major
* Pick one student at random... 
* That student has declared a major!
* Which is more likely: Second Year or Third Year?

### Demo: Scenario 2

The following table represents the above scenario with 100 students.

In [None]:
n = 100
second = round(n * 0.6)
third = round(n * 0.4)

year = np.array(['Second'] * second + ['Third'] * third)
major = np.array(['Declared'] * (round(second * 0.5)) + ['Undeclared'] * (round(second * 0.5)) + \
                 ['Declared'] * (round(third * 0.8))  + ['Undeclared'] * (round(third * 0.2)))
                 
students = Table().with_columns(
    'Year', year,
    'Major', major
)
students.show(3)

In [None]:
students.pivot('Major', 'Year')

Verify: 60% of students are Second years, 40% are Third years

In [None]:
...

Verify: 50% of Second years have Declared

In [None]:
...

Verify: 80% of Third years have Declared

In [None]:
...

Calculate: the chance that the student is a third year, given that they have declared, P(third year | declared)

In [None]:
...

Calculate: P(second year | declared)

In [None]:
...
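
As a cross-check on the demo above, the same quantities can be computed from plain counts, without the `datascience` library. This is only a sketch: the variable names are illustrative, and the class size of 100 matches the table above.

```python
# Counts for a hypothetical class of 100 students, matching the scenario above.
n = 100
second = round(n * 0.6)                  # 60 second years
third = round(n * 0.4)                   # 40 third years

second_declared = round(second * 0.5)    # 30 second years have declared
third_declared = round(third * 0.8)      # 32 third years have declared
declared = second_declared + third_declared  # 62 declared students in total

# Conditional probabilities: restrict attention to declared students only.
p_third_given_declared = third_declared / declared
p_second_given_declared = second_declared / declared

print(round(p_third_given_declared, 4))   # 0.5161
print(round(p_second_given_declared, 4))  # 0.4839
```

Although second years are the majority of the class, third years make up slightly more than half of the declared students.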

## Bayes' Rule

### Purpose of Bayes' Rule

* Update your prediction based on new information
* In a multi-stage experiment, find the chance of an event at an earlier stage, given the result of a later stage
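
The update described above can be sketched as a small helper for the two-branch case. The function name `bayes_posterior` and its parameter names are illustrative assumptions, not part of the `datascience` library; the numbers in the usage line come from Scenario 2.

```python
def bayes_posterior(prior, likelihood_if_true, likelihood_if_false):
    """Return P(hypothesis | evidence) for a two-branch tree.

    prior               -- P(hypothesis), the first-stage chance
    likelihood_if_true  -- P(evidence | hypothesis)
    likelihood_if_false -- P(evidence | not hypothesis)
    """
    numerator = prior * likelihood_if_true
    denominator = numerator + (1 - prior) * likelihood_if_false
    return numerator / denominator

# Scenario 2: hypothesis = "third year", evidence = "declared".
p = bayes_posterior(prior=0.4, likelihood_if_true=0.8, likelihood_if_false=0.5)
print(round(p, 4))  # 0.5161
```

The prior (0.4) is the chance at the earlier stage; the result is that chance updated by what was observed at the later stage.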

### Diagram and Terminology

<img src="img/lec36_diagram_1.png" width=70%>

### Data & Calculation

<img src="img/lec36_diagram_2.png" width=50%>

* Pick a student at random.
* Posterior probability: $$P(\mbox{Third Year} \mid \mbox{Declared}) = \frac{(0.4 \times 0.8)}{(0.4 \times 0.8) + (0.6 \times 0.5)} \approx 0.5161$$
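
As a complementary check, the same tree gives the posterior for the other branch. The two posteriors add up to 1 because a declared student must be in one of the two years:

$$P(\mbox{Second Year} \mid \mbox{Declared}) = \frac{(0.6 \times 0.5)}{(0.4 \times 0.8) + (0.6 \times 0.5)} \approx 0.4839$$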

In [None]:
prob_third_year_given_declared = ...
prob_third_year_given_declared

### Bayes' Rule and Covid Testing

<img src="img/lec36_bayes.png" width=70%>

*Image Source: [The obscure maths theorem that governs the reliability of Covid testing - The Guardian](https://www.theguardian.com/world/2021/apr/18/obscure-maths-bayes-theorem-reliability-covid-lateral-flow-tests-probability)*

### Example: Doctors & Clinical Tests

<img src="img/lec36_diagram_3.png" width=50%>

* The problem did not give the true positive rate.
* That's the chance the test says "positive" if the person has the disease.
* The calculation below assumes it is 100%.
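
Because the 100% true positive rate is an assumption, it can help to see how sensitive the answer is to it. The sketch below recomputes the posterior for a few true positive rates; the values 0.9 and 0.8 are hypothetical and do not come from the study, and the function name is illustrative.

```python
prior = 1 / 1000            # prevalence given in the question
false_positive_rate = 0.05  # false positive rate given in the question

def posterior_given_positive(true_positive_rate):
    """Return P(disease | test +) for the given true positive rate."""
    true_pos = prior * true_positive_rate
    false_pos = (1 - prior) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# 1.0 is the lecture's assumption; 0.9 and 0.8 are hypothetical alternatives.
for tpr in [1.0, 0.9, 0.8]:
    print(tpr, round(posterior_given_positive(tpr), 4))
```

For each of these rates the posterior stays below 2%, so the conclusion does not depend heavily on the assumed true positive rate.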


### Data and Calculation - Doctor

<img src="img/lec36_diagram_4.png" width=50%>


Posterior probability: $$P(\mbox{Disease} \mid \mbox{Test +}) = \frac{(0.001 \times 1)}{(0.001 \times 1) + (0.999 \times 0.05)} \approx 0.019627$$

In [None]:
prob_disease_given_pos_test = (0.001 * 1) / ((0.001 * 1) + (0.999 * 0.05))
prob_disease_given_pos_test

See this probability in an example population of 10,000 individuals.

In [None]:
create_population(1/1000, 10000)

In [None]:
...
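
The population table above can also be read as counts of people. The sketch below rebuilds those counts in plain Python, mirroring the rounding used in `create_population`; it is an illustration under the same assumptions (100% true positive rate, 5% false positive rate), and the variable names are hypothetical.

```python
n = 10000
prior = 1 / 1000

disease = round(n * prior)              # people with the disease
no_disease = round(n * (1 - prior))     # people without the disease

true_positives = disease                       # everyone with the disease tests positive
false_positives = round(no_disease * 0.05)     # 5% of the rest, rounded to whole people

# Among everyone who tests positive, what fraction has the disease?
p_disease_given_pos = true_positives / (true_positives + false_positives)
print(true_positives, false_positives)   # 10 500
print(round(p_disease_given_pos, 4))     # 0.0196
```

Because whole people are counted, the 499.5 expected false positives become 500, so the count-based answer (10 out of 510) differs from 0.019627 in the later decimal places.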

## Subjective Probabilities

### Subjective Probabilities

* A probability of an outcome is ...
    * The frequency with which it will occur in repeated trials, or
    * The subjective degree of belief that it will occur (or has occurred)
* Why use subjective priors?
    * In order to quantify a belief that is relevant to a decision
    * If the subject of your prediction was not selected randomly from the population


### A Subjective Opinion

<img src="img/lec36_diagram_5.png" width=70%>

Calculate P(disease | tested +) if the prior probability of disease is 0.1.

In [None]:
...

See this probability in an example population of 10,000 individuals.

In [None]:
create_population(1/10, 10000)

In [None]:
...

### A Different Subjective Opinion

<img src="img/lec36_diagram_6.png" width=70%>

Calculate P(disease | tested +) if the prior probability of disease is 0.5.

In [None]:
...

See this probability in an example population of 10,000 individuals.

In [None]:
create_population(0.5, 10000)

In [None]:
...
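
To see how much the prior matters, the three priors used in this lecture (1/1000, 0.1, and 0.5) can be compared side by side. This is a minimal sketch, assuming the same 5% false positive rate and 100% true positive rate as in the doctors' example; the helper name is illustrative.

```python
def p_disease_given_positive(prior, true_positive_rate=1.0, false_positive_rate=0.05):
    """Return P(disease | test +) using Bayes' rule on a two-branch tree."""
    true_pos = prior * true_positive_rate
    false_pos = (1 - prior) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# Population prevalence, then the two subjective priors from this section.
for prior in [1 / 1000, 0.1, 0.5]:
    print(prior, round(p_disease_given_positive(prior), 4))
# 0.001 0.0196
# 0.1 0.6897
# 0.5 0.9524
```

The same positive test result leads to very different conclusions: roughly a 2%, 69%, or 95% chance of disease, depending on the prior.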

<footer>
    <hr>
    <p>Adapted from UC Berkeley DATA 8 course materials.</p>
    <p>This content is offered under a <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">CC Attribution Non-Commercial Share Alike</a> license.</p>
</footer>