<div style="width: 38.5%;">
    <p><strong>City College of San Francisco</strong></p>
    <hr>
    <p>MATH 108 - Foundations of Data Science</p>
</div>

# Lecture 36: Updating Predictions

Associated Textbook Sections: [18.0 - 18.2](https://inferentialthinking.com/chapters/18/Updating_Predictions.html)

## Outline

* [Decisions](#Decisions)
* [Conditional Probability](#Conditional-Probability)
* [Tree Diagrams](#Tree-Diagrams)
* [Bayes' Rule](#Bayes'-Rule)
* [Subjective Probabilities](#Subjective-Probabilities)

## Set Up the Notebook

In [None]:
from datascience import *
import numpy as np
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

## Decisions

### Decisions Under Uncertainty

[*Interpretation by Physicians of Clinical Laboratory Results (1978)*](https://www.nejm.org/doi/full/10.1056/nejm197811022991808)

> We asked 20 house officers, 20 fourth-year medical students and 20 attending physicians, selected in 67 consecutive hallway encounters at four Harvard Medical School teaching hospitals, the following question:
>
> _If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming that you know nothing about the person's symptoms or signs?_
>
> Eleven of 60 participants, or 18%, gave the correct answer. These participants included four of 20 fourth-year students, three of 20 residents in internal medicine and four of 20 attending physicians. The most common answer, given by 27, was that \[the chance that a person found to have a positive result actually has the disease\] was 95%.

### Medical Testing Scenario

* Rare disease with prevalence of 1/1000 in population
* There is a test (e.g., antigen test) with the following properties
    * False Positive Rate of 5%: If you do NOT have the disease then 5% of the time, the test says you do.
    * False Negative Rate of 1%: If you DO have the disease then 1% of the time, the test says you do not have the disease.
* If you sample a person at random and they test positive, what is the chance they have the rare disease?

### Truth and Test Results

All patients fall into one of 4 categories:

<img src="./img/lec36_truth_and_test_results.png" alt="Table showing the 4 possible outcomes for the patient." width=70%>

### False Positive Rate

<img src="./img/lec36_false_positive_rate.png" alt="Same table focusing on the first row, as if the test indicated positive."  width=70%>

### False Negative Rate

<img src="./img/lec36_false_negative_rate.png" alt="Same table focusing on the first row, as if the test indicated negative."  width=70%>

### Another Scenario

* Class consists of Freshmen (60%) and Sophomores (40%)
* Some of the students have declared their major
    * 50% of the Freshmen have declared their major
    * 80% of the Sophomores have declared their major
* I pick one student at random ... 
* That student has declared a major!
* Which is more likely: Freshman or Sophomore?

### What do these scenarios have in common?

* There is some chance event that I am interested in 
    * person has a disease
    * the student's year
* I start with some prior (before observing anything) information about that quantity: P(Disease) or P(Year)
* I then observe something whose value depends probabilistically on the original chance event
    * the test is positive
    * the student has declared
* Neither observation exactly determines the original event
* How do I update the probability of the original event given the additional information?

## Conditional Probability

### Conditional Probability

The probability of an event given some information (the probability is conditioned on that information).

Example: 
* “80% of Sophomores have declared” 
* Notation: P(Declared | Sophomore) = 0.8



### Conditional vs Joint Probabilities 

* Recall the joint probability of two events: 
    * P(Declared, Sophomore) = chance of a random student being both declared and a Sophomore
* Conditional probability (the stuff after | is given):
    * P(Declared | Sophomore) = chance of a random Sophomore being declared 
* Which one is bigger?

Answer: the conditional. We will see why in a moment.
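
As a quick check with the class scenario from earlier (40% Sophomores, 80% of whom have declared), the joint probability is the conditional multiplied by P(Sophomore), and multiplying by a number no larger than 1 can only shrink it. A minimal sketch; the variable names are illustrative:

```python
# Numbers from the class scenario: 40% Sophomores, 80% of Sophomores declared
pr_sophomore = 0.4
pr_declared_given_sophomore = 0.8

# Joint probability: P(Declared, Sophomore) = P(Sophomore) * P(Declared | Sophomore)
pr_declared_and_sophomore = pr_sophomore * pr_declared_given_sophomore

print(round(pr_declared_and_sophomore, 2))    # joint
print(pr_declared_given_sophomore)            # conditional
```

The joint probability comes out to about 0.32, well below the conditional 0.8, because only 40% of the class are Sophomores to begin with.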

### An Example

In [None]:
from IPython.display import IFrame
IFrame('https://docs.google.com/presentation/d/e/2PACX-1vRiLsFDsuuT\
_fGEkjNJJ5Yv6MdEkWshYniIDyrzR4F4vN7UkAUgwT-MrhUTy8_gxwyhLv3rTleNScXw\
/embed?start=false&loop=false&delayms=3000', 960, 569)

## Tree Diagrams

### Tree Diagrams

In [None]:
from IPython.display import IFrame
IFrame('https://docs.google.com/presentation/d/e/2PACX-1vTYqt2\
-0qckaBNAHfug29S4o0IV-tCrPkOp3a01wWsx65iyAmpFX3gI9ROkaZ21Syf77\
xyiIIDrGAgS/embed?start=false&loop=false&delayms=3000', 960, 569)
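
As a rough sketch of what a tree-diagram calculation looks like for the class scenario: the probability of each path is the product of the probabilities along its branches, and conditioning on "declared" keeps only the paths that end in "declared". The dictionary layout and names below are assumptions for illustration and are not taken from the slides.

```python
# First level of the tree: the student's year
pr_year = {"Freshman": 0.6, "Sophomore": 0.4}
# Second level: chance of having declared, given the year
pr_declared_given_year = {"Freshman": 0.5, "Sophomore": 0.8}

# Multiply along each path that ends in "declared"
pr_year_and_declared = {
    year: pr_year[year] * pr_declared_given_year[year] for year in pr_year
}

# Condition on "declared": divide each path by the total over declared paths
pr_declared = sum(pr_year_and_declared.values())
pr_year_given_declared = {
    year: p / pr_declared for year, p in pr_year_and_declared.items()
}

for year, p in pr_year_given_declared.items():
    print(year, round(p, 3))
```

With these numbers the declared student is slightly more likely to be a Sophomore (about 0.516 versus 0.484), even though Sophomores are the smaller group.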

## Bayes' Rule

### Bayes' Rule

In [None]:
from IPython.display import IFrame
IFrame('https://docs.google.com/presentation/d/e/2PACX-1vSTI_AHfonqA-\
ww_uTioJOpF_sy8PHvEkaZ1B0ahy-KdKXygejBtQeQpIACZ0xNLnEYCfTbfkSC3Klw/\
embed?start=false&loop=false&delayms=3000', 960, 569)

### A Closer Look at the Answer

Assume a patient is picked at random.
* Prior probability of disease 
    * P(Disease) = 0.001 = one-tenth of 1%
* Posterior probability of disease given positive test
    * P(Disease | Test positive) = 0.0194... ≈ 2%
* Bigger than the prior, but still pretty small
* Should we approve such a test?
    * The test has low error rates compared to most tests
* How can this be?
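
One way to see how this can be is to count people instead of multiplying probabilities. The sketch below imagines a population of 100,000 (an arbitrary size chosen for round numbers) and applies the rates from the medical testing scenario; the variable names are illustrative.

```python
population = 100_000  # hypothetical population size, chosen for round numbers

# Split by true disease status (prevalence 1/1000)
diseased = population * 0.001
healthy = population - diseased

# Positive tests in each group
true_positives = diseased * 0.99    # 1% false negative rate
false_positives = healthy * 0.05    # 5% false positive rate

# Among everyone who tests positive, what fraction actually has the disease?
pr_disease_given_positive = true_positives / (true_positives + false_positives)

print(round(true_positives), round(false_positives))
print(round(pr_disease_given_positive, 4))
```

Roughly 4,995 healthy people test positive alongside only 99 diseased people, so the false positives swamp the true positives even though the test's error rates are low.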


### Assumptions Matter

* "Assume a patient is picked at random."
    * But usually, people aren’t picked at random for medical tests
    * So our intuition about randomly picked patients may not be great
* For a randomly picked patient, the result does make sense, because the disease is very rare.
* What if the doctor believes there is a 10% chance the patient has the disease?

### Bayes' Rule and Covid Testing

<img src="img/lec36_bayes.png" width=50%>

*Image Source: [The obscure maths theorem that governs the reliability of Covid testing - The Guardian](https://www.theguardian.com/world/2021/apr/18/obscure-maths-bayes-theorem-reliability-covid-lateral-flow-tests-probability)*

### Demo: Bayes' Rule

Create a function that calculates $P(A \mid B) = \frac{P(A) \cdot P(B\mid A)}{P(B)}$

In [None]:
def bayes_rule(pr_a, pr_b_given_a, pr_b_given_not_a):
    """
    Bayes' Rule
    P(A | B) = P(A)P(B|A) / P(B)
    
    To Compute P(B)
        P(B) = P(B, A) + P(B, Not A) 
             = P(A)P(B|A) + P(Not A)P(B | Not A)
    """
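    # Total probability of B, summed over the A and Not A branches
    pr_b = pr_a * pr_b_given_a + (1 - pr_a) * pr_b_given_not_a
    return pr_a * pr_b_given_a / pr_b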

Use `bayes_rule` to calculate the probability for the original medical question.

In [None]:
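pr_disease = 1/1000               # prevalence of the disease
pr_pos_given_disease = 1 - 0.01   # 1 - false negative rate
pr_pos_given_no_disease = 0.05    # false positive rate

bayes_rule(pr_disease, pr_pos_given_disease, pr_pos_given_no_disease)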

How does the conditional probability change when the prior is larger?

In [None]:
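pr_disease_update = 0.1           # e.g., the doctor's 10% belief from "Assumptions Matter"
pr_pos_given_disease = 0.99
pr_pos_given_no_disease = 0.05

bayes_rule(pr_disease_update, pr_pos_given_disease, pr_pos_given_no_disease)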

Notice how quickly the posterior probability climbs as the prior probability increases.

In [None]:
pr_disease = np.arange(1,999)/1000
pr_pos_given_disease = 0.99
pr_pos_given_no_disease = 0.05

post = bayes_rule(pr_disease, pr_pos_given_disease, pr_pos_given_no_disease)
Table().with_columns(
    "Prior Pr(Disease)", pr_disease, 
    "Posterior Pr(Disease | Pos. Test)", post).iplot("Prior Pr(Disease)")
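
For readers without the interactive plot, a few spot values tell the same story. This sketch recomputes the posterior in plain Python; the helper name `posterior_given_positive` and the chosen priors are illustrative, and the test rates are the 0.99 and 0.05 used above.

```python
def posterior_given_positive(prior, pr_pos_given_disease=0.99,
                             pr_pos_given_no_disease=0.05):
    """P(Disease | Positive test) for a given prior, via Bayes' Rule."""
    pr_pos = (prior * pr_pos_given_disease
              + (1 - prior) * pr_pos_given_no_disease)
    return prior * pr_pos_given_disease / pr_pos

# A handful of priors, from very rare to a coin flip
spot_priors = [0.001, 0.01, 0.05, 0.1, 0.5]
spot_posteriors = [posterior_given_positive(p) for p in spot_priors]

for prior, posterior in zip(spot_priors, spot_posteriors):
    print(f"prior {prior:.3f} -> posterior {posterior:.3f}")
```

The posteriors come out to roughly 2%, 17%, 51%, 69%, and 95%: moving the prior from 0.1% to just 5% is already enough to make a positive result more likely right than wrong.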

## Subjective Probabilities

### Subjective Probabilities

* A probability of an outcome can be thought of as:
    * One perspective: the frequency with which it will occur in repeated trials
    * Another perspective: the **subjective** degree of belief that it will occur (or has occurred)
* Why use **subjective** priors?
    * In order to quantify a belief that is relevant to a decision
    * If the subject of your prediction was not selected randomly from the population


<footer>
    <hr>
    <p>Adapted from UC Berkeley DATA 8 course materials.</p>
    <p>This content is offered under a <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">CC Attribution Non-Commercial Share Alike</a> license.</p>
</footer>