# Hypothesis Testing Review and Type I/Type II Errors

CBE 20258. Numerical and Statistical Analysis. Spring 2020.

&#169; Alexander Dowling, University of Notre Dame

In [1]:
# load libraries
import scipy.stats as stats
import numpy as np
import math
import matplotlib.pyplot as plt

## Learning Objectives

After studying this notebook, completing the activities, participating in class, and reviewing your notes, you should be able to:
* Formulate null and alternative hypotheses from a problem description
* Draw conclusions by interpreting a calculated P-value
* Explain Type I and Type II errors in the context of an application

## Review Hypothesis Testing Conclusions

**Further Reading:** §6.2 in Navidi (2015)

<div style="background-color: rgba(0,255,0,0.05) ; padding: 10px; border: 1px solid darkgreen;"> 
<b>Home Activity</b>: At a minimum, skim §6.2 (pg. 409 - 414) in Navidi (2015).
</div>

### Test Your Understanding

We want to check the calibration of a scale by weighing a standard 10g weight 100 times. Let $\mu$ be the population mean reading on the scale, so that the scale is in calibration if $\mu = 10$. A test is made of the hypotheses $H_0: \mu = 10$ versus $H_a: \mu \neq 10$.
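As a rough illustration of how a P-value for this test could be computed, here is a minimal sketch of a large-sample, two-sided z-test. The readings are simulated, not real measurements; the random seed, the spread of 0.05 g, and the variable names are all assumptions made for this example.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical data: 100 simulated scale readings (not real measurements)
rng = np.random.default_rng(0)
readings = rng.normal(loc=10.0, scale=0.05, size=100)

mu_0 = 10.0                   # value of mu under the null hypothesis
n = readings.size             # number of readings
x_bar = readings.mean()       # sample mean
s = readings.std(ddof=1)      # sample standard deviation

# Large-sample z statistic for H0: mu = 10
z = (x_bar - mu_0) / (s / np.sqrt(n))

# Two-sided P-value: probability, assuming H0 is true, of a
# statistic at least as extreme as the one observed
p_value = 2 * stats.norm.sf(abs(z))

print(f"z = {z:.3f}, P-value = {p_value:.3f}")
```

Changing `loc` to a value such as 10.02 simulates a miscalibrated scale and typically gives a much smaller P-value.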

<div style="background-color: rgba(0,255,0,0.05) ; padding: 10px; border: 1px solid darkgreen;"> 
<b>Home Activity</b>: Answer the following multiple choice questions.
</div>

Which of the following is the best interpretation of the conclusion "H$_0$ is rejected"?
1. The scale is in calibration.
2. The scale is not in calibration.
3. The scale might be in calibration.

Store your answer in the Python integer `ans_18a_i`.


In [2]:
### BEGIN SOLUTION
ans_18a_i = 2
### END SOLUTION

In [3]:
### BEGIN HIDDEN TESTS
secret_answer = 2

assert ans_18a_i == secret_answer, "What is the null hypothesis? What does it mean to reject H0?"

### END HIDDEN TESTS

Which of the following is the best interpretation of the conclusion "Failed to reject H$_0$"?
1. The scale is in calibration.
2. The scale is not in calibration.
3. The scale might be in calibration.

Store your answer in the Python integer `ans_18a_ii`.

In [4]:
### BEGIN SOLUTION
ans_18a_ii = 3
### END SOLUTION

In [5]:
### BEGIN HIDDEN TESTS
secret_answer = 3

assert ans_18a_ii == secret_answer, "What is the null hypothesis? What does it mean to fail to reject H0?"

### END HIDDEN TESTS

Is it possible to perform a hypothesis test in a way that demonstrates conclusively that the scale is in calibration?
1. Yes
2. No
3. Sometimes yes, sometimes no

Store your answer in the Python integer `ans_18a_iii`.

In [6]:
### BEGIN SOLUTION
ans_18a_iii = 2
### END SOLUTION

In [7]:
### BEGIN HIDDEN TESTS
secret_answer = 2

assert ans_18a_iii == secret_answer, "Is it ever possible to absolutely know the scale is calibrated?"
### END HIDDEN TESTS

Write a sentence to explain your answer to the last question.

**Home Activity Answer:**

## Type I and Type II Errors for Statistical Inference

**Further Reading**: §6.12 and §6.13 in Navidi (2015)

A few classes ago, we said the significance level $\alpha$ is often set to 0.05. This choice, however, affects the rate of wrong conclusions (errors). We will dive into this more today.

### Example: Law and Order

Consider a criminal trial in the American justice system. For simplicity, we'll assume a defendant is either innocent or guilty. Likewise, the jury can either convict or acquit. Let's express these options using the language of hypothesis testing:

![](https://drive.google.com/uc?export=view&id=1vZyz1Dxf3SEJhxKvQGjuyP3l3Lgwg9_l)

Here is another table (same information, different formatting): https://en.wikipedia.org/wiki/Type_I_and_type_II_errors#Table_of_error_types

As we can see, there are **four outcomes**:

1. Correct Inference / True Negative / Probability $1 - \alpha$
2. Correct Inference / True Positive / Probability $1 - \beta$
3. **Type I Error** / False Positive / Probability $\alpha$
4. **Type II Error** / False Negative / Probability $\beta$

The false positive error rate is $\alpha$: this is the probability of rejecting the null hypothesis given that it is true. Thus changing the significance level $\alpha$ gives us direct control of how frequently we make a **Type I error**. We will see later how to compute $\beta$, the **Type II error** rate.
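One way to see the link between $\alpha$ and the Type I error rate is a quick simulation. The sketch below repeats a two-sided z-test many times on data generated with the null hypothesis true and counts how often it is rejected. The mean of 10, the known standard deviation of 0.05, the sample size, the number of trials, and the seed are all hypothetical choices for this example.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(1)

alpha = 0.05        # significance level
mu_0 = 10.0         # the null hypothesis is TRUE in this simulation
sigma = 0.05        # assumed known standard deviation (hypothetical)
n = 100             # observations per simulated experiment
n_trials = 20000    # number of simulated experiments

# Each row is one simulated experiment drawn with H0 true
samples = rng.normal(loc=mu_0, scale=sigma, size=(n_trials, n))

# z statistic and two-sided P-value for every experiment
z = (samples.mean(axis=1) - mu_0) / (sigma / np.sqrt(n))
p_values = 2 * stats.norm.sf(np.abs(z))

# Fraction of experiments where H0 is wrongly rejected
type_I_rate = np.mean(p_values < alpha)
print(f"Simulated Type I error rate: {type_I_rate:.4f}")
```

The simulated rate should land close to the chosen `alpha`. Rerunning with a different `alpha` (for example 0.01) should move the rate accordingly.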

### Type I Errors

Summary: **Null hypothesis is true, but we reject it.**

Other names: "asserting something that is absent", "false hit", "False Positive"

Examples:
* Concluding a new drug is more effective than a placebo when it is not.
* Concluding a manufacturing process is out of calibration when it is not.
* Peter crying wolf when there is no wolf.

The choice of the **significance level** $\alpha$ directly controls the Type I error rate.

### Type II Errors

Summary: **Null hypothesis is false, but we erroneously fail to reject it.**

Other names: "failing to assert what is present", "miss", "False Negative"

Examples:
* Failing to conclude a new drug is more effective than a placebo when it actually is.
* Failing to detect the ozone hole when it is there. *good side tangent*
* The villagers ignoring Peter when the wolf is present.

The Type II error rate, denoted $\beta$, is related to the **power** of a statistical test ($1 - \beta$).
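Unlike $\alpha$, the value of $\beta$ depends on how far the true mean is from the null value. As a minimal sketch (not necessarily the approach used later in the course), the code below computes $\beta$ and the power for a two-sided z-test, assuming a known standard deviation and one specific true mean. The numbers `mu_true`, `sigma`, and `n` are hypothetical.

```python
import numpy as np
import scipy.stats as stats

alpha = 0.05        # significance level
mu_0 = 10.0         # mean under the null hypothesis
mu_true = 10.01     # hypothetical true mean (H0 is false)
sigma = 0.05        # assumed known standard deviation (hypothetical)
n = 100             # sample size

# H0 is rejected when |z| exceeds this critical value
z_crit = stats.norm.ppf(1 - alpha / 2)

# When the true mean is mu_true, the z statistic is normally
# distributed with mean delta and standard deviation 1
delta = (mu_true - mu_0) / (sigma / np.sqrt(n))

# Type II error rate: probability that z falls inside the
# "fail to reject" region [-z_crit, z_crit] when H0 is false
beta = stats.norm.cdf(z_crit - delta) - stats.norm.cdf(-z_crit - delta)
power = 1 - beta

print(f"beta = {beta:.3f}, power = {power:.3f}")
```

Moving `mu_true` farther from `mu_0`, or increasing `n`, makes `delta` larger in magnitude, which lowers `beta` and raises the power.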

<div style="background-color: rgba(0,0,255,0.05) ; padding: 10px; border: 1px solid darkblue;"> 
<b>Class Activity</b>: With a partner, think of a science or engineering example of hypothesis testing.
</div>