# Class 18: Statistical Inference and Hypothesis Testing

CBE 20258. Numerical and Statistical Analysis. Spring 2020.

© University of Notre Dame

## Class 18 Learning Objectives

After studying this notebook, attending class, completing the home activities, and asking questions, you should be able to:
* Define statistical inference to a freshman engineer. Give two science or engineering examples.
* Explain the central limit theorem.
* Use the central limit theorem to calculate probabilities involving the sample mean. Do this with and without standardizing. (Two approaches, same answer.)
* Explain the correct interpretation of a 95% confidence interval.
* Using a picture, explain the 68-95-99.7 rule to someone who just finished Calculus II.
* Calculate any size confidence interval (95%, 99%, etc.) using the z- or t-distribution
 * Explain why the t-distribution is important. Relate this back to the CLT.
 * Check for the assumption required to apply the t-distribution
 * Use a confidence interval to perform hypothesis testing
* Apply the 5-step hypothesis testing procedure
 * Identify null and alternative hypotheses from a problem description
 * Calculate test statistic
 * Draw "area under curve" that corresponds to P-value for a given set of hypotheses


In [1]:
# load libraries
import scipy.stats as stats
import numpy as np
import math
import matplotlib.pyplot as plt

## 18f. Confidence Interval

**Further Reading**: §5.1 in Navidi (2015)

Let's focus on the first question. We want to construct lower and upper bounds from the 25 observations above such that, if we repeated our calculation *many, many* times (i.e., the manufacturer kept sending us batches of 25 samples to test), our procedure would capture the *true mean* **X%** (e.g., 95%, 99%) of the time.

**Key Insights**:
* We are assuming the lifetime of a catalyst can be modeled as a random process (probability distribution) with an unknown mean.
* **Frequentist** statistics cares about the **long-term error rate** of estimation methods.
* We are **not making probabilistic statements** about the catalyst lifetime. For example, we are **not saying there is an X%** chance the true mean falls within this interval. Further reading: https://www.graphpad.com/guides/prism/7/statistics/stat_more_about_confidence_interval.htm
* To make probabilistic statements, we would need to use **Bayesian** statistics (i.e., a credible interval).
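
The "long-term error rate" idea can be illustrated with a small simulation. This is only a sketch: the true mean, true standard deviation, and random seed below are hypothetical values chosen for illustration, not the catalyst data. It repeatedly draws batches of 25 observations, builds a 95% interval from each batch with $z^* = 1.96$, and counts how often the interval contains the true mean.

```python
import numpy as np

# Hypothetical "true" process parameters (assumed for illustration only)
true_mean, true_sd = 6.0, 2.5
n, n_repeats = 25, 10_000

rng = np.random.default_rng(0)
# each row is one batch of n observations
samples = rng.normal(true_mean, true_sd, size=(n_repeats, n))

# sample mean and sample standard deviation of each batch
xbars = samples.mean(axis=1)
sds = samples.std(axis=1, ddof=1)

# 95% interval for each batch using z* = 1.96
half_width = 1.96 * sds / np.sqrt(n)
covered = (xbars - half_width <= true_mean) & (true_mean <= xbars + half_width)

coverage = covered.mean()
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

The fraction should land close to, but typically a little below, 0.95. The shortfall arises because $s$ is estimated from only 25 observations, which is the motivation for the t-distribution.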

### 18f-i. 68-95-99.7 Rule

![rule](https://cdn-images-1.medium.com/max/1600/1*IZ2II2HYKeoMrdLU5jW6Dw.png)

Image credit: https://towardsdatascience.com/understanding-the-68-95-99-7-rule-for-a-normal-distribution-b7b7cbf760c2
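
The three percentages in the rule can be checked numerically. The sketch below uses the standard normal CDF from `scipy.stats` to compute the area within 1, 2, and 3 standard deviations of the mean; the dictionary name `within` is just an illustrative choice.

```python
import scipy.stats as stats

# area under the standard normal curve within k standard deviations of the mean
within = {}
for k in (1, 2, 3):
    within[k] = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"P(-{k} < Z < {k}) = {within[k]:.4f}")
```

Note that exactly 95% of the area corresponds to about 1.96 standard deviations, not 2, which is why 1.96 appears in the 95% confidence interval.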

### 18f-ii. Confidence Interval Formula

$$\bar{x} \pm z^* \frac{s}{\sqrt{n}}$$

Elements:
* $\bar{x}$ sample mean
* $z^*$ critical z-score for the chosen confidence level (e.g., 1.96 for 95%)
* $s$ sample standard deviation
* $n$ sample size
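
One way to package this formula is as a small helper function. This is a minimal sketch: the function name `z_confidence_interval` and the summary statistics passed to it are hypothetical and are not the catalyst data.

```python
import math
import scipy.stats as stats

def z_confidence_interval(xbar, s, n, level=0.95):
    """Return (low, high) for xbar +/- z* s/sqrt(n) at the given confidence level."""
    # z* leaves (1 - level)/2 of the area in each tail of the standard normal
    zstar = stats.norm.ppf(0.5 + level / 2)
    margin = zstar * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# hypothetical summary statistics, for illustration only
example_low, example_high = z_confidence_interval(xbar=10.0, s=2.0, n=100, level=0.95)
print(f"95% confidence interval: [{example_low:.3f}, {example_high:.3f}]")
```

Because the margin scales with $1/\sqrt{n}$, quadrupling the sample size halves the width of the interval.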

### 18f-iii. Catalyst Life Example

Calculate a 95% confidence interval for the catalyst example.

In [14]:
xbar - 1.96*s/math.sqrt(len(lifetime))

4.998343547921752

In [15]:
xbar + 1.96*s/math.sqrt(len(lifetime))

7.129656452078248

95% Confidence interval for mean catalyst lifetime: 5.00 to 7.13 hours

Alternatively, we can use Python or a z-table to determine $z^*$ instead of hard-coding 1.96.

In [16]:
# lifetime, xbar, and s are defined in earlier cells
n = len(lifetime)
std_err = s/math.sqrt(n)  # standard error of the mean

# calculate 95% confidence interval
# stats.norm.interval returns the pair (-z*, +z*) for the standard normal
zstar95 = stats.norm.interval(0.95)
low = xbar + zstar95[0]*std_err
high = xbar + zstar95[1]*std_err
print("95% confidence interval: [",round(low,2),",", round(high,2),"] hours")

# calculate 90% confidence interval
zstar90 = stats.norm.interval(0.9)
low = xbar + zstar90[0]*std_err
high = xbar + zstar90[1]*std_err
print("90% confidence interval: [",round(low,2),",", round(high,2),"] hours")

# calculate 99% confidence interval
zstar99 = stats.norm.interval(0.99)
low = xbar + zstar99[0]*std_err
high = xbar + zstar99[1]*std_err
print("99% confidence interval: [",round(low,2),",", round(high,2),"] hours")

95% confidence interval: [ 5.0 , 7.13 ] hours
90% confidence interval: [ 5.17 , 6.96 ] hours
99% confidence interval: [ 4.66 , 7.46 ] hours
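
The intervals widen as the confidence level increases because $z^*$ grows. As a quick check, the sketch below compares `stats.norm.interval` with the inverse CDF `stats.norm.ppf`, which gives the same $z^*$ a z-table would; the dictionary name `zstars` is an illustrative choice.

```python
import scipy.stats as stats

zstars = {}
for level in (0.90, 0.95, 0.99):
    # z* from the inverse CDF: leave (1 - level)/2 of the area in the upper tail
    zstars[level] = stats.norm.ppf(1 - (1 - level) / 2)
    # the same value is the upper end of the symmetric interval
    lower, upper = stats.norm.interval(level)
    print(f"level = {level:.2f}: z* = {zstars[level]:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```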
