<div style="width: 38.5%;">
    <p><strong>City College of San Francisco</strong></p>
    <hr>
    <p>MATH 108 - Foundations of Data Science</p>
</div>

# Lecture 27: Interpreting Confidence

Associated Textbook Sections: [13.3, 13.4](https://inferentialthinking.com/chapters/13/3/Confidence_Intervals.html)

## Outline

* [Visualizing Confidence](#Visualizing-Confidence)
* [Use Methods Appropriately](#Use-Methods-Appropriately)
* [Confidence Interval for Unknown Population Mean](#Confidence-Interval-for-Unknown-Population-Mean)
* [Confidence Intervals For Testing](#Confidence-Intervals-For-Testing)

## Set Up the Notebook

In [None]:
from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

---

## Visualizing Confidence

> The confidence is in the process that gives the interval: It generates a "good" interval about 95% of the time.

<img src="./img/visualizing_confidence.png" width=30%>

* Each yellow line in the visual represents a confidence interval from a fresh sample from the population.
* The red line marks the parameter being estimated by the intervals.
* Approximately 95% of the yellow lines intersect the red line.

A similar tool to help visualize the meaning of a confidence interval: [Interpreting Confidence Intervals](https://rpsychologist.com/d3/ci/)
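
The picture above can also be reproduced by simulation. The sketch below is not from the textbook. It uses a hypothetical, simulated population (not the `baby.csv` data) whose mean is known, draws many fresh samples, builds a percentile bootstrap interval for each one, and counts how often the interval contains the parameter. It uses plain NumPy instead of the course's `datascience` library, and `bootstrap_ci_mean` and `simulate_coverage` are illustrative names only.

```python
import numpy as np


def bootstrap_ci_mean(sample, rng, repetitions=1000, level=95):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    n = len(sample)
    # Each row is one resample of size n, drawn with replacement.
    resamples = rng.choice(sample, size=(repetitions, n), replace=True)
    boot_means = resamples.mean(axis=1)
    tail = (100 - level) / 2  # 2.5 for a 95% interval
    return np.percentile(boot_means, tail), np.percentile(boot_means, 100 - tail)


def simulate_coverage(num_intervals=200, sample_size=100, seed=0):
    """Fraction of bootstrap intervals that contain the known population mean."""
    rng = np.random.default_rng(seed)
    # Hypothetical simulated population (NOT the baby.csv data).
    population = rng.normal(loc=27, scale=6, size=10_000)
    parameter = population.mean()  # the fixed parameter (the red line)
    hits = 0
    for _ in range(num_intervals):
        # A fresh random sample gives one interval (one yellow line).
        sample = rng.choice(population, size=sample_size, replace=False)
        left, right = bootstrap_ci_mean(sample, rng)
        if left <= parameter <= right:
            hits += 1
    return hits / num_intervals


coverage = simulate_coverage()
print(f"Fraction of intervals that contain the parameter: {coverage:.3f}")
```

The printed fraction should land near, though not exactly at, 0.95: the confidence describes the long-run behavior of the process, and any single run of the simulation is itself random.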

## Use Methods Appropriately

### When **Not** to Use Our Bootstrap Method

* If you're trying to estimate a parameter that is strongly affected by rare elements of the population, such as very high or very low percentiles, or the minimum and maximum
* If the probability distribution of your statistic is not roughly bell-shaped (the shape of the empirical distribution of the bootstrapped statistics is a good clue)
* If the original sample is very small
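
To see why the minimum and maximum are a problem, here is a small NumPy sketch (not from the textbook; the sample is simulated, and `bootstrap_ci_max` is a hypothetical helper). A resample can only contain values that are already in the original sample, so no bootstrap maximum can exceed the sample maximum, and the interval can never reach the true upper end of the population.

```python
import numpy as np


def bootstrap_ci_max(sample, rng, repetitions=2000, level=95):
    """Percentile bootstrap interval for the population maximum (shown to fail)."""
    resamples = rng.choice(sample, size=(repetitions, len(sample)), replace=True)
    boot_maxes = resamples.max(axis=1)  # largest value in each resample
    tail = (100 - level) / 2
    return np.percentile(boot_maxes, tail), np.percentile(boot_maxes, 100 - tail)


rng = np.random.default_rng(42)
# Hypothetical population: values spread uniformly between 0 and 1,
# so the parameter (the upper end of the population's range) is 1.
sample = rng.uniform(0, 1, size=50)
left, right = bootstrap_ci_max(sample, rng)

print(f"sample max: {sample.max():.3f}, interval: ({left:.3f}, {right:.3f})")
# No resample can contain a value above the sample max, so neither can the interval.
print("Interval contains 1?", bool(left <= 1 <= right))
```

Repeating this with new samples would produce an interval that misses the parameter essentially every time, far from the advertised 95%.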

---

## Confidence Interval for Unknown Population Mean

Load the `baby.csv` data from the Kaiser-supported study of the relationship between smoking during pregnancy and low birth weight.

In [None]:
births = Table.read_table('./data/baby.csv')
births

Visualize the distribution of maternal ages.

In [None]:
births.hist('Maternal Age')

Compute the mean age of mothers in the sample.

In [None]:
np.mean(births.column('Maternal Age'))

**Question**: What is the mean age of the mothers in the population?

Define a function to create one bootstrap resample and calculate the mean age for that resample.

In [None]:
def one_bootstrap_mean():
    ...
    return ...
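
One possible shape for this step, sketched under assumptions: with the course's `datascience` library, `births.sample()` (which, with no arguments, resamples all rows with replacement) followed by the mean of `Maternal Age` is the usual route. The sketch below instead works on a plain NumPy array, and `one_bootstrap_mean_sketch` is a hypothetical name chosen so it does not clash with the exercise function.

```python
import numpy as np


def one_bootstrap_mean_sketch(ages, rng):
    """Mean of one bootstrap resample: same size as `ages`, drawn with replacement."""
    ages = np.asarray(ages)
    resample = rng.choice(ages, size=ages.size, replace=True)
    return np.mean(resample)


rng = np.random.default_rng(0)
# Tiny made-up array just to exercise the function; in the lecture the input
# would be births.column('Maternal Age').
print(one_bootstrap_mean_sketch(np.array([22, 27, 31, 35]), rng))
```

The two essential choices are that the resample has the same size as the original sample and that it is drawn with replacement; without replacement, every resample would just be a reshuffle with the same mean.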

Generate the means of 3000 bootstrap samples.

In [None]:
...
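
A sketch of the repetition step, assuming NumPy only: `bootstrap_statistics` is a hypothetical helper that takes the statistic as a function, so the same loop works for a mean, a median, or anything else. The lecture version would more typically start from `make_array()` and `np.append` inside a `for` loop; preallocating the result array here is just a design choice.

```python
import numpy as np


def bootstrap_statistics(sample, statistic, repetitions=3000, seed=None):
    """Apply `statistic` to `repetitions` bootstrap resamples of `sample`."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    results = np.empty(repetitions)
    for i in range(repetitions):
        # Each resample has the original sample size and is drawn with replacement.
        resample = rng.choice(sample, size=sample.size, replace=True)
        results[i] = statistic(resample)
    return results


# Hypothetical usage with a small made-up array in place of the Maternal Age column.
means = bootstrap_statistics(np.array([22, 27, 31, 35, 19, 40]), np.mean, seed=0)
print(len(means), means.min(), means.max())
```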

Get the endpoints of the 95% confidence interval.

In [None]:
left = ...
right = ...

make_array(left, right)
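
The endpoints of a 95% interval are the 2.5th and 97.5th percentiles of the bootstrap statistics, which leaves 2.5% in each tail. The sketch below (hypothetical helper `ci_endpoints`) uses `np.percentile`, which interpolates between values by default; the course's `datascience` `percentile` function uses a rank-based definition instead, so its endpoints can differ slightly.

```python
import numpy as np


def ci_endpoints(bootstrap_stats, level=95):
    """Endpoints of the middle `level` percent of the bootstrap statistics."""
    tail = (100 - level) / 2  # 2.5 for a 95% interval
    left = np.percentile(bootstrap_stats, tail)
    right = np.percentile(bootstrap_stats, 100 - tail)
    return np.array([left, right])


# With the 100 values 1, 2, ..., 100, linear interpolation gives [3.475, 97.525].
print(ci_endpoints(np.arange(1, 101)))
```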

Visualize the distribution of the bootstrap sample means.

In [None]:
resampled_means = Table().with_columns(
    'Bootstrap Sample Mean', ...
)
resampled_means.hist(bins=15)
plt.plot([left, right], [0, 0], color='yellow', lw=8);
print(f"We are 95% confident that the mean age of the mothers in the population is between {left} and {right} years old.")

### Can You Use a CI Like This?

By our calculation, an approximate 95% confidence interval for the average age of the mothers in the population is (26.9, 27.6) years.

True or False:
About 95% of the mothers in the population were between 26.9 years and 27.6 years old.

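Answer: _False. We're estimating that the **average** age of the mothers in the population is in this interval. Individual ages are far more spread out, as the histogram below shows._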

In [None]:
births.hist('Maternal Age')
plt.plot([left, right], [0, 0], color='yellow', lw=8);
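
The same point can be checked numerically. In this sketch the sample is simulated (a hypothetical stand-in, not the `baby.csv` ages), and the bootstrap resampling is vectorized with NumPy. It computes a 95% interval for the mean and then the fraction of individual values that fall inside it.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical simulated sample of 1,000 values (NOT the baby.csv data).
sample = rng.normal(loc=27, scale=6, size=1000)

# 95% percentile bootstrap interval for the mean (2000 resamples, one per row).
boot_means = rng.choice(sample, size=(2000, sample.size), replace=True).mean(axis=1)
left, right = np.percentile(boot_means, [2.5, 97.5])

# Fraction of individual values that land inside the interval for the mean.
inside = np.mean((sample >= left) & (sample <= right))
print(f"interval: ({left:.2f}, {right:.2f}); fraction of values inside: {inside:.3f}")
```

The interval for the mean is narrow because sample means vary much less than individual values, so only a small fraction of the individual values fall inside it.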

### Is This What a CI Means?

An approximate 95% confidence interval for the average age of the mothers in the population is (26.9, 27.6) years.

True or False:
There is a 0.95 probability that the average age of mothers in the population is in the range 26.9 to 27.6 years.

Answer: _False. The parameter is fixed, and the interval (26.9, 27.6) is fixed. The parameter is either in that interval or not. Once you've picked an interval, there's no probability involved._

### 95% Confidence

* Interval of estimates of a parameter
* Based on random sampling
* The process results in a random interval
* A "good" interval is one that contains the parameter
* The confidence is in the process that creates the interval: It generates a "good" interval (approximately) 95% of the time.


---

## Confidence Intervals For Testing

### Using a CI for Testing

* Null hypothesis: $\text{Population mean} = x$
* Alternative hypothesis: $\text{Population mean} \neq x$
* Cutoff for p-value: p%
* Method:
    * Construct a (100-p)% confidence interval for the population mean
    * Make a decision:
        * If x is not in the interval, reject the null
        * If x is in the interval, can't reject the null
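
The method above can be sketched end to end, assuming NumPy and a percentile bootstrap interval for the mean. `ci_hypothesis_test` is a hypothetical helper, not a library function. A p-value cutoff of p% pairs with a (100 - p)% interval, so p/2 percent is cut from each tail.

```python
import numpy as np


def ci_hypothesis_test(sample, null_value, cutoff_percent=5, repetitions=5000, seed=None):
    """Test H0: population mean == null_value using a bootstrap CI.

    Builds a (100 - cutoff_percent)% percentile bootstrap interval for the mean
    and rejects H0 when null_value falls outside it.
    """
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    # One bootstrap mean per row of resampled values.
    boot_means = rng.choice(sample, size=(repetitions, sample.size), replace=True).mean(axis=1)
    tail = cutoff_percent / 2  # 2.5 on each side for a 5% cutoff
    left, right = np.percentile(boot_means, [tail, 100 - tail])
    reject = not (left <= null_value <= right)
    return (left, right), reject


# Hypothetical usage with simulated data (NOT the baby.csv ages).
data = np.random.default_rng(3).normal(27, 6, size=500)
interval, reject = ci_hypothesis_test(data, 25, seed=0)
print(interval, "reject the null" if reject else "can't reject the null")
```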


### Using the Confidence Interval for Testing Hypotheses

**Null:** The mean age of mothers in the population is 25 years; the random sample average differs from 25 only due to chance.

**Alternative:** The mean age of the mothers in the population is not 25 years.

Suppose you use the 5% cutoff for the p-value.

Based on the confidence interval, which hypothesis would you pick?

Answer: _Since 25 is not in our 95% confidence interval for the mean age, (26.9, 27.6), we reject the null hypothesis in favor of the alternative._

---

<footer>
    <p>Adapted from UC Berkeley DATA 8 course materials.</p>
    <p>This content is offered under a <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">CC Attribution Non-Commercial Share Alike</a> license.</p>
</footer>