# 04 Confidence Intervals

Quantifying uncertainty: how to construct, interpret, and visualize confidence intervals correctly.

## Table of Contents
- [What is a confidence interval?](#what-is-a-confidence-interval)
- [Constructing a CI for the mean](#constructing-a-ci-for-the-mean)
- [Visualizing the repeated sampling interpretation](#visualizing-the-repeated-sampling-interpretation)
- [Effect of sample size on CI width](#effect-of-sample-size-on-ci-width)
- [Effect of confidence level on CI width](#effect-of-confidence-level-on-ci-width)
- [Confidence intervals for regression coefficients](#confidence-intervals-for-regression-coefficients)
- [Margin of error and practical significance](#margin-of-error-and-practical-significance)
- [Checkpoint (Self-Check)](#checkpoint-self-check)
- [Solutions (Reference)](#solutions-reference)

## Why This Notebook Matters
Confidence intervals appear in every regression table, every policy report, and every
empirical paper. They are arguably more useful than p-values because they convey the
direction and magnitude of an estimate together with its uncertainty. Yet they are among
the most commonly misinterpreted statistics. This notebook builds the correct intuition
through simulation.

## Prerequisites (Quick Self-Check)
- Completed notebooks 00-03 (descriptive stats, distributions, CLT, z-scores).
- Understanding of the normal and t-distributions.
- Understanding of the standard error of the mean.

## What You Will Produce
- (no file output; learning/analysis notebook)

## Success Criteria
- You can construct a confidence interval for a mean using both z and t approaches.
- You can correctly state what a 95% CI means (and what it does NOT mean).
- You can explain how sample size and confidence level affect CI width.
- You can interpret CIs for regression coefficients.

## Common Pitfalls
- Saying "there is a 95% probability the true mean is in this interval" (WRONG).
- Using z-critical values when the sample is small (use t instead).
- Ignoring CI width and focusing only on whether it contains zero.
- Confusing statistical precision (narrow CI) with practical importance.

## Quick Fixes (When You Get Stuck)
- For a 95% CI, the t-critical value with large n is approximately 1.96.
- `scipy.stats.t.interval(confidence, df, loc, scale)` computes the interval directly (the first argument is named `confidence` in SciPy 1.9 and later; older releases call it `alpha`).
- In statsmodels: `res.conf_int(alpha=0.05)` gives 95% CIs for all coefficients.
- If you see `ModuleNotFoundError`, re-run the bootstrap cell.
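
As a quick illustration of the first fix above, the sketch below prints two-sided 95% t critical values for a few degrees of freedom next to the normal value. The df values are arbitrary choices for illustration.

```python
from scipy import stats

# Two-sided 95% critical value from the standard normal
z_crit = stats.norm.ppf(0.975)

# Matching t critical values for increasing degrees of freedom
t_crits = {}
for dof in (5, 10, 30, 100, 1000):
    t_crits[dof] = stats.t.ppf(0.975, df=dof)
    print(f'df={dof:>4}: t_crit={t_crits[dof]:.4f}  (z_crit={z_crit:.4f})')
```

The t value is always above the normal value and approaches it as df grows.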

## Matching Guide
- `docs/guides/00_statistics_primer/04_confidence_intervals.md`

## How To Use This Notebook
- Work section-by-section; don't skip the markdown.
- Most code cells are incomplete on purpose: replace TODOs and `...`, then run.
- After each section, write 2–4 sentences answering the interpretation prompts (what changed, why it matters).
- Prefer `data/processed/*` if you have built the real datasets; otherwise use the bundled `data/sample/*` fallbacks.
- Use the **Checkpoint (Self-Check)** section to catch mistakes early.
- Use **Solutions (Reference)** only to unblock yourself; then re-implement without looking.
- Use the matching guide (`docs/guides/00_statistics_primer/04_confidence_intervals.md`) for the math, assumptions, and deeper context.

<a id="environment-bootstrap"></a>
## Environment Bootstrap
Run this cell first. It makes the repo importable and defines common directories.

In [None]:
from __future__ import annotations

from pathlib import Path
import sys


def find_repo_root(start: Path) -> Path:
    p = start
    for _ in range(8):
        if (p / 'src').exists() and (p / 'docs').exists():
            return p
        p = p.parent
    raise RuntimeError('Could not find repo root. Start Jupyter from the repo root.')


PROJECT_ROOT = find_repo_root(Path.cwd())
if str(PROJECT_ROOT) not in sys.path:
    sys.path.append(str(PROJECT_ROOT))

DATA_DIR = PROJECT_ROOT / 'data'
RAW_DIR = DATA_DIR / 'raw'
PROCESSED_DIR = DATA_DIR / 'processed'
SAMPLE_DIR = DATA_DIR / 'sample'

PROJECT_ROOT

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.api as sm

plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['figure.dpi'] = 100

## Load Data

We load macroeconomic quarterly data. The GDP growth columns give us a realistic
economic variable to build confidence intervals around.

In [None]:
df = pd.read_csv(SAMPLE_DIR / 'macro_quarterly_sample.csv', index_col=0, parse_dates=True)
print(f'Shape: {df.shape}')
df.head()

<a id="what-is-a-confidence-interval"></a>
## What is a confidence interval?

### Goal
Understand the precise definition of a confidence interval and dispel the most common
misconception about its interpretation.

### Why this matters in economics
Every empirical paper, policy brief, and regression table reports confidence intervals.
Misinterpreting them leads to overconfident policy recommendations and flawed conclusions.
Getting the definition right is not pedantry — it changes how you think about uncertainty.

---

### Definition

A **confidence interval** is a range constructed from sample data using a procedure
that, **across repeated samples**, captures the true population parameter a specified
fraction of the time.

For a 95% CI:
- If you could draw 100 independent samples and build a 95% CI from each,
  about 95 of those intervals would, on average, contain the true parameter.
- The remaining ~5 would miss it.

### The critical misconception

> **WRONG:** "There is a 95% probability that the true mean is in this interval."

> **RIGHT:** "This interval was constructed by a procedure that captures the true
> parameter 95% of the time across repeated samples."

The parameter is **fixed** (it does not move). The interval is **random** (it changes
from sample to sample). Once you have computed a specific interval, the true parameter
is either in it or not — there is no probability left to assign.
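
In symbols (the endpoint names $L$ and $U$ are notation introduced here for illustration), the 95% describes the procedure before the data are seen:

$$P\big(L(X_1,\dots,X_n) \le \mu \le U(X_1,\dots,X_n)\big) = 1 - \alpha = 0.95$$

The probability is taken over the random sample $X_1,\dots,X_n$, with $\mu$ held fixed. Once the data are observed, $L$ and $U$ become two numbers, and the statement $L \le \mu \le U$ is simply true or false.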

### Analogy

A 95% CI is like a fishing net that catches a specific fish 95% of the time you cast it.
Once you have already cast the net and pulled it in, the fish is either in the net or it
is not. Saying "there is a 95% chance the fish is in this net" after you have already
pulled it in confuses the long-run property of the net with the outcome of a single cast.

### Components of a CI for the mean

$$\text{CI} = \bar{x} \pm t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}}$$

where:
- $\bar{x}$ = sample mean (point estimate),
- $t_{\alpha/2,\, n-1}$ = critical value from the t-distribution with $n-1$ degrees of freedom,
- $s$ = sample standard deviation,
- $n$ = sample size,
- $s / \sqrt{n}$ = standard error of the mean.
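
To see how the pieces fit together, here is a minimal sketch that plugs made-up summary statistics into the formula. The values of `n`, `x_bar`, and `s` are hypothetical and are not taken from the notebook's data.

```python
import numpy as np
from scipy import stats

# Hypothetical summary statistics (illustration only)
n = 25        # sample size
x_bar = 2.4   # sample mean
s = 3.0       # sample standard deviation

se = s / np.sqrt(n)                            # standard error of the mean
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 1)   # two-sided 95% critical value
margin = t_crit * se                           # half-width of the interval

ci = (x_bar - margin, x_bar + margin)
print(f'se = {se:.3f}, t_crit = {t_crit:.3f}, CI = [{ci[0]:.3f}, {ci[1]:.3f}]')
```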

### Your Turn (1): State the definition in your own words

In the cell below, write 2–3 sentences explaining what a 95% confidence interval
means. Then write 1 sentence explaining what it does **not** mean.

In [None]:
# TODO: Write your interpretation as a comment or print statement.
#
# What a 95% CI means:
# ...
#
# What it does NOT mean:
# ...
...

**Interpretation prompt:** If someone says "there is a 95% chance the true GDP growth
rate is between 1.8% and 3.2%," what is wrong with that statement? How would you
correct it?

<a id="constructing-a-ci-for-the-mean"></a>
## Constructing a CI for the mean

### Goal
Build a confidence interval for the mean of a real economic variable — first by hand
using the formula, then using `scipy.stats.t.interval()`. Compare z-based and t-based
intervals.

### Why this matters in economics
Estimating the average GDP growth rate, unemployment rate, or inflation rate with
uncertainty is a fundamental task. The choice between z and t critical values matters
when sample sizes are small (as they often are in macro quarterly data).

### Your Turn (1): Compute a 95% CI by hand

In [None]:
# Extract GDP growth data (drop NaN values)
gdp_growth = df['gdp_growth_qoq_annualized'].dropna()

# TODO: Compute the sample statistics
n = ...           # sample size
x_bar = ...       # sample mean
s = ...           # sample standard deviation
se = ...          # standard error of the mean = s / sqrt(n)

print(f'n = {n}')
print(f'x_bar = {x_bar:.4f}')
print(f's = {s:.4f}')
print(f'se = {se:.4f}')

# TODO: Get the t-critical value for 95% confidence with n-1 degrees of freedom
alpha = 0.05
t_crit = ...      # Hint: stats.t.ppf(1 - alpha/2, df=n-1)

print(f't_crit = {t_crit:.4f}')

# TODO: Compute the confidence interval
ci_lower = ...    # x_bar - t_crit * se
ci_upper = ...    # x_bar + t_crit * se

print(f'\n95% CI for mean GDP growth (by hand): [{ci_lower:.4f}, {ci_upper:.4f}]')

### Your Turn (2): Verify with scipy

In [None]:
# TODO: Use scipy.stats.t.interval() to compute the same CI.
# Hint: stats.t.interval(confidence=0.95, df=n-1, loc=x_bar, scale=se)
ci_scipy = ...

print(f'95% CI (scipy): [{ci_scipy[0]:.4f}, {ci_scipy[1]:.4f}]')

# TODO: Verify these match your hand-computed values.
# Hint: np.isclose(ci_lower, ci_scipy[0])
...

### Your Turn (3): Compare z-based vs t-based CIs

In [None]:
# The z-based CI uses the normal distribution instead of t.
# For large n, z and t are nearly identical.
# For small n, the t-interval is wider (more conservative).

# TODO: Compute z-critical value for 95% confidence
z_crit = ...      # Hint: stats.norm.ppf(1 - alpha/2)

ci_z_lower = x_bar - z_crit * se
ci_z_upper = x_bar + z_crit * se

print(f'z_crit = {z_crit:.4f}, t_crit = {t_crit:.4f}')
print(f'95% CI (z-based): [{ci_z_lower:.4f}, {ci_z_upper:.4f}]')
print(f'95% CI (t-based): [{ci_lower:.4f}, {ci_upper:.4f}]')
print(f'Width (z): {ci_z_upper - ci_z_lower:.4f}')
print(f'Width (t): {ci_upper - ci_lower:.4f}')

# TODO: For small samples, repeat this comparison with only the first 10 observations.
# How much wider is the t-interval vs the z-interval when n is small?
gdp_small = gdp_growth.iloc[:10]
n_small = ...
x_bar_small = ...
se_small = ...
t_crit_small = ...
z_crit_small = z_crit  # z doesn't depend on n

ci_t_small = (x_bar_small - t_crit_small * se_small, x_bar_small + t_crit_small * se_small)
ci_z_small = (x_bar_small - z_crit_small * se_small, x_bar_small + z_crit_small * se_small)

print(f'\nSmall sample (n={n_small}):')
print(f't_crit = {t_crit_small:.4f} vs z_crit = {z_crit_small:.4f}')
print(f'Width (z): {ci_z_small[1] - ci_z_small[0]:.4f}')
print(f'Width (t): {ci_t_small[1] - ci_t_small[0]:.4f}')

**Interpretation prompt:**
- Why is the t-interval wider than the z-interval for small samples?
- At what sample size does the difference become negligible?
- For macro quarterly data (where n is typically 60–100 quarters), does the choice matter much?

<a id="visualizing-the-repeated-sampling-interpretation"></a>
## Visualizing the repeated sampling interpretation

### Goal
Make the definition of "95% confidence" tangible. Generate 100 samples from a known
population, build a 95% CI from each, and visualize which intervals capture the true
mean and which miss.

### Why this matters in economics
When you report a 95% CI for the effect of a minimum wage increase on employment, you
are relying on a procedure that, if applied to many independent studies, would produce
intervals containing the true effect about 95% of the time. This simulation makes that
abstract property concrete.

### Your Turn (1): Repeated-sampling CI simulation

In [None]:
# Simulation parameters
rng = np.random.default_rng(42)
true_mean = 2.5        # True population mean (e.g., true average GDP growth %)
true_std = 3.0         # True population standard deviation
sample_size = 30       # Each sample draws this many observations
n_simulations = 100    # Number of repeated samples
confidence = 0.95

# TODO: For each simulation, draw a sample, compute the CI, and store results.
results = []
for i in range(n_simulations):
    # Draw a random sample from the population
    sample = ...  # Hint: rng.normal(loc=true_mean, scale=true_std, size=sample_size)

    # Compute sample statistics
    x_bar_sim = ...  # sample mean
    se_sim = ...     # standard error = sample std / sqrt(n)

    # Compute the 95% CI using scipy
    ci_low, ci_high = ...  # Hint: stats.t.interval(...)

    # Does this interval contain the true mean?
    contains_true = ...  # Hint: ci_low <= true_mean <= ci_high

    results.append({
        'sim': i,
        'x_bar': x_bar_sim,
        'ci_low': ci_low,
        'ci_high': ci_high,
        'contains_true': contains_true
    })

ci_df = pd.DataFrame(results)

# How many intervals missed the true mean?
n_miss = (ci_df['contains_true'] == False).sum()
print(f'{n_miss} out of {n_simulations} intervals missed the true mean.')
print(f'Coverage rate: {ci_df["contains_true"].mean():.1%}')

### Your Turn (2): Plot all 100 intervals

In [None]:
# TODO: Create a plot with 100 horizontal lines, one per CI.
# Color intervals that contain the true mean in blue/green,
# and intervals that miss in red.

fig, ax = plt.subplots(figsize=(10, 12))

for _, row in ci_df.iterrows():
    color = ...  # 'steelblue' if contains_true, else 'red'
    ax.hlines(y=row['sim'], xmin=row['ci_low'], xmax=row['ci_high'],
              color=color, linewidth=1.2, alpha=0.7)
    ax.plot(row['x_bar'], row['sim'], 'o', color=color, markersize=2)

# Draw the true mean as a vertical line
ax.axvline(x=true_mean, color='black', linestyle='--', linewidth=1.5, label=f'True mean = {true_mean}')

ax.set_xlabel('GDP Growth (%)')
ax.set_ylabel('Simulation index')
ax.set_title(f'{n_simulations} Confidence Intervals ({confidence:.0%}): {n_miss} miss the true mean (red)')
ax.legend(loc='upper right')
plt.tight_layout()
plt.show()

**Interpretation prompt:**
- Approximately how many red intervals did you expect? How many did you actually get?
- If you re-run the simulation with a different random seed, will you get exactly the same count?
- Does the fact that some intervals miss the true mean make them "wrong"? Why or why not?
- How does this visualization connect to the fishing-net analogy from the "What is a confidence interval?" section?

<a id="effect-of-sample-size-on-ci-width"></a>
## Effect of sample size on CI width

### Goal
Demonstrate that confidence interval width shrinks roughly in proportion to $1/\sqrt{n}$.
More data means more precision.

### Why this matters in economics
Data collection is expensive. Understanding how CI width scales with sample size helps
economists plan surveys, decide whether to collect more quarterly observations, and
evaluate the precision of estimates from small vs. large datasets. Quadrupling the
sample size only halves the CI width.
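
The arithmetic behind that last claim can be checked directly. The sketch below holds the sample standard deviation fixed at an assumed value (`s=3.0`, chosen only for illustration) and computes the t-based width for several sample sizes. The helper name `ci_width` is introduced here and is not part of the notebook.

```python
import numpy as np
from scipy import stats


def ci_width(n, s=3.0, confidence=0.95):
    """Width of a t-based CI for the mean, given n and an assumed fixed s."""
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    return 2 * t_crit * s / np.sqrt(n)


for n_obs in (25, 100, 400, 1600):
    print(f'n={n_obs:>5}: width={ci_width(n_obs):.4f}')

# Quadrupling n from 100 to 400
ratio = ci_width(400) / ci_width(100)
print(f'width(400) / width(100) = {ratio:.4f}')
```

The ratio comes out slightly below 0.5 because the t critical value also shrinks a little as n grows. In real samples, `s` changes from sample to sample as well, so observed widths only follow this pattern approximately.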

### Your Turn (1): Compute CIs for different sample sizes

In [None]:
# We simulate from a known population to isolate the sample-size effect.
rng = np.random.default_rng(123)
population = rng.normal(loc=2.5, scale=3.0, size=10_000)

sample_sizes = [10, 30, 100, 500]
alpha = 0.05

# TODO: For each sample size, draw a sample, compute a 95% CI, and record the width.
size_results = []
for n_i in sample_sizes:
    sample = rng.choice(population, size=n_i, replace=False)
    x_bar_i = ...
    se_i = ...
    ci_low_i, ci_high_i = ...  # Hint: stats.t.interval(...)
    width_i = ...

    size_results.append({
        'n': n_i,
        'x_bar': x_bar_i,
        'ci_low': ci_low_i,
        'ci_high': ci_high_i,
        'width': width_i
    })

size_df = pd.DataFrame(size_results)
size_df

### Your Turn (2): Plot CI width vs sample size

In [None]:
# TODO: Create a plot showing CI width as a function of sample size.
# Overlay a theoretical curve proportional to 1/sqrt(n) for comparison.

fig, ax = plt.subplots(figsize=(8, 5))

ax.plot(size_df['n'], size_df['width'], 'o-', color='steelblue', linewidth=2,
        markersize=8, label='Observed CI width')

# Theoretical scaling: width ~ c / sqrt(n)
# Use the first point to calibrate the constant c
c = ...  # size_df['width'].iloc[0] * np.sqrt(size_df['n'].iloc[0])
n_range = np.linspace(10, 500, 200)
theoretical_width = c / np.sqrt(n_range)
ax.plot(n_range, theoretical_width, '--', color='gray', alpha=0.7,
        label=r'Theoretical: $c / \sqrt{n}$')

ax.set_xlabel('Sample size (n)')
ax.set_ylabel('95% CI width')
ax.set_title('Confidence interval width shrinks with sample size')
ax.legend()
plt.tight_layout()
plt.show()

**Interpretation prompt:**
- By what factor does the CI width decrease when you go from n=10 to n=100?
  Does this match the $1/\sqrt{n}$ scaling (i.e., roughly $\sqrt{10}$ times narrower)?
- If you are a policy researcher with a budget to survey households, and you want
  to cut your CI width in half, how many more observations do you need?
- Why does the theoretical curve not match perfectly? (Hint: think about what else
  changes with sample size.)

<a id="effect-of-confidence-level-on-ci-width"></a>
## Effect of confidence level on CI width

### Goal
Show the trade-off between confidence level and precision: higher confidence requires
a wider interval.

### Why this matters in economics
In practice, economists must choose a confidence level. A 99% CI is more cautious but
less precise. An 80% CI is narrower but misses more often. The choice reflects the
cost of being wrong: a central bank forecasting inflation may prefer 99% confidence,
while a quick exploratory analysis might use 90%.
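
The trade-off is driven by the critical value, which grows as the confidence level rises. A small sketch using normal critical values (a large-sample approximation; the t values for a given dataset will be somewhat larger):

```python
from scipy import stats

levels_demo = [0.80, 0.90, 0.95, 0.99]
z_by_level = {}
for conf in levels_demo:
    alpha = 1 - conf
    # Two-sided critical value: leave alpha/2 in each tail
    z_by_level[conf] = stats.norm.ppf(1 - alpha / 2)
    print(f'{conf:.0%} confidence -> z_crit = {z_by_level[conf]:.4f}')
```

For a fixed standard error, the interval width is proportional to this critical value.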

### Your Turn (1): Compute CIs at different confidence levels

In [None]:
# Use the GDP growth data
gdp_growth = df['gdp_growth_qoq_annualized'].dropna()
n = len(gdp_growth)
x_bar = gdp_growth.mean()
se = gdp_growth.std(ddof=1) / np.sqrt(n)

conf_levels = [0.80, 0.90, 0.95, 0.99]

# TODO: Compute a CI for each confidence level and store results.
level_results = []
for conf in conf_levels:
    ci_low_c, ci_high_c = ...  # Hint: stats.t.interval(confidence=conf, df=n-1, loc=x_bar, scale=se)
    width_c = ...

    level_results.append({
        'confidence': f'{conf:.0%}',
        'ci_low': ci_low_c,
        'ci_high': ci_high_c,
        'width': width_c
    })

level_df = pd.DataFrame(level_results)
level_df

### Your Turn (2): Visualize nested intervals

In [None]:
# TODO: Plot the nested CIs as horizontal bars, one on top of the other.
# The widest (99%) should be at the top, the narrowest (80%) at the bottom.

colors = ['#2ca02c', '#1f77b4', '#ff7f0e', '#d62728']
fig, ax = plt.subplots(figsize=(10, 4))

for i, row in level_df.iterrows():
    ax.barh(y=row['confidence'], width=row['width'],
            left=row['ci_low'], height=0.5,
            color=colors[i], alpha=0.7, edgecolor='black')
    ax.text(row['ci_high'] + 0.05, i, f"width={row['width']:.3f}",
            va='center', fontsize=10)

ax.axvline(x=x_bar, color='black', linestyle='--', label=f'Sample mean = {x_bar:.3f}')
ax.set_xlabel('GDP Growth (%, annualized)')
ax.set_title('Confidence intervals at different confidence levels')
ax.legend(loc='upper right')
plt.tight_layout()
plt.show()

**Interpretation prompt:**
- How much wider is the 99% CI compared to the 80% CI?
- In what situation would you prefer an 80% CI over a 99% CI?
- Is a 99.99% CI always better? What is the cost of extreme confidence?

<a id="confidence-intervals-for-regression-coefficients"></a>
## Confidence intervals for regression coefficients

### Goal
Bridge from CIs for the mean to CIs for regression coefficients. Fit a simple
regression with `statsmodels`, extract coefficient CIs, and interpret them.

### Why this matters in economics
In applied econometrics, you rarely just estimate a mean. You estimate the effect of
one variable on another while controlling for confounders. The [0.025, 0.975] columns
in `statsmodels` output give you the 95% CI for each coefficient. If zero is inside
the CI for a coefficient, you cannot reject the null of no effect at the 5% level.
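
The same recipe (estimate plus or minus critical value times standard error) carries over to regression coefficients. The sketch below computes classical OLS intervals with plain NumPy on synthetic data; the data-generating values are assumptions for illustration only. For a default, non-robust OLS fit, `res.conf_int(alpha=0.05)` is expected to report intervals built the same way.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_obs = 80
x = rng.normal(5.0, 1.5, size=n_obs)                     # synthetic regressor
y = 4.0 - 0.3 * x + rng.normal(0.0, 2.0, size=n_obs)     # synthetic outcome

X = np.column_stack([np.ones(n_obs), x])                 # add an intercept column
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)         # OLS estimates

resid = y - X @ beta_hat
k = X.shape[1]                                           # number of coefficients
sigma2_hat = resid @ resid / (n_obs - k)                 # residual variance estimate
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)           # classical covariance matrix
se_beta = np.sqrt(np.diag(cov_beta))

t_crit = stats.t.ppf(0.975, df=n_obs - k)                # residual degrees of freedom
ci_beta = np.column_stack([beta_hat - t_crit * se_beta,
                           beta_hat + t_crit * se_beta])
print(ci_beta)
```

Note that the degrees of freedom are `n - k`, not `n - 1`, because `k` coefficients were estimated.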

### Your Turn (1): Fit a simple regression and extract CIs

In [None]:
# We will regress GDP growth on the unemployment rate.
# This is a simple illustrative regression, not a causal model.

reg_df = df[['gdp_growth_qoq_annualized', 'UNRATE']].dropna()

y = reg_df['gdp_growth_qoq_annualized']
X = sm.add_constant(reg_df[['UNRATE']])

# TODO: Fit the OLS regression
res = ...

# TODO: Print the summary
...

# TODO: Extract the 95% confidence intervals using res.conf_int(alpha=0.05)
ci_table = ...
ci_table.columns = ['CI_lower', 'CI_upper']
print('\n95% Confidence Intervals for Coefficients:')
print(ci_table)

### Your Turn (2): Interpret the regression CI

In [None]:
# TODO: Extract the coefficient and CI for UNRATE.
coef_unrate = ...    # res.params['UNRATE']
ci_low_unrate = ...  # ci_table.loc['UNRATE', 'CI_lower']
ci_high_unrate = ... # ci_table.loc['UNRATE', 'CI_upper']

print(f'Coefficient on UNRATE: {coef_unrate:.4f}')
print(f'95% CI: [{ci_low_unrate:.4f}, {ci_high_unrate:.4f}]')

# TODO: Check whether zero is inside the CI.
zero_in_ci = ...
print(f'\nDoes the CI contain zero? {zero_in_ci}')
if zero_in_ci:
    print('=> Cannot reject H0: no linear association at the 5% level.')
else:
    print('=> Reject H0: evidence of a linear association at the 5% level.')

### Your Turn (3): Visualize the coefficient CI

In [None]:
# TODO: Create a coefficient plot showing each coefficient as a point
# with its CI as a horizontal error bar.

fig, ax = plt.subplots(figsize=(8, 3))

coef_names = res.params.index.tolist()
coef_vals = res.params.values
ci_vals = res.conf_int(alpha=0.05).values

for i, name in enumerate(coef_names):
    ci_err = [[coef_vals[i] - ci_vals[i, 0]], [ci_vals[i, 1] - coef_vals[i]]]
    ax.errorbar(coef_vals[i], i, xerr=ci_err, fmt='o', color='steelblue',
                capsize=5, markersize=8, linewidth=2)

ax.axvline(x=0, color='red', linestyle='--', alpha=0.7, label='Zero (no effect)')
ax.set_yticks(range(len(coef_names)))
ax.set_yticklabels(coef_names)
ax.set_xlabel('Coefficient value')
ax.set_title('Regression coefficients with 95% CIs')
ax.legend()
plt.tight_layout()
plt.show()

**Interpretation prompt:**
- Write one sentence interpreting the CI for the UNRATE coefficient: "The 95% CI for
  the coefficient on unemployment is [a, b], which means..." (Remember that this
  regression is illustrative, not causal.)
- If zero is in the interval, does that mean the true effect is exactly zero?
- How would your interpretation change if you used HAC standard errors?

<a id="margin-of-error-and-practical-significance"></a>
## Margin of error and practical significance

### Goal
Understand the margin of error, its relationship to CI width, and the distinction
between statistical precision and practical importance.

### Why this matters in economics
A narrow confidence interval around a small effect is a precise result showing that
the effect is near zero — this is informative. A wide confidence interval around a
large point estimate is uninformative because the true effect could be anywhere from
large positive to near zero (or even negative). CI width tells you whether you have
enough data to draw a meaningful conclusion.

### Your Turn (1): Compute the margin of error

In [None]:
# The margin of error (ME) is half the CI width.
# CI = point_estimate +/- ME

gdp_growth = df['gdp_growth_qoq_annualized'].dropna()
n = len(gdp_growth)
x_bar = gdp_growth.mean()
se = gdp_growth.std(ddof=1) / np.sqrt(n)

# TODO: Compute the margin of error for a 95% CI
t_crit_95 = stats.t.ppf(0.975, df=n-1)
margin_of_error = ...  # t_crit_95 * se

print(f'Point estimate (mean GDP growth): {x_bar:.4f}')
print(f'Margin of error (95%): {margin_of_error:.4f}')
print(f'CI width: {2 * margin_of_error:.4f}')
print(f'CI: [{x_bar - margin_of_error:.4f}, {x_bar + margin_of_error:.4f}]')

### Your Turn (2): Interpret different scenarios

Below are four hypothetical results from economic studies. For each, interpret
whether the result is informative and what it tells us about practical significance.

In [None]:
# Four hypothetical study results:
# Each tuple: (description, point_estimate, ci_lower, ci_upper)
scenarios = [
    ('Effect of job training on wages ($/hr)',    0.50, 0.10, 0.90),
    ('Effect of tax cut on GDP growth (%)',       1.20, -0.80, 3.20),
    ('Effect of class size on test scores (pts)', 0.02, -0.01, 0.05),
    ('Effect of minimum wage on employment (%)',  -0.30, -0.50, -0.10),
]

print('Scenario Analysis: Margin of Error and Practical Significance')
print('=' * 70)

for desc, pe, ci_lo, ci_hi in scenarios:
    me = (ci_hi - ci_lo) / 2
    contains_zero = ci_lo <= 0 <= ci_hi
    print(f'\n{desc}')
    print(f'  Point estimate: {pe:+.2f}')
    print(f'  95% CI: [{ci_lo:+.2f}, {ci_hi:+.2f}]')
    print(f'  Margin of error: {me:.2f}')
    print(f'  Contains zero: {contains_zero}')

# TODO: For each scenario, write 1-2 sentences interpreting the result.
# Consider:
#   - Is the CI narrow or wide relative to the point estimate?
#   - Is the result statistically significant (CI excludes zero)?
#   - Is the result practically significant (effect large enough to matter)?
#   - Is the result informative (CI narrow enough to draw a conclusion)?
...

# Scenario 1 interpretation: ...
# Scenario 2 interpretation: ...
# Scenario 3 interpretation: ...
# Scenario 4 interpretation: ...

**Interpretation prompt:**
- Which scenario is a "precise null result" (narrow CI around a small effect)?
- Which scenario is uninformative (wide CI that does not rule out important alternatives)?
- Why is "statistically significant but practically unimportant" a real problem in economics?
- Why is a wide CI not the same thing as "no effect"?

---

## Where This Shows Up Later

Confidence intervals are not just a statistics exercise — they are central to every
subsequent module in this project:

- **Regression (02_regression):** Every regression summary in `statsmodels` reports
  the `[0.025, 0.975]` columns — those are 95% CIs for each coefficient. You will
  interpret them to assess whether predictors have meaningful effects.

- **Causal inference (06_causal):** Difference-in-differences, instrumental variables,
  and panel methods all produce treatment effect estimates with CIs. The CI tells you
  not just "is the effect nonzero?" but "how large could the effect plausibly be?"

- **Time series (07_time_series_econ):** Impulse response functions are plotted with
  confidence bands. A response whose CI includes zero at all horizons is not statistically
  significant.

- **Robust inference:** When you switch from OLS standard errors to HC3 or HAC standard
  errors, the coefficients stay the same but the CIs change. Understanding why is key
  to correct inference.

<a id="checkpoint-self-check"></a>
## Checkpoint (Self-Check)

Run the assertions below to verify your understanding. Then write 2–3 sentences
summarizing the key takeaways.

In [None]:
# Quick sanity checks
# Uncomment and fill in after completing the sections above.

# 1. The CI from scipy should match the hand-computed CI.
# assert np.isclose(ci_lower, ci_scipy[0], atol=1e-6)
# assert np.isclose(ci_upper, ci_scipy[1], atol=1e-6)

# 2. The coverage rate from the simulation should be near 0.95.
# assert 0.85 <= ci_df['contains_true'].mean() <= 1.0

# 3. CI width should shrink as sample size grows.
# assert size_df['width'].is_monotonic_decreasing

# 4. CI width should grow as confidence level increases.
# assert level_df['width'].is_monotonic_increasing

# TODO: Write 2-3 summary sentences:
# - What does a 95% CI mean?
# - What is the most common misinterpretation?
# - How do sample size and confidence level affect CI width?
...

## Extensions (Optional)

- **Bootstrap confidence intervals:** Instead of assuming normality, resample the data
  with replacement 10,000 times and take the 2.5th and 97.5th percentiles of the
  bootstrapped means. Compare to the t-based CI.
- **Asymmetric CIs:** For skewed data (e.g., income), bootstrap CIs may be asymmetric
  around the point estimate. Try this with the RSAFS (retail sales) column.
- **Prediction intervals vs. confidence intervals:** A CI covers the mean; a prediction
  interval covers a single new observation. Compute both and compare widths.
- **CI for a proportion:** If you have a binary variable (e.g., recession indicator),
  compute a CI for the proportion using the Wilson or Clopper-Pearson method.
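
As a starting point for the first extension above, here is a minimal sketch of a percentile bootstrap CI for the mean. It uses synthetic right-skewed data rather than a notebook column, so the numbers are illustrative only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = rng.lognormal(mean=0.0, sigma=1.0, size=60)   # synthetic, right-skewed

# Resample with replacement: each row of idx is one bootstrap sample
n_boot = 10_000
idx = rng.integers(0, len(data), size=(n_boot, len(data)))
boot_means = data[idx].mean(axis=1)
boot_low, boot_high = np.percentile(boot_means, [2.5, 97.5])

# t-based CI for comparison (confidence level passed positionally)
x_bar = data.mean()
se = data.std(ddof=1) / np.sqrt(len(data))
t_low, t_high = stats.t.interval(0.95, df=len(data) - 1, loc=x_bar, scale=se)

print(f'Bootstrap 95% CI: [{boot_low:.3f}, {boot_high:.3f}]')
print(f't-based 95% CI:   [{t_low:.3f}, {t_high:.3f}]')
```

The t interval is symmetric around the sample mean by construction, while the percentile interval need not be.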

## Reflection

- What assumptions are you making when constructing a t-based CI?
  (Hint: independence, approximate normality of the sampling distribution.)
- How would autocorrelation in quarterly macro data violate those assumptions?
- If you had to communicate uncertainty about GDP growth to a policymaker who has
  never taken a statistics course, how would you explain a 95% CI?
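
One way to explore the autocorrelation question is a small simulation. The sketch below generates AR(1) series with true mean zero (the persistence `phi=0.6` and series length are assumptions chosen for illustration), builds the usual t-based 95% CI from each series, and records how often the interval covers the true mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
phi, n_obs, n_reps = 0.6, 80, 2000

# Simulate n_reps AR(1) series: x_t = phi * x_{t-1} + eps_t, true mean 0
eps = rng.normal(size=(n_reps, n_obs))
x = np.empty_like(eps)
x[:, 0] = eps[:, 0] / np.sqrt(1 - phi**2)   # draw the start from the stationary distribution
for t in range(1, n_obs):
    x[:, t] = phi * x[:, t - 1] + eps[:, t]

# Naive t-based interval that treats observations as independent
x_bar = x.mean(axis=1)
se = x.std(axis=1, ddof=1) / np.sqrt(n_obs)
t_crit = stats.t.ppf(0.975, df=n_obs - 1)
covered = np.abs(x_bar) <= t_crit * se

coverage = covered.mean()
print(f'Coverage of the naive 95% CI: {coverage:.1%}')
```

With positive autocorrelation the naive standard error understates the true variability of the sample mean, so coverage falls well short of the nominal 95%.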

<a id="solutions-reference"></a>
## Solutions (Reference)

Try the TODOs first. Use these only to unblock yourself or to compare approaches.

<details><summary>Solution: Constructing a CI for the mean</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 04_confidence_intervals — Constructing a CI for the mean
import numpy as np
import pandas as pd
from scipy import stats

gdp_growth = df['gdp_growth_qoq_annualized'].dropna()

n = len(gdp_growth)
x_bar = gdp_growth.mean()
s = gdp_growth.std(ddof=1)
se = s / np.sqrt(n)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)

ci_lower = x_bar - t_crit * se
ci_upper = x_bar + t_crit * se
print(f'95% CI (by hand): [{ci_lower:.4f}, {ci_upper:.4f}]')

# Verify with scipy
ci_scipy = stats.t.interval(confidence=0.95, df=n - 1, loc=x_bar, scale=se)
print(f'95% CI (scipy):   [{ci_scipy[0]:.4f}, {ci_scipy[1]:.4f}]')

assert np.isclose(ci_lower, ci_scipy[0], atol=1e-10)
assert np.isclose(ci_upper, ci_scipy[1], atol=1e-10)

# z-based comparison
z_crit = stats.norm.ppf(1 - alpha / 2)
ci_z_lower = x_bar - z_crit * se
ci_z_upper = x_bar + z_crit * se
print(f'\n95% CI (z-based): [{ci_z_lower:.4f}, {ci_z_upper:.4f}]')
print(f'z_crit = {z_crit:.4f}, t_crit = {t_crit:.4f}')

# Small sample comparison
gdp_small = gdp_growth.iloc[:10]
n_small = len(gdp_small)
x_bar_small = gdp_small.mean()
se_small = gdp_small.std(ddof=1) / np.sqrt(n_small)
t_crit_small = stats.t.ppf(0.975, df=n_small - 1)

print(f'\nSmall sample (n={n_small}):')
print(f't_crit = {t_crit_small:.4f} vs z_crit = {z_crit:.4f}')
print(f'Width (t): {2 * t_crit_small * se_small:.4f}')
print(f'Width (z): {2 * z_crit * se_small:.4f}')
```

</details>

<details><summary>Solution: Visualizing the repeated sampling interpretation</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 04_confidence_intervals — Repeated sampling simulation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(42)
true_mean = 2.5
true_std = 3.0
sample_size = 30
n_simulations = 100

results = []
for i in range(n_simulations):
    sample = rng.normal(loc=true_mean, scale=true_std, size=sample_size)
    x_bar_sim = sample.mean()
    se_sim = sample.std(ddof=1) / np.sqrt(sample_size)
    ci_low, ci_high = stats.t.interval(confidence=0.95, df=sample_size - 1,
                                        loc=x_bar_sim, scale=se_sim)
    contains_true = ci_low <= true_mean <= ci_high
    results.append({
        'sim': i, 'x_bar': x_bar_sim,
        'ci_low': ci_low, 'ci_high': ci_high,
        'contains_true': contains_true
    })

ci_df = pd.DataFrame(results)
n_miss = (~ci_df['contains_true']).sum()
print(f'{n_miss} out of {n_simulations} intervals missed the true mean.')

fig, ax = plt.subplots(figsize=(10, 12))
for _, row in ci_df.iterrows():
    color = 'steelblue' if row['contains_true'] else 'red'
    ax.hlines(y=row['sim'], xmin=row['ci_low'], xmax=row['ci_high'],
              color=color, linewidth=1.2, alpha=0.7)
    ax.plot(row['x_bar'], row['sim'], 'o', color=color, markersize=2)

ax.axvline(x=true_mean, color='black', linestyle='--', linewidth=1.5,
           label=f'True mean = {true_mean}')
ax.set_xlabel('GDP Growth (%)')
ax.set_ylabel('Simulation index')
ax.set_title(f'100 CIs (95%): {n_miss} miss the true mean (red)')
ax.legend(loc='upper right')
plt.tight_layout()
plt.show()
```

</details>

<details><summary>Solution: Effect of sample size on CI width</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 04_confidence_intervals — Sample size effect
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(123)
population = rng.normal(loc=2.5, scale=3.0, size=10_000)

sample_sizes = [10, 30, 100, 500]
size_results = []
for n_i in sample_sizes:
    sample = rng.choice(population, size=n_i, replace=False)
    x_bar_i = sample.mean()
    se_i = sample.std(ddof=1) / np.sqrt(n_i)
    ci_low_i, ci_high_i = stats.t.interval(confidence=0.95, df=n_i - 1,
                                            loc=x_bar_i, scale=se_i)
    width_i = ci_high_i - ci_low_i
    size_results.append({'n': n_i, 'x_bar': x_bar_i,
                         'ci_low': ci_low_i, 'ci_high': ci_high_i,
                         'width': width_i})

size_df = pd.DataFrame(size_results)
print(size_df)

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(size_df['n'], size_df['width'], 'o-', color='steelblue',
        linewidth=2, markersize=8, label='Observed CI width')

# Anchor the reference curve at the smallest sample so both lines start together.
c = size_df['width'].iloc[0] * np.sqrt(size_df['n'].iloc[0])
n_range = np.linspace(10, 500, 200)
ax.plot(n_range, c / np.sqrt(n_range), '--', color='gray', alpha=0.7,
        label=r'Theoretical: $c / \sqrt{n}$')

ax.set_xlabel('Sample size (n)')
ax.set_ylabel('95% CI width')
ax.set_title('CI width shrinks with sample size')
ax.legend()
plt.tight_layout()
plt.show()
```
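
The dashed reference curve assumes width is proportional to $1/\sqrt{n}$ alone. The t critical value also shrinks as n grows, so observed widths can fall slightly faster than that curve, on top of ordinary sampling noise in the sample standard deviation. The sketch below separates the two effects; it assumes the sample standard deviation equals the population value of 3.0, which is only approximately true for any real sample.

```python
# Illustrative sketch: approximate 95% CI width as a function of n.
import numpy as np
from scipy import stats

sigma = 3.0  # assumption: sample SD equals the population SD used above
sample_sizes = np.array([10, 30, 100, 500])

# The critical value depends on n through the degrees of freedom.
t_crits = stats.t.ppf(0.975, df=sample_sizes - 1)
approx_widths = 2 * t_crits * sigma / np.sqrt(sample_sizes)

for n_i, t_i, w_i in zip(sample_sizes, t_crits, approx_widths):
    print(f'n={n_i:>3}  t_crit={t_i:.3f}  approx width={w_i:.3f}')
```

Ignoring the change in the critical value, quadrupling the sample size halves the width, which is why precision gains become expensive at large n.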

</details>

<details><summary>Solution: Effect of confidence level on CI width</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 04_confidence_intervals — Confidence level effect
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Assumes `df` was loaded earlier in the notebook.
gdp_growth = df['gdp_growth_qoq_annualized'].dropna()
n = len(gdp_growth)
x_bar = gdp_growth.mean()
se = gdp_growth.std(ddof=1) / np.sqrt(n)

conf_levels = [0.80, 0.90, 0.95, 0.99]
level_results = []
for conf in conf_levels:
    ci_low_c, ci_high_c = stats.t.interval(confidence=conf, df=n - 1,
                                            loc=x_bar, scale=se)
    width_c = ci_high_c - ci_low_c
    level_results.append({'confidence': f'{conf:.0%}',
                          'ci_low': ci_low_c, 'ci_high': ci_high_c,
                          'width': width_c})

level_df = pd.DataFrame(level_results)
print(level_df)

colors = ['#2ca02c', '#1f77b4', '#ff7f0e', '#d62728']
fig, ax = plt.subplots(figsize=(10, 4))
for i, row in level_df.iterrows():
    ax.barh(y=row['confidence'], width=row['width'],
            left=row['ci_low'], height=0.5,
            color=colors[i], alpha=0.7, edgecolor='black')
    ax.text(row['ci_high'] + 0.05, i, f"width={row['width']:.3f}",
            va='center', fontsize=10)

ax.axvline(x=x_bar, color='black', linestyle='--',
           label=f'Sample mean = {x_bar:.3f}')
ax.set_xlabel('GDP Growth (%, annualized)')
ax.set_title('CIs at different confidence levels')
ax.legend(loc='upper right')
plt.tight_layout()
plt.show()
```
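
For a fixed sample, the standard error does not change with the confidence level, so the width moves only through the critical value. The sketch below uses normal critical values as a large-sample approximation. That is an assumption made for simplicity; the solution above uses the t-distribution with `n - 1` degrees of freedom.

```python
# Illustrative sketch: how the critical value grows with the confidence level.
import numpy as np
from scipy import stats

conf_levels = np.array([0.80, 0.90, 0.95, 0.99])

# Two-sided interval: put (1 - conf) / 2 in each tail.
z_crits = stats.norm.ppf(1 - (1 - conf_levels) / 2)

# Width relative to the 95% interval (index 2), holding the SE fixed.
relative_width = z_crits / z_crits[2]

for conf, z, rel in zip(conf_levels, z_crits, relative_width):
    print(f'{conf:.0%}: z_crit={z:.3f}, width relative to 95% = {rel:.2f}')
```

Moving from 95% to 99% confidence widens the interval by roughly 30%, which is the cost of the extra confidence.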

</details>

<details><summary>Solution: Confidence intervals for regression coefficients</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 04_confidence_intervals — Regression CIs
import statsmodels.api as sm

# Assumes `df` was loaded earlier in the notebook and includes UNRATE.
reg_df = df[['gdp_growth_qoq_annualized', 'UNRATE']].dropna()
y = reg_df['gdp_growth_qoq_annualized']
X = sm.add_constant(reg_df[['UNRATE']])

res = sm.OLS(y, X).fit()
print(res.summary())

ci_table = res.conf_int(alpha=0.05)
ci_table.columns = ['CI_lower', 'CI_upper']
print('\n95% Confidence Intervals:')
print(ci_table)

coef_unrate = res.params['UNRATE']
ci_low_unrate = ci_table.loc['UNRATE', 'CI_lower']
ci_high_unrate = ci_table.loc['UNRATE', 'CI_upper']

zero_in_ci = ci_low_unrate <= 0 <= ci_high_unrate
print(f'\nCoefficient on UNRATE: {coef_unrate:.4f}')
print(f'95% CI: [{ci_low_unrate:.4f}, {ci_high_unrate:.4f}]')
print(f'Contains zero: {zero_in_ci}')
```

</details>

<details><summary>Solution: Margin of error and practical significance</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 04_confidence_intervals — Margin of error
import numpy as np
from scipy import stats

# Assumes `df` was loaded earlier in the notebook.
gdp_growth = df['gdp_growth_qoq_annualized'].dropna()
n = len(gdp_growth)
x_bar = gdp_growth.mean()
se = gdp_growth.std(ddof=1) / np.sqrt(n)

t_crit_95 = stats.t.ppf(0.975, df=n - 1)
margin_of_error = t_crit_95 * se

print(f'Point estimate: {x_bar:.4f}')
print(f'Margin of error (95%): {margin_of_error:.4f}')
print(f'CI: [{x_bar - margin_of_error:.4f}, {x_bar + margin_of_error:.4f}]')

# Scenario interpretations:
# 1. Job training: narrow CI excluding zero => statistically significant,
#    effect ($0.10 to $0.90/hr) is modest but real.
# 2. Tax cut: wide CI crossing zero => NOT statistically significant;
#    too imprecise to draw conclusions.
# 3. Class size: very narrow CI near zero => precisely estimated effect;
#    even if it is statistically nonzero, it is practically negligible.
# 4. Minimum wage: narrow CI excluding zero, all negative =>
#    statistically and practically significant negative effect.
```
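
The scenario reasoning above compares each interval to two reference points: zero, and the smallest effect that would matter in practice. The sketch below turns that comparison into a small helper. The function `describe_ci`, its labels, and the thresholds are hypothetical. Apart from the job-training interval quoted above, the example numbers are invented and are not the notebook's scenario values.

```python
# Illustrative sketch: separate statistical from practical significance.
def describe_ci(ci_low, ci_high, practical_threshold):
    """Classify an interval against zero and a practical threshold.

    practical_threshold is the smallest absolute effect that would matter
    in context. Choosing it is a judgment call, not a statistical result.
    """
    excludes_zero = ci_low > 0 or ci_high < 0
    # Every plausible value is smaller in magnitude than the threshold.
    all_negligible = -practical_threshold < ci_low and ci_high < practical_threshold
    # Every plausible value is at least as large as the threshold.
    all_meaningful = ci_low >= practical_threshold or ci_high <= -practical_threshold

    if all_negligible and excludes_zero:
        return 'statistically detectable but practically negligible'
    if all_negligible:
        return 'precise null: any effect is too small to matter'
    if all_meaningful:
        return 'statistically and practically significant'
    if excludes_zero:
        return 'statistically significant; practical importance unclear'
    return 'inconclusive: too imprecise to draw conclusions'


# Job-training interval from the comments above, with an assumed threshold.
print(describe_ci(0.10, 0.90, practical_threshold=0.50))
# Invented intervals for the remaining patterns.
print(describe_ci(-1.5, 4.0, practical_threshold=0.50))
print(describe_ci(0.01, 0.03, practical_threshold=0.50))
print(describe_ci(-3.0, -1.2, practical_threshold=0.50))
```

The same interval can change category when the threshold changes, which is why practical significance has to be argued from the context rather than read off the output.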

</details>