# 08 Bayesian vs Frequentist Thinking

Two philosophies of probability and inference: understanding the debate and when each perspective is useful.

## Table of Contents
- [Two ways to think about probability](#two-ways-to-think-about-probability)
- [Bayes' theorem](#bayes-theorem)
- [Prior, likelihood, and posterior](#prior-likelihood-and-posterior)
- [Bayesian vs frequentist intervals](#bayesian-vs-frequentist-intervals)
- [When does the prior matter?](#when-does-the-prior-matter)
- [Practical implications for this project](#practical-implications-for-this-project)
- [Checkpoint (Self-Check)](#checkpoint-self-check)
- [Solutions (Reference)](#solutions-reference)

## Why This Notebook Matters
The frequentist vs Bayesian distinction is one of the deepest in statistics. This project
primarily uses frequentist tools (p-values, confidence intervals, OLS), but understanding
the Bayesian perspective helps you: (1) interpret frequentist results more carefully,
(2) understand why regularization works (ridge and LASSO penalties can be read as priors), and (3) think more clearly
about uncertainty. This is a conceptual notebook — lighter on code, heavier on ideas.

## Prerequisites (Quick Self-Check)
- Completed notebooks 00–07 (the full primer sequence).
- Solid understanding of hypothesis testing and confidence intervals.

## What You Will Produce
- (no file output; learning/analysis notebook)

## Success Criteria
- You can explain the difference between frequentist and Bayesian interpretations of probability.
- You can apply Bayes' theorem to a concrete example.
- You can describe how prior, likelihood, and posterior relate.
- You can articulate when Bayesian and frequentist answers converge.

## Common Pitfalls
- Thinking one approach is "right" and the other is "wrong" (both have strengths).
- Confusing a frequentist CI with a Bayesian credible interval.
- Ignoring the prior in Bayesian analysis (always state it explicitly).
- Thinking Bayesian methods are always better (they depend on prior choice).

## Quick Fixes (When You Get Stuck)
- Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
- `scipy.stats.beta` for Beta distributions (conjugate prior for binomial).
- If plots look wrong, check your x-axis range (0 to 1 for probability parameters).
- If you see `ModuleNotFoundError`, re-run the bootstrap cell.

## Matching Guide
- `docs/guides/00_statistics_primer/08_bayesian_vs_frequentist.md`

## How To Use This Notebook
- Work section-by-section; don't skip the markdown.
- Most code cells are incomplete on purpose: replace TODOs and `...`, then run.
- After each section, write 2–4 sentences answering the interpretation prompts (what changed, why it matters).
- This notebook is lighter on code and heavier on conceptual understanding than earlier primers.
- Use the **Checkpoint (Self-Check)** section to catch mistakes early.
- Use **Solutions (Reference)** only to unblock yourself; then re-implement without looking.
- Use the matching guide (`docs/guides/00_statistics_primer/08_bayesian_vs_frequentist.md`) for the math, assumptions, and deeper context.

<a id="environment-bootstrap"></a>
## Environment Bootstrap
Run this cell first. It makes the repo importable and defines common directories.

In [None]:
from __future__ import annotations

from pathlib import Path
import sys


def find_repo_root(start: Path) -> Path:
    p = start
    for _ in range(8):
        if (p / 'src').exists() and (p / 'docs').exists():
            return p
        p = p.parent
    raise RuntimeError('Could not find repo root. Start Jupyter from the repo root.')


PROJECT_ROOT = find_repo_root(Path.cwd())
if str(PROJECT_ROOT) not in sys.path:
    sys.path.append(str(PROJECT_ROOT))

DATA_DIR = PROJECT_ROOT / 'data'
RAW_DIR = DATA_DIR / 'raw'
PROCESSED_DIR = DATA_DIR / 'processed'
SAMPLE_DIR = DATA_DIR / 'sample'

PROJECT_ROOT

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

<a id="two-ways-to-think-about-probability"></a>
## Two Ways to Think About Probability

### Goal
Understand the fundamental philosophical difference between frequentist and Bayesian
interpretations of probability.

### Why this matters
Everything downstream — how you construct confidence intervals, test hypotheses, and
interpret p-values — depends on what you think "probability" *means*. Most of this
project uses frequentist methods because they are the standard in econometrics, but
understanding the Bayesian perspective will make you a sharper thinker.

---

### The frequentist view: probability as long-run frequency

A coin has P(heads) = 0.5 because *if you could flip it indefinitely*, the share of
heads would converge to one half. Probability is an objective property of a repeatable process.
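
A quick simulation makes the long-run-frequency idea concrete. The sketch below is illustrative only: the fair coin, the seed, and the names `sim_flips` and `running_share` are assumptions introduced here, not part of the project code.

```python
import numpy as np

# Illustrative sketch: a simulated fair coin (assumed P(heads) = 0.5).
rng = np.random.default_rng(0)
sim_flips = rng.random(100_000) < 0.5          # True means heads

# Share of heads after each flip: cumulative heads / number of flips so far.
running_share = np.cumsum(sim_flips) / np.arange(1, sim_flips.size + 1)

for n_flips in (10, 100, 1_000, 10_000, 100_000):
    print(f'{n_flips:>7d} flips: share of heads = {running_share[n_flips - 1]:.4f}')
```

The early shares can wander well away from 0.5, while the later ones settle close to it. The frequentist definition refers to that limiting behavior, not to any single flip.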

**Strengths:**
- No prior distribution required: conclusions rest on the sampling model and the data.
- Well-suited to repeatable experiments (manufacturing, quality control, clinical trials).

**Limitations:**
- How do you assign probability to one-time events? What is the "long-run frequency"
  of "the US enters a recession in 2025"? The event either happens or it doesn't.
- Cannot formally incorporate prior knowledge (e.g., expert opinion) into the analysis.

### The Bayesian view: probability as degree of belief

Probability measures *your uncertainty* about an event. You can say "I believe there is
a 30% chance the US enters a recession next year" — this reflects your state of
knowledge, not a frequency.

**Strengths:**
- Can handle one-time events naturally.
- Formally incorporates prior knowledge via Bayes' theorem.
- Produces intuitive probability statements ("there is a 95% probability the parameter
  is in this interval").

**Limitations:**
- Requires specifying a prior, which is subjective.
- Can be computationally expensive for complex models.
- Different priors can lead to different conclusions (especially with limited data).

### The big picture

| Aspect | Frequentist | Bayesian |
|--------|-------------|----------|
| Probability means... | Long-run frequency | Degree of belief |
| Parameters are... | Fixed but unknown | Random variables with distributions |
| Data are... | Random (vary across samples) | Fixed (we observed what we observed) |
| Prior information? | Not formally used | Encoded in the prior distribution |
| Interval meaning | "95% of such intervals contain the true value" | "95% probability the parameter is here" |

> **Key insight for this project:** The rest of this project is mostly frequentist — OLS,
> p-values, confidence intervals. But understanding the Bayesian perspective enriches
> your thinking and helps you interpret results more carefully.

**Interpretation prompt** (write 2–4 sentences below):
- In your own words, what is the key difference between the two views?
- Can you think of an economic question where the Bayesian interpretation feels more
  natural? Where the frequentist interpretation feels more natural?
- Why might econometrics textbooks lean frequentist?

<a id="bayes-theorem"></a>
## Bayes' Theorem

### Goal
Apply Bayes' theorem to a concrete economic example and build intuition for
why base rates matter.

### Why this matters
Bayes' theorem is the mathematical engine of Bayesian inference. Even if you
use frequentist methods day-to-day, understanding Bayes' theorem helps you
avoid the **base-rate fallacy** — one of the most common mistakes in applied work.

---

### The formula

$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$

Where:
- $P(A \mid B)$ = **posterior** — probability of A given we observed B
- $P(B \mid A)$ = **likelihood** — probability of B if A were true
- $P(A)$ = **prior** — probability of A before seeing B
- $P(B)$ = **marginal likelihood** (normalizing constant)

The denominator can be expanded using the law of total probability:

$$P(B) = P(B \mid A) \cdot P(A) + P(B \mid \neg A) \cdot P(\neg A)$$
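
One way to convince yourself that the formula is right is to check it against simulated counts. The sketch below uses hypothetical rates (deliberately different from the exercise values in this section) and made-up variable names. It compares the formula with the share of simulated signals that coincide with the event.

```python
import numpy as np

# Hypothetical rates for illustration only (not the exercise values).
demo_base_rate = 0.05        # P(A)
demo_hit_rate = 0.90         # P(B | A)
demo_false_alarm = 0.20      # P(B | not A)

# Analytic answer: law of total probability, then Bayes' theorem.
demo_p_signal = demo_hit_rate * demo_base_rate + demo_false_alarm * (1 - demo_base_rate)
demo_posterior = demo_hit_rate * demo_base_rate / demo_p_signal

# Simulation: draw the event, then draw the signal conditional on the event.
rng = np.random.default_rng(0)
n_draws = 1_000_000
event = rng.random(n_draws) < demo_base_rate
signal = rng.random(n_draws) < np.where(event, demo_hit_rate, demo_false_alarm)

# Among draws where the signal fired, what share actually had the event?
demo_posterior_sim = event[signal].mean()

print(f'Bayes formula : {demo_posterior:.4f}')
print(f'Simulated     : {demo_posterior_sim:.4f}')
```

The two numbers should agree up to simulation noise, which is a useful sanity check whenever a conditional probability feels counterintuitive.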

### Concrete example: recession indicator

Suppose you have a recession prediction model. Here are the facts:

- **Sensitivity (true positive rate):** P(flag | recession) = 0.80
  The model correctly flags 80% of actual recessions.
- **False positive rate:** P(flag | no recession) = 0.10
  The model incorrectly flags 10% of non-recession periods.
- **Base rate:** P(recession) = 0.15
  Historically, the US is in recession about 15% of the time.

**Question:** The model just flagged positive. What is the probability we are
actually in a recession?

### Your Turn

First, try to compute this by hand (or at least set up the calculation on paper).
Then fill in the code below.

In [None]:
# TODO: Apply Bayes' theorem to the recession indicator example.
#
# Given:
#   P(flag | recession)    = 0.80  (sensitivity / true positive rate)
#   P(flag | no recession) = 0.10  (false positive rate)
#   P(recession)           = 0.15  (base rate)
#
# Compute:
#   P(recession | flag) = ?

sensitivity = 0.80          # P(flag | recession)
false_positive_rate = 0.10  # P(flag | no recession)
base_rate = 0.15            # P(recession)

# Step 1: compute P(flag) using the law of total probability
# P(flag) = P(flag|recession)*P(recession) + P(flag|no recession)*P(no recession)
p_flag = ...

# Step 2: apply Bayes' theorem
# P(recession | flag) = P(flag | recession) * P(recession) / P(flag)
p_recession_given_flag = ...

print(f'P(flag)                = {p_flag:.4f}')
print(f'P(recession | flag)    = {p_recession_given_flag:.4f}')
print(f'\nEven with a positive flag, the probability of recession is only '
      f'{p_recession_given_flag:.1%}')

In [None]:
# TODO: Explore how the base rate affects the posterior.
# Vary the base rate from 0.01 to 0.50 and plot P(recession | flag).
# This demonstrates the base-rate fallacy: even a good test can be misleading
# when the base rate is low.

base_rates = np.linspace(0.01, 0.50, 100)

# TODO: compute posterior for each base rate
posteriors = ...

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(base_rates, posteriors, 'b-', linewidth=2)
ax.axhline(y=0.5, color='gray', linestyle='--', alpha=0.5, label='50% threshold')
ax.axvline(x=0.15, color='red', linestyle='--', alpha=0.5, label='Actual base rate (15%)')
ax.set_xlabel('Base rate P(recession)', fontsize=12)
ax.set_ylabel('P(recession | positive flag)', fontsize=12)
ax.set_title('How the Base Rate Affects Posterior Probability', fontsize=13)
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

**Interpretation prompt** (write 2–4 sentences below):
- Were you surprised by the result? Most people expect a higher number.
- Why does the base rate matter so much, even when the model has 80% sensitivity?
- At what base rate does a positive flag make a recession more likely than not (>50%)?
- How does this relate to p-values in hypothesis testing? (Hint: a significant
  p-value is like a positive flag. What is the "base rate" for true effects?)

<a id="prior-likelihood-and-posterior"></a>
## Prior, Likelihood, and Posterior

### Goal
Visualize Bayesian updating: how a prior belief is combined with data (likelihood)
to produce a posterior belief.

### Why this matters
This is the core mechanism of Bayesian inference. Understanding it helps you see
statistics not as a one-shot calculation, but as a *learning process*: you start
with a belief, observe data, and update. This is how rational agents should behave
under uncertainty — and it mirrors how economists think about expectations.

---

### The Beta-Binomial model: a beautiful toy example

Suppose you want to estimate the probability $\theta$ that a recession indicator
gives a correct signal. This is a proportion, so $\theta \in [0, 1]$.

**The setup:**
- **Prior:** Before seeing data, your belief about $\theta$ is described by a
  Beta distribution: $\theta \sim \text{Beta}(\alpha, \beta)$.
  - $\text{Beta}(1, 1)$ = uniform (flat) prior: "I have no idea."
  - $\text{Beta}(10, 10)$ = mildly informative: "I think it is around 0.5."
- **Data:** You observe $n$ trials with $k$ successes (e.g., correct signals).
- **Posterior:** The updated belief is also a Beta distribution:
  $\theta \mid \text{data} \sim \text{Beta}(\alpha + k, \beta + n - k)$.

This is called a **conjugate prior**: the posterior has the same distributional
form as the prior, which makes the math tractable.
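
Conjugacy can be checked numerically. The sketch below, with a hypothetical Beta(2, 2) prior and hypothetical data (7 successes in 10 trials), multiplies likelihood by prior on a grid, normalizes the result, and compares it with the closed-form Beta posterior. All names ending in `_demo` are assumptions made for this illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical prior and data, chosen only for illustration.
a0_demo, b0_demo = 2, 2          # prior Beta(2, 2)
k_demo, n_demo = 7, 10           # 7 successes in 10 trials

grid = np.linspace(0, 1, 2001)
step = grid[1] - grid[0]

# Brute force: likelihood times prior, rescaled so it integrates to 1.
unnormalized = stats.binom.pmf(k_demo, n_demo, grid) * stats.beta.pdf(grid, a0_demo, b0_demo)
grid_posterior = unnormalized / (unnormalized.sum() * step)

# Conjugate shortcut: add successes to alpha and failures to beta.
conjugate_posterior = stats.beta.pdf(grid, a0_demo + k_demo, b0_demo + n_demo - k_demo)

max_gap = np.max(np.abs(grid_posterior - conjugate_posterior))
print(f'Largest pointwise gap between the two curves: {max_gap:.2e}')
```

The gap should be tiny, reflecting only the numerical integration error. For non-conjugate models no such shortcut exists, which is why they typically need grid or sampling methods.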

### Key intuition

$$\text{Posterior} \propto \text{Likelihood} \times \text{Prior}$$

The posterior is a *compromise* between the prior and the data. With little data,
the prior dominates. With lots of data, the data dominate.
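
For the Beta-Binomial model the compromise can be written out explicitly. For $n > 0$, the posterior mean is a weighted average of the prior mean and the sample proportion, where the weights depend on the prior's "pseudo-count" $\alpha + \beta$ (a label used here for convenience) and the sample size $n$:

$$E[\theta \mid \text{data}] = \frac{\alpha + k}{\alpha + \beta + n} = \underbrace{\frac{\alpha + \beta}{\alpha + \beta + n}}_{\text{weight on prior}} \cdot \frac{\alpha}{\alpha + \beta} + \underbrace{\frac{n}{\alpha + \beta + n}}_{\text{weight on data}} \cdot \frac{k}{n}$$

As an illustration, take the mildly informative $\text{Beta}(10, 10)$ prior and suppose $k = 16$ successes in $n = 20$ trials. Both weights equal $20/40 = 0.5$, so the posterior mean is $0.5 \cdot 0.5 + 0.5 \cdot 0.8 = 0.65$. As $n$ grows, the weight on the prior shrinks toward zero.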

### Your Turn

Visualize how the posterior evolves as you observe more coin flips (or more
recession indicator signals). Start with a flat prior and accumulate data.

In [None]:
# TODO: Visualize Bayesian updating with the Beta-Binomial model.
#
# Scenario: You are evaluating whether a recession indicator has a true
# accuracy rate theta. You start with a flat prior Beta(1,1) and observe
# results one at a time.
#
# Suppose you observe the following sequence of correct/incorrect signals:
# 1 = correct, 0 = incorrect

np.random.seed(42)
observations = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1]
# 16 correct out of 20 => sample proportion = 0.80

# Prior parameters
alpha_prior = 1  # Beta(1,1) = flat/uniform prior
beta_prior = 1

theta = np.linspace(0, 1, 500)

# Snapshots: plot the posterior after 0, 1, 3, 5, 10, 20 observations
snapshots = [0, 1, 3, 5, 10, 20]

fig, axes = plt.subplots(2, 3, figsize=(14, 8))
axes = axes.ravel()

for i, n_obs in enumerate(snapshots):
    data_so_far = observations[:n_obs]
    k = sum(data_so_far)        # number of successes
    n = len(data_so_far)        # number of trials

    # TODO: compute posterior parameters
    alpha_post = ...
    beta_post = ...

    # TODO: compute the posterior PDF using stats.beta.pdf(theta, alpha_post, beta_post)
    posterior_pdf = ...

    axes[i].plot(theta, posterior_pdf, 'b-', linewidth=2)
    axes[i].fill_between(theta, posterior_pdf, alpha=0.2)
    axes[i].axvline(x=0.80, color='red', linestyle='--', alpha=0.6,
                    label='True rate (0.80)')
    axes[i].set_title(f'After {n_obs} obs (k={k})', fontsize=11)
    axes[i].set_xlabel(r'$\theta$')
    axes[i].set_ylabel('Density')
    axes[i].legend(fontsize=8)

fig.suptitle('Bayesian Updating: Posterior Concentrates as Data Accumulates',
             fontsize=13)
plt.tight_layout()
plt.show()

In [None]:
# TODO: For the case of n=20, plot the prior, likelihood, and posterior
# on the same axes to see how the posterior is a compromise.
#
# Hint:
#   - Prior: stats.beta.pdf(theta, alpha_prior, beta_prior)
#   - Likelihood: stats.binom.pmf(k, n, theta)  (treat theta as the variable)
#     Note: for visualization, normalize the likelihood to have a similar scale.
#   - Posterior: stats.beta.pdf(theta, alpha_prior + k, beta_prior + n - k)

k_total = sum(observations)    # total successes
n_total = len(observations)    # total trials

prior_pdf = ...
likelihood = ...               # use stats.binom.pmf(k_total, n_total, theta)
# Normalize likelihood for visual comparison
likelihood_scaled = likelihood / likelihood.max() * stats.beta.pdf(
    theta, alpha_prior + k_total, beta_prior + n_total - k_total).max()
posterior_pdf = ...

fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(theta, prior_pdf, 'g--', linewidth=2, label='Prior: Beta(1, 1)')
ax.plot(theta, likelihood_scaled, 'orange', linewidth=2,
        label=f'Likelihood (scaled): {k_total}/{n_total} correct', linestyle=':')
ax.plot(theta, posterior_pdf, 'b-', linewidth=2,
        label=f'Posterior: Beta({alpha_prior + k_total}, {beta_prior + n_total - k_total})')
ax.axvline(x=0.80, color='red', linestyle='--', alpha=0.5, label='True rate')
ax.set_xlabel(r'$\theta$ (accuracy rate)', fontsize=12)
ax.set_ylabel('Density', fontsize=12)
ax.set_title(r'Posterior $\propto$ Likelihood $\times$ Prior', fontsize=13)
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

**Interpretation prompt** (write 2–4 sentences below):
- How does the posterior change shape as more data arrives?
- After 20 observations, is the posterior centered near the true value (0.80)?
- In the prior-likelihood-posterior plot, where does the posterior "sit" relative
  to the prior and likelihood? Why?

<a id="bayesian-vs-frequentist-intervals"></a>
## Bayesian vs Frequentist Intervals

### Goal
Compute both a frequentist confidence interval and a Bayesian credible interval
for the same parameter, and understand how their *interpretations* differ.

### Why this matters
This is one of the most commonly misunderstood distinctions in applied statistics.
Many practitioners (including many economists) *think* they are making Bayesian
statements when they report frequentist confidence intervals. Understanding the
difference will make your statistical communication more precise.

---

### The two interpretations

**Frequentist 95% confidence interval:**
> "If I repeated this experiment many times and computed a CI each time,
> 95% of those intervals would contain the true parameter value."
>
> The *interval* is random (it varies across samples). The *parameter* is fixed.
> You *cannot* say: "there is a 95% probability the parameter is in this interval."

**Bayesian 95% credible interval:**
> "Given the data and my prior, there is a 95% probability the parameter
> lies in this interval."
>
> The *parameter* is random (it has a posterior distribution). The *data* are fixed.
> This is the statement people *think* a CI makes.

### Why the distinction matters in practice

When an economist reports a 95% CI for the effect of minimum wage on employment
as [−0.3, −0.1], the temptation is to say "there is a 95% probability the
true effect is between −0.3 and −0.1." That is a Bayesian statement. The
correct frequentist statement is: "this procedure produces intervals that cover
the true effect 95% of the time." Subtle but important.
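
The "procedure" reading can be simulated directly. The sketch below assumes a known true proportion and repeats a binomial experiment many times, computing a Wald interval each time. The helper name `wald_coverage`, the seed, and the sample sizes are choices made for this illustration.

```python
import numpy as np
from scipy import stats


def wald_coverage(theta_true, n_trials, n_experiments=20_000, level=0.95, seed=0):
    """Share of simulated Wald intervals that contain theta_true."""
    rng = np.random.default_rng(seed)
    z_crit = stats.norm.ppf(0.5 + level / 2)

    # One binomial draw per simulated experiment.
    p_hats = rng.binomial(n_trials, theta_true, size=n_experiments) / n_trials
    std_errs = np.sqrt(p_hats * (1 - p_hats) / n_trials)

    lower = p_hats - z_crit * std_errs
    upper = p_hats + z_crit * std_errs
    return np.mean((lower <= theta_true) & (theta_true <= upper))


coverage_n20 = wald_coverage(theta_true=0.80, n_trials=20)
coverage_n2000 = wald_coverage(theta_true=0.80, n_trials=2000)
print(f'n = 20   : coverage = {coverage_n20:.3f}')
print(f'n = 2000 : coverage = {coverage_n2000:.3f}')
```

Coverage is a property of the interval-building recipe across repetitions. With the large sample it should sit close to 95%. With only 20 trials the Wald recipe tends to fall short of its nominal level, which is a reminder that the 95% figure is itself an approximation in small samples.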

### Your Turn

Compute both intervals for a simple proportion estimation problem using
the recession indicator data from the previous section.

In [None]:
# TODO: Compute a frequentist CI and a Bayesian credible interval for
# the same proportion theta.
#
# Data: 16 correct signals out of 20 observations.

k = 16   # successes
n = 20   # trials
p_hat = k / n

# --- Frequentist: Wald 95% confidence interval ---
# p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)
z = stats.norm.ppf(0.975)  # ~1.96

# TODO: compute the standard error and confidence interval bounds
se = ...
freq_ci_lower = ...
freq_ci_upper = ...

print('--- Frequentist 95% Confidence Interval (Wald) ---')
print(f'Point estimate: {p_hat:.4f}')
print(f'95% CI: [{freq_ci_lower:.4f}, {freq_ci_upper:.4f}]')
print('Interpretation: If we repeated this experiment many times,')
print('95% of such intervals would contain the true theta.\n')

# --- Bayesian: 95% credible interval from the posterior ---
# Posterior: Beta(1 + k, 1 + n - k) with flat prior Beta(1,1)
alpha_post = 1 + k
beta_post = 1 + n - k

# TODO: compute the 95% credible interval using stats.beta.ppf()
# (the percent point function / inverse CDF)
bayes_ci_lower = ...
bayes_ci_upper = ...
posterior_mean = ...

print('--- Bayesian 95% Credible Interval (flat prior) ---')
print(f'Posterior mean: {posterior_mean:.4f}')
print(f'95% credible interval: [{bayes_ci_lower:.4f}, {bayes_ci_upper:.4f}]')
print('Interpretation: Given the data and our prior, there is a 95%')
print('probability that theta lies in this interval.')

In [None]:
# TODO: Visualize both intervals on the same plot.

theta = np.linspace(0, 1, 500)
posterior_pdf = stats.beta.pdf(theta, alpha_post, beta_post)

fig, ax = plt.subplots(figsize=(9, 5))

# Plot the posterior
ax.plot(theta, posterior_pdf, 'b-', linewidth=2, label='Posterior')
ax.fill_between(theta, posterior_pdf,
                where=(theta >= bayes_ci_lower) & (theta <= bayes_ci_upper),
                alpha=0.2, color='blue', label='95% Bayesian credible interval')

# Overlay the frequentist CI as horizontal bar
y_bar = posterior_pdf.max() * 1.05
ax.plot([freq_ci_lower, freq_ci_upper], [y_bar, y_bar], 'r-', linewidth=3,
        label='95% Frequentist CI')
ax.plot(p_hat, y_bar, 'ro', markersize=8)

ax.set_xlabel(r'$\theta$', fontsize=12)
ax.set_ylabel('Density', fontsize=12)
ax.set_title('Frequentist CI vs Bayesian Credible Interval', fontsize=13)
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

**Interpretation prompt** (write 2–4 sentences below):
- Are the two intervals similar numerically? Why or why not?
- How do their *interpretations* differ, even if the numbers are close?
- Which interpretation do you find more intuitive? Which is technically correct
  for the standard econometrics tools you have learned so far?

<a id="when-does-the-prior-matter"></a>
## When Does the Prior Matter?

### Goal
Demonstrate that with enough data, the prior gets "washed out" and Bayesian
and frequentist answers converge.

### Why this matters
A common objection to Bayesian methods is: "But the prior is subjective!"
The reassuring answer: with enough data, the prior barely matters, as long as it
does not assign zero probability to the true value. People starting with different
beliefs will then converge to nearly the same posterior. This is a deep result
called **Bayesian consistency** (or "the data overwhelm the prior").

But with *little* data, the prior matters a *lot*. This is both a feature
(you can incorporate expert knowledge) and a bug (you can bias results).

---

### The experiment

Three analysts have different priors about a recession indicator's accuracy $\theta$:
- **Analyst A (optimist):** Beta(9, 1) — strongly believes the model is great ($\theta \approx 0.9$).
- **Analyst B (skeptic):** Beta(1, 9) — strongly believes the model is bad ($\theta \approx 0.1$).
- **Analyst C (agnostic):** Beta(1, 1) — flat prior, no opinion.

They all observe the same data. How quickly do their posteriors converge?

### Your Turn

In [None]:
# TODO: Simulate data and show posterior convergence across different priors.
#
# True theta = 0.75. Generate data from Binomial(n, 0.75).
# Show posteriors for n = 5, 20, 50, 500.

np.random.seed(123)
true_theta = 0.75

# Three different priors
priors = {
    'Optimist: Beta(9,1)': (9, 1),
    'Skeptic: Beta(1,9)':  (1, 9),
    'Agnostic: Beta(1,1)': (1, 1),
}

sample_sizes = [5, 20, 50, 500]
theta = np.linspace(0, 1, 500)

# Generate a large dataset, then take subsets
all_data = np.random.binomial(1, true_theta, size=500)

fig, axes = plt.subplots(1, 4, figsize=(18, 4))

for j, n_obs in enumerate(sample_sizes):
    data_subset = all_data[:n_obs]
    k = data_subset.sum()

    for label, (a0, b0) in priors.items():
        # TODO: compute posterior parameters and plot
        a_post = ...
        b_post = ...
        pdf = ...
        axes[j].plot(theta, pdf, linewidth=2, label=label)

    axes[j].axvline(x=true_theta, color='black', linestyle='--', alpha=0.5,
                    label=f'True \u03b8 = {true_theta}')
    axes[j].set_title(f'n = {n_obs} (k = {k})', fontsize=11)
    axes[j].set_xlabel(r'$\theta$')
    if j == 0:
        axes[j].set_ylabel('Density')
    axes[j].legend(fontsize=7)

fig.suptitle('Prior Influence Vanishes as Data Accumulates', fontsize=13)
plt.tight_layout()
plt.show()

In [None]:
# TODO: Quantify the convergence.
# Compute the posterior mean for each prior at each sample size.
# Show that the gap between posterior means shrinks with n.

print(f'{"n":>6s}  {"Optimist":>10s}  {"Skeptic":>10s}  {"Agnostic":>10s}  {"Max Gap":>10s}')
print('-' * 55)

for n_obs in [5, 10, 20, 50, 100, 200, 500]:
    data_subset = all_data[:n_obs]
    k = data_subset.sum()

    means = []
    for label, (a0, b0) in priors.items():
        # TODO: compute posterior mean = (a0 + k) / (a0 + b0 + n_obs)
        post_mean = ...
        means.append(post_mean)

    gap = max(means) - min(means)
    print(f'{n_obs:>6d}  {means[0]:>10.4f}  {means[1]:>10.4f}  {means[2]:>10.4f}  {gap:>10.4f}')

**Interpretation prompt** (write 2–4 sentences below):
- At n=5, how different are the three posteriors? What does this mean practically?
- At n=500, do the priors still matter?
- In what situation would you *want* the prior to have strong influence?
  (Hint: think about small samples and expert knowledge.)

<a id="practical-implications-for-this-project"></a>
## Practical Implications for This Project

### Goal
Connect the Bayesian/frequentist distinction to the econometric tools you will
use throughout the rest of this project.

### Why this matters
You now have the conceptual framework to understand *why* certain techniques work
the way they do, even though we will primarily use frequentist implementations.

---

### This project uses frequentist methods

Throughout the rest of this project, you will use:
- **OLS regression** — finds coefficients by minimizing squared residuals (a
  frequentist optimization).
- **p-values** — "probability of data this extreme or more, if the null hypothesis
  is true" (a purely frequentist concept).
- **Confidence intervals** — coverage guarantees over repeated sampling.
- **Hypothesis tests** — reject/fail to reject at a significance level.

These are the standard tools of econometrics, and they work well for the
problems in this project.
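
To see the p-value definition as a computation, the sketch below reuses the 16-of-20 record from earlier in this notebook and tests a hypothetical null that the indicator is no better than a coin flip ($\theta = 0.5$). For contrast it also computes a flat-prior posterior probability that $\theta$ exceeds 0.5. The null value and the variable names are assumptions made for this illustration.

```python
from scipy import stats

k_obs, n_signals = 16, 20        # the 16-of-20 record used earlier in this notebook
theta_null = 0.5                 # hypothetical null: no better than a coin flip

# Frequentist: P(16 or more correct out of 20 | theta = 0.5).
# binom.sf(x, n, p) is P(X > x), so pass k - 1 to include k itself.
p_value_one_sided = stats.binom.sf(k_obs - 1, n_signals, theta_null)

# Bayesian (flat Beta(1, 1) prior): P(theta > 0.5 | data) from the Beta posterior.
posterior_prob_above_null = stats.beta.sf(theta_null, 1 + k_obs, 1 + n_signals - k_obs)

print(f'One-sided p-value            : {p_value_one_sided:.4f}')
print(f'P(theta > 0.5 | data, prior) : {posterior_prob_above_null:.4f}')
```

The first number is a statement about hypothetical data given a fixed $\theta$. The second is a statement about $\theta$ given the observed data and a prior. They answer different questions even when both point the same way.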

### But Bayesian thinking enriches your work

Even within a frequentist workflow, Bayesian intuition helps:

1. **Interpreting results more carefully:**
   A significant p-value does *not* mean "the effect is real with 95%
   probability." Bayesian thinking reminds you that the posterior probability
   of an effect depends on the prior (base rate) as well.

2. **Understanding regularization:**
   Ridge regression (which you will encounter in `02_regression/05`) adds a
   penalty $\lambda \|\beta\|^2$ to the OLS objective. Under Gaussian errors
   with variance $\sigma^2$, this is equivalent to Bayesian MAP (Maximum A
   Posteriori) estimation with an independent Gaussian prior on each
   coefficient: $\beta_j \sim N(0, \sigma^2/\lambda)$. The penalty "shrinks"
   coefficients toward zero, just as a prior centered at zero would.

   Similarly, LASSO (L1 penalty) corresponds to a Laplace prior.

3. **Thinking about model uncertainty:**
   Frequentist tools give you uncertainty about parameters *within* a model.
   Bayesian thinking encourages you to also think about uncertainty *across*
   models (Bayesian model averaging). This perspective is valuable even if
   you implement it informally.

4. **Understanding stationarity priors:**
   In time series econometrics (`07_time_series_econ`), unit root tests check
   whether a series is stationary. The Bayesian perspective is: what is your
   prior belief about stationarity? This matters because unit root tests have
   low power — they often fail to reject the null of a unit root even when
   the series is stationary.
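
The ridge/MAP correspondence in item 2 can be verified numerically. The sketch below is a minimal check under stated assumptions: simulated data, no intercept, a known noise standard deviation, and hypothetical names such as `X_demo` and `lam`. It compares the closed-form ridge estimate with a numerical minimization of the negative log posterior.

```python
import numpy as np
from scipy import optimize

# Simulated data (hypothetical): 200 observations, 3 regressors, no intercept.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
noise_sd = 1.0                                   # assumed known
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=noise_sd, size=200)

lam = 5.0                                        # ridge penalty
prior_var = noise_sd**2 / lam                    # implied prior variance sigma^2 / lambda

# Closed-form OLS and ridge estimates.
beta_ols = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)
beta_ridge = np.linalg.solve(X_demo.T @ X_demo + lam * np.eye(3), X_demo.T @ y_demo)


def neg_log_posterior(beta):
    """Gaussian negative log-likelihood plus Gaussian negative log-prior (constants dropped)."""
    resid = y_demo - X_demo @ beta
    return 0.5 * resid @ resid / noise_sd**2 + 0.5 * beta @ beta / prior_var


def neg_log_posterior_grad(beta):
    """Analytic gradient of neg_log_posterior."""
    resid = y_demo - X_demo @ beta
    return -X_demo.T @ resid / noise_sd**2 + beta / prior_var


# MAP estimate: minimize the negative log posterior numerically.
beta_map = optimize.minimize(neg_log_posterior, x0=np.zeros(3),
                             jac=neg_log_posterior_grad, method='BFGS').x

print('OLS  :', np.round(beta_ols, 4))
print('Ridge:', np.round(beta_ridge, 4))
print('MAP  :', np.round(beta_map, 4))
```

The ridge and MAP rows should match to several decimals, and both should be pulled toward zero relative to OLS. Raising `lam` corresponds to a tighter prior around zero.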

### Summary

| You will use (frequentist) | Bayesian counterpart | Connection |
|---|---|---|
| OLS | Bayesian regression | OLS = posterior mode under a flat prior (Gaussian errors) |
| Ridge regression | MAP with Gaussian prior | Penalty strength is proportional to prior precision |
| p-value | Posterior probability | Related but NOT the same |
| Confidence interval | Credible interval | Often similar numbers, different meaning |
| AIC/BIC model selection | Bayesian model averaging | BIC differences approximate twice the log Bayes factor |
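
The "related but NOT the same" row can be made concrete with Bayes' theorem. The sketch below treats "the test is significant" as a binary signal and asks how often a significant result reflects a real effect. The power, significance level, and prior shares are hypothetical inputs, and the function name is made up for this illustration.

```python
import numpy as np


def prob_real_given_significant(prior_real, power=0.80, alpha=0.05):
    """P(effect is real | test is significant) via Bayes' theorem."""
    prior_real = np.asarray(prior_real, dtype=float)
    true_positives = power * prior_real            # real effect and significant
    false_positives = alpha * (1 - prior_real)     # no effect but significant
    return true_positives / (true_positives + false_positives)


for prior_share in (0.01, 0.10, 0.50):
    share = float(prob_real_given_significant(prior_share))
    print(f'prior P(real) = {prior_share:.2f} -> P(real | significant) = {share:.3f}')
```

Under these assumed inputs, the same significance threshold implies very different posterior probabilities depending on how plausible the effect was beforehand. A p-value alone cannot supply that missing ingredient.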

**Interpretation prompt** (write 2–4 sentences below):
- In your own words, why does understanding ridge regression as Bayesian help?
- Have you ever heard someone say "there is a 95% probability the true value is in
  this confidence interval"? Is that statement technically correct for a frequentist CI?
- When might you actually want to use Bayesian methods instead of frequentist ones
  in an economics context?

## Where This Shows Up Later

- **Ridge regression is Bayesian MAP estimation with a Gaussian prior**
  (`02_regression/05`). You will see that the penalty parameter lambda
  controls the "strength" of the prior belief that coefficients are near zero.
- **Prior beliefs about stationarity inform unit root testing**
  (`07_time_series_econ`). Whether you believe a series has a unit root
  before testing affects how you interpret test results with low power.
- **Model uncertainty quantification in the capstone.** The capstone project
  requires you to think about uncertainty at multiple levels — a Bayesian habit
  of mind, even if the implementation is frequentist.

<a id="checkpoint-self-check"></a>
## Checkpoint (Self-Check)
Run these asserts to verify your work. If any fail, go back and fix the corresponding section.

In [None]:
# ---- Bayes' theorem checks ----
assert isinstance(p_flag, float), 'p_flag should be a float'
assert 0 < p_flag < 1, f'p_flag={p_flag} should be a probability'
assert isinstance(p_recession_given_flag, float), 'p_recession_given_flag should be a float'
assert 0 < p_recession_given_flag < 1, 'Posterior should be a probability'
# The answer should be around 0.585
assert abs(p_recession_given_flag - 0.585) < 0.01, (
    f'Expected ~0.585, got {p_recession_given_flag:.4f}. Check your Bayes calculation.'
)

# ---- Interval checks ----
# Check ordering and both bounds, not just the lower one.
assert 0 < freq_ci_lower < freq_ci_upper < 1, 'Frequentist CI should be ordered and inside (0, 1)'
assert 0 < bayes_ci_lower < bayes_ci_upper < 1, 'Credible interval should be ordered and inside (0, 1)'

# ---- Posterior parameter checks ----
assert alpha_post == 1 + 16, f'alpha_post should be 17, got {alpha_post}'
assert beta_post == 1 + 4, f'beta_post should be 5, got {beta_post}'

print('All checkpoint assertions passed.')

## Extensions (Optional)
- **Bayesian A/B testing:** Suppose two economic policies are tested in different regions.
  Region A shows 60/100 positive outcomes, Region B shows 55/100. Use the Beta-Binomial
  model to compute P(policy A is better than policy B) by simulation.
- **Conjugate priors for the normal mean:** If you observe data from a normal distribution
  with known variance, the conjugate prior for the mean is also normal. Work through
  the math and plot the prior-to-posterior update.
- **Explore `PyMC` or `ArviZ`:** These Python libraries implement full Bayesian inference.
  Try fitting a simple linear regression with `PyMC` and compare the posterior
  distribution of coefficients to the OLS point estimates.
- **The Jeffreys-Lindley paradox:** With a large sample size, a Bayesian test and a
  frequentist test can give opposite conclusions. Research this paradox and think about
  what it means for applied work.

## Reflection
- Before this notebook, which "camp" did you lean toward — frequentist or Bayesian?
  Has your view changed?
- Think of a specific economic question you care about. Would you rather have a
  confidence interval or a credible interval for the answer? Why?
- How does the base-rate fallacy from the Bayes' theorem section connect to the
  replication crisis in social science research?
- In what ways does Bayesian thinking make you a more careful consumer of
  frequentist results?

<a id="solutions-reference"></a>
## Solutions (Reference)

Try the TODOs first. Use these only to unblock yourself or to compare approaches.

<details><summary>Solution: Bayes' theorem (recession indicator)</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 08 — Bayes' theorem
sensitivity = 0.80
false_positive_rate = 0.10
base_rate = 0.15

# P(flag) via law of total probability
p_flag = sensitivity * base_rate + false_positive_rate * (1 - base_rate)

# Bayes' theorem
p_recession_given_flag = sensitivity * base_rate / p_flag

print(f'P(flag)             = {p_flag:.4f}')       # 0.2050
print(f'P(recession | flag) = {p_recession_given_flag:.4f}')  # ~0.5854

# Base rate exploration
base_rates = np.linspace(0.01, 0.50, 100)
posteriors = (sensitivity * base_rates) / (
    sensitivity * base_rates + false_positive_rate * (1 - base_rates)
)
```

</details>

<details><summary>Solution: Prior, likelihood, and posterior (Bayesian updating)</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 08 — Bayesian updating
observations = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1]
alpha_prior, beta_prior = 1, 1
theta = np.linspace(0, 1, 500)
snapshots = [0, 1, 3, 5, 10, 20]

fig, axes = plt.subplots(2, 3, figsize=(14, 8))
axes = axes.ravel()

for i, n_obs in enumerate(snapshots):
    data_so_far = observations[:n_obs]
    k = sum(data_so_far)
    n = len(data_so_far)

    alpha_post = alpha_prior + k
    beta_post = beta_prior + n - k
    posterior_pdf = stats.beta.pdf(theta, alpha_post, beta_post)

    axes[i].plot(theta, posterior_pdf, 'b-', linewidth=2)
    axes[i].fill_between(theta, posterior_pdf, alpha=0.2)
    axes[i].axvline(x=0.80, color='red', linestyle='--', alpha=0.6,
                    label='True rate (0.80)')
    axes[i].set_title(f'After {n_obs} obs (k={k})', fontsize=11)
    axes[i].set_xlabel(r'$\theta$')
    axes[i].set_ylabel('Density')
    axes[i].legend(fontsize=8)

fig.suptitle('Bayesian Updating', fontsize=13)
plt.tight_layout()
plt.show()

# Prior-Likelihood-Posterior plot
k_total = sum(observations)
n_total = len(observations)

prior_pdf = stats.beta.pdf(theta, alpha_prior, beta_prior)
likelihood = stats.binom.pmf(k_total, n_total, theta)
likelihood_scaled = likelihood / likelihood.max() * stats.beta.pdf(
    theta, alpha_prior + k_total, beta_prior + n_total - k_total).max()
posterior_pdf = stats.beta.pdf(theta, alpha_prior + k_total,
                               beta_prior + n_total - k_total)
```

</details>

<details><summary>Solution: Bayesian vs frequentist intervals</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 08 — Intervals
k, n = 16, 20
p_hat = k / n  # 0.80

# Frequentist Wald CI
z = stats.norm.ppf(0.975)
se = np.sqrt(p_hat * (1 - p_hat) / n)
freq_ci_lower = p_hat - z * se
freq_ci_upper = p_hat + z * se

# Bayesian credible interval (flat prior)
alpha_post = 1 + k   # 17
beta_post = 1 + n - k  # 5
bayes_ci_lower = stats.beta.ppf(0.025, alpha_post, beta_post)
bayes_ci_upper = stats.beta.ppf(0.975, alpha_post, beta_post)
posterior_mean = alpha_post / (alpha_post + beta_post)
```

</details>

<details><summary>Solution: When does the prior matter?</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 08 — Prior convergence
np.random.seed(123)
true_theta = 0.75
all_data = np.random.binomial(1, true_theta, size=500)

priors = {
    'Optimist: Beta(9,1)': (9, 1),
    'Skeptic: Beta(1,9)':  (1, 9),
    'Agnostic: Beta(1,1)': (1, 1),
}

sample_sizes = [5, 20, 50, 500]
theta = np.linspace(0, 1, 500)

fig, axes = plt.subplots(1, 4, figsize=(18, 4))
for j, n_obs in enumerate(sample_sizes):
    data_subset = all_data[:n_obs]
    k = data_subset.sum()
    for label, (a0, b0) in priors.items():
        a_post = a0 + k
        b_post = b0 + n_obs - k
        pdf = stats.beta.pdf(theta, a_post, b_post)
        axes[j].plot(theta, pdf, linewidth=2, label=label)
    axes[j].axvline(x=true_theta, color='black', linestyle='--', alpha=0.5)
    axes[j].set_title(f'n = {n_obs} (k = {k})')
    axes[j].legend(fontsize=7)
plt.tight_layout()
plt.show()

# Quantify convergence
for n_obs in [5, 10, 20, 50, 100, 200, 500]:
    k = all_data[:n_obs].sum()
    means = [(a0 + k) / (a0 + b0 + n_obs) for (a0, b0) in priors.values()]
    gap = max(means) - min(means)
    print(f'n={n_obs:>4d}  gap={gap:.4f}')
```

</details>