# 04a Residual Diagnostics and Specification Tests

Breusch-Pagan, White, Durbin-Watson, Breusch-Godfrey, RESET, Chow

## Table of Contents
- [Residual Plots](#residual-plots)
- [Heteroskedasticity Tests](#heteroskedasticity-tests)
- [Serial Correlation Tests](#serial-correlation-tests)
- [Specification Tests (RESET)](#specification-tests-reset)
- [Structural Break (Chow Test)](#structural-break-chow-test)
- [Checkpoint (Self-Check)](#checkpoint-self-check)
- [Extensions (Optional)](#extensions-optional)
- [Reflection](#reflection)
- [Solutions (Reference)](#solutions-reference)

## Why This Notebook Matters

OLS gives you coefficients no matter what. But the standard errors, p-values, and confidence intervals that accompany those coefficients depend on assumptions: homoskedasticity, no serial correlation, and correct functional form. When those assumptions fail, your inference can be badly misleading -- coefficients look significant when they are not, or vice versa.

This notebook teaches you to **diagnose** assumption violations and **choose the right remedy**:
- Heteroskedasticity detected? Use heteroskedasticity-robust standard errors (White/HC0, or the small-sample variant HC3).
- Serial correlation detected? Use HAC/Newey-West standard errors.
- Functional form wrong? Re-specify the model (add nonlinear terms, logs, interactions).
- Structural break? Split the sample or add regime indicators.

You will learn to run formal statistical tests (Breusch-Pagan, White, Durbin-Watson, Breusch-Godfrey, RESET, Chow) and interpret them correctly.


## Prerequisites (Quick Self-Check)
- Completed Parts 00-03 (foundations + data + basic regression).
- Familiarity with OLS estimation and `statsmodels` output (see Notebook 00).
- Understanding of robust standard errors from Notebook 04.

## What You Will Produce
- (no file output; learning/analysis notebook)

## Success Criteria
- You can run every diagnostic test and correctly interpret its null hypothesis and p-value.
- You can explain when to use HC3 vs HAC vs re-specification.
- You can run your work end-to-end without undefined variables.

## Common Pitfalls
- Running cells top-to-bottom without reading the instructions.
- Leaving `...` placeholders in code cells.
- Confusing "failing to reject" with "the assumption holds" (absence of evidence is not evidence of absence).
- Applying Durbin-Watson to cross-sectional data (it is a time-series test).
- Treating test results mechanically without thinking about economic context.

## Quick Fixes (When You Get Stuck)
- If you see `ModuleNotFoundError`, re-run the bootstrap cell and restart the kernel; make sure `PROJECT_ROOT` is the repo root.
- If a `data/processed/*` file is missing, either run the matching build script (see guide) or use the notebook's `data/sample/*` fallback.
- If results look "too good," suspect leakage; re-check shifts, rolling windows, and time splits.
- If a model errors, check dtypes (`astype(float)`) and missingness (`dropna()` on required columns).

## Matching Guide
- `docs/guides/02_regression/04a_residual_diagnostics.md`

## How To Use This Notebook
- Work section-by-section; don't skip the markdown.
- Most code cells are incomplete on purpose: replace TODOs and `...`, then run.
- After each section, write 2-4 sentences answering the interpretation prompts (what changed, why it matters).
- Prefer `data/processed/*` if you have built the real datasets; otherwise use the bundled `data/sample/*` fallbacks.
- Use the **Checkpoint (Self-Check)** section to catch mistakes early.
- Use **Solutions (Reference)** only to unblock yourself; then re-implement without looking.
- Use the matching guide (`docs/guides/02_regression/04a_residual_diagnostics.md`) for the math, assumptions, and deeper context.

<a id="environment-bootstrap"></a>
## Environment Bootstrap
Run this cell first. It makes the repo importable and defines common directories.

In [None]:
from __future__ import annotations

from pathlib import Path
import sys


def find_repo_root(start: Path) -> Path:
    p = start
    for _ in range(8):
        if (p / 'src').exists() and (p / 'docs').exists():
            return p
        p = p.parent
    raise RuntimeError('Could not find repo root. Start Jupyter from the repo root.')


PROJECT_ROOT = find_repo_root(Path.cwd())
if str(PROJECT_ROOT) not in sys.path:
    sys.path.append(str(PROJECT_ROOT))

DATA_DIR = PROJECT_ROOT / 'data'
RAW_DIR = DATA_DIR / 'raw'
PROCESSED_DIR = DATA_DIR / 'processed'
SAMPLE_DIR = DATA_DIR / 'sample'

PROJECT_ROOT

## Goal

Learn the standard battery of residual diagnostics and specification tests that economists use after fitting an OLS model. By the end of this notebook you will know:

1. **Visual diagnostics** -- what patterns in residual plots reveal about model problems.
2. **Heteroskedasticity tests** -- Breusch-Pagan and White tests for non-constant variance.
3. **Serial correlation tests** -- Durbin-Watson and Breusch-Godfrey for time-series residual dependence.
4. **Specification tests** -- Ramsey RESET for functional form misspecification.
5. **Structural break detection** -- manual Chow test for regime changes.

Each test has a **null hypothesis**. Your job is to interpret the test statistic and p-value, then decide what remedy (if any) is needed.

---

## Load Data

We use **two** datasets in this notebook:

1. **Census county data** (cross-sectional) -- for heteroskedasticity tests and RESET.
2. **Macro quarterly data** (time series) -- for serial correlation tests and structural break.

### Your Turn (1): Load cross-sectional (county) data

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

# --- Cross-sectional data (county level) ---
year = 2022  # TODO: set to the year you fetched
path_county = PROCESSED_DIR / f'census_county_{year}.csv'

if path_county.exists():
    df_county = pd.read_csv(path_county)
else:
    df_county = pd.read_csv(SAMPLE_DIR / 'census_county_sample.csv')

# Build log variables for cross-sectional regression
income = pd.to_numeric(df_county['B19013_001E'], errors='coerce')
rent = pd.to_numeric(df_county['B25064_001E'], errors='coerce')
mask = (income > 0) & (rent > 0)

df_cs = pd.DataFrame({
    'income': income[mask],
    'rent': rent[mask],
}).dropna()
df_cs['log_income'] = np.log(df_cs['income'])
df_cs['log_rent'] = np.log(df_cs['rent'])

print(f'Cross-sectional obs: {len(df_cs)}')
df_cs.head()

### Your Turn (2): Load time-series (macro) data

In [None]:
# --- Time-series data (macro quarterly) ---
path_macro = PROCESSED_DIR / 'macro_quarterly.csv'

if path_macro.exists():
    df_macro = pd.read_csv(path_macro, index_col=0, parse_dates=True)
else:
    df_macro = pd.read_csv(SAMPLE_DIR / 'macro_quarterly_sample.csv', index_col=0, parse_dates=True)

y_ts_col = 'gdp_growth_qoq'
x_ts_cols = ['T10Y2Y_lag1']

df_ts = df_macro[[y_ts_col] + x_ts_cols].dropna().copy()

print(f'Time-series obs: {len(df_ts)}')
df_ts.tail()

### Your Turn (3): Fit baseline OLS models

Fit one model on each dataset. We will use these residuals throughout the notebook.

In [None]:
# --- Cross-sectional model: log(rent) ~ log(income) ---
X_cs = sm.add_constant(df_cs[['log_income']], has_constant='add')
y_cs = df_cs['log_rent']
res_cs = sm.OLS(y_cs, X_cs).fit()

# --- Time-series model: gdp_growth ~ T10Y2Y_lag1 ---
X_ts = sm.add_constant(df_ts[x_ts_cols], has_constant='add')
y_ts = df_ts[y_ts_col]
res_ts = sm.OLS(y_ts, X_ts).fit()

print('=== Cross-Sectional Model ===')
print(res_cs.summary())
print('\n=== Time-Series Model ===')
print(res_ts.summary())

---

<a id="residual-plots"></a>
## 1. Residual Plots

### Goal
Visually inspect residuals to detect problems before running formal tests.

**What to look for:**
- **Fan shape** (residuals spread out as fitted values increase) -- suggests heteroskedasticity.
- **Curvature** (residuals show a systematic U-shape or pattern) -- suggests nonlinearity / functional form misspecification.
- **Outliers** (extreme residuals far from the bulk) -- may distort coefficient estimates.
- **Clusters or bands** -- may indicate omitted categorical variables.
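A fan shape can also be checked numerically. The sketch below is a rough illustration on simulated data, not the notebook's county data: the seed, coefficients, and the choice of three bins are arbitrary assumptions. It compares the spread of residuals across terciles of the fitted values.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed (assumption)
n = 600
x = rng.uniform(1.0, 5.0, size=n)
# Simulated outcome whose error spread grows with x (a fan shape by construction).
y = 2.0 + 1.0 * x + rng.normal(scale=0.4 * x, size=n)

# Fit a straight line by OLS and compute residuals.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

# Split observations into three equal-sized bins by fitted value
# and measure the residual spread inside each bin.
order = np.argsort(fitted)
bins = np.array_split(order, 3)
spread = [resid[idx].std() for idx in bins]

print("Residual std by fitted-value tercile (low, mid, high):", np.round(spread, 3))
```

If the spread rises steadily from the low to the high bin, the scatter plot would show the fan shape described above.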

### Your Turn (1): Fitted vs. residual plot (cross-sectional model)

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# TODO: Plot fitted values vs residuals for the cross-sectional model.
# Hint: res_cs.fittedvalues on x-axis, res_cs.resid on y-axis.
# Add a horizontal line at y=0 with ax.axhline(0, ...).
# Look for fan shapes or curvature.
ax = axes[0]
ax.set_title('Cross-Sectional: Fitted vs Residuals')
ax.set_xlabel('Fitted values')
ax.set_ylabel('Residuals')
...

# TODO: Plot fitted values vs residuals for the time-series model.
ax = axes[1]
ax.set_title('Time-Series: Fitted vs Residuals')
ax.set_xlabel('Fitted values')
ax.set_ylabel('Residuals')
...

plt.tight_layout()
plt.show()

### Your Turn (2): Residuals over time (time-series model only)

For time-series data, also plot residuals against time. Look for trends, volatility clustering, or level shifts.

In [None]:
fig, ax = plt.subplots(figsize=(10, 4))

# TODO: Plot residuals over time for the time-series model.
# Hint: res_ts.resid has a datetime index.
# Add a horizontal line at y=0.
# Look for periods of high vs low volatility or level shifts.
...

ax.set_title('Time-Series Residuals Over Time')
ax.set_xlabel('Date')
ax.set_ylabel('Residual')
plt.tight_layout()
plt.show()

### Interpretation Prompt

Write 2-4 sentences:
- Do you see a fan shape in the cross-sectional plot? What does that suggest?
- Do the time-series residuals look "well-behaved" (randomly scattered around zero) or do you see patterns?

In [None]:
# TODO: Write your interpretation.
notes = """
...
"""
print(notes)

---

<a id="heteroskedasticity-tests"></a>
## 2. Heteroskedasticity Tests

### Goal
Formally test whether the variance of residuals depends on the regressors (heteroskedasticity).

**Key idea:**
- **Null hypothesis ($H_0$):** Residuals are homoskedastic (constant variance).
- **Alternative ($H_1$):** Residual variance varies with regressors.
- **Low p-value** $\Rightarrow$ reject $H_0$ $\Rightarrow$ evidence of heteroskedasticity $\Rightarrow$ use robust SE (e.g., HC3).

### Two tests:

| Test | What it regresses $e^2$ on | Flexibility |
|------|---------------------------|-------------|
| **Breusch-Pagan** | Original regressors ($X$) | Detects linear relationship between variance and $X$ |
| **White** | $X$, $X^2$, and cross-products | Detects more general patterns (but uses more degrees of freedom) |
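To make the auxiliary regression concrete, here is a minimal sketch of the $n \cdot R^2$ form of the Breusch-Pagan statistic computed by hand. It uses simulated data with NumPy only; the seed, coefficients, and variable names are assumptions for illustration, and this is not a replacement for the `statsmodels` function used in the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed (assumption)
n = 500
x = rng.uniform(1.0, 5.0, size=n)
# Error standard deviation grows with x, so the data are heteroskedastic by construction.
y = 1.0 + 0.5 * x + rng.normal(scale=0.3 * x, size=n)

X = np.column_stack([np.ones(n), x])

# Step 1: squared OLS residuals from the original regression.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e2 = (y - X @ beta) ** 2

# Step 2: auxiliary regression of the squared residuals on the same regressors.
gamma, *_ = np.linalg.lstsq(X, e2, rcond=None)
aux_fitted = X @ gamma
r2_aux = 1.0 - np.sum((e2 - aux_fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)

# Step 3: LM statistic = n * R^2, compared with chi-square(1) because
# the auxiliary regression has one slope.
lm_stat = n * r2_aux
CHI2_1DF_5PCT = 3.841  # 5% critical value, chi-square with 1 degree of freedom

print(f"LM = {lm_stat:.2f}, reject H0 at 5%: {lm_stat > CHI2_1DF_5PCT}")
```

The White test follows the same recipe; the only change is that the auxiliary regression also includes squares and cross-products of the regressors, which raises the degrees of freedom.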

### Your Turn (1): Breusch-Pagan test

In [None]:
from statsmodels.stats.diagnostic import het_breuschpagan

# TODO: Run the Breusch-Pagan test on the cross-sectional model.
# het_breuschpagan returns (LM stat, LM p-value, F stat, F p-value)
# Pass: residuals and the exogenous variables (including constant)
bp_stat, bp_pval, bp_fstat, bp_fpval = het_breuschpagan(..., ...)

print('Breusch-Pagan Test (Cross-Sectional Model)')
print(f'  LM statistic : {bp_stat:.4f}')
print(f'  LM p-value   : {bp_pval:.4f}')
print(f'  F statistic  : {bp_fstat:.4f}')
print(f'  F p-value    : {bp_fpval:.4f}')
print(f'  Conclusion   : {"Reject H0 (heteroskedastic)" if bp_pval < 0.05 else "Fail to reject H0 (no evidence of heteroskedasticity)"}')

### Your Turn (2): White test

In [None]:
from statsmodels.stats.diagnostic import het_white

# TODO: Run the White test on the cross-sectional model.
# het_white returns (LM stat, LM p-value, F stat, F p-value)
# Pass: residuals and the exogenous variables (including constant)
white_stat, white_pval, white_fstat, white_fpval = het_white(..., ...)

print('White Test (Cross-Sectional Model)')
print(f'  LM statistic : {white_stat:.4f}')
print(f'  LM p-value   : {white_pval:.4f}')
print(f'  F statistic  : {white_fstat:.4f}')
print(f'  F p-value    : {white_fpval:.4f}')
print(f'  Conclusion   : {"Reject H0 (heteroskedastic)" if white_pval < 0.05 else "Fail to reject H0 (no evidence of heteroskedasticity)"}')

### Your Turn (3): Compare naive vs robust SE

If heteroskedasticity is detected, the remedy is straightforward: use robust standard errors. The coefficients stay the same; only the SE / p-values / CI change.
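The following sketch illustrates why the coefficients are untouched while the standard errors move. It builds the classical and HC3 covariance matrices by hand on simulated data (NumPy only; the seed and data-generating process are assumptions for illustration). Both use the same OLS estimate; they differ only in how the variance is estimated.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed (assumption)
n = 500
x = rng.normal(size=n)
# Error spread is proportional to |x|: heteroskedastic by construction.
y = 2.0 + 1.0 * x + np.abs(x) * rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y          # OLS coefficients (shared by both SE types)
resid = y - X @ beta
k = X.shape[1]

# Classical covariance: sigma^2 * (X'X)^-1, valid only under homoskedasticity.
sigma2 = resid @ resid / (n - k)
se_naive = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC3 sandwich: each squared residual is inflated by 1 / (1 - h_ii)^2,
# where h_ii is the leverage of observation i.
leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
weights = (resid / (1.0 - leverage)) ** 2
meat = X.T @ (X * weights[:, None])
se_hc3 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("coefficients:", beta.round(3))
print("naive SE    :", se_naive.round(4))
print("HC3 SE      :", se_hc3.round(4))
```

In this simulated design the error variance is largest where `x` is far from its mean, which is where observations carry the most weight for the slope, so the robust SE for the slope comes out larger than the naive one. In other designs it can be smaller.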

In [None]:
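# TODO: Refit the same cross-sectional model with HC3 robust SE.
# Compare the SE for log_income between naive and HC3.
# Refitting with cov_type keeps params/bse as labeled pandas Series,
# so they can be indexed by column name below.
res_cs_hc3 = sm.OLS(y_cs, X_cs).fit(cov_type='HC3')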

print('Coefficient comparison (should be identical):')
print(f'  Naive coef : {res_cs.params["log_income"]:.4f}')
print(f'  HC3 coef   : {res_cs_hc3.params["log_income"]:.4f}')
print()
print('Standard error comparison:')
print(f'  Naive SE   : {res_cs.bse["log_income"]:.4f}')
print(f'  HC3 SE     : {res_cs_hc3.bse["log_income"]:.4f}')
print()

# TODO: What happened to the SE? Did it increase or decrease?
# Write a 1-sentence interpretation.
...

### Interpretation Prompt

Write 2-4 sentences:
- Do the Breusch-Pagan and White tests agree? Why might they differ?
- If you reject homoskedasticity, what is the practical remedy?
- Does heteroskedasticity bias your coefficients, or only your inference (SE, CI, p-values)?

In [None]:
# TODO: Write your interpretation.
notes = """
...
"""
print(notes)

---

<a id="serial-correlation-tests"></a>
## 3. Serial Correlation Tests

### Goal
Test whether the regression residuals are autocorrelated (common in macro time series).

**Key idea:**
- **Null hypothesis ($H_0$):** No autocorrelation in residuals.
- **Alternative ($H_1$):** Residuals are serially correlated.
- **Low p-value** $\Rightarrow$ reject $H_0$ $\Rightarrow$ evidence of serial correlation $\Rightarrow$ use HAC/Newey-West SE.

### Two tests:

| Test | What it checks | Notes |
|------|---------------|-------|
| **Durbin-Watson** | AR(1) autocorrelation only | Value near 2 = no autocorrelation; near 0 = positive; near 4 = negative |
| **Breusch-Godfrey** | AR(p) autocorrelation at multiple lags | More general; you choose the number of lags |

**Important:** These tests are for **time-series** data. Applying them to cross-sectional data is meaningless unless the data has a natural ordering.
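The Durbin-Watson statistic is simple enough to compute by hand, which helps explain the "near 2" rule in the table. The sketch below uses simulated AR(1) residuals (NumPy only; the seed and the AR coefficient of 0.7 are assumptions) and shows the approximation $DW \approx 2(1 - \hat{\rho})$, where $\hat{\rho}$ is the first-order sample autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed (assumption)
n = 400
rho = 0.7  # assumed AR(1) coefficient for the simulated residuals

# Simulate AR(1) residuals: e_t = rho * e_{t-1} + u_t.
u = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + u[t]

# Durbin-Watson: sum of squared first differences over sum of squares.
dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# First-order sample autocorrelation, for the approximation DW ~ 2 * (1 - rho_hat).
rho_hat = np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)

print(f"DW = {dw:.3f}, rho_hat = {rho_hat:.3f}, 2*(1 - rho_hat) = {2 * (1 - rho_hat):.3f}")
```

With no autocorrelation $\hat{\rho} \approx 0$ and the statistic sits near 2; strong positive autocorrelation pushes it toward 0, and strong negative autocorrelation pushes it toward 4.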

### Your Turn (1): Durbin-Watson statistic

In [None]:
from statsmodels.stats.stattools import durbin_watson

# TODO: Compute the Durbin-Watson statistic for the time-series model.
# durbin_watson takes an array of residuals.
dw_stat = durbin_watson(...)

print('Durbin-Watson Test (Time-Series Model)')
print(f'  DW statistic : {dw_stat:.4f}')
print(f'  Interpretation:')
if dw_stat < 1.5:
    print('    Value < 1.5: Evidence of positive autocorrelation.')
elif dw_stat > 2.5:
    print('    Value > 2.5: Evidence of negative autocorrelation.')
else:
    print('    Value near 2: Little evidence of AR(1) autocorrelation.')

### Your Turn (2): Breusch-Godfrey test

Unlike Durbin-Watson, Breusch-Godfrey can test for autocorrelation at several lags jointly, and it remains valid when the regressors include lagged values of the dependent variable.
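As a rough sketch of the mechanics, the Breusch-Godfrey LM statistic can be built from an auxiliary regression of the residuals on the original regressors plus lagged residuals. The example below uses simulated data with NumPy only; the seed, the AR coefficient of 0.6, and the zero-padding of pre-sample lags are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed (assumption)
n, p = 300, 4                   # sample size and number of lags tested
x = rng.normal(size=n)

# AR(1) errors with coefficient 0.6 (assumed for illustration).
u = rng.normal(size=n)
err = np.zeros(n)
for t in range(1, n):
    err[t] = 0.6 * err[t - 1] + u[t]
y = 1.0 + 0.5 * x + err

# Original regression and its residuals.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta

# Lagged residuals, with zeros filling the pre-sample values.
lags = np.column_stack([np.concatenate([np.zeros(j), e[:-j]]) for j in range(1, p + 1)])

# Auxiliary regression: e_t on the original regressors and p lagged residuals.
Z = np.column_stack([X, lags])
g, *_ = np.linalg.lstsq(Z, e, rcond=None)
r2_aux = 1.0 - np.sum((e - Z @ g) ** 2) / np.sum((e - e.mean()) ** 2)

# LM statistic = n * R^2, compared with chi-square(p).
lm_stat = n * r2_aux
CHI2_4DF_5PCT = 9.488  # 5% critical value, chi-square with 4 degrees of freedom

print(f"LM = {lm_stat:.2f}, reject H0 at 5%: {lm_stat > CHI2_4DF_5PCT}")
```

Keeping the original regressors in the auxiliary regression is what lets the test cope with regressors that are correlated with past errors.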

In [None]:
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

# TODO: Run the Breusch-Godfrey test for lags 1 through 4.
# acorr_breusch_godfrey returns (LM stat, LM p-value, F stat, F p-value)
bg_stat, bg_pval, bg_fstat, bg_fpval = acorr_breusch_godfrey(res_ts, nlags=...)

print('Breusch-Godfrey Test (Time-Series Model, 4 lags)')
print(f'  LM statistic : {bg_stat:.4f}')
print(f'  LM p-value   : {bg_pval:.4f}')
print(f'  F statistic  : {bg_fstat:.4f}')
print(f'  F p-value    : {bg_fpval:.4f}')
print(f'  Conclusion   : {"Reject H0 (serial correlation)" if bg_pval < 0.05 else "Fail to reject H0 (no evidence of serial correlation)"}')

### Your Turn (3): Compare naive vs HAC SE

If serial correlation is detected, the remedy is HAC (Heteroskedasticity and Autocorrelation Consistent) standard errors.
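For intuition about what a HAC estimator adds, here is a minimal hand-rolled Newey-West covariance with Bartlett weights. It is a sketch on simulated data (NumPy only; the seed, the AR coefficients of 0.8, the helper `ar1`, and the omission of any small-sample correction are assumptions), so its numbers need not match a library implementation exactly.

```python
import numpy as np


def ar1(n, rho, rng):
    """Simulate a zero-start AR(1) series with unit-variance innovations."""
    z = np.zeros(n)
    shocks = rng.normal(size=n)
    for t in range(1, n):
        z[t] = rho * z[t - 1] + shocks[t]
    return z


rng = np.random.default_rng(4)  # arbitrary seed (assumption)
n, maxlags = 400, 4
x = ar1(n, 0.8, rng)                  # persistent regressor
y = 1.0 + 0.5 * x + ar1(n, 0.8, rng)  # persistent errors

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Classical SE: assumes errors are uncorrelated with constant variance.
sigma2 = resid @ resid / (n - X.shape[1])
se_naive = np.sqrt(np.diag(sigma2 * XtX_inv))

# Newey-West "meat": lag-0 term plus Bartlett-weighted autocovariances of x_t * e_t.
scores = X * resid[:, None]
meat = scores.T @ scores
for lag in range(1, maxlags + 1):
    weight = 1.0 - lag / (maxlags + 1)
    gamma = scores[lag:].T @ scores[:-lag]
    meat += weight * (gamma + gamma.T)

se_hac = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("naive SE:", se_naive.round(4))
print("HAC SE  :", se_hac.round(4))
```

When both the regressor and the errors are positively autocorrelated, as in this simulation, the naive SE understates uncertainty and the HAC SE is larger.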

In [None]:
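# TODO: Refit the same time-series model with HAC robust SE (maxlags=4).
# Compare the SE for T10Y2Y_lag1 between naive and HAC.
# Refitting with cov_type keeps params/bse/pvalues as labeled pandas Series,
# so they can be indexed by column name below.
res_ts_hac = sm.OLS(y_ts, X_ts).fit(cov_type='HAC', cov_kwds={'maxlags': ...})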

print('Coefficient comparison (should be identical):')
print(f'  Naive coef : {res_ts.params["T10Y2Y_lag1"]:.4f}')
print(f'  HAC coef   : {res_ts_hac.params["T10Y2Y_lag1"]:.4f}')
print()
print('Standard error comparison:')
print(f'  Naive SE   : {res_ts.bse["T10Y2Y_lag1"]:.4f}')
print(f'  HAC SE     : {res_ts_hac.bse["T10Y2Y_lag1"]:.4f}')
print()
print('P-value comparison:')
print(f'  Naive p    : {res_ts.pvalues["T10Y2Y_lag1"]:.4f}')
print(f'  HAC p      : {res_ts_hac.pvalues["T10Y2Y_lag1"]:.4f}')

### Interpretation Prompt

Write 2-4 sentences:
- What does the Durbin-Watson value tell you? Does Breusch-Godfrey agree?
- How did HAC correction affect the standard errors and p-values?
- If serial correlation is present, does it bias coefficients or only inference?

In [None]:
# TODO: Write your interpretation.
notes = """
...
"""
print(notes)

---

<a id="specification-tests-reset"></a>
## 4. Specification Tests (RESET)

### Goal
Test whether the functional form of the model is correctly specified.

**Ramsey RESET (Regression Equation Specification Error Test):**
- Adds powers of the fitted values ($\hat{y}^2$, $\hat{y}^3$) to the regression.
- If these extra terms are jointly significant, the original specification may be wrong (missing nonlinearities, interactions, or transformations).
- **Null hypothesis ($H_0$):** The functional form is correctly specified (powers of $\hat{y}$ add no explanatory power).
- **Low p-value** $\Rightarrow$ reject $H_0$ $\Rightarrow$ consider adding polynomial terms, logs, or interactions.
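The steps above can be sketched by hand as an F-test that compares the original regression with one augmented by powers of the fitted values. The example uses simulated data with NumPy only; the seed, the quadratic data-generating process, and the helper `ols_ssr` are assumptions for illustration.

```python
import numpy as np


def ols_ssr(X, y):
    """Return fitted values and the sum of squared residuals from OLS."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return fitted, np.sum((y - fitted) ** 2)


rng = np.random.default_rng(5)  # arbitrary seed (assumption)
n = 300
x = rng.uniform(0.0, 4.0, size=n)
# True relationship is quadratic, but we fit a straight line (misspecified on purpose).
y = 1.0 + 0.5 * x + 0.4 * x ** 2 + rng.normal(scale=0.5, size=n)

# Restricted model: the original linear specification.
X = np.column_stack([np.ones(n), x])
y_hat, ssr_restricted = ols_ssr(X, y)

# Unrestricted model: add the square and cube of the fitted values.
X_aug = np.column_stack([X, y_hat ** 2, y_hat ** 3])
_, ssr_unrestricted = ols_ssr(X_aug, y)

q = 2                           # number of added terms
df_resid = n - X_aug.shape[1]   # residual degrees of freedom, unrestricted model
f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df_resid)

print(f"RESET F = {f_stat:.2f} on ({q}, {df_resid}) degrees of freedom")
```

For large samples the 5% critical value of an F distribution with 2 numerator degrees of freedom is roughly 3.0, so a statistic far above that points to a misspecified functional form.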

### Your Turn (1): RESET test on the cross-sectional model

In [None]:
from statsmodels.stats.diagnostic import linear_reset

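# TODO: Run the RESET test on the cross-sectional model.
# linear_reset returns a test result object with .statistic and .pvalue attributes.
# The 'power' parameter controls which powers of y_hat to include:
# an integer p means powers 2..p (default 3, i.e. powers 2 and 3); a list gives the exact powers.
# use_f=True requests the F form of the test, matching the labels below
# (the default reports a chi-square statistic instead).
reset_cs = linear_reset(res_cs, power=..., use_f=True)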

print('RESET Test (Cross-Sectional Model)')
print(f'  F statistic : {reset_cs.statistic:.4f}')
print(f'  p-value     : {reset_cs.pvalue:.4f}')
print(f'  Conclusion  : {"Reject H0 (misspecification)" if reset_cs.pvalue < 0.05 else "Fail to reject H0 (no evidence of misspecification)"}')

### Your Turn (2): RESET test on the time-series model

In [None]:
# TODO: Run the RESET test on the time-series model.
# use_f=True requests the F form of the test, matching the labels below.
reset_ts = linear_reset(res_ts, power=..., use_f=True)

print('RESET Test (Time-Series Model)')
print(f'  F statistic : {reset_ts.statistic:.4f}')
print(f'  p-value     : {reset_ts.pvalue:.4f}')
print(f'  Conclusion  : {"Reject H0 (misspecification)" if reset_ts.pvalue < 0.05 else "Fail to reject H0 (no evidence of misspecification)"}')

### Interpretation Prompt

Write 2-4 sentences:
- If RESET rejects, what does that suggest about your model?
- What would you do next? (Add polynomial terms? Transform variables? Add interactions?)
- Does RESET tell you *what* the correct specification is, or only that the current one is wrong?

In [None]:
# TODO: Write your interpretation.
notes = """
...
"""
print(notes)

---

<a id="structural-break-chow-test"></a>
## 5. Structural Break (Chow Test)

### Goal
Test whether the regression coefficients are stable across two sub-periods, or whether there is a structural break at a known date.

**Chow test intuition:**
1. Estimate the model on the **full sample** and get the sum of squared residuals ($SSR_{full}$).
2. Split the sample at a known break date. Estimate the model on **each sub-sample** separately ($SSR_1$, $SSR_2$).
3. If the model is the same in both sub-periods, then $SSR_{full} \approx SSR_1 + SSR_2$.
4. If the model differs, $SSR_{full}$ will be much larger than $SSR_1 + SSR_2$.

**Test statistic:**

$$F = \frac{(SSR_{full} - SSR_1 - SSR_2) / k}{(SSR_1 + SSR_2) / (n - 2k)}$$

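where $k$ is the number of parameters in each regression (including the constant) and $n$ is the total number of observations. The denominator has $n - 2k$ degrees of freedom because the two sub-sample regressions together estimate $2k$ parameters.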

- **Null hypothesis ($H_0$):** No structural break (coefficients are the same in both sub-periods).
- **Low p-value** $\Rightarrow$ reject $H_0$ $\Rightarrow$ evidence of a structural break.
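As a quick illustration of the formula, the sketch below computes the Chow statistic on simulated data where the slope changes halfway through the sample. It uses NumPy only; the seed, the size of the slope shift, and the helper `ssr` are assumptions, and no p-value is computed here.

```python
import numpy as np


def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)


rng = np.random.default_rng(6)  # arbitrary seed (assumption)
n1, n2 = 80, 80
n = n1 + n2
x = rng.normal(size=n)
# Slope shifts from 0.5 to 1.5 halfway through the sample (assumed break).
slope = np.where(np.arange(n) < n1, 0.5, 1.5)
y = 1.0 + slope * x + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x])
ssr_full = ssr(X, y)             # one set of coefficients for the whole sample
ssr_1 = ssr(X[:n1], y[:n1])      # pre-break regression
ssr_2 = ssr(X[n1:], y[n1:])      # post-break regression

k = X.shape[1]                   # parameters per regression (constant + slope)
chow_f = ((ssr_full - ssr_1 - ssr_2) / k) / ((ssr_1 + ssr_2) / (n - 2 * k))

print(f"SSR full = {ssr_full:.2f}, SSR pre + post = {ssr_1 + ssr_2:.2f}")
print(f"Chow F = {chow_f:.2f} on ({k}, {n - 2 * k}) degrees of freedom")
```

The pooled regression can never fit better than the two separate regressions, so the numerator is non-negative; the test asks whether the gap is too large to attribute to chance.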

### Your Turn (1): Choose a break date and split the sample

We will split the macro time series at the Great Recession (2008-Q4) as a natural candidate for a structural break.

In [None]:
# TODO: Choose a break date and split the time-series data.
break_date = '2008-12-31'

df_pre = df_ts[df_ts.index <= break_date].copy()
df_post = df_ts[df_ts.index > break_date].copy()

print(f'Pre-break obs  : {len(df_pre)}  ({df_pre.index.min().date()} to {df_pre.index.max().date()})')
print(f'Post-break obs : {len(df_post)}  ({df_post.index.min().date()} to {df_post.index.max().date()})')

# Sanity check: both sub-samples need enough observations
assert len(df_pre) >= 10, 'Pre-break sample too small'
assert len(df_post) >= 10, 'Post-break sample too small'

### Your Turn (2): Compute the Chow test statistic manually

In [None]:
from scipy import stats

# TODO: Fit OLS on each sub-sample and compute SSR.

# Full-sample SSR (already have this from res_ts)
ssr_full = res_ts.ssr

# Pre-break model
X_pre = sm.add_constant(df_pre[x_ts_cols], has_constant='add')
y_pre = df_pre[y_ts_col]
res_pre = sm.OLS(y_pre, X_pre).fit()
ssr_pre = ...

# Post-break model
X_post = sm.add_constant(df_post[x_ts_cols], has_constant='add')
y_post = df_post[y_ts_col]
res_post = sm.OLS(y_post, X_post).fit()
ssr_post = ...

# Chow F-statistic
k = len(res_ts.params)           # number of parameters (including constant)
n = len(df_ts)                    # total observations

# TODO: Compute the F statistic using the formula above.
chow_f = ...
chow_pval = stats.f.sf(chow_f, dfn=k, dfd=n - 2 * k)  # upper-tail probability; more accurate than 1 - cdf for tiny p-values

print('Chow Test (Structural Break)')
print(f'  Break date   : {break_date}')
print(f'  SSR (full)   : {ssr_full:.4f}')
print(f'  SSR (pre)    : {ssr_pre:.4f}')
print(f'  SSR (post)   : {ssr_post:.4f}')
print(f'  F statistic  : {chow_f:.4f}')
print(f'  p-value      : {chow_pval:.4f}')
print(f'  Conclusion   : {"Reject H0 (structural break)" if chow_pval < 0.05 else "Fail to reject H0 (no evidence of a structural break)"}')

### Your Turn (3): Compare coefficients across sub-samples

Even without a formal test, comparing coefficients side by side helps build intuition.

In [None]:
# TODO: Print and compare coefficients from pre- and post-break models.
comparison = pd.DataFrame({
    'Full Sample': res_ts.params,
    'Pre-Break': ...,
    'Post-Break': ...,
})

print('Coefficient Comparison Across Sub-Samples')
print(comparison.round(4))
print()
print('If coefficients differ substantially, the relationship may have changed after the break.')

### Interpretation Prompt

Write 2-4 sentences:
- Does the Chow test suggest a structural break at 2008-Q4?
- How do the coefficients differ between sub-periods?
- What are the limitations of the Chow test? (Hint: you must choose the break date in advance.)

In [None]:
# TODO: Write your interpretation.
notes = """
...
"""
print(notes)

---

## Summary: Diagnostic Decision Table

| Problem | Visual Clue | Formal Test | Remedy |
|---------|------------|-------------|--------|
| Heteroskedasticity | Fan shape in residual plot | Breusch-Pagan, White | HC3 robust SE |
| Serial correlation | Residuals cluster in time | Durbin-Watson, Breusch-Godfrey | HAC/Newey-West SE |
| Functional form | Curvature in residual plot | Ramsey RESET | Add nonlinear terms, logs, interactions |
| Structural break | Level shift in residuals over time | Chow test | Split sample, add regime dummies |

---

<a id="checkpoint-self-check"></a>
## Checkpoint (Self-Check)

Run a few asserts and write 2-3 sentences summarizing what you verified.

In [None]:
# TODO: Validate your results. Uncomment and adjust as needed.

# Cross-sectional data loaded
# assert df_cs.shape[0] > 50, f'Too few county observations: {df_cs.shape[0]}'

# Time-series data loaded
# assert df_ts.shape[0] > 30, f'Too few time-series observations: {df_ts.shape[0]}'

# Breusch-Pagan ran successfully
# assert 0 <= bp_pval <= 1, f'BP p-value out of range: {bp_pval}'

# White test ran successfully
# assert 0 <= white_pval <= 1, f'White p-value out of range: {white_pval}'

# Durbin-Watson in valid range
# assert 0 <= dw_stat <= 4, f'DW statistic out of range: {dw_stat}'

# Breusch-Godfrey ran successfully
# assert 0 <= bg_pval <= 1, f'BG p-value out of range: {bg_pval}'

# RESET ran successfully
# assert 0 <= reset_cs.pvalue <= 1, f'RESET p-value out of range'

# Chow test: SSR_full >= SSR_pre + SSR_post (by construction)
# assert ssr_full >= ssr_pre + ssr_post - 1e-6, 'SSR arithmetic check failed'

print('All checkpoint assertions passed.')
...

---

<a id="extensions-optional"></a>
## Extensions (Optional)

- Try running the heteroskedasticity tests on the **time-series** model. Do you find heteroskedasticity there too?
- Run the Breusch-Godfrey test at different lag lengths (1, 2, 4, 8). How sensitive are the results to the lag choice?
- If RESET rejects, try adding a squared term (e.g., `log_income**2`) to the cross-sectional model and re-run RESET. Does it pass now?
- Try different break dates for the Chow test (e.g., 2001-Q3 for 9/11, 2020-Q1 for COVID). Which break is most significant?
- Implement a **rolling Chow test** that moves the break date across the sample and plots the F-statistic over time.

---

<a id="reflection"></a>
## Reflection

- What did you assume implicitly when running these tests (e.g., about the data-generating process, normality, correct specification of the auxiliary regression)?
- Can you "test your way" to a correct model, or do diagnostics only flag certain problems?
- In practice, do you run all of these tests on every model, or do you select based on context (cross-section vs time series, domain knowledge)?
- If you had to ship a regression-based forecast, which of these diagnostics would you include in your model monitoring dashboard?

---

<a id="solutions-reference"></a>
## Solutions (Reference)

Try the TODOs first. Use these only to unblock yourself or to compare approaches.

<details><summary>Solution: Residual Plots</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 04a — Residual Plots
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Cross-sectional: fitted vs residuals
ax = axes[0]
ax.scatter(res_cs.fittedvalues, res_cs.resid, alpha=0.3, s=10)
ax.axhline(0, color='red', linestyle='--', linewidth=1)
ax.set_title('Cross-Sectional: Fitted vs Residuals')
ax.set_xlabel('Fitted values')
ax.set_ylabel('Residuals')

# Time-series: fitted vs residuals
ax = axes[1]
ax.scatter(res_ts.fittedvalues, res_ts.resid, alpha=0.5, s=15)
ax.axhline(0, color='red', linestyle='--', linewidth=1)
ax.set_title('Time-Series: Fitted vs Residuals')
ax.set_xlabel('Fitted values')
ax.set_ylabel('Residuals')

plt.tight_layout()
plt.show()
```

```python
# Residuals over time
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(res_ts.resid.index, res_ts.resid.values, linewidth=0.8)
ax.axhline(0, color='red', linestyle='--', linewidth=1)
ax.set_title('Time-Series Residuals Over Time')
ax.set_xlabel('Date')
ax.set_ylabel('Residual')
plt.tight_layout()
plt.show()
```

</details>

<details><summary>Solution: Heteroskedasticity Tests</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 04a — Heteroskedasticity Tests
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

# Breusch-Pagan
bp_stat, bp_pval, bp_fstat, bp_fpval = het_breuschpagan(res_cs.resid, res_cs.model.exog)
print(f'BP p-value: {bp_pval:.4f}')

# White
white_stat, white_pval, white_fstat, white_fpval = het_white(res_cs.resid, res_cs.model.exog)
print(f'White p-value: {white_pval:.4f}')

# Robust SE comparison
res_cs_hc3 = sm.OLS(y_cs, X_cs).fit(cov_type='HC3')  # refit so bse stays a labeled Series
print(f'Naive SE: {res_cs.bse["log_income"]:.4f}')
print(f'HC3 SE:   {res_cs_hc3.bse["log_income"]:.4f}')
```

</details>

<details><summary>Solution: Serial Correlation Tests</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 04a — Serial Correlation Tests
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

# Durbin-Watson
dw_stat = durbin_watson(res_ts.resid)
print(f'DW statistic: {dw_stat:.4f}')

# Breusch-Godfrey (4 lags)
bg_stat, bg_pval, bg_fstat, bg_fpval = acorr_breusch_godfrey(res_ts, nlags=4)
print(f'BG p-value: {bg_pval:.4f}')

# HAC comparison
res_ts_hac = sm.OLS(y_ts, X_ts).fit(cov_type='HAC', cov_kwds={'maxlags': 4})  # refit so bse stays a labeled Series
print(f'Naive SE: {res_ts.bse["T10Y2Y_lag1"]:.4f}')
print(f'HAC SE:   {res_ts_hac.bse["T10Y2Y_lag1"]:.4f}')
```

</details>

<details><summary>Solution: Specification Tests (RESET)</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 04a — RESET
from statsmodels.stats.diagnostic import linear_reset

# Cross-sectional RESET (powers 2 and 3)
reset_cs = linear_reset(res_cs, power=np.array([2, 3]), use_f=True)  # use_f=True gives the F form
print(f'RESET (CS) F={reset_cs.statistic:.4f}, p={reset_cs.pvalue:.4f}')

# Time-series RESET
reset_ts = linear_reset(res_ts, power=np.array([2, 3]), use_f=True)  # use_f=True gives the F form
print(f'RESET (TS) F={reset_ts.statistic:.4f}, p={reset_ts.pvalue:.4f}')
```

</details>

<details><summary>Solution: Structural Break (Chow Test)</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 04a — Chow Test
from scipy import stats

break_date = '2008-12-31'
df_pre = df_ts[df_ts.index <= break_date].copy()
df_post = df_ts[df_ts.index > break_date].copy()

# Full sample SSR
ssr_full = res_ts.ssr

# Pre-break
X_pre = sm.add_constant(df_pre[x_ts_cols], has_constant='add')
y_pre = df_pre[y_ts_col]
res_pre = sm.OLS(y_pre, X_pre).fit()
ssr_pre = res_pre.ssr

# Post-break
X_post = sm.add_constant(df_post[x_ts_cols], has_constant='add')
y_post = df_post[y_ts_col]
res_post = sm.OLS(y_post, X_post).fit()
ssr_post = res_post.ssr

# F statistic
k = len(res_ts.params)
n = len(df_ts)
chow_f = ((ssr_full - ssr_pre - ssr_post) / k) / ((ssr_pre + ssr_post) / (n - 2 * k))
chow_pval = stats.f.sf(chow_f, dfn=k, dfd=n - 2 * k)  # upper-tail probability

print(f'Chow F={chow_f:.4f}, p={chow_pval:.4f}')

# Coefficient comparison
comparison = pd.DataFrame({
    'Full Sample': res_ts.params,
    'Pre-Break': res_pre.params,
    'Post-Break': res_post.params,
})
print(comparison.round(4))
```

</details>