# 03 Z-Scores and Standardization

Transforming data to a common scale: how and why we standardize, and what z-scores tell you.

## Table of Contents
- [What is a z-score?](#what-is-a-z-score)
- [Computing z-scores](#computing-z-scores)
- [The standard normal distribution](#the-standard-normal-distribution)
- [The empirical rule (68-95-99.7)](#the-empirical-rule)
- [Standardization for comparing across scales](#standardization-for-comparing-across-scales)
- [Z-scores as outlier detectors](#z-scores-as-outlier-detectors)
- [When standardization is required vs optional](#when-standardization-is-required-vs-optional)
- [Checkpoint (Self-Check)](#checkpoint-self-check)
- [Solutions (Reference)](#solutions-reference)

## Why This Notebook Matters
Z-scores are one of the simplest yet most powerful tools in statistics. They let you compare
values across different scales, identify outliers, and connect raw data to probability
statements. Every t-statistic in a regression table is essentially a z-score. Understanding
standardization here will make hypothesis testing and regression output far more intuitive.

## Prerequisites (Quick Self-Check)
- Completed notebooks 00-02 (descriptive statistics, distributions, CLT).
- Understanding of mean and standard deviation.
- Familiarity with the normal distribution.

## What You Will Produce
- (no file output; learning/analysis notebook)

## Success Criteria
- You can compute and interpret z-scores for any dataset.
- You can use the empirical rule and know when it breaks down.
- You can standardize multiple variables to compare them on a common scale.
- You can use z-scores to detect outliers.

## Common Pitfalls
- Computing z-scores with population parameters when you only have a sample.
- Applying the empirical rule to heavily skewed data.
- Confusing standardization (z-scores) with normalization (min-max scaling).
- Forgetting to standardize features before algorithms that require it.

## Quick Fixes (When You Get Stuck)
- If z-scores look wrong, double-check that you used the correct mean and std.
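- If `scipy.stats.zscore` returns all NaN, check the input for missing values: with the default `nan_policy='propagate'`, a single NaN makes every z-score NaN. Drop NaNs first or pass `nan_policy='omit'`.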
- If you see `ModuleNotFoundError`, re-run the bootstrap cell.
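
The NaN issue in the `scipy.stats.zscore` bullet above can be reproduced on a toy array. This is a minimal sketch; the `growth` values are made up for illustration and do not come from the notebook's dataset.

```python
import numpy as np
from scipy import stats

# Hypothetical growth rates with one missing value (illustrative only)
growth = np.array([2.5, 3.1, np.nan, 4.2, 1.0])

# Default nan_policy='propagate': the NaN contaminates the mean and std,
# so every z-score comes back as NaN
z_default = stats.zscore(growth)

# nan_policy='omit': mean and std are computed from the non-NaN values;
# the missing position stays NaN in the output
z_omit = stats.zscore(growth, nan_policy='omit')

print(z_default)
print(z_omit)
```

Dropping missing values before calling `stats.zscore` (as the notebook does with `.dropna()`) avoids the problem altogether and keeps the index aligned with the values you actually scored.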

## Matching Guide
- `docs/guides/00a_statistics_primer/03_z_scores_and_standardization.md`

## How To Use This Notebook
- Work section-by-section; don't skip the markdown.
- Most code cells are incomplete on purpose: replace TODOs and `...`, then run.
- After each section, write 2–4 sentences answering the interpretation prompts (what changed, why it matters).
- Prefer `data/processed/*` if you have built the real datasets; otherwise use the bundled `data/sample/*` fallbacks.
- Use the **Checkpoint (Self-Check)** section to catch mistakes early.
- Use **Solutions (Reference)** only to unblock yourself; then re-implement without looking.
- Use the matching guide (`docs/guides/00a_statistics_primer/03_z_scores_and_standardization.md`) for the math, assumptions, and deeper context.

<a id="environment-bootstrap"></a>
## Environment Bootstrap
Run this cell first. It makes the repo importable and defines common directories.

In [None]:
from __future__ import annotations

from pathlib import Path
import sys


def find_repo_root(start: Path) -> Path:
    p = start
    for _ in range(8):
        if (p / 'src').exists() and (p / 'docs').exists():
            return p
        p = p.parent
    raise RuntimeError('Could not find repo root. Start Jupyter from the repo root.')


PROJECT_ROOT = find_repo_root(Path.cwd())
if str(PROJECT_ROOT) not in sys.path:
    sys.path.append(str(PROJECT_ROOT))

DATA_DIR = PROJECT_ROOT / 'data'
RAW_DIR = DATA_DIR / 'raw'
PROCESSED_DIR = DATA_DIR / 'processed'
SAMPLE_DIR = DATA_DIR / 'sample'

PROJECT_ROOT

<a id="what-is-a-z-score"></a>
## What is a z-score?

### Goal
Understand the definition and intuition behind z-scores before computing anything.

### Why this matters in economics
Every t-statistic you see in a regression table is built like a z-score: it measures how many
standard errors a coefficient estimate is away from zero. A GDP growth rate of -4% is hard to
judge without context; saying it is 3.2 standard deviations below the mean immediately tells you it is extreme.

### Definition

The **z-score** (or **standard score**) of an observation $x$ is:

$$z = \frac{x - \mu}{\sigma}$$

where $\mu$ is the mean and $\sigma$ is the standard deviation.

For a sample, we use $\bar{x}$ and $s$:

$$z = \frac{x - \bar{x}}{s}$$
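
The two formulas differ only in how the standard deviation is computed (dividing by $n$ versus $n - 1$). A minimal sketch, using made-up numbers, of how NumPy's `ddof` argument switches between them:

```python
import numpy as np

# Hypothetical observations (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

sigma = x.std(ddof=0)  # population formula: divide by n
s = x.std(ddof=1)      # sample formula: divide by n - 1

z_pop = (x - x.mean()) / sigma
z_sample = (x - x.mean()) / s

print(f'sigma = {sigma:.4f}, s = {s:.4f}')
print('z (ddof=0):', z_pop.round(3))
print('z (ddof=1):', z_sample.round(3))
```

Because $s$ is never smaller than the `ddof=0` value on the same data, sample z-scores are slightly smaller in magnitude; the gap shrinks as $n$ grows.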

### Intuition

A z-score answers: **"How many standard deviations away from the mean is this value?"**

| z-score | Interpretation |
|---------|---------------|
| z = 0   | Exactly at the mean |
| z = +1  | One standard deviation above the mean |
| z = -1  | One standard deviation below the mean |
| z = +2  | Two standard deviations above the mean (unusually high) |
| z = -2  | Two standard deviations below the mean (unusually low) |
| z = +3  | Three standard deviations above (very rare if data is normal) |

**Key properties:**
- Positive z = above average; Negative z = below average.
- The magnitude tells you how unusual the value is.
- Z-scores are **unitless**: GDP growth in % and industrial production as an index both become "number of standard deviations."
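
One way to see the unitless property: a change of units (multiplying by a positive constant, or shifting by a constant) leaves the z-scores unchanged. A small sketch with made-up values; the `zscore` helper and the variable names are illustrative assumptions, not part of the notebook:

```python
import numpy as np

def zscore(x):
    """Population z-scores (ddof=0) for a 1-D array."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Hypothetical growth rates in percent (illustrative only)
growth_pct = np.array([1.5, 2.0, -0.5, 3.0, 2.5])

# The same rates as decimal fractions: a pure rescaling
growth_frac = growth_pct / 100

# The same rates measured relative to a 2% benchmark: a pure shift
growth_vs_benchmark = growth_pct - 2.0

same_after_rescale = np.allclose(zscore(growth_pct), zscore(growth_frac))
same_after_shift = np.allclose(zscore(growth_pct), zscore(growth_vs_benchmark))

print(same_after_rescale, same_after_shift)
```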

<a id="computing-z-scores"></a>
## Computing z-scores

### Goal
Compute z-scores by hand for a few values, then use `scipy.stats.zscore` on real economic data.
Identify which quarters had extreme GDP growth.

### Why this matters in economics
Extreme GDP growth quarters often correspond to recessions (large negative z) or recovery
booms (large positive z). Z-scores give a quantitative threshold for what counts as
"extreme" rather than relying on subjective judgment.

### Your Turn (1): Z-scores by hand

Given the following five GDP growth rates (annualized, in percent): 2.5, 3.1, -0.8, 4.2, 1.0

Compute the z-score for each value manually.

In [None]:
import numpy as np
import pandas as pd

# Five hypothetical GDP growth rates (annualized %)
values = np.array([2.5, 3.1, -0.8, 4.2, 1.0])

# TODO: Compute the mean and standard deviation of these values
mean_val = ...
std_val = ...  # use ddof=0 for population std (all five are the "population" here)

print(f'Mean: {mean_val:.2f}')
print(f'Std:  {std_val:.2f}')

# TODO: Compute z-scores by applying the formula z = (x - mean) / std
z_manual = ...

for val, z in zip(values, z_manual):
    print(f'  GDP growth = {val:+.1f}%  ->  z = {z:+.2f}')

**Interpretation prompt:** Which value has the most negative z-score? What does that tell you
about that quarter's GDP growth relative to the others?

### Your Turn (2): Z-scores with scipy on real data

Load the macro quarterly sample data and compute z-scores for the `gdp_growth_qoq_annualized`
column. Identify quarters with extreme GDP growth (|z| > 2).

In [None]:
from scipy import stats

# Load data
df = pd.read_csv(SAMPLE_DIR / 'macro_quarterly_sample.csv', index_col=0, parse_dates=True)
print(f'Shape: {df.shape}')
print(f'Date range: {df.index.min()} to {df.index.max()}')
df.head()

In [None]:
# TODO: Extract the GDP growth column and drop NaN values
gdp_growth = df['gdp_growth_qoq_annualized'].dropna()

# TODO: Compute z-scores using scipy.stats.zscore
# Hint: stats.zscore(gdp_growth) uses ddof=0 by default
z_gdp = ...

# Attach z-scores back for inspection
gdp_df = pd.DataFrame({
    'gdp_growth': gdp_growth,
    'z_score': z_gdp
})

print(f'Mean z-score: {z_gdp.mean():.4f}  (should be ~0)')
print(f'Std z-score:  {z_gdp.std():.4f}  (should be ~1)')

In [None]:
# TODO: Identify quarters with |z| > 2 (extreme GDP growth)
extreme_quarters = ...

print(f'Number of extreme quarters (|z| > 2): {len(extreme_quarters)}')
print(f'Total quarters: {len(gdp_df)}')
print(f'Percentage extreme: {100 * len(extreme_quarters) / len(gdp_df):.1f}%')
print()
print('Extreme quarters:')
extreme_quarters.sort_values('z_score')

**Interpretation prompt:**
1. Do the extreme negative z-score quarters correspond to known recessions?
2. What about the extreme positive quarters -- do they correspond to recovery periods?
3. The empirical rule says about 5% of observations should have |z| > 2 for normal data.
   Is your percentage close to 5%? What might explain any difference?

<a id="the-standard-normal-distribution"></a>
## The standard normal distribution

### Goal
Understand that z-scores convert any normal variable to $N(0, 1)$. Plot the standard normal
PDF and use the CDF to answer probability questions.

### Why this matters in economics
When you see a p-value in a regression table, the software computed it by comparing a
test statistic (essentially a z-score) to the standard normal or t-distribution.
Understanding $N(0, 1)$ makes those p-values concrete.
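
As a rough illustration of that comparison, a two-sided p-value under the standard normal can be computed from a test statistic as sketched below. The value 2.5 is a made-up statistic, and regression software typically uses the t-distribution with the appropriate degrees of freedom rather than $N(0, 1)$.

```python
from scipy.stats import norm

z_stat = 2.5  # hypothetical test statistic (illustrative only)

# Two-sided p-value: probability of a value at least this far from 0
# in either tail of N(0, 1). norm.sf(z) equals 1 - norm.cdf(z).
p_two_sided = 2 * norm.sf(abs(z_stat))

print(f'z = {z_stat}, two-sided p-value = {p_two_sided:.4f}')
```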

### Your Turn (1): Plot the standard normal PDF

In [None]:
import matplotlib.pyplot as plt
from scipy.stats import norm

# TODO: Create an array of x values from -4 to 4
x = ...

# TODO: Compute the standard normal PDF at each x value
pdf_vals = ...

fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(x, pdf_vals, 'k-', lw=2, label='N(0, 1) PDF')

# Shade regions for 1, 2, 3 standard deviations
for n_sd, color, alpha in [(1, 'steelblue', 0.4), (2, 'orange', 0.3), (3, 'green', 0.2)]:
    mask = (x >= -n_sd) & (x <= n_sd)
    ax.fill_between(x[mask], pdf_vals[mask], alpha=alpha, color=color,
                    label=f'{n_sd} SD')

ax.set_xlabel('z-score')
ax.set_ylabel('Density')
ax.set_title('Standard Normal Distribution N(0, 1)')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

### Your Turn (2): Probability questions using the CDF

Use `norm.cdf()` to answer:
1. What fraction of observations fall within 1 standard deviation of the mean?
2. Within 2 standard deviations?
3. Within 3 standard deviations?
4. What is the probability of observing z > 1.96? (Doubling this one-tail probability gives the 5% two-tailed significance level.)

In [None]:
# TODO: Compute P(-1 < Z < 1) using norm.cdf
# Hint: P(a < Z < b) = norm.cdf(b) - norm.cdf(a)
within_1sd = ...

# TODO: Compute P(-2 < Z < 2)
within_2sd = ...

# TODO: Compute P(-3 < Z < 3)
within_3sd = ...

# TODO: Compute P(Z > 1.96)
above_196 = ...

print(f'P(-1 < Z < 1) = {within_1sd:.4f}  (empirical rule says ~0.68)')
print(f'P(-2 < Z < 2) = {within_2sd:.4f}  (empirical rule says ~0.95)')
print(f'P(-3 < Z < 3) = {within_3sd:.4f}  (empirical rule says ~0.997)')
print(f'P(Z > 1.96)   = {above_196:.4f}  (this is the one-tail 2.5%)')

**Interpretation prompt:** The exact values from the CDF are close to but not exactly 68%,
95%, 99.7%. Why is the empirical rule stated as approximate numbers? Why is z = 1.96
(not 2.0) the critical value for 95% two-tailed tests?

<a id="the-empirical-rule"></a>
## The empirical rule (68-95-99.7)

### Goal
Verify the 68-95-99.7 rule on simulated normal data AND on real economic data.
Discuss when the rule breaks down.

### Why this matters in economics
Economic data is often approximately -- but not exactly -- normal. Fat tails are common:
financial crises, pandemics, and policy shocks generate observations far beyond what the
normal distribution predicts. Knowing when the rule holds (and when it doesn't) is critical
for risk assessment.
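
To make the fat-tail point concrete, the sketch below compares the probability of landing more than 3 standard deviations from the mean under a normal distribution and under a Student-t distribution. The choice of 5 degrees of freedom is an illustrative assumption, not an estimate from the notebook's data.

```python
import numpy as np
from scipy.stats import norm, t

df_t = 5  # hypothetical degrees of freedom; smaller means heavier tails

# P(|z| > 3) for a standard normal
tail_normal = 2 * norm.sf(3)

# A Student-t with df_t degrees of freedom has variance df_t / (df_t - 2),
# so "3 standard deviations" corresponds to 3 * sqrt(df_t / (df_t - 2))
# on the raw t scale
sd_t = np.sqrt(df_t / (df_t - 2))
tail_t = 2 * t.sf(3 * sd_t, df_t)

print(f'P(|z| > 3), normal:          {100 * tail_normal:.2f}%')
print(f'P(|z| > 3), Student-t(df=5): {100 * tail_t:.2f}%')
```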

### Your Turn (1): Verify on simulated normal data

In [None]:
rng = np.random.default_rng(42)

# TODO: Generate 10,000 draws from a normal distribution
sim_data = ...

# TODO: Compute z-scores for the simulated data
sim_z = ...

# TODO: Compute the percentage of observations within 1, 2, 3 standard deviations
pct_1sd = ...
pct_2sd = ...
pct_3sd = ...

print('--- Simulated Normal Data (n=10,000) ---')
print(f'Within 1 SD: {pct_1sd:.2f}%  (expect ~68.27%)')
print(f'Within 2 SD: {pct_2sd:.2f}%  (expect ~95.45%)')
print(f'Within 3 SD: {pct_3sd:.2f}%  (expect ~99.73%)')

### Your Turn (2): Verify on real GDP growth data

In [None]:
# Use the z-scores computed earlier for GDP growth

# TODO: Compute actual percentages within 1, 2, 3 SD for real GDP growth
gdp_pct_1sd = ...
gdp_pct_2sd = ...
gdp_pct_3sd = ...

print('--- Real GDP Growth Data ---')
print(f'Within 1 SD: {gdp_pct_1sd:.2f}%  (rule says ~68%)')
print(f'Within 2 SD: {gdp_pct_2sd:.2f}%  (rule says ~95%)')
print(f'Within 3 SD: {gdp_pct_3sd:.2f}%  (rule says ~99.7%)')
print()

# Summary comparison table
comparison = pd.DataFrame({
    'Rule': [68.27, 95.45, 99.73],
    'Simulated': [pct_1sd, pct_2sd, pct_3sd],
    'GDP Growth': [gdp_pct_1sd, gdp_pct_2sd, gdp_pct_3sd]
}, index=['1 SD', '2 SD', '3 SD'])
comparison

**Interpretation prompt:**
1. Does the empirical rule hold well for the simulated data? Why?
2. How closely does GDP growth follow the rule? If there are deviations, what might cause them?
3. In which direction would heavy tails cause the rule to break -- would you see *more* or
   *fewer* extreme observations than the rule predicts?
4. If you were building a risk model, would you trust the normal-distribution percentages
   for tail events? Why or why not?

<a id="standardization-for-comparing-across-scales"></a>
## Standardization for comparing across scales

### Goal
Standardize multiple economic indicators with different units and plot them on the same
axis to reveal co-movement patterns.

### Why this matters in economics
GDP growth is measured in percent, industrial production is an index, retail sales are in
billions of dollars, and interest rates are in percentage points. You cannot directly compare
these on the same chart. Standardization (converting to z-scores) puts them all in "standard
deviation units," making visual comparison meaningful.

### Your Turn (1): Standardize multiple columns

In [None]:
# Select key economic indicators (different units and scales)
indicators = ['gdp_growth_qoq_annualized', 'UNRATE', 'FEDFUNDS', 'T10Y2Y']

df_raw = df[indicators].dropna()

print('--- Raw data summary (different scales) ---')
df_raw.describe().round(2)

In [None]:
# TODO: Standardize each column to z-scores
# Hint: For a DataFrame, (df - df.mean()) / df.std() works column-wise
# Note: pandas .std() uses ddof=1, whereas stats.zscore uses ddof=0, so results differ slightly
df_standardized = ...

print('--- Standardized data summary (should have mean~0, std~1) ---')
df_standardized.describe().round(4)

### Your Turn (2): Plot raw vs standardized to see the difference

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(16, 5))

# Left panel: raw data (hard to compare because scales differ)
# TODO: Plot all indicators in df_raw on axes[0]
...
axes[0].set_title('Raw Indicators (different scales)')
axes[0].set_ylabel('Original units')
axes[0].legend(loc='upper left', fontsize=8)
axes[0].grid(True, alpha=0.3)

# Right panel: standardized data (easy to compare)
# TODO: Plot all indicators in df_standardized on axes[1]
...
axes[1].set_title('Standardized Indicators (z-scores)')
axes[1].set_ylabel('Standard deviations from mean')
axes[1].axhline(0, color='black', lw=0.8, ls='--')
axes[1].legend(loc='upper left', fontsize=8)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

### Your Turn (3): Identify co-movement patterns

In [None]:
# TODO: Compute the correlation matrix of the STANDARDIZED indicators.
# Note: correlations are the same whether you standardize or not!
# But the standardized plot makes co-movement visually obvious.
corr_matrix = ...

print('Correlation matrix of economic indicators:')
corr_matrix.round(3)

**Interpretation prompt:**
1. Which pairs of indicators move together (positive correlation)? Which move in opposite
   directions (negative correlation)?
2. Does the relationship between unemployment (UNRATE) and GDP growth match your economic
   intuition? (Hint: think about Okun's Law.)
3. Why is standardization essential for the visual comparison even though it does not
   change the correlation coefficients?
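
The claim that standardization leaves correlations unchanged can be checked on synthetic data. A minimal sketch; the two series and the `toy` DataFrame are made up for illustration and are unrelated to the notebook's indicators:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Two hypothetical indicators on very different scales (illustrative only)
a = rng.normal(loc=2.0, scale=1.0, size=200)
b = 5000 + 300 * a + rng.normal(scale=400, size=200)
toy = pd.DataFrame({'small_scale': a, 'large_scale': b})

# Column-wise z-scores
toy_z = (toy - toy.mean()) / toy.std()

corr_raw = toy.corr()
corr_z = toy_z.corr()

print(corr_raw.round(3))
print('Same correlations after standardizing:', np.allclose(corr_raw, corr_z))
```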

<a id="z-scores-as-outlier-detectors"></a>
## Z-scores as outlier detectors

### Goal
Use z-scores to flag outliers in economic data. Cross-reference extreme observations with
known economic events.

### Why this matters in economics
Outliers in economic data are not random noise -- they often correspond to recessions,
financial crises, or policy interventions. Identifying them systematically (rather than
eyeballing) is the first step toward deciding whether to include, transform, or model
them separately.

### Your Turn (1): Flag outliers across multiple indicators

In [None]:
# Use the standardized DataFrame from above
# Threshold: |z| > 2.5 is a common outlier cutoff

threshold = 2.5

# TODO: Create a boolean DataFrame where True = outlier (|z| > threshold)
outlier_flags = ...

# TODO: Count how many outliers each indicator has
outlier_counts = ...
print(f'Outlier counts (|z| > {threshold}):')
print(outlier_counts)
print()

# TODO: Find quarters where ANY indicator is an outlier
any_outlier = ...
outlier_quarters = df_standardized[any_outlier]
print(f'Quarters with at least one outlier: {any_outlier.sum()} out of {len(df_standardized)}')
outlier_quarters

### Your Turn (2): Cross-reference with recession indicator

In [None]:
# The dataset has a 'recession' column (1 = recession quarter, 0 = expansion)

# TODO: For the outlier quarters identified above, check whether they overlap with recessions
outlier_dates = outlier_quarters.index
recession_status = df.loc[outlier_dates, 'recession'] if 'recession' in df.columns else None

if recession_status is not None:
    outlier_detail = pd.DataFrame({
        'recession': recession_status,
        'gdp_growth': df.loc[outlier_dates, 'gdp_growth_qoq_annualized'],
        'z_gdp': df_standardized.loc[outlier_dates, 'gdp_growth_qoq_annualized'],
        'z_unrate': df_standardized.loc[outlier_dates, 'UNRATE']
    })
    print(outlier_detail.round(2))
else:
    print('No recession column found -- skip cross-referencing.')

### Your Turn (3): Visualize outliers

In [None]:
fig, ax = plt.subplots(figsize=(12, 5))

# Plot GDP growth z-scores over time
z_gdp_full = df_standardized['gdp_growth_qoq_annualized']
ax.plot(z_gdp_full.index, z_gdp_full.values, 'b-', lw=1, label='GDP growth (z-score)')

# TODO: Highlight outlier quarters (|z| > threshold) as red dots
...

# Draw threshold lines
ax.axhline(threshold, color='red', ls='--', lw=0.8, alpha=0.7, label=f'z = +/- {threshold}')
ax.axhline(-threshold, color='red', ls='--', lw=0.8, alpha=0.7)
ax.axhline(0, color='black', ls='-', lw=0.5)

# TODO: Shade recession periods if the 'recession' column exists
if 'recession' in df.columns:
    rec = df['recession'].reindex(z_gdp_full.index)
    ax.fill_between(z_gdp_full.index, ax.get_ylim()[0], ax.get_ylim()[1],
                    where=rec == 1, alpha=0.15, color='gray', label='Recession')

ax.set_xlabel('Date')
ax.set_ylabel('Z-score')
ax.set_title('GDP Growth Z-Scores with Outlier Detection')
ax.legend(loc='lower left', fontsize=8)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

**Interpretation prompt:**
1. Do the z-score outliers cluster around recession periods?
2. Are there any outliers that are *not* during recessions? What might explain them?
3. Would a stricter threshold (|z| > 3) miss important events?
4. Is the z-score method appropriate for heavily skewed data, or should you consider
   alternative outlier detection methods (e.g., IQR-based)?
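
For reference, an IQR-based rule (Tukey fences) can be sketched as follows. The data are a made-up skewed sample, and the 1.5 multiplier is the conventional choice rather than anything specific to this notebook. The example also shows how a single extreme value can inflate the mean and standard deviation enough to hide itself from a z-score threshold.

```python
import numpy as np

# Hypothetical right-skewed sample with one extreme value (illustrative only)
values_skewed = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 30], dtype=float)

# z-score rule: the extreme value inflates the mean and std it is judged against
z = (values_skewed - values_skewed.mean()) / values_skewed.std()
z_flags = np.abs(z) > 3

# IQR rule (Tukey fences): quartiles are barely affected by the extreme value
q1, q3 = np.percentile(values_skewed, [25, 75])
iqr = q3 - q1
iqr_flags = (values_skewed < q1 - 1.5 * iqr) | (values_skewed > q3 + 1.5 * iqr)

print('Flagged by |z| > 3: ', values_skewed[z_flags])
print('Flagged by IQR rule:', values_skewed[iqr_flags])
```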

<a id="when-standardization-is-required-vs-optional"></a>
## When standardization is required vs optional

### Goal
Understand which machine learning and econometric methods require standardized inputs
and which do not.

### Why this matters in economics
Using the wrong preprocessing can silently degrade your model or mislead interpretation.
For example, running Ridge regression without standardizing features makes the penalty depend
on each feature's units: a feature measured on a small scale needs a large coefficient and is
shrunk heavily, while a feature on a large scale needs only a small coefficient and is barely penalized.

### Summary table

| Method | Standardization needed? | Why |
|--------|------------------------|-----|
| OLS (ordinary least squares) | Optional | Coefficients adjust to scale; standardized coefficients can aid interpretation |
| Ridge / Lasso regression | **Required** | Penalty treats all coefficients equally; different scales bias the penalty |
| PCA (principal component analysis) | **Required** | PCA maximizes variance; high-scale variables dominate without standardization |
| K-Means clustering | **Required** | Distance-based; large-scale features dominate distance calculations |
| Logistic regression (with regularization) | **Required** | Same reason as Ridge/Lasso |
| Decision trees / Random forests | Not needed | Splits are based on thresholds within each feature; scale-invariant |
| Gradient boosting (XGBoost, etc.) | Not needed | Also tree-based and scale-invariant |

**Rule of thumb:** If the algorithm uses *distances* or *penalties* that combine features,
you need standardization. If the algorithm makes *one feature at a time* decisions (trees),
you do not.
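
The rule of thumb can be illustrated with a closed-form ridge fit on synthetic data. This is a sketch under stated assumptions: `ridge_fit_predict` is a hypothetical helper written for this example (it is not scikit-learn's implementation), the data are simulated, and the intercept is handled by centering so that it is not penalized.

```python
import numpy as np

def ridge_fit_predict(X, y, alpha):
    """Closed-form ridge on centered data: beta = (X'X + alpha*I)^-1 X'y.

    Returns in-sample fitted values. alpha=0 reduces to OLS.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n_features = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    return Xc @ beta + y.mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200)

# Same information, but the second feature is expressed in different units
X_rescaled = X.copy()
X_rescaled[:, 1] *= 0.001

# OLS fitted values do not depend on the units of a feature...
ols_same = np.allclose(ridge_fit_predict(X, y, 0.0),
                       ridge_fit_predict(X_rescaled, y, 0.0))
# ...but ridge fitted values do, because the penalty acts on the coefficients
ridge_same = np.allclose(ridge_fit_predict(X, y, 10.0),
                         ridge_fit_predict(X_rescaled, y, 10.0))

print('OLS unchanged by rescaling:  ', ols_same)
print('Ridge unchanged by rescaling:', ridge_same)
```

Standardizing both versions of `X` first would make the two ridge fits agree again, which is the practical reason to scale before penalized regression.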

### Your Turn: Demonstrate the effect of standardization on Ridge regression

In [None]:
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Prepare features and target
feature_cols = ['UNRATE', 'FEDFUNDS', 'INDPRO', 'CPIAUCSL', 'T10Y2Y']
target_col = 'gdp_growth_qoq_annualized'

df_model = df[feature_cols + [target_col]].dropna()
X = df_model[feature_cols]
y = df_model[target_col]

# TODO: Fit Ridge WITHOUT standardizing. Print coefficients.
ridge_raw = Ridge(alpha=1.0)
ridge_raw.fit(X, y)
coef_raw = pd.Series(ridge_raw.coef_, index=feature_cols, name='Raw')

# TODO: Fit Ridge WITH standardized features. Print coefficients.
scaler = StandardScaler()
X_scaled = ...

ridge_scaled = Ridge(alpha=1.0)
ridge_scaled.fit(X_scaled, y)
coef_scaled = pd.Series(ridge_scaled.coef_, index=feature_cols, name='Standardized')

# Compare
comparison_df = pd.DataFrame({'Raw_coef': coef_raw, 'Standardized_coef': coef_scaled})
print('Ridge coefficients: raw vs standardized features')
comparison_df.round(4)

**Interpretation prompt:**
1. How do the coefficients differ between the raw and standardized versions?
2. Which set of coefficients is more meaningful for comparing relative feature importance?
3. Why does the Ridge penalty distort results when features have different scales?
4. In the later regularization notebook (05_regularization_ridge_lasso), you will see this
   in a full pipeline. Keep this demonstration in mind.

---

## Where This Shows Up Later

The concepts from this notebook are foundational for many later topics:

- **Regression notebooks (02_regression):** Every t-statistic in a regression summary is
  essentially a z-score: $t = \hat{\beta} / SE(\hat{\beta})$. Understanding z-scores makes
  hypothesis testing intuitive.
- **Regularization (02_regression/05_regularization_ridge_lasso):** Ridge and Lasso require
  standardized features. The demonstration above previews why.
- **PCA and unsupervised learning (04_unsupervised/01_pca_macro_factors):** PCA operates on
  variance. Without standardization, the highest-variance variable dominates the first
  principal component regardless of its economic importance.
- **Anomaly detection (04_unsupervised/03_anomaly_detection):** Z-score-based outlier
  detection is the simplest anomaly detection method. Later notebooks build more
  sophisticated approaches.
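
As a preview of the PCA point, the sketch below compares the first principal component computed from the covariance matrix (raw data) with the one computed from the correlation matrix (equivalent to standardizing first). The two series are simulated, and `first_pc` is a hypothetical helper for this example only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical correlated series; the second is on a much larger scale
x_small = rng.normal(size=500)
x_large = 100 * (0.5 * x_small + rng.normal(size=500))
data = np.column_stack([x_small, x_large])

def first_pc(matrix):
    """Eigenvector belonging to the largest eigenvalue of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(matrix)  # eigenvalues in ascending order
    return eigvecs[:, -1]

pc_cov = first_pc(np.cov(data, rowvar=False))        # PCA on raw data
pc_corr = first_pc(np.corrcoef(data, rowvar=False))  # PCA on standardized data

print('First PC loadings, raw data:         ', np.abs(pc_cov).round(3))
print('First PC loadings, standardized data:', np.abs(pc_corr).round(3))
```

On the raw data the loading on the large-scale series is close to 1; after standardization the two series contribute equally.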

<a id="checkpoint-self-check"></a>
## Checkpoint (Self-Check)

Run these assertions to verify your work. If any fail, revisit the corresponding section.

In [None]:
# --- Self-check assertions ---

# 1. Manual z-score: check that the mean of z-scores is approximately 0
assert abs(np.mean(z_manual)) < 1e-10, 'Mean of z-scores should be ~0'

# 2. scipy z-scores: mean should be ~0, std ~1
assert abs(z_gdp.mean()) < 0.01, 'scipy z-score mean should be ~0'
assert abs(z_gdp.std() - 1.0) < 0.05, 'scipy z-score std should be ~1'

# 3. CDF values should match known results
assert abs(within_1sd - 0.6827) < 0.001, 'P(-1 < Z < 1) should be ~0.6827'
assert abs(within_2sd - 0.9545) < 0.001, 'P(-2 < Z < 2) should be ~0.9545'

# 4. Standardized DataFrame should have mean ~0 and std ~1
assert (df_standardized.mean().abs() < 0.01).all(), 'Standardized means should be ~0'
assert ((df_standardized.std() - 1.0).abs() < 0.05).all(), 'Standardized stds should be ~1'

print('All checks passed.')

## Extensions (Optional)

1. **Robust z-scores:** Instead of mean/std, use median and MAD (median absolute deviation).
   Compare outlier detection results. When would robust z-scores be preferable?
2. **Min-max normalization:** Implement min-max scaling and compare with z-score
   standardization. When is each appropriate?
3. **Rolling z-scores:** Compute z-scores using a rolling window (e.g., 20 quarters) instead
   of the full sample. How does this change outlier detection? Why might rolling z-scores
   be more appropriate for non-stationary economic data?
4. **Skewness and kurtosis:** Compute the skewness and kurtosis of GDP growth. How do these
   relate to the empirical rule deviations you found?

## Reflection

Write 2-4 sentences on each:

1. When you see a t-statistic of 2.5 in a regression table, what does that mean in terms
   of z-scores and the standard normal distribution?
2. If you were comparing how unusual a given quarter's GDP growth was across countries with
   different average growth rates and volatilities, why would z-scores be more appropriate than raw values?
3. What is one situation where you would NOT want to standardize your data before modeling?

<a id="solutions-reference"></a>
## Solutions (Reference)

Try the TODOs first. Use these only to unblock yourself or to compare approaches.

<details><summary>Solution: Computing z-scores by hand</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
import numpy as np

values = np.array([2.5, 3.1, -0.8, 4.2, 1.0])

mean_val = values.mean()          # 2.0
std_val = values.std(ddof=0)      # ~1.740

z_manual = (values - mean_val) / std_val
# z_manual: [ 0.29,  0.63, -1.61,  1.26, -0.57]

for val, z in zip(values, z_manual):
    print(f'  GDP growth = {val:+.1f}%  ->  z = {z:+.2f}')
```

</details>

<details><summary>Solution: Z-scores with scipy on real data</summary>

_One possible approach._

```python
from scipy import stats

gdp_growth = df['gdp_growth_qoq_annualized'].dropna()

z_gdp = stats.zscore(gdp_growth)  # ddof=0 by default

gdp_df = pd.DataFrame({
    'gdp_growth': gdp_growth,
    'z_score': z_gdp
})

# Extreme quarters
extreme_quarters = gdp_df[gdp_df['z_score'].abs() > 2]
extreme_quarters.sort_values('z_score')
```

</details>

<details><summary>Solution: Standard normal PDF and CDF</summary>

_One possible approach._

```python
from scipy.stats import norm

x = np.linspace(-4, 4, 500)
pdf_vals = norm.pdf(x)

within_1sd = norm.cdf(1) - norm.cdf(-1)   # 0.6827
within_2sd = norm.cdf(2) - norm.cdf(-2)   # 0.9545
within_3sd = norm.cdf(3) - norm.cdf(-3)   # 0.9973
above_196 = 1 - norm.cdf(1.96)             # 0.0250
```

</details>

<details><summary>Solution: Empirical rule verification</summary>

_One possible approach._

```python
rng = np.random.default_rng(42)
sim_data = rng.normal(loc=0, scale=1, size=10_000)
sim_z = (sim_data - sim_data.mean()) / sim_data.std()

pct_1sd = 100 * np.mean(np.abs(sim_z) <= 1)
pct_2sd = 100 * np.mean(np.abs(sim_z) <= 2)
pct_3sd = 100 * np.mean(np.abs(sim_z) <= 3)

# For GDP growth z-scores (z_gdp computed earlier):
gdp_pct_1sd = 100 * np.mean(np.abs(z_gdp) <= 1)
gdp_pct_2sd = 100 * np.mean(np.abs(z_gdp) <= 2)
gdp_pct_3sd = 100 * np.mean(np.abs(z_gdp) <= 3)
```

</details>

<details><summary>Solution: Standardization for comparing across scales</summary>

_One possible approach._

```python
indicators = ['gdp_growth_qoq_annualized', 'UNRATE', 'FEDFUNDS', 'T10Y2Y']
df_raw = df[indicators].dropna()

df_standardized = (df_raw - df_raw.mean()) / df_raw.std()

# Plot
fig, axes = plt.subplots(1, 2, figsize=(16, 5))

df_raw.plot(ax=axes[0])
axes[0].set_title('Raw Indicators (different scales)')

df_standardized.plot(ax=axes[1])
axes[1].set_title('Standardized Indicators (z-scores)')
axes[1].axhline(0, color='black', lw=0.8, ls='--')

plt.tight_layout()
plt.show()

# Correlation
corr_matrix = df_standardized.corr()
```

</details>

<details><summary>Solution: Z-scores as outlier detectors</summary>

_One possible approach._

```python
threshold = 2.5

outlier_flags = df_standardized.abs() > threshold
outlier_counts = outlier_flags.sum()

any_outlier = outlier_flags.any(axis=1)
outlier_quarters = df_standardized[any_outlier]
```

</details>

<details><summary>Solution: Ridge with and without standardization</summary>

_One possible approach._

```python
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

feature_cols = ['UNRATE', 'FEDFUNDS', 'INDPRO', 'CPIAUCSL', 'T10Y2Y']
target_col = 'gdp_growth_qoq_annualized'

df_model = df[feature_cols + [target_col]].dropna()
X = df_model[feature_cols]
y = df_model[target_col]

# Without standardization
ridge_raw = Ridge(alpha=1.0).fit(X, y)

# With standardization
scaler = StandardScaler()
X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=feature_cols, index=X.index)

ridge_scaled = Ridge(alpha=1.0).fit(X_scaled, y)

comparison_df = pd.DataFrame({
    'Raw_coef': ridge_raw.coef_,
    'Standardized_coef': ridge_scaled.coef_
}, index=feature_cols)
```

</details>