# Lesson 3 Module 1: Bias–Variance Tradeoff

This notebook demonstrates the bias–variance tradeoff using practical examples.
It builds on Lesson 2 (MLE/MoM estimators) and provides a foundation for confidence intervals.

## Learning Objectives
- Define bias, variance, and mean squared error of estimators
- Explain the bias–variance decomposition: $\mathrm{MSE} = \mathrm{Bias}^2 + \mathrm{Var}$
- Apply tradeoff concepts to shrinkage estimators and sample variance
- Connect to Lesson 2 estimator properties and Lesson 1 sampling distributions

## Repository Context
- Uses datasets from `shared/data/` where appropriate
- Connects to Lesson 1 (LLN/CLT) and Lesson 2 (MLE/MoM)
- Uses helper functions from the appendix

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Set style and random seed
sns.set_theme(context="talk", style="whitegrid")
sns.set_palette(["#000000", "#E69F00", "#56B4E9", "#009E73",
                 "#F0E442", "#0072B2", "#D55E00", "#CC79A7"])
rng = np.random.default_rng(2025)

print("Environment setup complete. Random seed: 2025")

## 1. Bias, Variance, and MSE Definitions

Let's implement the core concepts and verify the decomposition.
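
For reference, here is a sketch of the standard definitions that the function below estimates by simulation. The notation $\hat\theta$ for the estimator and $\theta$ for the true parameter is introduced here and is not used elsewhere in the notebook:

$$
\mathrm{Bias}(\hat\theta) = \mathbb{E}[\hat\theta] - \theta, \qquad
\mathrm{Var}(\hat\theta) = \mathbb{E}\big[(\hat\theta - \mathbb{E}[\hat\theta])^2\big], \qquad
\mathrm{MSE}(\hat\theta) = \mathbb{E}\big[(\hat\theta - \theta)^2\big].
$$

Adding and subtracting $\mathbb{E}[\hat\theta]$ inside the square gives the decomposition:

$$
\begin{aligned}
\mathrm{MSE}(\hat\theta)
&= \mathbb{E}\big[(\hat\theta - \mathbb{E}[\hat\theta])^2\big]
 + 2\,(\mathbb{E}[\hat\theta] - \theta)\,\mathbb{E}\big[\hat\theta - \mathbb{E}[\hat\theta]\big]
 + (\mathbb{E}[\hat\theta] - \theta)^2 \\
&= \mathrm{Var}(\hat\theta) + \mathrm{Bias}(\hat\theta)^2,
\end{aligned}
$$

since the cross term vanishes. The same identity holds exactly for the empirical versions when the variance uses denominator $R$ (`ddof=0`), which is why the decomposition check can use a tight numerical tolerance.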

In [None]:
def compute_estimator_properties(estimator_fn, true_theta, gen_fn, n, R=10000):
    """
    Compute empirical bias, variance, and MSE for an estimator.

    Parameters:
    -----------
    estimator_fn : callable
        Function that takes sample and returns estimate
    true_theta : float
        True parameter value
    gen_fn : callable
        Function that generates sample of size n
    n : int
        Sample size
    R : int
        Number of replications

    Returns:
    --------
    dict with bias, variance, mse, and estimates
    """
    estimates = np.empty(R)

    for r in range(R):
        sample = gen_fn(n)
        estimates[r] = estimator_fn(sample)

    bias = np.mean(estimates) - true_theta
    variance = np.var(estimates, ddof=0)
    mse = np.mean((estimates - true_theta)**2)

    return {
        'bias': bias,
        'variance': variance,
        'mse': mse,
        'estimates': estimates,
        'bias_squared': bias**2
    }

print("Function defined: compute_estimator_properties")

In [None]:
# Test the decomposition: MSE should equal Bias^2 + Variance
true_mu = 5.0
true_sigma = 2.0
n = 10
R = 5000

def sample_mean(x):
    return np.mean(x)

def generate_normal(n):
    return rng.normal(true_mu, true_sigma, n)

props = compute_estimator_properties(sample_mean, true_mu, generate_normal, n, R)

print(f"Sample size: {n}")
print(f"Bias: {props['bias']:.6f}")
print(f"Variance: {props['variance']:.6f}")
print(f"Bias^2: {props['bias_squared']:.6f}")
print(f"MSE: {props['mse']:.6f}")
print(f"Bias^2 + Var: {props['bias_squared'] + props['variance']:.6f}")
print(f"Decomposition check: {abs(props['mse'] - (props['bias_squared'] + props['variance'])) < 1e-10}")

## 2. Shrinkage Estimator Example

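The heights data from the repository supplies the target mean for this example. The simulated samples themselves are drawn from a normal distribution centred on that mean (via `generate_normal`, with `true_sigma = 2.0`), so the cells must be run in order.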

In [None]:
# Load heights data
heights_df = pd.read_csv("../../../shared/data/heights_weights_sample.csv")
heights = heights_df['height_cm'].values

print(f"Loaded {len(heights)} height measurements")
print(f"Sample mean: {np.mean(heights):.2f} cm")
print(f"Sample std: {np.std(heights, ddof=1):.2f} cm")

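# Prior guess for the average height (an approximate population value)
mu_0 = 170
# Use the sample mean as a proxy for the true mean.
# Note: this rebinds the global true_mu that generate_normal() reads,
# so later simulated samples are centred on the heights mean.
true_mu = np.mean(heights)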

In [None]:
def shrinkage_estimator(x, alpha, mu_0):
    """Shrinkage estimator: alpha * mean(x) + (1-alpha) * mu_0"""
    return alpha * np.mean(x) + (1 - alpha) * mu_0

# Compare shrinkage estimators for different alpha values
n_small = 20  # Small sample
alpha_values = np.linspace(0, 1, 21)
R = 2000

results = []
for alpha in alpha_values:
    def est_fn(x):
        return shrinkage_estimator(x, alpha, mu_0)

    props = compute_estimator_properties(est_fn, true_mu, generate_normal, n_small, R)
    results.append({
        'alpha': alpha,
        'bias': props['bias'],
        'variance': props['variance'],
        'mse': props['mse'],
        'bias_squared': props['bias_squared']
    })

shrinkage_df = pd.DataFrame(results)
print("Computed shrinkage estimator properties for alpha in [0,1]")

In [None]:
# Plot the tradeoff
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Bias and variance vs alpha
axes[0,0].plot(shrinkage_df['alpha'], shrinkage_df['bias'], 'b-', linewidth=2, label='Bias')
axes[0,0].plot(shrinkage_df['alpha'], shrinkage_df['variance'], 'r-', linewidth=2, label='Variance')
axes[0,0].set_xlabel(r'$\alpha$')
axes[0,0].set_ylabel('Value')
axes[0,0].set_title('Bias and Variance vs Shrinkage Parameter')
axes[0,0].legend()
axes[0,0].grid(True, alpha=0.3)

# Bias squared vs alpha
axes[0,1].plot(shrinkage_df['alpha'], shrinkage_df['bias_squared'], 'g-', linewidth=2)
axes[0,1].set_xlabel(r'$\alpha$')
axes[0,1].set_ylabel('Bias^2')
axes[0,1].set_title('Bias Squared vs Shrinkage Parameter')
axes[0,1].grid(True, alpha=0.3)

# MSE decomposition
axes[1,0].plot(shrinkage_df['alpha'], shrinkage_df['bias_squared'], 'g-', linewidth=2, label='Bias^2')
axes[1,0].plot(shrinkage_df['alpha'], shrinkage_df['variance'], 'r-', linewidth=2, label='Variance')
axes[1,0].plot(shrinkage_df['alpha'], shrinkage_df['mse'], 'b-', linewidth=3, label='MSE')
axes[1,0].set_xlabel(r'$\alpha$')
axes[1,0].set_ylabel('Value')
axes[1,0].set_title('MSE Decomposition')
axes[1,0].legend()
axes[1,0].grid(True, alpha=0.3)

# Find optimal alpha
optimal_idx = shrinkage_df['mse'].argmin()
optimal_alpha = shrinkage_df.iloc[optimal_idx]['alpha']
axes[1,0].axvline(optimal_alpha, color='black', linestyle='--', alpha=0.7)
# axes[1,0].text(optimal_alpha + 0.05, 0.5, f'Optimal alpha = {optimal_alpha:.2f}',
#                bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8))

# MSE vs alpha
axes[1,1].plot(shrinkage_df['alpha'], shrinkage_df['mse'], 'b-', linewidth=3)
axes[1,1].axvline(optimal_alpha, color='black', linestyle='--', alpha=0.7)
axes[1,1].set_xlabel(r'$\alpha$')
axes[1,1].set_ylabel('MSE')
axes[1,1].set_title('MSE vs Shrinkage Parameter')
axes[1,1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('../slides/figures/bias_variance_tradeoff.png', dpi=150, bbox_inches='tight')
plt.show()

print(f"Optimal shrinkage parameter: {optimal_alpha:.3f}")
print(f"Minimum MSE: {shrinkage_df['mse'].min():.6f}")

## 3. Sample Variance Estimators Comparison

Compare the unbiased and MLE variance estimators.
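
For i.i.d. data with finite variance $\sigma^2$, the two estimators differ only by a constant factor, which pins down the bias of the MLE. A short reference derivation follows (standard results; the symbol $\hat\sigma^2_{\mathrm{MLE}}$ is notation introduced here):

$$
\hat\sigma^2_{\mathrm{MLE}} = \frac{n-1}{n}\, s^2, \qquad
\mathbb{E}[s^2] = \sigma^2 \;\Rightarrow\;
\mathbb{E}[\hat\sigma^2_{\mathrm{MLE}}] = \frac{n-1}{n}\,\sigma^2, \qquad
\mathrm{Bias}(\hat\sigma^2_{\mathrm{MLE}}) = -\frac{\sigma^2}{n}.
$$

With $\sigma^2 = 4$ as in the simulation below, the expected bias of the MLE is $-0.8$ at $n = 5$ and $-0.04$ at $n = 100$.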

In [None]:
def unbiased_variance(x):
    """Unbiased sample variance (denominator n-1)"""
    return np.var(x, ddof=1)

def mle_variance(x):
    """MLE sample variance (denominator n)"""
    return np.var(x, ddof=0)

# Compare across different sample sizes
sample_sizes = [5, 10, 20, 50, 100]
R = 5000
true_sigma2 = 4.0  # True variance

comparison_results = []
for n in sample_sizes:
    # Unbiased estimator
    props_unbiased = compute_estimator_properties(
        unbiased_variance, true_sigma2, generate_normal, n, R
    )

    # MLE estimator
    props_mle = compute_estimator_properties(
        mle_variance, true_sigma2, generate_normal, n, R
    )

    comparison_results.append({
        'n': n,
        'unbiased_bias': props_unbiased['bias'],
        'unbiased_var': props_unbiased['variance'],
        'unbiased_mse': props_unbiased['mse'],
        'mle_bias': props_mle['bias'],
        'mle_var': props_mle['variance'],
        'mle_mse': props_mle['mse']
    })

var_comparison_df = pd.DataFrame(comparison_results)
print("Computed variance estimator comparison across sample sizes")

In [None]:
# Plot comparison
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Bias comparison
axes[0,0].plot(var_comparison_df['n'], var_comparison_df['unbiased_bias'], 'b-', linewidth=2, label='Unbiased')
axes[0,0].plot(var_comparison_df['n'], var_comparison_df['mle_bias'], 'r-', linewidth=2, label='MLE')
axes[0,0].axhline(0, color='black', linestyle='--', alpha=0.5)
axes[0,0].set_xlabel('Sample Size')
axes[0,0].set_ylabel('Bias')
axes[0,0].set_title('Bias Comparison')
axes[0,0].legend()
axes[0,0].grid(True, alpha=0.3)

# Variance comparison
axes[0,1].plot(var_comparison_df['n'], var_comparison_df['unbiased_var'], 'b-', linewidth=2, label='Unbiased')
axes[0,1].plot(var_comparison_df['n'], var_comparison_df['mle_var'], 'r-', linewidth=2, label='MLE')
axes[0,1].set_xlabel('Sample Size')
axes[0,1].set_ylabel('Variance')
axes[0,1].set_title('Variance Comparison')
axes[0,1].legend()
axes[0,1].grid(True, alpha=0.3)

# MSE comparison
axes[1,0].plot(var_comparison_df['n'], var_comparison_df['unbiased_mse'], 'b-', linewidth=2, label='Unbiased')
axes[1,0].plot(var_comparison_df['n'], var_comparison_df['mle_mse'], 'r-', linewidth=2, label='MLE')
axes[1,0].set_xlabel('Sample Size')
axes[1,0].set_ylabel('MSE')
axes[1,0].set_title('MSE Comparison')
axes[1,0].legend()
axes[1,0].grid(True, alpha=0.3)

# MSE decomposition for n=20
n_idx = 2  # n=20
unbiased_mse = var_comparison_df.iloc[n_idx]['unbiased_mse']
unbiased_bias2 = var_comparison_df.iloc[n_idx]['unbiased_bias']**2
unbiased_var = var_comparison_df.iloc[n_idx]['unbiased_var']

mle_mse = var_comparison_df.iloc[n_idx]['mle_mse']
mle_bias2 = var_comparison_df.iloc[n_idx]['mle_bias']**2
mle_var = var_comparison_df.iloc[n_idx]['mle_var']

x_pos = var_comparison_df.iloc[n_idx]['n']
axes[1,1].bar([x_pos-2, x_pos+2], [unbiased_bias2, mle_bias2], width=1.5, label='Bias^2')
axes[1,1].bar([x_pos-2, x_pos+2], [unbiased_var, mle_var], width=1.5, bottom=[unbiased_bias2, mle_bias2], label='Variance')
axes[1,1].axhline(unbiased_mse, color='blue', linestyle='--', alpha=0.7)
axes[1,1].axhline(mle_mse, color='red', linestyle='--', alpha=0.7)
axes[1,1].set_xlabel('Estimator (n=20)')
axes[1,1].set_ylabel('MSE Components')
axes[1,1].set_title('MSE Decomposition (n=20)')
axes[1,1].set_xticks([x_pos-2, x_pos+2])
axes[1,1].set_xticklabels(['Unbiased', 'MLE'])
axes[1,1].legend()
axes[1,1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('../slides/figures/variance_estimators_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

print("Variance estimator comparison complete")
print(f"For n=20: Unbiased MSE = {unbiased_mse:.6f}, MLE MSE = {mle_mse:.6f}")

## 4. Simulation Exercises

These simulations correspond to Exercises 3 and 4 from the workbook and extend the notebook with reproducible Monte Carlo analyses.


### Exercise 3: Shrinkage Estimator Bias–Variance Tradeoff

We investigate how the shrinkage parameter $\alpha$ balances bias and variance when blending the sample mean with a prior guess $\mu_0 = 170$ for the heights data. Each point below summarises 1,000 bootstrap samples of size $n = 30$.


In [None]:
# Monte Carlo simulation for Exercise 3
sim_rng = np.random.default_rng(2025)
alphas = np.linspace(0.0, 1.0, 21)
n_samples = 30
n_sims = 1000

records = []
for alpha in alphas:
    samples = sim_rng.choice(heights, size=(n_sims, n_samples), replace=True)
    sample_means = samples.mean(axis=1)
    estimates = alpha * sample_means + (1.0 - alpha) * mu_0

    bias = estimates.mean() - true_mu
    variance = estimates.var()
    mse = np.mean((estimates - true_mu) ** 2)

    records.append({
        'alpha': alpha,
        'bias': bias,
        'bias_squared': bias ** 2,
        'variance': variance,
        'mse': mse
    })

shrinkage_sim_df = pd.DataFrame(records)
shrinkage_sim_df.head()


In [None]:
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(shrinkage_sim_df['alpha'], shrinkage_sim_df['bias_squared'], marker='o', label='Bias^2')
ax.plot(shrinkage_sim_df['alpha'], shrinkage_sim_df['variance'], marker='s', label='Variance')
ax.plot(shrinkage_sim_df['alpha'], shrinkage_sim_df['mse'], marker='^', linewidth=2, label='MSE')
ax.axvline(1.0, color='red', linestyle='--', alpha=0.5, label='No shrinkage (alpha = 1)')
optimal_row = shrinkage_sim_df.loc[shrinkage_sim_df['mse'].idxmin()]
ax.axvline(optimal_row['alpha'], color='black', linestyle=':', alpha=0.7, label=f"Min MSE alpha = {optimal_row['alpha']:.2f}")
ax.set_xlabel('Shrinkage parameter alpha')
ax.set_ylabel('Value')
ax.set_title('Bias^2, Variance, and MSE for Shrinkage Estimator')
ax.grid(True, alpha=0.3)
ax.legend()
plt.show()

print(f"Minimum MSE {optimal_row['mse']:.6f} achieved at alpha = {optimal_row['alpha']:.2f}")


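**Observations:** For small $\alpha$ the estimator leans heavily on the prior $\mu_0$ and exhibits low variance but large squared bias. Increasing $\alpha$ shrinks the bias in proportion to $(1 - \alpha)$ while the variance grows in proportion to $\alpha^2$, so the minimum MSE typically occurs at an intermediate value that balances the two sources of error. The closer $\mu_0$ is to the true mean, the smaller the MSE-minimising $\alpha$.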
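
The location of the minimum can also be worked out in closed form. The sketch below assumes the sample mean is unbiased for the target $\mu$ with variance $\sigma^2 / n$, so that $\mathrm{MSE}(\alpha) = (1-\alpha)^2 (\mu_0 - \mu)^2 + \alpha^2 \sigma^2 / n$, which is minimised at $\alpha^* = b^2 / (b^2 + \sigma^2 / n)$ with $b = \mu_0 - \mu$. The function names `shrinkage_mse` and `optimal_alpha_closed_form` are hypothetical helpers, and the numbers are illustrative rather than taken from the heights data.

```python
import numpy as np

def shrinkage_mse(alpha, mu, mu_0, sigma2, n):
    """Theoretical MSE of alpha * mean(x) + (1 - alpha) * mu_0.

    Assumes the sample mean is unbiased for mu with variance sigma2 / n.
    """
    bias = (1.0 - alpha) * (mu_0 - mu)
    variance = alpha ** 2 * sigma2 / n
    return bias ** 2 + variance

def optimal_alpha_closed_form(mu, mu_0, sigma2, n):
    """Minimiser of shrinkage_mse over alpha (hypothetical helper)."""
    b2 = (mu_0 - mu) ** 2
    return b2 / (b2 + sigma2 / n)

# Illustrative values only (not taken from the heights data)
mu, mu_0, sigma2, n = 172.0, 170.0, 64.0, 30
alpha_star = optimal_alpha_closed_form(mu, mu_0, sigma2, n)

# Cross-check the closed form against a fine grid search
grid = np.linspace(0.0, 1.0, 1001)
alpha_grid = grid[np.argmin(shrinkage_mse(grid, mu, mu_0, sigma2, n))]
print(f"closed form: {alpha_star:.3f}, grid search: {alpha_grid:.3f}")
```

For the bootstrap design used in this exercise, the corresponding inputs would be `mu = true_mu` and `sigma2 = np.var(heights)`, because each resampled observation is a draw from the empirical distribution of `heights`.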


### Exercise 4: MSE of Variance Estimators

We compare the unbiased sample variance $s^2$ and the MLE variance estimator across sample sizes $n = 5, 10, 20, 50$, using 10,000 standard normal samples (true variance $\sigma^2 = 1$) per sample size.


In [None]:
# Monte Carlo simulation for Exercise 4
sim_rng = np.random.default_rng(2026)
sample_sizes = [5, 10, 20, 50]
n_sims = 10000
sigma2_true = 1.0

var_records = []
for n in sample_sizes:
    samples = sim_rng.normal(loc=0.0, scale=np.sqrt(sigma2_true), size=(n_sims, n))
    s2_estimates = samples.var(axis=1, ddof=1)
    mle_estimates = samples.var(axis=1, ddof=0)

    var_records.append({
        'n': n,
        'unbiased_bias': s2_estimates.mean() - sigma2_true,
        'unbiased_variance': s2_estimates.var(),
        'unbiased_mse': np.mean((s2_estimates - sigma2_true) ** 2),
        'mle_bias': mle_estimates.mean() - sigma2_true,
        'mle_variance': mle_estimates.var(),
        'mle_mse': np.mean((mle_estimates - sigma2_true) ** 2)
    })

variance_sim_df = pd.DataFrame(var_records)
variance_sim_df


In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
axes[0].plot(variance_sim_df['n'], variance_sim_df['unbiased_mse'], marker='o', linewidth=2, label='s^2 (unbiased)')
axes[0].plot(variance_sim_df['n'], variance_sim_df['mle_mse'], marker='s', linewidth=2, label='MLE (biased)')
axes[0].set_xlabel('Sample size n')
axes[0].set_ylabel('MSE')
axes[0].set_title('MSE Comparison')
axes[0].grid(True, alpha=0.3)
axes[0].legend()

n_focus = 10
focus_row = variance_sim_df.loc[variance_sim_df['n'] == n_focus].iloc[0]
axes[1].bar(['s^2', 'MLE'], [focus_row['unbiased_bias'] ** 2, focus_row['mle_bias'] ** 2], label='Bias^2', alpha=0.7)
axes[1].bar(['s^2', 'MLE'], [focus_row['unbiased_variance'], focus_row['mle_variance']],
           bottom=[focus_row['unbiased_bias'] ** 2, focus_row['mle_bias'] ** 2], label='Variance', alpha=0.7)
axes[1].set_ylabel('Contribution')
axes[1].set_title(f'Bias-Variance Decomposition (n={n_focus})')
axes[1].grid(True, axis='y', alpha=0.3)
axes[1].legend()

plt.tight_layout()
plt.show()


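**Takeaways:** The MLE variance estimator is biased downward (its expectation is $\frac{n-1}{n}\sigma^2$), but its variance is smaller than that of $s^2$ by a factor of $\left(\frac{n-1}{n}\right)^2$. For normal data this gives the MLE a lower MSE than $s^2$ at every sample size, with the gap most visible for small $n$. The unbiased estimator eliminates the bias at the cost of higher variability; as $n$ grows, the two estimators and their MSEs converge.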
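
The simulated MSE values can be compared against theory. The sketch below assumes i.i.d. normal data, for which $\mathrm{Var}(s^2) = 2\sigma^4 / (n - 1)$; the MLE is $\frac{n-1}{n} s^2$, so its bias and variance follow by scaling. The function name `normal_variance_mse` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def normal_variance_mse(n, sigma2=1.0):
    """Theoretical MSE of both variance estimators for i.i.d. normal data."""
    n = np.asarray(n, dtype=float)
    # Unbiased estimator: zero bias, so MSE equals its variance
    mse_unbiased = 2.0 * sigma2 ** 2 / (n - 1.0)
    # MLE = (n - 1) / n * s^2: bias and variance follow by scaling
    mle_bias = -sigma2 / n
    mle_var = 2.0 * (n - 1.0) * sigma2 ** 2 / n ** 2
    mse_mle = mle_bias ** 2 + mle_var  # equals (2n - 1) * sigma2**2 / n**2
    return mse_unbiased, mse_mle

sizes = np.array([5, 10, 20, 50])
mse_s2, mse_mle = normal_variance_mse(sizes, sigma2=1.0)
for n, a, b in zip(sizes, mse_s2, mse_mle):
    print(f"n={int(n):>3d}  MSE(s^2)={a:.4f}  MSE(MLE)={b:.4f}  ratio={b / a:.3f}")
```

The ratio stays below 1 for every $n$ and approaches 1 as $n$ grows, which matches the pattern in the left-hand panel above. The result relies on normality; for other distributions the variance of $s^2$ also depends on the fourth moment.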


## 5. Summary and Key Takeaways

This notebook demonstrated:
1. The fundamental bias–variance decomposition
2. How shrinkage estimators trade bias for reduced variance
3. The practical differences between unbiased and MLE variance estimators
4. How to evaluate estimators using MSE in practice

Key insights:
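- An unbiased estimator does not necessarily minimise MSE: a biased estimator with lower variance can have smaller total error
- Which estimator is best depends on the sample size, the loss criterion, and (for shrinkage) how close the prior guess is to the truth
- Simulation is a practical way to examine finite-sample properties of estimators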