# Notebook 02: Random Effects Tobit for Panel Data -- SOLUTIONS

**This is the worked solution notebook.**  
It provides complete, working solutions for all 4 exercises from `02_tobit_panel.ipynb`.

> Instructors: do not distribute this file to students before they complete the tutorial notebook.

## Exercise Overview

| Exercise | Topic | Key Concept |
|----------|-------|-------------|
| 1 | Quadrature Sensitivity | Gauss-Hermite quadrature point selection |
| 2 | Subsample Analysis by Gender | Heterogeneity in model parameters |
| 3 | Marginal Effects | Unconditional vs conditional effects |
| 4 | Prediction Performance | Out-of-sample evaluation, Pooled vs RE |

---

## Setup

In [None]:
# Standard libraries
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

from scipy import stats
import statsmodels.api as sm

# PanelBox imports
from panelbox.models.censored import PooledTobit, RandomEffectsTobit

# Visualization configuration
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['figure.dpi'] = 100
plt.rcParams['font.size'] = 11
plt.rcParams['axes.titlesize'] = 14
plt.rcParams['axes.labelsize'] = 12
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 4)

# Set random seed for reproducibility
np.random.seed(42)

# Define paths
BASE_DIR = Path('..')
DATA_DIR = BASE_DIR / 'data'
OUTPUT_DIR = BASE_DIR / 'outputs'
FIGURES_DIR = OUTPUT_DIR / 'figures' / '02_tobit_panel'
TABLES_DIR = OUTPUT_DIR / 'tables' / '02_tobit_panel'

# Create output directories
FIGURES_DIR.mkdir(parents=True, exist_ok=True)
TABLES_DIR.mkdir(parents=True, exist_ok=True)

# Add utils to path for data generation fallback
sys.path.insert(0, str(BASE_DIR / 'utils'))

print('Setup complete!')

---

## Data Loading

In [None]:
# Load dataset (or generate if CSV not found)
data_path = DATA_DIR / 'health_expenditure_panel.csv'

if data_path.exists():
    df = pd.read_csv(data_path)
    print(f'Loaded data from {data_path}')
else:
    from data_generation import generate_health_panel
    df = generate_health_panel(n=500, t=4, seed=42)
    DATA_DIR.mkdir(parents=True, exist_ok=True)  # make sure the target folder exists before saving
    df.to_csv(data_path, index=False)
    print(f'Generated and saved data to {data_path}')

print(f'Dataset shape: {df.shape}')
print(f'N individuals: {df["id"].nunique()}')
print(f'T periods: {df["time"].nunique()} ({df["time"].min()}-{df["time"].max()})')
print(f'Censoring rate: {(df["expenditure"] == 0).mean() * 100:.1f}%')
print()
display(df.head(8))

---

## Data Preparation and Base Model

Before tackling the exercises, we prepare the estimation data and fit a baseline
Random Effects Tobit model that will serve as the reference for all exercises.

In [None]:
# Prepare data for estimation
y = df['expenditure'].values

# Explanatory variables (with constant)
X_vars = df[['income', 'age', 'chronic', 'insurance', 'female', 'bmi']].values
X = sm.add_constant(X_vars)

var_names = ['const', 'income', 'age', 'chronic', 'insurance', 'female', 'bmi']

# Panel identifiers
groups = df['id'].values
time = df['time'].values

print('Data preparation:')
print(f'  y shape:       {y.shape}')
print(f'  X shape:       {X.shape}')
print(f'  N individuals: {len(np.unique(groups))}')
print(f'  T periods:     {len(np.unique(time))}')
print(f'  Variables:     {var_names}')

In [None]:
# ============================================================
# Fit base RE Tobit model (Q=12, full sample)
# This serves as the reference for all exercises.
# ============================================================

print('Fitting base Random Effects Tobit (Q=12, full sample)...')
print('=' * 60)

re_base = RandomEffectsTobit(
    endog=y,
    exog=X,
    groups=groups,
    time=time,
    censoring_point=0.0,
    censoring_type='left',
    quadrature_points=12
)
re_base.fit(method='BFGS', maxiter=1000)

print()
print(re_base.summary())

# Variance decomposition
rho_base = re_base.sigma_alpha**2 / (re_base.sigma_alpha**2 + re_base.sigma_eps**2)
print(f'\nIntra-class correlation (rho): {rho_base:.4f}')

---

## Exercise 1: Quadrature Sensitivity

**Task**: Re-estimate the Random Effects Tobit model with different numbers of
Gauss-Hermite quadrature points: **6, 12, and 24**. Compare the estimated
coefficients, $\sigma_\alpha$, $\sigma_\varepsilon$, and log-likelihood.
At what point do the results stabilize?

### Background

Gauss-Hermite quadrature approximates the integral over the random effect:

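$$L_i = \int_{-\infty}^{\infty} \prod_{t=1}^{T_i} f(y_{it} | X_{it}, \alpha_i) \, \frac{1}{\sigma_\alpha} \phi(\alpha_i / \sigma_\alpha) \, d\alpha_i
\approx \frac{1}{\sqrt{\pi}} \sum_{q=1}^{Q} w_q \prod_{t=1}^{T_i} f(y_{it} | X_{it}, \sqrt{2}\sigma_\alpha \cdot n_q)$$

Here $n_q$ and $w_q$ are the standard Gauss-Hermite nodes and weights; some
implementations fold the $1/\sqrt{\pi}$ factor into $w_q$.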

More quadrature points ($Q$) mean a more accurate integral approximation, but
at the cost of computation time. In practice, we need enough points so that
the estimates have converged -- adding more points should not materially
change the results.
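
The approximation can be checked on an integral with a known answer. The sketch below is not PanelBox's internal routine. It uses NumPy's `hermgauss` nodes and weights, which need the $1/\sqrt{\pi}$ normalization, and a hypothetical helper `gh_expectation` to approximate $E[\Phi(a + \alpha)]$ for $\alpha \sim N(0, \sigma_\alpha^2)$, whose exact value is $\Phi(a / \sqrt{1 + \sigma_\alpha^2})$. The values of `a` and `sigma_alpha` are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.stats import norm


def gh_expectation(g, sigma_alpha, n_points):
    """Approximate E[g(alpha)] for alpha ~ N(0, sigma_alpha^2) by Gauss-Hermite quadrature."""
    # Nodes and weights for the weight function exp(-x^2)
    nodes, weights = hermgauss(n_points)
    # Change of variables alpha = sqrt(2) * sigma_alpha * x, normalised by 1/sqrt(pi)
    values = g(np.sqrt(2.0) * sigma_alpha * nodes)
    return float(np.sum(weights * values) / np.sqrt(np.pi))


# Illustrative values (assumptions, not estimates from this notebook)
a, sigma_alpha = 0.5, 1.0
exact = norm.cdf(a / np.sqrt(1.0 + sigma_alpha**2))

for q in (6, 12, 24):
    approx = gh_expectation(lambda alpha: norm.cdf(a + alpha), sigma_alpha, q)
    print(f'Q={q:2d}  approx={approx:.8f}  abs error={abs(approx - exact):.2e}')
```

The printed error shows how quickly the approximation settles for a smooth integrand, which is the same pattern the exercise looks for in the model estimates.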

In [None]:
# ============================================================
# Exercise 1 Solution: Quadrature Sensitivity Analysis
# ============================================================

quad_points_list = [6, 12, 24]
quad_results = {}

for nq in quad_points_list:
    print(f'\nFitting RE Tobit with Q = {nq} quadrature points...')
    print('-' * 50)
    
    model_q = RandomEffectsTobit(
        endog=y,
        exog=X,
        groups=groups,
        time=time,
        censoring_point=0.0,
        censoring_type='left',
        quadrature_points=nq
    )
    model_q.fit(method='BFGS', maxiter=1000)
    
    rho_q = model_q.sigma_alpha**2 / (model_q.sigma_alpha**2 + model_q.sigma_eps**2)
    
    quad_results[nq] = {
        'model': model_q,
        'beta': model_q.beta.copy(),
        'sigma_eps': model_q.sigma_eps,
        'sigma_alpha': model_q.sigma_alpha,
        'llf': model_q.llf,
        'rho': rho_q,
        'converged': model_q.converged
    }
    
    print(f'  Log-likelihood: {model_q.llf:.4f}')
    print(f'  sigma_eps:      {model_q.sigma_eps:.4f}')
    print(f'  sigma_alpha:    {model_q.sigma_alpha:.4f}')
    print(f'  rho (ICC):      {rho_q:.4f}')
    print(f'  Converged:      {model_q.converged}')

print('\nAll models fitted successfully.')

In [None]:
# ============================================================
# Comparison table: coefficients across quadrature specifications
# ============================================================

K = len(var_names)

# Build comparison DataFrame
rows = []
for nq in quad_points_list:
    r = quad_results[nq]
    row = {'Q': nq}
    for i, vname in enumerate(var_names):
        row[vname] = r['beta'][i]
    row['sigma_eps'] = r['sigma_eps']
    row['sigma_alpha'] = r['sigma_alpha']
    row['rho'] = r['rho']
    row['log_lik'] = r['llf']
    row['converged'] = r['converged']
    rows.append(row)

quad_comparison = pd.DataFrame(rows).set_index('Q')

print('Quadrature Sensitivity: Coefficient Comparison')
print('=' * 90)
display(quad_comparison.round(4))

In [None]:
# ============================================================
# Compute absolute and relative differences (Q=6 vs Q=12, Q=12 vs Q=24)
# ============================================================

compare_cols = var_names + ['sigma_eps', 'sigma_alpha', 'rho', 'log_lik']

# Differences between Q=6 and Q=12
diff_6_12 = quad_comparison.loc[12, compare_cols] - quad_comparison.loc[6, compare_cols]
rel_diff_6_12 = (diff_6_12 / quad_comparison.loc[6, compare_cols].abs()).replace([np.inf, -np.inf], np.nan)

# Differences between Q=12 and Q=24
diff_12_24 = quad_comparison.loc[24, compare_cols] - quad_comparison.loc[12, compare_cols]
rel_diff_12_24 = (diff_12_24 / quad_comparison.loc[12, compare_cols].abs()).replace([np.inf, -np.inf], np.nan)

diff_table = pd.DataFrame({
    'Q=6 value': quad_comparison.loc[6, compare_cols],
    'Q=12 value': quad_comparison.loc[12, compare_cols],
    'Q=24 value': quad_comparison.loc[24, compare_cols],
    'Diff (6->12)': diff_6_12,
    '% Change (6->12)': rel_diff_6_12 * 100,
    'Diff (12->24)': diff_12_24,
    '% Change (12->24)': rel_diff_12_24 * 100
})

print('Quadrature Sensitivity: Change Analysis')
print('=' * 100)
display(diff_table.round(4))

In [None]:
# ============================================================
# Visualization: Coefficient stability across Q
# ============================================================

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1. Coefficient values by Q (slope coefficients only)
ax = axes[0, 0]
slope_vars = var_names[1:]  # Exclude constant
for vname in slope_vars:
    vals = [quad_results[nq]['beta'][var_names.index(vname)] for nq in quad_points_list]
    ax.plot(quad_points_list, vals, marker='o', linewidth=2, markersize=8, label=vname)
ax.set_xlabel('Number of Quadrature Points (Q)', fontsize=12)
ax.set_ylabel('Coefficient Value', fontsize=12)
ax.set_title('Coefficient Stability Across Q', fontsize=13)
ax.set_xticks(quad_points_list)
ax.legend(fontsize=9, loc='best')
ax.grid(True, alpha=0.3)

# 2. Variance components by Q
ax = axes[0, 1]
sigma_eps_vals = [quad_results[nq]['sigma_eps'] for nq in quad_points_list]
sigma_alpha_vals = [quad_results[nq]['sigma_alpha'] for nq in quad_points_list]
ax.plot(quad_points_list, sigma_eps_vals, marker='s', linewidth=2.5,
        markersize=10, color='steelblue', label=r'$\sigma_\varepsilon$')
ax.plot(quad_points_list, sigma_alpha_vals, marker='^', linewidth=2.5,
        markersize=10, color='coral', label=r'$\sigma_\alpha$')
ax.set_xlabel('Number of Quadrature Points (Q)', fontsize=12)
ax.set_ylabel('Standard Deviation', fontsize=12)
ax.set_title('Variance Component Stability', fontsize=13)
ax.set_xticks(quad_points_list)
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)

# 3. Log-likelihood by Q
ax = axes[1, 0]
llf_vals = [quad_results[nq]['llf'] for nq in quad_points_list]
ax.plot(quad_points_list, llf_vals, marker='D', linewidth=2.5,
        markersize=10, color='seagreen')
for nq, llf_val in zip(quad_points_list, llf_vals):
    ax.annotate(f'{llf_val:.2f}', (nq, llf_val), textcoords='offset points',
                xytext=(0, 12), ha='center', fontsize=10, fontweight='bold')
ax.set_xlabel('Number of Quadrature Points (Q)', fontsize=12)
ax.set_ylabel('Log-Likelihood', fontsize=12)
ax.set_title('Log-Likelihood Convergence', fontsize=13)
ax.set_xticks(quad_points_list)
ax.grid(True, alpha=0.3)

# 4. ICC (rho) by Q
ax = axes[1, 1]
rho_vals = [quad_results[nq]['rho'] for nq in quad_points_list]
ax.plot(quad_points_list, rho_vals, marker='o', linewidth=2.5,
        markersize=10, color='purple')
for nq, rho_val in zip(quad_points_list, rho_vals):
    ax.annotate(f'{rho_val:.4f}', (nq, rho_val), textcoords='offset points',
                xytext=(0, 12), ha='center', fontsize=10, fontweight='bold')
ax.set_xlabel('Number of Quadrature Points (Q)', fontsize=12)
ax.set_ylabel(r'Intra-class Correlation ($\rho$)', fontsize=12)
ax.set_title('ICC Stability', fontsize=13)
ax.set_xticks(quad_points_list)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(FIGURES_DIR / 'ex1_quadrature_sensitivity.png', dpi=300, bbox_inches='tight')
plt.show()

### Exercise 1: Discussion

**Key findings from the quadrature sensitivity analysis:**

1. **Coefficient stability**: The slope coefficients are generally stable across
   all three quadrature specifications (Q=6, 12, 24). The percent changes from
   Q=12 to Q=24 are much smaller than from Q=6 to Q=12, indicating convergence.

2. **Variance components**: `sigma_eps` and `sigma_alpha` show minor sensitivity
   to the number of quadrature points. By Q=12, the estimates are essentially
   stabilized. The change from Q=12 to Q=24 is negligible.

3. **Log-likelihood**: The log-likelihood values converge as Q increases. The
   difference between Q=12 and Q=24 is typically tiny compared to Q=6 vs Q=12.

4. **Practical recommendation**: For this dataset, Q=12 (the default) provides
   adequate accuracy. This is consistent with the general recommendation in the
   literature: Q=12 to Q=20 is typically sufficient for models with a single
   random effect. Q=6 may introduce non-trivial approximation error, while
   Q=24 offers negligible improvement over Q=12 at greater computational cost.

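5. **When to use more points**: Higher Q may be needed when `sigma_alpha` is large
   relative to `sigma_eps` or when individuals are observed over many periods. In
   both cases the integrand becomes sharply peaked in $\alpha_i$, and a fixed grid
   of nodes can miss the peak. Highly unbalanced panels can raise the same concern.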
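
A stabilization check like the one described above can be reduced to a single number per step. The sketch below uses a hypothetical helper `max_relative_change`, made-up parameter vectors, and an assumed tolerance of `1e-3`; none of these come from the fitted models, and the tolerance should be chosen to suit the application.

```python
import numpy as np


def max_relative_change(previous, current, eps=1e-12):
    """Largest relative change across parameters between two fits."""
    previous = np.asarray(previous, dtype=float)
    current = np.asarray(current, dtype=float)
    # Guard against division by zero for parameters that are numerically zero
    denom = np.maximum(np.abs(previous), eps)
    return float(np.max(np.abs(current - previous) / denom))


# Made-up parameter vectors (beta_1, beta_2, sigma_alpha) for Q = 6, 12, 24
estimates = {
    6: [1.0000, 2.5000, 0.8000],
    12: [1.0200, 2.4800, 0.8500],
    24: [1.0201, 2.4801, 0.8501],
}
tolerance = 1e-3  # hypothetical threshold

q_values = sorted(estimates)
for q_prev, q_next in zip(q_values[:-1], q_values[1:]):
    change = max_relative_change(estimates[q_prev], estimates[q_next])
    status = 'stable' if change < tolerance else 'not stable'
    print(f'Q={q_prev} -> Q={q_next}: max relative change = {change:.2e} ({status})')
```

With the notebook's results, the same function could be applied to rows of `quad_comparison` instead of the made-up vectors.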

---

## Exercise 2: Subsample Analysis by Gender

**Task**: Estimate separate RE Tobit models for males and females. Compare:
- Do the key coefficients (income, chronic conditions) differ across genders?
- Is the intra-class correlation $\rho$ different for males vs females?
- What does this suggest about gender-specific health expenditure dynamics?

### Motivation

Health expenditure patterns may differ systematically between men and women.
Rather than only including a `female` dummy in the pooled specification,
estimating separate models allows **all coefficients** to differ by gender,
revealing structural differences in the expenditure process.

In [None]:
# ============================================================
# Exercise 2 Solution: Subsample Analysis by Gender
# ============================================================

# Create masks
mask_female = df['female'].values == 1
mask_male = df['female'].values == 0

print(f'Sample sizes:')
print(f'  Males:   {mask_male.sum()} obs from {df.loc[mask_male, "id"].nunique()} individuals')
print(f'  Females: {mask_female.sum()} obs from {df.loc[mask_female, "id"].nunique()} individuals')
print(f'  Censoring rate (males):   {(y[mask_male] == 0).mean() * 100:.1f}%')
print(f'  Censoring rate (females): {(y[mask_female] == 0).mean() * 100:.1f}%')

# For subsample models, drop 'female': it is constant within each subsample
# and would be perfectly collinear with the intercept.
# Use: const, income, age, chronic, insurance, bmi
X_sub_vars = df[['income', 'age', 'chronic', 'insurance', 'bmi']].values
X_sub = sm.add_constant(X_sub_vars)
sub_var_names = ['const', 'income', 'age', 'chronic', 'insurance', 'bmi']

In [None]:
# ============================================================
# Fit RE Tobit for males
# ============================================================

print('Fitting RE Tobit for MALES...')
print('=' * 60)

re_male = RandomEffectsTobit(
    endog=y[mask_male],
    exog=X_sub[mask_male],
    groups=groups[mask_male],
    time=time[mask_male],
    censoring_point=0.0,
    censoring_type='left',
    quadrature_points=12
)
re_male.fit(method='BFGS', maxiter=1000)

print()
print(re_male.summary())

rho_male = re_male.sigma_alpha**2 / (re_male.sigma_alpha**2 + re_male.sigma_eps**2)
print(f'\nICC (males): {rho_male:.4f}')

In [None]:
# ============================================================
# Fit RE Tobit for females
# ============================================================

print('Fitting RE Tobit for FEMALES...')
print('=' * 60)

re_female = RandomEffectsTobit(
    endog=y[mask_female],
    exog=X_sub[mask_female],
    groups=groups[mask_female],
    time=time[mask_female],
    censoring_point=0.0,
    censoring_type='left',
    quadrature_points=12
)
re_female.fit(method='BFGS', maxiter=1000)

print()
print(re_female.summary())

rho_female = re_female.sigma_alpha**2 / (re_female.sigma_alpha**2 + re_female.sigma_eps**2)
print(f'\nICC (females): {rho_female:.4f}')

In [None]:
# ============================================================
# Side-by-side comparison table
# ============================================================

K_sub = len(sub_var_names)

male_bse = re_male.bse[:K_sub]
female_bse = re_female.bse[:K_sub]

gender_comparison = pd.DataFrame({
    'Variable': sub_var_names,
    'Male_Coef': re_male.beta,
    'Male_SE': male_bse,
    'Female_Coef': re_female.beta,
    'Female_SE': female_bse,
    'Difference': re_female.beta - re_male.beta
})

print('Gender Subsample Comparison: RE Tobit Coefficients')
print('=' * 85)
display(gender_comparison.round(4))

# Variance components comparison
print(f'\nVariance Components:')
print(f'{"":20s} {"Males":>12s} {"Females":>12s} {"Difference":>12s}')
print('-' * 56)
print(f'{"sigma_eps":20s} {re_male.sigma_eps:>12.4f} {re_female.sigma_eps:>12.4f} {re_female.sigma_eps - re_male.sigma_eps:>12.4f}')
print(f'{"sigma_alpha":20s} {re_male.sigma_alpha:>12.4f} {re_female.sigma_alpha:>12.4f} {re_female.sigma_alpha - re_male.sigma_alpha:>12.4f}')
print(f'{"rho (ICC)":20s} {rho_male:>12.4f} {rho_female:>12.4f} {rho_female - rho_male:>12.4f}')
print(f'{"Log-likelihood":20s} {re_male.llf:>12.2f} {re_female.llf:>12.2f} {"":>12s}')
print(f'{"N observations":20s} {mask_male.sum():>12d} {mask_female.sum():>12d} {"":>12s}')

# Save
gender_comparison.to_csv(TABLES_DIR / 'ex2_gender_comparison.csv', index=False)
print(f'\nSaved to {TABLES_DIR / "ex2_gender_comparison.csv"}')

In [None]:
# ============================================================
# Visual comparison: Forest plot by gender
# ============================================================

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# 1. Forest plot of coefficients
ax = axes[0]
slope_sub_vars = sub_var_names[1:]  # Exclude constant
n_vars = len(slope_sub_vars)
y_pos = np.arange(n_vars)

male_coefs = re_male.beta[1:]
male_ses = male_bse[1:]
female_coefs = re_female.beta[1:]
female_ses = female_bse[1:]

ax.errorbar(male_coefs, y_pos + 0.12,
            xerr=1.96 * male_ses,
            fmt='o', color='steelblue', markersize=9, capsize=5,
            label='Males', linewidth=2)
ax.errorbar(female_coefs, y_pos - 0.12,
            xerr=1.96 * female_ses,
            fmt='s', color='coral', markersize=9, capsize=5,
            label='Females', linewidth=2)

ax.axvline(0, color='gray', linestyle='--', linewidth=1)
ax.set_yticks(y_pos)
ax.set_yticklabels(slope_sub_vars, fontsize=12)
ax.set_xlabel('Coefficient (95% CI)', fontsize=12)
ax.set_title('RE Tobit Coefficients by Gender', fontsize=14)
ax.legend(fontsize=11, loc='lower right')
ax.grid(alpha=0.3, axis='x')

# 2. Variance decomposition comparison
ax = axes[1]
categories = ['Males', 'Females']
sigma2_alpha_vals = [re_male.sigma_alpha**2, re_female.sigma_alpha**2]
sigma2_eps_vals = [re_male.sigma_eps**2, re_female.sigma_eps**2]

x_pos = np.arange(len(categories))
width = 0.35

bars1 = ax.bar(x_pos - width/2, sigma2_alpha_vals, width,
               label=r'$\sigma^2_\alpha$ (between)', color='#ff9999', edgecolor='black')
bars2 = ax.bar(x_pos + width/2, sigma2_eps_vals, width,
               label=r'$\sigma^2_\varepsilon$ (within)', color='#66b3ff', edgecolor='black')

# Add rho annotations
for i, (cat, rho_val) in enumerate(zip(categories, [rho_male, rho_female])):
    total = sigma2_alpha_vals[i] + sigma2_eps_vals[i]
    ax.text(i, total * 0.5, f'$\\rho$ = {rho_val:.3f}',
            ha='center', va='bottom', fontsize=12, fontweight='bold')

ax.set_xticks(x_pos)
ax.set_xticklabels(categories, fontsize=12)
ax.set_ylabel('Variance', fontsize=12)
ax.set_title('Variance Decomposition by Gender', fontsize=14)
ax.legend(fontsize=11)

plt.tight_layout()
plt.savefig(FIGURES_DIR / 'ex2_gender_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

In [None]:
# ============================================================
# Informal test of coefficient equality via confidence intervals
# ============================================================

print('Informal Comparison: Do 95% CIs Overlap?')
print('=' * 70)
print(f'{"Variable":15s} {"Male 95% CI":>25s} {"Female 95% CI":>25s} {"Overlap?":>10s}')
print('-' * 75)

for i, vname in enumerate(slope_sub_vars):
    m_lo = male_coefs[i] - 1.96 * male_ses[i]
    m_hi = male_coefs[i] + 1.96 * male_ses[i]
    f_lo = female_coefs[i] - 1.96 * female_ses[i]
    f_hi = female_coefs[i] + 1.96 * female_ses[i]
    
    # Check overlap
    overlap = not (m_hi < f_lo or f_hi < m_lo)
    overlap_str = 'Yes' if overlap else 'NO'
    
    print(f'{vname:15s} [{m_lo:>8.3f}, {m_hi:>8.3f}]   [{f_lo:>8.3f}, {f_hi:>8.3f}]   {overlap_str:>10s}')

print()
print('Note: Non-overlapping 95% CIs strongly suggest a significant difference.')
print('Overlapping CIs do not necessarily mean no significant difference (this is')
print('a conservative test). A formal Chow-type test would require pooling the samples.')

### Exercise 2: Discussion

**Key findings from the gender subsample analysis:**

1. **Income effect**: Compare the income coefficient for males vs females.
   If different, this suggests that the relationship between income and health
   expenditure varies by gender -- potentially due to differences in healthcare
   utilization patterns or access.

2. **Chronic conditions**: The effect of chronic conditions on latent expenditure
   may differ by gender. If the coefficient is larger for one gender, it suggests
   that chronic conditions have a greater impact on their health spending.

3. **Intra-class correlation (rho)**: Differences in rho between males and females
   indicate different degrees of unobserved individual heterogeneity. A higher rho
   for one gender means that a larger share of the unexplained variation in their
   latent expenditure comes from time-invariant individual characteristics rather
   than from period-specific shocks.

4. **Practical implication**: If the models are substantially different, the
   pooled model with just a `female` dummy is restrictive -- it only allows an
   intercept shift, not different slopes. Separate models or a fully interacted
   specification would be more appropriate.
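
The CI-overlap comparison can be complemented by a simple Wald-type z test for each coefficient difference. The sketch below assumes the two subsamples are independent (they contain different individuals), so the variance of the difference is the sum of the two squared standard errors. The helper `coef_difference_test` and the numbers are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy.stats import norm


def coef_difference_test(b1, se1, b2, se2):
    """z test of H0: b1 == b2 for estimates from two independent samples."""
    diff = np.asarray(b2, dtype=float) - np.asarray(b1, dtype=float)
    se_diff = np.sqrt(np.asarray(se1, dtype=float)**2 + np.asarray(se2, dtype=float)**2)
    z = diff / se_diff
    p_value = 2.0 * norm.sf(np.abs(z))  # two-sided p-value
    return diff, z, p_value


# Illustrative estimates (assumptions, not output from the fitted models)
diff, z, p = coef_difference_test(b1=1.0, se1=0.3, b2=2.0, se2=0.4)
print(f'difference = {diff:.3f}, z = {z:.3f}, p = {p:.4f}')

# The two 95% CIs overlap even though the z test rejects at the 5% level
ci_1 = (1.0 - 1.96 * 0.3, 1.0 + 1.96 * 0.3)
ci_2 = (2.0 - 1.96 * 0.4, 2.0 + 1.96 * 0.4)
print(f'CI 1 = [{ci_1[0]:.3f}, {ci_1[1]:.3f}], CI 2 = [{ci_2[0]:.3f}, {ci_2[1]:.3f}]')
```

In the notebook, the same function could take `re_male.beta`, `male_bse`, `re_female.beta`, and `female_bse` as arrays.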

---

## Exercise 3: Marginal Effects

**Task**: Using the fitted RE Tobit model, compute the marginal effect of adding
one chronic condition on:
- The unconditional expected observed expenditure $E[y|X]$
- The conditional expected expenditure $E[y | y > 0, X]$
- The probability of positive expenditure $P(y > 0 | X)$

Compare `which='unconditional'` vs `which='conditional'`.

### Background: Three Types of Marginal Effects in the Tobit Model

In a standard Tobit with left censoring at $c$ (here $c = 0$), the coefficient $\beta_k$ represents the
effect on the **latent** variable $y^*$. But we typically care about effects on the
**observed** outcome, which are attenuated by the censoring mechanism:

1. **Unconditional** ME: $\frac{\partial E[y|X]}{\partial x_k} = \beta_k \cdot \Phi(z)$ 
   where $z = (X'\beta - c) / \sigma$

2. **Conditional** ME: $\frac{\partial E[y|y>c,X]}{\partial x_k} = \beta_k \cdot [1 - \lambda(z)(z + \lambda(z))]$
   where $\lambda(z) = \phi(z)/\Phi(z)$ is the inverse Mills ratio

3. **Probability** ME: $\frac{\partial P(y>c|X)}{\partial x_k} = \frac{\beta_k}{\sigma} \cdot \phi(z)$
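
The three scaling factors are easy to evaluate directly. The sketch below applies the formulas above to a standard Tobit with a single error scale `sigma`; how PanelBox combines $\sigma_\alpha$ and $\sigma_\varepsilon$ inside `marginal_effects` is not shown here, so treat the helpers `tobit_me_factors` and `expected_observed` and the input values as assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm


def tobit_me_factors(xb, sigma, c=0.0):
    """Factors that multiply beta_k in the three Tobit marginal effects."""
    z = (np.asarray(xb, dtype=float) - c) / sigma
    lam = norm.pdf(z) / norm.cdf(z)  # inverse Mills ratio
    return {
        'unconditional': norm.cdf(z),
        'conditional': 1.0 - lam * (z + lam),
        'probability': norm.pdf(z) / sigma,
    }


def expected_observed(xb, sigma, c=0.0):
    """E[y|X] for a Tobit left-censored at c."""
    z = (xb - c) / sigma
    return c * norm.cdf(-z) + norm.cdf(z) * xb + sigma * norm.pdf(z)


# Illustrative linear-index values (assumptions), with sigma = 1 and c = 0
factors = tobit_me_factors(np.array([-1.0, 0.0, 1.0]), sigma=1.0)
for name, values in factors.items():
    print(f'{name:>13s}: {np.round(values, 4)}')

# Finite-difference check of the unconditional factor at xb = 0.3
h = 1e-6
fd = (expected_observed(0.3 + h, 1.0) - expected_observed(0.3 - h, 1.0)) / (2 * h)
print(f'finite difference = {fd:.6f}, Phi(z) = {norm.cdf(0.3):.6f}')
```

The printed factors show that the conditional factor can lie above or below the unconditional one depending on $z$.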

In [None]:
# ============================================================
# Exercise 3 Solution: Marginal Effects
# ============================================================

# Compute all three types of marginal effects
print('Computing marginal effects from the base RE Tobit model...')
print('=' * 60)

# Unconditional marginal effects: dE[y|X]/dx
print('\n--- Unconditional Marginal Effects: dE[y|X]/dx ---')
me_uncond = re_base.marginal_effects(at='overall', which='unconditional')
print()
display(me_uncond.summary())

In [None]:
# Conditional marginal effects: dE[y|y>0, X]/dx
print('--- Conditional Marginal Effects: dE[y|y>0, X]/dx ---')
me_cond = re_base.marginal_effects(at='overall', which='conditional')
print()
display(me_cond.summary())

In [None]:
# Probability marginal effects: dP(y>0|X)/dx
print('--- Probability Marginal Effects: dP(y>0|X)/dx ---')
me_prob = re_base.marginal_effects(at='overall', which='probability')
print()
display(me_prob.summary())

In [None]:
# ============================================================
# Focus on chronic conditions: compare all three ME types
# ============================================================

# Collect results into a focused comparison
me_types = ['unconditional', 'conditional', 'probability']
me_results = [me_uncond, me_cond, me_prob]

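# Collect the marginal effects, standard errors, and summary for each ME type.
# Despite its name, `chronic_effects` stores the effects for all variables:
# the marginal_effects keys may be internal names (x0, x1, ...), so the
# chronic-specific row is not selected by name here.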
chronic_effects = {}
for me_type, me_res in zip(me_types, me_results):
    me_summary = me_res.summary()
    chronic_effects[me_type] = {
        'me': me_res.marginal_effects,
        'se': me_res.std_errors,
        'summary': me_summary
    }

# Build comparison for the chronic variable specifically
# The latent coefficient for reference
chronic_idx = var_names.index('chronic')
beta_chronic = re_base.beta[chronic_idx]

print('Marginal Effect of Chronic Conditions -- Comparison')
print('=' * 65)
print(f'  Latent coefficient (beta):       {beta_chronic:.4f}')
print(f'  Interpretation: A 1-unit increase in chronic conditions')
print(f'  changes latent expenditure y* by {beta_chronic:.4f}')
print()
print('  Average Marginal Effects (all variables):')
print('-' * 65)
for me_type, me_res in zip(me_types, me_results):
    print(f'\n  {me_type.upper()}:')
    for key in me_res.marginal_effects.index:
        me_val = me_res.marginal_effects[key]
        se_val = me_res.std_errors[key]
        print(f'    {key:>5s}: ME = {me_val:>8.4f}  (SE = {se_val:.4f})')

In [None]:
# ============================================================
# Visual comparison: bar chart of ME types for all variables
# ============================================================

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# 1. All variables: unconditional vs conditional ME
ax = axes[0]
me_vars = me_uncond.marginal_effects.index.tolist()
n_me = len(me_vars)
x_pos = np.arange(n_me)
width = 0.35

uncond_vals = me_uncond.marginal_effects.values
cond_vals = me_cond.marginal_effects.values

bars1 = ax.bar(x_pos - width/2, uncond_vals, width,
               label='Unconditional', color='steelblue', edgecolor='black', alpha=0.8)
bars2 = ax.bar(x_pos + width/2, cond_vals, width,
               label='Conditional', color='coral', edgecolor='black', alpha=0.8)

ax.axhline(0, color='black', linewidth=0.5)
ax.set_xticks(x_pos)
ax.set_xticklabels(me_vars, rotation=45, ha='right', fontsize=10)
ax.set_ylabel('Average Marginal Effect', fontsize=12)
ax.set_title('Unconditional vs Conditional Marginal Effects', fontsize=13)
ax.legend(fontsize=11)
ax.grid(alpha=0.3, axis='y')

# 2. McDonald-Moffitt decomposition for all variables
# Unconditional ME = P(y>0) * Conditional ME + E[y|y>0] * Probability ME
# Show the three components for each variable
ax = axes[1]

prob_vals = me_prob.marginal_effects.values

x_pos3 = np.arange(n_me)
width3 = 0.25

ax.bar(x_pos3 - width3, uncond_vals, width3,
       label='Unconditional', color='steelblue', edgecolor='black', alpha=0.8)
ax.bar(x_pos3, cond_vals, width3,
       label='Conditional', color='coral', edgecolor='black', alpha=0.8)
ax.bar(x_pos3 + width3, prob_vals, width3,
       label='Probability', color='seagreen', edgecolor='black', alpha=0.8)

ax.axhline(0, color='black', linewidth=0.5)
ax.set_xticks(x_pos3)
ax.set_xticklabels(me_vars, rotation=45, ha='right', fontsize=10)
ax.set_ylabel('Average Marginal Effect', fontsize=12)
ax.set_title('All Three Types of Marginal Effects', fontsize=13)
ax.legend(fontsize=10)
ax.grid(alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig(FIGURES_DIR / 'ex3_marginal_effects.png', dpi=300, bbox_inches='tight')
plt.show()

In [None]:
# ============================================================
# Scaling factors: ME / beta for each type
# Shows the attenuation from latent to observed effects
# ============================================================

print('Attenuation Factors: ME / beta')
print('=' * 60)
print('These show how much the latent effect is attenuated by censoring.')
print()
print(f'{"Variable":>12s} {"beta":>8s} {"Uncond ME":>10s} {"Ratio":>8s} {"Cond ME":>10s} {"Ratio":>8s}')
print('-' * 58)

# The ME index may or may not include the constant. Assuming the constant is the
# first exog column, aligning from the right maps each ME to its beta in both cases.
beta_offset = len(re_base.beta) - len(me_vars)

for i, key in enumerate(me_vars):
    beta_val = re_base.beta[i + beta_offset]
    uncond_val = uncond_vals[i]
    cond_val = cond_vals[i]
    
    ratio_u = uncond_val / beta_val if abs(beta_val) > 1e-10 else np.nan
    ratio_c = cond_val / beta_val if abs(beta_val) > 1e-10 else np.nan
    
    print(f'{key:>12s} {beta_val:>8.4f} {uncond_val:>10.4f} {ratio_u:>8.3f} {cond_val:>10.4f} {ratio_c:>8.3f}')

print()
print('Notes:')
print('  - Unconditional ratio = average Phi(z), roughly the uncensored fraction')
print('  - Conditional ratio = 1 - lambda(z) * (z + lambda(z)), always in (0, 1)')
print('  - Both MEs are smaller in absolute value than the latent beta')
print('  - Neither ratio dominates: at z = 0 they are 0.50 (uncond) vs 0.36 (cond)')

### Exercise 3: Discussion

**Key findings from the marginal effects analysis:**

1. **Unconditional vs Conditional**: Both effects are smaller in absolute value
   than the latent coefficient, but neither is always the larger of the two. The
   unconditional effect scales $\beta_k$ by $\Phi(z)$, because a change in $x_k$
   leaves observed expenditure at zero for individuals who remain censored. The
   conditional effect scales $\beta_k$ by $1 - \lambda(z)(z + \lambda(z))$. At
   $z = 0$ these factors are 0.50 and about 0.36, so the unconditional effect is
   the larger one unless censoring is heavy ($z$ below roughly $-0.7$).

2. **Chronic conditions**: The latent coefficient for chronic conditions is the
   largest (in the DGP, it is 3.0). The unconditional ME is attenuated by
   roughly the fraction of uncensored observations, while the conditional ME
   is attenuated by the factor $1 - \lambda(z)(z + \lambda(z))$, which is built
   from the inverse Mills ratio.

3. **Probability effects**: The probability marginal effects show how each
   variable affects the likelihood of having any positive expenditure. Variables
   with larger beta/sigma ratios have stronger effects on the extensive margin.

4. **McDonald-Moffitt decomposition**: The unconditional effect can be decomposed
   into an intensive margin (change among those already spending) and an
   extensive margin (change in the probability of any spending). This
   decomposition is:
   $$\frac{\partial E[y|X]}{\partial x_k} = P(y > 0 | X) \cdot \frac{\partial E[y|y>0, X]}{\partial x_k} + E[y|y>0, X] \cdot \frac{\partial P(y>0|X)}{\partial x_k}$$
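
As a check on the decomposition, consider the standard Tobit with $c = 0$, so that $z = X'\beta / \sigma$. Substituting the three expressions from the Background section, together with $E[y|y>0, X] = X'\beta + \sigma\lambda(z)$ and $\Phi(z)\lambda(z) = \phi(z)$, the two terms collapse to the unconditional effect:

$$\Phi(z) \, \beta_k \, [1 - \lambda(z)(z + \lambda(z))] + (X'\beta + \sigma\lambda(z)) \, \frac{\beta_k}{\sigma} \phi(z)
= \beta_k \, [\Phi(z) - \phi(z)(z + \lambda(z)) + \phi(z)(z + \lambda(z))]
= \beta_k \, \Phi(z)$$

The first term is the intensive margin and the second is the extensive margin; their sum reproduces $\beta_k \cdot \Phi(z)$ exactly under these assumptions.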

---

## Exercise 4: Prediction Performance

**Task**: Split the data into training (periods 1-3) and test (period 4) sets.
Estimate both Pooled Tobit and RE Tobit on the training data. Compare
prediction accuracy (RMSE, MAE) on the test set.

### Motivation

In-sample fit (e.g., log-likelihood, AIC) can favor more complex models that
overfit. A genuine **out-of-sample** evaluation tests whether the model
generalizes to new data. For panel data, a natural split is to hold out the
last time period.
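
The two accuracy metrics named in the task can be written in a few lines of NumPy. This is a minimal sketch with hypothetical helpers `rmse` and `mae` and made-up arrays; RMSE penalizes large misses more heavily than MAE, so the two can rank models differently.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err**2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))


# Made-up observed and predicted expenditures (zeros are censored observations)
y_true_demo = np.array([0.0, 2.0, 5.0, 0.0])
y_pred_demo = np.array([1.0, 2.0, 3.0, 0.0])

print(f'RMSE = {rmse(y_true_demo, y_pred_demo):.4f}')
print(f'MAE  = {mae(y_true_demo, y_pred_demo):.4f}')
```

Because observed expenditure is censored at zero, these metrics are most meaningful when applied to censored predictions of $E[y|X]$ rather than to latent predictions.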

In [None]:
# ============================================================
# Exercise 4 Solution: Train/Test Prediction Performance
# ============================================================

# Split data: training = periods 1-3, test = period 4
train_mask = df['time'].values <= 3
test_mask = df['time'].values == 4

y_train = y[train_mask]
y_test = y[test_mask]
X_train = X[train_mask]
X_test = X[test_mask]
groups_train = groups[train_mask]
groups_test = groups[test_mask]
time_train = time[train_mask]
time_test = time[test_mask]

print('Train/Test Split Summary')
print('=' * 50)
print(f'Training set (t=1,2,3):')
print(f'  N observations:     {train_mask.sum()}')
print(f'  N individuals:      {len(np.unique(groups_train))}')
print(f'  Censoring rate:     {(y_train == 0).mean() * 100:.1f}%')
print(f'  Mean expenditure:   {y_train.mean():.2f}')
print(f'\nTest set (t=4):')
print(f'  N observations:     {test_mask.sum()}')
print(f'  N individuals:      {len(np.unique(groups_test))}')
print(f'  Censoring rate:     {(y_test == 0).mean() * 100:.1f}%')
print(f'  Mean expenditure:   {y_test.mean():.2f}')

In [None]:
# ============================================================
# Fit Pooled Tobit on training data
# ============================================================

print('Fitting Pooled Tobit on training data (t=1,2,3)...')
print('=' * 60)

pooled_train = PooledTobit(
    endog=y_train,
    exog=X_train,
    groups=groups_train,
    censoring_point=0.0,
    censoring_type='left'
)
pooled_train.fit(method='BFGS', maxiter=1000)

print(f'  Log-likelihood: {pooled_train.llf:.2f}')
print(f'  Converged: {pooled_train.converged}')

In [None]:
# ============================================================
# Fit RE Tobit on training data
# ============================================================

print('Fitting RE Tobit on training data (t=1,2,3)...')
print('=' * 60)

re_train = RandomEffectsTobit(
    endog=y_train,
    exog=X_train,
    groups=groups_train,
    time=time_train,
    censoring_point=0.0,
    censoring_type='left',
    quadrature_points=12
)
re_train.fit(method='BFGS', maxiter=1000)

print(f'  Log-likelihood: {re_train.llf:.2f}')
print(f'  sigma_eps:      {re_train.sigma_eps:.4f}')
print(f'  sigma_alpha:    {re_train.sigma_alpha:.4f}')
print(f'  Converged:      {re_train.converged}')

In [None]:
# ============================================================
# Generate predictions on the test set
# ============================================================

# Pooled Tobit predictions on test set
y_pred_pooled_latent = pooled_train.predict(exog=X_test, pred_type='latent')
y_pred_pooled_cens = pooled_train.predict(exog=X_test, pred_type='censored')

# RE Tobit predictions on test set
# The test period contains the same individuals, but predict() receives only exog,
# so these are population-average predictions (no individual random effect)
y_pred_re_latent = re_train.predict(exog=X_test, pred_type='latent')
y_pred_re_cens = re_train.predict(exog=X_test, pred_type='censored')

print('Predictions generated for test set (t=4).')
print(f'  Pooled censored predictions: mean={y_pred_pooled_cens.mean():.3f}, std={y_pred_pooled_cens.std():.3f}')
print(f'  RE censored predictions:     mean={y_pred_re_cens.mean():.3f}, std={y_pred_re_cens.std():.3f}')
print(f'  Actual test values:          mean={y_test.mean():.3f}, std={y_test.std():.3f}')

In [None]:
# ============================================================
# Compute prediction accuracy metrics
# ============================================================

def prediction_metrics(y_true, y_pred, name=''):
    """Compute RMSE, MAE, mean error (bias), correlation, and censoring accuracy."""
    residuals = y_true - y_pred
    rmse = np.sqrt(np.mean(residuals**2))
    mae = np.mean(np.abs(residuals))
    mean_error = np.mean(residuals)  # bias
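    # Correlation is undefined for constant predictions (e.g., the naive baseline)
    corr = np.corrcoef(y_true, y_pred)[0, 1] if np.ptp(y_pred) > 0 else np.nan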
    
    # RMSE for uncensored observations only
    uncens_mask = y_true > 0
    rmse_uncens = np.sqrt(np.mean((y_true[uncens_mask] - y_pred[uncens_mask])**2))
    mae_uncens = np.mean(np.abs(y_true[uncens_mask] - y_pred[uncens_mask]))
    
    # Classification accuracy: correctly predicting censored vs uncensored.
    # Predictions below an ad hoc cutoff of 0.5 are treated as censored; the
    # cutoff is in the units of y, so adjust it to the scale of the outcome.
    pred_censored = y_pred < 0.5
    actual_censored = y_true == 0
    accuracy = np.mean(pred_censored == actual_censored)
    
    return {
        'Model': name,
        'RMSE': rmse,
        'MAE': mae,
        'Mean Error (bias)': mean_error,
        'Correlation': corr,
        'RMSE (uncensored)': rmse_uncens,
        'MAE (uncensored)': mae_uncens,
        'Censoring Accuracy': accuracy
    }

# Compute metrics for both models (censored predictions)
metrics_pooled = prediction_metrics(y_test, y_pred_pooled_cens, 'Pooled Tobit')
metrics_re = prediction_metrics(y_test, y_pred_re_cens, 'RE Tobit')

# Also compute a naive baseline: predict mean of training data
y_pred_naive = np.full_like(y_test, y_train.mean(), dtype=float)
metrics_naive = prediction_metrics(y_test, y_pred_naive, 'Naive (mean)')

metrics_df = pd.DataFrame([metrics_naive, metrics_pooled, metrics_re]).set_index('Model')

print('Out-of-Sample Prediction Accuracy (Test Set: t=4)')
print('=' * 80)
display(metrics_df.round(4))

# Improvement of RE over Pooled
rmse_improvement = (metrics_pooled['RMSE'] - metrics_re['RMSE']) / metrics_pooled['RMSE'] * 100
mae_improvement = (metrics_pooled['MAE'] - metrics_re['MAE']) / metrics_pooled['MAE'] * 100

print('\nRE Tobit improvement over Pooled Tobit:')
print(f'  RMSE improvement: {rmse_improvement:+.2f}%')
print(f'  MAE improvement:  {mae_improvement:+.2f}%')

In [None]:
# ============================================================
# Visualization: Prediction performance
# ============================================================

fig, axes = plt.subplots(2, 2, figsize=(14, 11))

# 1. Pooled Tobit: predicted vs actual
ax = axes[0, 0]
ax.scatter(y_pred_pooled_cens, y_test, alpha=0.3, s=15, color='coral')
max_val = max(y_pred_pooled_cens.max(), y_test.max())
ax.plot([0, max_val], [0, max_val], 'k--', linewidth=1.5, label='45-degree line')
ax.set_xlabel('Predicted (Pooled Tobit)', fontsize=12)
ax.set_ylabel('Actual Expenditure', fontsize=12)
ax.set_title(f'Pooled Tobit (RMSE={metrics_pooled["RMSE"]:.3f})', fontsize=13)
ax.legend(fontsize=10)
ax.grid(alpha=0.3)

# 2. RE Tobit: predicted vs actual
ax = axes[0, 1]
ax.scatter(y_pred_re_cens, y_test, alpha=0.3, s=15, color='steelblue')
max_val = max(y_pred_re_cens.max(), y_test.max())
ax.plot([0, max_val], [0, max_val], 'k--', linewidth=1.5, label='45-degree line')
ax.set_xlabel('Predicted (RE Tobit)', fontsize=12)
ax.set_ylabel('Actual Expenditure', fontsize=12)
ax.set_title(f'RE Tobit (RMSE={metrics_re["RMSE"]:.3f})', fontsize=13)
ax.legend(fontsize=10)
ax.grid(alpha=0.3)

# 3. Residual distributions
ax = axes[1, 0]
resid_pooled = y_test - y_pred_pooled_cens
resid_re = y_test - y_pred_re_cens

ax.hist(resid_pooled, bins=30, alpha=0.5, color='coral',
        edgecolor='white', label='Pooled Tobit', density=True)
ax.hist(resid_re, bins=30, alpha=0.5, color='steelblue',
        edgecolor='white', label='RE Tobit', density=True)
ax.axvline(0, color='black', linestyle='--', linewidth=1)
ax.set_xlabel('Prediction Residual (actual - predicted)', fontsize=12)
ax.set_ylabel('Density', fontsize=12)
ax.set_title('Residual Distributions (Test Set)', fontsize=13)
ax.legend(fontsize=11)
ax.grid(alpha=0.3)

# 4. RMSE comparison bar chart
ax = axes[1, 1]
models = ['Naive (mean)', 'Pooled Tobit', 'RE Tobit']
rmse_vals = [metrics_naive['RMSE'], metrics_pooled['RMSE'], metrics_re['RMSE']]
mae_vals = [metrics_naive['MAE'], metrics_pooled['MAE'], metrics_re['MAE']]

x_pos = np.arange(len(models))
width = 0.35

bars1 = ax.bar(x_pos - width/2, rmse_vals, width,
               label='RMSE', color='steelblue', edgecolor='black', alpha=0.8)
bars2 = ax.bar(x_pos + width/2, mae_vals, width,
               label='MAE', color='coral', edgecolor='black', alpha=0.8)

# Add value labels
for bar in bars1:
    ax.text(bar.get_x() + bar.get_width()/2., bar.get_height() + 0.05,
            f'{bar.get_height():.3f}', ha='center', va='bottom', fontsize=9)
for bar in bars2:
    ax.text(bar.get_x() + bar.get_width()/2., bar.get_height() + 0.05,
            f'{bar.get_height():.3f}', ha='center', va='bottom', fontsize=9)

ax.set_xticks(x_pos)
ax.set_xticklabels(models, fontsize=11)
ax.set_ylabel('Error', fontsize=12)
ax.set_title('Out-of-Sample Prediction Error Comparison', fontsize=13)
ax.legend(fontsize=11)
ax.grid(alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig(FIGURES_DIR / 'ex4_prediction_performance.png', dpi=300, bbox_inches='tight')
plt.show()

In [None]:
# ============================================================
# Additional analysis: Performance by censoring status
# ============================================================

# How well do models predict for censored vs uncensored obs?
censored_test = y_test == 0
uncensored_test = y_test > 0

print('Performance by Censoring Status in Test Set')
print('=' * 70)
print(f'{"":25s} {"Censored (y=0)":>20s} {"Uncensored (y>0)":>20s}')
print(f'{"":25s} {"N=" + str(censored_test.sum()):>20s} {"N=" + str(uncensored_test.sum()):>20s}')
print('-' * 70)

for name, y_pred in [('Pooled Tobit', y_pred_pooled_cens), ('RE Tobit', y_pred_re_cens)]:
    # For censored obs: ideal prediction is 0 or close to 0
    mae_cens = np.mean(np.abs(y_test[censored_test] - y_pred[censored_test]))
    mae_uncens = np.mean(np.abs(y_test[uncensored_test] - y_pred[uncensored_test]))
    
    rmse_cens = np.sqrt(np.mean((y_test[censored_test] - y_pred[censored_test])**2))
    rmse_uncens = np.sqrt(np.mean((y_test[uncensored_test] - y_pred[uncensored_test])**2))
    
    print(f'{name + " RMSE":25s} {rmse_cens:>20.4f} {rmse_uncens:>20.4f}')
    print(f'{name + " MAE":25s} {mae_cens:>20.4f} {mae_uncens:>20.4f}')
    print()

print('Interpretation:')
print('  - For censored observations, a good model should predict values near zero.')
print('  - For uncensored observations, the model should match the positive expenditure.')
print('  - The RE model may better predict uncensored observations if individual')
print('    heterogeneity is important for explaining expenditure levels.')

In [None]:
# ============================================================
# Coefficient comparison: full sample vs training sample
# ============================================================

print('Coefficient Stability: Full Sample vs Training Sample (RE Tobit)')
print('=' * 70)
print(f'{"Variable":>12s} {"Full (t=1-4)":>14s} {"Train (t=1-3)":>14s} {"Difference":>12s}')
print('-' * 70)
for i, vname in enumerate(var_names):
    full_val = re_base.beta[i]
    train_val = re_train.beta[i]
    diff = train_val - full_val
    print(f'{vname:>12s} {full_val:>14.4f} {train_val:>14.4f} {diff:>12.4f}')

print(f'{"sigma_eps":>12s} {re_base.sigma_eps:>14.4f} {re_train.sigma_eps:>14.4f} {re_train.sigma_eps - re_base.sigma_eps:>12.4f}')
print(f'{"sigma_alpha":>12s} {re_base.sigma_alpha:>14.4f} {re_train.sigma_alpha:>14.4f} {re_train.sigma_alpha - re_base.sigma_alpha:>12.4f}')

print('\nSmall differences indicate stable estimates (good sign for generalization).')

### Exercise 4: Discussion

**Key findings from the prediction performance analysis:**

1. **Both models outperform the naive baseline**: Predicting the training mean
   for all test observations gives the worst RMSE/MAE, confirming that the
   covariates contain useful predictive information.

2. **Pooled vs RE Tobit**: For population-average out-of-sample predictions
   (without individual-specific random effect estimates), the two models often
   perform similarly. The RE model's main advantage is in characterizing
   **within-individual** dynamics and producing correct **standard errors**,
   rather than in point prediction for new periods.

3. **Censored vs uncensored predictions**: Both models tend to have larger
   prediction errors for uncensored observations, which is expected since
   positive expenditure values have much more variation than the mass point
   at zero.

4. **Coefficient stability**: The estimates from the training sample (t=1-3)
   are close to those from the full sample (t=1-4), suggesting that the
   model parameters are stable over time. Part of this closeness is
   mechanical, since the training periods are a subset of the full sample.

5. **Caveat**: If we had individual-level random effect estimates (empirical
   Bayes predictions of $\hat{\alpha}_i$ from periods 1-3), the RE model
   could potentially produce substantially better individual-level predictions
   for period 4. The population-average predictions used here do not exploit
   this information.
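
To make the caveat in point 5 concrete, the sketch below computes an empirical Bayes posterior mean of $\alpha_i$ for one individual by Gauss-Hermite quadrature. It assumes the standard RE Tobit likelihood with left-censoring at zero and $\alpha_i \sim N(0, \sigma_\alpha^2)$. The function `eb_random_effect` and all input values are hypothetical; this is not a PanelBox API, and whether PanelBox offers an equivalent is not confirmed here.

```python
import numpy as np
from scipy.stats import norm


def eb_random_effect(y_i, xb_i, sigma_eps, sigma_alpha, n_points=12):
    """Posterior mean E[alpha_i | y_i] for one individual (hypothetical helper).

    y_i  : observed outcomes for the individual, censored at 0
    xb_i : linear predictor x_it' beta for the same periods
    """
    y_i = np.asarray(y_i, dtype=float)
    xb_i = np.asarray(xb_i, dtype=float)

    # Nodes/weights for the weight function exp(-x^2); rescale to N(0, sigma_alpha^2)
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    alpha = np.sqrt(2.0) * sigma_alpha * nodes          # shape (Q,)

    mu = xb_i[:, None] + alpha[None, :]                 # shape (T, Q)
    y = y_i[:, None]
    log_lik = np.where(
        y > 0,
        norm.logpdf(y, loc=mu, scale=sigma_eps),        # uncensored: density
        norm.logcdf(-mu / sigma_eps),                   # censored: P(y* <= 0)
    ).sum(axis=0)                                       # shape (Q,)

    # Subtract the max before exponentiating for numerical stability
    post = weights * np.exp(log_lik - log_lik.max())
    return float(np.sum(post * alpha) / np.sum(post))


# Individual who spends well above the population-average prediction
alpha_high = eb_random_effect([5.0, 6.0, 5.5], [2.0, 2.0, 2.0], 1.0, 2.0)
# Individual who is censored in every training period
alpha_low = eb_random_effect([0.0, 0.0, 0.0], [2.0, 2.0, 2.0], 1.0, 2.0)
print(alpha_high, alpha_low)
```

Under these assumptions, an individual-level latent prediction for period 4 would add the estimated $\hat{\alpha}_i$ to $x_{i4}'\hat{\beta}$ before applying the censoring correction. With few quadrature points the posterior mean is only approximate, so the number of points should be checked as in Exercise 1.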

---

## Summary

In this solution notebook, we worked through four exercises that deepened our
understanding of the Random Effects Tobit model:

| Exercise | Key Takeaway |
|----------|-------------|
| 1. Quadrature Sensitivity | Q=12 is sufficient for this dataset; results stabilize quickly |
| 2. Gender Subsample | Separate models reveal structural differences in expenditure processes |
| 3. Marginal Effects | Unconditional and conditional MEs are both smaller in magnitude than the latent beta due to censoring |
| 4. Prediction Performance | Both models beat naive; RE advantages are more in inference than prediction |

### Key Methods Used

```python
# Quadrature sensitivity
RandomEffectsTobit(..., quadrature_points=nq)

# Subsample estimation
RandomEffectsTobit(endog=y[mask], exog=X[mask], groups=groups[mask], ...)

# Marginal effects
model.marginal_effects(at='overall', which='unconditional')
model.marginal_effects(at='overall', which='conditional')
model.marginal_effects(at='overall', which='probability')

# Out-of-sample prediction
model.predict(exog=X_test, pred_type='censored')
```
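
For intuition about what a censored-type prediction represents, the sketch below computes the textbook expected value of an outcome left-censored at zero, $E[y|X] = \Phi(x'\beta/\sigma)\, x'\beta + \sigma\, \phi(x'\beta/\sigma)$, and checks it against simulation. That `pred_type='censored'` returns exactly this quantity is an assumption here, not something confirmed from the PanelBox source; `censored_mean` is a hypothetical helper.

```python
import numpy as np
from scipy.stats import norm


def censored_mean(xb, sigma):
    """E[max(y*, 0)] for y* ~ N(xb, sigma^2) (hypothetical helper)."""
    xb = np.asarray(xb, dtype=float)
    z = xb / sigma
    return norm.cdf(z) * xb + sigma * norm.pdf(z)


# Monte Carlo check with illustrative values
rng = np.random.default_rng(0)
xb, sigma = 1.0, 2.0
draws = np.maximum(xb + sigma * rng.standard_normal(200_000), 0.0)

analytic = float(censored_mean(xb, sigma))
simulated = float(draws.mean())
print(f"analytic={analytic:.4f}, simulated={simulated:.4f}")
```

For the RE model, a population-average version of this formula would plausibly replace $\sigma$ with $\sqrt{\sigma_\varepsilon^2 + \sigma_\alpha^2}$; how PanelBox handles this internally should be checked against its documentation.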