# Notebook 01: Introduction to Tobit Models - SOLUTIONS

**Tutorial 01 Solutions - Censored Models Series**

**Version**: 1.0  
**Date**: 2026-02-17  

---

## Overview

This notebook contains complete, working solutions for all 4 exercises from
Notebook 01: Introduction to Tobit Models for Censored Data.

**Exercises covered:**

1. **Exercise 1: Variable Selection (Easy)** -- Re-estimate Tobit with a restricted set of variables and perform a likelihood ratio test
2. **Exercise 2: Prediction Profiles (Medium)** -- Create profiles and compute latent, censored, and probability predictions
3. **Exercise 3: Right-Censoring (Medium)** -- Estimate a right-censored Tobit and compare with OLS and left-censored Tobit
4. **Exercise 4: Manual McDonald-Moffitt Decomposition (Hard)** -- Manually compute all three marginal effects and verify against PanelBox

---

## Setup

Import required libraries and configure visualization settings.

In [None]:
# Core scientific stack (third-party) plus standard-library helpers
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Statistical functions
from scipy import stats
import statsmodels.api as sm

# PanelBox imports
from panelbox.models.censored import PooledTobit

# Visualization configuration
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 11
plt.rcParams['axes.labelsize'] = 12
plt.rcParams['axes.titlesize'] = 14
plt.rcParams['legend.fontsize'] = 10
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 4)

# Reproducibility
np.random.seed(42)

# Define paths
BASE_DIR = Path('..')
DATA_DIR = BASE_DIR / 'data'
OUTPUT_DIR = BASE_DIR / 'outputs'
FIGURES_DIR = OUTPUT_DIR / 'figures'
TABLES_DIR = OUTPUT_DIR / 'tables'
FIGURES_DIR.mkdir(parents=True, exist_ok=True)
TABLES_DIR.mkdir(parents=True, exist_ok=True)

print('Setup complete!')
print(f'  Data directory: {DATA_DIR}')
print(f'  Output directory: {OUTPUT_DIR}')

## Data Generation

Regenerate the labor supply data using the same data-generating process (DGP) as the main notebook.
The latent variable (desired hours) is a function of wages, education,
experience, and household characteristics. Observed hours are censored at zero.

In [None]:
def generate_censored_labor_supply(n=500, seed=42):
    """
    Generate labor supply data with left-censoring at zero.
    
    The latent variable (desired hours) is a function of wages, education,
    experience, and household characteristics. Observed hours are:
        hours = max(0, latent_hours)
    """
    rng = np.random.default_rng(seed)
    
    # Generate covariates
    education = rng.integers(8, 21, size=n)
    age = rng.integers(25, 60, size=n)
    experience = np.clip(age - education - 6 + rng.normal(0, 2, n), 0, None)
    experience_sq = experience ** 2
    children = rng.poisson(0.8, size=n)
    married = rng.binomial(1, 0.6, size=n)
    non_labor_income = np.abs(rng.normal(20, 15, n))
    wage = np.exp(
        0.8 + 0.07 * education + 0.03 * experience
        - 0.0005 * experience_sq + rng.normal(0, 0.4, n)
    )
    
    # Latent hours (desired hours)
    # Lower intercept to ensure ~30-35% censoring
    latent_hours = (
        -5.0
        + 3.0 * np.log(wage)
        + 0.8 * education
        + 1.2 * experience
        - 0.02 * experience_sq
        - 3.5 * children
        + 1.5 * married
        - 0.25 * non_labor_income
        + rng.normal(0, 12, n)
    )
    
    # Apply censoring: observed hours = max(0, latent_hours)
    hours = np.maximum(latent_hours, 0.0)
    
    return pd.DataFrame({
        'hours': np.round(hours, 1),
        'wage': np.round(wage, 2),
        'education': education,
        'experience': np.round(experience, 1),
        'experience_sq': np.round(experience_sq, 1),
        'age': age,
        'children': children,
        'married': married,
        'non_labor_income': np.round(non_labor_income, 2),
    }), latent_hours


# Generate data
df, latent_hours = generate_censored_labor_supply(n=500, seed=42)
df['latent_hours'] = np.round(latent_hours, 1)

print('=' * 60)
print('Labor Supply Dataset')
print('=' * 60)
print(f'\nObservations: {len(df)}')
print(f'Variables: {list(df.columns)}')
print(f'\nCensored observations (hours = 0): {(df["hours"] == 0).sum()}')
print(f'Uncensored observations (hours > 0): {(df["hours"] > 0).sum()}')
print(f'Censoring rate: {(df["hours"] == 0).mean():.1%}')
print(f'\nFirst 5 rows:')
df.head()

## Fit the Base Tobit Model (Reference)

We fit the full Tobit model as a baseline for comparison in the exercises.
This uses all seven covariates: wage, education, experience, experience_sq,
children, married, non_labor_income.
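
For readers who want to see what is being maximized, the following is a minimal sketch of the standard left-censored Tobit log-likelihood, fitted with `scipy.optimize.minimize`. It is not PanelBox's implementation; the function name `tobit_negloglik`, the toy data, and the choice of BFGS are assumptions made for illustration.

```python
import numpy as np
from scipy import optimize, stats


def tobit_negloglik(params, y, X, lower=0.0):
    """Negative log-likelihood of a left-censored Tobit model.

    params holds the coefficients followed by log(sigma); optimizing
    log(sigma) keeps sigma positive without explicit bounds.
    """
    beta, sigma = params[:-1], np.exp(params[-1])
    xb = X @ beta
    censored = y <= lower
    # Censored observations contribute P(y* <= lower) = Phi((lower - xb) / sigma)
    ll_cens = stats.norm.logcdf((lower - xb[censored]) / sigma)
    # Uncensored observations contribute the normal density of y
    ll_unc = stats.norm.logpdf((y[~censored] - xb[~censored]) / sigma) - np.log(sigma)
    return -(ll_cens.sum() + ll_unc.sum())


# Hypothetical toy data (not the labor supply dataset)
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X_toy = np.column_stack([np.ones(n), x])
y_star = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=n)
y_toy = np.maximum(y_star, 0.0)

# Start from OLS coefficients and the log of the OLS residual scale
beta0, *_ = np.linalg.lstsq(X_toy, y_toy, rcond=None)
resid = y_toy - X_toy @ beta0
start = np.append(beta0, np.log(resid.std()))

res = optimize.minimize(tobit_negloglik, start, args=(y_toy, X_toy), method='BFGS')
beta_hat, sigma_hat = res.x[:-1], np.exp(res.x[-1])
print('beta_hat :', np.round(beta_hat, 3))
print('sigma_hat:', round(float(sigma_hat), 3))
```

On the toy data the estimates should land near the true values (intercept 1, slope 2, sigma 1.5), whereas OLS on `y_toy` would be biased by the pile-up at zero.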

In [None]:
# Prepare data for Tobit estimation
feature_cols = ['wage', 'education', 'experience', 'experience_sq',
                'children', 'married', 'non_labor_income']

y = df['hours'].values
X = sm.add_constant(df[feature_cols].values)
var_names = ['const'] + feature_cols

# Fit the full Tobit model
tobit_full = PooledTobit(
    endog=y,
    exog=X,
    censoring_point=0.0,
    censoring_type='left'
)
tobit_full = tobit_full.fit()

# Store variable names on the model for marginal effects
tobit_full.exog_names = var_names

print('=' * 60)
print('Full Tobit Model (Baseline)')
print('=' * 60)
print(f'\nConverged: {tobit_full.converged}')
print(f'Log-likelihood: {tobit_full.llf:.3f}')
print(f'Sigma: {tobit_full.sigma:.4f}')

print(f'\n{"Variable":<22} {"Coefficient":>12} {"Std. Error":>12}')
print('-' * 48)
for i, name in enumerate(var_names):
    print(f'{name:<22} {tobit_full.beta[i]:>12.4f} {tobit_full.bse[i]:>12.4f}')

In [None]:
# Also fit OLS for comparison (used in Exercise 3)
ols_full = sm.OLS(y, X).fit()

print('OLS model (full sample) fitted for comparison.')
print(f'R-squared: {ols_full.rsquared:.4f}')

---

# Exercise 1: Variable Selection (Easy)

**Task:** Re-estimate the Tobit model using only `wage`, `education`, and `children`
as explanatory variables (plus a constant). Compare the coefficients and log-likelihood
with the full model. Perform a likelihood ratio test to determine whether the
restricted model fits significantly worse.

**Likelihood Ratio Test:**

$$LR = -2(\ln L_{\text{restricted}} - \ln L_{\text{full}}) \sim \chi^2(q)$$

where $q$ is the number of restrictions (omitted variables).
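
As a quick illustration of the arithmetic, here is a small sketch of the test as a helper function. The name `lr_test` and the two log-likelihood values are hypothetical and are not results from this notebook.

```python
from scipy import stats


def lr_test(llf_restricted, llf_full, q):
    """Return the LR statistic and its chi-squared(q) p-value."""
    lr = -2.0 * (llf_restricted - llf_full)
    # chi2.sf is the survival function 1 - CDF, computed without the
    # cancellation that 1 - cdf(...) suffers for large statistics
    return lr, stats.chi2.sf(lr, df=q)


# Hypothetical log-likelihoods, for illustration only
lr, p = lr_test(llf_restricted=-1510.0, llf_full=-1500.0, q=4)
print(f'LR = {lr:.2f}, p-value = {p:.6f}')

# For very large statistics, 1 - cdf rounds to exactly zero while sf does not
print(1 - stats.chi2.cdf(200.0, df=4), stats.chi2.sf(200.0, df=4))
```

Because the full model nests the restricted one, its log-likelihood can never be lower, so the statistic is non-negative by construction.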

### Solution

In [None]:
print('=' * 80)
print('EXERCISE 1: VARIABLE SELECTION -- RESTRICTED TOBIT MODEL')
print('=' * 80)

# Step 1: Estimate the restricted model with only wage, education, children
restricted_cols = ['wage', 'education', 'children']
X_restricted = sm.add_constant(df[restricted_cols].values)
var_names_restricted = ['const'] + restricted_cols

tobit_restricted = PooledTobit(
    endog=y,
    exog=X_restricted,
    censoring_point=0.0,
    censoring_type='left'
)
tobit_restricted = tobit_restricted.fit()
tobit_restricted.exog_names = var_names_restricted

print(f'\nRestricted model converged: {tobit_restricted.converged}')
print(f'Restricted log-likelihood: {tobit_restricted.llf:.3f}')
print(f'Restricted sigma: {tobit_restricted.sigma:.4f}')

# Step 2: Display coefficient comparison
print(f'\n{"":-<80}')
print(f'{"Variable":<22} {"Full Model":>14} {"Restricted":>14}')
print(f'{"":-<80}')

for name in var_names_restricted:
    idx_full = var_names.index(name)
    idx_rest = var_names_restricted.index(name)
    print(f'{name:<22} {tobit_full.beta[idx_full]:>14.4f} {tobit_restricted.beta[idx_rest]:>14.4f}')

# Show the omitted variables
omitted_vars = [v for v in feature_cols if v not in restricted_cols]
print(f'\nOmitted variables: {omitted_vars}')
for name in omitted_vars:
    idx_full = var_names.index(name)
    print(f'  {name}: beta = {tobit_full.beta[idx_full]:.4f} (in full model)')

In [None]:
# Step 3: Likelihood Ratio Test
print('\n' + '=' * 80)
print('LIKELIHOOD RATIO TEST')
print('=' * 80)

# Number of restrictions = number of omitted variables
# We omit: experience, experience_sq, married, non_labor_income
q = len(omitted_vars)

# LR statistic
LR = -2 * (tobit_restricted.llf - tobit_full.llf)

# p-value from chi-squared distribution
# (sf is the survival function 1 - CDF; it avoids rounding to 0 for large LR)
p_value = stats.chi2.sf(LR, df=q)

print(f'\nLog-likelihood (full):       {tobit_full.llf:.3f}')
print(f'Log-likelihood (restricted): {tobit_restricted.llf:.3f}')
print(f'\nNumber of restrictions (q):  {q}')
print(f'LR statistic:                {LR:.4f}')
print(f'Chi-squared critical (5%):   {stats.chi2.ppf(0.95, df=q):.4f}')
print(f'p-value:                     {p_value:.6f}')

if p_value < 0.05:
    print(f'\nConclusion: REJECT H0 at 5% level.')
    print(f'The omitted variables ({omitted_vars}) are JOINTLY SIGNIFICANT.')
    print(f'The full model is statistically preferred over the restricted model.')
else:
    print(f'\nConclusion: FAIL TO REJECT H0 at 5% level.')
    print(f'The restricted model is adequate; the omitted variables are not jointly significant.')

print('\n' + '=' * 80)

### Interpretation

The likelihood ratio test compares the restricted model (wage, education, children only)
against the full model. A large LR statistic (relative to the chi-squared critical value)
indicates that the omitted variables -- experience, experience_sq, married, and
non_labor_income -- contribute significantly to explaining the variation in hours worked.

**Key points:**
- The restricted model omits 4 variables, so the test has 4 degrees of freedom
- The LR test is valid because the restricted model is nested within the full model
- Even if individual variables are insignificant, they may be jointly significant
- A significant LR test does not necessarily mean we should include *all* omitted variables;
  it only tells us that at least some of them matter

---

# Exercise 2: Prediction Profiles (Medium)

**Task:** Create three hypothetical individual profiles:
- A person very **likely to work** (high education, no children, low non-labor income)
- A person **on the margin** (moderate characteristics)
- A person **unlikely to work** (low education, many children, high non-labor income)

For each profile, compute:
1. The latent prediction $E[y^*|\mathbf{X}] = \mathbf{X}'\hat{\boldsymbol{\beta}}$
2. The censored prediction $E[y|\mathbf{X}]$
3. The probability of working $P(y > 0|\mathbf{X})$
4. The conditional expected hours $E[y|y>0, \mathbf{X}]$

Then verify:
$$E[y|\mathbf{X}] = P(y>0|\mathbf{X}) \cdot E[y|y>0, \mathbf{X}]$$
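
The four quantities can also be computed directly from the index $\mathbf{X}'\hat{\boldsymbol{\beta}}$ and $\hat{\sigma}$, without calling the model. The sketch below assumes left-censoring at zero; the helper name `tobit_predictions` and the input values are hypothetical.

```python
import numpy as np
from scipy import stats


def tobit_predictions(xb, sigma):
    """Four Tobit predictions for left-censoring at zero, given the index xb."""
    z = xb / sigma
    prob_pos = stats.norm.cdf(z)                           # P(y > 0 | X)
    # Inverse Mills ratio phi(z) / Phi(z), computed in logs for stability
    imr = np.exp(stats.norm.logpdf(z) - stats.norm.logcdf(z))
    cond_mean = xb + sigma * imr                           # E[y | y > 0, X]
    cens_mean = prob_pos * xb + sigma * stats.norm.pdf(z)  # E[y | X]
    return {'latent': xb, 'censored': cens_mean,
            'prob_pos': prob_pos, 'conditional': cond_mean}


# Hypothetical index values and sigma, not estimates from this notebook
xb = np.array([-10.0, 0.0, 25.0])
pred = tobit_predictions(xb, sigma=12.0)
gap = pred['censored'] - pred['prob_pos'] * pred['conditional']
print('max |E[y|X] - P(y>0|X) * E[y|y>0,X]| =', np.abs(gap).max())
```

The gap should be zero up to floating-point error, and the censored mean stays positive even when the latent index is negative.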

### Solution

In [None]:
print('=' * 80)
print('EXERCISE 2: PREDICTION PROFILES')
print('=' * 80)

# Define three profiles
profiles = {
    'Likely worker': {
        'wage': 20.0, 'education': 18, 'experience': 10, 'experience_sq': 100,
        'children': 0, 'married': 1, 'non_labor_income': 5.0,
    },
    'Marginal': {
        'wage': 10.0, 'education': 12, 'experience': 8, 'experience_sq': 64,
        'children': 2, 'married': 1, 'non_labor_income': 25.0,
    },
    'Unlikely worker': {
        'wage': 5.0, 'education': 9, 'experience': 3, 'experience_sq': 9,
        'children': 4, 'married': 0, 'non_labor_income': 40.0,
    },
}

# Extract model parameters
beta = tobit_full.beta
sigma = tobit_full.sigma

print(f'\nModel parameters:')
print(f'  sigma = {sigma:.4f}')
print(f'  beta = {beta}')

print(f'\n{"Profile":<22} {"E[y*|X]":>10} {"E[y|X]":>10} {"P(y>0|X)":>10} {"E[y|y>0,X]":>12}')
print('-' * 68)

results_profiles = {}

for name, vals in profiles.items():
    # Build the regressor vector: [1, wage, education, ...]
    x_vec = np.array([[1.0] + [vals[col] for col in feature_cols]])
    
    # 1. Latent prediction: E[y*|X] = X'beta
    y_latent = tobit_full.predict(exog=x_vec, pred_type='latent')[0]
    
    # 2. Censored prediction: E[y|X]
    y_censored = tobit_full.predict(exog=x_vec, pred_type='censored')[0]
    
    # 3. Probability of censoring: P(y=0|X)
    # Note: PooledTobit.predict(pred_type='probability') returns P(y=0|X)
    prob_censored = tobit_full.predict(exog=x_vec, pred_type='probability')[0]
    prob_uncensored = 1.0 - prob_censored
    
    # 4. Conditional expected hours: E[y|y>0, X]
    # Formula: E[y|y>0,X] = X'beta + sigma * lambda(z)
    # where z = X'beta / sigma (for censoring at 0)
    # and lambda(z) = phi(z) / Phi(z) is the inverse Mills ratio
    z = y_latent / sigma
    IMR = stats.norm.pdf(z) / stats.norm.cdf(z)
    y_conditional = y_latent + sigma * IMR
    
    results_profiles[name] = {
        'latent': y_latent,
        'censored': y_censored,
        'prob_uncensored': prob_uncensored,
        'conditional': y_conditional,
    }
    
    print(f'{name:<22} {y_latent:>10.2f} {y_censored:>10.2f} {prob_uncensored:>10.3f} {y_conditional:>12.2f}')

In [None]:
# Verify the identity: E[y|X] = P(y>0|X) * E[y|y>0,X]
# More precisely, for left-censoring at 0:
# E[y|X] = P(y>0|X) * E[y|y>0,X] + P(y=0|X) * 0
#        = P(y>0|X) * E[y|y>0,X]

print('\n' + '=' * 80)
print('VERIFICATION: E[y|X] = P(y>0|X) * E[y|y>0,X]')
print('=' * 80)

print(f'\n{"Profile":<22} {"E[y|X]":>12} {"P(y>0)*E[y|y>0]":>18} {"Difference":>12}')
print('-' * 68)

for name, res in results_profiles.items():
    lhs = res['censored']  # E[y|X]
    rhs = res['prob_uncensored'] * res['conditional']  # P(y>0|X) * E[y|y>0,X]
    diff = lhs - rhs
    print(f'{name:<22} {lhs:>12.4f} {rhs:>18.4f} {diff:>12.6f}')

print(f'\nThe differences should be very close to zero (within numerical precision).')
print(f'This confirms the law of iterated expectations applied to censored data.')

In [None]:
# Visualize the profiles
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

profile_names = list(results_profiles.keys())
colors = ['steelblue', 'darkorange', 'firebrick']

# Panel 1: Bar chart of predictions
ax = axes[0]
x_pos = np.arange(len(profile_names))
width = 0.35
latent_vals = [results_profiles[n]['latent'] for n in profile_names]
censored_vals = [results_profiles[n]['censored'] for n in profile_names]

ax.bar(x_pos - width/2, latent_vals, width, label='Latent E[y*|X]',
       color='lightcoral', edgecolor='white')
ax.bar(x_pos + width/2, censored_vals, width, label='Censored E[y|X]',
       color='steelblue', edgecolor='white')
ax.set_xticks(x_pos)
ax.set_xticklabels(profile_names, rotation=15, ha='right')
ax.set_ylabel('Predicted Hours')
ax.set_title('Latent vs. Censored Predictions')
ax.axhline(y=0, color='red', linestyle='--', linewidth=1, alpha=0.5)
ax.legend()

# Panel 2: Probability of working
ax = axes[1]
prob_vals = [results_profiles[n]['prob_uncensored'] for n in profile_names]
bars = ax.bar(x_pos, prob_vals, color=colors, edgecolor='white')
ax.set_xticks(x_pos)
ax.set_xticklabels(profile_names, rotation=15, ha='right')
ax.set_ylabel('P(hours > 0 | X)')
ax.set_title('Probability of Working')
ax.set_ylim(0, 1.1)
for bar, val in zip(bars, prob_vals):
    ax.text(bar.get_x() + bar.get_width()/2., bar.get_height() + 0.02,
            f'{val:.3f}', ha='center', va='bottom', fontsize=11, fontweight='bold')

# Panel 3: Decomposition
ax = axes[2]
cond_vals = [results_profiles[n]['conditional'] for n in profile_names]
ax.bar(x_pos - width/2, cond_vals, width, label='E[y|y>0,X]',
       color='darkorange', edgecolor='white')
ax.bar(x_pos + width/2, censored_vals, width, label='E[y|X]',
       color='steelblue', edgecolor='white')
ax.set_xticks(x_pos)
ax.set_xticklabels(profile_names, rotation=15, ha='right')
ax.set_ylabel('Expected Hours')
ax.set_title('Conditional vs. Unconditional Predictions')
ax.legend()

plt.tight_layout()
plt.savefig(FIGURES_DIR / '01_ex2_prediction_profiles.png', dpi=150, bbox_inches='tight')
plt.show()

print('Figure saved.')

### Interpretation

The three profiles illustrate how the Tobit model differentiates between
individuals based on their characteristics:

- **Likely worker**: High education, high wage, no children, low non-labor income.
  The latent prediction is strongly positive, the probability of working is high,
  and the censored prediction is close to the latent prediction.

- **Marginal**: Moderate characteristics. The latent prediction is near zero,
  indicating this person is on the boundary of participating. The censored prediction
  lies above the latent prediction because censoring replaces every negative draw
  of $y^*$ with zero, which raises the mean of the observed outcome.

- **Unlikely worker**: Low education, many children, high non-labor income.
  The latent prediction is negative (desired hours are below zero), indicating
  this person would prefer not to work. The probability of working is low.

**Key insight:** The identity $E[y|X] = P(y>0|X) \cdot E[y|y>0,X]$ holds exactly.
This shows that the unconditional expectation decomposes into the probability of
participation times the expected hours conditional on participating.
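
A short derivation, using the standard truncated-normal mean and writing $z = \mathbf{X}'\boldsymbol{\beta}/\sigma$ and $\lambda(z) = \phi(z)/\Phi(z)$, shows why the identity is exact rather than approximate:

$$E[y|\mathbf{X}] = \Phi(z)\,\big[\mathbf{X}'\boldsymbol{\beta} + \sigma\lambda(z)\big] = \Phi(z)\,\mathbf{X}'\boldsymbol{\beta} + \sigma\,\phi(z)$$

Subtracting the latent prediction gives the gap between the censored and latent predictions:

$$E[y|\mathbf{X}] - \mathbf{X}'\boldsymbol{\beta} = \sigma\,\big[\phi(z) - z\,(1 - \Phi(z))\big] > 0$$

The gap is positive for every $z$, and it shrinks toward zero as $z$ grows, which is why the two predictions nearly coincide for the likely worker.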

---

# Exercise 3: Right-Censoring (Medium)

**Task:** Suppose weekly hours are **right-censored** at 40 hours (full-time workers
are all recorded as 40 hours even if they work more). Create an artificial right-censored
variable and estimate:
1. A Tobit model with `censoring_type='right'` and `censoring_point=40.0`
2. Compare with OLS and the left-censored Tobit

### Solution

In [None]:
print('=' * 80)
print('EXERCISE 3: RIGHT-CENSORING AT 40 HOURS')
print('=' * 80)

# Step 1: Create the right-censored variable
hours_right_censored = np.minimum(df['hours'].values, 40.0)

n_right_censored = (hours_right_censored == 40.0).sum()
n_left_censored = (df['hours'].values == 0).sum()

print(f'\nOriginal hours range: [{df["hours"].min():.1f}, {df["hours"].max():.1f}]')
print(f'Right-censored hours range: [{hours_right_censored.min():.1f}, {hours_right_censored.max():.1f}]')
print(f'\nObservations censored at 40: {n_right_censored} ({n_right_censored/len(y):.1%})')
print(f'Observations at 0 (not working): {(hours_right_censored == 0).sum()}')
print(f'Uncensored observations (0 < h < 40): {((hours_right_censored > 0) & (hours_right_censored < 40)).sum()}')

In [None]:
# Step 2: Estimate the right-censored Tobit model
# Note: hours_right_censored still contains the zeros produced by
# left-censoring. A model with censoring_type='right' treats those zeros
# as ordinary uncensored observations, so it corrects only for the pile-up
# at 40. We keep the full sample (zeros included) so the estimates stay
# comparable with the left-censored Tobit and the OLS fits.

tobit_right = PooledTobit(
    endog=hours_right_censored,
    exog=X,
    censoring_point=40.0,
    censoring_type='right'
)
tobit_right = tobit_right.fit()
tobit_right.exog_names = var_names

print('Right-censored Tobit model fitted.')
print(f'Converged: {tobit_right.converged}')
print(f'Log-likelihood: {tobit_right.llf:.3f}')
print(f'Sigma: {tobit_right.sigma:.4f}')

In [None]:
# Step 3: Estimate OLS on the right-censored data for comparison
ols_right = sm.OLS(hours_right_censored, X).fit()

print('OLS on right-censored data fitted.')
print(f'R-squared: {ols_right.rsquared:.4f}')

In [None]:
# Step 4: Compare all models
print('\n' + '=' * 100)
print('COEFFICIENT COMPARISON: LEFT-CENSORED TOBIT vs. RIGHT-CENSORED TOBIT vs. OLS')
print('=' * 100)

print(f'\n{"Variable":<22} {"Left Tobit":>14} {"Right Tobit":>14} {"OLS (full)":>14} {"OLS (right cens.)":>18}')
print('-' * 84)

for i, name in enumerate(var_names):
    print(f'{name:<22} {tobit_full.beta[i]:>14.4f} {tobit_right.beta[i]:>14.4f} '
          f'{ols_full.params[i]:>14.4f} {ols_right.params[i]:>18.4f}')

print(f'\n{"sigma":<22} {tobit_full.sigma:>14.4f} {tobit_right.sigma:>14.4f} '
      f'{np.sqrt(ols_full.mse_resid):>14.4f} {np.sqrt(ols_right.mse_resid):>18.4f}')
print(f'{"Log-likelihood":<22} {tobit_full.llf:>14.3f} {tobit_right.llf:>14.3f} '
      f'{"N/A":>14} {"N/A":>18}')

print('\n' + '=' * 100)

In [None]:
# Step 5: Visualize the right-censored data and predictions
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Panel 1: Distribution of right-censored hours
ax = axes[0]
ax.hist(hours_right_censored, bins=40, color='steelblue', edgecolor='white', alpha=0.8)
ax.axvline(x=40, color='red', linestyle='--', linewidth=2, label='Censoring point (40h)')
ax.set_xlabel('Hours Worked per Week')
ax.set_ylabel('Frequency')
ax.set_title('Distribution of Right-Censored Hours\n(Pile-up at 40)')
ax.legend()

# Panel 2: Compare original and right-censored
ax = axes[1]
ax.hist(df['hours'].values, bins=40, color='lightcoral', edgecolor='white',
        alpha=0.5, label='Original hours')
ax.hist(hours_right_censored, bins=40, color='steelblue', edgecolor='white',
        alpha=0.5, label='Right-censored (at 40)')
ax.axvline(x=40, color='red', linestyle='--', linewidth=2)
ax.set_xlabel('Hours Worked per Week')
ax.set_ylabel('Frequency')
ax.set_title('Original vs. Right-Censored')
ax.legend()

# Panel 3: Coefficient comparison
ax = axes[2]
plot_vars = feature_cols
x_pos = np.arange(len(plot_vars))
width = 0.25

left_coefs = [tobit_full.beta[i+1] for i in range(len(plot_vars))]
right_coefs = [tobit_right.beta[i+1] for i in range(len(plot_vars))]
ols_coefs = [ols_full.params[i+1] for i in range(len(plot_vars))]

ax.bar(x_pos - width, left_coefs, width, label='Left Tobit', color='steelblue', edgecolor='white')
ax.bar(x_pos, right_coefs, width, label='Right Tobit', color='darkorange', edgecolor='white')
ax.bar(x_pos + width, ols_coefs, width, label='OLS', color='lightcoral', edgecolor='white')
ax.set_xticks(x_pos)
ax.set_xticklabels(plot_vars, rotation=45, ha='right')
ax.set_ylabel('Coefficient')
ax.set_title('Coefficient Comparison Across Models')
ax.legend()
ax.axhline(y=0, color='gray', linestyle='-', linewidth=0.5)

plt.tight_layout()
plt.savefig(FIGURES_DIR / '01_ex3_right_censoring.png', dpi=150, bbox_inches='tight')
plt.show()

print('Figure saved.')

### Interpretation

**Right-censoring** occurs when the dependent variable is capped at an upper limit.
Here, we cap hours at 40 (simulating a full-time constraint). Key observations:

1. **Right-censored Tobit vs. Left-censored Tobit**: The left-censored model addresses
   the pile-up at zero, while the right-censored model addresses the pile-up at 40.
   Because the artificial variable has pile-ups at both boundaries, each one-sided
   model corrects for only one of them.

2. **Right-censored Tobit vs. OLS**: OLS on right-censored data attenuates
   coefficients toward zero because the capped observations reduce the apparent
   relationship between regressors and hours. The right-censored Tobit corrects
   for this by modeling the censoring mechanism.

3. **Practical relevance**: Right-censoring is common in survey data where
   top-coding is applied (e.g., income capped at a maximum value, test scores
   capped at 100%).
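
The attenuation described in point 2 can be reproduced in a few lines. The simulation below is a sketch on hypothetical data (a single regressor, true slope 2, cap at 2.0), not on the labor supply dataset, and it uses plain least squares from NumPy.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
y_star = 1.0 + 2.0 * x + rng.normal(size=n)   # hypothetical latent outcome
cap = 2.0
y_capped = np.minimum(y_star, cap)            # top-coded (right-censored) outcome

X_sim = np.column_stack([np.ones(n), x])
coef_latent, *_ = np.linalg.lstsq(X_sim, y_star, rcond=None)
coef_capped, *_ = np.linalg.lstsq(X_sim, y_capped, rcond=None)

share_capped = (y_star > cap).mean()
print(f'Share capped:          {share_capped:.1%}')
print(f'OLS slope, latent y*:  {coef_latent[1]:.3f}')
print(f'OLS slope, capped y:   {coef_capped[1]:.3f}')
```

The slope estimated from the capped outcome should be visibly smaller than the slope from the latent outcome, and the gap widens as the share of capped observations grows.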

---

# Exercise 4: Manual McDonald-Moffitt Decomposition (Hard)

**Task:** Manually compute all three types of marginal effects using the formulas
from Section 6 of the main notebook, and verify that they match the PanelBox output.

For each observation $i$, compute:
1. $z_i = \mathbf{X}_i'\hat{\boldsymbol{\beta}} / \hat{\sigma}$
2. $\Phi(z_i)$ and $\phi(z_i)$
3. The inverse Mills ratio $\lambda(z_i) = \phi(z_i)/\Phi(z_i)$
4. For each variable $k$:
   - **Unconditional ME:** $\hat{\beta}_k \cdot \Phi(z_i)$
   - **Conditional ME:** $\hat{\beta}_k \cdot [1 - \lambda(z_i)(z_i + \lambda(z_i))]$
   - **Probability ME:** $(\hat{\beta}_k / \hat{\sigma}) \cdot \phi(z_i)$
5. Average across all observations to get AME
6. Compare with `tobit_full.marginal_effects(at='overall', which=...)`

### Solution

In [None]:
print('=' * 80)
print('EXERCISE 4: MANUAL McDONALD-MOFFITT DECOMPOSITION')
print('=' * 80)

# Extract estimated parameters
beta = tobit_full.beta
sigma = tobit_full.sigma

print(f'\nEstimated parameters:')
print(f'  sigma = {sigma:.6f}')
for i, name in enumerate(var_names):
    print(f'  beta[{name}] = {beta[i]:.6f}')

# Step 1: Compute z_i = X_i'beta / sigma for all observations
linear_pred = X @ beta         # X'beta for each observation
z = linear_pred / sigma        # Standardized index

print(f'\nLinear prediction X\'beta:')
print(f'  Mean: {linear_pred.mean():.4f}')
print(f'  Range: [{linear_pred.min():.4f}, {linear_pred.max():.4f}]')

# Step 2: Compute Phi(z) and phi(z)
Phi_z = stats.norm.cdf(z)      # Standard normal CDF
phi_z = stats.norm.pdf(z)      # Standard normal PDF

print(f'\nPhi(z) -- P(y > 0 | X):')
print(f'  Mean: {Phi_z.mean():.4f}')
print(f'  Range: [{Phi_z.min():.4f}, {Phi_z.max():.4f}]')

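# Step 3: Compute inverse Mills ratio lambda(z) = phi(z) / Phi(z)
# Where Phi(z) underflows, fall back to the asymptotic approximation
# lambda(z) ~ -z (valid as z -> -inf). np.where still evaluates
# phi_z / Phi_z for every element; it only selects which result to keep.
lambda_z = np.where(Phi_z > 1e-10, phi_z / Phi_z, -z)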

print(f'\nInverse Mills ratio lambda(z):')
print(f'  Mean: {lambda_z.mean():.4f}')
print(f'  Range: [{lambda_z.min():.4f}, {lambda_z.max():.4f}]')

In [None]:
# Step 4: Compute the three types of marginal effects for each variable

manual_ame = {'unconditional': {}, 'conditional': {}, 'probability': {}}

for k, name in enumerate(var_names):
    if name == 'const':
        continue  # Skip intercept
    
    beta_k = beta[k]
    
    # Unconditional ME: beta_k * Phi(z_i)
    me_uncond_i = beta_k * Phi_z
    manual_ame['unconditional'][name] = np.mean(me_uncond_i)
    
    # Conditional ME: beta_k * [1 - lambda(z_i) * (z_i + lambda(z_i))]
    scaling_factor = 1.0 - lambda_z * (z + lambda_z)
    me_cond_i = beta_k * scaling_factor
    manual_ame['conditional'][name] = np.mean(me_cond_i)
    
    # Probability ME: (beta_k / sigma) * phi(z_i)
    me_prob_i = (beta_k / sigma) * phi_z
    manual_ame['probability'][name] = np.mean(me_prob_i)

print('Manual AME computation complete.')
print(f'\nComputed marginal effects for {len(manual_ame["unconditional"])} variables.')

In [None]:
# Step 5: Get PanelBox marginal effects for comparison

pb_me_uncond = tobit_full.marginal_effects(at='overall', which='unconditional')
pb_me_cond = tobit_full.marginal_effects(at='overall', which='conditional')
pb_me_prob = tobit_full.marginal_effects(at='overall', which='probability')

print('PanelBox AME computation complete.')

In [None]:
# Step 6: Compare manual vs. PanelBox results

print('\n' + '=' * 100)
print('COMPARISON: MANUAL vs. PANELBOX -- UNCONDITIONAL MARGINAL EFFECTS')
print('  dE[y|X]/dx_k = beta_k * Phi(z), averaged over all observations')
print('=' * 100)

print(f'\n{"Variable":<22} {"Manual":>14} {"PanelBox":>14} {"Difference":>14}')
print('-' * 66)

for name in feature_cols:
    manual_val = manual_ame['unconditional'][name]
    pb_val = pb_me_uncond.marginal_effects.get(name, np.nan)
    diff = manual_val - pb_val
    print(f'{name:<22} {manual_val:>14.6f} {pb_val:>14.6f} {diff:>14.8f}')


print('\n' + '=' * 100)
print('COMPARISON: MANUAL vs. PANELBOX -- CONDITIONAL MARGINAL EFFECTS')
print('  dE[y|y>0,X]/dx_k = beta_k * [1 - lambda(z)*(z + lambda(z))], averaged')
print('=' * 100)

print(f'\n{"Variable":<22} {"Manual":>14} {"PanelBox":>14} {"Difference":>14}')
print('-' * 66)

for name in feature_cols:
    manual_val = manual_ame['conditional'][name]
    pb_val = pb_me_cond.marginal_effects.get(name, np.nan)
    diff = manual_val - pb_val
    print(f'{name:<22} {manual_val:>14.6f} {pb_val:>14.6f} {diff:>14.8f}')


print('\n' + '=' * 100)
print('COMPARISON: MANUAL vs. PANELBOX -- PROBABILITY MARGINAL EFFECTS')
print('  dP(y>0|X)/dx_k = (beta_k / sigma) * phi(z), averaged')
print('=' * 100)

print(f'\n{"Variable":<22} {"Manual":>14} {"PanelBox":>14} {"Difference":>14}')
print('-' * 66)

for name in feature_cols:
    manual_val = manual_ame['probability'][name]
    pb_val = pb_me_prob.marginal_effects.get(name, np.nan)
    diff = manual_val - pb_val
    print(f'{name:<22} {manual_val:>14.6f} {pb_val:>14.6f} {diff:>14.8f}')

In [None]:
# Step 7: Verify the McDonald-Moffitt decomposition
# The unconditional ME decomposes as:
#   dE[y|X]/dx_k = P(y>0|X) * dE[y|y>0,X]/dx_k + E[y|y>0,X] * dP(y>0|X)/dx_k
#
# To verify this at the observation level, we compute both sides and average.

print('\n' + '=' * 100)
print('VERIFICATION: McDONALD-MOFFITT DECOMPOSITION')
print('  dE[y|X]/dx_k = P(y>0) * dE[y|y>0,X]/dx_k + E[y|y>0,X] * dP(y>0)/dx_k')
print('=' * 100)

# Compute E[y|y>0,X] = X'beta + sigma * lambda(z) for each observation
E_y_conditional = linear_pred + sigma * lambda_z

# The scaling factor for the conditional ME
scaling_cond = 1.0 - lambda_z * (z + lambda_z)

print(f'\n{"Variable":<22} {"Uncond. ME":>14} {"Decomposed":>14} {"Difference":>14}')
print('-' * 66)

for k, name in enumerate(var_names):
    if name == 'const':
        continue
    
    beta_k = beta[k]
    
    # Left-hand side: unconditional ME (averaged)
    lhs_i = beta_k * Phi_z
    lhs = np.mean(lhs_i)
    
    # Right-hand side: decomposition (averaged)
    # Term 1: P(y>0|X) * dE[y|y>0,X]/dx_k
    term1_i = Phi_z * (beta_k * scaling_cond)
    
    # Term 2: E[y|y>0,X] * dP(y>0|X)/dx_k
    term2_i = E_y_conditional * ((beta_k / sigma) * phi_z)
    
    rhs = np.mean(term1_i + term2_i)
    diff = lhs - rhs
    
    print(f'{name:<22} {lhs:>14.6f} {rhs:>14.6f} {diff:>14.8f}')

print(f'\nAll differences should be near zero (within numerical precision).')
print(f'This confirms the McDonald-Moffitt decomposition identity.')

In [None]:
# Step 8: Create a comprehensive summary table

print('\n' + '=' * 100)
print('COMPREHENSIVE SUMMARY: ALL MARGINAL EFFECTS')
print('=' * 100)

print(f'\n{"Variable":<22} {"Tobit beta":>12} {"Uncond. ME":>12} {"Cond. ME":>12} '
      f'{"Prob. ME":>12} {"Ratio U/beta":>12}')
print('-' * 84)

for name in feature_cols:
    idx = var_names.index(name)
    b = beta[idx]
    me_u = manual_ame['unconditional'][name]
    me_c = manual_ame['conditional'][name]
    me_p = manual_ame['probability'][name]
    ratio = me_u / b if abs(b) > 1e-10 else np.nan
    
    print(f'{name:<22} {b:>12.4f} {me_u:>12.4f} {me_c:>12.4f} {me_p:>12.4f} {ratio:>12.4f}')

print(f'\nThe Ratio U/beta column shows how much the unconditional ME is scaled')
print(f'relative to the raw Tobit coefficient. This ratio equals the average')
print(f'Phi(z_i), i.e., the average probability of being uncensored.')
print(f'Average P(y>0|X) = {Phi_z.mean():.4f}')
print(f'\nKey takeaway: The Tobit beta OVERSTATES the average unconditional effect')
print(f'by a factor of 1/{Phi_z.mean():.2f} = {1/Phi_z.mean():.2f}.')

In [None]:
# Step 9: Visualize the three types of marginal effects

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Panel 1: Comparison of all three ME types
ax = axes[0]
x_pos = np.arange(len(feature_cols))
width = 0.25

me_u_vals = [manual_ame['unconditional'][name] for name in feature_cols]
me_c_vals = [manual_ame['conditional'][name] for name in feature_cols]
me_p_vals = [manual_ame['probability'][name] for name in feature_cols]

ax.bar(x_pos - width, me_u_vals, width, label='Unconditional (dE[y|X]/dx)',
       color='steelblue', edgecolor='white')
ax.bar(x_pos, me_c_vals, width, label='Conditional (dE[y|y>0,X]/dx)',
       color='darkorange', edgecolor='white')
ax.bar(x_pos + width, me_p_vals, width, label='Probability (dP(y>0|X)/dx)',
       color='mediumseagreen', edgecolor='white')

ax.set_xlabel('Variable')
ax.set_ylabel('Average Marginal Effect')
ax.set_title('Manual McDonald-Moffitt Decomposition\nThree Types of Marginal Effects')
ax.set_xticks(x_pos)
ax.set_xticklabels(feature_cols, rotation=45, ha='right')
ax.legend(loc='lower left', fontsize=9)
ax.axhline(y=0, color='gray', linestyle='-', linewidth=0.5)

# Panel 2: Distribution of observation-level unconditional ME for 'children'
ax = axes[1]
beta_children = beta[var_names.index('children')]
me_children_i = beta_children * Phi_z

ax.hist(me_children_i, bins=40, color='steelblue', edgecolor='white', alpha=0.8)
ax.axvline(x=np.mean(me_children_i), color='red', linestyle='--', linewidth=2,
           label=f'AME = {np.mean(me_children_i):.4f}')
ax.set_xlabel('Unconditional ME of Children')
ax.set_ylabel('Frequency')
ax.set_title('Distribution of Observation-Level\nUnconditional ME for Children')
ax.legend()

plt.tight_layout()
plt.savefig(FIGURES_DIR / '01_ex4_manual_decomposition.png', dpi=150, bbox_inches='tight')
plt.show()

print('Figure saved.')

### Interpretation

The manual computation confirms several important properties of the Tobit model:

1. **Manual vs. PanelBox agreement**: The manually computed AMEs should match the PanelBox
   output up to floating-point error (the Difference columns should be near zero),
   confirming both the formulas and the implementation.

2. **Unconditional ME < Tobit beta**: The unconditional marginal effect is always
   smaller in absolute value than the Tobit coefficient. The scaling factor is
   $\Phi(z_i)$, the probability of being uncensored. This reflects the fact that
   a change in $x_k$ only affects observed hours through the "uncensored" channel.

3. **Conditional ME < Tobit beta**: The conditional marginal effect is also attenuated
   relative to the Tobit beta, but by a different factor. The scaling factor
   $[1 - \lambda(z)(z + \lambda(z))]$ lies strictly between 0 and 1; at $z = 0$,
   for example, it equals $1 - 2/\pi \approx 0.36$, compared with $\Phi(0) = 0.5$.

4. **Probability ME**: The probability marginal effect shows how a one-unit change
   in $x_k$ shifts the probability of participating in the labor force. These effects
   are on a different scale (probability units rather than hours).

5. **McDonald-Moffitt decomposition**: The identity
   $\frac{\partial E[y|X]}{\partial x_k} = P(y>0|X) \cdot \frac{\partial E[y|y>0,X]}{\partial x_k} + E[y|y>0,X] \cdot \frac{\partial P(y>0|X)}{\partial x_k}$
   holds exactly. This decomposition shows that a change in $x_k$ affects observed
   hours through two channels:
   - The **intensive margin**: changing hours among those who already work
   - The **extensive margin**: changing the probability of working at all

6. **Heterogeneity**: The observation-level marginal effects vary across individuals.
   The histogram for the children variable shows that the negative effect of children
   on hours is larger (more negative) for individuals with higher baseline probability
   of working.
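
Points 3 and 5 can be checked numerically on a grid of $z$ values, independently of any fitted model. The sketch below uses a hypothetical coefficient and sigma, and computes the inverse Mills ratio in logs so it stays finite when $\Phi(z)$ is tiny.

```python
import numpy as np
from scipy import stats

z = np.linspace(-6.0, 6.0, 241)
Phi = stats.norm.cdf(z)
phi = stats.norm.pdf(z)
# Inverse Mills ratio via logs, which stays finite when Phi(z) is tiny
lam = np.exp(stats.norm.logpdf(z) - stats.norm.logcdf(z))

cond_factor = 1.0 - lam * (z + lam)   # scaling of the conditional ME

# Hypothetical coefficient and sigma, used only to check the identity
beta_k, sigma = -3.5, 12.0
xb = sigma * z
lhs = beta_k * Phi                                  # unconditional ME
rhs = (Phi * (beta_k * cond_factor)                 # intensive-margin term
       + (xb + sigma * lam) * (beta_k / sigma) * phi)  # extensive-margin term

print('factor range:', cond_factor.min(), cond_factor.max())
print('max |lhs - rhs|:', np.abs(lhs - rhs).max())
```

The factor should stay inside the open interval (0, 1) across the grid, and the two sides of the decomposition should agree up to floating-point error.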

---

## Summary

### What We Covered

| Exercise | Topic | Key Result |
|----------|-------|------------|
| 1 | Variable Selection | LR test determines if restricted model is adequate |
| 2 | Prediction Profiles | E[y\|X] = P(y>0\|X) * E[y\|y>0,X] verified |
| 3 | Right-Censoring | Right-censored Tobit recovers latent effects at the upper boundary |
| 4 | Manual Decomposition | Manual AMEs match PanelBox; decomposition identity holds exactly |

### Key Formulas Implemented

| Quantity | Formula |
|----------|--------|
| Unconditional ME | $\beta_k \cdot \Phi(z)$ |
| Conditional ME | $\beta_k \cdot [1 - \lambda(z)(z + \lambda(z))]$ |
| Probability ME | $(\beta_k / \sigma) \cdot \phi(z)$ |
| Inverse Mills Ratio | $\lambda(z) = \phi(z) / \Phi(z)$ |
| LR Test | $-2(\ln L_R - \ln L_U) \sim \chi^2(q)$ |

where $z = \mathbf{X}'\boldsymbol{\beta}/\sigma$.

---

**End of Solutions for Notebook 01: Introduction to Tobit Models**