# Complete Discrete Choice Case Study: Work Mode Choice

**Tutorial Series**: Discrete Choice Econometrics with PanelBox

**Notebook**: 09 - Complete Case Study

**Author**: PanelBox Contributors

**Date**: 2026-02-17

**Duration**: ~120 minutes

**Difficulty**: Advanced

---

## Learning Objectives

By the end of this notebook, you will be able to:

1. Conduct a complete empirical analysis from data exploration to policy recommendations
2. Systematically estimate and compare multiple discrete choice models
3. Select the best model based on diagnostic tests and information criteria
4. Compute and compare marginal effects across different model specifications
5. Generate counterfactual predictions for policy analysis
6. Produce publication-quality results tables and visualizations

---

## Table of Contents

1. [Research Context and Questions](#section-1)
2. [Exploratory Analysis](#section-2)
3. [Modeling Strategy](#section-3)
4. [Models 1 & 2: Pooled Logit and Probit (Baseline)](#section-4)
5. [Model 3: Fixed Effects Logit](#section-5)
6. [Models 4 & 5: RE Probit + CRE](#section-6)
7. [Model 6: Multinomial Logit](#section-7)
8. [Model 7: Dynamic Binary](#section-8)
9. [Model Comparison](#section-9)
10. [Marginal Effects Comparison](#section-10)
11. [Counterfactual Predictions](#section-11)
12. [Validation and Robustness](#section-12)
13. [Conclusions and Policy Recommendations](#section-13)

---

## Prerequisites

- **Required**: All previous notebooks (01-08)
- **Conceptual**: Full discrete choice toolkit (binary, multinomial, ordered, dynamic)
- **Technical**: Model comparison, specification testing, report generation

In [None]:
# Setup
import warnings
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns
from scipy.stats import norm, chi2
from scipy.special import expit
import statsmodels.api as sm

# PanelBox models
from panelbox.models.discrete.binary import (
    PooledLogit, PooledProbit, FixedEffectsLogit, RandomEffectsProbit
)
from panelbox.models.discrete.multinomial import MultinomialLogit
from panelbox.models.discrete.dynamic import DynamicBinaryPanel

warnings.filterwarnings('ignore')
np.random.seed(42)
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 4)

# Visualization settings
plt.style.use('seaborn-v0_8-darkgrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11
plt.rcParams['axes.labelsize'] = 12
plt.rcParams['axes.titlesize'] = 14
plt.rcParams['xtick.labelsize'] = 10
plt.rcParams['ytick.labelsize'] = 10
plt.rcParams['legend.fontsize'] = 10

# Paths
DATA_DIR = Path("..") / "data"
OUTPUT_DIR = Path("..") / "outputs"
FIG_DIR = OUTPUT_DIR / "figures"
TABLE_DIR = OUTPUT_DIR / "tables"
REPORT_DIR = OUTPUT_DIR / "reports"

FIG_DIR.mkdir(parents=True, exist_ok=True)
TABLE_DIR.mkdir(parents=True, exist_ok=True)
REPORT_DIR.mkdir(parents=True, exist_ok=True)

# Color scheme for work modes
MODE_COLORS = {0: '#e74c3c', 1: '#f39c12', 2: '#2ecc71'}
MODE_LABELS = {0: 'On-site', 1: 'Hybrid', 2: 'Remote'}

print("Setup complete.")

<a id='section-1'></a>
## Section 1: Research Context and Questions (15 min)

### The Post-Pandemic Work Mode Revolution

The COVID-19 pandemic fundamentally altered how people work. Before 2020, remote work was rare
outside the technology sector. The forced experiment of lockdowns revealed that many workers
could be productive remotely, leading to lasting changes in work arrangements.

### Research Questions

1. **What factors determine remote vs on-site work choice?**
2. **Is there unobserved heterogeneity** (individual preference for remote work)?
3. **Does experience with remote work in $t-1$ increase adoption in $t$?** (state dependence)
4. **How do marginal effects differ across model specifications?**
5. **What would happen if commute times increased by 20 minutes?** (counterfactual)

### Dataset

We analyze a panel of 2,000 workers observed from 2019 to 2023, choosing among three work modes:
- **On-site** (mode = 0): Traditional office work
- **Hybrid** (mode = 1): Mix of office and remote days
- **Remote** (mode = 2): Fully remote work

In [None]:
# Load data
data = pd.read_csv(DATA_DIR / "work_mode_panel.csv")

print(f"Workers:      {data['worker_id'].nunique()}")
print(f"Years:        {sorted(data['year'].unique())}")
print(f"Observations: {len(data)}")
print(f"\nVariables:    {list(data.columns)}")
print(f"\nMode distribution:")
for mode_val, label in MODE_LABELS.items():
    count = (data['mode'] == mode_val).sum()
    pct = count / len(data) * 100
    print(f"  {label:10s}: {count:5d} ({pct:.1f}%)")

In [None]:
# Summary statistics
print("=" * 70)
print("Summary Statistics")
print("=" * 70)
print(data.describe().round(2))

<a id='section-2'></a>
## Section 2: Exploratory Analysis (20 min)

Before modeling, we need to understand the data patterns. Key questions:
- How did mode shares evolve over time (pandemic effect)?
- How persistent are individual choices?
- What characteristics differ across modes?

In [None]:
# 2.1 Mode distribution over time (stacked area chart)
mode_by_year = pd.crosstab(data['year'], data['mode'], normalize='index')

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Stacked area chart
ax = axes[0]
years = mode_by_year.index
ax.stackplot(years,
             mode_by_year[0], mode_by_year[1], mode_by_year[2],
             labels=['On-site', 'Hybrid', 'Remote'],
             colors=['#e74c3c', '#f39c12', '#2ecc71'],
             alpha=0.8)
ax.set_xlabel('Year')
ax.set_ylabel('Share')
ax.set_title('Work Mode Distribution Over Time', fontweight='bold')
ax.set_ylim(0, 1)
# Shade the pandemic period before building the legend so its label is listed
ax.axvspan(2020, 2021, alpha=0.1, color='gray', label='Pandemic')
ax.legend(loc='center right')

# Bar chart version
ax = axes[1]
x = np.arange(len(years))
width = 0.25
for mode_val, label in MODE_LABELS.items():
    ax.bar(x + mode_val * width, mode_by_year[mode_val], width,
           label=label, color=MODE_COLORS[mode_val], alpha=0.8)
ax.set_xlabel('Year')
ax.set_ylabel('Share')
ax.set_title('Mode Shares by Year', fontweight='bold')
ax.set_xticks(x + width)
ax.set_xticklabels(years)
ax.legend()

plt.tight_layout()
plt.savefig(FIG_DIR / '09_exploratory_mode_distribution.png', dpi=150, bbox_inches='tight')
plt.show()
print("Figure saved to outputs/figures/09_exploratory_mode_distribution.png")

In [None]:
# 2.2 Transition matrix
data_sorted = data.sort_values(['worker_id', 'year'])
data_sorted['mode_lag'] = data_sorted.groupby('worker_id')['mode'].shift(1)

trans_data = data_sorted.dropna(subset=['mode_lag'])
trans_matrix = pd.crosstab(
    trans_data['mode_lag'].map(MODE_LABELS),
    trans_data['mode'].map(MODE_LABELS),
    normalize='index'
)

print("Transition Matrix (row = mode_{t-1}, col = mode_t):")
print(trans_matrix.round(3))

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(trans_matrix, annot=True, fmt='.3f', cmap='YlOrRd',
            linewidths=2, ax=ax, vmin=0, vmax=0.7)
ax.set_title('Work Mode Transition Matrix', fontweight='bold')
ax.set_xlabel('Mode at $t$')
ax.set_ylabel('Mode at $t-1$')
plt.tight_layout()
plt.savefig(FIG_DIR / '09_exploratory_transition_matrix.png', dpi=150, bbox_inches='tight')
plt.show()
print("Figure saved to outputs/figures/09_exploratory_transition_matrix.png")

In [None]:
# 2.3 Within vs between variation
modes_per_worker = data.groupby('worker_id')['mode'].nunique()
n_switchers = (modes_per_worker > 1).sum()
n_total = len(modes_per_worker)

print(f"Workers who switch modes: {n_switchers} / {n_total} ({n_switchers/n_total:.1%})")
print(f"Workers always same mode: {n_total - n_switchers} / {n_total}")
print(f"\nDistribution of distinct modes per worker:")
print(modes_per_worker.value_counts().sort_index())

In [None]:
# 2.4 Characteristics by mode
chars_by_mode = data.groupby('mode')[[
    'prod_remote', 'commute', 'kids', 'age', 'educ', 'income', 'tech_job'
]].mean()
chars_by_mode.index = [MODE_LABELS[i] for i in chars_by_mode.index]

print("Mean Characteristics by Work Mode:")
print(chars_by_mode.round(2))

In [None]:
# 2.5 Box plots for key variables by mode
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

vars_to_plot = ['prod_remote', 'commute', 'age', 'income']
titles = ['Remote Productivity Score', 'Commute Time (min)',
          'Age (years)', 'Monthly Income']

for ax, var, title in zip(axes.flat, vars_to_plot, titles):
    mode_data = [data[data['mode'] == m][var] for m in [0, 1, 2]]
    bp = ax.boxplot(mode_data, labels=['On-site', 'Hybrid', 'Remote'],
                    patch_artist=True)
    for patch, color in zip(bp['boxes'], [MODE_COLORS[0], MODE_COLORS[1], MODE_COLORS[2]]):
        patch.set_facecolor(color)
        patch.set_alpha(0.7)
    ax.set_title(title, fontweight='bold')
    ax.grid(True, alpha=0.3)

plt.suptitle('Worker Characteristics by Work Mode', fontsize=16,
             fontweight='bold', y=1.01)
plt.tight_layout()
plt.savefig(FIG_DIR / '09_exploratory_characteristics.png', dpi=150, bbox_inches='tight')
plt.show()
print("Figure saved to outputs/figures/09_exploratory_characteristics.png")

### Key Exploratory Findings

**Discuss the following based on the output above:**

1. How did the pandemic affect mode shares?
2. Which mode shows the strongest persistence (diagonal of transition matrix)?
3. What characteristics differ most across modes?
4. How many workers switch modes? Is this sufficient for FE estimation?

<a id='section-3'></a>
## Section 3: Modeling Strategy (15 min)

We will estimate 7 models of increasing complexity:

| # | Model | Type | Purpose |
|---|-------|------|------|
| 1 | Pooled Logit | Binary | Baseline (remote vs not) |
| 2 | Pooled Probit | Binary | Robustness check |
| 3 | FE Logit | Binary | Control unobserved heterogeneity |
| 4 | RE Probit | Binary | Keep all observations |
| 5 | CRE Probit | Binary | Test exogeneity |
| 6 | Multinomial Logit | Multi-category | 3 modes simultaneously |
| 7 | Dynamic Binary | Binary | State dependence |

### Decision Tree

```
Start: Pooled Logit (baseline)
  |
  +-- Are coefficients biased by unobserved heterogeneity?
  |     +-- YES: FE Logit (but lose time-invariant variables)
  |     +-- MAYBE: RE Probit (keep everything, test with CRE)
  |
  +-- Are all three modes meaningfully different?
  |     +-- YES: Multinomial Logit (don't force binary aggregation)
  |
  +-- Does past behavior predict current choice?
        +-- YES: Dynamic model (state dependence)
```

**For the binary models (1-5, 7), we define**: `remote = 1 if mode == 2, else 0`

In [None]:
# Create binary outcome: remote (1) vs not-remote (0)
data['remote'] = (data['mode'] == 2).astype(int)

print(f"Binary outcome: remote")
print(f"  P(remote=1): {data['remote'].mean():.3f}")
print(f"  P(remote=0): {1 - data['remote'].mean():.3f}")

<a id='section-4'></a>
## Section 4: Model 1 — Pooled Logit (15 min)

Pooled logit is the simplest binary model: it pools all observations and ignores the
panel structure, except that standard errors are clustered by worker.

$$P(\text{remote}_{it} = 1 \mid X_{it}) = \Lambda(X_{it}'\beta)$$

In [None]:
# Model 1: Pooled Logit
model_pooled = PooledLogit(
    "remote ~ prod_remote + commute + kids + age + educ + tech_job",
    data, "worker_id", "year"
)
results_pooled = model_pooled.fit(cov_type='cluster')

print("=" * 70)
print(" " * 15 + "MODEL 1: POOLED LOGIT")
print("=" * 70)
print(results_pooled.summary())

In [None]:
# Model 2: Pooled Probit (robustness)
model_probit = PooledProbit(
    "remote ~ prod_remote + commute + kids + age + educ + tech_job",
    data, "worker_id", "year"
)
results_probit = model_probit.fit(cov_type='cluster')

print("=" * 70)
print(" " * 15 + "MODEL 2: POOLED PROBIT")
print("=" * 70)
print(results_probit.summary())

In [None]:
# Quick comparison: Logit vs Probit coefficients
# Probit coefficients should be roughly Logit / 1.6
print("\nLogit vs Probit coefficient comparison:")
print(f"{'Variable':>15s} {'Logit':>10s} {'Probit':>10s} {'Ratio':>10s}")
print("-" * 50)
for var in results_pooled.params.index:
    b_logit = results_pooled.params[var]
    b_probit = results_probit.params[var]
    ratio = b_logit / b_probit if abs(b_probit) > 0.001 else np.nan
    print(f"{var:>15s} {b_logit:>10.4f} {b_probit:>10.4f} {ratio:>10.2f}")
print(f"\nExpected ratio (logit/probit): ~1.6")
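
The 1.6 rule of thumb used above can be traced to the two link densities at zero: the standard logistic density peaks at 0.25 and the standard normal at about 0.399. The sketch below reproduces that arithmetic with SciPy only; it is a back-of-the-envelope check, not part of the PanelBox workflow, and the variable names are illustrative.

```python
import numpy as np
from scipy.stats import logistic, norm

# Densities of the two latent-error distributions at index value 0
pdf_logit_0 = logistic.pdf(0.0)
pdf_probit_0 = norm.pdf(0.0)

# Equal marginal effects at the centre of the distribution require
# beta_logit * pdf_logit_0 = beta_probit * pdf_probit_0
ratio_density = pdf_probit_0 / pdf_logit_0

# Matching the standard deviations of the latent errors instead
# (logistic SD = pi / sqrt(3), normal SD = 1) gives a larger factor
ratio_sd = np.pi / np.sqrt(3)

print(f"Density-matching ratio: {ratio_density:.3f}")
print(f"SD-matching ratio:      {ratio_sd:.3f}")
```

Empirical ratios typically fall between these two values, which is why a ratio somewhat above 1.6 in the table is not a sign of misspecification on its own.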

<a id='section-5'></a>
## Section 5: Model 3 — Fixed Effects Logit (15 min)

The FE Logit controls for time-invariant unobserved worker preferences using conditional MLE
(Chamberlain, 1980). Only workers who **switch** between remote and non-remote
contribute to the conditional likelihood. Time-invariant variables (baseline age, educ,
tech_job) are absorbed by the worker effect $\alpha_i$ and cannot be estimated.

$$P(\text{remote}_{it} = 1 \mid X_{it}, \alpha_i) = \Lambda(X_{it}'\beta + \alpha_i)$$
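
To see why only switchers matter and why $\alpha_i$ drops out, consider two periods. Conditional on exactly one remote year, the probability that it is the second year equals $\Lambda((X_{i2} - X_{i1})'\beta)$, whatever the value of $\alpha_i$. The sketch below checks this numerically; the coefficient and covariate values are made up for illustration and are not estimates from this dataset.

```python
import numpy as np
from scipy.special import expit

def conditional_prob_switch(x1, x2, beta, alpha):
    """P(y1=0, y2=1 | y1 + y2 = 1) in a two-period logit with effect alpha."""
    p1 = expit(x1 @ beta + alpha)
    p2 = expit(x2 @ beta + alpha)
    p_01 = (1 - p1) * p2      # on-site/hybrid first, remote second
    p_10 = p1 * (1 - p2)      # remote first, on-site/hybrid second
    return p_01 / (p_01 + p_10)

beta = np.array([0.5, -0.02])   # illustrative values only
x1 = np.array([5.0, 30.0])      # period-1 covariates
x2 = np.array([7.0, 45.0])      # period-2 covariates

# Same conditional probability for very different worker effects
probs = [conditional_prob_switch(x1, x2, beta, a) for a in (-3.0, 0.0, 3.0)]
closed_form = expit((x2 - x1) @ beta)

print(np.round(probs, 6), round(float(closed_form), 6))
```

A worker with `y1 == y2` has a conditional probability of one for the observed sequence, so that worker adds nothing to the conditional likelihood.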

In [None]:
# Model 3: Fixed Effects Logit
# Only include time-varying variables (age, educ, tech_job are absorbed)
model_fe = FixedEffectsLogit(
    "remote ~ prod_remote + commute + kids",
    data, "worker_id", "year"
)
results_fe = model_fe.fit()

print("=" * 70)
print(" " * 15 + "MODEL 3: FIXED EFFECTS LOGIT")
print("=" * 70)
print(results_fe.summary())
print(f"\nWorkers used (switchers): {results_fe.n_used_entities} / {results_fe.n_entities}")
print(f"Utilization rate: {results_fe.n_used_entities / results_fe.n_entities:.1%}")

### FE vs Pooled Comparison

Compare the coefficients of `prod_remote`, `commute`, and `kids` between
the Pooled Logit and the FE Logit. What changed? Why?

In [None]:
# Compare FE vs Pooled for common variables
common_vars = ['prod_remote', 'commute', 'kids']
print(f"{'Variable':>15s} {'Pooled':>10s} {'FE':>10s} {'Change':>10s}")
print("-" * 50)
for var in common_vars:
    b_pooled = results_pooled.params[var]
    b_fe = results_fe.params[var]
    change = ((b_fe - b_pooled) / abs(b_pooled)) * 100
    print(f"{var:>15s} {b_pooled:>10.4f} {b_fe:>10.4f} {change:>+9.1f}%")

<a id='section-6'></a>
## Section 6: Models 4 & 5 — RE Probit + CRE (20 min)

### Random Effects Probit (Model 4)

Assumes unobserved heterogeneity $\alpha_i \sim N(0, \sigma_\alpha^2)$
is **uncorrelated** with regressors. Retains all observations and time-invariant variables.

### Correlated Random Effects (Model 5)

Adds Mundlak (1978) terms $\bar{X}_i$ to allow $\alpha_i$ to correlate with $X$.
If the time-mean coefficients are jointly significant, RE is inconsistent.

In [None]:
# Model 4: Random Effects Probit
model_re = RandomEffectsProbit(
    "remote ~ prod_remote + commute + kids + age + educ + tech_job",
    data, "worker_id", "year"
)
results_re = model_re.fit()

print("=" * 70)
print(" " * 15 + "MODEL 4: RANDOM EFFECTS PROBIT")
print("=" * 70)
print(results_re.summary())

In [None]:
# Model 5: Correlated Random Effects (CRE) Probit
# Add Mundlak terms: time means of time-varying variables
mundlak_vars = ['prod_remote', 'commute', 'kids']
for var in mundlak_vars:
    data[f'{var}_mean'] = data.groupby('worker_id')[var].transform('mean')

model_cre = RandomEffectsProbit(
    "remote ~ prod_remote + commute + kids + age + educ + tech_job + "
    "prod_remote_mean + commute_mean + kids_mean",
    data, "worker_id", "year"
)
results_cre = model_cre.fit()

print("=" * 70)
print(" " * 15 + "MODEL 5: CORRELATED RANDOM EFFECTS PROBIT")
print("=" * 70)
print(results_cre.summary())

In [None]:
# Individual z-tests on the Mundlak terms
# H0 (per term): coefficient on the time mean = 0
# A significant term signals correlation between alpha_i and X,
# in which case RE is inconsistent -> need CRE or FE.
# The joint test is computed with the specification tests in Section 9.

# Your analysis here: examine the CRE coefficients on the mean terms
print("\nMundlak terms (test for endogeneity):")
for var in ['prod_remote_mean', 'commute_mean', 'kids_mean']:
    coef = results_cre.params[var]
    se = results_cre.std_errors[var]
    z = coef / se
    p = 2 * norm.sf(abs(z))  # two-sided p-value; sf avoids rounding to 0 for large |z|
    sig = '***' if p < 0.01 else '**' if p < 0.05 else '*' if p < 0.1 else ''
    print(f"  {var:>20s}: {coef:>8.4f} (SE={se:.4f}, z={z:.2f}, p={p:.4f}) {sig}")

<a id='section-7'></a>
## Section 7: Model 6 — Multinomial Logit (25 min)

Instead of collapsing to binary (remote vs not), we model all three modes simultaneously.
This avoids information loss and allows us to see what distinguishes hybrid from remote.

$$P(\text{mode}_{it} = j \mid X_{it}) = \frac{\exp(X_{it}'\beta_j)}{\sum_{k=0}^{2} \exp(X_{it}'\beta_k)}$$

with on-site ($j=0$) as the reference category.
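
The formula can be evaluated directly once the reference category's coefficients are fixed at zero. The following is a minimal NumPy sketch of that calculation; `mnl_probs` is a hypothetical helper written for illustration, not the PanelBox implementation, and the toy coefficients are made up.

```python
import numpy as np

def mnl_probs(X, B):
    """Choice probabilities with alternative 0 as the reference.

    X: (n, k) regressors; B: (J-1, k) coefficients for alternatives 1..J-1.
    """
    # Reference alternative has utility 0 for every observation
    utilities = np.column_stack([np.zeros(X.shape[0]), X @ B.T])
    # Subtract the row maximum for numerical stability (probabilities unchanged)
    utilities -= utilities.max(axis=1, keepdims=True)
    expu = np.exp(utilities)
    return expu / expu.sum(axis=1, keepdims=True)

# Columns: constant, prod_remote, commute (toy values)
X_toy = np.array([[1.0, 6.0, 20.0],
                  [1.0, 3.0, 60.0]])
# Rows: hybrid, remote (illustrative coefficients, not estimates)
B_toy = np.array([[-1.0, 0.2, 0.00],
                  [-2.0, 0.3, 0.01]])

P = mnl_probs(X_toy, B_toy)
print(P.round(3))
```

Each row sums to one, and the log-odds of remote versus on-site, `np.log(P[:, 2] / P[:, 0])`, equal `X_toy @ B_toy[1]`, which is how the coefficients relative to the base category are read.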

In [None]:
# Model 6: Multinomial Logit
exog_vars_mnl = ['prod_remote', 'commute', 'kids', 'age', 'educ', 'tech_job']
X_mnl = data[exog_vars_mnl].values

model_mnl = MultinomialLogit(
    endog=data['mode'].values,
    exog=X_mnl,
    base_alternative=0,  # on-site = reference
    method='pooled'
)
model_mnl.exog_names = exog_vars_mnl

np.random.seed(42)
results_mnl = model_mnl.fit()

print("=" * 70)
print(" " * 15 + "MODEL 6: MULTINOMIAL LOGIT")
print("=" * 70)
print(f"Log-likelihood: {results_mnl.llf:.2f}")
print(f"AIC: {results_mnl.aic:.2f}")
print(f"BIC: {results_mnl.bic:.2f}")
print(f"Pseudo R²: {results_mnl.pseudo_r2:.4f}")
print(f"Accuracy: {results_mnl.accuracy:.3f}")
print(f"Converged: {results_mnl.converged}")

In [None]:
# Display MNL coefficients by alternative
print("\nMNL Coefficients (relative to On-site):")
print(f"{'Variable':>15s} {'Hybrid':>12s} {'Remote':>12s}")
print("-" * 42)
for k, var in enumerate(exog_vars_mnl):
    b_hybrid = results_mnl.params_matrix[0, k]  # Alternative 1 (hybrid)
    b_remote = results_mnl.params_matrix[1, k]  # Alternative 2 (remote)
    print(f"{var:>15s} {b_hybrid:>12.4f} {b_remote:>12.4f}")

In [None]:
# MNL Marginal Effects (AME)
me_mnl = results_mnl.marginal_effects(at='overall')

print("\nAverage Marginal Effects (MNL):")
print(f"{'Variable':>15s} {'dP(On-site)':>12s} {'dP(Hybrid)':>12s} {'dP(Remote)':>12s}")
print("-" * 55)

if isinstance(me_mnl, dict):
    for k, var in enumerate(exog_vars_mnl):
        vals = [me_mnl.get(j, np.zeros(len(exog_vars_mnl)))[k] for j in range(3)]
        print(f"{var:>15s} {vals[0]:>12.4f} {vals[1]:>12.4f} {vals[2]:>12.4f}")
else:
    # If it returns an array
    me_avg = me_mnl.mean(axis=0) if me_mnl.ndim == 3 else me_mnl
    for k, var in enumerate(exog_vars_mnl):
        if me_avg.ndim == 2:
            vals = me_avg[:, k]
        else:
            vals = me_avg[k] if me_avg.ndim == 1 else [0, 0, 0]
        print(f"{var:>15s}", end="")
        for v in (vals if hasattr(vals, '__iter__') else [vals]):
            print(f" {v:>12.4f}", end="")
        print()

<a id='section-8'></a>
## Section 8: Model 7 — Dynamic Binary (25 min)

Does experience with remote work in $t-1$ causally increase remote adoption in $t$?
We estimate a dynamic binary model with Wooldridge (2005) initial conditions.

$$P(\text{remote}_{it} = 1 \mid X_{it}, y_{i,t-1}, \alpha_i) = \Phi(X_{it}'\beta + \gamma \cdot y_{i,t-1} + \alpha_i)$$

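If $\gamma > 0$ after controlling for $\alpha_i$, there is **true state dependence**: past remote work raises the probability of future remote work beyond what persistent individual preferences explain.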
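
The Wooldridge (2005) approach models $\alpha_i$ as a function of the initial outcome $y_{i0}$ and the worker's covariate history, which in practice means adding a few constructed columns. The sketch below builds them with pandas on a toy panel. The column names `remote_lag`, `remote_init`, and `commute_mean` are illustrative, and the sketch assumes time averages are used to summarize the covariate history; `DynamicBinaryPanel` is stated to handle lags internally and may construct these terms differently.

```python
import pandas as pd

toy = pd.DataFrame({
    "worker_id": [1, 1, 1, 2, 2, 2],
    "year":      [2019, 2020, 2021, 2019, 2020, 2021],
    "remote":    [0, 1, 1, 0, 0, 1],
    "commute":   [30, 30, 45, 20, 25, 25],
}).sort_values(["worker_id", "year"])

# Lagged outcome y_{i,t-1}: shift within worker, never across workers
toy["remote_lag"] = toy.groupby("worker_id")["remote"].shift(1)
# Initial condition y_{i0}: first observed outcome of each worker
toy["remote_init"] = toy.groupby("worker_id")["remote"].transform("first")
# Time average of a time-varying covariate
toy["commute_mean"] = toy.groupby("worker_id")["commute"].transform("mean")

# The first year of each worker has no lag and drops out of estimation
dyn_sample = toy.dropna(subset=["remote_lag"]).copy()
print(dyn_sample)
```

With five years per worker, this construction would leave four usable periods each, since the first year only supplies the initial condition.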

In [None]:
# Prepare data for dynamic model
data_sorted = data.sort_values(['worker_id', 'year']).copy()

# The DynamicBinaryPanel model handles lags internally
exog_vars_dyn = ['prod_remote', 'commute', 'kids']

model_dyn = DynamicBinaryPanel(
    endog=data_sorted['remote'].values,
    exog=data_sorted[exog_vars_dyn].values,
    entity=data_sorted['worker_id'].values,
    time=data_sorted['year'].values,
    initial_conditions='wooldridge',
    effects='random'
)

results_dyn = model_dyn.fit()

print("=" * 70)
print(" " * 15 + "MODEL 7: DYNAMIC BINARY PANEL")
print("=" * 70)
print(results_dyn.summary())

In [None]:
# Interpretation of state dependence
gamma = results_dyn.gamma
print(f"State dependence (gamma): {gamma:.4f}")
print(f"\nInterpretation:")
if gamma > 0:
    print("  Having worked remotely in t-1 raises the latent index for")
    print(f"  remote work in t by {gamma:.4f} (in units of the probit error SD).")
    print("  This is consistent with TRUE state dependence: forced remote")
    print("  work during the pandemic may have created lasting shifts.")
    print("  Statistical significance still requires the standard error of gamma.")
else:
    print("  The point estimate of gamma is not positive:")
    print("  no evidence of positive state dependence.")

if hasattr(results_dyn, 'sigma_u'):
    print(f"\nRandom effects std (sigma_u): {results_dyn.sigma_u:.4f}")

if hasattr(results_dyn, 'delta_y0'):
    print(f"Initial conditions (delta_y0): {results_dyn.delta_y0:.4f}")

<a id='section-9'></a>
## Section 9: Model Comparison (20 min)

Now we systematically compare all models to select the preferred specification.

In [None]:
# Build master comparison table
comparison = {}

# Pooled Logit
comparison['Pooled Logit'] = {
    'prod_remote': results_pooled.params.get('prod_remote', np.nan),
    'commute': results_pooled.params.get('commute', np.nan),
    'kids': results_pooled.params.get('kids', np.nan),
    'age': results_pooled.params.get('age', np.nan),
    'educ': results_pooled.params.get('educ', np.nan),
    'tech_job': results_pooled.params.get('tech_job', np.nan),
    'gamma (lag)': np.nan,
    'Log-L': results_pooled.llf,
}

# Pooled Probit
comparison['Pooled Probit'] = {
    'prod_remote': results_probit.params.get('prod_remote', np.nan),
    'commute': results_probit.params.get('commute', np.nan),
    'kids': results_probit.params.get('kids', np.nan),
    'age': results_probit.params.get('age', np.nan),
    'educ': results_probit.params.get('educ', np.nan),
    'tech_job': results_probit.params.get('tech_job', np.nan),
    'gamma (lag)': np.nan,
    'Log-L': results_probit.llf,
}

# FE Logit
comparison['FE Logit'] = {
    'prod_remote': results_fe.params.get('prod_remote', np.nan),
    'commute': results_fe.params.get('commute', np.nan),
    'kids': results_fe.params.get('kids', np.nan),
    'age': np.nan,  # absorbed
    'educ': np.nan,  # absorbed
    'tech_job': np.nan,  # absorbed
    'gamma (lag)': np.nan,
    'Log-L': results_fe.llf,
}

# RE Probit
comparison['RE Probit'] = {
    'prod_remote': results_re.params.get('prod_remote', np.nan),
    'commute': results_re.params.get('commute', np.nan),
    'kids': results_re.params.get('kids', np.nan),
    'age': results_re.params.get('age', np.nan),
    'educ': results_re.params.get('educ', np.nan),
    'tech_job': results_re.params.get('tech_job', np.nan),
    'gamma (lag)': np.nan,
    'Log-L': results_re.llf,
}

# CRE Probit
comparison['CRE Probit'] = {
    'prod_remote': results_cre.params.get('prod_remote', np.nan),
    'commute': results_cre.params.get('commute', np.nan),
    'kids': results_cre.params.get('kids', np.nan),
    'age': results_cre.params.get('age', np.nan),
    'educ': results_cre.params.get('educ', np.nan),
    'tech_job': results_cre.params.get('tech_job', np.nan),
    'gamma (lag)': np.nan,
    'Log-L': results_cre.llf,
}

# MNL (remote coefficients)
comparison['MNL (remote)'] = {
    'prod_remote': results_mnl.params_matrix[1, 0],
    'commute': results_mnl.params_matrix[1, 1],
    'kids': results_mnl.params_matrix[1, 2],
    'age': results_mnl.params_matrix[1, 3],
    'educ': results_mnl.params_matrix[1, 4],
    'tech_job': results_mnl.params_matrix[1, 5],
    'gamma (lag)': np.nan,
    'Log-L': results_mnl.llf,
}

# Dynamic
comparison['Dynamic'] = {
    'prod_remote': results_dyn.beta[0],
    'commute': results_dyn.beta[1],
    'kids': results_dyn.beta[2],
    'age': np.nan,
    'educ': np.nan,
    'tech_job': np.nan,
    'gamma (lag)': results_dyn.gamma,
    'Log-L': results_dyn.llf,
}

comparison_df = pd.DataFrame(comparison).round(4)

print("=" * 90)
print(" " * 25 + "MASTER MODEL COMPARISON")
print("=" * 90)
print(comparison_df.to_string())

# Save
comparison_df.to_csv(TABLE_DIR / '09_model_comparison.csv')
print(f"\nTable saved to outputs/tables/09_model_comparison.csv")

### Specification Tests

1. **FE vs RE**: Are the unobserved effects correlated with X?
2. **CRE Wald test**: Are Mundlak terms jointly significant?
3. **State dependence**: Is $\gamma = 0$?
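
A joint Wald test such as the one in item 2 generally needs the covariance among the tested coefficients, not only their standard errors: $W = \hat\delta' \hat V^{-1} \hat\delta \sim \chi^2_q$ under $H_0$. The sketch below implements that formula with NumPy. `joint_wald` is a hypothetical helper, the numbers are made up, and how to extract the relevant covariance block from a PanelBox results object is not shown here because that attribute is not confirmed in this notebook.

```python
import numpy as np
from scipy.stats import chi2

def joint_wald(coefs, cov):
    """Wald statistic and p-value for H0: all coefs = 0."""
    coefs = np.asarray(coefs, dtype=float)
    cov = np.asarray(cov, dtype=float)
    # Solve V x = b instead of inverting V explicitly
    stat = float(coefs @ np.linalg.solve(cov, coefs))
    p_value = float(chi2.sf(stat, df=coefs.size))
    return stat, p_value

# Illustrative Mundlak coefficients and covariance block (not estimates)
b = np.array([0.30, -0.01, 0.10])
V = np.array([[0.0100, 0.0002, 0.0010],
              [0.0002, 0.0001, 0.0000],
              [0.0010, 0.0000, 0.0400]])

stat, p = joint_wald(b, V)
# Sum of squared z-statistics: exact only when off-diagonal terms are zero
stat_diag = float(np.sum(b**2 / np.diag(V)))

print(f"Full-covariance Wald: {stat:.3f} (p = {p:.4f})")
print(f"Sum of squared z:     {stat_diag:.3f}")
```

The two statistics coincide only when the estimates are uncorrelated, so the gap between them indicates how much the shortcut can mislead.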

In [None]:
# Specification tests
print("=" * 60)
print("SPECIFICATION TESTS")
print("=" * 60)

# Test 1: CRE Wald test (Mundlak terms = 0?)
print("\n1. CRE Wald Test (H0: Mundlak terms = 0)")
mundlak_coefs = []
mundlak_ses = []
for var in ['prod_remote_mean', 'commute_mean', 'kids_mean']:
    mundlak_coefs.append(results_cre.params[var])
    mundlak_ses.append(results_cre.std_errors[var])

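# Approximate joint Wald statistic: sum of squared z-statistics.
# This equals the exact Wald statistic only if the three estimates are
# uncorrelated; the exact version uses their full covariance matrix.
chi2_sum = sum((c / s) ** 2 for c, s in zip(mundlak_coefs, mundlak_ses))
p_wald = chi2.sf(chi2_sum, df=3)
print(f"  Chi² statistic (approx.): {chi2_sum:.4f}")
print(f"  p-value (df=3): {p_wald:.6f}")
print(f"  Conclusion: {'Reject H0: RE inconsistent, use CRE/FE' if p_wald < 0.05 else 'Cannot reject H0: RE may be consistent'}")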

# Test 2: State dependence
print(f"\n2. State Dependence Test (H0: gamma = 0)")
print(f"  gamma = {results_dyn.gamma:.4f}")
print(f"  (Formal test requires standard errors from the dynamic model)")

<a id='section-10'></a>
## Section 10: Marginal Effects Comparison (20 min)

Coefficients from different models are not directly comparable because logit, probit,
and random-effects models scale the latent index differently. Average Marginal Effects (AME)
provide a common metric: the sample-average change in predicted probability for a unit change in X.
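
For a continuous regressor in a logit model, the AME has the closed form $\text{AME}_k = \beta_k \cdot \frac{1}{N}\sum_i \Lambda(X_i'\beta)\,(1 - \Lambda(X_i'\beta))$. The sketch below checks that formula against a finite-difference calculation on simulated data. The data and coefficients are synthetic and serve only to illustrate the definition.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(0)
n = 1000
# Design matrix: constant plus two continuous regressors (synthetic)
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([-0.5, 0.8, -0.3])   # illustrative, not estimated

# Analytic AME of regressor 1: mean logistic density times its coefficient
p = expit(X @ beta)
ame_analytic = np.mean(p * (1 - p)) * beta[1]

# Numerical AME: nudge regressor 1 for everyone and average the change
h = 1e-5
X_up = X.copy()
X_up[:, 1] += h
ame_numeric = np.mean((expit(X_up @ beta) - p) / h)

print(f"Analytic AME:  {ame_analytic:.6f}")
print(f"Numerical AME: {ame_numeric:.6f}")
```

For binary or count regressors such as `tech_job` and `kids`, the derivative-based AME is an approximation; a discrete change (predicted probability at 1 minus at 0) is the more literal analogue.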

In [None]:
# Compute AME for binary models
# AME = average of individual marginal effects across all observations
# For logit:  AME_k = mean( Lambda(Xb) * (1 - Lambda(Xb)) ) * beta_k
# For probit: AME_k = mean( phi(Xb) ) * beta_k

from patsy import dmatrix

ame_vars = ['prod_remote', 'commute', 'kids', 'age', 'educ', 'tech_job']
ame_results = {}

# Build design matrix
X_design = dmatrix("prod_remote + commute + kids + age + educ + tech_job",
                   data, return_type='dataframe')

# Pooled Logit AME
Xb_logit = X_design.values @ results_pooled.params.values
p_logit = expit(Xb_logit)
scale_logit = np.mean(p_logit * (1 - p_logit))
ame_logit = {var: scale_logit * results_pooled.params[var] for var in ame_vars}
ame_results['Pooled Logit'] = ame_logit

# Pooled Probit AME
Xb_probit = X_design.values @ results_probit.params.values
scale_probit = np.mean(norm.pdf(Xb_probit))
ame_probit = {var: scale_probit * results_probit.params[var] for var in ame_vars}
ame_results['Pooled Probit'] = ame_probit

# RE Probit AME (approximate: ignore integration over alpha_i)
re_vars = [v for v in ame_vars if v in results_re.params.index]
re_params_for_X = results_re.params[['Intercept'] + re_vars]
X_re = dmatrix("prod_remote + commute + kids + age + educ + tech_job",
               data, return_type='dataframe')
Xb_re = X_re.values @ re_params_for_X.values
scale_re = np.mean(norm.pdf(Xb_re))
ame_re = {var: scale_re * results_re.params[var] for var in re_vars}
ame_results['RE Probit'] = ame_re

# CRE Probit AME
cre_vars_all = [v for v in results_cre.params.index if v != 'Intercept' and v != 'log_sigma_alpha']
X_cre = dmatrix("prod_remote + commute + kids + age + educ + tech_job + "
                "prod_remote_mean + commute_mean + kids_mean",
                data, return_type='dataframe')
cre_params_for_X = results_cre.params[['Intercept'] + cre_vars_all]
Xb_cre = X_cre.values @ cre_params_for_X.values
scale_cre = np.mean(norm.pdf(Xb_cre))
ame_cre = {var: scale_cre * results_cre.params[var] for var in ame_vars if var in results_cre.params.index}
ame_results['CRE Probit'] = ame_cre

ame_df = pd.DataFrame(ame_results).round(4)
print("\nAverage Marginal Effects Comparison (Binary Models):")
print(ame_df.to_string())

In [None]:
# AME of prod_remote across all models (including MNL and Dynamic)
key_var = 'prod_remote'
print(f"\nAME of {key_var} across ALL models:")
print(f"{'Model':>20s} {'Coefficient':>12s} {'AME':>12s}")
print("-" * 48)

# Binary models from AME computation above
for model_name in ['Pooled Logit', 'Pooled Probit', 'RE Probit', 'CRE Probit']:
    if model_name in ame_results and key_var in ame_results[model_name]:
        coef_map = {
            'Pooled Logit': results_pooled.params[key_var],
            'Pooled Probit': results_probit.params[key_var],
            'RE Probit': results_re.params[key_var],
            'CRE Probit': results_cre.params[key_var],
        }
        ame_val = ame_results[model_name][key_var]
        print(f"{model_name:>20s} {coef_map[model_name]:>12.4f} {ame_val:>12.4f}")

# FE Logit (no intercept, so AME approximation uses within-variation)
b_fe = results_fe.params[key_var]
print(f"{'FE Logit':>20s} {b_fe:>12.4f}      (cond.)")

# MNL (remote category AME)
me_remote = me_mnl  # computed in the MNL marginal effects cell (Section 7)
if isinstance(me_remote, dict) and 2 in me_remote:
    ame_mnl_val = me_remote[2][0]
    print(f"{'MNL (P(remote))':>20s} {results_mnl.params_matrix[1, 0]:>12.4f} {ame_mnl_val:>12.4f}")
elif isinstance(me_remote, np.ndarray):
    if me_remote.ndim == 3:
        ame_mnl_val = me_remote[:, 2, 0].mean()
    elif me_remote.ndim == 2:
        ame_mnl_val = me_remote[2, 0] if me_remote.shape[0] == 3 else me_remote[0, 2]
    else:
        ame_mnl_val = np.nan
    print(f"{'MNL (P(remote))':>20s} {results_mnl.params_matrix[1, 0]:>12.4f} {ame_mnl_val:>12.4f}")

# Dynamic
b_dyn = results_dyn.beta[0]
print(f"{'Dynamic':>20s} {b_dyn:>12.4f}      (latent)")

In [None]:
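# Forest-style plot of the prod_remote estimates with 95% intervals.
# Note: these are raw coefficients, not AMEs, so logit and probit bars
# are on different scales; swap in AMEs and their SEs when available.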
fig, ax = plt.subplots(figsize=(10, 6))

# Example forest plot structure
models_list = ['Pooled Logit', 'Pooled Probit', 'FE Logit', 'RE Probit', 'CRE Probit']
# Coefficients as proxy (for visualization purposes)
coefs = [
    results_pooled.params.get('prod_remote', 0),
    results_probit.params.get('prod_remote', 0),
    results_fe.params.get('prod_remote', 0),
    results_re.params.get('prod_remote', 0),
    results_cre.params.get('prod_remote', 0),
]
ses = [
    results_pooled.std_errors.get('prod_remote', 0),
    results_probit.std_errors.get('prod_remote', 0),
    results_fe.std_errors.get('prod_remote', 0),
    results_re.std_errors.get('prod_remote', 0),
    results_cre.std_errors.get('prod_remote', 0),
]

y_pos = np.arange(len(models_list))
ax.barh(y_pos, coefs, xerr=[1.96 * s for s in ses], height=0.4,
        color=['#3498db', '#2ecc71', '#e74c3c', '#f39c12', '#9b59b6'],
        alpha=0.7, capsize=5)
ax.set_yticks(y_pos)
ax.set_yticklabels(models_list)
ax.set_xlabel('Coefficient on prod_remote')
ax.set_title('Coefficient Comparison: prod_remote Across Models', fontweight='bold')
ax.axvline(x=0, color='black', linestyle='--', alpha=0.5)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(FIG_DIR / '09_ame_comparison_forest.png', dpi=150, bbox_inches='tight')
plt.show()
print("Figure saved to outputs/figures/09_ame_comparison_forest.png")

<a id='section-11'></a>
## Section 11: Counterfactual Predictions (20 min)

Using estimated models, we answer "what if" questions relevant to policy.

### Scenarios:
1. **Commute shock**: What if commute increases by 20 minutes for everyone?
2. **Technology improvement**: What if remote productivity increases by 2 points?
3. **Representative profiles**: Predict mode probabilities for different worker types

In [None]:
# Scenario 1: Commute shock (+20 min) using MNL
X_baseline = data[exog_vars_mnl].values.copy()
X_shock = X_baseline.copy()
commute_idx = exog_vars_mnl.index('commute')
X_shock[:, commute_idx] += 20  # Add 20 minutes

# Predict probabilities
probs_baseline = results_mnl.predict_proba(X_baseline)
probs_shock = results_mnl.predict_proba(X_shock)

print("=" * 60)
print("COUNTERFACTUAL: Commute +20 minutes")
print("=" * 60)
print(f"\n{'Mode':>10s} {'Baseline':>10s} {'Shock':>10s} {'Change':>10s}")
print("-" * 45)
for j, label in MODE_LABELS.items():
    base = probs_baseline[:, j].mean()
    shock = probs_shock[:, j].mean()
    change = shock - base
    print(f"{label:>10s} {base:>10.3f} {shock:>10.3f} {change:>+10.3f}")

In [None]:
# Scenario 2: Technology improvement (prod_remote +2)
X_tech = X_baseline.copy()
prod_idx = exog_vars_mnl.index('prod_remote')
X_tech[:, prod_idx] += 2

probs_tech = results_mnl.predict_proba(X_tech)

print("=" * 60)
print("COUNTERFACTUAL: Remote Productivity +2")
print("=" * 60)
print(f"\n{'Mode':>10s} {'Baseline':>10s} {'Improved':>10s} {'Change':>10s}")
print("-" * 45)
for j, label in MODE_LABELS.items():
    base = probs_baseline[:, j].mean()
    tech = probs_tech[:, j].mean()
    change = tech - base
    print(f"{label:>10s} {base:>10.3f} {tech:>10.3f} {change:>+10.3f}")

In [None]:
# Counterfactual visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

scenarios = [
    ('Commute +20 min', probs_baseline, probs_shock),
    ('Prod. Remote +2', probs_baseline, probs_tech),
]

for ax, (title, p_base, p_new) in zip(axes, scenarios):
    x = np.arange(3)
    width = 0.35
    base_means = [p_base[:, j].mean() for j in range(3)]
    new_means = [p_new[:, j].mean() for j in range(3)]

    bars1 = ax.bar(x - width/2, base_means, width, label='Baseline',
                   color='#3498db', alpha=0.8)
    bars2 = ax.bar(x + width/2, new_means, width, label='Counterfactual',
                   color='#e74c3c', alpha=0.8)

    ax.set_xticks(x)
    ax.set_xticklabels(['On-site', 'Hybrid', 'Remote'])
    ax.set_ylabel('Mean Predicted Probability')
    ax.set_title(title, fontweight='bold')
    ax.legend()
    ax.set_ylim(0, 0.6)
    ax.grid(True, alpha=0.3)

    # Annotate changes
    for j, (b, n) in enumerate(zip(base_means, new_means)):
        change = n - b
        ax.annotate(f'{change:+.3f}', xy=(j + width/2, n),
                    ha='center', va='bottom', fontsize=9, color='red')

plt.suptitle('Counterfactual Scenarios: Mode Share Changes',
             fontsize=15, fontweight='bold', y=1.02)
plt.tight_layout()
plt.savefig(FIG_DIR / '09_counterfactual_scenarios.png', dpi=150, bbox_inches='tight')
plt.show()
print("Figure saved to outputs/figures/09_counterfactual_scenarios.png")

In [None]:
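# Scenario 3: Representative profiles
# Each list must follow the column order of exog_vars_mnl
# (assumed here: prod_remote, commute, kids, age, educ, tech_job)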
profiles = {
    'Young tech, short commute': [8.0, 15, 0, 28, 16, 1],
    'Mid-career, kids, long commute': [5.0, 60, 2, 40, 14, 0],
    'Older worker, no kids': [4.0, 30, 0, 55, 12, 0],
    'Avg tech worker': [7.5, 25, 1, 35, 16, 1],
}

print("=" * 70)
print("PREDICTED MODE PROBABILITIES FOR REPRESENTATIVE PROFILES")
print("=" * 70)
print(f"\n{'Profile':>35s} {'On-site':>10s} {'Hybrid':>10s} {'Remote':>10s}")
print("-" * 70)

profile_probs = {}
for name, vals in profiles.items():
    X_profile = np.array(vals).reshape(1, -1)
    probs = results_mnl.predict_proba(X_profile)[0]
    profile_probs[name] = probs
    print(f"{name:>35s} {probs[0]:>10.3f} {probs[1]:>10.3f} {probs[2]:>10.3f}")

In [None]:
# Stacked bar chart for representative profiles
fig, ax = plt.subplots(figsize=(12, 6))

profile_names = list(profiles.keys())
x = np.arange(len(profile_names))

bottoms = np.zeros(len(profile_names))
for j, (mode_val, label) in enumerate(MODE_LABELS.items()):
    heights = [profile_probs[name][mode_val] for name in profile_names]
    ax.bar(x, heights, bottom=bottoms, label=label,
           color=MODE_COLORS[mode_val], alpha=0.8)
    # Annotate
    for i, h in enumerate(heights):
        if h > 0.05:
            ax.text(i, bottoms[i] + h/2, f'{h:.0%}',
                    ha='center', va='center', fontweight='bold', fontsize=10)
    bottoms += heights

ax.set_xticks(x)
ax.set_xticklabels(profile_names, rotation=15, ha='right')
ax.set_ylabel('Predicted Probability')
ax.set_title('Predicted Mode Probabilities for Representative Workers',
             fontweight='bold')
ax.legend(loc='upper right')
ax.set_ylim(0, 1.05)

plt.tight_layout()
plt.savefig(FIG_DIR / '09_counterfactual_profiles.png', dpi=150, bbox_inches='tight')
plt.show()
print("Figure saved to outputs/figures/09_counterfactual_profiles.png")

<a id='section-12'></a>
## Section 12: Validation and Robustness (15 min)

We assess reliability through:
1. Temporal cross-validation (train 2019-2021, test 2022-2023)
2. Sensitivity to binary cutoff
3. Subsample analysis by sector
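
Out-of-sample accuracy is easier to interpret next to a naive benchmark. The sketch below is a self-contained illustration on simulated data, assuming a hypothetical one-regressor logit; `fit_logit` is a small Newton-Raphson helper written for this example, not a PanelBox routine. It splits by year, fits on the early years, and compares test accuracy with always predicting the training-period majority class.

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Logit MLE by Newton-Raphson (illustrative helper only)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(42)
years = np.repeat(np.arange(2019, 2024), 400)   # 5 years x 400 simulated workers
x = rng.normal(size=years.size)
X = np.column_stack([np.ones_like(x), x])
y = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 2.0 * x))))

train = years <= 2021
test = ~train

beta_hat = fit_logit(X[train], y[train])
pred = (1 / (1 + np.exp(-X @ beta_hat)) >= 0.5).astype(int)

acc_in = (pred[train] == y[train]).mean()
acc_out = (pred[test] == y[test]).mean()
# Naive benchmark: always predict the majority class of the training years
majority = int(y[train].mean() >= 0.5)
acc_naive = (y[test] == majority).mean()

print(f"In-sample: {acc_in:.3f}  Out-of-sample: {acc_out:.3f}  Naive: {acc_naive:.3f}")
```

A model that barely beats the naive benchmark out of sample adds little predictive value, even if its in-sample fit looks good.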

In [None]:
# 12.1 Temporal cross-validation
train = data[data['year'] <= 2021].copy()
test = data[data['year'] >= 2022].copy()

print(f"Train: {len(train)} obs ({train['year'].unique()})")
print(f"Test:  {len(test)} obs ({test['year'].unique()})")

# Train Pooled Logit on 2019-2021
model_train = PooledLogit(
    "remote ~ prod_remote + commute + kids + age + educ + tech_job",
    train, "worker_id", "year"
)
results_train = model_train.fit(cov_type='cluster')

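# Predict on the test set by rebuilding the design matrix from the same formula
# (assumes params are ordered as Intercept, then regressors in formula order)
from patsy import dmatrix
rhs = "prod_remote + commute + kids + age + educ + tech_job"
X_test = dmatrix(rhs, test, return_type='dataframe')
pred_probs = expit(X_test.values @ results_train.params.values)
pred_class = (pred_probs >= 0.5).astype(int)
accuracy = (pred_class == test['remote'].values).mean()

# In-sample accuracy on the training years, for comparison
X_train = dmatrix(rhs, train, return_type='dataframe')
train_class = (expit(X_train.values @ results_train.params.values) >= 0.5).astype(int)
accuracy_in = (train_class == train['remote'].values).mean()

print(f"\nOut-of-sample accuracy (2022-2023): {accuracy:.3f}")
print(f"In-sample accuracy (2019-2021):     {accuracy_in:.3f}")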

In [None]:
# 12.2 Sensitivity: alternative binary cutoff
# Instead of remote vs not, try (hybrid+remote) vs on-site
data['flex_work'] = (data['mode'] >= 1).astype(int)

model_flex = PooledLogit(
    "flex_work ~ prod_remote + commute + kids + age + educ + tech_job",
    data, "worker_id", "year"
)
results_flex = model_flex.fit(cov_type='cluster')

print("\nSensitivity: Flexible Work (hybrid+remote) vs On-site")
print(f"P(flex_work=1): {data['flex_work'].mean():.3f}")
print(f"\n{'Variable':>15s} {'Remote vs Not':>14s} {'Flex vs OnSite':>14s}")
print("-" * 45)
for var in ['prod_remote', 'commute', 'kids', 'age', 'educ', 'tech_job']:
    b1 = results_pooled.params[var]
    b2 = results_flex.params[var]
    print(f"{var:>15s} {b1:>14.4f} {b2:>14.4f}")

In [None]:
# 12.3 Subsample by sector (tech vs non-tech)
for tech_val, label in [(1, 'Tech Workers'), (0, 'Non-Tech Workers')]:
    sub = data[data['tech_job'] == tech_val].copy()
    model_sub = PooledLogit(
        "remote ~ prod_remote + commute + kids + age + educ",
        sub, "worker_id", "year"
    )
    res_sub = model_sub.fit(cov_type='cluster')
    print(f"\n{label} (N={len(sub)}):")
    print(f"  P(remote): {sub['remote'].mean():.3f}")
    for var in ['prod_remote', 'commute', 'kids']:
        print(f"  {var}: {res_sub.params[var]:.4f} (SE={res_sub.std_errors[var]:.4f})")

<a id='section-13'></a>
## Section 13: Conclusions and Policy Recommendations (15 min)

### Main Findings

Based on the analysis above, summarize:

1. **What determines remote work adoption?**
2. **Is unobserved heterogeneity important?**
3. **Is there true state dependence?**
4. **Which model is preferred?**
5. **What do counterfactuals suggest for policy?**

In [None]:
# Generate final HTML report
report_html = f"""<!DOCTYPE html>
<html>
<head><title>Work Mode Choice Analysis - Complete Case Study</title>
<style>
body {{font-family: Arial, sans-serif; margin: 40px; max-width: 900px;}}
table {{border-collapse: collapse; margin: 20px 0; width: 100%;}}
th, td {{border: 1px solid #ddd; padding: 8px; text-align: right;}}
th {{background-color: #3498db; color: white;}}
h1 {{color: #2c3e50;}} h2 {{color: #3498db;}} h3 {{color: #27ae60;}}
.highlight {{background-color: #f1f8e9; padding: 15px; border-radius: 5px; margin: 15px 0;}}
</style></head>
<body>

<h1>Work Mode Choice Analysis: Complete Case Study</h1>
<p><em>Generated by PanelBox - Discrete Choice Econometrics Toolkit</em></p>

<h2>Dataset</h2>
<ul>
    <li>Workers: {data['worker_id'].nunique()}</li>
    <li>Years: {sorted(data['year'].unique())}</li>
    <li>Observations: {len(data)}</li>
    <li>Modes: On-site ({(data['mode']==0).mean():.1%}), 
              Hybrid ({(data['mode']==1).mean():.1%}), 
              Remote ({(data['mode']==2).mean():.1%})</li>
</ul>

<h2>Models Estimated</h2>
<p>Seven models of increasing complexity were estimated: Pooled Logit, 
Pooled Probit, FE Logit, RE Probit, CRE Probit, Multinomial Logit, 
and Dynamic Binary Panel.</p>

<h2>Key Findings</h2>
<div class="highlight">
<ul>
    <li><strong>Remote productivity</strong> is the strongest determinant of remote work adoption</li>
    <li><strong>Commute time</strong> positively affects remote work probability</li>
    <li><strong>State dependence</strong> (gamma = {results_dyn.gamma:.3f}): past remote experience 
        increases future adoption</li>
    <li><strong>Unobserved heterogeneity</strong> is significant (CRE test)</li>
    <li><strong>Technology workers</strong> have higher remote work probability</li>
</ul>
</div>

<h2>Policy Implications</h2>
<ul>
    <li><strong>For firms</strong>: Invest in remote tools and allow experimentation periods</li>
    <li><strong>For government</strong>: Broadband infrastructure and urban planning adjustments</li>
    <li><strong>For workers</strong>: Remote skills development increases long-term flexibility</li>
</ul>

<h2>Counterfactual Results</h2>
<p>In the MNL counterfactual, a 20-minute commute increase changes the mean predicted 
remote share by {100 * (probs_shock[:, 2].mean() - probs_baseline[:, 2].mean()):+.1f} percentage points.</p>

</body></html>"""

with open(REPORT_DIR / '09_complete_analysis.html', 'w', encoding='utf-8') as f:
    f.write(report_html)

print("Report saved to outputs/reports/09_complete_analysis.html")

---

## Exercises

### Exercise 1: Replicate with Different Binary Cutoff (Medium)

Apply the full pipeline (Pooled Logit, FE Logit, RE Probit) using
`flex_work = (mode >= 1)` instead of `remote = (mode == 2)`. How do the
results change? Which specification gives better predictive accuracy?

In [None]:
# Exercise 1: Your solution here

# Step 1: Create flex_work variable
# data['flex_work'] = ...

# Step 2: Estimate Pooled Logit
# model_flex_logit = PooledLogit(...)

# Step 3: Estimate FE Logit
# model_flex_fe = FixedEffectsLogit(...)

# Step 4: Estimate RE Probit
# model_flex_re = RandomEffectsProbit(...)

# Step 5: Compare results

### Exercise 2: Ordered Alternative (Medium)

Treat work mode as ordered: on-site < hybrid < remote (ordered by flexibility).
Estimate Ordered Logit and compare with MNL. Which is more appropriate here?
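
As background for the comparison: an ordered logit uses a single linear index and a set of increasing cutpoints, whereas MNL estimates a separate coefficient vector per alternative. The sketch below shows how category probabilities follow from the index and cutpoints. The index values, the cutpoints, and the helper `ordered_logit_probs` are all hypothetical and independent of PanelBox's `OrderedLogit`.

```python
import numpy as np

def ordered_logit_probs(xb, cutpoints):
    """P(y = j), j = 0..J-1, from a linear index xb and increasing cutpoints."""
    xb = np.asarray(xb, dtype=float)
    cuts = np.asarray(cutpoints, dtype=float)
    # Cumulative probabilities P(y <= j) = Lambda(cut_j - xb)
    cum = 1 / (1 + np.exp(-(cuts[None, :] - xb[:, None])))
    # Pad with 0 on the left and 1 on the right, then difference
    cum = np.column_stack([np.zeros(len(xb)), cum, np.ones(len(xb))])
    return np.diff(cum, axis=1)

xb = np.array([-1.0, 0.0, 2.0])   # hypothetical index values for three workers
cuts = [-0.5, 1.0]                 # hypothetical cutpoints: on-site | hybrid | remote
probs = ordered_logit_probs(xb, cuts)
print(probs.round(3))
```

A higher index moves probability mass monotonically from on-site toward remote. That restriction is what the ordered model imposes and the MNL does not.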

In [None]:
# Exercise 2: Your solution here

# from panelbox.models.discrete.ordered import OrderedLogit

# Step 1: Estimate Ordered Logit
# model_ordered = OrderedLogit(
#     endog=data['mode'].values,
#     exog=data[exog_vars_mnl].values,
#     groups=data['worker_id'].values
# )
# results_ordered = model_ordered.fit()

# Step 2: Compare AIC/BIC with MNL

# Step 3: Discuss appropriateness

### Exercise 3: Heterogeneous Effects by Sector (Hard)

Estimate the Pooled Logit separately for tech vs non-tech workers.
Are the determinants of remote work different by sector?
Compute AME for each group and compare.

In [None]:
# Exercise 3: Your solution here

# Step 1: Split data by tech_job
# tech_data = data[data['tech_job'] == 1]
# nontech_data = data[data['tech_job'] == 0]

# Step 2: Estimate separate models
# model_tech = PooledLogit(...)
# model_nontech = PooledLogit(...)

# Step 3: Compare coefficients

# Step 4: Discuss differences

### Exercise 4: Temporal Cross-Validation (Medium)

Implement proper temporal cross-validation for the MNL model.
Train on 2019-2021, predict mode choice for 2022-2023.
Compute classification accuracy and confusion matrix.
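
For the final step, a confusion matrix can be built directly from a matrix of predicted probabilities. The sketch below uses six made-up test observations; the probabilities, the labels, and the helper `confusion_matrix` are illustrative assumptions, not output from the MNL model.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # count each (true, predicted) pair
    return cm

# Hypothetical predicted probabilities (each row sums to 1)
probs = np.array([
    [0.6, 0.3, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.5, 0.3, 0.2],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
])
y_true = np.array([0, 0, 1, 1, 2, 2])    # hypothetical observed modes

y_pred = probs.argmax(axis=1)            # classify by highest probability
cm = confusion_matrix(y_true, y_pred, n_classes=3)
accuracy = np.trace(cm) / cm.sum()

print(cm)
print(f"Accuracy: {accuracy:.3f}")
```

Off-diagonal cells show which modes the model confuses, which a single accuracy number hides.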

In [None]:
# Exercise 4: Your solution here

# Step 1: Split data
# train_mnl = data[data['year'] <= 2021]
# test_mnl = data[data['year'] >= 2022]

# Step 2: Estimate MNL on training data

# Step 3: Predict on test data

# Step 4: Compute accuracy and confusion matrix

### Exercise 5: Executive Report (Hard)

Using PanelBox, generate a comprehensive HTML report with:
- Top 3 model comparison table
- Marginal effects for key variables
- One counterfactual scenario with visualization
- Target audience: non-technical managers

In [None]:
# Exercise 5: Your solution here

# Step 1: Select top 3 models

# Step 2: Build comparison table (use simple language)

# Step 3: Create counterfactual visualization

# Step 4: Generate HTML report
# report = """..."""
# with open(REPORT_DIR / '09_executive_report.html', 'w') as f:
#     f.write(report)

---

## Summary & Key Takeaways

### What We Learned

| Concept | Lesson |
|---------|--------|
| **Start simple** | Pooled Logit baseline before adding complexity |
| **Test assumptions** | CRE (Mundlak) test for correlated individual effects, IIA test for MNL |
| **FE requires switchers** | Check utilization rate before trusting FE |
| **Compare via AME** | Coefficients across models are not directly comparable |
| **Dynamic matters** | State dependence means temporary shocks have lasting effects |
| **Counterfactuals** | Models enable "what if" analysis for policy |
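
The "Compare via AME" lesson can be illustrated numerically. The sketch below assumes a hypothetical logit slope and derives the probit slope with the common rule-of-thumb factor of about 1.6; neither value is estimated from the tutorial data. The coefficients differ by that factor, yet the implied average marginal effects are close.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=5_000)           # simulated regressor

beta_logit = 1.0                     # hypothetical logit slope
beta_probit = beta_logit / 1.6       # rule-of-thumb rescaling, not an estimate

# AME = mean over observations of (density at the index) * coefficient
p = 1 / (1 + np.exp(-beta_logit * x))
ame_logit = np.mean(p * (1 - p)) * beta_logit

phi = np.exp(-0.5 * (beta_probit * x) ** 2) / np.sqrt(2 * np.pi)
ame_probit = np.mean(phi) * beta_probit

print(f"Coefficients: logit={beta_logit:.3f}, probit={beta_probit:.3f}")
print(f"AMEs:         logit={ame_logit:.3f}, probit={ame_probit:.3f}")
```

The scale of a coefficient depends on the assumed error distribution, while the AME is expressed in probability units and is therefore comparable across model families.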

### Common Pitfalls

1. Starting with complex models without a simple baseline
2. Ignoring within-unit variation: FE Logit requires switchers
3. Not testing assumptions (IIA for MNL, Mundlak for CRE)
4. Comparing incomparable metrics across model families
5. Over-interpreting dynamic models without addressing initial conditions
6. Forgetting cluster-robust standard errors in panel data

---

## References

### Essential Reading
- Wooldridge, J. M. (2010). *Econometric Analysis of Cross Section and Panel Data*. Ch. 15-16.
- Cameron, A. C., & Trivedi, P. K. (2005). *Microeconometrics*. Ch. 14-15.
- Train, K. (2009). *Discrete Choice Methods with Simulation*. Cambridge University Press.

### Additional References
- Greene, W. H. (2018). *Econometric Analysis*. Ch. 17-18.
- Chamberlain, G. (1980). Analysis of Covariance with Qualitative Data. *Review of Economic Studies*, 47, 225-238.
- Mundlak, Y. (1978). On the Pooling of Time Series and Cross Section Data. *Econometrica*, 46, 69-85.
- Wooldridge, J. M. (2005). Simple Solutions to the Initial Conditions Problem. *Journal of Applied Econometrics*, 20, 39-54.

---

**End of Notebook 09**