# Dynamic Discrete Choice Models

**Tutorial Series**: Discrete Choice Econometrics with PanelBox

**Notebook**: 08 - Dynamic Discrete Choice

**Author**: PanelBox Contributors

**Date**: 2026-02-17

**Estimated Duration**: 90 minutes

**Difficulty Level**: Advanced

---

## Learning Objectives

By the end of this notebook, you will be able to:

1. Distinguish true state dependence from spurious persistence (heterogeneity)
2. Understand the initial conditions problem (Heckman 1981)
3. Implement the Wooldridge (2005) approach for dynamic binary panels
4. Prepare data for dynamic estimation (lags, initial values, time means)
5. Interpret the state dependence parameter $\gamma$ and its economic meaning
6. Decompose persistence into state dependence and heterogeneity components
7. Simulate counterfactual trajectories

---

## Table of Contents

1. [State Dependence in Binary Outcomes](#section1)
2. [True State Dependence vs Spurious Persistence](#section2)
3. [The Initial Conditions Problem](#section3)
4. [Wooldridge (2005) Approach](#section4)
5. [Interpreting Results](#section5)
6. [Decomposition of Persistence](#section6)
7. [Simulated Trajectories](#section7)
8. [Application — Labor Force Participation Dynamics](#section8)
9. [Exercises](#exercises)

---

## Prerequisites

- **Required**: Notebook 01 (Binary Choice Introduction), Notebook 03 (Random Effects)
- **Recommended**: Notebook 04 (Marginal Effects)
- **Conceptual**: Autoregressive processes, endogeneity, initial conditions
- **Technical**: Panel data manipulation (lags, group transformations)

## Setup

Import all required libraries and configure the environment.

In [None]:
# Standard library imports
import warnings
from pathlib import Path

# Data manipulation and numerical computing
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns

# Statistical functions
from scipy.stats import norm
from scipy.optimize import minimize
import statsmodels.api as sm

# PanelBox dynamic model
from panelbox.models.discrete.dynamic import DynamicBinaryPanel

# Configuration
warnings.filterwarnings('ignore')
np.random.seed(42)
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 4)

# Matplotlib configuration
plt.style.use('seaborn-v0_8-darkgrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11
plt.rcParams['axes.labelsize'] = 12
plt.rcParams['axes.titlesize'] = 14
plt.rcParams['xtick.labelsize'] = 10
plt.rcParams['ytick.labelsize'] = 10
plt.rcParams['legend.fontsize'] = 10

# Paths
DATA_DIR = Path("..") / "data"
OUTPUT_DIR = Path("..") / "outputs"
FIG_DIR = OUTPUT_DIR / "figures"
TABLE_DIR = OUTPUT_DIR / "tables"
REPORT_DIR = OUTPUT_DIR / "reports"

# Create output directories if needed
FIG_DIR.mkdir(parents=True, exist_ok=True)
TABLE_DIR.mkdir(parents=True, exist_ok=True)
REPORT_DIR.mkdir(parents=True, exist_ok=True)

print("All libraries imported successfully")
print(f"Working directory: {Path.cwd()}")

<a id='section1'></a>

---

# Section 1: State Dependence in Binary Outcomes (20 min)

## 1.1 The Core Question

Consider women's labor force participation over time. We observe strong **persistence**: women who worked last year are much more likely to work this year. But **why**?

Two explanations:

1. **True state dependence**: Working last year *causally* increases the probability of working this year (skills, networks, employer signals)
2. **Spurious persistence**: Some women have persistent unobserved traits (motivation, ability) that make them always more likely to work

## 1.2 Static vs Dynamic Models

**Static model** (Notebooks 01-03):
$$P(y_{it} = 1 \mid X_{it}, \alpha_i)$$

No dynamic feedback — past outcomes don't directly affect current choices.

**Dynamic model**:
$$y^*_{it} = X_{it}'\beta + \gamma \cdot y_{i,t-1} + \alpha_i + \varepsilon_{it}$$
$$y_{it} = \mathbf{1}[y^*_{it} > 0]$$

Here $\gamma$ captures **state dependence**: the direct effect of past experience on current choices.
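
To make the role of $\gamma$ concrete, the sketch below evaluates the response probability implied by the dynamic model for one hypothetical individual. It assumes a probit link (standard normal $\varepsilon_{it}$); the index value, $\gamma$, and the helper name `response_prob` are illustrative choices, not estimates or functions from this notebook.

```python
from scipy.stats import norm

def response_prob(index, gamma, y_lag, alpha=0.0):
    """P(y_it = 1 | X_it, y_{i,t-1}, alpha_i) under a probit link.

    index is X_it'beta; all inputs here are hypothetical numbers.
    """
    return norm.cdf(index + gamma * y_lag + alpha)

# Hypothetical values: X'beta = -0.3, gamma = 0.8, alpha_i = 0
p_if_not_employed = response_prob(-0.3, 0.8, y_lag=0)
p_if_employed = response_prob(-0.3, 0.8, y_lag=1)

print(f"P(y=1 | y_lag=0) = {p_if_not_employed:.3f}")
print(f"P(y=1 | y_lag=1) = {p_if_employed:.3f}")
print(f"Gap due to gamma  = {p_if_employed - p_if_not_employed:.3f}")
```

Holding $X_{it}$ and $\alpha_i$ fixed, the gap between the two probabilities is driven only by $\gamma$, which is what the dynamic model tries to isolate.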

## 1.3 Economic Examples

| Context | State Dependence Mechanism |
|---------|---------------------------|
| Employment | Working builds skills, networks $\rightarrow$ easier to find a job next period |
| Smoking | Habit formation, addiction |
| Poverty | Poverty traps, difficulty escaping |
| Technology adoption | Learning effects, switching costs |
| Health insurance | Lock-in, pre-existing conditions |

## 1.4 Load the Data

In [None]:
# Load labor dynamics panel data
data = pd.read_csv(DATA_DIR / "labor_dynamics.csv")

print("Dataset loaded successfully!")
print(f"\nShape: {data.shape}")
print(f"Number of women: {data['id'].nunique()}")
print(f"Number of periods: {data['year'].nunique()}")
print(f"Years: {data['year'].min()} - {data['year'].max()}")
print(f"\nFirst 10 rows:")
data.head(10)

In [None]:
# Summary statistics
print("=== Summary Statistics ===")
print(data.describe().round(3))

print(f"\nEmployment rate: {data['employed'].mean():.1%}")
print(f"Mean age: {data['age'].mean():.1f}")
print(f"Mean education: {data['educ'].mean():.1f} years")
print(f"Mean kids: {data['kids'].mean():.2f}")
print(f"Married: {data['married'].mean():.1%}")

In [None]:
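# Observed persistence: transition matrix
# Sort first so that shift(1) picks up the previous year within each woman
data = data.sort_values(['id', 'year']).reset_index(drop=True)
data['emp_lag'] = data.groupby('id')['employed'].shift(1)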

transitions = pd.crosstab(
    data['emp_lag'].dropna().astype(int),
    data['employed'],
    normalize='index'
)
transitions.index.name = 'emp(t-1)'
transitions.columns.name = 'emp(t)'

print("=== Transition Probabilities ===")
print(transitions.round(3))

p_stay_emp = transitions.loc[1, 1]
p_enter = transitions.loc[0, 1]
raw_persistence = p_stay_emp - p_enter

print(f"\nP(emp_t=1 | emp_{{t-1}}=1) = {p_stay_emp:.3f}")
print(f"P(emp_t=1 | emp_{{t-1}}=0) = {p_enter:.3f}")
print(f"Raw persistence gap: {raw_persistence:.3f}")
print(f"\nThis {raw_persistence:.0%} gap could reflect:")
print(f"  - True state dependence (causal effect of past employment)")
print(f"  - Unobserved heterogeneity (persistent individual traits)")
print(f"  - Or both!")

In [None]:
# Visualize transition matrix
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Panel A: Transition matrix heatmap
ax = axes[0]
sns.heatmap(transitions, annot=True, fmt='.3f', cmap='YlOrRd',
            ax=ax, linewidths=2, vmin=0, vmax=1,
            xticklabels=['Not Employed', 'Employed'],
            yticklabels=['Not Employed', 'Employed'],
            cbar_kws={'label': 'Probability'})
ax.set_xlabel('Employment at t', fontsize=12)
ax.set_ylabel('Employment at t-1', fontsize=12)
ax.set_title('Transition Matrix: P(emp_t | emp_{t-1})', fontweight='bold')

# Panel B: Employment rate over time
ax = axes[1]
emp_by_year = data.groupby('year')['employed'].mean()
ax.plot(emp_by_year.index, emp_by_year.values, 'bo-', linewidth=2, markersize=8)
ax.set_xlabel('Year')
ax.set_ylabel('Employment Rate')
ax.set_title('Employment Rate Over Time', fontweight='bold')
ax.set_ylim(0, 1)
ax.grid(True, alpha=0.3)
ax.axhline(y=data['employed'].mean(), color='red', linestyle='--', alpha=0.5,
           label=f'Overall mean = {data["employed"].mean():.1%}')
ax.legend()

plt.suptitle('Employment Persistence: Descriptive Evidence', fontsize=15, fontweight='bold', y=1.02)
plt.tight_layout()
plt.savefig(FIG_DIR / '08_transition_matrix.png', dpi=150, bbox_inches='tight')
plt.show()

print("Figure saved to outputs/figures/08_transition_matrix.png")

<a id='section2'></a>

---

# Section 2: True State Dependence vs Spurious Persistence (25 min)

## 2.1 True State Dependence ($\gamma \neq 0$)

Past experience **causally affects** future behavior:
- Working builds human capital $\rightarrow$ more productive $\rightarrow$ hired again
- Employer relationships and signals persist
- Habit formation in consumption/work patterns

**Policy implication**: A temporary intervention (e.g., a job training program) can have effects that **outlast the intervention**, because it shifts individuals to the "employed" state, which then self-reinforces.

## 2.2 Spurious Persistence ($\alpha_i$ only)

If $\gamma = 0$ after controlling for $\alpha_i$, all persistence comes from **unobserved heterogeneity**:
- Some women have higher unobserved ability/motivation
- They are *always* more likely to work, regardless of history
- Persistence is an artifact of sorting, not causation

**Policy implication**: A temporary intervention has **no lasting effect** — once removed, individuals revert to their natural state determined by $\alpha_i$.
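
The contrast between the two policy implications can be sketched for one individual with a fixed $\alpha_i$. Under a probit link, employment then follows a two-state Markov chain, and the gap in $P(y_t = 1)$ between someone placed into a job at $t = 0$ and someone who is not shrinks by the factor $p_{11} - p_{01}$ each period. The index `c`, the $\gamma$ values, and the function name are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def intervention_effect(c, gamma, horizon):
    """Gap in P(y_t = 1) between starting employed and starting not employed.

    c is the index X'beta + alpha_i for one individual, held fixed over time.
    """
    p01 = norm.cdf(c)           # P(employed at t | not employed at t-1)
    p11 = norm.cdf(c + gamma)   # P(employed at t | employed at t-1)
    prob_treated, prob_control = 1.0, 0.0   # employment status set at t = 0
    gaps = []
    for _ in range(horizon):
        # One-step update: P_t = p01 + (p11 - p01) * P_{t-1}
        prob_treated = p01 + (p11 - p01) * prob_treated
        prob_control = p01 + (p11 - p01) * prob_control
        gaps.append(prob_treated - prob_control)
    return np.array(gaps)

effect_state_dep = intervention_effect(c=-0.3, gamma=0.8, horizon=5)
effect_no_state_dep = intervention_effect(c=-0.3, gamma=0.0, horizon=5)

print("gamma = 0.8:", effect_state_dep.round(3))
print("gamma = 0.0:", effect_no_state_dep.round(3))
```

With $\gamma > 0$ the effect carries over into later periods and fades geometrically; with $\gamma = 0$ it is gone as soon as the intervention ends, whatever the value of $\alpha_i$.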

## 2.3 Identification Challenge

With cross-sectional data, these two explanations are **observationally equivalent**. We need:
- **Panel data**: multiple observations per individual over time
- **Dynamic model**: explicitly models both $\gamma$ and $\alpha_i$

## 2.4 Simulation Demonstration

Let's generate two datasets:
1. One with **true state dependence** ($\gamma = 0.8$, low $\sigma_\alpha$)
2. One with **only heterogeneity** ($\gamma = 0$, high $\sigma_\alpha$)

Both will show similar persistence in the raw data!

In [None]:
# Simulation: True state dependence vs spurious persistence
np.random.seed(123)

def simulate_binary_panel(n=500, T=10, gamma=0.0, sigma_alpha=0.0, beta_x=0.3):
    """Simulate a dynamic binary panel."""
    alpha = np.random.normal(0, sigma_alpha, n)
    rows = []
    for i in range(n):
        x = np.random.normal(0, 1)
        y_prev = int(np.random.random() < norm.cdf(beta_x * x + alpha[i]))
        for t in range(T):
            xb = -0.3 + beta_x * x + gamma * y_prev + alpha[i]
            y = int(np.random.normal(xb, 1) > 0)
            rows.append({'id': i, 'period': t, 'y': y, 'y_lag': y_prev, 'x': x})
            y_prev = y
    return pd.DataFrame(rows)

# Scenario A: True state dependence
data_A = simulate_binary_panel(gamma=0.8, sigma_alpha=0.3)

# Scenario B: Spurious persistence (no state dependence, high heterogeneity)
data_B = simulate_binary_panel(gamma=0.0, sigma_alpha=1.2)

# Compute transition matrices
for label, df in [('A: True State Dependence (gamma=0.8, sigma=0.3)', data_A),
                  ('B: Spurious Persistence (gamma=0.0, sigma=1.2)', data_B)]:
    trans = pd.crosstab(df['y_lag'], df['y'], normalize='index')
    persistence = trans.loc[1, 1] - trans.loc[0, 1]
    print(f"Scenario {label}")
    print(f"  P(y=1|y_lag=1) = {trans.loc[1,1]:.3f}")
    print(f"  P(y=1|y_lag=0) = {trans.loc[0,1]:.3f}")
    print(f"  Raw persistence = {persistence:.3f}")
    print()

In [None]:
# Visualize: both look similar in raw data!
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

for ax, (label, df, color) in zip(axes,
    [('A: True State Dependence\n($\\gamma=0.8$, $\\sigma_\\alpha=0.3$)', data_A, '#3498db'),
     ('B: Spurious Persistence\n($\\gamma=0$, $\\sigma_\\alpha=1.2$)', data_B, '#e74c3c')]):

    trans = pd.crosstab(df['y_lag'], df['y'], normalize='index')

    x_pos = np.array([0, 1])
    bars0 = ax.bar(x_pos - 0.15, trans.loc[:, 0].values, 0.3,
                   label='P(y=0)', color='lightgray', edgecolor='black')
    bars1 = ax.bar(x_pos + 0.15, trans.loc[:, 1].values, 0.3,
                   label='P(y=1)', color=color, edgecolor='black', alpha=0.8)

    for bar in bars0:
        ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.02,
                f'{bar.get_height():.2f}', ha='center', fontsize=10)
    for bar in bars1:
        ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.02,
                f'{bar.get_height():.2f}', ha='center', fontsize=10)

    ax.set_xticks(x_pos)
    ax.set_xticklabels(['y(t-1) = 0', 'y(t-1) = 1'])
    ax.set_ylabel('Probability')
    ax.set_title(f'Scenario {label}', fontweight='bold')
    ax.set_ylim(0, 1.05)
    ax.legend()
    ax.grid(True, alpha=0.3, axis='y')

plt.suptitle('Both Scenarios Show Similar Persistence in Raw Data!\n'
             'Only dynamic models can distinguish the source.',
             fontsize=14, fontweight='bold', y=1.05)
plt.tight_layout()
plt.savefig(FIG_DIR / '08_state_dep_vs_heterogeneity.png', dpi=150, bbox_inches='tight')
plt.show()

print("Figure saved to outputs/figures/08_state_dep_vs_heterogeneity.png")
print("\nKey insight: Raw transition matrices cannot distinguish")
print("true state dependence from spurious persistence!")

<a id='section3'></a>

---

# Section 3: The Initial Conditions Problem (30 min)

## 3.1 The Problem

In a dynamic model with unobserved heterogeneity:
$$y^*_{it} = X_{it}'\beta + \gamma \cdot y_{i,t-1} + \alpha_i + \varepsilon_{it}$$

The first observation $y_{i,0}$ creates a problem: **it is correlated with $\alpha_i$**.

$$\text{Cov}(y_{i,0}, \alpha_i) \neq 0$$

Women with high $\alpha_i$ (high unobserved ability) are more likely to start employed ($y_{i,0} = 1$).
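
A quick numerical check of this correlation, as a minimal sketch: the initial outcome is drawn from a latent index that contains $\alpha_i$, so it carries information about $\alpha_i$. The intercept, $\sigma_\alpha = 0.8$, and sample size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
sigma_alpha = 0.8  # illustrative value

alpha = rng.normal(0.0, sigma_alpha, n)
# Initial outcome generated by a latent index that contains alpha_i
y0 = (-0.3 + alpha + rng.normal(0.0, 1.0, n) > 0).astype(int)

mean_alpha_employed = alpha[y0 == 1].mean()
mean_alpha_not_employed = alpha[y0 == 0].mean()
corr_y0_alpha = np.corrcoef(y0, alpha)[0, 1]

print(f"Mean alpha | y0 = 1: {mean_alpha_employed:+.3f}")
print(f"Mean alpha | y0 = 0: {mean_alpha_not_employed:+.3f}")
print(f"Corr(y0, alpha):     {corr_y0_alpha:.3f}")
```

Although $\alpha_i$ has mean zero in the population, its mean differs sharply between the two initial states, which is why $y_{i,0}$ cannot be treated as exogenous.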

## 3.2 Why It's Specific to Nonlinear Models

- In **linear** dynamic panels: first-differencing eliminates $\alpha_i$
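- In **nonlinear** models (probit, logit): no simple transformation removes $\alpha_i$, so it has to be integrated out of the likelihood
- Integrating $\alpha_i$ out while treating $y_{i,0}$ as exogenous (independent of $\alpha_i$) misspecifies the likelihood and makes $y_{i,t-1}$ **endogenous**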

## 3.3 Consequence of Ignoring Initial Conditions

Naive estimation (simply including $y_{i,t-1}$ as a regressor) **biases $\gamma$ upward**:
- Part of the effect of $\alpha_i$ is attributed to $y_{i,t-1}$
- We overestimate true state dependence

## 3.4 When It's NOT a Problem

- Process truly starts at $t=0$ (e.g., new program, first job after graduation)
- $y_{i,0}$ is independent of $\alpha_i$, for example because it is randomly assigned (very rare in practice)

## 3.5 Demonstration: Bias from Ignoring Initial Conditions

In [None]:
# Demonstrate the initial conditions bias via simulation
np.random.seed(42)

true_gamma = 0.5
true_sigma = 0.8
n_sim = 800
T_sim = 10

# Generate data with known DGP
alpha = np.random.normal(0, true_sigma, n_sim)
sim_rows = []

for i in range(n_sim):
    x = np.random.normal(0, 1)
    # Initial condition: correlated with alpha
    y_prev = int(np.random.normal(-0.3 + 0.3 * x + alpha[i], 1) > 0)
    for t in range(T_sim):
        xb = -0.3 + 0.3 * x + true_gamma * y_prev + alpha[i]
        y = int(np.random.normal(xb, 1) > 0)
        sim_rows.append({'id': i+1, 'period': t+1, 'y': y, 'y_lag': y_prev, 'x': x,
                         'y_init': None})  # Fill later
        y_prev = y

sim_data = pd.DataFrame(sim_rows)

# Add initial value
sim_data['y_init'] = sim_data.groupby('id')['y_lag'].transform('first')

# Time means
sim_data['x_mean'] = sim_data.groupby('id')['x'].transform('mean')

print(f"Simulated data: {sim_data.shape[0]} obs, {n_sim} individuals, {T_sim} periods")
print(f"True gamma = {true_gamma}, True sigma_alpha = {true_sigma}")
print(f"Employment rate: {sim_data['y'].mean():.3f}")

In [None]:
# Estimate naive pooled probit (ignoring initial conditions)
X_naive = sm.add_constant(sim_data[['x', 'y_lag']])
naive_probit = sm.Probit(sim_data['y'], X_naive).fit(method='bfgs', disp=0)

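# Estimate Wooldridge pooled probit (with initial conditions)
# x is time-invariant in this simulation, so x_mean equals x exactly;
# including both would be perfectly collinear, so only y_init is added.
X_wooldridge = sm.add_constant(sim_data[['x', 'y_lag', 'y_init']])
wooldridge_probit = sm.Probit(sim_data['y'], X_wooldridge).fit(method='bfgs', disp=0)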

print("=" * 60)
print("   INITIAL CONDITIONS BIAS DEMONSTRATION")
print("=" * 60)
print(f"\n{'':20s} {'True':>10s} {'Naive':>10s} {'Wooldridge':>10s}")
print("-" * 55)
print(f"{'gamma (state dep)':20s} {true_gamma:>10.4f} {naive_probit.params['y_lag']:>10.4f} {wooldridge_probit.params['y_lag']:>10.4f}")
print(f"{'beta (x)':20s} {'0.3000':>10s} {naive_probit.params['x']:>10.4f} {wooldridge_probit.params['x']:>10.4f}")

naive_bias = naive_probit.params['y_lag'] - true_gamma
wooldridge_bias = wooldridge_probit.params['y_lag'] - true_gamma

print(f"\nBias in gamma:")
print(f"  Naive:      {naive_bias:+.4f} ({naive_bias/true_gamma:+.1%})")
print(f"  Wooldridge: {wooldridge_bias:+.4f} ({wooldridge_bias/true_gamma:+.1%})")

print(f"\nConclusion: Naive estimation overestimates gamma by ~{naive_bias/true_gamma:.0%}.")
print(f"The Wooldridge approach substantially reduces this bias.")

<a id='section4'></a>

---

# Section 4: Wooldridge (2005) Approach (30 min)

## 4.1 Key Idea

Instead of modeling the full joint distribution of $(y_{i,0}, y_{i,1}, \ldots, y_{i,T})$, Wooldridge models the **conditional distribution of $\alpha_i$ given initial conditions**:

$$\alpha_i = \delta_0 + \delta_1 \cdot y_{i,0} + \bar{X}_i'\delta_2 + u_i, \quad u_i \sim N(0, \sigma^2_u)$$

## 4.2 Resulting Specification

Substituting into the dynamic model:

$$P(y_{it} = 1 \mid X_{it}, y_{i,t-1}, y_{i,0}, \bar{X}_i, u_i) = \Phi\left(\delta_0 + X_{it}'\beta + \gamma \cdot y_{i,t-1} + \delta_1 \cdot y_{i,0} + \bar{X}_i'\delta_2 + u_i\right)$$

This is a **Random Effects Probit** with augmented regressors!
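
To show what "Random Effects Probit" means computationally, here is a minimal sketch of the log-likelihood with $u_i$ integrated out by Gauss-Hermite quadrature. It is not PanelBox's implementation: `re_probit_loglik` is a hypothetical helper, the synthetic panel and parameter values are illustrative, and `X` is assumed to already contain every augmented regressor (constant, $X_{it}$, $y_{i,t-1}$, $y_{i,0}$, $\bar{X}_i$).

```python
import numpy as np
from scipy.stats import norm

def re_probit_loglik(params, y, X, groups, n_nodes=12):
    """Log-likelihood of a random effects probit via Gauss-Hermite quadrature.

    params = (coefficients..., sigma_u). X holds all regressors, one row
    per person-period; groups identifies the person for each row.
    """
    beta, sigma_u = np.asarray(params[:-1]), params[-1]
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    sign = 2 * y - 1                         # +1 if y = 1, -1 if y = 0
    index = X @ beta
    u = np.sqrt(2.0) * sigma_u * nodes       # values of u_i at the nodes
    loglik = 0.0
    for g in np.unique(groups):
        rows = groups == g
        # Probability of each observed outcome at each quadrature node
        probs = norm.cdf(sign[rows][:, None] * (index[rows][:, None] + u[None, :]))
        seq_lik = probs.prod(axis=0)         # product over this person's periods
        loglik += np.log(np.sum(weights * seq_lik) / np.sqrt(np.pi))
    return loglik

# Tiny synthetic panel (all numbers are illustrative)
rng = np.random.default_rng(1)
n, T = 50, 4
groups = np.repeat(np.arange(n), T)
X = np.column_stack([np.ones(n * T), rng.normal(size=n * T)])
u_i = np.repeat(rng.normal(0.0, 0.8, n), T)
y = (X @ np.array([-0.2, 0.5]) + u_i + rng.normal(size=n * T) > 0).astype(int)

ll_re = re_probit_loglik([-0.2, 0.5, 0.8], y, X, groups)
ll_pooled = re_probit_loglik([-0.2, 0.5, 0.0], y, X, groups)  # sigma_u = 0
print(f"RE log-likelihood: {ll_re:.2f}, pooled (sigma_u = 0): {ll_pooled:.2f}")
```

Setting `sigma_u = 0` collapses the integral and reproduces the pooled probit log-likelihood, which makes the pooled model a special case of the random effects specification.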

## 4.3 Data Preparation (Critical)

The following steps are essential:

1. **Create lag**: $y_{i,t-1}$ from the panel structure
2. **Extract initial value**: $y_{i,0}$ (first observation per entity)
3. **Compute time means**: $\bar{X}_i$ for each individual (Mundlak-Chamberlain terms)
4. **Drop first period**: $t=0$ has no lag available
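
The four steps can be traced on a small hypothetical panel where every value is easy to check by hand. The frame `toy` and its column names are made up for illustration; the rows are deliberately unsorted to show why sorting by entity and time must come before taking lags.

```python
import pandas as pd

# Hypothetical toy panel: two individuals, three periods, deliberately unsorted
toy = pd.DataFrame({
    'id':   [2, 1, 1, 2, 1, 2],
    'year': [2001, 2000, 2002, 2000, 2001, 2002],
    'y':    [1, 0, 1, 1, 1, 0],
    'x':    [3.0, 1.0, 2.0, 4.0, 3.0, 5.0],
})

# Sort so that "previous row within id" means "previous year"
toy = toy.sort_values(['id', 'year']).reset_index(drop=True)

toy['y_lag'] = toy.groupby('id')['y'].shift(1)              # Step 1: lag
toy['y_init'] = toy.groupby('id')['y'].transform('first')   # Step 2: initial value
toy['x_mean'] = toy.groupby('id')['x'].transform('mean')    # Step 3: time mean
toy_dyn = toy.dropna(subset=['y_lag']).copy()               # Step 4: drop t = 0
toy_dyn['y_lag'] = toy_dyn['y_lag'].astype(int)

print(toy_dyn)
```

Each individual loses exactly one row (the first period), while `y_init` and `x_mean` stay constant within individual across the remaining rows.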

## 4.4 Interpretation

| Parameter | Meaning |
|-----------|---------|
| $\gamma$ | True state dependence (causal effect of $y_{t-1}$ on $y_t$) |
| $\delta_1$ | Dependence of the unobserved heterogeneity $\alpha_i$ on the initial condition $y_{i,0}$ |
| $\delta_2$ | Coefficients on the time means $\bar{X}_i$ (Mundlak-Chamberlain terms, CRE component) |
| $\sigma_u$ | Standard deviation of remaining unobserved heterogeneity |

In [None]:
# =====================================================
# Step-by-step data preparation for Wooldridge approach
# =====================================================

# Work with a copy of the original data
data_prep = data.sort_values(['id', 'year']).copy()

# Step 1: Create lagged dependent variable
data_prep['emp_lag'] = data_prep.groupby('id')['employed'].shift(1)
print("Step 1: Created lagged employment")
print(f"  NaN count (first period): {data_prep['emp_lag'].isna().sum()}")

# Step 2: Extract initial value (first observation per entity)
data_prep['emp_init'] = data_prep.groupby('id')['employed'].transform('first')
print(f"\nStep 2: Initial employment (y_i0)")
print(f"  P(emp_init=1): {data_prep['emp_init'].mean():.3f}")

# Step 3: Compute time means of X (Mundlak-Chamberlain terms)
# Only for time-varying variables
mean_vars = ['age', 'kids', 'married']
for var in mean_vars:
    data_prep[f'{var}_mean'] = data_prep.groupby('id')[var].transform('mean')
print(f"\nStep 3: Time means computed for: {mean_vars}")

# Step 4: Drop first period (no lag available)
data_dyn = data_prep.dropna(subset=['emp_lag']).copy()
print(f"\nStep 4: Dropped first period")
print(f"  Before: {len(data_prep)} obs")
print(f"  After:  {len(data_dyn)} obs")
print(f"  Lost:   {len(data_prep) - len(data_dyn)} obs (one per individual)")

print(f"\n=== Dynamic Dataset Ready ===")
print(f"Observations: {len(data_dyn)}")
print(f"Individuals:  {data_dyn['id'].nunique()}")
print(f"Periods:      {data_dyn['year'].nunique()} (dropped t=0)")
data_dyn.head(10)

In [None]:
# Verify data preparation
print("=== Verification ===")

# Check that emp_lag matches actual previous value
sample_id = data_dyn['id'].iloc[0]
sample = data_dyn[data_dyn['id'] == sample_id][['year', 'employed', 'emp_lag', 'emp_init']]
print(f"\nIndividual {sample_id}:")
print(sample.to_string(index=False))

# Check dimensions
assert len(data_dyn) == data['id'].nunique() * (data['year'].nunique() - 1), \
    "Unexpected number of observations after dropping first period"
print(f"\nDimension check passed: {data['id'].nunique()} x {data['year'].nunique() - 1} = {len(data_dyn)}")

In [None]:
# =====================================================
# Estimate Pooled Probit baseline (biased — ignores heterogeneity)
# =====================================================
exog_vars = ['age', 'educ', 'kids', 'married']

X_pooled = sm.add_constant(data_dyn[exog_vars + ['emp_lag']])
pooled_probit = sm.Probit(data_dyn['employed'], X_pooled).fit(method='bfgs', disp=0)

print("=" * 70)
print(" " * 15 + "POOLED PROBIT (NAIVE — BIASED)")
print("=" * 70)
print(pooled_probit.summary())

print(f"\ngamma (emp_lag) = {pooled_probit.params['emp_lag']:.4f}")
print(f"\nWarning: This estimate is biased upward because it ignores")
print(f"both unobserved heterogeneity and the initial conditions problem.")

In [None]:
# =====================================================
# Estimate Wooldridge (2005) Pooled Probit
# =====================================================
# Augment X with: emp_lag, emp_init, time means
wooldridge_vars = exog_vars + ['emp_lag', 'emp_init'] + [f'{v}_mean' for v in mean_vars]

X_wool = sm.add_constant(data_dyn[wooldridge_vars])
wooldridge_pooled = sm.Probit(data_dyn['employed'], X_wool).fit(method='bfgs', disp=0)

print("=" * 70)
print(" " * 10 + "WOOLDRIDGE (2005) POOLED PROBIT")
print("=" * 70)
print(wooldridge_pooled.summary())

print(f"\ngamma (emp_lag, state dependence) = {wooldridge_pooled.params['emp_lag']:.4f}")
print(f"delta_y0 (emp_init)               = {wooldridge_pooled.params['emp_init']:.4f}")
print(f"\nNote: This is the pooled version. For the full Wooldridge approach,")
print(f"we need Random Effects Probit (estimated next).")

In [None]:
# =====================================================
# Estimate Wooldridge (2005) with DynamicBinaryPanel (RE Probit)
# =====================================================
# Use PanelBox's DynamicBinaryPanel for the full RE specification
# Note: For computational speed, we use a random subsample

# Select a subsample for RE estimation (RE is computationally intensive)
np.random.seed(42)
subsample_ids = np.random.choice(data['id'].unique(), size=300, replace=False)
data_sub = data[data['id'].isin(subsample_ids)].copy()

print(f"Using subsample of {len(subsample_ids)} individuals for RE estimation")
print(f"Subsample size: {len(data_sub)} observations\n")

model_re = DynamicBinaryPanel(
    endog=data_sub['employed'].values,
    exog=data_sub[exog_vars].values,
    entity=data_sub['id'].values,
    time=data_sub['year'].values,
    initial_conditions='wooldridge',
    effects='random'
)
results_re = model_re.fit()

print("=" * 70)
print(" " * 10 + "WOOLDRIDGE (2005) RANDOM EFFECTS PROBIT")
print("=" * 70)
print(results_re.summary())

<a id='section5'></a>

---

# Section 5: Interpreting Results (20 min)

## 5.1 Key Parameters

### State Dependence ($\gamma$)
- Tests $H_0: \gamma = 0$ (no true state dependence)
- If rejected $\rightarrow$ past employment has a causal effect on current employment
- Sign is expected positive (working last period helps working this period)

### Unobserved Heterogeneity ($\sigma_u$)
- Intraclass correlation: $\rho = \frac{\sigma^2_u}{\sigma^2_u + 1}$ (for Probit, $\text{Var}(\varepsilon) = 1$)
- High $\rho$ $\rightarrow$ substantial unobserved heterogeneity
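
As a worked example of the intraclass correlation formula, the sketch below uses an illustrative $\sigma_u = 0.8$ (an assumption, not an estimate from this notebook). It also reports the factor by which pooled probit coefficients are approximately scaled down relative to random effects coefficients; that relationship is exact only when $u_i$ is normal and independent of the regressors, so with a lagged outcome it should be read as a rough guide.

```python
import numpy as np

def intraclass_correlation(sigma_u):
    """rho = sigma_u^2 / (sigma_u^2 + 1) for a probit with Var(eps) = 1."""
    return sigma_u**2 / (sigma_u**2 + 1.0)

sigma_u = 0.8  # illustrative value
rho = intraclass_correlation(sigma_u)

# Pooled (population-averaged) coefficients are roughly the RE coefficients
# multiplied by sqrt(1 - rho) = 1 / sqrt(1 + sigma_u^2)
attenuation = np.sqrt(1.0 - rho)

print(f"rho = {rho:.3f}, attenuation factor = {attenuation:.3f}")
```

This scaling is one reason pooled and random effects probit coefficients should not be compared directly; marginal effects are a safer basis for comparison.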

### Initial Condition ($\delta_1$)
- If significant $\rightarrow$ initial conditions matter (as expected)
- Positive $\delta_1$ means initially employed women have higher $\alpha_i$

In [None]:
# =====================================================
# Model comparison: Pooled vs Wooldridge
# =====================================================

gamma_pooled = pooled_probit.params['emp_lag']
gamma_wool = wooldridge_pooled.params['emp_lag']

print("=" * 70)
print(" " * 20 + "MODEL COMPARISON")
print("=" * 70)

print(f"\n{'Parameter':<20s} {'Pooled Naive':>15s} {'Wooldridge':>15s}")
print("-" * 55)

for var in exog_vars:
    b_pooled = pooled_probit.params[var]
    b_wool = wooldridge_pooled.params[var]
    print(f"{var:<20s} {b_pooled:>15.4f} {b_wool:>15.4f}")

print(f"{'emp_lag (gamma)':<20s} {gamma_pooled:>15.4f} {gamma_wool:>15.4f}")
print(f"{'emp_init (delta_1)':<20s} {'—':>15s} {wooldridge_pooled.params['emp_init']:>15.4f}")

for v in mean_vars:
    print(f"{v+'_mean (delta_2)':<20s} {'—':>15s} {wooldridge_pooled.params[f'{v}_mean']:>15.4f}")

print(f"\n{'Log-likelihood':<20s} {pooled_probit.llf:>15.2f} {wooldridge_pooled.llf:>15.2f}")
print(f"{'AIC':<20s} {pooled_probit.aic:>15.2f} {wooldridge_pooled.aic:>15.2f}")
print(f"{'BIC':<20s} {pooled_probit.bic:>15.2f} {wooldridge_pooled.bic:>15.2f}")

print(f"\n=== Key Finding ===")
print(f"Pooled gamma:     {gamma_pooled:.4f}")
print(f"Wooldridge gamma: {gamma_wool:.4f}")
bias_pct = (gamma_pooled - gamma_wool) / gamma_wool * 100
print(f"Bias from ignoring initial conditions: {bias_pct:+.1f}%")
print(f"\nThe naive pooled estimate overstates state dependence because it")
print(f"conflates the effect of y_{{t-1}} with correlated heterogeneity.")

In [None]:
# Save model comparison table
comparison_data = {
    'Variable': exog_vars + ['emp_lag (gamma)', 'emp_init (delta_1)'] +
                [f'{v}_mean' for v in mean_vars] +
                ['Log-likelihood', 'AIC', 'BIC'],
    'Pooled Naive': [pooled_probit.params[v] for v in exog_vars] +
                    [gamma_pooled, np.nan] + [np.nan] * len(mean_vars) +
                    [pooled_probit.llf, pooled_probit.aic, pooled_probit.bic],
    'Wooldridge': [wooldridge_pooled.params[v] for v in exog_vars] +
                  [gamma_wool, wooldridge_pooled.params['emp_init']] +
                  [wooldridge_pooled.params[f'{v}_mean'] for v in mean_vars] +
                  [wooldridge_pooled.llf, wooldridge_pooled.aic, wooldridge_pooled.bic]
}

comparison_df = pd.DataFrame(comparison_data)
comparison_df.to_csv(TABLE_DIR / '08_model_comparison.csv', index=False)
print("Model comparison saved to outputs/tables/08_model_comparison.csv")

In [None]:
# Visualize: coefficient comparison (forest plot)
fig, ax = plt.subplots(figsize=(10, 7))

compare_vars = exog_vars + ['emp_lag']
y_pos = np.arange(len(compare_vars))

coef_pooled = [pooled_probit.params[v] for v in compare_vars]
se_pooled = [pooled_probit.bse[v] for v in compare_vars]
coef_wool = [wooldridge_pooled.params[v] for v in compare_vars]
se_wool = [wooldridge_pooled.bse[v] for v in compare_vars]

ax.errorbar(coef_pooled, y_pos + 0.1, xerr=[1.96*s for s in se_pooled],
            fmt='s', color='#e74c3c', markersize=8, capsize=5, label='Pooled Naive')
ax.errorbar(coef_wool, y_pos - 0.1, xerr=[1.96*s for s in se_wool],
            fmt='o', color='#3498db', markersize=8, capsize=5, label='Wooldridge')

ax.axvline(x=0, color='black', linestyle='--', alpha=0.5)
ax.set_yticks(y_pos)
ax.set_yticklabels(compare_vars)
ax.set_xlabel('Coefficient Estimate (with 95% CI)')
ax.set_title('Coefficient Comparison: Pooled Naive vs Wooldridge (2005)',
             fontweight='bold')
ax.legend(loc='upper right')
ax.grid(True, alpha=0.3, axis='x')
ax.invert_yaxis()

plt.tight_layout()
plt.savefig(FIG_DIR / '08_coefficient_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

print("Figure saved to outputs/figures/08_coefficient_comparison.png")

In [None]:
# =====================================================
# Average Marginal Effect (AME) of lagged employment
# =====================================================

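# Compute AMEs from the Wooldridge pooled probit
# Continuous regressors: AME = mean over all obs of phi(X'beta) * coefficient
# 0/1 regressors (emp_lag, emp_init): AME = mean over all obs of
#   Phi(X'beta with var = 1) - Phi(X'beta with var = 0)

X_data = sm.add_constant(data_dyn[wooldridge_vars])
linear_pred = X_data.values @ wooldridge_pooled.params.values
phi_vals = norm.pdf(linear_pred)

def discrete_ame(var):
    """Average change in P(y=1) when a 0/1 regressor switches from 0 to 1."""
    coef = wooldridge_pooled.params[var]
    base = linear_pred - coef * data_dyn[var].values  # index with var set to 0
    return np.mean(norm.cdf(base + coef) - norm.cdf(base))

# AME for emp_lag (discrete change, since emp_lag is binary)
ame_emp_lag = discrete_ame('emp_lag')

# AME for all variables
print("=== Average Marginal Effects (Wooldridge) ===")
print(f"\n{'Variable':<20s} {'Coefficient':>12s} {'AME':>12s}")
print("-" * 48)

for var in exog_vars + ['emp_lag', 'emp_init']:
    coef = wooldridge_pooled.params[var]
    if var in ('emp_lag', 'emp_init'):
        ame = discrete_ame(var)
    else:
        ame = np.mean(phi_vals) * coef
    print(f"{var:<20s} {coef:>12.4f} {ame:>12.4f}")

print(f"\nInterpretation:")
print(f"  Having worked in t-1 increases P(work in t) by {ame_emp_lag:.3f}")
print(f"  ({ame_emp_lag * 100:.1f} percentage points)")
print(f"  Under the model's assumptions, this is the state dependence effect,")
print(f"  net of the heterogeneity captured by emp_init and the time means.")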

<a id='section6'></a>

---

# Section 6: Decomposition of Persistence (20 min)

## 6.1 Goal

Total observed persistence in the data has two sources:
1. **True state dependence** ($\gamma$): past behavior causally affects future
2. **Unobserved heterogeneity** ($\alpha_i$): persistent individual traits

We decompose: **Total persistence = State Dependence + Heterogeneity**

## 6.2 Method

Using simulation:
1. **Total**: use estimated $\gamma$ and $\sigma_u$ $\rightarrow$ total serial correlation
2. **State dependence only**: set $\sigma_u = 0$, keep $\gamma$ $\rightarrow$ persistence from $\gamma$ alone
3. **Heterogeneity only**: set $\gamma = 0$, keep $\sigma_u$ $\rightarrow$ persistence from $\alpha_i$ alone
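
A useful benchmark for step 2: with $\sigma_u = 0$ and covariates held fixed, the outcome is a two-state Markov chain whose stationary lag-1 autocorrelation equals $p_{11} - p_{01}$. The sketch below checks this by simulation under a probit link; the index `c` and $\gamma$ are illustrative values, not estimates.

```python
import numpy as np
from scipy.stats import norm

c, gamma = -0.3, 0.5          # illustrative index and state dependence
p01 = norm.cdf(c)             # P(y_t = 1 | y_{t-1} = 0)
p11 = norm.cdf(c + gamma)     # P(y_t = 1 | y_{t-1} = 1)
analytic_corr = p11 - p01

# Check by simulation, starting from the stationary employment rate
rng = np.random.default_rng(0)
n = 200_000
stationary_rate = p01 / (1.0 - p11 + p01)
y_prev = (rng.random(n) < stationary_rate).astype(int)
y_next = (rng.random(n) < np.where(y_prev == 1, p11, p01)).astype(int)
simulated_corr = np.corrcoef(y_prev, y_next)[0, 1]

print(f"Analytic autocorrelation:  {analytic_corr:.4f}")
print(f"Simulated autocorrelation: {simulated_corr:.4f}")
```

Once covariates vary across individuals, as in the simulation used in this section, differences in $X$ add persistence of their own, so the "state dependence only" figure will generally exceed this benchmark.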

In [None]:
# Simulation-based decomposition of persistence
np.random.seed(42)

# Use estimated parameters from the Wooldridge pooled probit
gamma_est = wooldridge_pooled.params['emp_lag']

# sigma_u is not estimated here: the pooled probit does not identify it, and
# full RE estimation on the dynamic model is too slow for the full sample.
# 0.8 is an assumed proxy (based on the DGP), not an estimate.
sigma_u_est = 0.8

def simulate_persistence(n=1000, T=15, gamma=0.5, sigma_u=0.8, beta_x=0.3):
    """Simulate and compute serial correlation."""
    alpha = np.random.normal(0, sigma_u, n)
    y_matrix = np.zeros((n, T))

    for i in range(n):
        x = np.random.normal(0, 1)
        y_prev = int(np.random.normal(beta_x * x + alpha[i], 1) > 0)
        for t in range(T):
            xb = -0.3 + beta_x * x + gamma * y_prev + alpha[i]
            y = int(np.random.normal(xb, 1) > 0)
            y_matrix[i, t] = y
            y_prev = y

    # Compute lag-1 autocorrelation
    y_flat = y_matrix[:, 1:].flatten()
    y_lag_flat = y_matrix[:, :-1].flatten()
    return np.corrcoef(y_flat, y_lag_flat)[0, 1]

# Decomposition
corr_total = simulate_persistence(gamma=gamma_est, sigma_u=sigma_u_est)
corr_state_dep = simulate_persistence(gamma=gamma_est, sigma_u=0.0)
corr_heterog = simulate_persistence(gamma=0.0, sigma_u=sigma_u_est)

print("=" * 50)
print("  PERSISTENCE DECOMPOSITION")
print("=" * 50)
print(f"\nTotal persistence (autocorrelation):     {corr_total:.4f}")
print(f"  Due to state dependence (gamma only):  {corr_state_dep:.4f} ({corr_state_dep/corr_total:.0%})")
print(f"  Due to heterogeneity (sigma_u only):   {corr_heterog:.4f} ({corr_heterog/corr_total:.0%})")

print(f"\nNote: Components don't sum exactly to total")
print(f"because state dependence and heterogeneity interact.")

In [None]:
# Visualization: persistence decomposition
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Panel A: Bar chart of the three simulated autocorrelations
ax = axes[0]
categories = ['Total\nPersistence', 'State\nDependence', 'Heterogeneity']
values = [corr_total, corr_state_dep, corr_heterog]
colors = ['#9b59b6', '#3498db', '#e74c3c']

bars = ax.bar(categories, values, color=colors, alpha=0.8, edgecolor='black')
for bar, val in zip(bars, values):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01,
            f'{val:.3f}', ha='center', fontsize=12, fontweight='bold')

ax.set_ylabel('Autocorrelation (lag-1)')
ax.set_title('Persistence Decomposition', fontweight='bold')
ax.set_ylim(0, max(values) * 1.2)
ax.grid(True, alpha=0.3, axis='y')

# Panel B: Pie chart (shares)
ax = axes[1]
shares = [corr_state_dep / corr_total, corr_heterog / corr_total]
remainder = max(0, 1 - sum(shares))
if remainder > 0.01:
    labels_pie = ['State Dependence', 'Heterogeneity', 'Interaction']
    sizes = [shares[0], shares[1], remainder]
    colors_pie = ['#3498db', '#e74c3c', '#95a5a6']
else:
    labels_pie = ['State Dependence', 'Heterogeneity']
    sizes = shares
    colors_pie = ['#3498db', '#e74c3c']

wedges, texts, autotexts = ax.pie(sizes, labels=labels_pie, colors=colors_pie,
                                   autopct='%1.0f%%', startangle=90,
                                   textprops={'fontsize': 11})
for autotext in autotexts:
    autotext.set_fontweight('bold')
ax.set_title('Share of Total Persistence', fontweight='bold')

plt.suptitle('Where Does Employment Persistence Come From?',
             fontsize=15, fontweight='bold', y=1.02)
plt.tight_layout()
plt.savefig(FIG_DIR / '08_persistence_decomposition.png', dpi=150, bbox_inches='tight')
plt.show()

print("Figure saved to outputs/figures/08_persistence_decomposition.png")

## 6.3 Policy Implications

The decomposition has direct implications for policy design:

| Dominant source of persistence | Policy implication |
|-------|------------|
| High state dependence | Temporary employment programs can have **lasting effects**: getting someone into a job creates momentum |
| High heterogeneity | Policy needs to **target specific groups** with persistent barriers; temporary programs are unlikely to help in the long run |
| Both significant | Both strategies are needed: temporary programs help, but some groups need sustained support |

<a id='section7'></a>

---

# Section 7: Simulated Trajectories (20 min)

## 7.1 Goal

Visualize how employment trajectories evolve over time under the estimated model.
- Compare trajectories starting from $y_{i,0} = 1$ (employed) vs $y_{i,0} = 0$ (not employed)
- Counterfactual: what happens if someone loses their job at $t=5$?
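
For a fixed individual effect, the model is a two-state Markov chain, so the expected employment path can be written down without simulation. The sketch below illustrates this with hypothetical values: `c_demo` stands for everything in the probit index except the lag term, and `gamma_demo` is an assumed state dependence parameter. Neither is taken from the notebook's estimates, and `expected_path` is an illustrative helper.

```python
import numpy as np
from scipy.stats import norm


def expected_path(c, gamma, y_init, n_periods):
    """Expected P(y_t = 1) for one individual with a fixed index constant c.

    c collects everything in the probit index except gamma * y_{t-1}.
    """
    p0 = norm.cdf(c)          # P(y_t = 1 | y_{t-1} = 0)
    p1 = norm.cdf(c + gamma)  # P(y_t = 1 | y_{t-1} = 1)
    path = np.empty(n_periods)
    prev = float(y_init)      # probability of employment in the previous period
    for t in range(n_periods):
        prev = p0 + (p1 - p0) * prev
        path[t] = prev
    return path


# Hypothetical values chosen for illustration only
c_demo, gamma_demo = -0.3, 1.0
path_emp = expected_path(c_demo, gamma_demo, y_init=1, n_periods=15)
path_unemp = expected_path(c_demo, gamma_demo, y_init=0, n_periods=15)

# Both paths converge to the same stationary employment probability
p0, p1 = norm.cdf(c_demo), norm.cdf(c_demo + gamma_demo)
stationary = p0 / (1 - (p1 - p0))
print(f"t=1:  start employed {path_emp[0]:.3f}, start not employed {path_unemp[0]:.3f}")
print(f"t=15: start employed {path_emp[-1]:.3f}, start not employed {path_unemp[-1]:.3f}")
print(f"Stationary probability: {stationary:.3f}")
```

The initial state enters only through the factor $(p_1 - p_0)^t$, which shrinks every period. Averaging such paths over draws of the individual effect gives the population mean that simulated trajectories approximate.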

In [None]:
# Simulate employment trajectories using estimated parameters
np.random.seed(42)

n_sim_traj = 200
n_periods_sim = 15

# Use pooled Wooldridge estimates
gamma_sim = wooldridge_pooled.params['emp_lag']
beta_sim = {v: wooldridge_pooled.params[v] for v in exog_vars}
intercept_sim = wooldridge_pooled.params['const']

# Mean characteristics from data
X_mean = data_dyn[exog_vars].mean()

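def simulate_trajectories(n_individuals, n_periods, y_init, gamma, beta, intercept, X, sigma_u=0.8):
    """Simulate binary employment trajectories from a dynamic probit.

    Every individual is evaluated at the same covariate values X and starts
    from the same state y_init; individuals differ only in alpha_i ~ N(0, sigma_u^2).
    sigma_u=0.8 is a fixed illustrative value, not read from wooldridge_pooled.
    Returns (trajectories, prob_trajectories), each of shape (n_individuals, n_periods).
    """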
    alpha = np.random.normal(0, sigma_u, n_individuals)
    trajectories = np.zeros((n_individuals, n_periods))
    prob_trajectories = np.zeros((n_individuals, n_periods))

    for i in range(n_individuals):
        y_prev = y_init
        for t in range(n_periods):
            xb = intercept + sum(beta[v] * X[v] for v in beta) + gamma * y_prev + alpha[i]
            prob = norm.cdf(xb)
            y = int(np.random.random() < prob)
            trajectories[i, t] = y
            prob_trajectories[i, t] = prob
            y_prev = y

    return trajectories, prob_trajectories

# Scenario 1: Start employed
traj_emp, prob_emp = simulate_trajectories(
    n_sim_traj, n_periods_sim, y_init=1,
    gamma=gamma_sim, beta=beta_sim, intercept=intercept_sim, X=X_mean
)

# Scenario 2: Start not employed
traj_unemp, prob_unemp = simulate_trajectories(
    n_sim_traj, n_periods_sim, y_init=0,
    gamma=gamma_sim, beta=beta_sim, intercept=intercept_sim, X=X_mean
)

print(f"Simulated {n_sim_traj} trajectories x {n_periods_sim} periods")
print(f"\nStarting employed (y_init=1):")
print(f"  Mean employment rate at t=1:  {traj_emp[:, 0].mean():.3f}")
print(f"  Mean employment rate at t={n_periods_sim}: {traj_emp[:, -1].mean():.3f}")
print(f"\nStarting not employed (y_init=0):")
print(f"  Mean employment rate at t=1:  {traj_unemp[:, 0].mean():.3f}")
print(f"  Mean employment rate at t={n_periods_sim}: {traj_unemp[:, -1].mean():.3f}")

In [None]:
# Spaghetti plot with mean trajectory
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
periods = np.arange(1, n_periods_sim + 1)

# Panel A: Start employed
ax = axes[0]
for i in range(min(50, n_sim_traj)):
    ax.plot(periods, traj_emp[i], alpha=0.08, color='#3498db', linewidth=0.8)
ax.plot(periods, traj_emp.mean(axis=0), color='darkblue', linewidth=3,
        label=f'Mean ({traj_emp.mean(axis=0)[-1]:.2f} at t={n_periods_sim})')
ax.set_xlabel('Period')
ax.set_ylabel('Employment Status / P(employed)')
ax.set_title('Start Employed ($y_{i,0} = 1$)', fontweight='bold')
ax.set_ylim(-0.05, 1.05)
ax.legend(loc='lower right')
ax.grid(True, alpha=0.3)

# Panel B: Start not employed
ax = axes[1]
for i in range(min(50, n_sim_traj)):
    ax.plot(periods, traj_unemp[i], alpha=0.08, color='#e74c3c', linewidth=0.8)
ax.plot(periods, traj_unemp.mean(axis=0), color='darkred', linewidth=3,
        label=f'Mean ({traj_unemp.mean(axis=0)[-1]:.2f} at t={n_periods_sim})')
ax.set_xlabel('Period')
ax.set_ylabel('Employment Status / P(employed)')
ax.set_title('Start Not Employed ($y_{i,0} = 0$)', fontweight='bold')
ax.set_ylim(-0.05, 1.05)
ax.legend(loc='upper right')
ax.grid(True, alpha=0.3)

plt.suptitle('Simulated Employment Trajectories\n'
             'Individual paths (thin) and mean (thick)',
             fontsize=15, fontweight='bold', y=1.03)
plt.tight_layout()
plt.savefig(FIG_DIR / '08_simulated_trajectories.png', dpi=150, bbox_inches='tight')
plt.show()

print("Figure saved to outputs/figures/08_simulated_trajectories.png")

In [None]:
# Counterfactual: job loss shock at t=5
np.random.seed(42)

n_cf = 500
n_periods_cf = 20
sigma_u_cf = 0.8

alpha_cf = np.random.normal(0, sigma_u_cf, n_cf)

# Baseline: no forced shock (employment evolves freely from y=1)
traj_baseline = np.zeros((n_cf, n_periods_cf))
# Counterfactual: forced job loss at t=5
traj_shock = np.zeros((n_cf, n_periods_cf))

# Index without the lag term is the same in every period (covariates held at their means)
xb_fixed = intercept_sim + sum(beta_sim[v] * X_mean[v] for v in beta_sim)

for i in range(n_cf):
    y_base = 1  # Start employed
    y_shock = 1

    for t in range(n_periods_cf):
        # Common random numbers: both scenarios share the same latent error,
        # so the paths are identical before the shock and differ afterwards
        # only because of the forced job loss
        eps = np.random.normal(0, 1)

        y_base = int(xb_fixed + gamma_sim * y_base + alpha_cf[i] + eps > 0)
        traj_baseline[i, t] = y_base

        if t == 4:  # period 5 (0-indexed): forced job loss
            y_shock = 0
        else:
            y_shock = int(xb_fixed + gamma_sim * y_shock + alpha_cf[i] + eps > 0)
        traj_shock[i, t] = y_shock

# Plot counterfactual
fig, ax = plt.subplots(figsize=(12, 6))

periods_cf = np.arange(1, n_periods_cf + 1)
mean_base = traj_baseline.mean(axis=0)
mean_shock = traj_shock.mean(axis=0)

ax.plot(periods_cf, mean_base, 'b-o', linewidth=2, markersize=6,
        label='Baseline (no shock)')
ax.plot(periods_cf, mean_shock, 'r-s', linewidth=2, markersize=6,
        label='After job loss at t=5')
ax.axvline(x=5, color='gray', linestyle='--', alpha=0.7, label='Shock at t=5')

# Shade the gap
ax.fill_between(periods_cf, mean_base, mean_shock, alpha=0.15, color='red')

# Find convergence period (gap < 5%)
gap = mean_base - mean_shock
converge_idx = np.where(gap[5:] < 0.05)[0]
if len(converge_idx) > 0:
    converge_t = converge_idx[0] + 6
    ax.annotate(f'Gap < 5% at t={converge_t}', xy=(converge_t, mean_shock[converge_t-1]),
                xytext=(converge_t + 2, mean_shock[converge_t-1] - 0.1),
                arrowprops=dict(arrowstyle='->', color='black'),
                fontsize=11, fontweight='bold')

ax.set_xlabel('Period', fontsize=12)
ax.set_ylabel('Mean Employment Rate', fontsize=12)
ax.set_title('Counterfactual: Effect of Job Loss at t=5\n'
             f'(State dependence $\\gamma$ = {gamma_sim:.3f})',
             fontweight='bold')
ax.legend(loc='lower right')
ax.set_ylim(0, 1)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(FIG_DIR / '08_counterfactual_job_loss.png', dpi=150, bbox_inches='tight')
plt.show()

print("Figure saved to outputs/figures/08_counterfactual_job_loss.png")
print(f"\nImmediate effect of job loss: {gap[4]:.3f} ({gap[4] * 100:.0f} percentage points lower employment rate)")
if len(converge_idx) > 0:
    print(f"Recovery time: gap falls below 5% by period {converge_t}")
    print(f"This shows the lasting effect of state dependence: job loss echoes for ~{converge_t - 5} periods.")
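
For a fixed individual effect, the gap opened by a forced job loss shrinks by the factor $\lambda = \Phi(c + \gamma) - \Phi(c)$ each period, which gives a back-of-the-envelope recovery time. The sketch below assumes hypothetical values for the index constant `c`, for `gamma`, and for the initial gap. The helper `periods_to_recover` is illustrative and not part of the notebook.

```python
import math
from scipy.stats import norm


def periods_to_recover(c, gamma, initial_gap, threshold=0.05):
    """Smallest k with initial_gap * lam**k < threshold, lam = Phi(c + gamma) - Phi(c)."""
    if gamma <= 0:
        raise ValueError("this sketch assumes positive state dependence (gamma > 0)")
    if initial_gap < threshold:
        return 0
    lam = norm.cdf(c + gamma) - norm.cdf(c)
    return math.floor(math.log(threshold / initial_gap) / math.log(lam)) + 1


# Hypothetical index constant and initial gap, for illustration only
for gamma_demo in (0.5, 1.0, 1.5):
    k = periods_to_recover(c=-0.3, gamma=gamma_demo, initial_gap=0.6)
    print(f"gamma = {gamma_demo:.1f}: gap below 5 points after {k} period(s)")
```

A larger $\gamma$ raises $\lambda$ and lengthens recovery. In the population, the gap is a mixture over individual effects, so the simulated mean gap need not decay at a single geometric rate.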

<a id='section8'></a>

---

# Section 8: Application — Labor Force Participation Dynamics (40 min)

**Research Question**: Does having worked last year causally increase the probability of working this year, or is the observed persistence driven by unobserved preferences?

We now bring together all the tools from this notebook for a complete analysis.

In [None]:
# =====================================================
# 8.1 Exploratory Analysis
# =====================================================

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Panel A: Employment rate by year
ax = axes[0, 0]
emp_rate = data.groupby('year')['employed'].mean()
ax.bar(emp_rate.index, emp_rate.values, color='#3498db', alpha=0.8, edgecolor='black')
ax.set_xlabel('Year')
ax.set_ylabel('Employment Rate')
ax.set_title('Employment Rate Over Time', fontweight='bold')
ax.set_ylim(0, 1)
ax.grid(True, alpha=0.3, axis='y')

# Panel B: Distribution of individual employment rates
ax = axes[0, 1]
ind_rates = data.groupby('id')['employed'].mean()
ax.hist(ind_rates, bins=30, color='#2ecc71', alpha=0.8, edgecolor='black')
ax.axvline(x=ind_rates.mean(), color='red', linestyle='--', linewidth=2,
           label=f'Mean = {ind_rates.mean():.2f}')
ax.set_xlabel('Individual Employment Rate (across all years)')
ax.set_ylabel('Count')
ax.set_title('Heterogeneity in Employment Rates', fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)

# Panel C: Employment by education
ax = axes[1, 0]
# pd.cut intervals are right-closed: (7, 10], (10, 13], (13, 16], (16, 21]
data['educ_group'] = pd.cut(data['educ'], bins=[7, 10, 13, 16, 21],
                            labels=['<=10', '11-13', '14-16', '>16'])
emp_by_educ = data.groupby('educ_group', observed=True)['employed'].mean()
ax.bar(range(len(emp_by_educ)), emp_by_educ.values,
       color=['#e74c3c', '#f39c12', '#3498db', '#2ecc71'],
       alpha=0.8, edgecolor='black')
ax.set_xticks(range(len(emp_by_educ)))
ax.set_xticklabels(emp_by_educ.index)
ax.set_xlabel('Years of Education')
ax.set_ylabel('Employment Rate')
ax.set_title('Employment Rate by Education', fontweight='bold')
ax.set_ylim(0, 1)
ax.grid(True, alpha=0.3, axis='y')

# Panel D: Employment persistence by number of kids
ax = axes[1, 1]
data_with_lag = data.dropna(subset=['emp_lag'])
for n_kids, color in [(0, '#3498db'), (1, '#f39c12'), (2, '#e74c3c')]:
    subset = data_with_lag[data_with_lag['kids'] == n_kids]
    if len(subset) > 100:
        trans_k = pd.crosstab(subset['emp_lag'].astype(int), subset['employed'], normalize='index')
        persist = trans_k.loc[1, 1] - trans_k.loc[0, 1] if 1 in trans_k.index and 0 in trans_k.index else 0
        ax.bar(n_kids, persist, color=color, alpha=0.8, edgecolor='black')
        ax.text(n_kids, persist + 0.01, f'{persist:.2f}', ha='center', fontsize=11)

ax.set_xlabel('Number of Young Children')
ax.set_ylabel('Raw Persistence Gap')
ax.set_title('Persistence by Number of Kids\n(P(emp|emp_lag=1) - P(emp|emp_lag=0))',
             fontweight='bold')
ax.grid(True, alpha=0.3, axis='y')

plt.suptitle('Labor Force Participation Dynamics: Exploratory Analysis',
             fontsize=15, fontweight='bold', y=1.02)
plt.tight_layout()
plt.savefig(FIG_DIR / '08_labor_exploration.png', dpi=150, bbox_inches='tight')
plt.show()

print("Figure saved to outputs/figures/08_labor_exploration.png")

In [None]:
# =====================================================
# 8.2 Full Model Estimation Results
# =====================================================

print("=" * 70)
print(" " * 10 + "LABOR FORCE PARTICIPATION DYNAMICS")
print(" " * 10 + "Complete Analysis Results")
print("=" * 70)

# Model 1: Static Pooled Probit (no dynamics)
X_static = sm.add_constant(data_dyn[exog_vars])
static_probit = sm.Probit(data_dyn['employed'], X_static).fit(method='bfgs', disp=0)

# Model 2: Naive Dynamic Pooled Probit
# (already estimated as pooled_probit)

# Model 3: Wooldridge Pooled Probit
# (already estimated as wooldridge_pooled)

print(f"\n{'':25s} {'Static':>10s} {'Naive Dyn':>10s} {'Wooldridge':>10s}")
print("-" * 60)

for var in exog_vars:
    s = static_probit.params[var]
    n = pooled_probit.params[var]
    w = wooldridge_pooled.params[var]
    print(f"{var:<25s} {s:>10.4f} {n:>10.4f} {w:>10.4f}")

print(f"{'emp_lag (gamma)':<25s} {'—':>10s} {pooled_probit.params['emp_lag']:>10.4f} {wooldridge_pooled.params['emp_lag']:>10.4f}")
print(f"{'emp_init (delta_1)':<25s} {'—':>10s} {'—':>10s} {wooldridge_pooled.params['emp_init']:>10.4f}")

print(f"\n{'Log-likelihood':<25s} {static_probit.llf:>10.1f} {pooled_probit.llf:>10.1f} {wooldridge_pooled.llf:>10.1f}")
print(f"{'AIC':<25s} {static_probit.aic:>10.1f} {pooled_probit.aic:>10.1f} {wooldridge_pooled.aic:>10.1f}")
print(f"{'N parameters':<25s} {len(static_probit.params):>10d} {len(pooled_probit.params):>10d} {len(wooldridge_pooled.params):>10d}")

In [None]:
# =====================================================
# 8.3 Test for State Dependence: H0: gamma = 0
# =====================================================

gamma_hat = wooldridge_pooled.params['emp_lag']
se_gamma = wooldridge_pooled.bse['emp_lag']
z_gamma = gamma_hat / se_gamma
p_gamma = 2 * norm.sf(abs(z_gamma))  # survival function avoids precision loss for large |z|

print("=== Test for State Dependence ===")
print(f"\nH0: gamma = 0 (no true state dependence)")
print(f"H1: gamma != 0")
print(f"\ngamma_hat  = {gamma_hat:.4f}")
print(f"SE(gamma)  = {se_gamma:.4f}")
print(f"z-stat     = {z_gamma:.4f}")
print(f"p-value    = {p_gamma:.6f}")
print(f"\nConclusion: {'Reject H0' if p_gamma < 0.05 else 'Fail to reject H0'} at 5% significance level.")

if p_gamma < 0.05:
    direction = "increases" if gamma_hat > 0 else "decreases"
    print(f"\nThere IS significant true state dependence.")
    print(f"Under the model's assumptions, past employment {direction} current employment probability.")
    print(f"AME of emp_lag: {ame_emp_lag:.3f} ({ame_emp_lag * 100:.1f} p.p.)")
else:
    print(f"\nNo significant state dependence detected.")
    print(f"Observed persistence is explained by unobserved heterogeneity alone.")

In [None]:
# =====================================================
# 8.4 AME for key variables
# =====================================================

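X_eval = sm.add_constant(data_dyn[wooldridge_vars])
# Align columns with the parameter vector by name rather than relying on position
lp = X_eval[wooldridge_pooled.params.index].values @ wooldridge_pooled.params.values
phi = norm.pdf(lp)
mean_phi = np.mean(phi)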

print("=== Average Marginal Effects (Wooldridge Model) ===")
print(f"\n{'Variable':<20s} {'Coefficient':>12s} {'AME':>12s} {'Std Err':>12s} {'z-stat':>10s}")
print("-" * 70)

ame_results = {}
for var in exog_vars + ['emp_lag', 'emp_init']:
    coef = wooldridge_pooled.params[var]
    se = wooldridge_pooled.bse[var]
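    # Derivative-based AME; for binary regressors (e.g. emp_lag) this approximates
    # the discrete change in probability
    ame = mean_phi * coef
    # Approximate SE that treats mean_phi as fixed (not a full delta-method SE),
    # so the z-stat equals the coefficient's z-stat
    ame_se = mean_phi * se
    z = ame / ame_se if ame_se > 0 else np.nan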
    ame_results[var] = {'ame': ame, 'se': ame_se}
    sig = '***' if abs(z) > 2.576 else '**' if abs(z) > 1.96 else '*' if abs(z) > 1.645 else ''
    print(f"{var:<20s} {coef:>12.4f} {ame:>12.4f} {ame_se:>12.4f} {z:>9.2f} {sig}")

print(f"\nSignificance: *** p<0.01, ** p<0.05, * p<0.10")
print(f"\nInterpretation:")
print(f"  emp_lag:  Working last year increases P(work) by {ame_results['emp_lag']['ame']:.3f} ({ame_results['emp_lag']['ame'] * 100:.1f} p.p.)")
print(f"  kids:     Each additional child reduces P(work) by {abs(ame_results['kids']['ame']):.3f} ({abs(ame_results['kids']['ame']) * 100:.1f} p.p.)")
print(f"  educ:     Each year of education increases P(work) by {ame_results['educ']['ame']:.3f} ({ame_results['educ']['ame'] * 100:.1f} p.p.)")
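
Because `emp_lag` is binary, its partial effect can also be computed as a discrete change in predicted probability rather than with the derivative formula. The following is a minimal sketch on synthetic data. The design matrix, the coefficient vector, and the helper names `ame_discrete` and `ame_derivative` are assumptions made for illustration, not objects from the notebook.

```python
import numpy as np
from scipy.stats import norm


def ame_discrete(X, params, col):
    """Average of Phi(index | x_col = 1) - Phi(index | x_col = 0) over the sample."""
    X1, X0 = X.copy(), X.copy()
    X1[:, col] = 1.0
    X0[:, col] = 0.0
    return np.mean(norm.cdf(X1 @ params) - norm.cdf(X0 @ params))


def ame_derivative(X, params, col):
    """Derivative-based approximation: mean of phi(index) times the coefficient."""
    return np.mean(norm.pdf(X @ params)) * params[col]


# Synthetic design: constant, one continuous regressor, one binary lag (hypothetical)
rng = np.random.default_rng(0)
n = 5_000
X_demo = np.column_stack([
    np.ones(n),
    rng.normal(size=n),
    rng.integers(0, 2, size=n).astype(float),
])
params_demo = np.array([-0.2, 0.5, 1.0])  # hypothetical coefficients

print(f"Discrete-change AME of the binary lag:  {ame_discrete(X_demo, params_demo, col=2):.3f}")
print(f"Derivative-based AME of the binary lag: {ame_derivative(X_demo, params_demo, col=2):.3f}")
```

The two measures are usually close when the coefficient is small and can diverge when it is large, since the derivative formula linearizes $\Phi$ around the observed index.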

In [None]:
# =====================================================
# 8.5 Save results table
# =====================================================

# Build the variable list once and index the fitted results by name
table_vars = exog_vars + ['emp_lag', 'emp_init'] + [f'{v}_mean' for v in mean_vars]

results_table = pd.DataFrame({
    'Variable': table_vars,
    'Coefficient': wooldridge_pooled.params[table_vars].values,
    'Std_Error': wooldridge_pooled.bse[table_vars].values,
    'z_stat': wooldridge_pooled.tvalues[table_vars].values,
    'p_value': wooldridge_pooled.pvalues[table_vars].values,
})

results_table.to_csv(TABLE_DIR / '08_results_table.csv', index=False)
print("Results table saved to outputs/tables/08_results_table.csv")
print(results_table.round(4))

In [None]:
# =====================================================
# 8.6 AME visualization
# =====================================================

fig, ax = plt.subplots(figsize=(10, 6))

plot_vars = ['emp_lag', 'educ', 'age', 'kids', 'married', 'emp_init']
y_pos = np.arange(len(plot_vars))

ame_vals = [ame_results[v]['ame'] if v in ame_results else mean_phi * wooldridge_pooled.params[v] for v in plot_vars]
ame_ses = [ame_results[v]['se'] if v in ame_results else mean_phi * wooldridge_pooled.bse[v] for v in plot_vars]

colors = ['#3498db' if v > 0 else '#e74c3c' for v in ame_vals]

ax.barh(y_pos, ame_vals, xerr=[1.96 * s for s in ame_ses],
        color=colors, alpha=0.8, edgecolor='black', capsize=5)
ax.axvline(x=0, color='black', linestyle='-', linewidth=0.8)
ax.set_yticks(y_pos)
ax.set_yticklabels(plot_vars)
ax.set_xlabel('Average Marginal Effect on P(employed)')
ax.set_title('Average Marginal Effects: Wooldridge Dynamic Probit',
             fontweight='bold')
ax.grid(True, alpha=0.3, axis='x')
ax.invert_yaxis()

plt.tight_layout()
plt.savefig(FIG_DIR / '08_labor_ame.png', dpi=150, bbox_inches='tight')
plt.show()

print("Figure saved to outputs/figures/08_labor_ame.png")

In [None]:
# =====================================================
# 8.7 Generate HTML Report
# =====================================================

report_html = f"""<!DOCTYPE html>
<html>
<head><title>Labor Force Participation Dynamics Report</title>
<style>body {{font-family: Arial; margin: 40px;}}
table {{border-collapse: collapse; margin: 20px 0;}}
th, td {{border: 1px solid #ddd; padding: 8px; text-align: right;}}
th {{background-color: #3498db; color: white;}}
h1 {{color: #2c3e50;}} h2 {{color: #3498db;}}</style></head>
<body>
<h1>Labor Force Participation Dynamics Report</h1>
<p>Generated: 2026-02-17 | PanelBox Dynamic Discrete Choice Tutorial</p>

<h2>Dataset</h2>
<p>Panel of {data['id'].nunique()} women over {data['year'].nunique()} years ({len(data)} observations).</p>
<p>Employment rate: {data['employed'].mean():.1%}</p>

<h2>Research Question</h2>
<p>Does having worked last year causally increase the probability of working this year,
or is the observed persistence driven by unobserved preferences?</p>

<h2>Model Comparison</h2>
{comparison_df.to_html(index=False)}

<h2>State Dependence Test</h2>
<p><strong>H0</strong>: gamma = 0 (no true state dependence)</p>
<p>gamma = {gamma_hat:.4f} (SE = {se_gamma:.4f}), z = {z_gamma:.2f}, p = {p_gamma:.6f}</p>
<p><strong>Conclusion</strong>: {'Reject H0 — significant state dependence' if p_gamma < 0.05 else 'Fail to reject H0'}</p>

<h2>Persistence Decomposition</h2>
<ul>
<li>Total persistence (autocorrelation): {corr_total:.4f}</li>
<li>Due to state dependence: {corr_state_dep:.4f} ({corr_state_dep/corr_total:.0%})</li>
<li>Due to heterogeneity: {corr_heterog:.4f} ({corr_heterog/corr_total:.0%})</li>
</ul>

<h2>Key Findings</h2>
<ol>
<li>Significant true state dependence (gamma = {gamma_hat:.3f}): past employment causally
increases current employment probability by approximately {ame_emp_lag * 100:.1f} percentage points.</li>
<li>Initial conditions matter (delta_1 = {wooldridge_pooled.params['emp_init']:.3f}): women who
started employed have persistently higher employment rates.</li>
<li>Children significantly reduce employment probability (AME = {ame_results['kids']['ame']:.3f} per child).</li>
<li>Both state dependence and heterogeneity contribute to persistence — both temporary
programs and targeted support are needed.</li>
</ol>

<h2>Policy Implications</h2>
<ul>
<li>Temporary employment programs can have lasting effects due to state dependence.</li>
<li>Childcare support is crucial for increasing women's labor force participation.</li>
<li>Education has a positive but modest direct effect on employment probability.</li>
</ul>
</body></html>"""

with open(REPORT_DIR / '08_labor_dynamics.html', 'w') as f:
    f.write(report_html)

print("Report saved to outputs/reports/08_labor_dynamics.html")

<a id='exercises'></a>

---

# Exercises

---

## Exercise 1: Data Preparation (Easy)

**Objective**: Practice creating a dynamic dataset from raw panel data.

### Task

Starting from the raw `labor_dynamics.csv` data:

1. Create the lagged dependent variable `emp_lag`
2. Extract the initial value `emp_init` ($y_{i,0}$ — first observation per individual)
3. Compute time means of `age`, `kids`, `married`, and `husbinc`
4. Drop the first period (no lag available)
5. Verify the resulting dataset has the correct dimensions

### Questions

1. How many observations are lost when dropping the first period?
2. What is the correlation between `emp_init` and the mean of `employed` across all periods?
3. Why do we only include time-varying variables in the Mundlak means?
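
The preparation steps can be illustrated on a toy panel before applying them to the real data. The sketch below uses a hypothetical three-person panel with made-up values. The column names mirror the tutorial's, but the data and the choice to compute time means over all periods are assumptions for illustration.

```python
import pandas as pd

# Hypothetical three-person panel, purely for illustration
toy = pd.DataFrame({
    'id':       [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'year':     [2000, 2001, 2002] * 3,
    'employed': [1, 1, 0, 0, 0, 1, 1, 1, 1],
    'kids':     [0, 1, 1, 2, 2, 2, 0, 0, 1],
})
toy = toy.sort_values(['id', 'year'])

# Lag within individual, so the first period of each person is NaN
toy['emp_lag'] = toy.groupby('id')['employed'].shift(1)
# Initial condition: first observed outcome, repeated on every row of that person
toy['emp_init'] = toy.groupby('id')['employed'].transform('first')
# Mundlak term: within-person mean of a time-varying regressor (all periods used here)
toy['kids_mean'] = toy.groupby('id')['kids'].transform('mean')

# Drop rows without a lag (one per individual in a balanced panel)
toy_dyn = toy.dropna(subset=['emp_lag']).copy()
toy_dyn['emp_lag'] = toy_dyn['emp_lag'].astype(int)

print(toy_dyn)
print(f"Rows before: {len(toy)}, after: {len(toy_dyn)}")
```

Grouping by `id` before shifting is what keeps one person's last observation from leaking into the next person's first lag.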

In [None]:
# Exercise 1: Your solution here

# Step 1: Load fresh data and create lag
# ex1_data = pd.read_csv(DATA_DIR / "labor_dynamics.csv")
# ex1_data = ex1_data.sort_values(['id', 'year'])
# ex1_data['emp_lag'] = ...

# Step 2: Initial value
# ex1_data['emp_init'] = ...

# Step 3: Time means
# for var in ['age', 'kids', 'married', 'husbinc']:
#     ex1_data[f'{var}_mean'] = ...

# Step 4: Drop first period
# ex1_dyn = ...

# Step 5: Verify dimensions
# expected_n = ...
# print(f"Expected: {expected_n}, Got: {len(ex1_dyn)}")

---

## Exercise 2: Naive vs Wooldridge (Medium)

**Objective**: Quantify the bias from ignoring initial conditions.

### Task

1. Estimate a dynamic Probit **ignoring** initial conditions (only include $y_{i,t-1}$ and $X_{it}$)
2. Estimate the Wooldridge approach (include $y_{i,0}$, $\bar{X}_i$)
3. Compare $\gamma$ estimates
4. Compute the bias: $\text{bias} = \hat{\gamma}_{\text{naive}} - \hat{\gamma}_{\text{Wooldridge}}$

### Questions

1. In which direction is the naive estimate biased?
2. Why does ignoring initial conditions bias $\gamma$ upward?
3. Is $\delta_1$ (coefficient on $y_{i,0}$) statistically significant?

In [None]:
# Exercise 2: Your solution here

# Step 1: Naive dynamic probit
# X_naive_ex = sm.add_constant(data_dyn[exog_vars + ['emp_lag']])
# naive_ex = sm.Probit(data_dyn['employed'], X_naive_ex).fit(method='bfgs', disp=0)

# Step 2: Wooldridge (already estimated as wooldridge_pooled)

# Step 3: Compare gamma
# print(f"Naive gamma:      {naive_ex.params['emp_lag']:.4f}")
# print(f"Wooldridge gamma: {wooldridge_pooled.params['emp_lag']:.4f}")

# Step 4: Compute bias
# bias = naive_ex.params['emp_lag'] - wooldridge_pooled.params['emp_lag']
# print(f"Bias: {bias:+.4f}")

---

## Exercise 3: Persistence Decomposition (Medium)

**Objective**: Quantify how much observed persistence comes from state dependence vs heterogeneity.

### Task

1. Using the `simulate_persistence()` function defined in Section 6, compute:
   - Total persistence (with both $\gamma$ and $\sigma_u$)
   - Persistence from state dependence only ($\sigma_u = 0$)
   - Persistence from heterogeneity only ($\gamma = 0$)
2. Compute the share attributable to each source
3. Discuss policy implications

### Hint

Try different values of $\gamma$ (0.2, 0.5, 0.8) and $\sigma_u$ (0.3, 0.8, 1.5) to see how the decomposition changes.

In [None]:
# Exercise 3: Your solution here

# Try different parameter combinations
# for gamma_val in [0.2, 0.5, 0.8]:
#     for sigma_val in [0.3, 0.8, 1.5]:
#         total = simulate_persistence(gamma=gamma_val, sigma_u=sigma_val)
#         sd_only = simulate_persistence(gamma=gamma_val, sigma_u=0.0)
#         het_only = simulate_persistence(gamma=0.0, sigma_u=sigma_val)
#         print(f"gamma={gamma_val}, sigma={sigma_val}: "
#               f"Total={total:.3f}, SD share={sd_only/total:.0%}, Het share={het_only/total:.0%}")

---

## Exercise 4: Counterfactual Simulation (Hard)

**Objective**: Simulate trajectories under different scenarios.

### Task

1. Simulate trajectories for two individuals:
   - **Woman A**: Loses her job at $t=5$ (force $y_5 = 0$)
   - **Woman B**: Keeps her job throughout
2. Plot their expected employment paths from $t=1$ to $t=20$
3. How long does the job loss effect last? (When does the gap close to < 5%?)

### Hint

Use `simulate_trajectories()` from Section 7 or adapt the counterfactual code. Average over many simulations (e.g., 500) for smooth curves.

### Questions

1. Does the gap ever fully close? Why or why not?
2. How does the recovery time change if $\gamma$ is larger (e.g., 1.5)?
3. What does this imply for the design of employment support programs?

In [None]:
# Exercise 4: Your solution here

# Simulate with different gamma values
# for gamma_cf in [0.3, 0.8, 1.5]:
#     # Simulate baseline and shock trajectories
#     # ...
#     # Find convergence time
#     # ...
#     pass

---

## Exercise 5: Testing State Dependence (Hard)

**Objective**: Formally test for state dependence and interpret the result.

### Task

1. Estimate the Wooldridge dynamic Probit
2. Test $H_0: \gamma = 0$ (Wald test using the z-statistic)
3. If $H_0$ is not rejected, what does this imply for policy?
4. If $H_0$ is rejected, compute the Average Partial Effect of $y_{i,t-1}$
5. Interpret the APE in terms of percentage points

### Bonus

Perform a Likelihood Ratio test by comparing:
- Restricted model: Wooldridge without `emp_lag` (static CRE probit)
- Unrestricted model: full Wooldridge with `emp_lag`

$$LR = -2(\ell_{\text{restricted}} - \ell_{\text{unrestricted}}) \sim \chi^2(1)$$

In [None]:
# Exercise 5: Your solution here

# Step 1: Wald test (already computed in Section 8.3)
# gamma_hat = wooldridge_pooled.params['emp_lag']
# se_gamma = wooldridge_pooled.bse['emp_lag']
# z = gamma_hat / se_gamma
# p = 2 * (1 - norm.cdf(abs(z)))

# Step 2: LR test
# Restricted model: no emp_lag
# restricted_vars = exog_vars + ['emp_init'] + [f'{v}_mean' for v in mean_vars]
# X_restricted = sm.add_constant(data_dyn[restricted_vars])
# restricted = sm.Probit(data_dyn['employed'], X_restricted).fit(method='bfgs', disp=0)
# from scipy.stats import chi2  # not imported in Setup
# lr_stat = -2 * (restricted.llf - wooldridge_pooled.llf)
# p_lr = chi2.sf(lr_stat, df=1)
# print(f"LR stat: {lr_stat:.4f}, p-value: {p_lr:.6f}")

---

# Summary and Key Takeaways

## What We Learned

1. **Two sources of persistence**: Observed serial correlation in binary outcomes can arise from true state dependence ($\gamma$) or unobserved heterogeneity ($\alpha_i$) — or both

2. **Initial conditions problem**: In dynamic nonlinear models, $y_{i,0}$ is correlated with $\alpha_i$, creating endogeneity. Ignoring this **biases $\gamma$ upward**

3. **Wooldridge (2005) solution**: Model $\alpha_i = \delta_0 + \delta_1 y_{i,0} + \bar{X}_i'\delta_2 + u_i$, then estimate an RE Probit with augmented regressors

4. **Data preparation is critical**: Lags, initial values, time means, and dropping the first period — all must be done correctly

5. **Persistence decomposition**: Simulation-based decomposition reveals how much persistence comes from each source

6. **Counterfactual trajectories**: State dependence implies that temporary shocks (job loss) have lasting effects

## Key Formulas

| Concept | Formula |
|---------|--------|
| Dynamic model | $y^*_{it} = X_{it}'\beta + \gamma \cdot y_{i,t-1} + \alpha_i + \varepsilon_{it}$ |
| Wooldridge specification | $\alpha_i = \delta_0 + \delta_1 y_{i,0} + \bar{X}_i'\delta_2 + u_i$ |
| Resulting model | $P(y_{it}=1) = \Phi(X_{it}'\beta + \gamma y_{i,t-1} + \delta_1 y_{i,0} + \bar{X}_i'\delta_2 + u_i)$ |
| AME of lag (derivative approximation) | $\text{AME} \approx \overline{\phi(\hat{\eta}_{it})} \cdot \hat{\gamma}$, with $\hat{\eta}_{it}$ the full fitted index |
| ICC (Probit) | $\rho = \sigma^2_u / (\sigma^2_u + 1)$ |
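
As a quick numerical check of the ICC formula in the table, the sketch below evaluates it for a few values of $\sigma_u$. The helper name `probit_icc` and the grid of values are illustrative choices.

```python
def probit_icc(sigma_u):
    """Intra-class correlation of the latent error in an RE probit.

    The idiosyncratic error variance is normalized to 1.
    """
    return sigma_u**2 / (sigma_u**2 + 1.0)


for s in (0.3, 0.8, 1.5):
    print(f"sigma_u = {s:.1f} -> rho = {probit_icc(s):.3f}")
```

For $\sigma_u = 0.8$, the value used in the Section 7 simulations, this gives $\rho = 0.64 / 1.64 \approx 0.39$.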

## Common Pitfalls

1. **Forgetting initial conditions**: Naive estimation biases $\gamma$ upward
2. **Not dropping the first period**: The lag is unavailable in the first period, so that period must be dropped, at the cost of one period of data
3. **Time means**: Must include $\bar{X}_i$ (Mundlak terms) for CRE component of Wooldridge
4. **State dependence interpretation**: $\gamma$ captures lagged $y$ effect **conditional on $\alpha_i$**, not unconditional
5. **Small T**: Wooldridge works best for $T$ between 5 and 15; very short panels may have substantial bias
6. **Confusing persistence with state dependence**: High autocorrelation doesn't imply $\gamma \neq 0$
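
The last pitfall can be checked with a small simulation in which $\gamma = 0$ by construction. This is a minimal self-contained sketch with hypothetical parameter values. The helper `lag1_autocorr` is illustrative and separate from the notebook's `simulate_persistence()`.

```python
import numpy as np


def lag1_autocorr(gamma, sigma_u, n=4000, t_periods=10, seed=0):
    """Pooled lag-1 autocorrelation of y from a simulated dynamic probit."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(0.0, sigma_u, size=n)
    y = np.zeros((n, t_periods), dtype=int)
    # Initial period drawn from the static model (no lag available)
    y[:, 0] = (alpha + rng.normal(size=n) > 0).astype(int)
    for t in range(1, t_periods):
        y[:, t] = (gamma * y[:, t - 1] + alpha + rng.normal(size=n) > 0).astype(int)
    # Correlate each outcome with its own one-period lag, pooling all individuals
    return np.corrcoef(y[:, 1:].ravel(), y[:, :-1].ravel())[0, 1]


rho_het = lag1_autocorr(gamma=0.0, sigma_u=1.0)   # heterogeneity only
rho_none = lag1_autocorr(gamma=0.0, sigma_u=0.0)  # neither source of persistence
print(f"gamma = 0, sigma_u = 1: lag-1 autocorrelation = {rho_het:.3f}")
print(f"gamma = 0, sigma_u = 0: lag-1 autocorrelation = {rho_none:.3f}")
```

With no state dependence at all, heterogeneity alone produces clearly positive autocorrelation, while the case with neither source stays near zero.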

## Next Steps

- **Heckman (1981) approach**: More flexible but computationally demanding
- **Dynamic ordered models**: State dependence in ordinal outcomes
- **Feedback effects**: When $X_{it}$ depends on $y_{i,t-1}$ (predetermined variables)
- **Heterogeneous state dependence**: Allowing $\gamma$ to vary across individuals

---

## References

### Essential Reading

1. Wooldridge, J. M. (2005). Simple solutions to the initial conditions problem in dynamic, nonlinear panel data models with unobserved heterogeneity. *Journal of Applied Econometrics*, 20(1), 39-54.

2. Heckman, J. J. (1981). The incidental parameters problem and the problem of initial conditions in estimating a discrete time-discrete data stochastic process. In C. F. Manski & D. McFadden (Eds.), *Structural Analysis of Discrete Data with Econometric Applications* (pp. 179-195). MIT Press.

### Additional References

3. Arulampalam, W., & Stewart, M. B. (2009). Simplified implementation of the Heckman estimator of the dynamic probit model and a comparison with alternative estimators. *Oxford Bulletin of Economics and Statistics*, 71(5), 659-681.

4. Stewart, M. B. (2007). The interrelated dynamics of unemployment and low-wage employment. *Journal of Applied Econometrics*, 22(3), 511-531.

5. Wooldridge, J. M. (2010). *Econometric Analysis of Cross Section and Panel Data*. 2nd edition. MIT Press. Ch. 15.

---

**End of Notebook 08: Dynamic Discrete Choice Models**

You now have the tools to distinguish true state dependence from spurious persistence — a fundamental question in applied microeconomics.