# Tutorial 02: Fixed Effects Logit Models

**Version**: 1.0  
**Date**: 2026-02-16  
**Estimated Duration**: 90 minutes  
**Difficulty Level**: Intermediate

---

## Learning Objectives

By the end of this notebook, you will be able to:

1. Understand the problem of unobserved heterogeneity in binary choice models
2. Recognize why the incidental parameters problem makes the standard dummy-variable FE estimator inconsistent
3. Explain Chamberlain's conditional MLE solution for Logit models
4. Understand why Fixed Effects Probit doesn't work
5. Estimate Fixed Effects Logit models using PanelBox
6. Identify which observations contribute to estimation (switchers)
7. Interpret within-variation effects
8. Compare Pooled and FE Logit results appropriately
9. Recognize limitations and when not to use FE Logit

---

## Prerequisites

### Required
- **Notebook 01**: Binary Choice Introduction (Pooled Logit/Probit)
- Understanding of fixed effects in linear models
- Panel data structure concepts

### Recommended
- Static panel models tutorials (PooledOLS, FixedEffectsOLS)
- Maximum Likelihood Estimation basics

---

In [None]:
# Setup
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# PanelBox imports
from panelbox.models.discrete.binary import PooledLogit, PooledProbit, FixedEffectsLogit

# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11

# Set random seed for reproducibility
np.random.seed(42)

# Data directory
DATA_DIR = Path("..") / "data"

print("Setup complete!")
print(f"Data directory: {DATA_DIR}")

---

# Section 1: The Problem of Unobserved Heterogeneity

## Why Pooled Logit Can Be Biased

In **Notebook 01**, we learned about Pooled Logit models:

$$P(y_{it}=1|X_{it}) = \Lambda(X_{it}'\beta)$$

where $\Lambda(z) = \frac{e^z}{1+e^z}$ is the logistic CDF.

**Critical Assumption**: The model assumes that all individual-specific characteristics affecting $y_{it}$ are either:
1. Included in $X_{it}$, or
2. Independent of $X_{it}$

But what if there are **unobserved individual characteristics** $\alpha_i$ that:
- Affect the outcome $y_{it}$
- Correlate with the regressors $X_{it}$?

This leads to **omitted variable bias**.

### Example: Job Training Participation

Consider modeling whether person $i$ participates in job training at time $t$:
- $y_{it}$ = 1 if person $i$ participates in training at time $t$
- $X_{it}$ = (age, prior wage, experience)
- $\alpha_i$ = unobserved work motivation, ability, career orientation

**Problem**: If high-ability individuals ($\alpha_i$) tend to have higher prior wages AND are more likely to seek training, then the coefficient on prior wage will be biased.

Let's demonstrate this with a simulation:

In [None]:
print("="*70)
print("SIMULATION: Omitted Variable Bias in Pooled Logit")
print("="*70)

# Simulate data with correlated alpha_i
np.random.seed(42)
n_individuals = 500
n_periods = 5

# Generate unobserved heterogeneity
alpha_i = np.random.normal(0, 1, n_individuals)

# Education correlates with alpha_i (ability)
education = 12 + 2*alpha_i + np.random.normal(0, 1, n_individuals)

# Generate panel data
data_list = []
for i in range(n_individuals):
    for t in range(n_periods):
        age = 30 + t
        # TRUE MODEL: y* = 0.5*educ + 2.0*alpha_i + noise
        y_star = 0.5*education[i] + 2.0*alpha_i[i] + np.random.logistic(0, 1)
        y = int(y_star > 0)
        data_list.append({
            'id': i,
            'year': t,
            'y': y,
            'educ': education[i],
            'age': age,
            'true_alpha': alpha_i[i]
        })

sim_data = pd.DataFrame(data_list)

# Estimate Pooled Logit (ignoring alpha_i)
pooled_model = PooledLogit("y ~ educ + age", sim_data, "id", "year")
pooled_results = pooled_model.fit()

print("\nData Generation:")
print(f"  True model: y* = 0.5*educ + 2.0*alpha_i + logistic_error")
print(f"  Correlation(educ, alpha_i) = {np.corrcoef(education, alpha_i)[0,1]:.3f}")
print(f"  → Education and ability are positively correlated!")

print("\nEstimation Results:")
print(f"  True β_educ: 0.500")
print(f"  Pooled estimate: {pooled_results.params['educ']:.4f}")
print(f"  Bias: {pooled_results.params['educ'] - 0.50:.4f}")
print(f"  Bias percentage: {100*(pooled_results.params['educ'] - 0.50)/0.50:.1f}%")

print("\n" + "="*70)
print("CONCLUSION: Pooled Logit is BIASED when α_i correlates with X_it")
print("="*70)

### Visualizing the Problem

In [None]:
# Visualize correlation between alpha_i and education
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: Scatter plot showing correlation
individual_data = sim_data.groupby('id').first()
axes[0].scatter(individual_data['true_alpha'], individual_data['educ'], 
                alpha=0.5, s=30)
axes[0].set_xlabel('Unobserved Ability (α_i)')
axes[0].set_ylabel('Education (years)')
axes[0].set_title(f'Correlation between α_i and Education\nr = {np.corrcoef(education, alpha_i)[0,1]:.3f}')
axes[0].grid(True, alpha=0.3)

# Right: Average outcome by alpha_i quartile
individual_data['alpha_quartile'] = pd.qcut(individual_data['true_alpha'], q=4, 
                                             labels=['Q1 (Low)', 'Q2', 'Q3', 'Q4 (High)'])
quartile_means = individual_data.groupby('alpha_quartile')[['y', 'educ']].mean()

x = np.arange(len(quartile_means))
width = 0.35
axes[1].bar(x - width/2, quartile_means['educ'] - quartile_means['educ'].mean(), 
            width, label='Education (demeaned)', alpha=0.7)
axes[1].bar(x + width/2, quartile_means['y'], width, label='P(y=1)', alpha=0.7)
axes[1].set_xlabel('Ability Quartile')
axes[1].set_ylabel('Value')
axes[1].set_title('Both Education and Outcome Increase with α_i')
axes[1].set_xticks(x)
axes[1].set_xticklabels(quartile_means.index)
axes[1].legend()
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("The plots show:")
print("  LEFT: Education increases with unobserved ability α_i")
print("  RIGHT: Both education AND the outcome increase with α_i")
print("  → This confounds the causal effect of education!")

### Discussion: Why Not Use Within Transformation?

In **linear panel models**, we solve this problem with the **within transformation**:

$$y_{it} - \bar{y}_i = (X_{it} - \bar{X}_i)'\beta + (\epsilon_{it} - \bar{\epsilon}_i)$$

This eliminates $\alpha_i$ because it doesn't vary over time.

**But this doesn't work for binary choice models!**

Why not?
- Logit/Probit are **nonlinear** models
- $E[y_{it} | X_{it}, \alpha_i] = \Lambda(X_{it}'\beta + \alpha_i)$
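- Subtracting individual means does not eliminate $\alpha_i$, because $\Lambda$ is nonlinear:
  - $E[y_{it} - \bar{y}_i \mid X_i, \alpha_i] = \Lambda(X_{it}'\beta + \alpha_i) - \frac{1}{T}\sum_{s=1}^{T} \Lambda(X_{is}'\beta + \alpha_i)$, which still depends on $\alpha_i$ (in the linear model the $\alpha_i$ terms cancel exactly)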

We need a different solution!
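
A quick numerical check makes this concrete. The sketch below fixes two periods of hypothetical regressor values and a hypothetical $\beta$, then computes the expected demeaned outcome $E[y_{it} - \bar{y}_i \mid X_i, \alpha_i]$ for several values of $\alpha_i$. With an identity link (the linear model) the result is the same for every $\alpha_i$; with the logistic CDF it is not.

```python
import numpy as np

def logistic_cdf(z):
    """Logistic CDF: Lambda(z) = exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + np.exp(-z))

def expected_demeaned_outcome(x, beta, alpha, link):
    """E[y_it - ybar_i | x_i, alpha_i] for every period t."""
    p = link(beta * x + alpha)   # E[y_it | x_it, alpha_i] in each period
    return p - p.mean()

x_i = np.array([0.0, 1.0])       # hypothetical regressor values, T = 2
beta = 1.0                       # hypothetical slope

for alpha in [-2.0, 0.0, 2.0]:
    linear = expected_demeaned_outcome(x_i, beta, alpha, link=lambda z: z)
    logit = expected_demeaned_outcome(x_i, beta, alpha, link=logistic_cdf)
    print(f"alpha = {alpha:+.1f}   linear: {linear}   logit: {np.round(logit, 4)}")
```

The linear column never changes, while the logit column shrinks as $|\alpha_i|$ grows, so demeaning leaves $\alpha_i$ inside the model.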

---

# Section 2: The Incidental Parameters Problem

## Naive Approach: Add Dummy Variables

A natural idea: just estimate $\alpha_i$ along with $\beta$!

$$P(y_{it}=1|X_{it}, \alpha_i) = \Lambda(X_{it}'\beta + \alpha_i)$$

Estimate with dummy variables:
- $N$ individual dummies (one for each person)
- $K$ slope coefficients $\beta$
- Total parameters: $N + K$

### The Neyman-Scott Problem (1948)

**Problem**: As $N \to \infty$ with fixed $T$, the number of parameters grows with the sample size!

This causes **inconsistency**:
- The MLE of $\beta$ is biased
- The bias does not disappear as $N \to \infty$ with $T$ fixed
- The bias is $O(1/T)$, so it disappears only as $T \to \infty$

This is called the **incidental parameters problem**.

The $\alpha_i$ are "incidental" parameters:
- Not of direct interest
- But cause bias in estimating $\beta$ (the parameters we care about)

### Simulation: Incidental Parameters Bias

In [None]:
print("="*70)
print("SIMULATION: Incidental Parameters Bias for Different T")
print("="*70)

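# NOTE: `sim_data` from Section 1 cannot be reused here. Its `educ` is
# time-invariant (perfectly collinear with individual dummies, so beta_educ
# is not identified) and it has only 5 periods, so T = 10 and T = 20 cannot
# be formed. Instead, simulate a fresh panel for each T with a TIME-VARYING
# regressor x_it that is correlated with alpha_i.
import statsmodels.api as sm

BETA_TRUE = 0.50

def simulate_panel(n_individuals, T, beta=BETA_TRUE, seed=0):
    """Simulate y_it = 1[beta*x_it + alpha_i + logistic error > 0]."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(0, 1, n_individuals)
    ids = np.repeat(np.arange(n_individuals), T)
    x = alpha[ids] + rng.normal(0, 1, n_individuals * T)   # corr(x_it, alpha_i) > 0
    y_star = beta * x + alpha[ids] + rng.logistic(0, 1, n_individuals * T)
    return pd.DataFrame({
        'id': ids,
        'year': np.tile(np.arange(T), n_individuals),
        'x': x,
        'y': (y_star > 0).astype(int),
    })

def estimate_logit_with_dummies(data):
    """
    Estimate Logit with one dummy per individual (unconditional FE MLE).
    This is the WRONG approach for fixed T: we use it only to show the bias!
    """
    # Non-switchers push their alpha_i to +/- infinity (perfect separation) and
    # carry no information about beta, so drop them before estimation
    share_ones = data.groupby('id')['y'].transform('mean')
    d = data[(share_ones > 0) & (share_ones < 1)]

    # Full set of individual dummies and NO constant: one intercept per person
    dummies = pd.get_dummies(d['id'], prefix='id', dtype=float)
    X = pd.concat([d[['x']], dummies], axis=1)

    try:
        results = sm.Logit(d['y'], X).fit(disp=0, maxiter=200)
        return results.params['x']
    except Exception:
        return np.nan

# Test for different T
T_values = [3, 5, 10, 20]
results_bias = []

print("\nEstimating models for different T values...")
for T in T_values:
    print(f"  T = {T}...", end=" ")
    panel = simulate_panel(n_individuals=500, T=T, seed=T)
    beta_est = estimate_logit_with_dummies(panel)
    bias = beta_est - BETA_TRUE
    bias_pct = 100 * bias / BETA_TRUE
    results_bias.append({
        'T': T,
        'β_estimate': beta_est,
        'Bias': bias,
        'Bias_%': bias_pct
    })
    print(f"β = {beta_est:.4f}")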

bias_df = pd.DataFrame(results_bias)
print("\nIncidental Parameters Bias by Panel Length:")
print(bias_df.to_string(index=False))

print("\n" + "="*70)
print("CONCLUSION: Bias decreases as T increases, but remains for fixed T")
print("This approach is INCONSISTENT for fixed T as N → ∞")
print("="*70)

In [None]:
# Visualize bias vs T
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: Bias vs T
axes[0].plot(bias_df['T'], bias_df['Bias'], 'o-', linewidth=2, markersize=8)
axes[0].axhline(y=0, color='red', linestyle='--', label='No bias')
axes[0].set_xlabel('Panel Length (T)')
axes[0].set_ylabel('Bias in estimated β')
axes[0].set_title('Incidental Parameters Bias Decreases with T')
axes[0].grid(True, alpha=0.3)
axes[0].legend()

# Right: Show O(1/T) relationship
axes[1].plot(1/bias_df['T'], np.abs(bias_df['Bias']), 'o-', 
             linewidth=2, markersize=8, label='|Bias|')
axes[1].set_xlabel('1/T')
axes[1].set_ylabel('|Bias|')
axes[1].set_title('|Bias| vs 1/T (roughly linear if bias is O(1/T))')
axes[1].grid(True, alpha=0.3)
axes[1].legend()

plt.tight_layout()
plt.show()

print("Key Insight: Cannot simply add dummy variables in nonlinear models!")

### Why This Matters

The incidental parameters problem means:
1. We cannot use the standard "fixed effects" approach from linear models
2. Simply adding individual dummies gives biased estimates
3. We need a fundamentally different approach

**The solution**: Conditional Maximum Likelihood (Section 4)

---

# Section 3: Why FE Probit Doesn't Work

## The Difference Between Logit and Probit

Both Logit and Probit are binary choice models, but they use different distributions:

| Model | CDF | Distribution |
|-------|-----|-------------|
| **Logit** | $\Lambda(z) = \frac{e^z}{1+e^z}$ | Logistic (the difference of two Type I extreme value errors) |
| **Probit** | $\Phi(z) = \int_{-\infty}^{z} \phi(t)dt$ | Standard Normal |

This seemingly small difference has huge implications for fixed effects!

## Sufficient Statistics

The key to eliminating $\alpha_i$ is finding a **sufficient statistic**:

**Definition**: A statistic $S(y_i)$ is sufficient for $\alpha_i$ if the conditional distribution $P(y_i | S(y_i), X_i, \alpha_i, \beta)$ does not depend on $\alpha_i$.

### For Logit:
- $S_i = \sum_{t=1}^{T} y_{it}$ is a sufficient statistic for $\alpha_i$
- This is a **special property of the logistic distribution**
- Allows conditional MLE to eliminate $\alpha_i$

### For Probit:
- $\sum_{t=1}^{T} y_{it}$ is **NOT** a sufficient statistic for $\alpha_i$
- No simple sufficient statistic exists
- Conditional likelihood still depends on $\alpha_i$

## Why This Matters

**Logit**: We can condition on $\sum_t y_{it}$ to eliminate $\alpha_i$ → FE Logit works!

**Probit**: Conditioning doesn't eliminate $\alpha_i$ → FE Probit has no simple solution!
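
The sufficiency claim can be checked numerically for $T=2$. The sketch below takes hypothetical index values $X_{i1}'\beta$ and $X_{i2}'\beta$ and computes $P(y_1=0, y_2=1 \mid y_1+y_2=1)$ directly from the joint probabilities under each link, for several values of $\alpha_i$. The normal CDF is built from `math.erf` to keep the example dependency-free.

```python
import math

def logit_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def probit_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_01_given_sum1(xb1, xb2, alpha, cdf):
    """P(y1=0, y2=1 | y1+y2=1) computed from the joint probabilities."""
    p1, p2 = cdf(xb1 + alpha), cdf(xb2 + alpha)   # P(y_t = 1 | X, alpha)
    p01 = (1 - p1) * p2                           # pattern A: (0, 1)
    p10 = p1 * (1 - p2)                           # pattern B: (1, 0)
    return p01 / (p01 + p10)

xb1, xb2 = 0.2, 1.0   # hypothetical index values X_i1'beta and X_i2'beta
for alpha in [-2.0, 0.0, 2.0]:
    lg = prob_01_given_sum1(xb1, xb2, alpha, logit_cdf)
    pb = prob_01_given_sum1(xb1, xb2, alpha, probit_cdf)
    print(f"alpha = {alpha:+.1f}   logit: {lg:.6f}   probit: {pb:.6f}")

# Closed form for the logit case, which contains no alpha
print("logit closed form:", math.exp(xb2) / (math.exp(xb1) + math.exp(xb2)))
```

The logit column is identical for every $\alpha_i$ and matches the closed form; the probit column drifts with $\alpha_i$, which is exactly why conditioning on $\sum_t y_{it}$ does not rescue FE Probit.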

In [None]:
print("="*70)
print("FE Logit vs FE Probit: Summary")
print("="*70)

comparison_data = {
    'Property': [
        'Distribution',
        'Sufficient Statistic for α_i',
        'Conditional Likelihood',
        'FE Estimator',
        'Consistency',
        'PanelBox Support'
    ],
    'Logit': [
        'Logistic',
        'Σ_t y_it',
        'Does not depend on α_i',
        'Chamberlain (1980) Conditional MLE',
        'Consistent for fixed T',
        'FixedEffectsLogit'
    ],
    'Probit': [
        'Normal',
        'None (simple)',
        'Still depends on α_i',
        'No simple solution',
        'Requires T → ∞',
        'Not available'
    ]
}

comparison_df = pd.DataFrame(comparison_data)
print("\n", comparison_df.to_string(index=False))

print("\n" + "="*70)
print("CONCLUSION: Fixed Effects Probit does NOT have a simple estimator!")
print("="*70)

## Alternatives for Probit

If you need a Probit-like model with unobserved heterogeneity, consider:

### 1. Correlated Random Effects (CRE)
- Model: $\alpha_i = \bar{X}_i'\lambda + u_i$
- Allows correlation between $\alpha_i$ and time-varying $X_{it}$ through $\bar{X}_i$
- **See Notebook 03** for details

### 2. Bias-Corrected FE Estimators
- Fernández-Val (2009), Hahn & Kuersteiner (2002)
- Correct the $O(1/T)$ bias
- Advanced techniques, not yet in PanelBox

### 3. Semiparametric Estimators
- Manski (1987): maximum score estimator
- Honoré & Kyriazidou (2000) for dynamic models
- Computationally intensive
- Not in PanelBox

**For this notebook, we focus on FE Logit - the workhorse for applied work.**

---

# Section 4: Chamberlain's Conditional MLE Solution

## The Conditioning Insight (Chamberlain, 1980)

Instead of maximizing the unconditional likelihood, which treats each $\alpha_i$ as a parameter to estimate:
$$L(\beta, \alpha) = \prod_{i=1}^{N} P(y_{i1}, ..., y_{iT} | X_i, \alpha_i, \beta)$$

Chamberlain proposed maximizing the **conditional likelihood**:
$$L_c(\beta) = \prod_{i=1}^{N} P(y_{i1}, ..., y_{iT} | \sum_t y_{it}, X_i, \beta)$$

**Key insight**: By conditioning on the sufficient statistic $\sum_t y_{it}$, the individual effect $\alpha_i$ cancels out!

## Mathematical Intuition: T=2 Case

Consider an individual with $\sum_{t=1}^{2} y_{it} = 1$ (exactly one success in two periods).

Two possible sequences:
- Pattern A: $(y_1=0, y_2=1)$
- Pattern B: $(y_1=1, y_2=0)$

The conditional probability is:
$$P(y_1=0, y_2=1 | \sum_t y_t = 1, X, \alpha_i) = \frac{P(y_1=0, y_2=1)}{P(y_1=0, y_2=1) + P(y_1=1, y_2=0)}$$

### The Magic: $\alpha_i$ Cancels!

After some algebra (see Wooldridge Ch. 15.8.3):

$$P(y_1=0, y_2=1 | \sum_t y_t = 1, X, \alpha_i) = \frac{\exp(X_2'\beta)}{\exp(X_1'\beta) + \exp(X_2'\beta)}$$

Notice: **No $\alpha_i$ appears!** It has been eliminated by the conditioning.

This allows us to estimate $\beta$ consistently without ever estimating $\alpha_i$.
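
The "some algebra" step can be written out. Using $\Lambda(z) = e^z/(1+e^z)$ and $1-\Lambda(z) = 1/(1+e^z)$, the two joint probabilities are:

$$P(y_1=0, y_2=1 | X, \alpha_i) = \frac{1}{1+e^{X_1'\beta+\alpha_i}} \cdot \frac{e^{X_2'\beta+\alpha_i}}{1+e^{X_2'\beta+\alpha_i}}, \qquad P(y_1=1, y_2=0 | X, \alpha_i) = \frac{e^{X_1'\beta+\alpha_i}}{1+e^{X_1'\beta+\alpha_i}} \cdot \frac{1}{1+e^{X_2'\beta+\alpha_i}}$$

Both share the denominator $(1+e^{X_1'\beta+\alpha_i})(1+e^{X_2'\beta+\alpha_i})$, which cancels in the ratio; the common factor $e^{\alpha_i}$ then cancels as well:

$$P(y_1=0, y_2=1 | \sum_t y_t = 1, X, \alpha_i) = \frac{e^{X_2'\beta+\alpha_i}}{e^{X_1'\beta+\alpha_i} + e^{X_2'\beta+\alpha_i}} = \frac{e^{\alpha_i}\, e^{X_2'\beta}}{e^{\alpha_i}\left(e^{X_1'\beta} + e^{X_2'\beta}\right)} = \frac{\exp(X_2'\beta)}{\exp(X_1'\beta) + \exp(X_2'\beta)} = \Lambda\big((X_2 - X_1)'\beta\big)$$

So for $T=2$ the conditional MLE is a logit on first-differenced regressors. The cancellation relies on the logit odds $\Lambda(z)/(1-\Lambda(z)) = e^{z}$ being multiplicative in $e^{\alpha_i}$; the probit odds $\Phi(z)/(1-\Phi(z))$ have no such form, so $\alpha_i$ survives.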

In [None]:
print("="*70)
print("Chamberlain's Conditional MLE (Intuition for T=2)")
print("="*70)

print("\nSetup:")
print("  - Individual i has T=2 periods")
print("  - Outcome sum: Σ_t y_it = 1 (one success)")
print("  - Two possible patterns:")
print("      Pattern A: (y₁=0, y₂=1)")
print("      Pattern B: (y₁=1, y₂=0)")

print("\nStandard (Marginal) Likelihood:")
print("  P(y₁, y₂ | X, α_i, β) depends on both β AND α_i")
print("  → Cannot estimate β without estimating α_i")
print("  → Incidental parameters problem!")

print("\nConditional Likelihood:")
print("  P(y₁=0, y₂=1 | Σy=1, X, β) = exp(X₂'β) / [exp(X₁'β) + exp(X₂'β)]")
print("  → Depends ONLY on β, not α_i!")
print("  → α_i has been eliminated by conditioning")

print("\nKey Property (Logistic Distribution):")
print("  Σ_t y_it is a SUFFICIENT STATISTIC for α_i")
print("  → This is special to the logistic distribution")
print("  → Does not work for Probit (normal distribution)")

print("\n" + "="*70)
print("CONCLUSION: Conditional MLE gives consistent β without estimating α_i")
print("="*70)

## Generalization to Arbitrary T

The same principle extends to any $T$:
1. Condition on $s_i = \sum_{t=1}^{T} y_{it}$
2. Consider all possible sequences with that sum
3. Calculate conditional probabilities
4. $\alpha_i$ cancels in all cases!

The conditional MLE maximizes:
$$L_c(\beta) = \prod_{i: 0 < s_i < T} P(y_{i1}, ..., y_{iT} | s_i, X_i, \beta)$$

**Note**: Only individuals with $0 < s_i < T$ contribute! (More on this in Section 5)
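
For general $T$, the logit conditional probability of an observed sequence is

$$P(y_{i1}, ..., y_{iT} | s_i, X_i, \beta) = \frac{\exp\left(\sum_t y_{it} X_{it}'\beta\right)}{\sum_{d \in B_i} \exp\left(\sum_t d_t X_{it}'\beta\right)}, \qquad B_i = \{d \in \{0,1\}^T : \textstyle\sum_t d_t = s_i\}$$

The sketch below evaluates this for one individual with a scalar regressor by brute-force enumeration with `itertools.combinations` (fine for small $T$; the number of sequences grows as $\binom{T}{s_i}$), then cross-checks it against $P(y_i \mid \alpha_i) / P(s_i \mid \alpha_i)$ computed from the full logit model at several values of $\alpha_i$. The data and $\beta$ are hypothetical, and the function names are illustrative, not part of PanelBox.

```python
import itertools
import numpy as np

def cond_loglik_individual(y, x, beta):
    """Log of P(y_i | s_i, x_i, beta) for a fixed effects logit (scalar x)."""
    y, x = np.asarray(y), np.asarray(x, dtype=float)
    T, s = len(y), int(y.sum())
    if s == 0 or s == T:
        return 0.0   # only one sequence has this sum: probability 1, no information
    # Denominator: every sequence with sum s, i.e. every choice of s periods set to 1
    log_terms = [beta * x[list(ones)].sum() for ones in itertools.combinations(range(T), s)]
    return beta * (y * x).sum() - np.logaddexp.reduce(log_terms)

def brute_force_conditional(y, x, beta, alpha):
    """Same quantity from the full model: P(y_i | alpha) / P(s_i | alpha)."""
    y, x = np.asarray(y), np.asarray(x, dtype=float)
    p = 1.0 / (1.0 + np.exp(-(beta * x + alpha)))    # P(y_it = 1 | x_it, alpha_i)

    def seq_prob(d):
        d = np.asarray(d)
        return np.prod(np.where(d == 1, p, 1 - p))

    T, s = len(y), int(y.sum())
    p_sum = sum(seq_prob(d) for d in itertools.product([0, 1], repeat=T) if sum(d) == s)
    return seq_prob(y) / p_sum

y_i = [0, 1, 1, 0]                 # hypothetical switcher, T = 4
x_i = [0.5, 1.2, -0.3, 0.8]        # hypothetical regressor values
beta = 0.7                         # hypothetical slope

print("conditional MLE formula:", np.exp(cond_loglik_individual(y_i, x_i, beta)))
for alpha in [-1.5, 0.0, 1.5]:
    print(f"full model, alpha = {alpha:+.1f}:", brute_force_conditional(y_i, x_i, beta, alpha))
```

Every line prints the same probability, confirming that $\alpha_i$ drops out for any $T$, and the early return shows why non-switchers contribute nothing to $L_c(\beta)$.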

---

# Section 5: Identification - Who Contributes to Estimation?

## Switchers vs Non-Switchers

Not all observations contribute to FE Logit estimation!

Individuals fall into three categories:

1. **Switchers**: $0 < \sum_t y_{it} < T_i$ (some 0s, some 1s)
   - Have temporal variation in $y_{it}$
   - **Contribute to estimation**

2. **Always 0**: $\sum_t y_{it} = 0$ (never $y=1$)
   - No variation in outcome
   - **Dropped from estimation**

3. **Always 1**: $\sum_t y_{it} = T_i$ (always $y=1$)
   - No variation in outcome
   - **Dropped from estimation**

## Why Non-Switchers Are Dropped

**Intuition**: Without temporal variation in $y_{it}$, we cannot separate:
- Individual effect $\alpha_i$ (time-invariant)
- Effect of changing $X_{it}$ (what $\beta$ measures)

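FE models identify effects from **within-individual variation**. Formally, when $s_i = 0$ or $s_i = T_i$ only one sequence has that sum, so its conditional probability equals 1 for every value of $\beta$ and the individual adds nothing to $L_c(\beta)$.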

## Practical Implications

Sample loss can be substantial if:
- $T$ is small (fewer opportunities to switch)
- $y_{it}$ is rare (few 1s) or very common (few 0s)
- Outcome is very persistent
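
A back-of-the-envelope calculation shows how fast the usable sample shrinks. Under the simplifying (and usually unrealistic) assumption that each period's outcome is an independent Bernoulli draw with the same probability $p$ for everyone, an individual is a switcher with probability $1 - p^T - (1-p)^T$. Because this expression is concave in $p$, heterogeneity in $p$ across individuals (which is what $\alpha_i$ creates) can only lower the expected share, so read the table as optimistic. The function name is illustrative.

```python
import pandas as pd

def switcher_share(p, T):
    """P(0 < sum_t y_it < T) when y_i1, ..., y_iT are i.i.d. Bernoulli(p)."""
    return 1.0 - p**T - (1.0 - p)**T

p_values = [0.05, 0.10, 0.25, 0.50]
panel_lengths = [2, 3, 5, 10]
switch_table = pd.DataFrame(
    {f"T={T}": [switcher_share(p, T) for p in p_values] for T in panel_lengths},
    index=[f"p={p:.2f}" for p in p_values],
)
print((100 * switch_table).round(1))   # expected % of individuals who are switchers
```

With a rare outcome ($p = 0.05$) and $T = 2$, fewer than 10% of individuals are expected to switch, even before any persistence is taken into account.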

### Analysis: Job Training Data

In [None]:
# Load job training data
data = pd.read_csv(DATA_DIR / "job_training.csv")

print("="*70)
print("Job Training Dataset")
print("="*70)
print(f"\nTotal observations: {len(data)}")
print(f"Individuals: {data['id'].nunique()}")
print(f"Time periods: {data['year'].nunique()}")
print(f"Years: {data['year'].min()} - {data['year'].max()}")

print("\nVariables:")
print("  training: 1 if participated in training, 0 otherwise")
print("  employed: 1 if employed, 0 otherwise")
print("  age: age in years")
print("  prior_wage: hourly wage in previous job")
print("  experience: years of work experience")
print("  education: years of education (time-invariant)")

print("\n" + data.head(10).to_string())

In [None]:
# Calculate sum of training by individual
switcher_analysis = data.groupby('id')['training'].agg([
    ('sum_y', 'sum'),
    ('n_periods', 'count')
]).reset_index()

# Classify individuals
switcher_analysis['type'] = 'Switcher'
switcher_analysis.loc[switcher_analysis['sum_y'] == 0, 'type'] = 'Always 0'
switcher_analysis.loc[switcher_analysis['sum_y'] == switcher_analysis['n_periods'], 'type'] = 'Always 1'

# Summary statistics
print("="*70)
print("Switcher Analysis: Training Participation")
print("="*70)

type_counts = switcher_analysis['type'].value_counts()
print("\nDistribution of Individuals:")
for idx in ['Switcher', 'Always 0', 'Always 1']:
    if idx in type_counts.index:
        count = type_counts[idx]
        pct = 100 * count / len(switcher_analysis)
        status = "✓ Used" if idx == 'Switcher' else "✗ Dropped"
        print(f"  {idx:12s}: {count:4d} ({pct:5.1f}%)  [{status}]")

n_switchers = (switcher_analysis['type'] == 'Switcher').sum()
n_total = len(switcher_analysis)
utilization = 100 * n_switchers / n_total

print(f"\nUtilization Rate: {utilization:.1f}% of individuals used")
print(f"Sample Loss: {100 - utilization:.1f}% of individuals dropped")

if utilization < 30:
    print("\n⚠️  WARNING: Very low utilization! Consider alternatives to FE Logit.")
elif utilization < 50:
    print("\n⚠️  CAUTION: Moderate utilization. Check robustness.")
else:
    print("\n✓ Good utilization rate for FE Logit.")

In [None]:
# Visualize switcher distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: Pie chart
type_counts = switcher_analysis['type'].value_counts()
colors = {'Switcher': '#2ecc71', 'Always 0': '#e74c3c', 'Always 1': '#e67e22'}
pie_colors = [colors[t] for t in type_counts.index]

axes[0].pie(type_counts, labels=type_counts.index, autopct='%1.1f%%', 
            startangle=90, colors=pie_colors, textprops={'fontsize': 12})
axes[0].set_title('Distribution of Individual Types', fontsize=14, fontweight='bold')

# Right: Histogram of sum_y
max_periods = switcher_analysis['n_periods'].max()
bins = np.arange(-0.5, max_periods + 1.5, 1)
axes[1].hist(switcher_analysis['sum_y'], bins=bins, edgecolor='black', 
             alpha=0.7, color='steelblue')
axes[1].axvline(x=0, color='red', linestyle='--', linewidth=2, 
                label='Dropped: sum = 0')
axes[1].axvline(x=max_periods, color='orange', linestyle='--', linewidth=2,
                label=f'Dropped: sum = {max_periods}')
axes[1].set_xlabel('Σ_t y_it (Number of periods in training)', fontsize=11)
axes[1].set_ylabel('Number of Individuals', fontsize=11)
axes[1].set_title('Distribution of Training Sums\n(Red/Orange lines show dropped individuals)', 
                  fontsize=14, fontweight='bold')
axes[1].legend(fontsize=10)
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("\nKey Insight:")
print("  Only individuals with 0 < Σ_t y_it < T contribute to FE Logit")
print("  These 'switchers' provide within-individual variation")

### Visualizing Individual Trajectories

In [None]:
# Plot individual trajectories for each type
fig, axes = plt.subplots(1, 3, figsize=(16, 4))

for idx, (cat, ax) in enumerate(zip(['Switcher', 'Always 0', 'Always 1'], axes)):
    # Get individuals of this type
    ids_in_cat = switcher_analysis[switcher_analysis['type'] == cat]['id'].values[:10]
    
    for id_val in ids_in_cat:
        ind_data = data[data['id'] == id_val].sort_values('year')
        ax.plot(ind_data['year'], ind_data['training'], alpha=0.6, marker='o', markersize=4)
    
    ax.set_title(f'{cat}\n({type_counts.get(cat, 0)} individuals)', 
                 fontsize=12, fontweight='bold')
    ax.set_xlabel('Year')
    ax.set_ylabel('Training (1=Yes, 0=No)')
    ax.set_ylim(-0.1, 1.1)
    ax.grid(True, alpha=0.3)
    
    if cat == 'Switcher':
        ax.set_facecolor('#e8f8f5')
        ax.text(0.5, 0.95, 'USED in FE Logit', transform=ax.transAxes,
                ha='center', va='top', fontweight='bold', color='green')
    else:
        ax.set_facecolor('#fadbd8')
        ax.text(0.5, 0.95, 'DROPPED', transform=ax.transAxes,
                ha='center', va='top', fontweight='bold', color='red')

plt.suptitle('Individual Training Trajectories by Type (First 10 in each category)', 
             fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

## Key Takeaways

1. **Only switchers contribute**: FE Logit uses only individuals with $0 < \sum_t y_{it} < T$
2. **Sample loss**: Can be substantial, especially with:
   - Short panels (small $T$)
   - Rare or very common outcomes
   - Persistent outcomes
3. **Check utilization**: Always report what fraction of individuals are used
4. **Within variation**: FE identifies effects from changes over time within same individual

---

# Section 6: Within vs Between Variation

## What FE Logit Identifies

FE Logit uses **only within-individual variation**:
- How changes in $X_{it}$ affect changes in $y_{it}$ **for the same individual**
- Between-individual variation is absorbed by $\alpha_i$

### Decomposition of Variation

For any variable $X_{it}$:
$$X_{it} = \underbrace{\bar{X}_i}_{\text{between}} + \underbrace{(X_{it} - \bar{X}_i)}_{\text{within}}$$

- **Between variation**: Differences across individuals ($\bar{X}_i$)
- **Within variation**: Changes over time within individual ($X_{it} - \bar{X}_i$)

**Pooled Logit**: Uses both within and between variation

**FE Logit**: Uses only within variation
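
In pandas the decomposition is one `groupby(...).transform('mean')` away. The sketch below uses a tiny hypothetical panel (not the job training data) with one time-varying and one time-invariant column, and reports the standard deviation of each component. The helper name is illustrative.

```python
import pandas as pd

def within_between(df, entity, var):
    """Split df[var] into between (entity mean) and within (deviation) parts."""
    between = df.groupby(entity)[var].transform('mean')   # X_bar_i, repeated over t
    within = df[var] - between                            # X_it - X_bar_i
    return between, within

# Hypothetical 3-person, 3-period panel
toy_panel = pd.DataFrame({
    'id':        [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'age':       [30, 31, 32, 45, 46, 47, 25, 26, 27],   # time-varying
    'education': [12, 12, 12, 16, 16, 16, 10, 10, 10],   # time-invariant
})

for var in ['age', 'education']:
    between, within = within_between(toy_panel, 'id', var)
    print(f"{var:10s} between SD = {between.std():6.3f}   within SD = {within.std():6.3f}")
```

`education` has a within SD of exactly zero, which is the numerical signature of a variable FE Logit cannot identify.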

## Time-Invariant Variables

Variables that don't change over time have:
- **Zero within variation**: $X_{it} - \bar{X}_i = 0$ for all $t$
- **Only between variation**: Differences across individuals

These are **perfectly collinear** with $\alpha_i$ and:
- Cannot be identified in FE models
- Will be automatically dropped or cause an error

### Examples of Time-Invariant Variables
- Gender, race, ethnicity
- Place of birth
- Education (if completed before panel starts)
- Industry (if firms don't switch industries)

### Demonstration

In [None]:
print("="*70)
print("Attempting to Include Time-Invariant Variable in FE Logit")
print("="*70)

# Education is time-invariant in our data
print("\nChecking if 'education' varies over time:")
edu_variation = data.groupby('id')['education'].agg(['min', 'max', 'std'])
print(f"  Max within-individual std dev: {edu_variation['std'].max():.6f}")
print(f"  → Education does NOT vary over time (time-invariant)")

print("\nAttempting FE Logit with time-invariant variable...")
try:
    # This will fail or drop the variable
    model_bad = FixedEffectsLogit(
        "training ~ age + prior_wage + education",  # education is time-invariant!
        data, "id", "year"
    )
    results_bad = model_bad.fit()
    
    # Check if education was dropped
    if 'education' not in results_bad.params.index:
        print("  ✗ Education was automatically DROPPED from the model")
        print("  Reason: No within-individual variation")
    else:
        print("  Education coefficient:", results_bad.params['education'])
        
except Exception as e:
    print(f"  ✗ Error: {type(e).__name__}")
    print(f"  Message: {str(e)}")
    
print("\n" + "="*70)
print("CONCLUSION: Time-invariant variables cannot be included in FE models")
print("They are absorbed by the individual fixed effect α_i")
print("="*70)

In [None]:
# Correct specification: Only time-varying variables
print("\n" + "="*70)
print("Correct FE Logit Specification (Time-Varying Only)")
print("="*70)

model_good = FixedEffectsLogit(
    "employed ~ training + age + prior_wage",  # all time-varying
    data, "id", "year"
)
results_good = model_good.fit()

print("\nModel: employed ~ training + age + prior_wage")
print("\nCoefficients:")
for var in results_good.params.index:
    print(f"  {var:12s}: {results_good.params[var]:7.4f}  (SE: {results_good.std_errors[var]:.4f})")
    
print(f"\nNumber of individuals used: {model_good.n_used_entities}")
print("\n✓ Model estimated successfully with time-varying variables only")

## Interpretation: Within Effects

FE Logit coefficients measure **within effects**:

**Example**: Coefficient on `training`
- **NOT**: Difference in employment between trained vs untrained people
- **YES**: Change in the log-odds of employment when the **same person** starts training ($e^{\beta}$ is the within-person odds ratio)

This is a **causal interpretation** under standard assumptions (in particular, strict exogeneity of $X_{it}$ conditional on $\alpha_i$):
- Time-invariant confounders are controlled for
- Unobserved heterogeneity $\alpha_i$ is eliminated
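
Because $\beta$ lives on the log-odds scale, $e^{\beta}$ is the same odds ratio for everyone, while the implied change in probability depends on the person's $\alpha_i$, which conditional MLE never estimates. The sketch below uses a hypothetical coefficient of 0.8 (not an estimate from the data) to show how one $\beta$ maps to different probability changes.

```python
import numpy as np

def logistic_cdf(z):
    return 1.0 / (1.0 + np.exp(-z))

beta_training = 0.8                      # hypothetical FE Logit coefficient
print(f"Odds ratio exp(beta) = {np.exp(beta_training):.3f} (same for everyone)")

# Change in P(employed) when training switches 0 -> 1; all other terms are
# held fixed and folded into the individual intercept alpha_i
for alpha in [-3.0, -1.0, 0.0, 1.0, 3.0]:
    delta_p = logistic_cdf(alpha + beta_training) - logistic_cdf(alpha)
    print(f"alpha_i = {alpha:+.1f}: change in probability = {delta_p:.3f}")
```

The probability effect is largest for individuals near $P = 0.5$ and small for those with very high or very low $\alpha_i$, so FE Logit coefficients should be reported as log-odds or odds ratios rather than as probability changes.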

### Comparison with Pooled Logit

| Model | Variation Used | Interpretation | If $\alpha_i$ correlates with $X$ |
|-------|---------------|----------------|-----------------------------------|
| **Pooled** | Within + Between | Cross-sectional comparison | Inconsistent (biased) |
| **FE** | Within only | Within-individual change | Consistent for fixed $T$ |

**Trade-off**: FE removes bias but loses efficiency (larger standard errors)

---

# Section 7: Implementation in PanelBox

## Basic Usage

Estimating FE Logit in PanelBox is straightforward:

In [None]:
print("="*70)
print("Fixed Effects Logit in PanelBox")
print("="*70)

# Estimate FE Logit
fe_model = FixedEffectsLogit(
    formula="employed ~ training + age + prior_wage",
    data=data,
    entity_col="id",
    time_col="year"
)

fe_results = fe_model.fit()

# Display summary
print("\n", fe_results.summary())

In [None]:
# Diagnostic information
print("="*70)
print("Diagnostic Information")
print("="*70)

print(f"\nSample Information:")
print(f"  Total individuals in data: {data['id'].nunique()}")
print(f"  Individuals used (switchers): {fe_model.n_used_entities}")

n_dropped = data['id'].nunique() - fe_model.n_used_entities
utilization = 100 * fe_model.n_used_entities / data['id'].nunique()

print(f"  Individuals dropped: {n_dropped}")
print(f"  Utilization rate: {utilization:.1f}%")

# Dropped entities (first few)
if hasattr(fe_model, 'dropped_entities') and len(fe_model.dropped_entities) > 0:
    print(f"\nFirst 10 dropped entity IDs: {fe_model.dropped_entities[:10]}")

## Standard Error Options

PanelBox supports various standard error calculations:

In [None]:
print("="*70)
print("Standard Error Options")
print("="*70)

print("\nPanelBox supports various standard error types:")
print("  - 'cluster': Cluster-robust SE (clustered by entity) - RECOMMENDED")
print("  - 'robust': Heteroskedasticity-robust SE")
print("  - 'nonrobust': Classical SE (not recommended for panel data)")

print("\nOur FE model was estimated with cluster-robust SE:")
print("\nCoefficients and Standard Errors:")
se_table = pd.DataFrame({
    'Coefficient': fe_results.params,
    'SE (Cluster-Robust)': fe_results.std_errors,
    't-value': fe_results.tvalues,
    'p-value': fe_results.pvalues
})
print(se_table)

print("\nRecommendation: Use cluster-robust SE for panel data")
print("  - Accounts for within-individual correlation")
print("  - More conservative (larger SE)")
print("  - Reduces risk of over-rejection")

## Key Methods and Attributes

```python
# Model attributes
fe_model.n_entities          # Total number of entities
fe_model.n_used_entities     # Number of switchers (used)
fe_model.dropped_entities    # List of dropped entity IDs
fe_model.formula             # Model formula

# Results methods
fe_results.summary()         # Print summary table
fe_results.params            # Coefficient estimates
fe_results.std_errors        # Standard errors
fe_results.pvalues           # P-values
fe_results.conf_int()        # Confidence intervals
```

---

# Section 8: Comparing Pooled vs FE Logit

## Why Compare?

Comparing Pooled and FE estimates helps us understand:
1. **Magnitude of bias** from unobserved heterogeneity
2. **Whether FE is necessary** for this application
3. **Robustness** of findings

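**Large differences** suggest important unobserved heterogeneity → FE is needed

**Small differences** suggest Pooled may be adequate (more efficient)

**Caveat**: Pooled Logit coefficients are typically attenuated toward zero whenever $\alpha_i$ varies across individuals, even if $\alpha_i$ is independent of $X_{it}$, because the omitted $\alpha_i$ inflates the variance of the composite error. Part of any Pooled vs FE gap therefore reflects this change of scale, not only correlation between $\alpha_i$ and $X_{it}$.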

## Side-by-Side Comparison

In [None]:
print("="*70)
print("Pooled vs Fixed Effects Logit Comparison")
print("="*70)

# Estimate Pooled Logit
pooled_model = PooledLogit(
    "employed ~ training + age + prior_wage", 
    data, "id", "year"
)
pooled_results = pooled_model.fit(cov_type='cluster')

# Re-fit FE Logit with cluster-robust SEs for comparability with Pooled
fe_results = fe_model.fit(cov_type='cluster')

# Create comparison table
comparison = pd.DataFrame({
    'Pooled_β': pooled_results.params,
    'Pooled_SE': pooled_results.std_errors,
    'FE_β': fe_results.params,
    'FE_SE': fe_results.std_errors,
}).dropna()  # keep only slopes estimated by both models (FE Logit has no intercept)
comparison['Difference'] = comparison['Pooled_β'] - comparison['FE_β']
comparison['Diff_%'] = 100 * comparison['Difference'] / comparison['Pooled_β'].abs()

print("\nCoefficient Comparison:")
print(comparison.to_string())

print("\n" + "="*70)
print("Interpretation:")
for var in comparison.index:
    diff_pct = comparison.loc[var, 'Diff_%']
    if abs(diff_pct) > 20:
        print(f"  {var}: LARGE difference ({diff_pct:.1f}%) → unobserved heterogeneity important!")
    else:
        print(f"  {var}: Small difference ({diff_pct:.1f}%) → Pooled may be adequate")

## Visual Comparison: Forest Plot

In [None]:
# Forest plot comparing Pooled vs FE estimates
fig, ax = plt.subplots(figsize=(12, 6))

variables = comparison.index
y_pos = np.arange(len(variables))

# Plot coefficients with 95% confidence intervals
offset = 0.15
ax.errorbar(comparison['Pooled_β'], y_pos - offset,
            xerr=1.96 * comparison['Pooled_SE'],
            fmt='o', label='Pooled Logit', capsize=5,
            color='#3498db', markersize=8, linewidth=2)
ax.errorbar(comparison['FE_β'], y_pos + offset,
            xerr=1.96 * comparison['FE_SE'],
            fmt='s', label='FE Logit', capsize=5,
            color='#e74c3c', markersize=8, linewidth=2)

ax.set_yticks(y_pos)
ax.set_yticklabels(variables, fontsize=11)
ax.set_xlabel('Coefficient Estimate', fontsize=12, fontweight='bold')
ax.set_title('Comparison: Pooled vs Fixed Effects Logit (95% CI)',
             fontsize=14, fontweight='bold')
ax.axvline(x=0, color='black', linestyle='--', linewidth=1, alpha=0.5)
ax.legend(fontsize=11, loc='best')
ax.grid(True, alpha=0.3, axis='x')

plt.tight_layout()
plt.show()

print("\nForest Plot Interpretation:")
print("  - Points show coefficient estimates")
print("  - Lines show 95% confidence intervals")
print("  - Non-overlapping intervals indicate a difference; overlap does not rule one out")

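## Informal Hausman-Type Test

A Hausman test asks whether two estimators differ systematically. One estimator is consistent whether or not $\alpha_i$ is correlated with $X_{it}$ (FE Logit). The other is efficient under the null but inconsistent otherwise. For binary outcomes this is typically Random Effects Logit, covered in Notebook 03.

**Null hypothesis**: $\alpha_i$ is uncorrelated with $X_{it}$, so both estimators are consistent

**Note**: Pooled Logit is not a clean comparator, because its coefficients are on a different scale even under the null: they are attenuated by the variance of $\alpha_i$. The comparison below is therefore informal. Treat it as a diagnostic, not a formal test.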
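For reference, the classic Hausman statistic is $H = d'(V_{FE} - V_{E})^{-1}d$ with $d = \hat\beta_{FE} - \hat\beta_{E}$. Here $E$ is an estimator that is efficient under the null; for binary outcomes this is typically Random Effects Logit rather than Pooled Logit. Under the null, $H$ is approximately $\chi^2$ with as many degrees of freedom as compared coefficients. Below is a minimal sketch with NumPy and SciPy. `hausman` is a hypothetical helper, and the inputs are made-up numbers.

```python
import numpy as np
from scipy import stats

def hausman(b_consistent, b_efficient, V_consistent, V_efficient):
    """Classic Hausman statistic H = d' (V_c - V_e)^+ d, with d = b_c - b_e (hypothetical helper)."""
    d = np.asarray(b_consistent, float) - np.asarray(b_efficient, float)
    V_diff = np.asarray(V_consistent, float) - np.asarray(V_efficient, float)
    # The pseudo-inverse keeps the computation from failing if V_diff is singular
    stat = float(d @ np.linalg.pinv(V_diff) @ d)
    df = d.size
    return stat, df, stats.chi2.sf(stat, df)

stat, df, pval = hausman([1.0], [0.8], [[0.02]], [[0.01]])
print(f"H = {stat:.2f}, df = {df}, p = {pval:.4f}")
```

In practice, pass only the time-varying coefficients that both models estimate. If $V_{FE} - V_{E}$ is not positive semi-definite, the statistic is unreliable. The pseudo-inverse does not fix that problem; it only lets the computation run.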

In [None]:
print("="*70)
print("Informal Hausman-Type Comparison")
print("="*70)

print("\nCoefficient Differences:")
for var in comparison.index:
    pooled_coef = comparison.loc[var, 'Pooled_β']
    fe_coef = comparison.loc[var, 'FE_β']
    diff = comparison.loc[var, 'Difference']
    diff_pct = comparison.loc[var, 'Diff_%']
    
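    # Approximate z-test (informal). Adding the two variances treats the
    # estimates as independent, which they are not (same data), so this
    # is only a rough yardstick.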
    se_diff = np.sqrt(comparison.loc[var, 'Pooled_SE']**2 + 
                      comparison.loc[var, 'FE_SE']**2)
    z_stat = diff / se_diff
    
    print(f"\n{var}:")
    print(f"  Difference: {diff:.4f} ({diff_pct:.1f}%)")
    print(f"  Informal z-stat: {z_stat:.2f}")
    
    if abs(z_stat) > 1.96:
        print(f"  → Statistically significant difference (|z| > 1.96)")
    else:
        print(f"  → Not statistically different")

print("\n" + "="*70)
print("Note: This is an INFORMAL test. Proper Hausman test for")
print("nonlinear models requires careful implementation.")
print("="*70)

## Decision Framework

**Use FE Logit if**:
- Coefficients differ substantially from Pooled (>20%)
- Strong theoretical reason to expect $\alpha_i$ correlates with $X_{it}$
- Sufficient switchers (>30% utilization)
- Causal interpretation is critical

**Consider Pooled Logit if**:
- Coefficients similar to FE
- Few switchers (low utilization)
- Need to estimate effects of time-invariant variables
- Efficiency more important than bias

**Best practice**: Report both and discuss differences!

---

# Section 9: Application - Technology Adoption by Firms

## Research Question

**Does firm size causally affect technology adoption, or is the correlation due to unobserved factors (e.g., management quality)?**

- $y_{it}$ = 1 if firm $i$ adopted new technology at time $t$
- $X_{it}$ = log(firm size), profit margin, firm age
- $\alpha_i$ = unobserved management quality, innovation culture

**Problem**: High-quality management may lead to both:
- Larger firm size
- Greater technology adoption

This confounds the causal effect of size.

**Solution**: FE Logit controls for time-invariant management quality.

## Data

In [None]:
# Load firm technology data
firm_data = pd.read_csv(DATA_DIR / "firm_technology.csv")

print("="*70)
print("Firm Technology Adoption Dataset")
print("="*70)

print(f"\nSample:")
print(f"  Total observations: {len(firm_data)}")
print(f"  Firms: {firm_data['firm_id'].nunique()}")
print(f"  Time periods: {firm_data['year'].nunique()}")
print(f"  Years: {firm_data['year'].min()} - {firm_data['year'].max()}")

print("\nVariables:")
print("  adopted: 1 if firm adopted new technology, 0 otherwise")
print("  log_size: log of firm size (number of employees)")
print("  profit_margin: profit as % of revenue")
print("  age: firm age in years")
print("  industry: industry code (time-invariant)")

print("\nDescriptive Statistics:")
print(firm_data[['adopted', 'log_size', 'profit_margin', 'age']].describe())

print("\nAdoption Rate:")
print(f"  Overall: {100*firm_data['adopted'].mean():.1f}%")
print("\nFirst 10 observations:")
print(firm_data.head(10))

## Switcher Analysis

In [None]:
# Analyze switchers for technology adoption
firm_switcher = firm_data.groupby('firm_id')['adopted'].agg([
    ('sum_y', 'sum'),
    ('n_periods', 'count')
]).reset_index()

firm_switcher['type'] = 'Switcher'
firm_switcher.loc[firm_switcher['sum_y'] == 0, 'type'] = 'Never Adopted'
firm_switcher.loc[firm_switcher['sum_y'] == firm_switcher['n_periods'], 'type'] = 'Always Adopted'

print("="*70)
print("Switcher Analysis: Technology Adoption")
print("="*70)

type_counts_firm = firm_switcher['type'].value_counts()
print("\nFirm Types:")
for idx in ['Switcher', 'Never Adopted', 'Always Adopted']:
    if idx in type_counts_firm.index:
        count = type_counts_firm[idx]
        pct = 100 * count / len(firm_switcher)
        status = "✓ Used" if idx == 'Switcher' else "✗ Dropped"
        print(f"  {idx:16s}: {count:4d} ({pct:5.1f}%)  [{status}]")

utilization_firm = 100 * type_counts_firm.get('Switcher', 0) / len(firm_switcher)
print(f"\nUtilization Rate: {utilization_firm:.1f}%")
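To inspect the observations that actually enter FE Logit, you can filter the panel to entities whose outcome varies over time. Below is a minimal pandas sketch on a toy panel. `switcher_sample` is a hypothetical helper, and it assumes the outcome has no missing values.

```python
import pandas as pd

def switcher_sample(df, entity, outcome):
    """Keep only entities whose outcome varies over time (hypothetical helper)."""
    varies = df.groupby(entity)[outcome].transform("nunique") > 1
    return df[varies]

toy = pd.DataFrame({
    "firm_id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "year":    [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "adopted": [0, 0, 0, 0, 1, 1, 1, 1, 1],   # firm 1 never, firm 3 always adopts
})
est_sample = switcher_sample(toy, "firm_id", "adopted")
print(est_sample)
```

Applied as `switcher_sample(firm_data, "firm_id", "adopted")`, the number of distinct firms should match the Switcher count above.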

## Pooled Logit Analysis

In [None]:
print("="*70)
print("Pooled Logit: Firm Technology Adoption")
print("="*70)

# Pooled Logit could also include time-invariant industry dummies (e.g. C(industry));
# they are omitted here so the specification matches FE Logit
pooled_tech = PooledLogit(
    "adopted ~ log_size + profit_margin + age",
    firm_data, "firm_id", "year"
)
pooled_tech_results = pooled_tech.fit(cov_type='cluster')

print("\n", pooled_tech_results.summary())

# Odds ratio for a one-unit increase in log_size (firm size multiplied by e ≈ 2.72)
or_size = np.exp(pooled_tech_results.params['log_size'])
print("\n" + "="*70)
print("Interpretation (Pooled):")
print("="*70)
print(f"\nlog_size coefficient: {pooled_tech_results.params['log_size']:.4f}")
print(f"Odds ratio (per unit of log_size): {or_size:.4f}")
print(f"\nInterpretation:")
print(f"  A 10% increase in firm size is associated with a")
pct_change = 100 * ((1.1 ** pooled_tech_results.params['log_size']) - 1)
print(f"  {pct_change:.2f}% change in the odds of technology adoption")
print(f"\n⚠️  But is this causal? Or driven by unobserved management quality?")
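The 10% calculation works as follows. With $\log(\text{size})$ as the regressor, a 10% increase shifts the log-odds by $\beta \ln(1.1)$. The odds are therefore multiplied by $e^{\beta \ln 1.1} = 1.1^{\beta}$. A small helper (hypothetical, for illustration) makes the arithmetic explicit:

```python
def pct_change_in_odds(beta_log_x, pct_change_x):
    """% change in the odds when X rises by pct_change_x percent and the regressor is log(X)."""
    factor = 1.0 + pct_change_x / 100.0
    return 100.0 * (factor ** beta_log_x - 1.0)

for b in (0.5, 1.0, 2.0):
    print(f"beta = {b:.1f}: +10% size -> {pct_change_in_odds(b, 10):+.2f}% change in odds")
```

For small $\beta$ the result is close to $100 \cdot \beta \ln(1.1) \approx 9.5\beta$ percent. The exact formula matters when $\beta$ is large.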

## Fixed Effects Logit Analysis

In [None]:
print("="*70)
print("Fixed Effects Logit: Firm Technology Adoption")
print("="*70)

# FE Logit (cannot include industry - time invariant)
fe_tech = FixedEffectsLogit(
    "adopted ~ log_size + profit_margin + age",
    firm_data, "firm_id", "year"
)
fe_tech_results = fe_tech.fit()

print("\n", fe_tech_results.summary())

print("\n" + "="*70)
print("Sample Information:")
print("="*70)
print(f"Firms in data: {firm_data['firm_id'].nunique()}")
print(f"Firms used (switchers): {fe_tech.n_used_entities}")
print(f"Firms dropped: {firm_data['firm_id'].nunique() - fe_tech.n_used_entities}")

## Comparison and Causal Interpretation

In [None]:
print("="*70)
print("Pooled vs FE: Technology Adoption")
print("="*70)

# Comparison table
tech_comparison = pd.DataFrame({
    'Pooled': pooled_tech_results.params,
    'Pooled_SE': pooled_tech_results.std_errors,
    'FE': fe_tech_results.params,
    'FE_SE': fe_tech_results.std_errors,
    'Difference': pooled_tech_results.params - fe_tech_results.params
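})
# Keep only coefficients estimated by both models (FE Logit has no intercept)
tech_comparison = tech_comparison.dropna()
tech_comparison['Diff_%'] = 100 * tech_comparison['Difference'] / tech_comparison['Pooled'].abs()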

print("\n", tech_comparison.to_string())

print("\n" + "="*70)
print("Causal Interpretation")
print("="*70)

pooled_coef = pooled_tech_results.params['log_size']
fe_coef = fe_tech_results.params['log_size']
diff_pct = tech_comparison.loc['log_size', 'Diff_%']

print(f"\nEffect of Firm Size on Technology Adoption:")
print(f"  Pooled Logit: β = {pooled_coef:.4f}")
print(f"  FE Logit: β = {fe_coef:.4f}")
print(f"  Difference: {pooled_coef - fe_coef:.4f} ({diff_pct:.1f}%)")

if abs(diff_pct) > 20:
    print(f"\n✓ LARGE DIFFERENCE ({diff_pct:.1f}%)!")
    print(f"\nInterpretation:")
    print(f"  - Much of the pooled correlation may reflect unobserved")
    print(f"    firm characteristics (e.g., management quality)")
    print(f"  - FE estimate uses only changes in size WITHIN the same firm")
    print(f"    (controlling for α_i); it is causal only if size is strictly exogenous")
else:
    print(f"\n✓ Small difference ({diff_pct:.1f}%)")
    print(f"\nInterpretation:")
    print(f"  - Unobserved heterogeneity less important")
    print(f"  - Pooled and FE estimates are similar")

print(f"\n" + "="*70)
print(f"Economic Conclusion")
print("="*70)
print(f"\nWithin-firm effect (FE Logit):")
print(f"  As a firm grows by 10%, its ODDS of adopting")
print(f"  new technology change by approximately:")
pct_effect_fe = 100 * ((1.1 ** fe_coef) - 1)
print(f"  {pct_effect_fe:.2f}%")
print(f"\nThis holds constant time-invariant firm characteristics like")
print(f"management quality; a causal reading also requires that size")
print(f"is not driven by time-varying shocks to adoption.")
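The FE result is stated in odds for a reason. Chamberlain's estimator never recovers $\alpha_i$, and the change in probability from a given index shift depends on $\alpha_i$, while the odds ratio does not. The sketch below illustrates this with a hypothetical coefficient `beta = 0.8` and a grid of hypothetical firm effects. Other regressors are folded into $\alpha_i$ for simplicity.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

beta = 0.8                                        # hypothetical log_size coefficient
delta = beta * np.log(1.1)                        # index shift from +10% firm size
alphas = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])   # hypothetical firm effects

p0 = logistic(alphas)
p1 = logistic(alphas + delta)
odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))

for a, d_p, orr in zip(alphas, p1 - p0, odds_ratio):
    print(f"alpha_i = {a:+.1f}: change in P = {d_p:+.4f}, odds ratio = {orr:.4f}")
```

The odds ratio equals $1.1^{\beta}$ for every firm. The probability change is largest for firms near $P = 0.5$ and small for firms whose $\alpha_i$ makes adoption nearly certain or nearly impossible.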

## Key Insights from Application

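1. **Unobserved heterogeneity matters**: A large difference between Pooled and FE would suggest that $\alpha_i$ (management quality) is important. Some gap is expected anyway, because Pooled Logit coefficients are attenuated by the variance of $\alpha_i$.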

2. **Causal interpretation**: FE Logit identifies within-firm effect:
   - NOT: Large firms vs small firms (confounded by management)
   - YES: Same firm becoming larger over time

3. **Sample loss**: Some firms dropped due to no variation
   - Trade-off: lose observations but gain estimates that remain consistent despite time-invariant confounders

4. **Policy implications**: Results tell us whether firm **growth** (not just size) affects technology adoption

---

# Section 10: Limitations and When NOT to Use FE Logit

While FE Logit is a powerful tool, it has important limitations. Understanding when NOT to use it is as important as knowing when to use it.

## Limitation 1: Loss of Observations

### Problem
FE Logit drops all non-switchers, potentially losing many observations.

### When This Is Serious
- $T < 3$ (very short panels)
- Rare outcomes ($P(y=1)$ very small)
- Very persistent outcomes (little temporal variation)
- Can lose >50% of sample!
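A back-of-the-envelope calculation shows why short panels and rare outcomes hurt. Suppose $y_{it}$ were independent Bernoulli($p$) draws over $T$ periods. The probability that an individual is a switcher would then be $1 - (1-p)^T - p^T$. This is a simplifying assumption: heterogeneity in $p$ across individuals and persistence over time typically lower the share further, so read the numbers as optimistic.

```python
def switcher_share(p, T):
    """P(switcher) if y_it are iid Bernoulli(p) over T periods (simplifying assumption)."""
    return 1.0 - (1.0 - p) ** T - p ** T

for T in (2, 3, 5, 8):
    row = ", ".join(f"p={p:.2f}: {switcher_share(p, T):.1%}" for p in (0.05, 0.20, 0.50))
    print(f"T = {T}: {row}")
```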

### Solution
- **Random Effects Logit**: Uses all observations (but assumes $\alpha_i \perp X_{it}$)
- **Correlated Random Effects**: Allows some correlation (see Notebook 03)
- **Pooled with controls**: Include rich set of observables

In [None]:
print("="*70)
print("Limitation 1: Sample Loss")
print("="*70)

# Calculate sample loss for our data
total_firms = firm_data['firm_id'].nunique()
used_firms = fe_tech.n_used_entities
lost_pct = 100 * (1 - used_firms / total_firms)

print(f"\nOur Data (Firm Technology):")
print(f"  Total firms: {total_firms}")
print(f"  Firms used in FE: {used_firms}")
print(f"  Sample loss: {lost_pct:.1f}%")

if lost_pct > 50:
    print(f"\n⚠️  WARNING: Losing more than half the sample!")
    print(f"     Consider alternatives:")
    print(f"       - Correlated Random Effects (Notebook 03)")
    print(f"       - Pooled Logit with rich controls")
elif lost_pct > 30:
    print(f"\n⚠️  CAUTION: Substantial sample loss")
    print(f"     Report as limitation and check robustness")
else:
    print(f"\n✓ Acceptable sample loss for FE Logit")

## Limitation 2: Time-Invariant Variables

### Problem
Cannot estimate effects of variables that don't vary over time.

### Examples That CANNOT Be Estimated
- Demographic: gender, race, ethnicity, place of birth
- Education (if completed before panel)
- Industry (if no switching)
- Geographic location (if no migration)

### Solution
If you need these effects:
- **Pooled Logit**: Can estimate them (but may be biased)
- **Correlated Random Effects**: Can estimate them while allowing some $\alpha_i$ correlation
- **Hybrid models**: Combine within and between variation
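Time-invariant variables drop out because the conditional likelihood compares the observed sequence only with other sequences that have the same number of ones. A term $z_i\gamma$ that is constant over $t$ adds the same $k_i z_i\gamma$ to every one of them and cancels. The sketch below checks this numerically by brute-force enumeration. `conditional_prob` is a hypothetical helper, not a PanelBox function, and the values of `x`, `beta`, `z`, and `gamma` are made up.

```python
import itertools
import numpy as np

def conditional_prob(y, x, beta, z=0.0, gamma=0.0):
    """Chamberlain conditional probability P(y | sum of y) for one individual.

    The index is x_t * beta + z * gamma, with z time-invariant.
    """
    y = np.asarray(y, float)
    index = np.asarray(x, float) * beta + z * gamma
    k, T = int(y.sum()), len(y)
    num = np.exp(y @ index)
    den = 0.0
    for ones in itertools.combinations(range(T), k):   # all sequences with k ones
        d = np.zeros(T)
        d[list(ones)] = 1.0
        den += np.exp(d @ index)
    return num / den

y = [0, 1, 1, 0]
x = [0.3, 1.2, -0.4, 0.9]
for gamma in (0.0, 1.0, 5.0):
    print(f"gamma = {gamma}: P = {conditional_prob(y, x, beta=0.7, z=2.0, gamma=gamma):.6f}")
```

The printed probability is the same for every `gamma`, so the data carry no information about it.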

In [None]:
print("="*70)
print("Limitation 2: Time-Invariant Variables")
print("="*70)

print("\nVariables That CANNOT Be Estimated in FE Logit:")
print("  ✗ Gender")
print("  ✗ Race/Ethnicity")
print("  ✗ Place of birth")
print("  ✗ Education (if completed before panel)")
print("  ✗ Industry (if no switching)")
print("\nThese are absorbed by the individual fixed effect α_i")

print("\nAlternatives if You Need Time-Invariant Effects:")
print("  1. Correlated Random Effects (Notebook 03)")
print("     → Allows correlation through time-averages")
print("  2. Pooled Logit with controls")
print("     → Can estimate effects but may be biased")
print("  3. Hybrid models")
print("     → Combine within (FE) and between (RE) variation")

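## Limitation 3: Small T

### Problem
Chamberlain's conditional MLE removes $\alpha_i$ from the likelihood. In the static model with strictly exogenous $X_{it}$, it is therefore consistent for fixed $T$ as $N \to \infty$, even with $T = 2$. The $O(1/T)$ incidental parameters bias affects the dummy-variable (unconditional) Logit, not FE Logit. Short panels still cause problems:
- Fewer switchers and less within-variation → larger standard errors
- $\alpha_i$ is never estimated, so predicted probabilities and marginal effects on the probability scale are not identified (odds ratios are)
- With a lagged dependent variable (Notebook 08), the standard conditional MLE no longer applies

### Rule of Thumb
These are loose heuristics; precision also depends on $N$ and the share of switchers:
- $T \geq 8$: Typically ample within-variation
- $5 \leq T < 8$: Check standard errors and the number of switchers
- $T < 5$: Estimates may be imprecise; report switcher counts and compare with alternatives

### Solution
- **Bias-corrected estimators**: Fernández-Val (2009), Hahn-Kuersteiner (2002)
  - Large-$T$ corrections for estimators that do suffer incidental parameters bias (e.g., FE Probit, dynamic panels) and for marginal effects
  - Advanced, not yet in PanelBox
- **Jackknife**: Bias correction through resampling, for the same estimators
- **Analytical correction**: Correct their $O(1/T)$ bias analytically

In [None]:
print("="*70)
print("Limitation 3: Small T")
print("="*70)

T_in_data = data.groupby('id').size().mean()
print(f"\nAverage T in our data: {T_in_data:.1f}")

if T_in_data >= 8:
    print(f"✓ T ≥ 8: Typically ample within-variation")
elif T_in_data >= 5:
    print(f"⚠️  5 ≤ T < 8: Check standard errors and switcher counts")
else:
    print(f"⚠️  T < 5: Few periods per individual")
    print(f"   Estimates may be imprecise; report switcher counts")
    print(f"   and compare with Pooled / Correlated Random Effects")

print("\nNote: The conditional MLE of β is consistent for fixed T.")
print("  The O(1/T) incidental parameters bias applies to the")
print("  dummy-variable Logit, not to FE Logit.")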
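A quick simulation illustrates that the conditional MLE is consistent even in the shortest panel. With $T = 2$, only switchers contribute, and $P(y_{i2}=1 \mid y_{i1}+y_{i2}=1) = \Lambda((X_{i2}-X_{i1})\beta)$. So $\alpha_i$ drops out, and the estimator is a no-intercept Logit of $y_{i2}$ on $\Delta X_i$ among switchers. The sketch draws $X_{it}$ correlated with $\alpha_i$ and solves that Logit by Newton-Raphson; all parameter values are illustrative. By contrast, a classic result is that the dummy-variable Logit converges to $2\beta$ when $T = 2$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, beta_true = 20000, 1.0

alpha = rng.normal(size=N)
x = 0.5 * alpha[:, None] + rng.normal(size=(N, 2))   # x correlated with alpha_i
y = (beta_true * x + alpha[:, None] + rng.logistic(size=(N, 2)) > 0).astype(float)

# Only switchers (y_i1 + y_i2 = 1) contribute; alpha_i cancels in the conditional probability
switch = y.sum(axis=1) == 1
dx = x[switch, 1] - x[switch, 0]
d = y[switch, 1]

b = 0.0
for _ in range(25):   # Newton-Raphson on the one-parameter conditional log-likelihood
    p = 1.0 / (1.0 + np.exp(-dx * b))
    b += np.sum(dx * (d - p)) / np.sum(dx**2 * p * (1 - p))

print(f"Switchers: {switch.mean():.1%}  Conditional MLE: {b:.3f}  (true beta = {beta_true})")
```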

## Limitation 4: Computational Complexity

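### Problem
Each individual's conditional likelihood divides by a sum over all $\binom{T}{k_i}$ outcome sequences with the same number of ones, $k_i = \sum_t y_{it}$. The naive computation therefore grows quickly with $T$: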
- $T \leq 10$: Fast (enumeration of sequences)
- $10 < T \leq 20$: Moderate (dynamic programming)
- $T > 20$: Can be slow
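The cost comes from the denominator of each individual's conditional likelihood: a sum over all sequences with the same number of ones $k_i$. It can be computed by listing every sequence, or by a recursion over periods that builds the elementary symmetric polynomials of $e^{X_{it}'\beta}$ in $O(T \cdot k_i)$ operations. A minimal sketch follows. `denom_enumerate` and `denom_recursive` are hypothetical helpers, and we do not know which approach PanelBox uses internally.

```python
import itertools
from math import comb
import numpy as np

def denom_enumerate(eta, k):
    """Sum of exp(sum of eta_t over S) across all size-k subsets S: C(T, k) terms."""
    return sum(np.exp(eta[list(S)].sum()) for S in itertools.combinations(range(len(eta)), k))

def denom_recursive(eta, k):
    """Same quantity via the elementary-symmetric-polynomial recursion, O(T * k) operations."""
    e = np.zeros(k + 1)
    e[0] = 1.0
    for w in np.exp(eta):
        # Update from high to low degree so each period is used at most once
        for j in range(k, 0, -1):
            e[j] += w * e[j - 1]
    return e[k]

rng = np.random.default_rng(2)
eta = rng.normal(size=12)   # hypothetical index values x_it'beta for one individual
k = 5                       # number of periods with y_it = 1
print(f"Enumeration ({comb(len(eta), k)} terms): {denom_enumerate(eta, k):.6f}")
print(f"Recursion: {denom_recursive(eta, k):.6f}")
```

For long panels, a production version would work on the log scale to avoid overflow.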

### Solution
- **Approximations**: For very long panels
- **Subsampling**: If $T$ varies, focus on moderate-$T$ subsample
- **Patience**: Modern computers can handle $T \leq 30$ reasonably well

In [None]:
print("="*70)
print("Limitation 4: Computational Cost")
print("="*70)

print("\nComputational Complexity by T:")
print("  T ≤ 10:    Fast (seconds)")
print("  10 < T ≤ 20: Moderate (minutes for large N)")
print("  T > 20:    Slow (may require patience)")
print("  T > 30:    Very slow (consider approximations)")

print(f"\nYour data: T = {T_in_data:.1f}")
if T_in_data <= 10:
    print("  ✓ Should be fast")
elif T_in_data <= 20:
    print("  ✓ Moderate speed, acceptable")
else:
    print("  ⚠️  May be slow, be patient")

## Decision Framework: When to Use FE Logit?

Use this checklist to decide whether FE Logit is appropriate:

In [None]:
def recommend_estimator(T, pct_switchers, need_time_invariant, outcome_rare=False):
    """
    Decision framework for choosing binary choice estimator.
    
    Parameters:
    - T: Average panel length
    - pct_switchers: Percentage of individuals who are switchers (0-1)
    - need_time_invariant: Do you need to estimate time-invariant effects?
    - outcome_rare: Is the outcome very rare or very common?
    """
    reasons = []
    
    # Check conditions
    if T < 3:
        return "❌ FE Logit NOT recommended", ["T < 3: Too few periods", 
                                             "Use: Pooled or Random Effects"]
    
    if need_time_invariant:
        return "❌ FE Logit NOT recommended", ["Need time-invariant effects",
                                             "Use: Correlated Random Effects (Notebook 03)"]
    
    if pct_switchers < 0.20:
        return "⚠️  FE Logit PROBLEMATIC", [f"Only {100*pct_switchers:.0f}% switchers",
                                           "Sample loss too high",
                                           "Consider: Random Effects or Pooled"]
    
    if outcome_rare and pct_switchers < 0.30:
        return "⚠️  FE Logit QUESTIONABLE", ["Rare outcome + few switchers",
                                            "Check robustness carefully"]
    
    if T >= 5 and pct_switchers >= 0.30:
        reasons = [f"T = {T:.0f} ≥ 5: Sufficient periods",
                  f"{100*pct_switchers:.0f}% switchers: Good utilization",
                  "Controls for unobserved heterogeneity"]
        return "✓ FE Logit RECOMMENDED", reasons
    
    if T >= 3 and pct_switchers >= 0.25:
        reasons = [f"T = {T:.0f}: Acceptable, but estimates may be imprecise",
                  f"{100*pct_switchers:.0f}% switchers: Moderate utilization",
                  "FE Logit possible, compare with Pooled"]
        return "⚠️  FE Logit POSSIBLE", reasons
    
    return "⚠️  FE Logit UNCERTAIN", ["Borderline case",
                                     "Estimate both Pooled and FE",
                                     "Compare and report both"]

# Example usage
print("="*70)
print("Decision Framework: Should You Use FE Logit?")
print("="*70)

# Our firm data
T_example = firm_data.groupby('firm_id').size().mean()
switchers_example = utilization_firm / 100
time_invariant_example = False  # We don't need industry effects

recommendation, reasons = recommend_estimator(
    T_example, switchers_example, time_invariant_example
)

print(f"\nYour Data:")
print(f"  T (avg) = {T_example:.1f}")
print(f"  Switchers = {100*switchers_example:.0f}%")
print(f"  Need time-invariant effects = {time_invariant_example}")

print(f"\n{recommendation}")
for reason in reasons:
    print(f"  • {reason}")

# Test other scenarios
print("\n" + "="*70)
print("Other Scenarios:")
print("="*70)

scenarios = [
    ("Short panel, few switchers", 2, 0.15, False, False),
    ("Long panel, many switchers", 10, 0.60, False, False),
    ("Need time-invariant effects", 7, 0.50, True, False),
    ("Rare outcome, moderate switchers", 6, 0.28, False, True),
]

for name, T, sw, ti, rare in scenarios:
    rec, _ = recommend_estimator(T, sw, ti, rare)
    print(f"\n{name}: {rec}")

## Summary: FE Logit Checklist

### ✓ Use FE Logit When:
- $T \geq 5$ (preferably $T \geq 8$)
- $\geq 30\%$ of individuals are switchers
- Strong theoretical reason to expect $\alpha_i$ correlates with $X_{it}$
- Only need effects of time-varying variables
- Causal interpretation is critical

### ✗ Do NOT Use FE Logit When:
- $T < 3$ (too short)
- $< 20\%$ switchers (too much sample loss)
- Need to estimate time-invariant variable effects
- Outcome is extremely rare/common with few switchers

### ⚠️  Use with Caution When:
- $3 \leq T < 5$ (few periods: check precision and switcher counts)
- $20\% \leq$ switchers $< 30\%$ (check robustness)
- $T > 20$ (computational cost)

### Best Practice:
1. **Always report switcher statistics**
2. **Compare Pooled vs FE** and discuss differences
3. **Report both** if differences are substantial
4. **Acknowledge limitations** in interpretation

---

# Exercises

## Exercise 1: Switcher Analysis (Easy)

**Goal**: Understand who contributes to FE Logit estimation.

**Tasks**:
1. Load the job training dataset
2. Calculate $\sum_t \text{employed}_{it}$ for each individual
3. Classify individuals as: "Always 0", "Switcher", or "Always 1"
4. Create:
   - Pie chart showing distribution of types
   - Histogram of $\sum_t y_{it}$
5. Calculate utilization rate for employment outcome

**Questions**:
- What percentage of individuals are switchers for employment?
- How does this compare to training participation?
- Would FE Logit be appropriate for modeling employment?

In [None]:
# Exercise 1: Your code here
print("Exercise 1: Switcher Analysis")
print("="*70)

# TODO: Your solution here


## Exercise 2: Incidental Parameters Simulation (Medium)

**Goal**: Replicate the incidental parameters bias simulation.

**Tasks**:
1. Generate panel data with:
   - $N = 1000$ individuals
   - $T \in \{3, 5, 10, 20\}$ time periods
   - True $\beta = 1.0$
   - Individual effects $\alpha_i \sim N(0, 1)$
   - $X_{it} \sim N(0, 1)$ independent
   - $y_{it}^* = X_{it}\beta + \alpha_i + \varepsilon_{it}$, where $\varepsilon_{it} \sim$ Logistic(0,1), and observed $y_{it} = \mathbf{1}[y_{it}^* > 0]$
2. For each $T$:
   - Estimate Logit with individual dummies ("naive FE")
   - Estimate Pooled Logit
   - Estimate FE Logit (PanelBox)
3. Calculate bias for each estimator
4. Create plot showing bias vs $T$ for all three methods

**Expected Result**:
- Naive FE: upward bias that shrinks with $T$ but is substantial for small $T$
- Pooled: attenuated toward zero, with a bias that does not shrink as $T$ grows
- FE Logit: approximately unbiased for every $T$ (consistent for fixed $T$), up to sampling noise

In [None]:
# Exercise 2: Your code here  
print("Exercise 2: Incidental Parameters Simulation")
print("="*70)

# TODO: Your solution here


## Exercise 3: Pooled vs FE Comparison (Medium)

**Goal**: Conduct comprehensive comparison of Pooled vs FE Logit.

**Tasks**:
1. Using the firm technology data, estimate:
   - Pooled Logit: `adopted ~ log_size + profit_margin + age`
   - FE Logit: same specification
2. Create comparison table with:
   - Coefficients
   - Standard errors (cluster-robust)
   - Difference (Pooled - FE)
   - Percent difference
3. Create forest plot showing both estimates with 95% CIs
4. Calculate informal z-statistic for differences
5. Interpret results:
   - Which variables show large differences?
   - What does this imply about unobserved heterogeneity?
   - Which model would you recommend?

**Bonus**: Repeat for employment outcome in job training data and compare results.

In [None]:
# Exercise 3: Your code here
print("Exercise 3: Pooled vs FE Comparison")
print("="*70)

# TODO: Your solution here


## Exercise 4: Decision Framework Application (Hard)

**Goal**: Apply the decision framework to determine appropriate estimator.

**Scenario**: You have data on individual health insurance choices:
- $y_{it}$ = 1 if person $i$ has private insurance at time $t$
- $X_{it}$ = (age, income, health status, employment)
- Time-invariant: gender, education
- $N = 5000$, $T = 4$, years 2018-2021

**Tasks**:
1. Simulate data matching this scenario with:
   - Unobserved health consciousness $\alpha_i$ correlated with income
   - Moderate switcher rate (~35%)
   - Insurance rate ~60%
2. Calculate switcher statistics
3. Apply decision framework:
   - Check $T$ requirement
   - Check switcher percentage
   - Consider need for time-invariant effects (gender, education)
4. Estimate all reasonable models:
   - Pooled (with and without gender/education)
   - FE (only time-varying)
5. Compare results and make recommendation
6. Write 1-paragraph interpretation for applied paper

**Key Question**: Even if FE Logit is technically feasible, should you use it if you care about gender/education effects?

In [None]:
# Exercise 4: Your code here
print("Exercise 4: Decision Framework Application")
print("="*70)

# TODO: Your solution here


---

# Summary and Next Steps

## What You've Learned

In this notebook, you learned:

1. **The Problem**: Unobserved heterogeneity causes bias in Pooled Logit
2. **Why Simple FE Fails**: Incidental parameters problem in nonlinear models
3. **Chamberlain's Solution**: Conditional MLE eliminates $\alpha_i$
4. **Logit vs Probit**: Why FE works for Logit but not Probit
5. **Identification**: Only switchers contribute to estimation
6. **Within Variation**: FE identifies within-individual effects
7. **Implementation**: How to use PanelBox FixedEffectsLogit
8. **Comparison**: How to compare Pooled and FE results
9. **Application**: Real-world example with firm technology adoption
10. **Limitations**: When NOT to use FE Logit

## Key Takeaways

- **FE Logit controls for time-invariant unobserved heterogeneity**
- **Only switchers contribute** - check utilization rate!
- **Cannot estimate time-invariant effects**
- **Compare with Pooled** to understand magnitude of bias
- **Report both** if they differ substantially

## Next Steps

### Notebook 03: Random Effects and Correlated Random Effects
- Alternative to FE when sample loss is too high
- Can estimate time-invariant effects
- Allows correlation through Mundlak-Chamberlain device

### Notebook 04: Marginal Effects
- Calculate and interpret marginal effects
- Average Marginal Effects (AME)
- Marginal Effects at the Mean (MEM)
- Special considerations for FE Logit

### Notebook 08: Dynamic Binary Choice
- Models with lagged dependent variable
- Initial conditions problem
- Dynamic panel bias

---

## References

### Essential
1. **Chamberlain, G. (1980)**: "Analysis of Covariance with Qualitative Data", *Review of Economic Studies*, 47(1), 225-238.
   - Original conditional MLE paper

2. **Wooldridge, J.M. (2010)**: *Econometric Analysis of Cross Section and Panel Data*, 2nd ed., MIT Press, Chapter 15.8.
   - Excellent textbook treatment

3. **Cameron, A.C. & Trivedi, P.K. (2005)**: *Microeconometrics: Methods and Applications*, Cambridge University Press, Chapter 23.4.
   - Comprehensive coverage with applications

### Advanced
4. **Neyman, J. & Scott, E.L. (1948)**: "Consistent Estimates Based on Partially Consistent Observations", *Econometrica*, 16(1), 1-32.
   - Original incidental parameters problem paper

5. **Fernández-Val, I. (2009)**: "Fixed Effects Estimation of Structural Parameters and Marginal Effects in Panel Probit Models", *Journal of Econometrics*, 150(1), 71-85.
   - Bias-corrected FE for Probit

6. **Hahn, J. & Kuersteiner, G. (2002)**: "Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects When Both $n$ and $T$ Are Large", *Econometrica*, 70(4), 1639-1657.
   - Bias correction methods

---

## Feedback

Questions or suggestions? Please open an issue on the PanelBox GitHub repository!

---

**End of Tutorial**