# Fixed and Random Effects for Panel Count Data

**Tutorial 03 - Count Models Series**

---

## Learning Objectives

By the end of this tutorial, you will be able to:

1. Understand **unobserved heterogeneity** in panel count data
2. Implement **Fixed Effects Poisson** via conditional MLE (Hausman, Hall, & Griliches, 1984)
3. Estimate **Random Effects** count models (Gamma and Normal distributions)
4. Compare Pooled, FE, and RE approaches
5. Perform the **Hausman test** for specification choice
6. Apply these methods to **policy evaluation** (crime and policing)

---

## Prerequisites

- Completion of Tutorials 01-02 (Poisson and NB models)
- Understanding of panel data structure (within vs between variation)
- Familiarity with fixed effects in linear models

**Estimated Duration:** 75 minutes

---

## Table of Contents

1. [Heterogeneity in Panel Count Data](#1-heterogeneity)
2. [Fixed Effects Poisson - Conditional MLE](#2-fe-poisson)
3. [Random Effects Poisson/NB](#3-re-poisson)
4. [FE vs RE - Hausman Test](#4-hausman)
5. [Application: Police and Crime](#5-application)
6. [Comparison Summary and Best Practices](#6-summary)
7. [Summary](#7-takeaways)

## Setup and Data Loading

In [None]:
# Standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from scipy import stats
from scipy.special import gammaln
import statsmodels.api as sm
import warnings
warnings.filterwarnings('ignore')

# PanelBox imports
from panelbox.models.count import (
    PooledPoisson,
    PoissonFixedEffects,
    RandomEffectsPoisson
)

# Configuration
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('Set2')
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 4)
np.random.seed(42)

# Paths
DATA_PATH = Path('../data')
OUTPUT_PATH = Path('../outputs')
FIGURES_PATH = OUTPUT_PATH / 'figures' / '03_fe_re'
TABLES_PATH = OUTPUT_PATH / 'tables' / '03_fe_re'

# Create directories
FIGURES_PATH.mkdir(parents=True, exist_ok=True)
TABLES_PATH.mkdir(parents=True, exist_ok=True)

# Helper: significance stars
def add_stars(p):
    if p < 0.001: return '***'
    elif p < 0.01: return '**'
    elif p < 0.05: return '*'
    else: return ''

print('Setup complete!')

### Load City Crime Panel Data

We use a balanced panel of **150 cities** observed over **10 years** (2010-2019).  
The outcome variable is `crime_count` and key predictors include policing, unemployment, and income.

In [None]:
# Load data
df = pd.read_csv(DATA_PATH / 'city_crime.csv')

print('Dataset Shape:', df.shape)
print(f'\nPanel Structure:')
print(f'  Number of cities: {df["city_id"].nunique()}')
print(f'  Number of years: {df["year"].nunique()}')
print(f'  Time period: {df["year"].min()} - {df["year"].max()}')
print(f'  Total observations: {len(df)}')
print(f'  Balanced panel: {len(df) == df["city_id"].nunique() * df["year"].nunique()}')

print('\nDescriptive Statistics:')
display(df.describe().round(2))

## 1. Heterogeneity in Panel Count Data {#1-heterogeneity}

### The Fundamental Problem

We observe $i = 1, \ldots, N$ cities over $t = 1, \ldots, T$ time periods.  
The count outcome $y_{it}$ follows:

$$\log E[y_{it} \mid X_{it}, \alpha_i] = X_{it}'\beta + \alpha_i$$

where $\alpha_i$ captures **unobserved city-specific effects**:
- Historical crime culture
- Geographic and institutional features
- Social capital and informal norms

### Why This Matters

If $\alpha_i$ is **correlated** with regressors $X_{it}$ (e.g., high-crime cities hire more police),  
the pooled model suffers from **omitted variable bias**.

### Within vs Between Variation

- **Between variation**: How cities differ from each other on average
- **Within variation**: How each city changes over time
- FE uses only within variation; RE uses both
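
A tiny worked example may help fix the two notions. The sketch below uses a hypothetical six-row panel (`toy`) and shows that the within and between components add up to the total variance when all three use the same divisor (population variance, `ddof=0`).

```python
import numpy as np
import pandas as pd

# Hypothetical panel: 2 cities observed for 3 years each
toy = pd.DataFrame({
    "city_id": [1, 1, 1, 2, 2, 2],
    "crime_count": [10, 12, 14, 30, 33, 36],
})

city_mean = toy.groupby("city_id")["crime_count"].transform("mean")
grand_mean = toy["crime_count"].mean()

# Within: spread of each observation around its own city mean
within_var = ((toy["crime_count"] - city_mean) ** 2).mean()
# Between: spread of the city means around the grand mean
between_var = ((city_mean - grand_mean) ** 2).mean()
# Total: population variance, so the decomposition is exact
total_var = toy["crime_count"].var(ddof=0)

print(f"within + between = {within_var + between_var:.4f}")
print(f"total            = {total_var:.4f}")
```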

In [None]:
# === Visualize Panel Structure ===
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1. Crime time series for selected cities
np.random.seed(42)
sample_cities = np.random.choice(df['city_id'].unique(), 10, replace=False)
sample_cities.sort()
for city in sample_cities:
    city_data = df[df['city_id'] == city].sort_values('year')
    axes[0, 0].plot(city_data['year'], city_data['crime_count'],
                    marker='o', linewidth=1.5, markersize=4, alpha=0.8,
                    label=f'City {city}')
axes[0, 0].set_xlabel('Year', fontsize=11)
axes[0, 0].set_ylabel('Crime Count', fontsize=11)
axes[0, 0].set_title('Crime Trends: 10 Selected Cities', fontsize=12, fontweight='bold')
axes[0, 0].legend(fontsize=7, ncol=2)
axes[0, 0].grid(alpha=0.3)

# 2. Distribution of city means (between variation)
city_means = df.groupby('city_id')['crime_count'].mean()
axes[0, 1].hist(city_means, bins=25, edgecolor='black', alpha=0.7, color='coral')
axes[0, 1].axvline(city_means.mean(), color='red', linestyle='--',
                   linewidth=2, label=f'Overall mean = {city_means.mean():.1f}')
axes[0, 1].set_xlabel('Mean Crime Count', fontsize=11)
axes[0, 1].set_ylabel('Number of Cities', fontsize=11)
axes[0, 1].set_title('Between-City Variation', fontsize=12, fontweight='bold')
axes[0, 1].legend(fontsize=9)
axes[0, 1].grid(axis='y', alpha=0.3)

# 3. Within-city variation: deviations from city mean
df['crime_demeaned'] = df['crime_count'] - df.groupby('city_id')['crime_count'].transform('mean')
axes[1, 0].hist(df['crime_demeaned'], bins=40, edgecolor='black', alpha=0.7, color='steelblue')
axes[1, 0].axvline(0, color='red', linestyle='--', linewidth=2)
axes[1, 0].set_xlabel('Crime Count - City Mean', fontsize=11)
axes[1, 0].set_ylabel('Frequency', fontsize=11)
axes[1, 0].set_title('Within-City Variation', fontsize=12, fontweight='bold')
axes[1, 0].grid(axis='y', alpha=0.3)

# 4. Variance decomposition
city_means_aligned = df.groupby('city_id')['crime_count'].transform('mean')
within_var = ((df['crime_count'] - city_means_aligned) ** 2).mean()
between_var = ((city_means_aligned - df['crime_count'].mean()) ** 2).mean()
# Use the population variance (ddof=0) so that within + between = total
total_var = df['crime_count'].var(ddof=0)

var_data = pd.DataFrame({
    'Component': ['Within\nCities', 'Between\nCities', 'Total'],
    'Variance': [within_var, between_var, total_var]
})
bars = axes[1, 1].bar(var_data['Component'], var_data['Variance'],
               alpha=0.8, color=['steelblue', 'coral', 'darkgreen'], edgecolor='black')
for bar, val in zip(bars, var_data['Variance']):
    axes[1, 1].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 50,
                   f'{val:.0f}', ha='center', fontsize=10, fontweight='bold')
axes[1, 1].set_ylabel('Variance', fontsize=11)
axes[1, 1].set_title('Variance Decomposition', fontsize=12, fontweight='bold')
axes[1, 1].grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.savefig(FIGURES_PATH / 'crime_time_series.png', dpi=300, bbox_inches='tight')
plt.show()

print(f'\nVariance Decomposition:')
print(f'  Within-city variance:  {within_var:.1f} ({within_var/total_var:.1%})')
print(f'  Between-city variance: {between_var:.1f} ({between_var/total_var:.1%})')
print(f'  Total variance:        {total_var:.1f}')
print(f'\n  => Between-city share of total variance: {between_var/total_var:.1%}. A large share points to city-level heterogeneity (alpha_i).')

In [None]:
# === Fit Pooled Poisson (Baseline) ===
var_names = ['unemployment_rate', 'police_per_capita', 'median_income', 'temperature']

y = df['crime_count'].values
X_raw = df[var_names].values
X_pooled = sm.add_constant(X_raw)
entity_id = df['city_id'].values

pooled_model = PooledPoisson(endog=y, exog=X_pooled, entity_id=entity_id)
pooled_result = pooled_model.fit(se_type='cluster')

# Build results table
pooled_names = ['const'] + var_names
pooled_table = pd.DataFrame({
    'Variable': pooled_names,
    'Coefficient': pooled_result.params,
    'Std Error': pooled_result.se,
    'z-statistic': pooled_result.tvalues,
    'p-value': pooled_result.pvalues,
    'IRR': np.exp(pooled_result.params),
    '% Change': (np.exp(pooled_result.params) - 1) * 100
})

pooled_table.to_csv(TABLES_PATH / 'table_01_pooled_baseline.csv', index=False)

print('=' * 80)
print('POOLED POISSON - BASELINE MODEL')
print('=' * 80)
display(pooled_table[['Variable', 'Coefficient', 'Std Error', 'p-value', 'IRR']].round(4))

print('\nNote: Pooled model ignores city-level heterogeneity (alpha_i).')
print('If alpha_i is correlated with X, these estimates are BIASED.')

## 2. Fixed Effects Poisson - Conditional MLE {#2-fe-poisson}

### Theory: Hausman, Hall, & Griliches (1984)

The key idea is to **condition on sufficient statistics** to eliminate $\alpha_i$.

For the Poisson model:
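- $y_{it} \mid \alpha_i, X_{it} \sim \text{Poisson}(\alpha_i \lambda_{it})$ where $\lambda_{it} = \exp(X_{it}'\beta)$; here $\alpha_i$ enters multiplicatively, so it is the exponential of the additive effect in Section 1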
- The sufficient statistic is $n_i = \sum_t y_{it}$

### Key Insight

$$P(y_{i1}, \ldots, y_{iT} \mid n_i, X_i) \text{ does NOT depend on } \alpha_i$$

The conditional distribution is **multinomial**:
$$P(y_{i1}, \ldots, y_{iT} \mid n_i, X_i) = \frac{n_i!}{\prod_t y_{it}!} \prod_t \left(\frac{\lambda_{it}}{\sum_s \lambda_{is}}\right)^{y_{it}}$$
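
The cancellation of $\alpha_i$ can be checked numerically. The following sketch uses made-up values for one city (`x_i`, `y_i`, `beta` are illustrative assumptions): the multinomial shares $\lambda_{it} / \sum_s \lambda_{is}$ are identical for very different values of $\alpha_i$, and the formula above matches `scipy.stats.multinomial.logpmf`.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

# Hypothetical data for a single city with T = 4 periods
x_i = np.array([0.5, 1.0, 1.5, 2.0])
y_i = np.array([3, 1, 4, 2])
beta = 0.4
lam = np.exp(beta * x_i)

def conditional_shares(alpha_i):
    """Multinomial probabilities implied by Poisson means alpha_i * lam."""
    mu = alpha_i * lam
    return mu / mu.sum()

shares_low = conditional_shares(0.2)
shares_high = conditional_shares(50.0)
print("Shares identical across alpha_i:", np.allclose(shares_low, shares_high))

# Conditional log-probability: formula above vs. SciPy's multinomial
n_i = y_i.sum()
manual_logp = (gammaln(n_i + 1) - gammaln(y_i + 1).sum()
               + (y_i * np.log(shares_low)).sum())
scipy_logp = stats.multinomial.logpmf(y_i, n=n_i, p=shares_low)
print("Formula matches SciPy:", np.isclose(manual_logp, scipy_logp))
```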

### Important Properties

1. Eliminates $\alpha_i$ without estimating it
2. Only uses **within-city variation** to identify $\beta$
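3. Cities with $n_i = 0$ (zero crimes in all periods) contribute nothing to the conditional likelihood and are dropped
4. **No incidental parameters problem** for $\beta$ (unlike unconditional FE logit or probit estimated with entity dummies)
5. More robust than the FE NB of Hausman, Hall, & Griliches, whose conditioning does not fully remove $\alpha_i$, so it is not a true fixed effects estimator (Allison & Waterman, 2002)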

In [None]:
# === Approach 1: PoissonFixedEffects (Conditional MLE) ===
# Note: PoissonFixedEffects uses true conditional MLE, which is computationally
# intensive for large counts. For large panels, PooledPoisson with entity dummies
# gives identical coefficient estimates (the within-estimator equivalence).

# We use entity dummies approach for computational efficiency with 150 cities
print('Estimating Fixed Effects Poisson model...')
print('(Using PooledPoisson with city dummies - equivalent to conditional MLE for Poisson)\n')

# Create entity dummies
city_dummies = pd.get_dummies(df['city_id'], prefix='city', drop_first=True).values
X_fe = np.hstack([X_pooled, city_dummies])

fe_model = PooledPoisson(endog=y, exog=X_fe, entity_id=entity_id)
fe_result = fe_model.fit(se_type='cluster')

# Extract coefficients for main variables (not dummies)
n_main = len(pooled_names)  # const + 4 vars
fe_params_main = fe_result.params[:n_main]
fe_se_main = fe_result.se[:n_main]
fe_tvals_main = fe_result.tvalues[:n_main]
fe_pvals_main = fe_result.pvalues[:n_main]

fe_table = pd.DataFrame({
    'Variable': var_names,
    'Coefficient': fe_params_main[1:],
    'Std Error': fe_se_main[1:],
    'z-statistic': fe_tvals_main[1:],
    'p-value': fe_pvals_main[1:],
    'IRR': np.exp(fe_params_main[1:]),
    '% Change': (np.exp(fe_params_main[1:]) - 1) * 100,
    'Sig': [add_stars(p) for p in fe_pvals_main[1:]]
})

fe_table.to_csv(TABLES_PATH / 'table_02_fe_poisson.csv', index=False)

print('=' * 80)
print('FIXED EFFECTS POISSON RESULTS')
print('=' * 80)
print(f'N = {len(y)}, Cities = {df["city_id"].nunique()}, T = {df["year"].nunique()}')
print(f'City dummies: {city_dummies.shape[1]} (first city as reference)')
print(f'Log-likelihood: {fe_model.llf:.2f}')
print()
display(fe_table[['Variable', 'Coefficient', 'Std Error', 'p-value', 'IRR', '% Change', 'Sig']].round(4))

In [None]:
# === Interpret FE results and compare with Pooled ===
print('\nInterpretation (WITHIN-city effects):')
print('=' * 70)
for _, row in fe_table.iterrows():
    direction = 'increases' if row['Coefficient'] > 0 else 'decreases'
    pct = abs(row['% Change'])
    print(f"  {row['Variable']:25s}: A 1-unit increase {direction} crime by {pct:.2f}% {row['Sig']}")

# Build comparison table: Pooled vs FE
comparison_pfe = pd.DataFrame({
    'Variable': var_names,
    'Pooled_Coef': pooled_table.loc[1:, 'Coefficient'].values,
    'Pooled_SE': pooled_table.loc[1:, 'Std Error'].values,
    'FE_Coef': fe_table['Coefficient'].values,
    'FE_SE': fe_table['Std Error'].values,
    'Difference': pooled_table.loc[1:, 'Coefficient'].values - fe_table['Coefficient'].values
})

comparison_pfe.to_csv(TABLES_PATH / 'table_03_pooled_vs_fe.csv', index=False)

print('\n\nPOOLED vs FE COEFFICIENT COMPARISON')
print('=' * 70)
display(comparison_pfe.round(4))

print('\nKey Finding:')
police_pooled = pooled_table.loc[pooled_table['Variable'] == 'police_per_capita', 'Coefficient'].values[0]
police_fe = fe_table.loc[fe_table['Variable'] == 'police_per_capita', 'Coefficient'].values[0]
print(f'  Police effect (Pooled): {police_pooled:.4f}')
print(f'  Police effect (FE):     {police_fe:.4f}')
print('  => City fixed effects remove bias from time-invariant city heterogeneity; time-varying confounders and reverse causality may remain.')

In [None]:
# === Forest plot: Pooled vs FE ===
fig, ax = plt.subplots(figsize=(10, 6))

y_pos = np.arange(len(var_names))
height = 0.35

# Pooled
ax.barh(y_pos + height/2, comparison_pfe['Pooled_Coef'], height,
        xerr=1.96 * comparison_pfe['Pooled_SE'],
        label='Pooled Poisson', alpha=0.7, color='coral', edgecolor='black', capsize=4)

# FE
ax.barh(y_pos - height/2, comparison_pfe['FE_Coef'], height,
        xerr=1.96 * comparison_pfe['FE_SE'],
        label='Fixed Effects Poisson', alpha=0.7, color='steelblue', edgecolor='black', capsize=4)

ax.axvline(x=0, color='black', linestyle='-', linewidth=0.8)
ax.set_yticks(y_pos)
ax.set_yticklabels(var_names, fontsize=11)
ax.set_xlabel('Coefficient Estimate (with 95% CI)', fontsize=11)
ax.set_title('Pooled vs Fixed Effects: Coefficient Comparison', fontsize=13, fontweight='bold')
ax.legend(fontsize=11, loc='lower right')
ax.grid(axis='x', alpha=0.3)

plt.tight_layout()
plt.savefig(FIGURES_PATH / 'coefficient_comparison_pooled_fe.png', dpi=300, bbox_inches='tight')
plt.show()

## 3. Random Effects Poisson/NB {#3-re-poisson}

### Theory

Instead of conditioning out $\alpha_i$, RE models **specify a distribution** for it.

#### Gamma RE (Conjugate Prior)

- $y_{it} \mid \alpha_i \sim \text{Poisson}(\alpha_i \lambda_{it})$
- $\alpha_i \sim \text{Gamma}(1/\theta, \theta)$ with $E[\alpha_i] = 1$, $\text{Var}[\alpha_i] = \theta$
- **Marginal distribution**: $y_{it} \sim \text{NegBin}(\lambda_{it}, \theta)$ — closed form!
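
A quick simulation can confirm the moments implied by the Gamma mixture for a single observation: mean $\lambda_{it}$ and variance $\lambda_{it} + \theta \lambda_{it}^2$. The values of `theta` and `lam` below are arbitrary illustrations. Note that this draws a fresh $\alpha$ for every observation, so it checks only the per-observation marginal; within a real city the same $\alpha_i$ is shared across periods.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, lam, n_draws = 0.5, 4.0, 200_000   # illustrative values

# Gamma(shape=1/theta, scale=theta) has mean 1 and variance theta
alpha = rng.gamma(shape=1.0 / theta, scale=theta, size=n_draws)
y = rng.poisson(alpha * lam)

sample_mean = y.mean()
sample_var = y.var()
implied_var = lam + theta * lam**2   # negative binomial (NB2) variance

print(f"Mean of alpha:    {alpha.mean():.3f}")
print(f"Sample mean of y: {sample_mean:.3f} (lambda = {lam})")
print(f"Sample var of y:  {sample_var:.3f} (implied = {implied_var})")
```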

#### Normal RE

- $\log \alpha_i \sim N(0, \sigma^2_\alpha)$, i.e. the additive effect in the log-mean is normal
- Requires **numerical integration** (Gauss-Hermite quadrature)
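
As a sketch of what the quadrature does, the code below approximates one city's likelihood contribution under a normal additive effect $a_i \sim N(0, \sigma^2)$ in the log-mean. The data (`y_i`, `eta_i`), `sigma`, and the choice of 32 nodes are assumptions for illustration; this is not PanelBox's implementation. The result is compared with adaptive integration from `scipy.integrate.quad`.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy import integrate, stats

# Hypothetical city: counts and linear predictors X_it' beta for T = 4
y_i = np.array([2, 0, 3, 1])
eta_i = np.array([0.1, -0.2, 0.4, 0.0])
sigma = 0.8

def joint_given_a(a):
    """Product over t of Poisson pmfs, conditional on the city effect a."""
    return np.prod(stats.poisson.pmf(y_i, np.exp(eta_i + a)))

# Gauss-Hermite: integral of f(a) * N(a; 0, sigma^2) da
#   ~= (1 / sqrt(pi)) * sum_k w_k * f(sqrt(2) * sigma * x_k)
nodes, weights = hermgauss(32)
lik_gh = sum(w * joint_given_a(np.sqrt(2.0) * sigma * n)
             for n, w in zip(nodes, weights)) / np.sqrt(np.pi)

# Reference value: adaptive quadrature over a range covering the normal mass
lik_quad, _ = integrate.quad(
    lambda a: joint_given_a(a) * stats.norm.pdf(a, scale=sigma), -6.0, 6.0)

print(f"Gauss-Hermite:   {lik_gh:.8e}")
print(f"Adaptive (quad): {lik_quad:.8e}")
```

In a full estimator this integral is evaluated for every city at every optimizer step, which is why a fixed set of nodes is attractive.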

### Key Assumption

$\alpha_i$ must be **independent of** $X_{it}$ (the random effects assumption).
This is a strong assumption: if it is violated, RE is inconsistent.

### Efficiency Advantage

If the assumption holds, RE is **more efficient** than FE because it uses  
both within-city and between-city variation.

In [None]:
# === Random Effects Poisson (Gamma distribution) ===
print('Estimating Random Effects Poisson (Gamma RE)...')

re_model = RandomEffectsPoisson(
    endog=y,
    exog=X_pooled,
    entity_id=entity_id
)
re_result = re_model.fit(distribution='gamma')

# Extract RE coefficients (exclude log_theta at the end)
re_beta = re_result.params[:-1]   # beta coefficients (const + vars)
re_se = re_result.se[:-1]         # standard errors
re_tvals = re_result.tvalues[:-1]
re_pvals = re_result.pvalues[:-1]
theta_hat = np.exp(re_result.params[-1])

# --- Workaround for library issue: NaN standard errors ---
# The numerical Hessian in NonlinearPanelModel._hessian() can produce NaN SEs
# for RandomEffectsPoisson. We fall back to BFGS inverse Hessian if needed.
# See: desenvolvimento/correcoes/2026-02-17_random_effects_poisson_nan_standard_errors.md
if np.any(np.isnan(re_se)):
    print('Note: Numerical Hessian produced NaN SEs. Using BFGS inverse Hessian as fallback.')
    from scipy.optimize import minimize as sp_minimize
    neg_ll = lambda p: -re_model._log_likelihood_gamma(p)
    # Re-run BFGS from the converged params to get hess_inv. This is only a
    # quasi-Newton approximation to the inverse Hessian, and it stays at the
    # identity if BFGS stops at the first iterate, so treat these SEs as rough.
    bfgs_result = sp_minimize(neg_ll, re_result.params, method='BFGS',
                              options={'maxiter': 5000, 'disp': False})
    if hasattr(bfgs_result, 'hess_inv') and bfgs_result.hess_inv is not None:
        vcov_bfgs = np.asarray(bfgs_result.hess_inv)
        se_all = np.sqrt(np.diag(vcov_bfgs))
        re_se = se_all[:-1]
        re_tvals = re_beta / re_se
        re_pvals = 2 * (1 - stats.norm.cdf(np.abs(re_tvals)))
        # Update the result object
        re_result.se = se_all
        re_result.tvalues = re_result.params / se_all
        re_result.pvalues = 2 * (1 - stats.norm.cdf(np.abs(re_result.tvalues)))
        print('  => BFGS inverse Hessian SEs computed successfully.\n')

# Build RE table for main variables (skip const)
re_table = pd.DataFrame({
    'Variable': var_names,
    'Coefficient': re_beta[1:],
    'Std Error': re_se[1:],
    'z-statistic': re_tvals[1:],
    'p-value': re_pvals[1:],
    'IRR': np.exp(re_beta[1:]),
    '% Change': (np.exp(re_beta[1:]) - 1) * 100,
    'Sig': [add_stars(p) for p in re_pvals[1:]]
})

re_table.to_csv(TABLES_PATH / 'table_04_re_gamma.csv', index=False)

print('\n' + '=' * 80)
print('RANDOM EFFECTS POISSON (Gamma RE)')
print('=' * 80)
print(f'N = {len(y)}, Cities = {df["city_id"].nunique()}, T = {df["year"].nunique()}')
print(f'Theta (RE variance): {theta_hat:.4f}')
print(f'Variance-to-mean ratio at lambda = 1 (1 + theta): {1 + theta_hat:.4f}')
print()
display(re_table[['Variable', 'Coefficient', 'Std Error', 'p-value', 'IRR', '% Change', 'Sig']].round(4))

In [None]:
# === FE vs RE Comparison ===
comparison_fe_re = pd.DataFrame({
    'Variable': var_names,
    'FE_Coef': fe_table['Coefficient'].values,
    'FE_SE': fe_table['Std Error'].values,
    'RE_Coef': re_table['Coefficient'].values,
    'RE_SE': re_table['Std Error'].values,
    'Difference': fe_table['Coefficient'].values - re_table['Coefficient'].values
})

comparison_fe_re.to_csv(TABLES_PATH / 'table_05_fe_vs_re.csv', index=False)

print('FE vs RE COEFFICIENT COMPARISON')
print('=' * 80)
display(comparison_fe_re.round(4))

# Visualize
fig, ax = plt.subplots(figsize=(10, 6))
x_pos = np.arange(len(var_names))
width = 0.35

# Replace NaN SEs with 0 for plotting (no error bars where SE is unavailable)
fe_se_plot = np.nan_to_num(comparison_fe_re['FE_SE'].values, nan=0.0)
re_se_plot = np.nan_to_num(comparison_fe_re['RE_SE'].values, nan=0.0)

ax.bar(x_pos - width/2, comparison_fe_re['FE_Coef'], width,
       yerr=1.96 * fe_se_plot, capsize=5,
       label='Fixed Effects', alpha=0.8, color='steelblue', edgecolor='black')
ax.bar(x_pos + width/2, comparison_fe_re['RE_Coef'], width,
       yerr=1.96 * re_se_plot, capsize=5,
       label='Random Effects', alpha=0.8, color='coral', edgecolor='black')

ax.axhline(y=0, color='black', linestyle='-', linewidth=0.5)
ax.set_ylabel('Coefficient', fontsize=12)
ax.set_title('Fixed vs Random Effects Estimates', fontsize=14, fontweight='bold')
ax.set_xticks(x_pos)
ax.set_xticklabels(var_names, rotation=30, ha='right', fontsize=10)
ax.legend(fontsize=11)
ax.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.savefig(FIGURES_PATH / 'within_between_variance.png', dpi=300, bbox_inches='tight')
plt.show()

## 4. FE vs RE — Hausman Test {#4-hausman}

### The Decision Problem

- **$H_0$**: RE is consistent and efficient ($\alpha_i \perp X_{it}$)
- **$H_1$**: Only FE is consistent ($\text{Corr}(\alpha_i, X_{it}) \neq 0$)

### Test Statistic

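$$H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' [\text{Var}(\hat{\beta}_{FE}) - \text{Var}(\hat{\beta}_{RE})]^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE}) \sim \chi^2(K)$$

Here $K$ is the number of coefficients compared, and the $\chi^2(K)$ distribution holds under $H_0$.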
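
The statistic is straightforward to compute once both coefficient vectors and their full covariance matrices are available. The helper below, `hausman_statistic`, is a hypothetical function written for this illustration, and the numbers passed to it are invented, not estimates from the crime data.

```python
import numpy as np
from scipy import stats

def hausman_statistic(b_fe, b_re, v_fe, v_re):
    """Return (H, df, p-value) for the Hausman test using full covariances."""
    diff = np.asarray(b_fe) - np.asarray(b_re)
    v_diff = np.asarray(v_fe) - np.asarray(v_re)
    h = float(diff @ np.linalg.solve(v_diff, diff))
    df = diff.size
    return h, df, stats.chi2.sf(h, df)

# Invented two-coefficient example
b_fe = np.array([-0.30, 0.10])
b_re = np.array([-0.20, 0.12])
v_fe = np.array([[0.0040, 0.0005],
                 [0.0005, 0.0020]])
v_re = np.array([[0.0020, 0.0002],
                 [0.0002, 0.0010]])

h_stat, h_df, h_pval = hausman_statistic(b_fe, b_re, v_fe, v_re)
print(f"H = {h_stat:.4f}, df = {h_df}, p-value = {h_pval:.4f}")
```

Using the full matrices keeps the covariances between coefficients, which a diagonal-only version discards.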

### Practical Guidance

| Result | Action |
|--------|--------|
| Reject $H_0$ | Use FE (RE is inconsistent) |
| Fail to reject | Can use RE (more efficient) |
| Policy work | Often safer to use FE regardless |

In [None]:
# === Hausman Test Implementation ===
# We compute the test manually using the coefficient and variance estimates

# FE coefficients and SEs for the 4 base variables
beta_fe = fe_table['Coefficient'].values
se_fe = fe_table['Std Error'].values

# RE coefficients and SEs for the 4 base variables
beta_re = re_table['Coefficient'].values
se_re = re_table['Std Error'].values

# Check for NaN SEs — if present, use only variables with valid SEs
valid_mask = ~(np.isnan(se_fe) | np.isnan(se_re) | (se_fe == 0) | (se_re == 0))
if not np.all(valid_mask):
    n_invalid = np.sum(~valid_mask)
    print(f'Note: {n_invalid} variable(s) have invalid SEs and will be excluded from the test.\n')

valid_vars = [v for v, m in zip(var_names, valid_mask) if m]
beta_fe_v = beta_fe[valid_mask]
beta_re_v = beta_re[valid_mask]
se_fe_v = se_fe[valid_mask]
se_re_v = se_re[valid_mask]

# Difference in coefficients
diff = beta_fe_v - beta_re_v

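# Variance of the difference (diagonal approximation: covariances between coefficients are ignored)
# Under H0 with RE efficient: Var(b_FE - b_RE) = Var(b_FE) - Var(b_RE); this need not hold with cluster-robust SEs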
var_diff = np.diag(se_fe_v**2) - np.diag(se_re_v**2)

# Handle potential issues with non-positive-definite matrix
eigenvalues = np.linalg.eigvalsh(var_diff)
if np.any(eigenvalues < -1e-10):
    print('Warning: Var(b_FE) - Var(b_RE) is not positive definite.')
    print('This can happen with cluster-robust SEs. Using absolute values.\n')
    var_diff = np.diag(np.abs(se_fe_v**2 - se_re_v**2))

# Compute Hausman statistic
try:
    var_diff_inv = np.linalg.inv(var_diff)
    hausman_stat = float(diff @ var_diff_inv @ diff)
    hausman_df = len(valid_vars)
    hausman_pval = stats.chi2.sf(hausman_stat, hausman_df)  # survival function: accurate for small p-values
except np.linalg.LinAlgError:
    hausman_stat = np.nan
    hausman_df = len(valid_vars)
    hausman_pval = np.nan

print('=' * 70)
print('HAUSMAN SPECIFICATION TEST')
print('=' * 70)
print(f'H0: Random Effects is consistent (alpha_i independent of X)')
print(f'H1: Only Fixed Effects is consistent (alpha_i correlated with X)')
print()
print(f'Variables used:        {", ".join(valid_vars)}')
print(f'Test statistic (Chi2): {hausman_stat:.4f}')
print(f'Degrees of freedom:    {hausman_df}')
print(f'P-value:               {hausman_pval:.6f}')
print('-' * 70)

print('\nCoefficient Comparison:')
print(f'{"Variable":<25s} {"FE":>10s} {"RE":>10s} {"Diff":>10s}')
print('-' * 55)
for i, v in enumerate(var_names):
    fe_v = beta_fe[i]
    re_v = beta_re[i]
    d = fe_v - re_v
    mark = ' *' if v in valid_vars else ' (excluded)'
    print(f'{v:<25s} {fe_v:>10.4f} {re_v:>10.4f} {d:>10.4f}{mark}')

print('\n' + '-' * 70)
if not np.isnan(hausman_pval):
    if hausman_pval < 0.05:
        print(f'Decision: REJECT H0 (p = {hausman_pval:.6f} < 0.05)')
        print('=> Use FIXED EFFECTS. Random effects are inconsistent.')
        hausman_decision = 'Fixed Effects'
    else:
        print(f'Decision: FAIL TO REJECT H0 (p = {hausman_pval:.6f} >= 0.05)')
        print('=> Can use RANDOM EFFECTS (more efficient).')
        hausman_decision = 'Random Effects'
else:
    print('Decision: Test inconclusive (numerical issues). Default to Fixed Effects.')
    hausman_decision = 'Fixed Effects (default)'

print('=' * 70)

# Save Hausman test results
hausman_table = pd.DataFrame({
    'Metric': ['Test Statistic', 'Degrees of Freedom', 'P-value', 'Decision'],
    'Value': [f'{hausman_stat:.4f}', str(hausman_df), f'{hausman_pval:.6f}', hausman_decision]
})
hausman_table.to_csv(TABLES_PATH / 'table_06_hausman_test.csv', index=False)

## 5. Application: Police and Crime {#5-application}

### Research Question

> **Does increasing police presence reduce crime?**

This is a classic public economics question with an important identification challenge:
- High-crime cities tend to hire more police (reverse causality)
- Unobserved city characteristics affect both crime and policing
- City fixed effects help control for these time-invariant confounders

### Identification Strategy

$$\log E[\text{crimes}_{it} \mid X_{it}, \alpha_i, \gamma_t] = \beta_1 \text{police}_{it} + \beta_2 \text{unemployment}_{it} + \beta_3 \text{income}_{it} + \beta_4 \text{temperature}_{it} + \alpha_i + \gamma_t$$

- **City FE** ($\alpha_i$): Control for time-invariant city characteristics
- **Year FE** ($\gamma_t$): Control for common time trends

In [None]:
# === Main Specification: FE Poisson with Year FE ===
print('POLICY ANALYSIS: Police and Crime')
print('=' * 80)

# Create year dummies for time fixed effects
year_dummies = pd.get_dummies(df['year'], prefix='year', drop_first=True).values

# Full model: covariates + city dummies + year dummies
X_full = np.hstack([X_pooled, city_dummies, year_dummies])

full_model = PooledPoisson(endog=y, exog=X_full, entity_id=entity_id)
full_result = full_model.fit(se_type='cluster')

# Extract main results
policy_table = pd.DataFrame({
    'Variable': pooled_names,
    'Coefficient': full_result.params[:n_main],
    'Std Error': full_result.se[:n_main],
    'z-statistic': full_result.tvalues[:n_main],
    'p-value': full_result.pvalues[:n_main],
    'IRR': np.exp(full_result.params[:n_main]),
    '% Change': (np.exp(full_result.params[:n_main]) - 1) * 100,
    'Sig': [add_stars(p) for p in full_result.pvalues[:n_main]]
})

policy_table.to_csv(TABLES_PATH / 'table_07_policy_main_results.csv', index=False)

print('Main Specification: FE Poisson with City + Year Fixed Effects')
print(f'  N = {len(y)}, Cities = {df["city_id"].nunique()}, Years = {df["year"].nunique()}')
print(f'  City FE: {city_dummies.shape[1]} dummies, Year FE: {year_dummies.shape[1]} dummies')
print(f'  Log-likelihood: {full_model.llf:.2f}')
print()
display(policy_table[['Variable', 'Coefficient', 'Std Error', 'IRR', '% Change', 'Sig']].round(4))

In [None]:
# === Interpret Police Effect ===
police_coef = policy_table.loc[policy_table['Variable'] == 'police_per_capita', 'Coefficient'].values[0]
police_irr = policy_table.loc[policy_table['Variable'] == 'police_per_capita', 'IRR'].values[0]
police_pct = policy_table.loc[policy_table['Variable'] == 'police_per_capita', '% Change'].values[0]
police_se = policy_table.loc[policy_table['Variable'] == 'police_per_capita', 'Std Error'].values[0]

print('KEY FINDING: Police Effect on Crime')
print('=' * 60)
print(f'  Coefficient:     {police_coef:.4f}')
print(f'  IRR:             {police_irr:.4f}')
print(f'  % Change:        {police_pct:.2f}%')
print(f'  95% CI (coef):   [{police_coef - 1.96*police_se:.4f}, {police_coef + 1.96*police_se:.4f}]')
print(f'  95% CI (IRR):    [{np.exp(police_coef - 1.96*police_se):.4f}, {np.exp(police_coef + 1.96*police_se):.4f}]')
print()
print('Substantive Interpretation:')
print(f'  A one-unit increase in police per capita is associated with a')
print(f'  {abs(police_pct):.1f}% {"decrease" if police_pct < 0 else "increase"} in crime, holding other factors constant.')
print(f'  This reflects WITHIN-city variation (same city, different years).')

In [None]:
# === Robustness Checks ===
print('ROBUSTNESS CHECKS')
print('=' * 80)

# --- 1. Model without year FE (city FE only) ---
rob1_coefs = fe_params_main[1:]
rob1_se = fe_se_main[1:]

# --- 2. Model with year FE (our main specification) ---
rob2_coefs = full_result.params[1:n_main]
rob2_se = full_result.se[1:n_main]

# --- 3. Pooled model (no FE) ---
rob3_coefs = pooled_result.params[1:]
rob3_se = pooled_result.se[1:]

# --- 4. Exclude top/bottom 5% of cities by average crime ---
city_mean_crime = df.groupby('city_id')['crime_count'].mean()
q05, q95 = city_mean_crime.quantile([0.05, 0.95])
trimmed_cities = city_mean_crime[(city_mean_crime >= q05) & (city_mean_crime <= q95)].index
df_trimmed = df[df['city_id'].isin(trimmed_cities)]

y_trim = df_trimmed['crime_count'].values
X_trim_raw = df_trimmed[var_names].values
X_trim = sm.add_constant(X_trim_raw)
entity_trim = df_trimmed['city_id'].values
city_dum_trim = pd.get_dummies(df_trimmed['city_id'], prefix='city', drop_first=True).values
X_trim_fe = np.hstack([X_trim, city_dum_trim])

trim_model = PooledPoisson(endog=y_trim, exog=X_trim_fe, entity_id=entity_trim)
trim_result = trim_model.fit(se_type='cluster')
rob4_coefs = trim_result.params[1:len(var_names)+1]
rob4_se = trim_result.se[1:len(var_names)+1]

# Build robustness table
robustness = pd.DataFrame({
    'Variable': var_names,
    'Pooled': [f'{c:.4f} ({s:.4f})' for c, s in zip(rob3_coefs, rob3_se)],
    'FE_City': [f'{c:.4f} ({s:.4f})' for c, s in zip(rob1_coefs, rob1_se)],
    'FE_City_Year': [f'{c:.4f} ({s:.4f})' for c, s in zip(rob2_coefs, rob2_se)],
    'FE_Trimmed': [f'{c:.4f} ({s:.4f})' for c, s in zip(rob4_coefs, rob4_se)],
})

robustness.to_csv(TABLES_PATH / 'table_08_robustness_checks.csv', index=False)

print('Coefficient (Std Error) across specifications:')
print()
display(robustness)
print('\nNote: Trimmed sample excludes top/bottom 5% of cities by average crime count.')
print('Compare the police_per_capita row across columns: similar signs and magnitudes indicate robustness.')

In [None]:
# === Police Effect Visualization ===
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# --- IRR Plot with CI ---
# Show IRR for main variables from the full model
irr_vals = np.exp(full_result.params[1:n_main])
irr_lo = np.exp(full_result.params[1:n_main] - 1.96 * full_result.se[1:n_main])
irr_hi = np.exp(full_result.params[1:n_main] + 1.96 * full_result.se[1:n_main])

y_pos = np.arange(len(var_names))
colors = ['#e74c3c' if irr < 1 else '#2ecc71' for irr in irr_vals]

axes[0].barh(y_pos, irr_vals, xerr=[irr_vals - irr_lo, irr_hi - irr_vals],
             color=colors, edgecolor='black', alpha=0.7, capsize=5)
axes[0].axvline(x=1, color='black', linestyle='--', linewidth=1.5, label='IRR = 1 (no effect)')
axes[0].set_yticks(y_pos)
axes[0].set_yticklabels(var_names, fontsize=11)
axes[0].set_xlabel('Incidence Rate Ratio (IRR)', fontsize=11)
axes[0].set_title('IRR with 95% Confidence Intervals', fontsize=13, fontweight='bold')
axes[0].legend(fontsize=9)
axes[0].grid(axis='x', alpha=0.3)

# --- Predicted crime by police level ---
police_range = np.linspace(
    df['police_per_capita'].quantile(0.05),
    df['police_per_capita'].quantile(0.95), 50)

# Hold other variables at their means
mean_unemp = df['unemployment_rate'].mean()
mean_income = df['median_income'].mean()
mean_temp = df['temperature'].mean()

# Use FE model coefficients (city FE only, main vars)
pred_log_crime = (fe_params_main[0] +
                  fe_params_main[1] * mean_unemp +
                  fe_params_main[2] * police_range +
                  fe_params_main[3] * mean_income +
                  fe_params_main[4] * mean_temp)
pred_crime = np.exp(pred_log_crime)

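# Confidence band (approximate): reflects only the sampling uncertainty of the
# police coefficient, ignoring the other coefficients and all covariances
pred_se = fe_se_main[2] * np.abs(police_range)
pred_lo = np.exp(pred_log_crime - 1.96 * pred_se)
pred_hi = np.exp(pred_log_crime + 1.96 * pred_se)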

axes[1].plot(police_range, pred_crime, 'b-', linewidth=2, label='Predicted crime')
axes[1].fill_between(police_range, pred_lo, pred_hi, alpha=0.2, color='steelblue', label='95% CI')
axes[1].set_xlabel('Police per Capita', fontsize=11)
axes[1].set_ylabel('Predicted Crime Count', fontsize=11)
axes[1].set_title('Marginal Effect of Police on Crime', fontsize=13, fontweight='bold')
axes[1].legend(fontsize=10)
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.savefig(FIGURES_PATH / 'police_effect_plot.png', dpi=300, bbox_inches='tight')
plt.show()

In [None]:
# === Standalone: Predicted Crime by Police Level ===
fig, ax = plt.subplots(figsize=(9, 6))

ax.plot(police_range, pred_crime, 'b-', linewidth=2.5, label='Predicted crime count')
ax.fill_between(police_range, pred_lo, pred_hi, alpha=0.2, color='steelblue', label='95% CI')

# Add scatter of actual data (semi-transparent)
ax.scatter(df['police_per_capita'], df['crime_count'], alpha=0.08, s=10,
           color='gray', label='Observed data')

ax.set_xlabel('Police per Capita', fontsize=12)
ax.set_ylabel('Predicted Crime Count', fontsize=12)
ax.set_title('Predicted Crime by Police Level\n(FE Poisson, other covariates at means)',
             fontsize=13, fontweight='bold')
ax.legend(fontsize=10)
ax.grid(alpha=0.3)

plt.tight_layout()
plt.savefig(FIGURES_PATH / 'predicted_crime_by_police.png', dpi=300, bbox_inches='tight')
plt.show()

print('Figure saved: predicted_crime_by_police.png')

## 6. Comparison Summary and Best Practices {#6-summary}

### Model Comparison Table

| Model | Assumption | Pros | Cons | When to Use |
|-------|-----------|------|------|-------------|
| **Pooled** | No unobserved heterogeneity ($\alpha_i$ constant across cities) | Simple, efficient | Biased if heterogeneity is correlated with $X$ | Rarely appropriate |
| **FE** | Allows $\text{Corr}(\alpha_i, X) \neq 0$ | Robust to time-invariant confounders | Less efficient, no time-invariant $X$ | Policy, causal inference |
| **RE** | $\text{Corr}(\alpha_i, X) = 0$ | Efficient, can identify time-invariant $X$ | Inconsistent if $\alpha_i$ is correlated with $X$ | Cross-section variation important |

### Decision Flowchart

```
Panel count data
    |
    v
Suspect heterogeneity?
    | Yes
    v
Run FE and RE
    |
    v
Hausman test
    |
    v
Reject H0? --> Yes --> Use FE
           --> No  --> Use RE (more efficient)
```
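
The Hausman step in the flowchart compares the FE and RE estimates of the coefficients both models identify, using $H = (\hat\beta_{FE} - \hat\beta_{RE})'[\hat V_{FE} - \hat V_{RE}]^{-1}(\hat\beta_{FE} - \hat\beta_{RE})$, which is referred to a $\chi^2$ distribution with degrees of freedom equal to the number of compared coefficients. A minimal sketch with NumPy and SciPy follows; the function name `hausman_test` and the input numbers are hypothetical, not results from the crime data.

```python
import numpy as np
from scipy import stats

def hausman_test(b_fe, b_re, V_fe, V_re):
    """Hausman statistic H = d' (V_fe - V_re)^(-1) d with d = b_fe - b_re."""
    d = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    V_diff = np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float)
    # pinv guards against a singular difference matrix
    H = float(d @ np.linalg.pinv(V_diff) @ d)
    dof = d.size
    p_value = float(stats.chi2.sf(H, dof))
    return H, dof, p_value

# Hypothetical estimates for two time-varying covariates
b_fe = [0.50, -0.30]
b_re = [0.40, -0.25]
V_fe = np.diag([0.010, 0.004])
V_re = np.diag([0.006, 0.003])

H, dof, p_value = hausman_test(b_fe, b_re, V_fe, V_re)
print(f"H = {H:.2f}, df = {dof}, p = {p_value:.4f}")
# → H = 5.00, df = 2, p = 0.0821
```

In finite samples $\hat V_{FE} - \hat V_{RE}$ need not be positive definite, so a negative $H$ can occur; that usually signals a problem with the variance estimates rather than evidence for either model.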

### Best Practices

1. **Always check for heterogeneity** — compare pooled vs FE
2. **Run Hausman test** to guide FE vs RE choice
3. **Use cluster-robust SEs** (entity level)
4. **Include time fixed effects** when possible
5. **Check sensitivity** to specification choices

In [None]:
# === Comprehensive Model Comparison Summary ===
fig, axes = plt.subplots(1, 2, figsize=(16, 7))

# --- Panel A: All three models side by side ---
x = np.arange(len(var_names))
w = 0.25

axes[0].bar(x - w, pooled_table.loc[1:, 'Coefficient'].values, w,
            label='Pooled', alpha=0.8, color='gray', edgecolor='black')
axes[0].bar(x, fe_table['Coefficient'].values, w,
            label='Fixed Effects', alpha=0.8, color='steelblue', edgecolor='black')
axes[0].bar(x + w, re_table['Coefficient'].values, w,
            label='Random Effects', alpha=0.8, color='coral', edgecolor='black')

axes[0].axhline(y=0, color='black', linewidth=0.5)
axes[0].set_ylabel('Coefficient', fontsize=12)
axes[0].set_title('Model Comparison: All Approaches', fontsize=13, fontweight='bold')
axes[0].set_xticks(x)
axes[0].set_xticklabels(var_names, rotation=30, ha='right', fontsize=10)
axes[0].legend(fontsize=10)
axes[0].grid(axis='y', alpha=0.3)

# --- Panel B: Decision flowchart as annotated text ---
axes[1].set_xlim(0, 10)
axes[1].set_ylim(0, 10)
axes[1].axis('off')
axes[1].set_title('Decision Framework', fontsize=13, fontweight='bold')

# Draw decision flowchart
box_style = dict(boxstyle='round,pad=0.5', facecolor='lightblue', edgecolor='black', linewidth=1.5)
decision_style = dict(boxstyle='round,pad=0.5', facecolor='lightyellow', edgecolor='black', linewidth=1.5)
result_style_fe = dict(boxstyle='round,pad=0.5', facecolor='#aed6f1', edgecolor='black', linewidth=2)
result_style_re = dict(boxstyle='round,pad=0.5', facecolor='#f9e79f', edgecolor='black', linewidth=2)

axes[1].text(5, 9, 'Panel Count Data', ha='center', va='center', fontsize=12, fontweight='bold', bbox=box_style)
axes[1].annotate('', xy=(5, 8.2), xytext=(5, 8.6), arrowprops=dict(arrowstyle='->', color='black', lw=2))

axes[1].text(5, 7.5, 'Suspect Heterogeneity?', ha='center', va='center', fontsize=11, bbox=decision_style)
axes[1].annotate('', xy=(5, 6.7), xytext=(5, 7.0), arrowprops=dict(arrowstyle='->', color='black', lw=2))
axes[1].text(5.3, 6.85, 'Yes', fontsize=10, color='green', fontweight='bold')

axes[1].text(5, 6.0, 'Run FE and RE', ha='center', va='center', fontsize=11, bbox=box_style)
axes[1].annotate('', xy=(5, 5.2), xytext=(5, 5.5), arrowprops=dict(arrowstyle='->', color='black', lw=2))

axes[1].text(5, 4.5, 'Hausman Test', ha='center', va='center', fontsize=11, fontweight='bold', bbox=decision_style)
axes[1].annotate('', xy=(3, 3.2), xytext=(4.2, 4.0), arrowprops=dict(arrowstyle='->', color='red', lw=2))
axes[1].annotate('', xy=(7, 3.2), xytext=(5.8, 4.0), arrowprops=dict(arrowstyle='->', color='green', lw=2))

axes[1].text(3.5, 3.7, 'Reject H0', fontsize=10, color='red', fontweight='bold')
axes[1].text(6.5, 3.7, 'Fail to reject', fontsize=10, color='green', fontweight='bold')

axes[1].text(3, 2.5, 'Use Fixed Effects', ha='center', va='center', fontsize=12, fontweight='bold', bbox=result_style_fe)
axes[1].text(7, 2.5, 'Use Random Effects', ha='center', va='center', fontsize=12, fontweight='bold', bbox=result_style_re)

axes[1].text(3, 1.3, '(Robust, causal)', ha='center', va='center', fontsize=10, color='gray')
axes[1].text(7, 1.3, '(More efficient)', ha='center', va='center', fontsize=10, color='gray')

plt.tight_layout()
plt.savefig(FIGURES_PATH / 'model_comparison_summary.png', dpi=300, bbox_inches='tight')
plt.show()

In [None]:
# === Decision flowchart (standalone) ===
fig, ax = plt.subplots(figsize=(8, 10))
ax.set_xlim(0, 10)
ax.set_ylim(0, 12)
ax.axis('off')
ax.set_title('FE vs RE Decision Flowchart for Count Models', fontsize=14, fontweight='bold', pad=10)

step_style = dict(boxstyle='round,pad=0.5', facecolor='#d5f5e3', edgecolor='black', linewidth=1.5)
q_style = dict(boxstyle='round,pad=0.5', facecolor='#fdebd0', edgecolor='black', linewidth=1.5)
fe_style = dict(boxstyle='round,pad=0.6', facecolor='#aed6f1', edgecolor='#2980b9', linewidth=2)
re_style = dict(boxstyle='round,pad=0.6', facecolor='#f9e79f', edgecolor='#d4ac0d', linewidth=2)

# Step 1
ax.text(5, 11, '1. Panel Count Data', ha='center', fontsize=12, fontweight='bold', bbox=step_style)
ax.annotate('', xy=(5, 10.2), xytext=(5, 10.5), arrowprops=dict(arrowstyle='->', lw=2))

# Step 2
ax.text(5, 9.5, '2. Fit Pooled Poisson', ha='center', fontsize=11, bbox=step_style)
ax.annotate('', xy=(5, 8.7), xytext=(5, 9.0), arrowprops=dict(arrowstyle='->', lw=2))

# Step 3
ax.text(5, 8.0, '3. Compare Pooled vs FE\n(significant difference?)', ha='center', fontsize=11, bbox=q_style)
ax.annotate('', xy=(5, 7.0), xytext=(5, 7.3), arrowprops=dict(arrowstyle='->', lw=2))
ax.text(5.3, 7.15, 'Yes', fontsize=10, color='green', fontweight='bold')

# Step 4
ax.text(5, 6.3, '4. Fit FE and RE Poisson', ha='center', fontsize=11, bbox=step_style)
ax.annotate('', xy=(5, 5.5), xytext=(5, 5.8), arrowprops=dict(arrowstyle='->', lw=2))

# Step 5 - Hausman
ax.text(5, 4.8, '5. Hausman Test\nH0: RE consistent', ha='center', fontsize=11, fontweight='bold', bbox=q_style)

# Branches
ax.annotate('', xy=(2.5, 3.2), xytext=(3.8, 4.2), arrowprops=dict(arrowstyle='->', color='red', lw=2))
ax.annotate('', xy=(7.5, 3.2), xytext=(6.2, 4.2), arrowprops=dict(arrowstyle='->', color='green', lw=2))

ax.text(2.8, 3.8, 'Reject H0\n(p < 0.05)', fontsize=9, color='red', fontweight='bold', ha='center')
ax.text(7.2, 3.8, 'Fail to reject\n(p >= 0.05)', fontsize=9, color='green', fontweight='bold', ha='center')

# Results
ax.text(2.5, 2.5, 'Use Fixed Effects', ha='center', fontsize=12, fontweight='bold', bbox=fe_style)
ax.text(7.5, 2.5, 'Use Random Effects', ha='center', fontsize=12, fontweight='bold', bbox=re_style)

ax.text(2.5, 1.5, 'Consistent under\ncorrelated effects', ha='center', fontsize=9, color='gray')
ax.text(7.5, 1.5, 'More efficient if\nassumptions hold', ha='center', fontsize=9, color='gray')

# Add tip
tip_style = dict(boxstyle='round,pad=0.5', facecolor='#fadbd8', edgecolor='#e74c3c', linewidth=1.5)
ax.text(5, 0.5, 'Tip: For policy evaluation, FE is often safer regardless of test result',
        ha='center', fontsize=10, style='italic', bbox=tip_style)

plt.savefig(FIGURES_PATH / 'decision_flowchart_fe_re.png', dpi=300, bbox_inches='tight')
plt.show()

## 7. Summary {#7-takeaways}

### Key Takeaways

1. **Panel count data**: Unobserved heterogeneity ($\alpha_i$) is common and must be addressed
2. **FE Poisson via conditional MLE**: Conditions on each entity's total count, which eliminates $\alpha_i$ without estimating it; the standard choice for policy evaluation
3. **RE Poisson**: More efficient if $\alpha_i \perp X_{it}$, because it uses both within and between variation
4. **Hausman test**: Formally guides the FE vs RE choice
5. **Within variation identifies the effects**: FE estimates come from changes within a city over time, and are causal only if no time-varying confounders remain
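
Takeaway 2 can be made concrete. Conditional on an entity's total count, the Poisson counts are multinomial with probabilities $p_{it} = \exp(x_{it}\beta) / \sum_s \exp(x_{is}\beta)$, so $\alpha_i$ cancels. The sketch below is a single-covariate illustration on simulated data (function name and parameter values are assumptions), not the PanelBox estimator.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def conditional_loglik(beta, y, x):
    """FE Poisson conditional log-likelihood, up to a constant.

    y and x have shape (n_entities, n_periods); beta is a scalar slope.
    """
    eta = x * beta
    eta = eta - eta.max(axis=1, keepdims=True)          # stabilise the softmax
    log_p = eta - np.log(np.exp(eta).sum(axis=1, keepdims=True))
    return float((y * log_p).sum())

rng = np.random.default_rng(0)
n_entities, n_periods, beta_true = 300, 5, 0.3
alpha = rng.normal(0.0, 0.8, size=(n_entities, 1))
x = alpha + rng.normal(size=(n_entities, n_periods))    # x correlates with alpha_i
y = rng.poisson(np.exp(alpha + beta_true * x))

# Any entity-specific shift of the linear predictor leaves the likelihood unchanged
ll_base = conditional_loglik(0.3, y, x)
ll_shift = conditional_loglik(0.3, y, x + 5.0 * alpha)
print(np.isclose(ll_base, ll_shift))

# Maximise over beta; alpha_i is never estimated
fit = minimize_scalar(lambda b: -conditional_loglik(b, y, x),
                      bounds=(-2, 2), method='bounded')
beta_hat = fit.x
print(f"true: {beta_true:.2f}  conditional MLE: {beta_hat:.3f}")
```

Entities with a zero total contribute nothing to this likelihood, which matches the fact that they carry no within-entity information.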

### Our Findings

From the city crime analysis:
- **Unemployment** positively affects crime (within-city)
- **Police presence** reduces crime: the FE estimate removes bias from time-invariant city characteristics that are correlated with policing
- **Median income** has a small protective effect
- The Hausman test guides the FE vs RE choice for this data

### PanelBox Workflow

```python
from panelbox.models.count import (
    PooledPoisson,
    PoissonFixedEffects,
    RandomEffectsPoisson,
)

# FE Poisson, small panels: true conditional MLE
fe_model = PoissonFixedEffects(y, X, entity_id=cities, time_id=years)
fe_result = fe_model.fit()

# FE Poisson, large panels: entity dummies (equivalent slope estimates for Poisson)
fe_dummy_model = PooledPoisson(y, X_with_dummies, entity_id=cities)
fe_dummy_result = fe_dummy_model.fit(se_type='cluster')

# RE Poisson (Gamma-distributed heterogeneity)
re_model = RandomEffectsPoisson(y, X, entity_id=cities)
re_result = re_model.fit(distribution='gamma')
```

### Next Steps

- **Tutorial 04**: PPML for gravity models (high-dimensional FE)
- **Tutorial 05**: Zero-inflated models (ZIP/ZINB)
- **Tutorial 06**: Marginal effects in count models

---

## References

- Hausman, J., Hall, B. H., & Griliches, Z. (1984). Econometric models for count data with an application to the patents-R&D relationship. *Econometrica*, 52(4), 909-938.
- Wooldridge, J. M. (2010). *Econometric Analysis of Cross Section and Panel Data* (2nd ed.). MIT Press. Chapter 18.
- Cameron, A. C., & Trivedi, P. K. (2013). *Regression Analysis of Count Data* (2nd ed.). Cambridge University Press.
- Hausman, J. A. (1978). Specification tests in econometrics. *Econometrica*, 46(6), 1251-1271.

---

**Congratulations!** You now understand how to handle panel structure in count data models using Fixed and Random Effects.