# Marginal Effects in Count Models
## Computing and Interpreting Marginal Effects Correctly

### Learning Objectives

1. Understand why coefficients $\neq$ marginal effects in nonlinear models
2. Distinguish between IRRs and marginal effects
3. Compute Average Marginal Effects (AME) for count models
4. Compute Marginal Effects at Means (MEM) and at Representative Values (MER)
5. Handle discrete changes for binary variables
6. Compute standard errors via the delta method
7. Visualize marginal effects and their heterogeneity
8. Apply to all count models: Poisson, NB, ZIP, ZINB

### Duration
65 minutes

### Prerequisites
- Count models (Notebooks 01–05)
- Understanding of derivatives and calculus
- No prior exposure to the delta method is required (it is introduced here)

### Dataset
**Policy Impact** (`policy_impact.csv`): Evaluating the effect of a policy intervention on a count outcome.
- N = 1,200 individuals
- Outcome: `outcome_count` (0–30 events)
- Treatment: `policy` (binary, 0/1)
- Controls: `income`, `age`, `education`, `female`, `urban`

In [None]:
# Standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Statistical libraries
from scipy import stats

import statsmodels.api as sm

# PanelBox imports
from panelbox.models.count import (
    PooledPoisson,
    PoissonFixedEffects,
    NegativeBinomial,
    ZeroInflatedPoisson,
    ZeroInflatedNegativeBinomial,
)
from panelbox.marginal_effects.count_me import (
    compute_poisson_ame,
    compute_poisson_mem,
    compute_negbin_ame,
    compute_negbin_mem,
)

# Visualization configuration
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 11
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 4)

np.random.seed(42)

# Paths (relative to notebook location in examples/count/notebooks/)
BASE_DIR = Path('..')
DATA_DIR = BASE_DIR / 'data'
OUTPUT_DIR = BASE_DIR / 'outputs'
FIGURES_DIR = OUTPUT_DIR / 'figures' / '06_marginal_effects'
TABLES_DIR = OUTPUT_DIR / 'tables' / '06_marginal_effects'

FIGURES_DIR.mkdir(parents=True, exist_ok=True)
TABLES_DIR.mkdir(parents=True, exist_ok=True)

print('Setup complete!')
print(f'Data directory: {DATA_DIR}')
print(f'Output directory: {OUTPUT_DIR}')

In [None]:
# Load policy impact data (1,200 individuals)
df = pd.read_csv(DATA_DIR / 'policy_impact.csv')

print('Dataset shape:', df.shape)
print('\nFirst few rows:')
display(df.head(10))

print('\nVariable types:')
print(df.dtypes)

print('\nSummary statistics:')
display(df.describe())

print(f'\nOutcome range: {df["outcome_count"].min()} to {df["outcome_count"].max()}')
print(f'Policy proportion: {df["policy"].mean():.2%}')

---

## Section 1: Coefficients vs Marginal Effects (12 min)

### The Fundamental Difference

**Linear Model (OLS):**

$$E[y \mid X] = X'\beta \quad \Rightarrow \quad \frac{\partial E[y]}{\partial X_k} = \beta_k \quad \text{(constant, same for everyone)}$$

**Poisson Model:**

$$E[y \mid X] = \exp(X'\beta) \quad \Rightarrow \quad \frac{\partial E[y]}{\partial X_k} = \beta_k \times \exp(X'\beta) = \beta_k \times \lambda \quad \text{(varies with } X\text{!)}$$
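The Poisson derivative above can be checked numerically. The following is a minimal sketch using made-up coefficients and two hypothetical covariate profiles (none of these values come from the policy dataset); it compares the analytic expression $\beta_k \times \exp(X'\beta)$ with a central finite difference.

```python
import numpy as np

# Hypothetical coefficients and profiles (illustrative values only)
beta = np.array([0.5, 0.2])      # [intercept, slope on x]
x_low = np.array([1.0, 0.0])     # profile with x = 0
x_high = np.array([1.0, 5.0])    # profile with x = 5


def expected_count(x, beta):
    """Poisson conditional mean exp(x'beta)."""
    return np.exp(x @ beta)


def analytic_me(x, beta, k):
    """Marginal effect of covariate k: beta_k * exp(x'beta)."""
    return beta[k] * expected_count(x, beta)


def numeric_me(x, beta, k, h=1e-6):
    """Central finite-difference approximation to the same derivative."""
    step = np.zeros_like(x)
    step[k] = h
    return (expected_count(x + step, beta) - expected_count(x - step, beta)) / (2 * h)


for name, x in [("low", x_low), ("high", x_high)]:
    print(name, analytic_me(x, beta, 1), numeric_me(x, beta, 1))
```

The coefficient is identical for both profiles, yet the marginal effect is larger at the profile with the higher predicted rate.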

### Key Insight

- In **linear models**: $\beta$ is the marginal effect
- In **nonlinear models**: $\beta$ is **NOT** the marginal effect
- The marginal effect depends on the level of $X$ — it is **heterogeneous**

Let’s demonstrate this concretely.

In [None]:
# Fit a Poisson model
y = df['outcome_count'].values
var_names = ['const', 'policy', 'age', 'education', 'income', 'female', 'urban']
X_raw = df[['policy', 'age', 'education', 'income', 'female', 'urban']].values
X = sm.add_constant(X_raw)

poisson_model = PooledPoisson(endog=y, exog=X)
poisson_result = poisson_model.fit(se_type='robust')

print('Poisson Regression Coefficients')
print('=' * 60)
coef_df = pd.DataFrame({
    'Variable': var_names,
    'Coefficient (\u03b2)': poisson_result.params,
    'Std. Error': poisson_result.se,
    'p-value': poisson_result.pvalues,
})
display(coef_df)

In [None]:
# Demonstrate that ME varies across observations
# ME_k = beta_k * exp(X'beta) = beta_k * lambda

params = poisson_result.params
lambda_hat = np.exp(X @ params)  # Predicted rate for each person

# Focus on the 'policy' variable (index 1)
beta_policy = params[1]

# ME of policy for each observation
me_policy_all = beta_policy * lambda_hat

print(f'Coefficient of policy: \u03b2 = {beta_policy:.4f} (same for everyone)')
print(f'\nBut the marginal effect (\u03b2 \u00d7 \u03bb) varies:')
print(f'  Min ME:    {me_policy_all.min():.4f}')
print(f'  Mean ME:   {me_policy_all.mean():.4f}')
print(f'  Median ME: {np.median(me_policy_all):.4f}')
print(f'  Max ME:    {me_policy_all.max():.4f}')
print(f'  Std ME:    {me_policy_all.std():.4f}')
print(f'\n=> The effect of policy ranges from {me_policy_all.min():.2f} to '
      f'{me_policy_all.max():.2f} additional events!')

In [None]:
# Visualize: ME varies with predicted rate (lambda)
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: ME vs predicted lambda
sort_idx = np.argsort(lambda_hat)
axes[0].scatter(lambda_hat[sort_idx], me_policy_all[sort_idx], alpha=0.3, s=15)
axes[0].set_xlabel(r'Predicted Rate ($\hat{\lambda}_i$)', fontsize=12)
axes[0].set_ylabel('Marginal Effect of Policy', fontsize=12)
axes[0].set_title(r'ME = $\beta_{policy} \times \hat{\lambda}_i$' + '\n(Varies with predicted rate)',
                  fontsize=13, fontweight='bold')
axes[0].axhline(me_policy_all.mean(), color='red', linestyle='--', linewidth=2,
                label=f'AME = {me_policy_all.mean():.2f}')
axes[0].legend(fontsize=11)
axes[0].grid(alpha=0.3)

# Right: distribution of individual MEs
axes[1].hist(me_policy_all, bins=40, alpha=0.7, edgecolor='black', color='steelblue')
axes[1].axvline(me_policy_all.mean(), color='red', linestyle='--', linewidth=2,
                label=f'AME = {me_policy_all.mean():.2f}')
axes[1].axvline(beta_policy, color='green', linestyle=':', linewidth=2,
                label=f'\u03b2 = {beta_policy:.4f}')
axes[1].set_xlabel('Marginal Effect of Policy', fontsize=12)
axes[1].set_ylabel('Frequency', fontsize=12)
axes[1].set_title('Distribution of Individual MEs\n(\u03b2 is NOT the marginal effect!)',
                  fontsize=13, fontweight='bold')
axes[1].legend(fontsize=11)
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.savefig(FIGURES_DIR / 'me_vs_x_plot.png', dpi=300, bbox_inches='tight')
plt.show()

*Figure: Left panel shows the marginal effect of policy varies linearly with the predicted rate. Right panel shows the distribution of individual marginal effects — the coefficient $\beta$ (green line) is far from the average marginal effect (red line) because it ignores the scaling by $\lambda$.*

In [None]:
# Compare: beta, IRR, and ME at different covariate levels
quantiles = [0.10, 0.25, 0.50, 0.75, 0.90]
lambda_quantiles = np.quantile(lambda_hat, quantiles)

comparison_rows = []
for q, lam_q in zip(quantiles, lambda_quantiles):
    comparison_rows.append({
        'Quantile': f'{q:.0%}',
        '\u03bb (predicted rate)': f'{lam_q:.2f}',
        '\u03b2 (coefficient)': f'{beta_policy:.4f}',
        'IRR = exp(\u03b2)': f'{np.exp(beta_policy):.4f}',
        'ME = \u03b2 \u00d7 \u03bb': f'{beta_policy * lam_q:.4f}',
    })

table_01 = pd.DataFrame(comparison_rows)
print('Table 1: \u03b2 vs IRR vs ME at Different \u03bb Levels')
print('=' * 80)
display(table_01)
print('\n\u2022 \u03b2: Same for everyone (log-linear coefficient)')
print('\u2022 IRR: Same for everyone (multiplicative effect)')
print('\u2022 ME: VARIES \u2014 higher \u03bb means larger absolute effect')

table_01.to_csv(TABLES_DIR / 'table_01_beta_vs_me.csv', index=False)

---

## Section 2: Incidence Rate Ratios (IRR) Revisited (10 min)

### When IRRs are Useful

Recall that the IRR $= \exp(\beta_k)$ gives a **multiplicative** interpretation:

- IRR = 1.20 $\Rightarrow$ 20% increase in expected count
- **Constant across all levels of $X$** (relative effect)
- Good for relative comparisons

### Limitations of IRRs

- IRRs do not convey the **absolute magnitude** of the effect
- Example with IRR = 1.20:
  - Baseline $\lambda = 2$ $\Rightarrow$ $1.20 \times 2 = 2.4$ (increase of **0.4 events**)
  - Baseline $\lambda = 10$ $\Rightarrow$ $1.20 \times 10 = 12$ (increase of **2 events**)
- **Policy decisions** often require absolute numbers, not percentages
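The arithmetic in the example above can be reproduced in a few lines. This sketch uses only the illustrative IRR and baseline rates from the bullet list, not estimates from the data.

```python
# Same relative effect (IRR), different absolute changes
irr = 1.20
changes = {}
for baseline in (2.0, 10.0):
    after = irr * baseline
    changes[baseline] = after - baseline
    print(f"baseline = {baseline:5.1f} -> after = {after:5.1f}, "
          f"absolute change = {changes[baseline]:.1f} events")
```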

In [None]:
# Compute IRRs for all variables
irr_values = np.exp(params)
irr_ci_low = np.exp(poisson_result.conf_int_lower)
irr_ci_high = np.exp(poisson_result.conf_int_upper)

irr_table = pd.DataFrame({
    'Variable': var_names,
    '\u03b2': params,
    'IRR': irr_values,
    '% Change': (irr_values - 1) * 100,
    'IRR CI Low': irr_ci_low,
    'IRR CI High': irr_ci_high,
})

print('Incidence Rate Ratios')
print('=' * 80)
display(irr_table)

In [None]:
# Show the limitation: same IRR, very different absolute impacts
irr_policy = np.exp(beta_policy)

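# Absolute change for each person when moving from policy=0 to policy=1:
# lambda_1 - lambda_0 = (IRR - 1) * lambda_0, where lambda_0 removes the policy
# term from the fitted rate (lambda_hat already includes it for treated units)
lambda_base = lambda_hat / irr_policy ** X[:, 1]
abs_change = (irr_policy - 1) * lambda_base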

limitation_rows = []
for label, mask in [('Control group (policy=0)', df['policy'] == 0),
                    ('Treated group (policy=1)', df['policy'] == 1),
                    ('Low income (Q1)', df['income'] <= df['income'].quantile(0.25)),
                    ('High income (Q4)', df['income'] >= df['income'].quantile(0.75))]:
    limitation_rows.append({
        'Subgroup': label,
        'IRR': f'{irr_policy:.4f}',
        'Mean \u03bb': f'{lambda_hat[mask].mean():.2f}',
        'Mean Absolute Change': f'{abs_change[mask].mean():.2f}',
    })

table_02 = pd.DataFrame(limitation_rows)
print('Table 2: Same IRR, Different Absolute Impacts')
print('=' * 70)
display(table_02)
print(f'\nIRR of policy = {irr_policy:.4f} (constant for everyone)')
print('But the absolute change varies across subgroups!')
print('=> We need MARGINAL EFFECTS for absolute interpretation.')

table_02.to_csv(TABLES_DIR / 'table_02_irr_limitations.csv', index=False)

---

## Section 3: Average Marginal Effects (AME) (15 min)

### Definition

**For a continuous variable $X_k$:**

$$\text{AME}_k = \frac{1}{N} \sum_{i=1}^{N} \frac{\partial E[y_i \mid X_i]}{\partial X_k} = \frac{1}{N} \sum_{i=1}^{N} \beta_k \times \exp(X_i'\beta) = \beta_k \times \bar{\hat{\lambda}}$$

**For a binary variable $d$ (discrete change):**

$$\text{AME}_d = \frac{1}{N} \sum_{i=1}^{N} \left[ E[y_i \mid X_i, d=1] - E[y_i \mid X_i, d=0] \right] = \frac{1}{N} \sum_{i=1}^{N} \left[ \exp(X_i'\beta + \beta_d) - \exp(X_i'\beta) \right]$$

### Interpretation

- **Units**: Same as $y$ (count units)
- **AME = 2.3** means: averaged over all observations, a one-unit increase in the variable (or a switch from 0 to 1 for a binary variable) raises the expected count by about 2.3 events
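The two AME definitions can be illustrated without fitting a model. The sketch below simulates covariates and plugs in assumed coefficient values (stand-ins, not estimates from `policy_impact.csv`) to compute the derivative-based AME for a continuous variable and the discrete-change AME for a binary one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
x = rng.normal(size=n)                # continuous covariate
d = rng.integers(0, 2, size=n)        # binary covariate
X = np.column_stack([np.ones(n), x, d])
beta = np.array([0.5, 0.3, 0.2])      # assumed values, not estimates

lam = np.exp(X @ beta)                # predicted rate for each observation

# Continuous variable: average of beta_k * lambda_i
ame_x = np.mean(beta[1] * lam)

# Binary variable: average discrete change, switching d from 0 to 1 for everyone
X0 = X.copy()
X1 = X.copy()
X0[:, 2] = 0
X1[:, 2] = 1
ame_d_discrete = np.mean(np.exp(X1 @ beta) - np.exp(X0 @ beta))

# Derivative-based shortcut for the same binary variable (an approximation)
ame_d_derivative = np.mean(beta[2] * lam)

print(f"AME(x), derivative:      {ame_x:.4f}")
print(f"AME(d), discrete change: {ame_d_discrete:.4f}")
print(f"AME(d), derivative:      {ame_d_derivative:.4f}")
```

The discrete change equals $(e^{\beta_d} - 1)$ times the average rate with $d = 0$, so it generally differs from the derivative-based number, although the two are often close when $\beta_d$ is small.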

### Standard Errors via Delta Method

The delta method approximates the variance of a nonlinear transformation of an estimator:

$$\text{Var}[g(\hat{\beta})] \approx \nabla g(\hat{\beta})' \times \text{Var}[\hat{\beta}] \times \nabla g(\hat{\beta})$$

This is computed automatically by PanelBox.
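To make the formula concrete, here is a minimal NumPy-only sketch of the delta method for a Poisson AME. It is not PanelBox's implementation: the coefficients are assumed values standing in for estimates, and the covariance matrix is the model-based inverse Fisher information $(X'\,\text{diag}(\lambda)\,X)^{-1}$ rather than a robust estimate. The analytic gradient is checked against finite differences.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([0.4, 0.3, -0.2])   # stand-in for fitted coefficients (assumption)
k = 1                               # variable whose AME we want


def ame(b, k):
    """AME_k = mean_i( b_k * exp(x_i'b) )."""
    return np.mean(b[k] * np.exp(X @ b))


lam = np.exp(X @ beta)

# Model-based Poisson covariance: inverse Fisher information
cov = np.linalg.inv(X.T @ (lam[:, None] * X))

# Analytic gradient of AME_k with respect to beta:
#   d AME_k / d beta_j = beta_k * mean(lam_i * x_ij) + 1[j == k] * mean(lam_i)
grad = beta[k] * (lam[:, None] * X).mean(axis=0)
grad[k] += lam.mean()

# Finite-difference gradient as a sanity check
h = 1e-6
grad_num = np.array([
    (ame(beta + h * e, k) - ame(beta - h * e, k)) / (2 * h)
    for e in np.eye(len(beta))
])

se_ame = np.sqrt(grad @ cov @ grad)
print("analytic gradient:", grad)
print("numeric gradient: ", grad_num)
print(f"AME = {ame(beta, k):.4f}, delta-method SE = {se_ame:.4f}")
```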

In [None]:
# Step 1: Compute AME manually to understand the mechanics
print('Manual AME Computation')
print('=' * 60)

# For each variable, AME_k = beta_k * mean(lambda_hat)
mean_lambda = lambda_hat.mean()
print(f'Mean predicted rate: {mean_lambda:.4f}\n')

manual_ame = {}
for i, var in enumerate(var_names[1:], start=1):  # Skip intercept
    ame_k = params[i] * mean_lambda
    manual_ame[var] = ame_k
    print(f'  AME({var:>12s}) = {params[i]:.4f} \u00d7 {mean_lambda:.4f} = {ame_k:.4f}')

print(f'\nInterpretation: Policy increases the expected count by '
      f'{manual_ame["policy"]:.2f} events on average.')

In [None]:
# Step 2: Compute AME using PanelBox (with delta method SEs)
poisson_model.exog_names = var_names

ame_result = compute_poisson_ame(poisson_result, varlist=var_names[1:])

# Display summary
ame_summary = ame_result.summary()
display(ame_summary)

In [None]:
# Build a comprehensive AME results table
ci = ame_result.conf_int()


def add_stars(p):
    """Convert p-value to significance stars."""
    if p < 0.001: return '***'
    elif p < 0.01: return '**'
    elif p < 0.05: return '*'
    elif p < 0.1: return '.'
    else: return ''


table_03 = pd.DataFrame({
    'Variable': ame_result.marginal_effects.index,
    'AME': ame_result.marginal_effects.values,
    'Std. Error': ame_result.std_errors.values,
    'z': ame_result.z_stats.values,
    'P>|z|': ame_result.pvalues.values,
    'CI Lower': ci['lower'].values,
    'CI Upper': ci['upper'].values,
})
table_03['Sig'] = table_03['P>|z|'].apply(add_stars)

print('Table 3: Average Marginal Effects (AME)')
print('=' * 80)
display(table_03)
print('\nSignificance: *** p<0.001, ** p<0.01, * p<0.05, . p<0.1')

table_03.to_csv(TABLES_DIR / 'table_03_ame_results.csv', index=False)

In [None]:
# Validate delta method SEs: manual computation vs PanelBox
# The delta method approximation: SE(AME_k) = sqrt(g'Σg)
# where g = gradient of AME w.r.t. β, Σ = Var(β̂)

# Manual delta method for 'policy' (var_idx=1)
cov_matrix = poisson_result.cov_params if hasattr(poisson_result, 'cov_params') else poisson_result.vcov

# Gradient: ∂AME_k/∂β_j
n_obs = X.shape[0]
beta_policy_val = params[1]
gradient_manual = np.zeros(len(params))

for j in range(len(params)):
    if j == 1:  # policy (var_idx)
        gradient_manual[j] = np.mean(lambda_hat * (1 + beta_policy_val * X[:, 1]))
    else:
        gradient_manual[j] = np.mean(beta_policy_val * X[:, j] * lambda_hat)

# Delta method SE
se_manual = np.sqrt(gradient_manual @ cov_matrix @ gradient_manual)
se_panelbox = ame_result.std_errors['policy']

print('Delta Method SE Validation')
print('=' * 50)
print(f'Manual SE(AME_policy):   {se_manual:.6f}')
print(f'PanelBox SE(AME_policy): {se_panelbox:.6f}')
print(f'Match: {np.isclose(se_manual, se_panelbox, rtol=1e-6)}')
print(f'\nThe delta method formula:')
print(f'  SE = sqrt(g\' Σ_β g)')
print(f'  where g = ∇AME (gradient vector, length {len(gradient_manual)})')
print(f'  and Σ_β = Var(β̂) ({cov_matrix.shape[0]}×{cov_matrix.shape[1]} matrix)')

In [None]:
# Create forest plot of AMEs with confidence intervals
fig, ax = plt.subplots(figsize=(10, 6))

plot_data = table_03.copy()
plot_data = plot_data.sort_values('AME', key=abs)

y_pos = np.arange(len(plot_data))

ax.errorbar(
    plot_data['AME'], y_pos,
    xerr=[plot_data['AME'] - plot_data['CI Lower'],
          plot_data['CI Upper'] - plot_data['AME']],
    fmt='o', markersize=8, capsize=5, capthick=2, linewidth=2,
    color='steelblue'
)

ax.axvline(0, color='red', linestyle='--', linewidth=2, label='No Effect (AME=0)')
ax.set_yticks(y_pos)
ax.set_yticklabels(plot_data['Variable'])
ax.set_xlabel('Average Marginal Effect (count units)', fontsize=12)
ax.set_title('Average Marginal Effects with 95% CIs\n(Poisson Model)', fontsize=14,
             fontweight='bold')
ax.legend(fontsize=11)
ax.grid(alpha=0.3, axis='x')

plt.tight_layout()
plt.savefig(FIGURES_DIR / 'ame_plot.png', dpi=300, bbox_inches='tight')
plt.show()

*Figure: Forest plot showing AMEs for all covariates. The policy variable shows the largest positive effect, followed by urban and education. Female has a negative effect. Error bars represent 95% confidence intervals; CIs that do not cross zero are statistically significant.*

In [None]:
# Discrete change for the binary policy variable
# AME_d = (1/N) * sum[ exp(X'b + b_d) - exp(X'b) ]

# Set policy=0 for everyone, then policy=1 for everyone
X_d0 = X.copy()
X_d1 = X.copy()
policy_idx = 1  # Index of policy in X
X_d0[:, policy_idx] = 0
X_d1[:, policy_idx] = 1

lambda_d0 = np.exp(X_d0 @ params)
lambda_d1 = np.exp(X_d1 @ params)

discrete_change = lambda_d1 - lambda_d0
ame_discrete = discrete_change.mean()

print('Discrete Change for Binary Policy Variable')
print('=' * 60)
print(f'Mean E[y | policy=0]: {lambda_d0.mean():.4f}')
print(f'Mean E[y | policy=1]: {lambda_d1.mean():.4f}')
print(f'\nDiscrete change AME: {ame_discrete:.4f}')
print(f'\nInterpretation: Switching policy from 0 to 1 increases the')
print(f'expected count by {ame_discrete:.2f} events on average.')
print(f'\nNote: This differs slightly from the continuous ME ({manual_ame["policy"]:.4f})')
print(f'because discrete change uses the exact difference, not the derivative.')

---

## Section 4: Marginal Effects at Means (MEM) and Representative Values (15 min)

### MEM Definition

$$\text{MEM}_k = \beta_k \times \exp(\bar{X}'\beta)$$

Evaluate the marginal effect at the **sample means** $\bar{X}$ of all covariates.

### Pros and Cons

| | AME | MEM |
|---|---|---|
| **Definition** | Average of individual MEs | ME at the average person |
| **Pros** | Represents actual population | Simple, one number |
| **Cons** | More complex computation | “Average person” may not exist |
| **When to use** | Typical reporting choice | Quick summary |

### MER (Marginal Effects at Representative values)

Choose **meaningful values** (e.g., medians, quartiles, policy-relevant profiles) rather than means. Often more interpretable.
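The three summaries can be compared side by side on simulated data. In this sketch the coefficients and the representative profile are hypothetical, chosen only for illustration. Because $\exp(\cdot)$ is convex, the sample mean of $\exp(X_i'\beta)$ is never below $\exp(\bar{X}'\beta)$, so with a positive coefficient the derivative-based AME is at least as large as the MEM.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([0.5, 0.6, 0.3])    # assumed values, not estimates
k = 1

# AME: average the individual marginal effects
ame = beta[k] * np.mean(np.exp(X @ beta))

# MEM: evaluate the marginal effect once, at the covariate means
mem = beta[k] * np.exp(X.mean(axis=0) @ beta)

# MER: evaluate at a chosen profile (hypothetical: x1 = 1, x2 = 0)
x_rep = np.array([1.0, 1.0, 0.0])
mer = beta[k] * np.exp(x_rep @ beta)

print(f"AME = {ame:.4f}")
print(f"MEM = {mem:.4f}")
print(f"MER = {mer:.4f}")
print(f"AME / MEM = {ame / mem:.4f}")
```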

In [None]:
# Compute MEM using PanelBox
mem_result = compute_poisson_mem(poisson_result, varlist=var_names[1:])

mem_summary = mem_result.summary()
display(mem_summary)

In [None]:
# Build MEM results table
ci_mem = mem_result.conf_int()

table_04 = pd.DataFrame({
    'Variable': mem_result.marginal_effects.index,
    'MEM': mem_result.marginal_effects.values,
    'Std. Error': mem_result.std_errors.values,
    'z': mem_result.z_stats.values,
    'P>|z|': mem_result.pvalues.values,
    'CI Lower': ci_mem['lower'].values,
    'CI Upper': ci_mem['upper'].values,
})
table_04['Sig'] = table_04['P>|z|'].apply(add_stars)

print('Table 4: Marginal Effects at Means (MEM)')
print('=' * 80)
display(table_04)

table_04.to_csv(TABLES_DIR / 'table_04_mem_results.csv', index=False)

In [None]:
# Compare AME vs MEM
table_05 = pd.DataFrame({
    'Variable': ame_result.marginal_effects.index,
    'AME': ame_result.marginal_effects.values,
    'SE(AME)': ame_result.std_errors.values,
    'MEM': mem_result.marginal_effects.values,
    'SE(MEM)': mem_result.std_errors.values,
    'Ratio (AME/MEM)': ame_result.marginal_effects.values / mem_result.marginal_effects.values,
})

print('Table 5: AME vs MEM Comparison')
print('=' * 80)
display(table_05)
print('\nRatio close to 1.0 means AME and MEM are similar.')
print('For derivative-based effects, |AME| >= |MEM| (Jensen\'s inequality); the gap widens as the spread of X\'\u03b2 grows.')

table_05.to_csv(TABLES_DIR / 'table_05_ame_vs_mem.csv', index=False)

In [None]:
# MER: Marginal Effects at Representative values (specific profiles)
profiles = {
    'Young, low income, no policy': {
        'policy': 0, 'age': 30, 'education': 12, 'income': 25, 'female': 0, 'urban': 0
    },
    'Young, low income, with policy': {
        'policy': 1, 'age': 30, 'education': 12, 'income': 25, 'female': 0, 'urban': 0
    },
    'Middle-aged, median, no policy': {
        'policy': 0, 'age': 45, 'education': 14, 'income': 50, 'female': 0, 'urban': 1
    },
    'Middle-aged, median, with policy': {
        'policy': 1, 'age': 45, 'education': 14, 'income': 50, 'female': 0, 'urban': 1
    },
    'Older, high income, no policy': {
        'policy': 0, 'age': 60, 'education': 18, 'income': 100, 'female': 0, 'urban': 1
    },
    'Older, high income, with policy': {
        'policy': 1, 'age': 60, 'education': 18, 'income': 100, 'female': 0, 'urban': 1
    },
}

mer_rows = []
for label, values in profiles.items():
    x_profile = np.array([1] + [values[v] for v in var_names[1:]])
    lambda_profile = np.exp(x_profile @ params)
    me_policy_profile = beta_policy * lambda_profile
    mer_rows.append({
        'Profile': label,
        'E[y]': f'{lambda_profile:.2f}',
        'ME(policy)': f'{me_policy_profile:.4f}',
    })

table_06 = pd.DataFrame(mer_rows)

print('Table 6: Marginal Effects at Representative Values (MER)')
print('=' * 70)
display(table_06)
print('\nThe ME of policy is larger for individuals with higher baseline E[y].')
print('This demonstrates the heterogeneity of marginal effects in nonlinear models.')

table_06.to_csv(TABLES_DIR / 'table_06_mer_profiles.csv', index=False)

---

## Section 5: Visualizing Marginal Effect Heterogeneity (10 min)

A key advantage of computing marginal effects in nonlinear models is that we can examine **how effects vary** across covariate levels. This is critical for understanding who benefits most from a policy intervention.

In [None]:
# ME of policy as a function of income
# Hold a reference profile fixed and vary income
income_grid = np.linspace(df['income'].min(), df['income'].max(), 50)

# Reference profile: age and education at their sample means;
# binary variables fixed at policy=1, female=0, urban=1
x_means = {
    'policy': 1,
    'age': df['age'].mean(),
    'education': df['education'].mean(),
    'female': 0,
    'urban': 1,
}

me_by_income = []
for inc in income_grid:
    x_temp = np.array([1, x_means['policy'], x_means['age'], x_means['education'],
                       inc, x_means['female'], x_means['urban']])
    lam = np.exp(x_temp @ params)
    me = beta_policy * lam
    me_by_income.append(me)

me_by_income = np.array(me_by_income)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(income_grid, me_by_income, 'b-', linewidth=2.5)
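# NOTE: illustrative +/-15% band only; it is NOT a delta-method confidence interval
ax.fill_between(income_grid, me_by_income * 0.85, me_by_income * 1.15,
                alpha=0.15, color='blue', label='Illustrative \u00b115% band (not a CI)')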
ax.axhline(ame_result.marginal_effects['policy'], color='red', linestyle='--',
           linewidth=2, label=f'AME = {ame_result.marginal_effects["policy"]:.2f}')
ax.set_xlabel('Income ($1,000s)', fontsize=12)
ax.set_ylabel('Marginal Effect of Policy', fontsize=12)
ax.set_title('How the Policy Effect Varies with Income\n(Higher income \u2192 larger absolute effect)',
             fontsize=14, fontweight='bold')
ax.legend(fontsize=11)
ax.grid(alpha=0.3)

plt.tight_layout()
plt.savefig(FIGURES_DIR / 'me_by_income.png', dpi=300, bbox_inches='tight')
plt.show()

*Figure: The marginal effect of policy increases with income level. The AME (red dashed line) represents the average across the income distribution, but the actual effect is smaller for low-income individuals and larger for high-income individuals.*

In [None]:
# ME of policy by age quartiles
df['age_quartile'] = pd.qcut(df['age'], 4, labels=['Q1 (young)', 'Q2', 'Q3', 'Q4 (old)'])

me_by_age_group = []
for q in ['Q1 (young)', 'Q2', 'Q3', 'Q4 (old)']:
    mask = df['age_quartile'] == q
    mean_me = me_policy_all[mask].mean()
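    # Descriptive SE of the group mean: reflects the spread of individual MEs,
    # not sampling uncertainty in the estimated coefficients
    se_me = me_policy_all[mask].std() / np.sqrt(mask.sum())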
    me_by_age_group.append({
        'Age Group': q,
        'Mean ME': mean_me,
        'SE': se_me,
        'CI Lower': mean_me - 1.96 * se_me,
        'CI Upper': mean_me + 1.96 * se_me,
        'N': mask.sum(),
    })

me_age_df = pd.DataFrame(me_by_age_group)

fig, ax = plt.subplots(figsize=(10, 6))
x_pos = np.arange(len(me_age_df))
bars = ax.bar(x_pos, me_age_df['Mean ME'], yerr=1.96 * me_age_df['SE'],
              capsize=5, alpha=0.7, color=['#4393C3', '#92C5DE', '#F4A582', '#D6604D'],
              edgecolor='black')

ax.axhline(me_policy_all.mean(), color='red', linestyle='--', linewidth=2,
           label=f'Overall AME = {me_policy_all.mean():.2f}')
ax.set_xticks(x_pos)
ax.set_xticklabels(me_age_df['Age Group'])
ax.set_xlabel('Age Group', fontsize=12)
ax.set_ylabel('Marginal Effect of Policy', fontsize=12)
ax.set_title('Policy Effect by Age Group\n(Older individuals experience larger absolute effects)',
             fontsize=14, fontweight='bold')
ax.legend(fontsize=11)
ax.grid(alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig(FIGURES_DIR / 'me_by_age_groups.png', dpi=300, bbox_inches='tight')
plt.show()

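*Figure: Bar chart showing the average ME of policy within each age quartile. The effect increases monotonically with age, indicating that the policy has a larger absolute impact on older individuals. Error bars show ±1.96 standard errors of the within-group mean of individual MEs; they reflect covariate heterogeneity, not sampling uncertainty in the estimated coefficients.*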

In [None]:
# Heterogeneity summary table
df['income_quartile'] = pd.qcut(df['income'], 4, labels=['Q1', 'Q2', 'Q3', 'Q4'])
df['education_group'] = pd.cut(df['education'], bins=[7, 12, 16, 21],
                               labels=['Low (8-12)', 'Medium (13-16)', 'High (17-20)'])

het_rows = []
for group_var, group_name in [('age_quartile', 'Age'),
                               ('income_quartile', 'Income'),
                               ('education_group', 'Education'),
                               ('female', 'Female'),
                               ('urban', 'Urban')]:
    for val in sorted(df[group_var].dropna().unique(), key=str):
        mask = df[group_var] == val
        if mask.sum() < 10:
            continue
        het_rows.append({
            'Group': group_name,
            'Value': str(val),
            'N': mask.sum(),
            'Mean ME': me_policy_all[mask].mean(),
            'Std ME': me_policy_all[mask].std(),
        })

table_07 = pd.DataFrame(het_rows)

print('Table 7: Marginal Effect Heterogeneity')
print('=' * 70)
display(table_07)

table_07.to_csv(TABLES_DIR / 'table_07_me_heterogeneity.csv', index=False)

---

## Section 6: Marginal Effects for All Count Models (8 min)

### Model-Specific Considerations

| Model | Mean Function | ME Formula | Notes |
|-------|--------------|------------|-------|
| **Poisson** | $\mu = \exp(X'\beta)$ | $\text{ME}_k = \beta_k \times \mu$ | Baseline |
| **Negative Binomial** | $\mu = \exp(X'\beta)$ | $\text{ME}_k = \beta_k \times \mu$ | Same formula! |
| **FE Poisson** | $\mu = \alpha_i \exp(X'\beta)$ | $\text{ME}_k = \beta_k \times \mu$ | Within-entity effect |
| **ZIP** | $E[y] = (1-\pi) \times \lambda$ | Combines inflate and count parts | Two components |
| **ZINB** | $E[y] = (1-\pi) \times \lambda$ | Combines inflate and count parts | Two components |

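For zero-inflated models, the marginal effect on $E[y]$ decomposes into:

$$\frac{\partial E[y]}{\partial X_k} = -\frac{\partial \pi}{\partial Z_k} \times \lambda + (1-\pi) \times \frac{\partial \lambda}{\partial X_k}$$

Here $Z_k$ denotes the same variable as it enters the inflation equation. The first term is zero when the variable does not appear in the inflation equation, and the second is zero when it does not appear in the count equation.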
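With a logit inflation equation $\pi = 1/(1 + e^{-Z'\gamma})$ and a log-linear count equation, the decomposition above becomes $-\gamma_k \pi (1-\pi) \lambda + (1-\pi) \beta_k \lambda$. The sketch below checks this against a finite difference. For simplicity it assumes the same covariate vector enters both parts, and all parameter values are hypothetical.

```python
import numpy as np


def zip_mean(x, beta, gamma):
    """E[y] = (1 - pi) * lambda, with logit pi and log-linear lambda."""
    lam = np.exp(x @ beta)
    pi = 1.0 / (1.0 + np.exp(-(x @ gamma)))
    return (1.0 - pi) * lam


def zip_me(x, beta, gamma, k):
    """Analytic marginal effect of covariate k on E[y]."""
    lam = np.exp(x @ beta)
    pi = 1.0 / (1.0 + np.exp(-(x @ gamma)))
    inflate_part = -gamma[k] * pi * (1.0 - pi) * lam   # -(d pi / d x_k) * lambda
    count_part = (1.0 - pi) * beta[k] * lam            # (1 - pi) * d lambda / d x_k
    return inflate_part + count_part


# Hypothetical parameters: [intercept, slope]
beta = np.array([0.8, 0.3])     # count equation
gamma = np.array([-1.0, 0.5])   # inflation equation
x = np.array([1.0, 2.0])
k = 1

h = 1e-6
step = np.array([0.0, h])
numeric = (zip_mean(x + step, beta, gamma) - zip_mean(x - step, beta, gamma)) / (2 * h)

print(f"analytic ME: {zip_me(x, beta, gamma, k):.6f}")
print(f"numeric ME:  {numeric:.6f}")
```

Setting the inflation slope to zero recovers the count-part-only expression $(1-\pi)\,\beta_k\,\lambda$, which is the appropriate formula for variables excluded from the inflation equation.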

In [None]:
# Fit Negative Binomial model
# Use Poisson estimates as starting values for NB optimization
print('Fitting Negative Binomial Model...')
nb_start = np.append(poisson_result.params, np.log(0.1))  # beta + log(alpha)
nb_model = NegativeBinomial(endog=y, exog=X)
nb_result = nb_model.fit(start_params=nb_start)
nb_model.exog_names = var_names

# Compute AME for NB
nb_ame = compute_negbin_ame(nb_result, varlist=var_names[1:])

print('\nNegative Binomial AME:')
nb_ame_summary = nb_ame.summary()
display(nb_ame_summary)

In [None]:
# Fit Zero-Inflated Poisson (ZIP) model
print('Fitting Zero-Inflated Poisson Model...')

X_count = X.copy()
X_inflate = sm.add_constant(df[['income', 'female', 'urban']].values)

zip_model = ZeroInflatedPoisson(
    endog=y,
    exog_count=X_count,
    exog_inflate=X_inflate,
    exog_count_names=var_names,
    exog_inflate_names=['const_z', 'income_z', 'female_z', 'urban_z'],
)
zip_result = zip_model.fit()

print('\nZIP Model Summary:')
print(zip_result.summary(
    count_names=var_names,
    inflate_names=['const_z', 'income_z', 'female_z', 'urban_z'],
))

In [None]:
# Compute marginal effects for ZIP model manually
# For ZIP: E[y] = (1-pi) * lambda
# ME on E[y] combines both parts

params_count = zip_result.params_count
params_inflate = zip_result.params_inflate

# Predicted values from each component
lambda_zip = np.exp(X_count @ params_count)
z_inflate = X_inflate @ params_inflate
pi_zip = 1 / (1 + np.exp(-z_inflate))  # P(structural zero)

# ME of policy on E[y] via count part only (policy not in inflate model)
# dE[y]/d(policy) = (1-pi) * beta_policy * lambda
beta_policy_zip = params_count[1]
me_policy_zip = (1 - pi_zip) * beta_policy_zip * lambda_zip

print('ZIP Model: Marginal Effect of Policy')
print('=' * 50)
print(f'Count part \u03b2(policy): {beta_policy_zip:.4f}')
print(f'Mean P(structural zero): {pi_zip.mean():.4f}')
print(f'Mean \u03bb (count rate): {lambda_zip.mean():.4f}')
print(f'\nME on E[y]: {me_policy_zip.mean():.4f}')
print('  = mean over i of (1 - \u03c0_i) \u00d7 \u03b2 \u00d7 \u03bb_i, not the product of the means printed above')

In [None]:
# Fit Zero-Inflated Negative Binomial (ZINB) model
print('Fitting Zero-Inflated Negative Binomial Model...')

zinb_model = ZeroInflatedNegativeBinomial(
    endog=y,
    exog_count=X_count,
    exog_inflate=X_inflate,
    exog_count_names=var_names,
    exog_inflate_names=['const_z', 'income_z', 'female_z', 'urban_z'],
)
zinb_result = zinb_model.fit()

# ZINB ME of policy
params_count_zinb = zinb_result.params_count
params_inflate_zinb = zinb_result.params_inflate

lambda_zinb = np.exp(X_count @ params_count_zinb)
z_inflate_zinb = X_inflate @ params_inflate_zinb
pi_zinb = 1 / (1 + np.exp(-z_inflate_zinb))

beta_policy_zinb = params_count_zinb[1]
me_policy_zinb = (1 - pi_zinb) * beta_policy_zinb * lambda_zinb

print(f'\nZINB ME of policy on E[y]: {me_policy_zinb.mean():.4f}')

### Fixed Effects Poisson

For **panel data**, the FE Poisson model (Hausman, Hall, Griliches 1984) controls for unobserved entity heterogeneity using conditional MLE. The marginal effect interpretation focuses on **within-entity** changes:

$$\text{ME}_k = \beta_k \times \exp(X_{it}'\beta)$$

Note: The fixed effect $\alpha_i$ is eliminated via conditioning and is never estimated, so the MEs here are evaluated at $\exp(X_{it}'\beta)$ without it, which amounts to setting $\alpha_i = 1$. We demonstrate using a small simulated panel.

In [None]:
# Fixed Effects Poisson using a small simulated panel
# (FE Poisson conditional MLE requires small counts for tractability)
print('Fitting Fixed Effects Poisson Model...')

np.random.seed(42)
n_entities, n_periods = 50, 5
entity_ids = np.repeat(np.arange(1, n_entities + 1), n_periods)
time_ids = np.tile(np.arange(1, n_periods + 1), n_entities)

# Simulate panel data with entity heterogeneity
alpha_i = np.repeat(np.random.normal(0, 0.3, n_entities), n_periods)
x1 = np.random.normal(0, 1, n_entities * n_periods)
x2 = np.random.normal(0, 1, n_entities * n_periods)

# Small counts (expected mean about 1.8 = exp(0.5 + 0.22/2)) for computational feasibility
lam_fe = np.exp(0.5 + 0.3 * x1 + 0.2 * x2 + alpha_i)
y_fe = np.random.poisson(lam_fe)

fe_var_names = ['const', 'x1', 'x2']
X_fe = sm.add_constant(np.column_stack([x1, x2]))

fe_model = PoissonFixedEffects(
    endog=y_fe, exog=X_fe,
    entity_id=entity_ids, time_id=time_ids,
)
fe_result = fe_model.fit()
fe_model.exog_names = fe_var_names

# Compute AME for FE Poisson
fe_ame = compute_poisson_ame(fe_result, varlist=['x1', 'x2'])

print('\nFE Poisson AME (within-entity effects):')
fe_ame_summary = fe_ame.summary()
display(fe_ame_summary)

print(f'\nTrue DGP: β_x1 = 0.30, β_x2 = 0.20')
print(f'Estimated: β_x1 = {fe_result.params[1]:.4f}, β_x2 = {fe_result.params[2]:.4f}')
print('\nInterpretation: AMEs represent the average within-entity effect,')
print('controlling for unobserved time-invariant heterogeneity (α_i).')

In [None]:
# Comparison table: AME across all models
model_ame_rows = []

# Poisson AME
model_ame_rows.append({
    'Model': 'Poisson',
    'AME(policy)': ame_result.marginal_effects['policy'],
    'SE': ame_result.std_errors['policy'],
    'AME(age)': ame_result.marginal_effects['age'],
    'AME(education)': ame_result.marginal_effects['education'],
    'AME(income)': ame_result.marginal_effects['income'],
})

# NB AME
model_ame_rows.append({
    'Model': 'Negative Binomial',
    'AME(policy)': nb_ame.marginal_effects['policy'],
    'SE': nb_ame.std_errors['policy'],
    'AME(age)': nb_ame.marginal_effects['age'],
    'AME(education)': nb_ame.marginal_effects['education'],
    'AME(income)': nb_ame.marginal_effects['income'],
})

# ZIP AME (manual computation)
model_ame_rows.append({
    'Model': 'ZIP',
    'AME(policy)': me_policy_zip.mean(),
    'SE': np.nan,
    'AME(age)': ((1 - pi_zip) * params_count[2] * lambda_zip).mean(),
    'AME(education)': ((1 - pi_zip) * params_count[3] * lambda_zip).mean(),
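    # income also enters the inflation equation (logit), so include the
    # -gamma_income * pi * (1 - pi) * lambda term: (1 - pi) * lambda * (beta - gamma * pi)
    'AME(income)': ((1 - pi_zip) * lambda_zip
                    * (params_count[4] - params_inflate[1] * pi_zip)).mean(),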
})

# ZINB AME
model_ame_rows.append({
    'Model': 'ZINB',
    'AME(policy)': me_policy_zinb.mean(),
    'SE': np.nan,
    'AME(age)': ((1 - pi_zinb) * params_count_zinb[2] * lambda_zinb).mean(),
    'AME(education)': ((1 - pi_zinb) * params_count_zinb[3] * lambda_zinb).mean(),
    'AME(income)': ((1 - pi_zinb) * params_count_zinb[4] * lambda_zinb).mean(),
})

table_08 = pd.DataFrame(model_ame_rows)

print('Table 8: AME Comparison Across Count Models')
print('=' * 80)
display(table_08)
print('\nNote: ZIP and ZINB standard errors require a numerical delta method (not shown).')
print('Poisson and NB share the same conditional mean function, so their AMEs are similar.')
print('ZIP/ZINB AMEs scale the count-process effect by the probability of not being a structural zero.')

# Also show FE Poisson AME separately (different data)
print('\n\nFE Poisson AME (simulated panel, within-entity effects):')
fe_ame_table = pd.DataFrame({
    'Variable': fe_ame.marginal_effects.index,
    'AME': fe_ame.marginal_effects.values,
    'SE': fe_ame.std_errors.values,
})
display(fe_ame_table)
print('Note: FE Poisson uses simulated panel data (different variables)')
print('to demonstrate within-entity marginal effects.')

table_08.to_csv(TABLES_DIR / 'table_08_ame_all_models.csv', index=False)
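
The note printed with Table 8 says that ZIP and ZINB standard errors need a numerical delta method. A minimal sketch of that idea follows, using only NumPy. The helper names (`numerical_delta_se`, `zip_ame`), the parameter stacking `[count coefficients, inflation coefficients]`, and all inputs are assumptions made for illustration. This is not PanelBox's API, and the covariance matrix is a placeholder rather than an estimate.

```python
import numpy as np

def numerical_delta_se(g, theta, cov, eps=1e-6):
    """SE of a scalar g(theta) via the delta method with a central-difference gradient."""
    theta = np.asarray(theta, dtype=float)
    grad = np.empty_like(theta)
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = eps * max(1.0, abs(theta[k]))
        grad[k] = (g(theta + step) - g(theta - step)) / (2.0 * step[k])
    return float(np.sqrt(grad @ cov @ grad))

def zip_ame(theta, X, Z, j):
    """AME of continuous regressor j on E[y] = (1 - pi) * lambda in a ZIP model.

    Assumes column j of X enters only the count equation.
    """
    k = X.shape[1]
    beta, gamma = theta[:k], theta[k:]
    lam = np.exp(X @ beta)                    # count-process mean
    pi = 1.0 / (1.0 + np.exp(-(Z @ gamma)))   # logit probability of a structural zero
    return np.mean((1.0 - pi) * beta[j] * lam)

# Illustrative inputs only: simulated design and made-up parameter values
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.ones((n, 1))                           # intercept-only inflation equation
theta_hat = np.array([0.5, 0.3, -1.0])        # [beta_0, beta_1, gamma_0]
cov_hat = np.diag([0.01, 0.004, 0.02])        # placeholder covariance matrix

ame = zip_ame(theta_hat, X, Z, j=1)
se = numerical_delta_se(lambda t: zip_ame(t, X, Z, j=1), theta_hat, cov_hat)
print(f'ZIP AME = {ame:.4f}, delta-method SE = {se:.4f}')
```

In a real analysis, `theta_hat` and `cov_hat` would come from the fitted ZIP or ZINB result, and the gradient would cover every estimated parameter, including the inflation equation and (for ZINB) the dispersion parameter.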

---

## Section 7: Practical Application and Reporting (5 min)

### How to Report Marginal Effects

**Bad:**
> $\beta_{\text{policy}} = 0.18$ ($p < 0.01$)

→ Not directly interpretable for a nonlinear model.

**Better:**
> IRR$_{\text{policy}} = 1.20$, indicating a 20% increase in the expected count.

→ Interpretable, but only a relative effect.

**Best:**
> Policy increases the expected count by 2.3 events on average (AME = 2.3, SE = 0.4, $p < 0.001$).

→ Absolute, substantive, and policy-relevant.
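
The arithmetic behind these three statements can be sketched as follows. The coefficient 0.18 is the one quoted in the "Bad" example; the baseline expected counts are hypothetical values chosen only to show that a constant IRR implies different absolute changes.

```python
import numpy as np

beta_policy = 0.18                        # coefficient from the example above
irr = np.exp(beta_policy)                 # incidence rate ratio
pct_change = (irr - 1.0) * 100.0          # relative effect in percent

print(f'IRR = {irr:.3f} ({pct_change:+.1f}% change in the expected count)')

# Hypothetical baseline expected counts (policy = 0) for three individuals
baseline = np.array([2.0, 8.0, 20.0])
absolute_change = baseline * (irr - 1.0)  # discrete change when policy goes 0 -> 1

for lam0, delta in zip(baseline, absolute_change):
    print(f'baseline {lam0:5.1f} events -> +{delta:.2f} events')

# Averaging the individual changes gives the sample-average discrete effect
print(f'Average absolute effect: {absolute_change.mean():.2f} events')
```

The relative effect is identical for everyone, while the absolute effect scales with the baseline. This is why an AME should be reported alongside the baseline mean.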

### Checklist for Reporting

1. Report $\beta$ and SE (for completeness)
2. Report IRR with interpretation
3. **Report AME** with units and interpretation
4. Provide context (baseline mean)
5. If heterogeneity is important, show ME profiles

In [None]:
# Create a publication-ready table combining all interpretations
pub_rows = []
for i, var in enumerate(var_names[1:], start=1):
    beta = params[i]
    se_beta = poisson_result.se[i]
    irr = np.exp(beta)
    pct_change = (irr - 1) * 100
    ame_val = ame_result.marginal_effects[var]
    se_ame = ame_result.std_errors[var]
    p_val = ame_result.pvalues[var]

    direction = '+' if ame_val > 0 else ''
    interp = f'{direction}{ame_val:.2f} events'

    pub_rows.append({
        'Variable': var,
        '\u03b2': f'{beta:.4f}',
        'SE(\u03b2)': f'{se_beta:.4f}',
        'IRR': f'{irr:.4f}',
        '% Change': f'{pct_change:+.1f}%',
        'AME': f'{ame_val:.4f}',
        'SE(AME)': f'{se_ame:.4f}',
        'p-value': f'{p_val:.4f}' if p_val >= 0.0001 else '<0.0001',
        'Interpretation': interp,
    })

table_09 = pd.DataFrame(pub_rows)

print('Table 9: Publication-Ready Results Table')
print('=' * 110)
display(table_09)
print(f'\nBaseline mean outcome: {y.mean():.2f} events')
print(f'N = {len(y):,} observations')
print('\nNote: AME units are "events" (same as outcome variable).')
print('SE(AME) computed via delta method.')

table_09.to_csv(TABLES_DIR / 'table_09_publication_ready.csv', index=False)

---

## Section 8: Summary

### Key Takeaways

1. **$\beta \neq$ Marginal Effect** in nonlinear models. The coefficient gives only the sign of the effect and the change in the log of the expected count per unit change in $x$.

2. **IRR = exp($\beta$)**: Gives the relative (multiplicative) effect. Constant across all individuals. Good for percentage change interpretation.

3. **AME**: The absolute effect averaged across all observations. Units are the same as $y$. This is typically what you should report.

4. **MEM**: Simpler (evaluated at means), but the “average person” may not exist.

5. **MER**: Most interpretable when you have specific policy-relevant profiles in mind.

6. **ME varies across $X$**: In nonlinear models, effects are heterogeneous. Visualize this heterogeneity!

7. **Delta method**: Gives analytic standard errors for marginal effects, so bootstrapping is unnecessary in most cases.
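
Takeaways 3, 4, and 7 can be illustrated with a small self-contained sketch for a Poisson model with a log link. It uses simulated data and a hand-written Newton-Raphson fit in NumPy rather than PanelBox, so the function names (`fit_poisson`, `ame_with_se`) are hypothetical, and the standard errors rely on the non-robust inverse-information covariance.

```python
import numpy as np

def fit_poisson(X, y, n_iter=50, tol=1e-10):
    """Poisson MLE (log link) by Newton-Raphson; returns (beta, covariance)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())            # start at the intercept-only fit
    for _ in range(n_iter):
        lam = np.exp(X @ beta)
        score = X.T @ (y - lam)
        info = X.T @ (X * lam[:, None])   # Fisher information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    lam = np.exp(X @ beta)
    cov = np.linalg.inv(X.T @ (X * lam[:, None]))
    return beta, cov

def ame_with_se(beta, cov, X, j):
    """AME of continuous regressor j and its analytic delta-method SE."""
    lam = np.exp(X @ beta)
    ame = beta[j] * lam.mean()
    # Jacobian of the AME with respect to beta:
    # d(AME_j)/d(beta_k) = beta_j * mean(lam * x_k) + 1[k == j] * mean(lam)
    jac = beta[j] * (lam[:, None] * X).mean(axis=0)
    jac[j] += lam.mean()
    return ame, float(np.sqrt(jac @ cov @ jac))

rng = np.random.default_rng(42)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = rng.poisson(np.exp(0.5 + 0.3 * x))    # simulated outcome, true slope 0.3

beta_hat, cov_hat = fit_poisson(X, y)
ame, se_ame = ame_with_se(beta_hat, cov_hat, X, j=1)
mem = beta_hat[1] * np.exp(X.mean(axis=0) @ beta_hat)  # effect at the mean of X

print(f'beta = {beta_hat[1]:.4f}, AME = {ame:.4f} (SE {se_ame:.4f}), MEM = {mem:.4f}')
```

With a positive slope, the AME is at least as large as the MEM because the exponential mean function is convex: averaging $\exp(x_i'\beta)$ over observations gives a value no smaller than $\exp(\bar{x}'\beta)$.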

### PanelBox Workflow

```python
from panelbox.models.count import PooledPoisson
from panelbox.marginal_effects.count_me import (
    compute_poisson_ame,
    compute_poisson_mem,
    compute_negbin_ame,
)

# Fit model
model = PooledPoisson(endog=y, exog=X)
result = model.fit(se_type='robust')

# Compute AME
ame = compute_poisson_ame(result)
print(ame.summary())

# Compute MEM
mem = compute_poisson_mem(result)
print(mem.summary())
```

### Next Steps

In **Notebook 07** (Innovation Case Study), we integrate everything from all previous notebooks into a complete real-world analysis with multiple model specifications, marginal effects, and policy implications.

### References

- Cameron, A. C., & Trivedi, P. K. (2013). *Regression Analysis of Count Data* (2nd ed.). Cambridge University Press.
- Wooldridge, J. M. (2010). *Econometric Analysis of Cross Section and Panel Data* (2nd ed.). MIT Press.
- Long, J. S., & Freese, J. (2014). *Regression Models for Categorical Dependent Variables Using Stata* (3rd ed.). Stata Press.

---

## Exercises

### Exercise 1: AME for a Different Specification
Add an interaction term `policy × education` to the model and compute AMEs. How does the interaction affect the marginal effect of policy?

### Exercise 2: MER for Policy Targeting
A policymaker wants to know where the policy is most effective. Compute MER for four profiles:
- Young urban female with low education
- Young urban male with high education
- Old rural female with low education
- Old rural male with high education

### Exercise 3: Visualize ME by Two Variables
Create a heatmap or contour plot showing how the ME of policy varies as a function of both `income` and `education` simultaneously.

### Exercise 4: Compare Poisson and NB AMEs
Fit both Poisson and NB models. Compare the AMEs and their standard errors. When do they differ?

In [None]:
# Exercise solutions

# Exercise 1: Interaction model
# [Your code here]

# Exercise 2: MER for policy targeting
# [Your code here]

# Exercise 3: Heatmap of ME
# [Your code here]

# Exercise 4: Poisson vs NB AME comparison
# [Your code here]