# Marginal Effects in Nonlinear Models: Why Coefficients Are Not Enough

**Series:** Marginal Effects Tutorial Series — Notebook 1 of 6  
**Level:** Intermediate  
**Estimated Duration:** 45–60 minutes  
**Dataset:** Mroz (1987) — Women's Labor Force Participation

---

## The Problem

In **linear regression**, the coefficient $\beta_k$ *is* the marginal effect:
$$
E[Y \mid X] = X\beta \implies \frac{\partial E[Y \mid X]}{\partial x_k} = \beta_k
$$

In **nonlinear models** (Logit, Probit, Poisson, Tobit), this relationship breaks down. The coefficient $\beta_k$ is **not** the marginal effect. Failing to compute and report proper marginal effects is one of the most common mistakes in applied econometrics.

> **Motivating example:** Suppose you estimated a Logit model of women's labor force participation and found $\hat{\beta}_{education} = 0.82$. What does that mean? Does education increase the probability of working by 82 percentage points? **No.** The true marginal effect depends on all other variables — and it varies across individuals.

---

## Table of Contents

1. [The Interpretation Problem in Nonlinear Models](#section-1)
2. [Formal Definition of a Marginal Effect](#section-2)
3. [AME, MEM, and MER — Three Strategies](#section-3)
4. [Standard Errors for Marginal Effects (Delta Method)](#section-4)
5. [Complete Hands-on Example](#section-5)
6. [Key Takeaways](#section-6)

---

## Learning Objectives

By the end of this notebook you will be able to:

1. Explain why coefficients in nonlinear models are not directly interpretable as marginal effects.
2. State the formal definition of a marginal effect: $\partial E[Y \mid X] / \partial x_k$.
3. Distinguish AME, MEM, and MER, and identify when to use each.
4. Compute AME and MEM for a Pooled Logit using PanelBox.
5. Interpret the magnitude of a marginal effect in plain language.
6. Understand that marginal effects depend on $X$ (heterogeneity).
7. Explain why marginal effects also have standard errors (delta method).

In [None]:
# Cell 2 — Setup / Imports
import sys
import os

sys.path.insert(0, '/home/guhaase/projetos/panelbox')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
from scipy import stats

import panelbox as pb
from panelbox.models.discrete.binary import PooledLogit, PooledProbit
from panelbox.marginal_effects.discrete_me import compute_ame, compute_mem, compute_mer
from panelbox.core import PanelData

# Utilities from the series — add utils directory to path
_notebook_dir = os.getcwd()  # notebooks do not define __file__; use the working directory
_utils_dir = os.path.join(
    '/home/guhaase/projetos/panelbox/examples/marginal_effects', 'utils'
)
if _utils_dir not in sys.path:
    sys.path.insert(0, _utils_dir)

from data_loaders import load_dataset
from me_helpers import plot_forest, format_me_table

# Plot style
try:
    plt.style.use('seaborn-v0_8-whitegrid')
except OSError:
    plt.style.use('seaborn-whitegrid')

pd.set_option('display.float_format', '{:.4f}'.format)

# Output directories
_outputs_plots  = '/home/guhaase/projetos/panelbox/examples/marginal_effects/outputs/plots'
_outputs_tables = '/home/guhaase/projetos/panelbox/examples/marginal_effects/outputs/tables'
os.makedirs(_outputs_plots,  exist_ok=True)
os.makedirs(_outputs_tables, exist_ok=True)

print('Setup complete.')
print(f'PanelBox version: {getattr(pb, "__version__", "(unknown)")}')

<a id='section-1'></a>
## Section 1: The Interpretation Problem in Nonlinear Models

### OLS: the easy case

In Ordinary Least Squares the conditional expectation is *linear*:
$$E[Y \mid X] = \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K$$

Taking the derivative with respect to $x_k$ yields simply $\beta_k$ — a **constant** that does not depend on $X$. One number, one interpretation.

### Logit: the nonlinear case

In the binary Logit model:
$$P(Y=1 \mid X) = \Lambda(X\beta) = \frac{e^{X\beta}}{1 + e^{X\beta}}$$

The marginal effect of $x_k$ is:
$$\frac{\partial P(Y=1 \mid X)}{\partial x_k} = \beta_k \cdot \Lambda(X\beta)\bigl[1 - \Lambda(X\beta)\bigr]$$

This expression depends on **all** explanatory variables through the index $X\beta$. The same $\beta_k$ therefore produces **different** effects for different individuals.
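
A worked bound makes the size of this effect concrete. The product $\Lambda(z)[1-\Lambda(z)]$ is largest when $\Lambda(z) = 1/2$, that is, at $z = 0$:

$$\max_z \, \Lambda(z)\bigl[1 - \Lambda(z)\bigr] = \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{4} \quad\Longrightarrow\quad \left|\frac{\partial P(Y=1 \mid X)}{\partial x_k}\right| \le \frac{|\beta_k|}{4}$$

For the hypothetical coefficient $\hat{\beta}_{education} = 0.82$ from the motivating example, the marginal effect can be at most $0.82 / 4 = 0.205$, about 20.5 percentage points, and it is smaller for anyone whose predicted probability is far from 0.5.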

### Probit: analogously

$$\frac{\partial P(Y=1 \mid X)}{\partial x_k} = \beta_k \cdot \phi(X\beta)$$

where $\phi$ is the standard normal PDF. Again, the effect depends on $X$ through the index $X\beta$.
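
As a quick numerical sketch of the Probit formula, the snippet below evaluates $\beta_k \cdot \phi(X\beta)$ at three index values. The coefficient `beta_probit = 1.5` and the index values are arbitrary illustrations, not estimates from the Mroz data.

```python
import numpy as np
from scipy.stats import norm

beta_probit = 1.5                        # hypothetical coefficient, illustration only
xb_points = np.array([-2.0, 0.0, 2.0])   # three positions on the Probit curve

# Probit marginal effect: beta * phi(Xb), with phi the standard normal PDF
probit_me = beta_probit * norm.pdf(xb_points)

for xb_val, me in zip(xb_points, probit_me):
    print(f"Xb = {xb_val:+.0f}:  ME = {me:.4f}")
```

The effect peaks at $X\beta = 0$, where $\phi(0) = 1/\sqrt{2\pi} \approx 0.399$, and is symmetric around that point because the normal density is symmetric.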

The cell below illustrates this numerically.

In [None]:
# Cell 4 — Numerical illustration: ME varies with the linear index Xb

xb = np.linspace(-4, 4, 300)
logistic_pdf = np.exp(xb) / (1 + np.exp(xb))**2   # Λ(xb)[1 - Λ(xb)]

beta = 1.5   # a fixed coefficient — same for everyone
me_at_point = beta * logistic_pdf

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Left panel — the S-curve (logistic CDF)
ax1.plot(xb, 1 / (1 + np.exp(-xb)), color='steelblue', lw=2)
ax1.set_xlabel('Linear Index ($X\\beta$)')
ax1.set_ylabel('$P(Y=1 \\mid X)$')
ax1.set_title("Logistic CDF — the 'S curve'")
ax1.axhline(0.5, color='gray', ls='--', alpha=0.5, label='P = 0.5')
ax1.legend()

# Right panel — how the marginal effect changes with Xb
ax2.plot(xb, me_at_point, color='tomato', lw=2)
ax2.axhline(beta * 0.25, color='gray', ls='--', alpha=0.5,
            label=f'Max ME = $\\beta$/4 = {beta/4:.3f}')
ax2.set_xlabel('Linear Index ($X\\beta$)')
ax2.set_ylabel(f'ME of $x$ ($\\beta$ = {beta})')
ax2.set_title('Marginal Effect Varies with $X\\beta$')
ax2.legend()

plt.suptitle(
    f'Same coefficient $\\beta$ = {beta}, but the marginal effect is NOT constant',
    y=1.02, fontsize=12
)
plt.tight_layout()
plt.savefig(
    os.path.join(_outputs_plots, '01_me_varies_with_xb.png'),
    dpi=150, bbox_inches='tight'
)
plt.show()

# Three example values
idx_neg2 = np.argmin(np.abs(xb - (-2)))
idx_zero = np.argmin(np.abs(xb - 0))
idx_pos2 = np.argmin(np.abs(xb - 2))

print(f"\nFor beta = {beta}:")
print(f"  Maximum ME (at Xβ = 0):  {beta * 0.25:.4f}")
print(f"  ME at Xβ = -2:            {me_at_point[idx_neg2]:.4f}")
print(f"  ME at Xβ =  0:            {me_at_point[idx_zero]:.4f}")
print(f"  ME at Xβ = +2:            {me_at_point[idx_pos2]:.4f}")
print()
print("Notice: the same beta produces very different effects depending on where")
print("individuals are located on the S-curve.")

### Key insight from Section 1

The marginal effect in a nonlinear model **depends on the value of $X$ for each individual**. This heterogeneity is unavoidable — it is built into the functional form.

Because the effect is different for every observation, we need a **summarization strategy**. Three standard approaches exist:

| Strategy | Name | Description |
|----------|------|-------------|
| **AME** | Average Marginal Effect | Compute the ME at every observation, then average. |
| **MEM** | Marginal Effect at Means | Evaluate the ME at $\bar{X}$ (sample means). |
| **MER** | Marginal Effect at Representative values | Evaluate the ME at a user-chosen $X^*$. |

We explore all three in Section 3.

<a id='section-2'></a>
## Section 2: Formal Definition of a Marginal Effect

### Continuous explanatory variables

For a continuous variable $x_k$:

$$\text{ME}_k(X) = \frac{\partial E[Y \mid X]}{\partial x_k}$$

Model-specific formulas:

| Model | $E[Y \mid X]$ | $\text{ME}_k(X)$ |
|-------|--------------|------------------|
| OLS | $X\beta$ | $\beta_k$ |
| Logit | $\Lambda(X\beta)$ | $\beta_k \cdot \Lambda(X\beta)[1 - \Lambda(X\beta)]$ |
| Probit | $\Phi(X\beta)$ | $\beta_k \cdot \phi(X\beta)$ |
| Poisson | $\exp(X\beta)$ | $\beta_k \cdot \exp(X\beta)$ |
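
The formulas in the table can be checked against a numerical derivative. The sketch below does this for one covariate profile. The coefficient vector `beta_demo`, the profile `x_demo`, and the step size `h` are made-up values chosen only for illustration.

```python
import numpy as np
from scipy.stats import norm

def logistic_cdf(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and covariate profile (intercept first)
beta_demo = np.array([-0.5, 0.8, -0.3])
x_demo = np.array([1.0, 0.6, 1.2])
k = 1                      # index of the variable whose effect we check
xb = x_demo @ beta_demo

# For each model: the conditional mean function and the analytic ME_k
models = {
    "OLS":     (lambda z: z,  beta_demo[k]),
    "Logit":   (logistic_cdf, beta_demo[k] * logistic_cdf(xb) * (1 - logistic_cdf(xb))),
    "Probit":  (norm.cdf,     beta_demo[k] * norm.pdf(xb)),
    "Poisson": (np.exp,       beta_demo[k] * np.exp(xb)),
}

# Central finite difference in x_k, holding the other covariates fixed
h = 1e-6
x_up, x_dn = x_demo.copy(), x_demo.copy()
x_up[k] += h
x_dn[k] -= h

checks = {}
for name, (mean_fn, analytic) in models.items():
    numeric = (mean_fn(x_up @ beta_demo) - mean_fn(x_dn @ beta_demo)) / (2 * h)
    checks[name] = (analytic, numeric)
    print(f"{name:8s} analytic = {analytic:.6f}   numeric = {numeric:.6f}")
```

The two columns should agree to several decimal places. Only the OLS row equals the raw coefficient.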

### Binary (dummy) explanatory variables

When $x_k \in \{0, 1\}$, an infinitesimal change in $x_k$ is not meaningful, so the derivative does not apply. Use the **discrete change** instead:

$$\text{DC}_k = E[Y \mid X, x_k = 1] - E[Y \mid X, x_k = 0]$$

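For a binary outcome, this is the change in the predicted probability when the dummy switches from 0 to 1, holding all other variables constant. Like the derivative, it depends on the values of the other covariates.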
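
A minimal sketch of the discrete change for a Logit with one dummy regressor follows. The coefficients `b0`, `b_x`, `b_d` and the value `x_value` are hypothetical. The derivative formula is evaluated alongside only to show that the two quantities differ.

```python
import numpy as np

def logistic_cdf(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical Logit coefficients: intercept, a continuous x, and a dummy d
b0, b_x, b_d = -0.2, 0.5, -1.4
x_value = 1.0   # the other covariate is held fixed

# Predicted probabilities with the dummy switched off and on
p_d0 = logistic_cdf(b0 + b_x * x_value + b_d * 0)
p_d1 = logistic_cdf(b0 + b_x * x_value + b_d * 1)
discrete_change = p_d1 - p_d0

# Derivative formula evaluated at d = 0, for comparison only
derivative_at_d0 = b_d * p_d0 * (1 - p_d0)

print(f"P(Y=1 | d=0)       = {p_d0:.4f}")
print(f"P(Y=1 | d=1)       = {p_d1:.4f}")
print(f"Discrete change    = {discrete_change:.4f}")
print(f"Derivative at d=0  = {derivative_at_d0:.4f}")
```

The derivative is a local linear approximation, so it can overstate or understate the effect of a full 0-to-1 switch when the coefficient is large.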

### Notation

Throughout this series:
- $\Lambda(\cdot)$ = logistic CDF: $\Lambda(z) = e^z / (1 + e^z)$
- $\lambda(\cdot)$ = logistic PDF: $\lambda(z) = \Lambda(z)[1 - \Lambda(z)]$
- $\Phi(\cdot)$ = standard normal CDF
- $\phi(\cdot)$ = standard normal PDF

In [None]:
# Cell 7 — Load Mroz dataset

df = load_dataset('mroz')
print(f"Dataset shape: {df.shape}")
print(f"\nColumn names: {list(df.columns)}")
print("\nDescriptive statistics:")
print(df.describe().T[['count', 'mean', 'std', 'min', 'max']].round(4))

print(f"\nOutcome variable (inlf):")
vc = df['inlf'].value_counts().sort_index()
print(f"  Not in labor force (0): {vc.get(0, 0):4d}  ({100*vc.get(0,0)/len(df):.1f}%)")
print(f"  In labor force     (1): {vc.get(1, 0):4d}  ({100*vc.get(1,0)/len(df):.1f}%)")

In [None]:
# Cell 8 — Estimate Pooled Logit (cross-section treated as single-period panel)

# Add panel identifiers (cross-section: each observation is its own entity)
df = df.copy()
df['id']   = range(len(df))
df['time'] = 1

panel = PanelData(df, entity='id', time='time')

# Estimate model
model = PooledLogit(
    data=panel,
    formula='inlf ~ educ + age + kidslt6 + kidsge6 + nwifeinc'
)
result = model.fit()

print(result.summary())

# Pseudo-R2
try:
    prsq = result.prsquared
except AttributeError:
    try:
        prsq = 1 - result.llf / result.llnull
    except AttributeError:
        prsq = float('nan')

print(f"\nPseudo R² (McFadden): {prsq:.4f}")

<a id='section-3'></a>
## Section 3: AME, MEM, and MER — Three Strategies

Because the marginal effect varies across individuals, we need a single summary number. Three standard strategies exist.

---

### AME — Average Marginal Effects

**Definition:** Compute the ME at *every* observation, then take the average:
$$\text{AME}_k = \frac{1}{N} \sum_{i=1}^{N} \beta_k \cdot \lambda(X_i \beta)$$

**Pros:**
- Accounts for the full distribution of $X$ in the sample.
- Population-average interpretation: the average of the individual responses in the sample, which is not the same as the response of a hypothetical "average individual".
- Most common in published research and required by some journals.

**When to use:** General-purpose; the recommended default.

---

### MEM — Marginal Effects at Means

**Definition:** Evaluate the ME at the sample mean $\bar{X}$:
$$\text{MEM}_k = \beta_k \cdot \lambda(\bar{X}\beta)$$

**Pros:** Simple to compute; useful for quick comparisons.

**Cons:** The "average individual" ($\bar{X}$) may not exist in the data. For example, if `kidslt6` has mean 0.4, no woman actually has 0.4 young children.

**When to use:** Exploratory analysis; homogeneous samples where the mean is representative.

---

### MER — Marginal Effects at Representative values

**Definition:** Evaluate the ME at a user-specified profile $X^*$:
$$\text{MER}_k = \beta_k \cdot \lambda(X^* \beta)$$

Unspecified variables are set to their sample means.

**When to use:** Scenario analysis; comparing subgroups (e.g., "a college-educated woman with no young children").

---

### Comparison table

| Method | Formula | Typical use case |
|--------|---------|------------------|
| **AME** | $\frac{1}{N}\sum_{i} \partial P(X_i)/\partial x$ | General; recommended default |
| **MEM** | $\partial P(\bar{X})/\partial x$ | Quick check; low heterogeneity |
| **MER** | $\partial P(X^*)/\partial x$ | Specific profiles or scenarios |
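
The three definitions can be sketched by hand with NumPy before turning to the PanelBox functions. Everything in this block is illustrative: the simulated design matrix `X_sim`, the coefficient vector `beta_sim`, and the profile `x_star` are made up, and the code does not reproduce PanelBox internals.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic_pdf(z):
    cdf = 1.0 / (1.0 + np.exp(-z))
    return cdf * (1.0 - cdf)

# Simulated covariates: a constant, one continuous variable, one count variable
n = 1_000
X_sim = np.column_stack([
    np.ones(n),
    rng.normal(12.0, 2.5, n),   # e.g. years of schooling
    rng.poisson(0.3, n),        # e.g. number of young children
])
beta_sim = np.array([-2.0, 0.2, -1.2])   # hypothetical Logit coefficients
k = 1                                    # effect of the continuous variable

# AME: marginal effect at every observation, then the average
ame_k = np.mean(beta_sim[k] * logistic_pdf(X_sim @ beta_sim))

# MEM: marginal effect at the vector of sample means
x_bar = X_sim.mean(axis=0)
mem_k = beta_sim[k] * logistic_pdf(x_bar @ beta_sim)

# MER: marginal effect at a user-chosen profile
x_star = np.array([1.0, 16.0, 0.0])
mer_k = beta_sim[k] * logistic_pdf(x_star @ beta_sim)

print(f"AME = {ame_k:.4f}   MEM = {mem_k:.4f}   MER = {mer_k:.4f}")
```

All three numbers use the same coefficient. They differ only in where the logistic density is evaluated.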

In [None]:
# Cell 10 — Compute AME

ame = compute_ame(result)

print("=" * 65)
print("  Average Marginal Effects (AME)")
print("=" * 65)
ame_summary = ame.summary()
print(ame_summary)

In [None]:
# Cell 11 — Compute MEM

mem = compute_mem(result)

print("=" * 65)
print("  Marginal Effects at Means (MEM)")
print("=" * 65)
mem_summary = mem.summary()
print(mem_summary)

In [None]:
# Cell 12 — Compare AME vs MEM side by side

ame_vals = ame.marginal_effects
mem_vals = mem.marginal_effects

# Align on common index
common_vars = ame_vals.index.intersection(mem_vals.index)

comparison = pd.DataFrame({
    'AME':          ame_vals[common_vars],
    'MEM':          mem_vals[common_vars],
    'Difference':   ame_vals[common_vars] - mem_vals[common_vars],
    'Rel. Diff (%)': 100 * (ame_vals[common_vars] - mem_vals[common_vars])
                    / ame_vals[common_vars].abs()
})

print("\n" + "=" * 65)
print("  AME vs MEM Comparison")
print("=" * 65)
print(comparison.round(5))

print("""
Interpretation:
  - AME and MEM are typically close but not identical.
  - Differences arise because the logistic density is nonlinear in the index:
    the average of lambda(X_i b) over observations is generally not equal to
    lambda evaluated at the average index (Jensen's inequality).
  - AME is preferred because it integrates over the actual distribution
    of X in the data, rather than relying on the (possibly non-existent)
    'average individual'.
""")

In [None]:
# Cell 13 — Compute MER: college-educated woman, age 35, no young children

# Representative profile
# Variables not listed here will default to sample means inside compute_mer
representative = {
    'educ':    16,    # college degree (16 years of education)
    'age':     35,
    'kidslt6': 0,     # no children under 6
    'kidsge6': 1,     # one school-age child
    'nwifeinc': 20.0  # moderate non-wife household income
}

mer = compute_mer(result, at=representative)

print("=" * 65)
print("  MER: College-Educated Woman, Age 35, No Young Children")
print("=" * 65)
mer_summary = mer.summary()
print(mer_summary)

print("\nNote: variables not specified in 'at' are set to their sample means.")
print("The MER answers: 'What is the marginal effect for *this specific person*?'")

<a id='section-4'></a>
## Section 4: Standard Errors for Marginal Effects (Delta Method)

### Why do marginal effects have standard errors?

Marginal effects are **functions of the estimated parameters** $\hat{\beta}$. Because $\hat{\beta}$ is a random variable subject to sampling uncertainty, every estimated marginal effect inherits that uncertainty.

### The Delta Method

Let $g(\theta)$ be any differentiable function of the parameter vector. The delta method gives:

$$\text{Var}\bigl[g(\hat{\theta})\bigr] \approx \nabla g(\hat{\theta})' \cdot \text{Var}[\hat{\theta}] \cdot \nabla g(\hat{\theta})$$

where:
- $g(\theta)$ is the marginal effect as a function of all parameters,
- $\nabla g(\theta)$ is the gradient vector (computed numerically in PanelBox),
- $\text{Var}[\hat{\theta}]$ is the estimated parameter covariance matrix.
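
The formula can be sketched end to end in NumPy. This is a minimal illustration under stated assumptions, not PanelBox's implementation: the data are simulated, the Logit is fitted with a hand-written Newton-Raphson loop, the covariates are treated as fixed, and the names `ame_fn` and `numerical_gradient` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(42)

def logistic_cdf(z):
    return 1.0 / (1.0 + np.exp(-z))

# Simulated data from a Logit model with hypothetical true coefficients
n = 2_000
X_sim = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([0.3, 0.8, -0.5])
y_sim = (rng.uniform(size=n) < logistic_cdf(X_sim @ beta_true)).astype(float)

# Maximum likelihood by Newton-Raphson
beta_hat = np.zeros(X_sim.shape[1])
for _ in range(25):
    p = logistic_cdf(X_sim @ beta_hat)
    score = X_sim.T @ (y_sim - p)
    neg_hessian = (X_sim * (p * (1 - p))[:, None]).T @ X_sim
    beta_hat = beta_hat + np.linalg.solve(neg_hessian, score)

# Var[theta_hat]: inverse of the negative Hessian at the optimum
p = logistic_cdf(X_sim @ beta_hat)
cov_hat = np.linalg.inv((X_sim * (p * (1 - p))[:, None]).T @ X_sim)

def ame_fn(beta, k=1):
    """g(theta): average marginal effect of variable k."""
    cdf = logistic_cdf(X_sim @ beta)
    return np.mean(beta[k] * cdf * (1 - cdf))

def numerical_gradient(func, theta, h=1e-6):
    """Central finite-difference approximation of the gradient of func."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h
        grad[j] = (func(theta + step) - func(theta - step)) / (2 * h)
    return grad

# Delta method: Var[g] is approximately grad' * Var[theta_hat] * grad
grad = numerical_gradient(ame_fn, beta_hat)
ame_hat = ame_fn(beta_hat)
ame_se = np.sqrt(grad @ cov_hat @ grad)

print(f"AME = {ame_hat:.4f}, delta-method SE = {ame_se:.4f}")
```

The gradient has one entry per parameter, including the intercept, because the AME depends on every coefficient through the index $X\beta$.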

### Important warning

> **A statistically significant coefficient does NOT guarantee a statistically significant marginal effect** (and vice versa). The transformations can amplify or dampen uncertainty differently. Always test the marginal effect directly.

### What PanelBox does

The function `compute_ame` (and `compute_mem`, `compute_mer`) internally calls `delta_method_se()` from `panelbox.marginal_effects.delta_method`. The gradient is computed numerically using finite differences. This is transparent to the user — you simply inspect the `std_errors` attribute.

In [None]:
# Cell 15 — Confidence intervals from the AME

ci = ame.conf_int(alpha=0.05)

print("=" * 55)
print("  95% Confidence Intervals for AME")
print("=" * 55)
print(ci.round(5))

print("\nAME with SEs and p-values:")
ame_full = pd.DataFrame({
    'AME':       ame.marginal_effects,
    'Std. Err.': ame.std_errors,
    'z':         ame.z_stats,
    'P>|z|':     ame.pvalues,
    'CI Lower':  ci['lower'],
    'CI Upper':  ci['upper'],
})
print(ame_full.round(5))

print("""
Significance codes:  *** p < 0.001  ** p < 0.01  * p < 0.05  . p < 0.1
""")

<a id='section-5'></a>
## Section 5: Complete Hands-on Example

Now let us run the **full pipeline** in one place and produce a complete, publication-ready interpretation.

The pipeline is:
1. Load data.
2. Create panel structure (cross-section).
3. Estimate Pooled Logit.
4. Compute AME with standard errors.
5. Inspect confidence intervals.
6. Communicate results in plain language.
7. Produce a forest plot.

Steps 1–5 are already done above. Below we focus on steps 6–7.

In [None]:
# Cell 17 — Full pipeline: plain-language interpretation

print("=" * 65)
print("  COMPLETE MARGINAL EFFECTS PIPELINE")
print("=" * 65)

# ── Step 1: Model summary ───────────────────────────────────────────────────
print("\n[1] Model: Pooled Logit — Women's Labor Force Participation (Mroz 1987)")
print(f"    Observations: {len(df)}")
try:
    prsq = result.prsquared
except AttributeError:
    try:
        prsq = 1 - result.llf / result.llnull
    except AttributeError:
        prsq = float('nan')
print(f"    Pseudo R² (McFadden): {prsq:.4f}")

# ── Step 2: AME summary ─────────────────────────────────────────────────────
print("\n[2] Average Marginal Effects (AME):")
ame_summary = ame.summary()
print(ame_summary)

# ── Step 3: Plain-language interpretation ──────────────────────────────────
print("\n[3] Plain-language interpretation:")
print("-" * 55)

vars_to_report = ['educ', 'age', 'kidslt6', 'kidsge6', 'nwifeinc']
ci95 = ame.conf_int(alpha=0.05)

for var in vars_to_report:
    if var not in ame.marginal_effects.index:
        continue
    me_val  = float(ame.marginal_effects[var])
    se_val  = float(ame.std_errors[var])
    pv_val  = float(ame.pvalues[var])
    lo_val  = float(ci95.loc[var, 'lower'])
    hi_val  = float(ci95.loc[var, 'upper'])
    sig_str = '***' if pv_val < 0.001 else ('**' if pv_val < 0.01 else
              ('*' if pv_val < 0.05 else ('.' if pv_val < 0.1 else 'n.s.')))

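    direction = 'increases' if me_val > 0 else 'decreases'
    print(f"  {var} ({sig_str}):")
    # Report in percentage points: an AME of 0.038 is 3.8 pp, not "3.8%"
    print(f"    A one-unit increase {direction} P(LFP=1) by "
          f"{100 * abs(me_val):.1f} percentage points on average")
    print(f"    [95% CI: {100 * lo_val:.1f} to {100 * hi_val:.1f} pp; SE = {se_val:.4f}]")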
    print()

# Focused interpretation for the two most salient variables
if 'educ' in ame.marginal_effects.index and 'kidslt6' in ame.marginal_effects.index:
    educ_ame = float(ame.marginal_effects['educ'])
    educ_se  = float(ame.std_errors['educ'])
    klt6_ame = float(ame.marginal_effects['kidslt6'])

    print("-" * 55)
    print("  Highlight — Education:")
    print("    An additional year of education raises the probability")
    print(f"    of labor force participation by {100 * educ_ame:.1f} percentage")
    print(f"    points on average (SE = {educ_se:.4f}).")
    print()
    print("  Highlight — Young children:")
    print("    Having one more child under age 6 reduces participation")
    print(f"    probability by {100 * abs(klt6_ame):.1f} percentage points on average.")

In [None]:
# Cell 18 — Forest plot: AME with 95% confidence intervals

fig, ax = plt.subplots(figsize=(9, 4))

variables = list(ame.marginal_effects.index)
me_vals   = ame.marginal_effects.values
se_vals   = ame.std_errors.values
ci95_plot = ame.conf_int(alpha=0.05)

# Colour by sign
colors = ['tomato' if v < 0 else 'steelblue' for v in me_vals]

# Horizontal bar chart with error bars
ax.barh(
    variables, me_vals,
    xerr=1.96 * se_vals,
    color=colors,
    alpha=0.75,
    capsize=5,
    error_kw={'lw': 1.5, 'ecolor': 'black'}
)
ax.axvline(0, color='black', lw=0.8, ls='--')
ax.set_xlabel('Average Marginal Effect on P(LFP = 1)')
ax.set_title("AME — Women's Labor Force Participation (Mroz 1987)")

# Format x-axis as percentages
ax.xaxis.set_major_formatter(mtick.PercentFormatter(xmax=1, decimals=1))

# Significance stars
pvals = ame.pvalues
for i, var in enumerate(variables):
    pv = float(pvals[var])
    stars = '***' if pv < 0.001 else ('**' if pv < 0.01 else
            ('*' if pv < 0.05 else ''))
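    if stars:
        # Place the stars just beyond the outer end of the error bar
        # (right of positive bars, left of negative bars)
        sign = 1 if me_vals[i] >= 0 else -1
        x_text = me_vals[i] + sign * 1.96 * se_vals[i]
        ax.text(x_text, i, f' {stars} ', va='center',
                ha='left' if sign > 0 else 'right', fontsize=9, color='darkred')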

plt.tight_layout()
plt.savefig(
    os.path.join(_outputs_plots, '01_ame_forest_plot.png'),
    dpi=150, bbox_inches='tight'
)
plt.show()
print("Forest plot saved to outputs/plots/01_ame_forest_plot.png")

<a id='section-6'></a>
## Section 6: Key Takeaways

1. **In nonlinear models, $\beta \neq$ marginal effect.** The Logit coefficient $\beta_k$ must be multiplied by $\lambda(X\beta) = \Lambda(X\beta)[1-\Lambda(X\beta)]$ to obtain the actual marginal effect.

2. **The marginal effect depends on $X$.** It is different for every individual in the sample. Reporting a single coefficient as if it were a marginal effect is incorrect.

3. **AME averages over all observations** → preferred in most applications. It respects the actual distribution of covariates in the data.

4. **MEM evaluates at the mean** → fast but may be misleading if the mean $\bar{X}$ does not represent a real individual (e.g., fractional children).

5. **MER evaluates at any specified point** → use for scenario analysis or subgroup comparisons.

6. **Marginal effects have standard errors (delta method) — always report them.** A significant coefficient does not guarantee a significant marginal effect.

7. **Communicate in plain language.** Instead of "$\hat{\beta}_{educ} = 0.28$", say "An additional year of education increases the probability of labor force participation by approximately 3.8 percentage points (95% CI: ...)".

---

### Bridge to Notebook 02

In **Notebook 02** (`02_discrete_me_complete.ipynb`) we extend these concepts to the full range of discrete choice models:
- Binary Probit (comparison with Logit)
- Multinomial Logit (unordered multiple alternatives)
- Ordered Logit / Probit (ordered categories)

Each model type requires specific formulas for computing marginal effects, and PanelBox handles them all through a unified API.

## Knowledge Check

Test your understanding with these four questions. Answers are in the solution notebook.

---

**Question 1**

A Probit coefficient for `education` is $\hat{\beta} = 0.15$. Can you directly interpret this as "one more year of education increases the probability by 15 percentage points"? Why or why not?

*(Hint: think about the functional form $\Phi(X\beta)$ and what its derivative looks like.)*

---

**Question 2**

In the Mroz example above, why might AME and MEM differ for `kidslt6` but be nearly identical for `age`?

*(Hint: consider how the distribution of `kidslt6` differs from that of `age`, and how that interacts with the nonlinearity of the logistic function.)*

---

**Question 3**

When would you prefer MER over AME? Give a concrete research scenario.

*(Example starting point: a policy analyst needs to predict the effect of a subsidy for a specific demographic group...)*

---

**Question 4**

Suppose the AME for `educ` is $0.038$ and its standard error is $0.010$.

(a) What is the 95% confidence interval?  
(b) Is the effect statistically significant at the 5% level?  
(c) Is the effect economically significant? How would you judge this?

*(Recall: 95% CI = estimate $\pm$ 1.96 $\times$ SE.)*

In [None]:
# Cell 21 — Export results to CSV and LaTeX

# ── Format table using the series helper ────────────────────────────────────
ame_df = format_me_table(ame)

# ── Save CSV ────────────────────────────────────────────────────────────────
csv_path = os.path.join(_outputs_tables, '01_ame_logit_mroz.csv')
ame_df.to_csv(csv_path, index=False)
print(f"CSV table saved to:\n  {csv_path}")

# ── Print LaTeX ─────────────────────────────────────────────────────────────
print("\nLaTeX version (copy into your paper):")
print("-" * 65)
try:
    latex_str = ame_df.to_latex(
        index=False,
        caption="Average Marginal Effects — Pooled Logit, Mroz (1987)",
        label="tab:ame_logit_mroz",
        escape=True
    )
except TypeError:
    # Older pandas may not support caption/label in to_latex
    latex_str = ame_df.to_latex(index=False)
print(latex_str)

# ── Summary ─────────────────────────────────────────────────────────────────
print("-" * 65)
print("Outputs written in this notebook:")
print(f"  plots/01_me_varies_with_xb.png")
print(f"  plots/01_ame_forest_plot.png")
print(f"  tables/01_ame_logit_mroz.csv")