# Multinomial Logit: Multiple Unordered Alternatives

**Tutorial Series**: Discrete Choice Econometrics with PanelBox

**Notebook**: 06 - Multinomial Logit

**Author**: PanelBox Contributors

**Date**: 2026-02-17

**Estimated Duration**: 75 minutes

**Difficulty Level**: Intermediate

---

## Learning Objectives

By the end of this notebook, you will be able to:

1. Extend binary logit to J > 2 unordered alternatives
2. Understand and choose the reference category for identification
3. Estimate Multinomial Logit models using PanelBox
4. Interpret coefficients as log-odds ratios relative to the base category
5. Perform and interpret the Hausman-McFadden IIA test
6. Calculate multi-category marginal effects (AME)
7. Distinguish Multinomial Logit from Conditional Logit

---

## Table of Contents

1. [From Binary to Multinomial](#section1)
2. [Identification — Reference Category](#section2)
3. [Estimation with PanelBox](#section3)
4. [IIA Test — Hausman-McFadden](#section4)
5. [Predicted Probabilities and Classification](#section5)
6. [Marginal Effects in Multinomial Logit](#section6)
7. [Application — Career Choice](#section7)
8. [Exercises](#exercises)

## Setup

Import all required libraries and configure the environment.

In [None]:
# Standard library imports
import warnings
from pathlib import Path

# Data manipulation and numerical computing
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Statistical functions
from scipy.stats import norm, chi2

# PanelBox models
from panelbox.models.discrete.multinomial import MultinomialLogit

# Configuration
warnings.filterwarnings('ignore')
np.random.seed(42)
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 4)

# Matplotlib configuration
plt.style.use('seaborn-v0_8-darkgrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11
plt.rcParams['axes.labelsize'] = 12
plt.rcParams['axes.titlesize'] = 14
plt.rcParams['xtick.labelsize'] = 10
plt.rcParams['ytick.labelsize'] = 10
plt.rcParams['legend.fontsize'] = 10

# Paths
DATA_DIR = Path("..") / "data"
OUTPUT_DIR = Path("..") / "outputs"
FIG_DIR = OUTPUT_DIR / "figures"
TABLE_DIR = OUTPUT_DIR / "tables"

# Create output directories if needed
FIG_DIR.mkdir(parents=True, exist_ok=True)
TABLE_DIR.mkdir(parents=True, exist_ok=True)

# Career labels for readability
CAREER_LABELS = {0: 'Manual', 1: 'Technical', 2: 'Managerial'}
CAREER_COLORS = {'Manual': '#e74c3c', 'Technical': '#3498db', 'Managerial': '#2ecc71'}

print("All libraries imported successfully")
print(f"Random seed set to: 42")
print(f"Working directory: {Path.cwd()}")

<a id='section1'></a>
## 1. From Binary to Multinomial

### 1.1 Review: Binary Logit

In **Binary Logit** (Notebook 01), we modeled a binary outcome $y \in \{0, 1\}$:

$$P(y_i = 1 | X_i) = \frac{\exp(X_i' \beta)}{1 + \exp(X_i' \beta)}$$

This gives the log-odds:

$$\log \frac{P(y=1)}{P(y=0)} = X' \beta$$

### 1.2 The Multinomial Extension

When the outcome has **J > 2 unordered categories** (e.g., career choice: manual, technical, managerial), we extend to the **Multinomial Logit**.

$$P(y_i = j | X_i) = \frac{\exp(X_i' \beta_j)}{\sum_{k=0}^{J-1} \exp(X_i' \beta_k)}$$

Key features:
- Covariates $X_i$ are **individual-specific**: they describe the person and do not vary across alternatives
- Each alternative $j$ has its **own coefficient vector** $\beta_j$
- One alternative is normalized ($\beta_{\text{base}} = 0$) for identification
- Number of parameters: $(J-1) \times K$
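To make the formula and the normalization concrete, here is a minimal NumPy sketch that does not use PanelBox. `mnl_probs` is a hypothetical helper: it takes the $(J-1) \times K$ coefficient matrix of the non-base alternatives, inserts a zero row for the base, and applies a row-wise softmax. The toy numbers are made up for illustration.

```python
import numpy as np

def mnl_probs(X, B, base=0):
    """Multinomial logit choice probabilities.

    X    : (N, K) individual-specific covariates
    B    : (J-1, K) coefficients of the non-base alternatives, in alternative order
    base : index of the base alternative, whose coefficients are fixed at zero
    Returns an (N, J) matrix whose rows sum to 1.
    """
    B_full = np.insert(B, base, 0.0, axis=0)      # (J, K): zero row for the base
    V = X @ B_full.T                              # (N, J) linear indices X_i' beta_j
    V = V - V.max(axis=1, keepdims=True)          # row-wise shift for numerical stability
    expV = np.exp(V)
    return expV / expV.sum(axis=1, keepdims=True)

# Toy example: N = 4 people, K = 2 covariates, J = 3 alternatives -> (J-1)*K = 4 parameters
X_toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]])
B_toy = np.array([[0.5, -0.2],    # alternative 1 vs base
                  [1.0,  0.3]])   # alternative 2 vs base
P_toy = mnl_probs(X_toy, B_toy)
print(P_toy.round(3))
print(P_toy.sum(axis=1))          # each row sums to 1
```

Subtracting the row maximum does not change the probabilities (the same invariance discussed in Section 2), but it prevents overflow in `np.exp` when linear indices are large.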

### 1.3 MNL vs Conditional Logit

| Feature | Multinomial Logit (MNL) | Conditional Logit (CL) |
|---------|------------------------|------------------------|
| Covariates | Individual-specific ($X_i$) | Alternative-specific ($Z_{ij}$) |
| Coefficients | Alternative-specific ($\beta_j$) | Common ($\gamma$) |
| Parameters | $(J-1) \times K$ | $K$ |
| Example | Age, education affect career | Cost, time of each mode affect transport choice |
| "Who chooses" | Characteristics of the person | Attributes of the alternatives |

### 1.4 Load the Data

In [None]:
# Load career choice panel data
data = pd.read_csv(DATA_DIR / "career_choice.csv")

print("Dataset loaded successfully!")
print(f"\nShape: {data.shape}")
print(f"Number of individuals: {data['id'].nunique()}")
print(f"Number of periods: {data['year'].nunique()}")
print(f"\nFirst 10 rows:")
data.head(10)

In [None]:
# Career choice distribution
print("=== Career Choice Distribution ===")
career_dist = data['career'].value_counts().sort_index()
career_prop = data['career'].value_counts(normalize=True).sort_index()

for code, label in CAREER_LABELS.items():
    print(f"  {code} ({label:10s}): {career_dist[code]:5d} obs  ({career_prop[code]:.1%})")

print(f"\nTotal observations: {len(data)}")

In [None]:
# Summary statistics by career choice
print("=== Characteristics by Career Choice ===")
summary = data.groupby('career')[['educ', 'exper', 'age', 'female', 'income', 'urban']].mean()
summary.index = [CAREER_LABELS[c] for c in summary.index]
print(summary.round(2))

In [None]:
# Visualize career distribution and characteristics
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1. Career distribution
labels = [CAREER_LABELS[i] for i in range(3)]
colors = [CAREER_COLORS[l] for l in labels]
bars = axes[0, 0].bar(labels, career_dist.values, color=colors, alpha=0.8, edgecolor='black')
for bar, count in zip(bars, career_dist.values):
    axes[0, 0].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 20,
                    f'{count}', ha='center', va='bottom', fontsize=11)
axes[0, 0].set_title('Career Choice Distribution', fontweight='bold')
axes[0, 0].set_ylabel('Count')
axes[0, 0].grid(True, alpha=0.3, axis='y')

# 2. Education by career
for code, label in CAREER_LABELS.items():
    subset = data[data['career'] == code]['educ']
    axes[0, 1].hist(subset, bins=20, alpha=0.5, label=label, color=CAREER_COLORS[label])
axes[0, 1].set_title('Education Distribution by Career', fontweight='bold')
axes[0, 1].set_xlabel('Years of Education')
axes[0, 1].set_ylabel('Frequency')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# 3. Career by gender
gender_career = data.groupby(['female', 'career']).size().unstack(fill_value=0)
gender_career.columns = [CAREER_LABELS[c] for c in gender_career.columns]
gender_career.index = ['Male', 'Female']
gender_career_pct = gender_career.div(gender_career.sum(axis=1), axis=0)
gender_career_pct.plot(kind='bar', stacked=True, ax=axes[1, 0],
                       color=[CAREER_COLORS[c] for c in gender_career_pct.columns],
                       alpha=0.8, edgecolor='black')
axes[1, 0].set_title('Career Distribution by Gender', fontweight='bold')
axes[1, 0].set_ylabel('Proportion')
axes[1, 0].set_xlabel('')
axes[1, 0].tick_params(axis='x', rotation=0)
axes[1, 0].legend(title='Career')
axes[1, 0].grid(True, alpha=0.3, axis='y')

# 4. Experience by career
career_data = [data[data['career'] == c]['exper'] for c in range(3)]
bp = axes[1, 1].boxplot(career_data, labels=labels, patch_artist=True, notch=True)
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)
    patch.set_alpha(0.6)
axes[1, 1].set_title('Experience Distribution by Career', fontweight='bold')
axes[1, 1].set_ylabel('Years of Experience')
axes[1, 1].grid(True, alpha=0.3)

plt.suptitle('Career Choice: Exploratory Analysis', fontsize=16, fontweight='bold', y=1.01)
plt.tight_layout()
plt.savefig(FIG_DIR / '06_career_exploration.png', dpi=150, bbox_inches='tight')
plt.show()

print("Figure saved to outputs/figures/06_career_exploration.png")

<a id='section2'></a>
## 2. Identification — Reference Category

### 2.1 The Over-Parameterization Problem

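If we estimate a free $\beta_j$ for **all** $J$ alternatives, the model is **not identified**. Adding the same vector $\delta$ to every $\beta_j$ shifts each of an individual's utilities by the same amount $c = X'\delta$, and that common shift cancels from the probabilities: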

$$\frac{\exp(X'\beta_j + c)}{\sum_k \exp(X'\beta_k + c)} = \frac{\exp(c) \cdot \exp(X'\beta_j)}{\exp(c) \cdot \sum_k \exp(X'\beta_k)} = \frac{\exp(X'\beta_j)}{\sum_k \exp(X'\beta_k)}$$
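The identification problem can be checked numerically. In this NumPy sketch (toy numbers of our own choosing, not PanelBox output), a common vector $\delta$ is added to every alternative's coefficient vector, which adds $c_i = X_i'\delta$ to all of individual $i$'s utilities. The probabilities do not move, so the likelihood cannot distinguish $\beta$ from $\beta + \delta$.

```python
import numpy as np

def softmax_rows(V):
    """Row-wise softmax of an (N, J) matrix of utilities."""
    expV = np.exp(V - V.max(axis=1, keepdims=True))
    return expV / expV.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(5, 2))          # N = 5 individuals, K = 2 covariates
beta_toy = rng.normal(size=(3, 2))       # one unrestricted coefficient vector per alternative (J = 3)
delta = np.array([0.7, -1.3])            # arbitrary common shift

P_original = softmax_rows(X_toy @ beta_toy.T)
P_shifted = softmax_rows(X_toy @ (beta_toy + delta).T)   # every beta_j -> beta_j + delta

# Only floating-point noise separates the two: beta is not identified without a normalization
print(np.max(np.abs(P_original - P_shifted)))

# Choosing delta = -beta_0 is exactly the "base = 0" normalization
beta_norm = beta_toy - beta_toy[0]
print(beta_norm[0])                      # the base row is now all zeros
```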

### 2.2 Normalization: Set $\beta_{\text{base}} = 0$

To identify the model, we **normalize** one alternative's coefficients to zero. This gives:

$$\log \frac{P(y=j)}{P(y=\text{base})} = X' \beta_j$$

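**Interpretation**: $\beta_{j,k}$ is the change in the **log-odds** of choosing alternative $j$ rather than the base for a one-unit increase in $x_k$, holding the other covariates fixed. Exponentiating gives the corresponding odds ratio, $\exp(\beta_{j,k})$.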

### 2.3 Choice of Base Category

Common strategies:
- **Most common category**: tends to give more precise estimates of the contrasts, since every alternative is compared with a well-populated group
- **Theoretically relevant reference**: E.g., "manual" as the default career path

**Key insight**: Changing the base changes the $\beta$ values but **not** the predicted probabilities.
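Because only differences between coefficient vectors are identified, estimates for a new base can be computed from the old ones without refitting: if $\beta_j^{(0)}$ are the coefficients with base 0, then with base $b$ they are $\beta_j^{(b)} = \beta_j^{(0)} - \beta_b^{(0)}$. The NumPy sketch below illustrates this with toy numbers; `rebase` is a hypothetical helper, not a PanelBox function.

```python
import numpy as np

def rebase(B, old_base, new_base):
    """Re-express MNL coefficients relative to a different base alternative.

    B : (J-1, K) coefficients of the non-base alternatives, relative to old_base,
        ordered by alternative index.
    Returns the (J-1, K) coefficients relative to new_base, ordered by alternative index.
    """
    B_full = np.insert(B, old_base, 0.0, axis=0)   # (J, K), zero row for the old base
    B_full = B_full - B_full[new_base]             # subtract the new base's vector
    return np.delete(B_full, new_base, axis=0)     # drop the (now zero) new-base row

# Base = 0 (Manual): rows are Technical vs Manual and Managerial vs Manual
B0 = np.array([[0.20,  0.05],
               [0.45, -0.10]])
B1 = rebase(B0, old_base=0, new_base=1)   # rows: Manual vs Technical, Managerial vs Technical
print(B1)
```

Applied to a fitted base-0 `params_matrix` (assuming its rows are ordered by alternative index, as the loop below suggests), this should match the base-1 estimates up to optimizer tolerance. Standard errors do not transform this simply: the new contrasts need the covariance between the old coefficient rows.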

### 2.4 Demonstration: Base Category Invariance

In [None]:
# Demonstrate reference category invariance
# Estimate with each base alternative and compare predictions

exog_vars = ['educ', 'exper', 'age', 'female']
X = data[exog_vars].values
y = data['career'].values

predictions = {}
coefficients = {}

for base in range(3):
    model = MultinomialLogit(
        endog=y,
        exog=X,
        n_alternatives=3,
        base_alternative=base
    )
    results = model.fit()
    
    # Store predictions and coefficients
    predictions[base] = results.predict_proba()
    coefficients[base] = results.params_matrix
    
    print(f"\n=== Base alternative = {base} ({CAREER_LABELS[base]}) ===")
    print(f"Parameters shape: {results.params_matrix.shape}")
    for idx, j in enumerate(model.non_base_alts):
        print(f"  {CAREER_LABELS[j]} vs {CAREER_LABELS[base]}: "
              f"educ={results.params_matrix[idx, 0]:+.4f}, "
              f"exper={results.params_matrix[idx, 1]:+.4f}, "
              f"age={results.params_matrix[idx, 2]:+.4f}, "
              f"female={results.params_matrix[idx, 3]:+.4f}")

In [None]:
# Verify that predicted probabilities agree regardless of base
print("=== Prediction Invariance Check ===")
print("\nMax absolute difference in predicted probabilities:")
max_diffs = []
for base_a in range(3):
    for base_b in range(base_a + 1, 3):
        diff = np.max(np.abs(predictions[base_a] - predictions[base_b]))
        max_diffs.append(diff)
        print(f"  Base {base_a} vs Base {base_b}: {diff:.2e}")

# Differences should be at the level of the optimizer's convergence tolerance
if max(max_diffs) < 1e-4:
    print("\nConclusion: predictions agree (up to optimizer tolerance) regardless of base.")
    print("The base category is a normalization choice, not a substantive one.")
else:
    print("\nDifferences exceed 1e-4: check that every fit converged.")

<a id='section3'></a>
## 3. Estimation with PanelBox

### 3.1 Model Specification

We use career = 0 (Manual) as the base category. The model estimates:
- $\beta_1$: coefficients for **Technical vs Manual**
- $\beta_2$: coefficients for **Managerial vs Manual**

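Each coefficient vector has $K$ elements, one per covariate. Note that `X` below contains no column of ones, so unless `MultinomialLogit` adds one internally, the model has no alternative-specific constants. With constants included, the fitted MNL would reproduce the observed choice shares exactly (a consequence of the logit first-order conditions); without them, predicted and observed shares can differ.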

### 3.2 Estimate the Model

In [None]:
# Estimate Multinomial Logit with base = Manual (0)
exog_vars = ['educ', 'exper', 'age', 'female']
X = data[exog_vars].values
y = data['career'].values

model = MultinomialLogit(
    endog=y,
    exog=X,
    n_alternatives=3,
    base_alternative=0,  # Manual = reference
)

# Store variable names for later use
model.exog_names = exog_vars

results = model.fit()

print("=" * 70)
print(" " * 15 + "MULTINOMIAL LOGIT: CAREER CHOICE")
print("=" * 70)
print(results.summary())

### 3.3 Coefficient Interpretation

Each coefficient $\beta_{j,k}$ tells us how a one-unit increase in variable $k$ changes the **log-odds** of choosing alternative $j$ vs the base (Manual):

$$\log \frac{P(\text{career} = j)}{P(\text{career} = 0)} = X' \beta_j$$

- If $\beta_{1, \text{educ}} > 0$, more education raises the log-odds of Technical vs Manual.
- If $\beta_{2, \text{educ}} > \beta_{1, \text{educ}}$, education also raises the odds of Managerial relative to Technical, because $\log[P(\text{career}=2)/P(\text{career}=1)] = X'(\beta_2 - \beta_1)$.
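A quick numerical check of this interpretation, using toy coefficients in NumPy rather than PanelBox output: raising one covariate by one unit changes $\log(P_j/P_0)$ by exactly $\beta_{j,k}$, whatever the starting point, so $\exp(\beta_{j,k})$ is the multiplicative change in the odds of $j$ vs the base (often reported as a relative-risk ratio for the MNL).

```python
import numpy as np

# Toy coefficients with base 0 (Manual) fixed at zero; columns: educ, exper
B_toy = np.array([[0.00,  0.00],    # Manual (base)
                  [0.25,  0.02],    # Technical vs Manual
                  [0.60, -0.01]])   # Managerial vs Manual

def probs_at(x):
    """MNL probabilities for a single covariate vector x of length K."""
    v = B_toy @ x
    e = np.exp(v - v.max())
    return e / e.sum()

def log_odds_vs_base(x, j):
    """log[P(y=j) / P(y=base)] at covariate vector x."""
    p = probs_at(x)
    return np.log(p[j] / p[0])

x = np.array([12.0, 5.0])              # 12 years of education, 5 years of experience
x_plus = x + np.array([1.0, 0.0])      # one extra year of education

for j, name in [(1, "Technical"), (2, "Managerial")]:
    change = log_odds_vs_base(x_plus, j) - log_odds_vs_base(x, j)
    print(f"{name} vs Manual: change in log-odds = {change:.4f}, odds ratio = {np.exp(change):.4f}")

# The Managerial-vs-Technical log-odds change by beta_2,educ - beta_1,educ
p, p_plus = probs_at(x), probs_at(x_plus)
print(np.log(p_plus[2] / p_plus[1]) - np.log(p[2] / p[1]))
```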

In [None]:
# Detailed coefficient interpretation
print("=== Coefficient Interpretation ===")
print("\nAll coefficients are relative to the base category: Manual (career=0)\n")

for idx, j in enumerate(model.non_base_alts):
    print(f"--- {CAREER_LABELS[j]} vs {CAREER_LABELS[model.base_alternative]} ---")
    for k, var in enumerate(exog_vars):
        coef = results.params_matrix[idx, k]
        se = results.bse_matrix[idx, k]
        z = coef / se if not np.isnan(se) else np.nan
        p = 2 * (1 - norm.cdf(abs(z))) if not np.isnan(z) else np.nan
        sig = '***' if p < 0.01 else '**' if p < 0.05 else '*' if p < 0.1 else ''
        
        # Odds ratio
        odds_ratio = np.exp(coef)
        
        print(f"  {var:8s}: beta = {coef:+.4f} {sig:3s}  (OR = {odds_ratio:.4f})")
        if var == 'educ':
            print(f"            1 extra year of education multiplies the odds of {CAREER_LABELS[j]} vs Manual by {odds_ratio:.2f}")
        elif var == 'female':
            if coef < 0:
                print(f"            Women have {(1-odds_ratio)*100:.1f}% lower odds of {CAREER_LABELS[j]} vs Manual")
            else:
                print(f"            Women have {(odds_ratio-1)*100:.1f}% higher odds of {CAREER_LABELS[j]} vs Manual")
    print()

# Compare education effects across alternatives
educ_tech = results.params_matrix[0, 0]  # Technical vs Manual
educ_mgr = results.params_matrix[1, 0]   # Managerial vs Manual
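print("Education comparison (point estimates, relative to Manual):")
print(f"  Technical vs Manual:  {educ_tech:+.4f}")
print(f"  Managerial vs Manual: {educ_mgr:+.4f}")
# The Managerial-vs-Technical contrast is the difference of the two rows
print(f"  Implied Managerial vs Technical: {educ_mgr - educ_tech:+.4f}")
if educ_mgr > educ_tech:
    print("  -> Education raises the odds of Managerial relative to Technical")
    print("     (point estimates only; Section 7.6 tests equality formally)")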

In [None]:
# Visualize coefficients: grouped bar chart
fig, ax = plt.subplots(figsize=(12, 6))

x = np.arange(len(exog_vars))
width = 0.35

# Technical vs Manual
bars1 = ax.bar(x - width/2, results.params_matrix[0], width,
               label=f'{CAREER_LABELS[model.non_base_alts[0]]} vs {CAREER_LABELS[model.base_alternative]}',
               color=CAREER_COLORS['Technical'], alpha=0.8, edgecolor='black')
# Managerial vs Manual
bars2 = ax.bar(x + width/2, results.params_matrix[1], width,
               label=f'{CAREER_LABELS[model.non_base_alts[1]]} vs {CAREER_LABELS[model.base_alternative]}',
               color=CAREER_COLORS['Managerial'], alpha=0.8, edgecolor='black')

# Add value labels
for bar in list(bars1) + list(bars2):
    h = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2, h + (0.005 if h >= 0 else -0.015),
            f'{h:.3f}', ha='center', va='bottom' if h >= 0 else 'top', fontsize=9)

ax.set_xlabel('Variable')
ax.set_ylabel('Coefficient (log-odds ratio)')
ax.set_title('Multinomial Logit Coefficients by Alternative\n(relative to Manual)', fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels(exog_vars)
ax.legend()
ax.axhline(y=0, color='k', linestyle='--', linewidth=0.8, alpha=0.5)
ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig(FIG_DIR / '06_coefficients_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

print("Figure saved to outputs/figures/06_coefficients_comparison.png")

<a id='section4'></a>
## 4. IIA Test — Hausman-McFadden

### 4.1 The IIA Property

The **Independence of Irrelevant Alternatives** (IIA) property states:

$$\frac{P(y = j)}{P(y = k)} = \exp\left(X' (\beta_j - \beta_k)\right)$$

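This ratio depends only on $\beta_j$ and $\beta_k$ (the common denominator cancels), so it is **independent of the other alternatives** in the choice set: adding or removing a third alternative leaves it unchanged.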
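A toy NumPy illustration (made-up coefficients, not estimates): removing Manual from the choice set leaves the Managerial/Technical odds unchanged, and Manual's probability is redistributed to the remaining alternatives in proportion to their original probabilities.

```python
import numpy as np

x_toy = np.array([14.0, 1.0])             # toy covariates for one person
B_toy = np.array([[0.00,  0.0],           # Manual (base)
                  [0.20, -0.3],           # Technical
                  [0.25, -0.8]])          # Managerial

def choice_probs(B, x):
    """Logit probabilities over the alternatives whose coefficient rows are in B."""
    v = B @ x
    e = np.exp(v - v.max())
    return e / e.sum()

p_all = choice_probs(B_toy, x_toy)        # choice set {Manual, Technical, Managerial}
p_sub = choice_probs(B_toy[1:], x_toy)    # Manual removed: {Technical, Managerial}

ratio_all = p_all[2] / p_all[1]           # Managerial / Technical, full set
ratio_sub = p_sub[1] / p_sub[0]           # same ratio, restricted set
print(p_all.round(4), p_sub.round(4))
print(ratio_all, ratio_sub)               # identical up to rounding
```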

### 4.2 Why It Matters

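If IIA is violated (e.g., Technical and Managerial are close substitutes), then:
- Coefficients estimated without one of the alternatives differ systematically from the full-model estimates
- Substitution patterns are misstated: the MNL redistributes a removed alternative's probability to the others in proportion to their shares (and an added alternative draws proportionally from all others), even when some alternatives are much closer substitutes than others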

### 4.3 Hausman-McFadden Test

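**Procedure**:
1. Estimate the full model with all alternatives.
2. Drop one alternative, keep only the observations that chose one of the remaining alternatives, and re-estimate.
3. Compare the coefficients common to both models:

$$H = (\hat{\beta}_R - \hat{\beta}_F)' [\hat{V}_R - \hat{V}_F]^{-1} (\hat{\beta}_R - \hat{\beta}_F) \sim \chi^2(m)$$

where $R$ and $F$ denote the restricted and full models and $m$ is the number of compared coefficients (here $m = K$, because each restricted model has a single non-base equation). Under $H_0$ both estimators are consistent and the full-model estimator is efficient, so $H$ is asymptotically $\chi^2(m)$. In finite samples $\hat{V}_R - \hat{V}_F$ need not be positive definite, and $H$ can even be negative.

- $H_0$: IIA holds (coefficients are stable)
- $H_1$: IIA is violated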
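The statistic itself takes only a few lines. The sketch below is a generic NumPy/SciPy implementation; `hausman_iia` is our own hypothetical helper, not a PanelBox function. It uses a pseudo-inverse and sets the degrees of freedom to the rank of $\hat V_R - \hat V_F$, a common practical choice when that difference is singular, and it reports a negative statistic as such instead of taking its absolute value. The values in the usage lines are made up.

```python
import numpy as np
from scipy.stats import chi2

def hausman_iia(b_restricted, b_full, V_restricted, V_full):
    """Hausman-McFadden statistic comparing restricted- and full-choice-set estimates.

    b_* : (m,) coefficients common to both models
    V_* : (m, m) their estimated covariance matrices
    Returns (H, df, p_value); p_value is NaN when H < 0.
    """
    d = np.asarray(b_restricted, float) - np.asarray(b_full, float)
    V_diff = np.asarray(V_restricted, float) - np.asarray(V_full, float)
    df = int(np.linalg.matrix_rank(V_diff))
    H = float(d @ np.linalg.pinv(V_diff) @ d)
    if H < 0:
        # V_diff is not positive semidefinite in this sample; following
        # Hausman and McFadden (1984), a negative H is read as no evidence against IIA
        return H, df, np.nan
    return H, df, float(chi2.sf(H, df))

# Toy usage with m = 2 compared coefficients
b_F, V_F = np.array([0.30, -0.10]), np.diag([0.010, 0.004])
b_R, V_R = np.array([0.36, -0.05]), np.diag([0.014, 0.006])
H, df, p = hausman_iia(b_R, b_F, V_R, V_F)
print(f"H = {H:.3f}, df = {df}, p = {p:.3f}")
```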

In [None]:
# Hausman-McFadden IIA Test
print("=" * 70)
print(" " * 15 + "HAUSMAN-MCFADDEN IIA TEST")
print("=" * 70)
print("\nH0: IIA holds (coefficients stable when alternative removed)")
print("H1: IIA is violated\n")

# Full model
results_full = results  # Already estimated above

iia_results = []

for alt_to_remove in range(3):
    # Skip if removing the base alternative (requires re-specifying model)
    if alt_to_remove == model.base_alternative:
        continue
    
    # Restrict data: remove observations where career == alt_to_remove
    mask = data['career'] != alt_to_remove
    data_sub = data[mask].copy()
    
    # Remap career values to 0, 1 for the restricted model
    remaining_alts = sorted(set(range(3)) - {alt_to_remove})
    remap = {old: new for new, old in enumerate(remaining_alts)}
    y_sub = data_sub['career'].map(remap).values
    X_sub = data_sub[exog_vars].values
    
    # The base alternative index in the restricted model
    base_in_sub = remap[model.base_alternative]
    
    # Estimate restricted model
    model_sub = MultinomialLogit(
        endog=y_sub,
        exog=X_sub,
        n_alternatives=2,
        base_alternative=base_in_sub
    )
    results_sub = model_sub.fit()
    
    # Get comparable parameters
    # The restricted model estimates beta for the remaining non-base alt
    # This should correspond to one row of the full model's params_matrix
    remaining_nonbase = [a for a in remaining_alts if a != model.base_alternative][0]
    full_idx = list(model.non_base_alts).index(remaining_nonbase)
    
    beta_full = results_full.params_matrix[full_idx]
    beta_sub = results_sub.params_matrix[0]
    
    # Covariance matrices for these parameters
    K = model.K
    start = full_idx * K
    end = start + K
    vcov_full = results_full.cov_params[start:end, start:end]
    vcov_sub = results_sub.cov_params[:K, :K]
    
    # Hausman statistic
    diff = beta_sub - beta_full
    vcov_diff = vcov_sub - vcov_full
    
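    try:
        H = float(diff @ np.linalg.inv(vcov_diff) @ diff)
        df = K
        if H < 0:
            # vcov_diff is not positive definite in this sample; following
            # Hausman and McFadden (1984), a negative H is read as no evidence
            # against IIA (taking abs(H) could produce a spurious rejection)
            p_value = np.nan
            conclusion = 'Negative H: no evidence against IIA'
        else:
            p_value = chi2.sf(H, df)  # upper-tail probability, equal to 1 - cdf
            conclusion = 'Fail to reject IIA' if p_value > 0.05 else 'Reject IIA'
    except np.linalg.LinAlgError:
        H = np.nan
        df = K
        p_value = np.nan
        conclusion = 'Singular matrix'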
    
    iia_results.append({
        'Removed': f"{alt_to_remove} ({CAREER_LABELS[alt_to_remove]})",
        'N_obs': len(data_sub),
        'H_statistic': H,
        'df': df,
        'p_value': p_value,
        'Conclusion': conclusion
    })
    
    print(f"Omitting {CAREER_LABELS[alt_to_remove]}:")
    print(f"  N = {len(data_sub)}, H = {H:.4f}, df = {df}, p = {p_value:.4f}")
    print(f"  -> {conclusion}")
    print()

print("Interpretation:")
print("  p > 0.05: No evidence against IIA (model assumption appears valid)")
print("  p < 0.05: Evidence against IIA (consider Nested Logit or Mixed Logit)")

<a id='section5'></a>
## 5. Predicted Probabilities and Classification

### 5.1 Probability Matrix

The MNL produces a probability matrix of size $N \times J$, where each row sums to 1.
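Given any $N \times J$ probability matrix, the predicted class, the predicted shares, and a confusion matrix follow directly. This NumPy sketch uses a small hand-made probability matrix; whether PanelBox's `predict()` and `confusion_matrix` use exactly this argmax rule is an assumption.

```python
import numpy as np

# Toy 5 x 3 probability matrix (rows = observations, columns = alternatives 0..2)
P_toy = np.array([[0.70, 0.20, 0.10],
                  [0.20, 0.50, 0.30],
                  [0.10, 0.30, 0.60],
                  [0.40, 0.35, 0.25],
                  [0.30, 0.30, 0.40]])
y_obs = np.array([0, 1, 2, 1, 0])
J = P_toy.shape[1]

y_hat = P_toy.argmax(axis=1)                 # most likely alternative per row
shares_hat = P_toy.mean(axis=0)              # average predicted probability per alternative
acc = np.mean(y_hat == y_obs)

# Confusion matrix: rows = actual, columns = predicted
cm_toy = np.zeros((J, J), dtype=int)
np.add.at(cm_toy, (y_obs, y_hat), 1)
print(y_hat, shares_hat, acc)
print(cm_toy)
```

Predicted shares are averages of probabilities, not counts of argmax predictions, so argmax classification can under-predict less common alternatives even when average probabilities track observed shares well.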

In [None]:
# Predicted probabilities
probs = results.predict_proba()

print(f"Predicted probabilities shape: {probs.shape}")
print(f"  {probs.shape[0]} observations x {probs.shape[1]} alternatives\n")

# Create DataFrame for readability
prob_df = pd.DataFrame(probs, columns=[CAREER_LABELS[j] for j in range(3)])
print("First 10 predicted probability vectors:")
print(prob_df.head(10).round(4))

# Verify rows sum to 1
row_sums = probs.sum(axis=1)
print(f"\nRow sums:")
print(f"  Min: {row_sums.min():.10f}")
print(f"  Max: {row_sums.max():.10f}")
print(f"  All equal to 1: {np.allclose(row_sums, 1.0)}")

In [None]:
# Observed vs predicted choice shares
observed_shares = data['career'].value_counts(normalize=True).sort_index()
predicted_shares = prob_df.mean()

comparison = pd.DataFrame({
    'Observed': [observed_shares[i] for i in range(3)],
    'Predicted': predicted_shares.values,
}, index=[CAREER_LABELS[i] for i in range(3)])
comparison['Difference'] = comparison['Predicted'] - comparison['Observed']

print("=== Observed vs Predicted Choice Shares ===")
print(comparison.round(4))
print(f"\nMean absolute difference: {comparison['Difference'].abs().mean():.4f}")

In [None]:
# Classification: predict most likely career
y_pred = results.predict()
y_true = data['career'].values

accuracy = np.mean(y_pred == y_true)
print(f"=== Classification Performance ===")
print(f"Overall accuracy: {accuracy:.4f} ({accuracy*100:.1f}%)")
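print(f"Uniform random baseline (1/J): {1/3:.4f} ({100/3:.1f}%)")
# A tougher benchmark: always predict the most common career
majority_share = np.bincount(y_true.astype(int)).max() / len(y_true)
print(f"Majority-class baseline: {majority_share:.4f} ({majority_share*100:.1f}%)")
print(f"Improvement over majority class: {accuracy - majority_share:+.4f}")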

# Confusion matrix
cm = results.confusion_matrix
print(f"\nConfusion Matrix (rows=actual, columns=predicted):")
cm_df = pd.DataFrame(cm,
                     index=[f'Actual: {CAREER_LABELS[i]}' for i in range(3)],
                     columns=[f'Pred: {CAREER_LABELS[i]}' for i in range(3)])
print(cm_df)

In [None]:
# Visualize confusion matrix
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Heatmap of confusion matrix
labels_list = [CAREER_LABELS[i] for i in range(3)]
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=axes[0],
            xticklabels=labels_list, yticklabels=labels_list,
            linewidths=1, square=True)
axes[0].set_title('Confusion Matrix (Counts)', fontweight='bold')
axes[0].set_xlabel('Predicted Career')
axes[0].set_ylabel('Actual Career')

# Normalized confusion matrix
cm_norm = cm.astype(float) / cm.sum(axis=1, keepdims=True)
sns.heatmap(cm_norm, annot=True, fmt='.2f', cmap='Blues', ax=axes[1],
            xticklabels=labels_list, yticklabels=labels_list,
            linewidths=1, square=True, vmin=0, vmax=1)
axes[1].set_title('Confusion Matrix (Row-Normalized)', fontweight='bold')
axes[1].set_xlabel('Predicted Career')
axes[1].set_ylabel('Actual Career')

plt.suptitle(f'Classification Performance (Accuracy: {accuracy:.1%})',
             fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.savefig(FIG_DIR / '06_confusion_matrix.png', dpi=150, bbox_inches='tight')
plt.show()

print("Figure saved to outputs/figures/06_confusion_matrix.png")

<a id='section6'></a>
## 6. Marginal Effects in Multinomial Logit

### 6.1 Why Coefficients Are Not Marginal Effects

In MNL, $\beta_{j,k}$ is **not** the marginal effect of $x_k$ on $P(y=j)$. The nonlinear probability structure means:

$$\frac{\partial P_j}{\partial x_k} = P_j \left[ \beta_{j,k} - \sum_{m=0}^{J-1} P_m \cdot \beta_{m,k} \right]$$

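This means changing $x_k$ affects **all J probabilities** simultaneously, and the sign of $\partial P_j / \partial x_k$ need not match the sign of $\beta_{j,k}$: it depends on how $\beta_{j,k}$ compares with the probability-weighted average $\sum_m P_m \beta_{m,k}$ (where $\beta_{\text{base},k} = 0$).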
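The derivative formula can be implemented and checked against a finite difference. This sketch uses plain NumPy and toy numbers; `mnl_marginal_effects` is a hypothetical helper that returns the $J \times K$ matrix of $\partial P_j / \partial x_k$ for one covariate vector, with the base row of coefficients set to zero.

```python
import numpy as np

def mnl_probs_1(x, B_full):
    """MNL probabilities for one covariate vector; B_full is (J, K) incl. the zero base row."""
    v = B_full @ x
    e = np.exp(v - v.max())
    return e / e.sum()

def mnl_marginal_effects(x, B_full):
    """dP_j/dx_k = P_j * (beta_{j,k} - sum_m P_m beta_{m,k}), returned as a (J, K) array."""
    p = mnl_probs_1(x, B_full)
    beta_bar = p @ B_full                     # (K,) probability-weighted average coefficient
    return p[:, None] * (B_full - beta_bar)   # (J, K)

B_toy = np.array([[0.0,  0.0,  0.0],          # base
                  [0.3, -0.2,  0.1],
                  [0.5,  0.4, -0.3]])
x_toy = np.array([1.0, 0.5, -1.0])

ME = mnl_marginal_effects(x_toy, B_toy)

# Central finite-difference check, one covariate at a time
h = 1e-6
ME_fd = np.column_stack([
    (mnl_probs_1(x_toy + h * e_k, B_toy) - mnl_probs_1(x_toy - h * e_k, B_toy)) / (2 * h)
    for e_k in np.eye(len(x_toy))
])
print(ME.round(4))
print(np.abs(ME - ME_fd).max())
print(ME.sum(axis=0))          # each column sums to (numerically) zero
# Alternative 1 has a positive coefficient on x_0, yet ME[1, 0] is negative here
```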

### 6.2 Key Property: Sum to Zero

Since probabilities must sum to 1:

$$\sum_{j=0}^{J-1} \frac{\partial P_j}{\partial x_k} = 0$$

An increase in probability for one alternative must come at the expense of others.

### 6.3 Average Marginal Effects (AME)

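We compute the AME by evaluating each observation's marginal effects at its own covariate values and averaging them over the sample, rather than evaluating a single effect at the sample means.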
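Averaging over observations (AME) is not the same as evaluating at the average covariate vector (the marginal effect at the means, MEM), because the probabilities are nonlinear in $X$. The sketch below uses simulated covariates and toy coefficients of our own, not the career data, and restates the derivative formula so that it runs on its own.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n_cov = 2000, 2
X_sim = rng.normal(size=(N, n_cov)) * np.array([2.0, 1.0])  # simulated covariates
B_sim = np.array([[0.0,  0.0],                               # base alternative
                  [0.8, -0.5],
                  [1.5,  0.6]])

def probs_matrix(X, B_full):
    """(N, J) MNL probabilities; B_full includes the zero base row."""
    V = X @ B_full.T
    E = np.exp(V - V.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def me_matrix(X, B_full):
    """Per-observation marginal effects dP_ij/dx_ik, shape (N, J, K)."""
    P = probs_matrix(X, B_full)               # (N, J)
    beta_bar = P @ B_full                     # (N, K)
    return P[:, :, None] * (B_full[None, :, :] - beta_bar[:, None, :])

AME = me_matrix(X_sim, B_sim).mean(axis=0)                         # average of marginal effects
MEM = me_matrix(X_sim.mean(axis=0, keepdims=True), B_sim)[0]       # effect at the mean of X
print(AME.round(4))
print(MEM.round(4))
```

Both versions satisfy the sum-to-zero property, but with widely spread covariates the AME can be much smaller in magnitude than the MEM, since many observations sit where probabilities are close to 0 or 1.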

In [None]:
# Compute Average Marginal Effects (AME)
ame = results.marginal_effects(at='overall')

print("=== Average Marginal Effects (AME) ===")
print("\nEffect of a one-unit increase in each variable on P(career=j)\n")

ame_df = pd.DataFrame(ame,
                      index=[CAREER_LABELS[j] for j in range(3)],
                      columns=exog_vars)
print(ame_df.round(4))

# Verify sum-to-zero property
print("\n=== Sum-to-Zero Verification ===")
print("Sum of AME across alternatives for each variable:")
for k, var in enumerate(exog_vars):
    col_sum = ame[:, k].sum()
    print(f"  {var:8s}: {col_sum:+.6f}  {'PASS' if abs(col_sum) < 1e-4 else 'CHECK'}")

In [None]:
# Compute AME with standard errors
ame_se = results.marginal_effects_se(at='overall')

print("=== AME with Standard Errors ===")
print(f"{'Variable':<10} {'Career':<12} {'AME':>10} {'SE':>10} {'z':>8} {'p':>8}")
print("-" * 60)

for k, var in enumerate(exog_vars):
    for j in range(3):
        me = ame[j, k]
        se = ame_se[j, k]
        z = me / se if se > 0 else np.nan
        p = 2 * (1 - norm.cdf(abs(z))) if not np.isnan(z) else np.nan
        sig = '***' if p < 0.01 else '**' if p < 0.05 else '*' if p < 0.1 else ''
        print(f"{var if j == 0 else '':8s}   {CAREER_LABELS[j]:<12} {me:+10.4f} {se:10.4f} {z:8.2f} {p:8.4f} {sig}")
    print()

In [None]:
# Save AME table
ame_table = pd.DataFrame(ame,
                         index=[CAREER_LABELS[j] for j in range(3)],
                         columns=exog_vars)
ame_table.to_csv(TABLE_DIR / '06_marginal_effects.csv')
print("AME table saved to outputs/tables/06_marginal_effects.csv")

In [None]:
# Visualize AME: heatmap (alternatives x variables)
fig, ax = plt.subplots(figsize=(10, 5))

sns.heatmap(ame_df, annot=True, fmt='.4f', cmap='RdBu_r', center=0,
            ax=ax, linewidths=1, square=False,
            cbar_kws={'label': 'Average Marginal Effect'})
ax.set_title('Average Marginal Effects\n(effect on P(career=j) per unit change in variable)',
             fontweight='bold')
ax.set_ylabel('Career Alternative')
ax.set_xlabel('Variable')

plt.tight_layout()
plt.savefig(FIG_DIR / '06_ame_heatmap.png', dpi=150, bbox_inches='tight')
plt.show()

print("Figure saved to outputs/figures/06_ame_heatmap.png")

In [None]:
# Plot marginal effects using the built-in method
fig = results.plot_marginal_effects(at='overall', figsize=(14, 8))
plt.savefig(FIG_DIR / '06_marginal_effects_plot.png', dpi=150, bbox_inches='tight')
plt.show()

print("Figure saved to outputs/figures/06_marginal_effects_plot.png")

<a id='section7'></a>
## 7. Application — Career Choice

### Research Question

**What factors determine career path selection (manual, technical, managerial)?**

We investigate the roles of education, experience, age, and gender in career choice.

### 7.1 Exploratory Analysis

In [None]:
# Cross-tabulation: Career by education level
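# right=False gives [0, 9), [9, 12), [12, 16), [16, inf); labels assume whole years
data['educ_level'] = pd.cut(data['educ'], bins=[0, 9, 12, 16, np.inf], right=False,
                            labels=['< 9 yrs', '9-11 yrs', '12-15 yrs', '16+ yrs'])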

print("=== Career Distribution by Education Level ===")
cross_educ = pd.crosstab(data['educ_level'], data['career'].map(CAREER_LABELS),
                         normalize='index').round(3)
print(cross_educ)

print("\n=== Career Distribution by Gender ===")
cross_gender = pd.crosstab(
    data['female'].map({0: 'Male', 1: 'Female'}),
    data['career'].map(CAREER_LABELS),
    normalize='index'
).round(3)
print(cross_gender)

print("\n=== Career Distribution by Location ===")
cross_urban = pd.crosstab(
    data['urban'].map({0: 'Rural', 1: 'Urban'}),
    data['career'].map(CAREER_LABELS),
    normalize='index'
).round(3)
print(cross_urban)

In [None]:
# Visualize career by education level
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# 1. By education level
cross_educ.plot(kind='bar', stacked=True, ax=axes[0],
                color=[CAREER_COLORS[c] for c in cross_educ.columns],
                alpha=0.8, edgecolor='black')
axes[0].set_title('Career by Education Level', fontweight='bold')
axes[0].set_ylabel('Proportion')
axes[0].set_xlabel('Education Level')
axes[0].tick_params(axis='x', rotation=45)
axes[0].legend(title='Career', loc='upper left')

# 2. By gender
cross_gender.plot(kind='bar', stacked=True, ax=axes[1],
                  color=[CAREER_COLORS[c] for c in cross_gender.columns],
                  alpha=0.8, edgecolor='black')
axes[1].set_title('Career by Gender', fontweight='bold')
axes[1].set_ylabel('Proportion')
axes[1].set_xlabel('')
axes[1].tick_params(axis='x', rotation=0)
axes[1].legend(title='Career')

# 3. By location
cross_urban.plot(kind='bar', stacked=True, ax=axes[2],
                 color=[CAREER_COLORS[c] for c in cross_urban.columns],
                 alpha=0.8, edgecolor='black')
axes[2].set_title('Career by Location', fontweight='bold')
axes[2].set_ylabel('Proportion')
axes[2].set_xlabel('')
axes[2].tick_params(axis='x', rotation=0)
axes[2].legend(title='Career')

plt.suptitle('Career Choice Patterns by Subgroup', fontsize=15, fontweight='bold', y=1.02)
plt.tight_layout()
plt.savefig(FIG_DIR / '06_career_by_subgroup.png', dpi=150, bbox_inches='tight')
plt.show()

print("Figure saved to outputs/figures/06_career_by_subgroup.png")

### 7.2 Full Model with All Covariates

In [None]:
# Full model with all covariates
full_vars = ['educ', 'exper', 'age', 'female', 'urban']
X_full = data[full_vars].values

model_full = MultinomialLogit(
    endog=y,
    exog=X_full,
    n_alternatives=3,
    base_alternative=0
)
model_full.exog_names = full_vars
results_full = model_full.fit()

print("=" * 70)
print(" " * 10 + "FULL MODEL: CAREER CHOICE DETERMINANTS")
print("=" * 70)
print(results_full.summary())

### 7.3 Marginal Effects: Gender Gaps in Career Access

In [None]:
# AME from full model
ame_full = results_full.marginal_effects(at='overall')

print("=== Average Marginal Effects (Full Model) ===")
ame_full_df = pd.DataFrame(
    ame_full,
    index=[CAREER_LABELS[j] for j in range(3)],
    columns=full_vars
)
print(ame_full_df.round(4))

# Focus on education and gender
print("\n=== Key Findings ===")

print("\nEducation effects (AME):")
for j in range(3):
    me = ame_full[j, 0]  # educ is first variable
    print(f"  1 extra year of education: P({CAREER_LABELS[j]}) changes by {me:+.4f} ({me*100:+.2f} pp)")

print("\nGender effects (AME of female):")
for j in range(3):
    me = ame_full[j, 3]  # female is 4th variable
    print(f"  Being female: P({CAREER_LABELS[j]}) changes by {me:+.4f} ({me*100:+.2f} pp)")

print("\nUrban effects (AME):")
for j in range(3):
    me = ame_full[j, 4]  # urban is 5th variable
    print(f"  Urban location: P({CAREER_LABELS[j]}) changes by {me:+.4f} ({me*100:+.2f} pp)")

### 7.4 Predictions for Representative Profiles

In [None]:
# Predict probabilities for representative individuals
profiles = pd.DataFrame({
    'Profile': [
        'Young woman, college degree',
        'Older man, no college',
        'Young urban male, graduate',
        'Older rural female, basic ed'
    ],
    'educ':   [16,  9,  18,  8],
    'exper':  [ 3, 25,   2, 20],
    'age':    [25, 50,  26, 48],
    'female': [ 1,  0,   0,  1],
    'urban':  [ 1,  0,   1,  0]
})

X_profiles = profiles[full_vars].values
prob_profiles = results_full.predict_proba(exog=X_profiles)

print("=== Predicted Probabilities for Representative Profiles ===")
for i, row in profiles.iterrows():
    print(f"\n{row['Profile']}:")
    print(f"  (educ={row['educ']}, exper={row['exper']}, age={row['age']}, "
          f"female={row['female']}, urban={row['urban']})")
    for j in range(3):
        bar = '|' + '#' * int(prob_profiles[i, j] * 40) + ' ' * (40 - int(prob_profiles[i, j] * 40)) + '|'
        print(f"  P({CAREER_LABELS[j]:10s}) = {prob_profiles[i, j]:.4f}  {bar}")

In [None]:
# Visualize representative profiles
fig, ax = plt.subplots(figsize=(12, 6))

x = np.arange(len(profiles))
bottom = np.zeros(len(profiles))

for j in range(3):
    label = CAREER_LABELS[j]
    values = prob_profiles[:, j]
    ax.bar(x, values, bottom=bottom, label=label,
           color=CAREER_COLORS[label], alpha=0.8, edgecolor='black')
    
    # Add percentage labels
    for i, (v, b) in enumerate(zip(values, bottom)):
        if v > 0.05:  # Only label if visible
            ax.text(i, b + v/2, f'{v:.0%}', ha='center', va='center',
                    fontsize=9, fontweight='bold')
    bottom += values

ax.set_xlabel('Profile')
ax.set_ylabel('Probability')
ax.set_title('Predicted Career Probabilities for Representative Profiles',
             fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels(profiles['Profile'], rotation=15, ha='right')
ax.legend(title='Career')
ax.set_ylim(0, 1.05)
ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig(FIG_DIR / '06_career_profiles.png', dpi=150, bbox_inches='tight')
plt.show()

print("Figure saved to outputs/figures/06_career_profiles.png")

### 7.5 Observed vs Predicted Choice Shares

In [None]:
# Compare observed vs predicted choice shares
pred_probs_full = results_full.predict_proba()
obs_shares = data['career'].value_counts(normalize=True).sort_index().values
pred_shares = pred_probs_full.mean(axis=0)

fig, ax = plt.subplots(figsize=(10, 6))

x = np.arange(3)
width = 0.35

bars1 = ax.bar(x - width/2, obs_shares * 100, width, label='Observed',
               color='#3498db', alpha=0.8, edgecolor='black')
bars2 = ax.bar(x + width/2, pred_shares * 100, width, label='Predicted',
               color='#e74c3c', alpha=0.8, edgecolor='black')

for bar in list(bars1) + list(bars2):
    h = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2, h + 0.3, f'{h:.1f}%',
            ha='center', va='bottom', fontsize=10)

ax.set_xlabel('Career')
ax.set_ylabel('Share (%)')
ax.set_title('Observed vs Predicted Career Choice Shares', fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels([CAREER_LABELS[i] for i in range(3)])
ax.legend()
ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig(FIG_DIR / '06_career_shares.png', dpi=150, bbox_inches='tight')
plt.show()

print(f"Figure saved to {FIG_DIR / '06_career_shares.png'}")

### 7.6 Wald Test: Cross-Alternative Coefficient Equality

We can test whether the effect of a variable (e.g., education) is the **same** across alternatives. This is a **linear restriction** on the parameters:

$$H_0: \beta_{1,\text{educ}} = \beta_{2,\text{educ}} \quad \text{(same education effect for Technical and Managerial, each relative to Manual)}$$

Since $\beta_{1,\text{educ}} - \beta_{2,\text{educ}}$ is the effect of education on the log-odds of Technical versus Managerial, $H_0$ equivalently states that education does not shift the odds between these two careers.

The Wald statistic is:

$$W = (R\hat{\beta} - q)' [R \hat{V} R']^{-1} (R\hat{\beta} - q) \overset{a}{\sim} \chi^2(m) \quad \text{under } H_0$$

where $\hat{\beta}$ stacks all estimated coefficients, $\hat{V}$ is their estimated covariance matrix, $R$ is the restriction matrix with one row per restriction and one column per element of $\hat{\beta}$, $m$ is the number of restrictions (rows of $R$), and $q = 0$.

In [None]:
# Wald test: Is education equally important for Technical and Managerial?
# H0: β_1,educ = β_2,educ  (same effect on Technical and Managerial vs Manual)

from panelbox.utils.statistical import wald_test

# Parameter vector layout (no intercepts; non-base alternatives stacked in order):
# [β_1,educ, β_1,exper, β_1,age, β_1,female, β_1,urban,
#  β_2,educ, β_2,exper, β_2,age, β_2,female, β_2,urban]
# We want to test β_1,educ - β_2,educ = 0
# R = [1, 0, 0, 0, 0, -1, 0, 0, 0, 0], q = 0

K = len(full_vars)                       # coefficients per non-base alternative
n_params = results_full.params.shape[0]
# Guard the layout assumption above: the indices below are wrong if it does not hold
assert n_params == 2 * K, f"Unexpected parameter count {n_params}; check the params layout"

# Restriction matrix: β_1,educ - β_2,educ = 0
R_educ = np.zeros((1, n_params))
R_educ[0, 0] = 1     # β_1,educ (position 0)
R_educ[0, K] = -1    # β_2,educ (position K)

stat_educ, pval_educ, df_educ = wald_test(R_educ, results_full.params, results_full.cov_params)

print("=" * 70)
print(" " * 10 + "WALD TESTS: CROSS-ALTERNATIVE RESTRICTIONS")
print("=" * 70)

print("\nTest 1: β_educ(Technical) = β_educ(Managerial)")
print(f"  β_educ(Technical vs Manual):   {results_full.params_matrix[0, 0]:+.4f}")
print(f"  β_educ(Managerial vs Manual):  {results_full.params_matrix[1, 0]:+.4f}")
print(f"  Wald statistic: {stat_educ:.4f}")
print(f"  p-value: {pval_educ:.4f}")
print(f"  Conclusion: {'Reject H0: education effects differ' if pval_educ < 0.05 else 'Fail to reject: no evidence of a difference'}")

# Test 2: All coefficients equal across alternatives
# H0: β_1 = β_2 (all K restrictions simultaneously)
R_all = np.zeros((K, n_params))
for k in range(K):
    R_all[k, k] = 1       # β_1,k
    R_all[k, K + k] = -1  # β_2,k

stat_all, pval_all, df_all = wald_test(R_all, results_full.params, results_full.cov_params)

print(f"\nTest 2: β(Technical) = β(Managerial) (joint test, all {K} coefficients)")
print(f"  Wald statistic: {stat_all:.4f}")
print(f"  df: {df_all}")
print(f"  p-value: {pval_all:.4f}")
print(f"  Conclusion: {'Reject H0: alternatives have different determinants' if pval_all < 0.05 else 'Fail to reject: no evidence of different determinants'}")

# Save Wald test results
wald_df = pd.DataFrame({
    'Test': ['educ equality', 'all coefficients equality'],
    'H0': ['β_1,educ = β_2,educ', 'β_1 = β_2 (all vars)'],
    'Wald_stat': [stat_educ, stat_all],
    'df': [df_educ, df_all],
    'p_value': [pval_educ, pval_all]
})
wald_df.to_csv(TABLE_DIR / '06_wald_tests.csv', index=False)
print(f"\nWald test results saved to {TABLE_DIR / '06_wald_tests.csv'}")

### 7.7 Summary of Findings

In [None]:
print("=" * 70)
print(" " * 15 + "SUMMARY OF FINDINGS")
print("=" * 70)

print("\n1. MODEL FIT")
print(f"   Log-likelihood: {results_full.llf:.2f}")
print(f"   AIC: {results_full.aic:.2f}")
print(f"   BIC: {results_full.bic:.2f}")
print(f"   Pseudo R-squared: {results_full.pseudo_r2:.4f}")
print(f"   Prediction accuracy: {results_full.accuracy:.1%}")

print("\n2. EDUCATION")
print(f"   Education is the strongest determinant of career choice.")
print(f"   AME on P(Managerial): {ame_full[2, 0]:+.4f} per extra year")
print(f"   AME on P(Manual): {ame_full[0, 0]:+.4f} per extra year")

print("\n3. GENDER")
print(f"   Women face lower probability of technical and managerial careers.")
print(f"   AME on P(Managerial): {ame_full[2, 3]:+.4f}")
print(f"   AME on P(Manual): {ame_full[0, 3]:+.4f}")

print("\n4. LOCATION")
print(f"   Urban location increases access to non-manual careers.")
print(f"   AME on P(Managerial): {ame_full[2, 4]:+.4f}")
print(f"   AME on P(Manual): {ame_full[0, 4]:+.4f}")

print("\n" + "=" * 70)

<a id='exercises'></a>
## 8. Exercises

---

### Exercise 1: Reference Category Invariance (Easy)

Estimate the model using `base_alternative=1` (Technical as reference) and compare its predicted probabilities with those of the base=0 model (`results_full`).

**Task**:
1. Estimate the model with `base_alternative=1`
2. Compare the first 5 predicted probability vectors
3. Verify that they are numerically identical (up to optimizer tolerance, e.g., with `np.allclose`)

**Questions**:
- Do the coefficients change? How?
- Do the predictions change?
- What does this tell us about the role of the base category?
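
The invariance follows from the softmax form of MNL probabilities: subtracting the same vector from every alternative's coefficients shifts all of an individual's utilities by the same amount, which cancels in the ratio. A minimal NumPy sketch on synthetic data (no PanelBox calls; the helper `mnl_probs` is hypothetical) checks this numerically before you try it with the estimated model:

```python
import numpy as np


def mnl_probs(X, B):
    """Multinomial logit probabilities; B has one row of coefficients per alternative."""
    V = X @ B.T                           # (N, J) linear indices x_i' beta_j
    V = V - V.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    expV = np.exp(V)
    return expV / expV.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))               # 5 synthetic individuals, 3 covariates

# Normalization with alternative 0 as base: its coefficient row is fixed at zero
B_base0 = np.vstack([np.zeros(3), rng.normal(size=3), rng.normal(size=3)])

# Switch the base to alternative 1 by subtracting its row from every row
B_base1 = B_base0 - B_base0[1]

p_base0 = mnl_probs(X, B_base0)
p_base1 = mnl_probs(X, B_base1)

print(np.allclose(p_base0, p_base1))      # True: probabilities agree
print(B_base1[0], -B_base0[1])            # new row for alternative 0 equals minus the old beta_1
```

The coefficients are re-expressed relative to the new base, but the fitted probabilities do not move; with estimated models the agreement holds up to the optimizer's convergence tolerance.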

In [None]:
# Exercise 1: Your solution here

# Step 1: Estimate model with base_alternative=1
# TODO: model_base1 = MultinomialLogit(..., base_alternative=1)
# TODO: results_base1 = model_base1.fit()

# Step 2: Compare predicted probabilities
# TODO: probs_base0 = results_full.predict_proba()
# TODO: probs_base1 = results_base1.predict_proba()
# TODO: Compare first 5 rows

# Step 3: Verify numerical identity
# TODO: np.allclose(probs_base0, probs_base1)

---

### Exercise 2: IIA Test (Medium)

Run the Hausman-McFadden test by omitting each non-base alternative. Are the results consistent?

**Task**:
1. Estimate full model
2. Omit Technical (career=1), re-estimate, compute Hausman statistic
3. Omit Managerial (career=2), re-estimate, compute Hausman statistic
4. Compare conclusions

**Hint**: Use the code from Section 4 as a starting point.
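
For reference, the Hausman-McFadden statistic compares the coefficients common to the full and restricted models, $H = (\hat\beta_r - \hat\beta_f)'(\hat V_r - \hat V_f)^{-1}(\hat\beta_r - \hat\beta_f)$, which is asymptotically $\chi^2$ under IIA. The sketch below is a generic NumPy/SciPy implementation of that definition, not PanelBox's own routine; the function name `hausman_mcfadden` is hypothetical, and the inputs must already be matching coefficient subvectors and covariance blocks (same base category, same order).

```python
import numpy as np
from scipy.stats import chi2


def hausman_mcfadden(b_restricted, V_restricted, b_full, V_full):
    """Hausman-McFadden IIA statistic for the coefficients common to both models.

    b_full and V_full must be the full-model estimates restricted to the same
    coefficients (same order) that the restricted model estimates.
    """
    d = np.asarray(b_restricted, dtype=float) - np.asarray(b_full, dtype=float)
    V_diff = np.asarray(V_restricted, dtype=float) - np.asarray(V_full, dtype=float)
    # V_diff need not be positive definite in finite samples; the pseudo-inverse
    # and its rank still give a statistic and degrees of freedom in that case
    stat = float(d @ np.linalg.pinv(V_diff) @ d)
    df = int(np.linalg.matrix_rank(V_diff))
    # chi2.sf returns 1.0 for a negative statistic, i.e. no evidence against IIA
    pval = float(chi2.sf(stat, df))
    return stat, df, pval
```

For example, after dropping Technical the restricted model estimates only Managerial-vs-Manual coefficients, so the matching full-model inputs are the Managerial block of `results_full.params` and the corresponding block of `results_full.cov_params` (positions K to 2K-1 under the layout used in the Wald test of Section 7.6).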

In [None]:
# Exercise 2: Your solution here

# Step 1: Full model (already estimated as results_full)

# Step 2: Omit Technical
# TODO: Filter data, remap categories, estimate restricted model

# Step 3: Omit Managerial
# TODO: Same procedure

# Step 4: Compare Hausman statistics and p-values
# TODO: Create comparison table

---

### Exercise 3: Marginal Effects Interpretation (Medium)

Calculate AME for education and verify the sum-to-zero property.

**Task**:
1. Compute AME for education using `results_full.marginal_effects(at='overall', variable=0)`
2. Verify that effects sum to zero
3. Explain: If education increases P(Managerial), which alternatives lose probability?
4. Compare the AME (`at='overall'`) with the marginal effect at the mean (`at='mean'`): are they similar?
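
The sum-to-zero property comes from the MNL derivative $\partial P_j / \partial x_k = P_j\,(\beta_{j,k} - \bar\beta_k)$ with $\bar\beta_k = \sum_l P_l \beta_{l,k}$: summing over $j$ gives $\bar\beta_k - \bar\beta_k = 0$. The formula also shows that the sign of a marginal effect can differ from the sign of $\beta_{j,k}$. A minimal sketch on synthetic data (hypothetical helper `mnl_probs`, covariate entering linearly, no PanelBox calls) computes the AME analytically and checks it against a finite difference:

```python
import numpy as np


def mnl_probs(X, B):
    """MNL probabilities for design matrix X (N, K) and coefficients B (J, K)."""
    V = X @ B.T
    expV = np.exp(V - V.max(axis=1, keepdims=True))
    return expV / expV.sum(axis=1, keepdims=True)


rng = np.random.default_rng(1)
N, K = 200, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # constant + 2 covariates
B = np.vstack([np.zeros(K), rng.normal(size=K), rng.normal(size=K)])  # row 0 = base

k = 1                                   # column of the covariate of interest
P = mnl_probs(X, B)                     # (N, J)
beta_bar = P @ B[:, k]                  # (N,) probability-weighted mean of beta_{l,k}
me = P * (B[:, k] - beta_bar[:, None])  # dP_ij/dx_ik = P_ij (beta_jk - beta_bar_i)
ame = me.mean(axis=0)                   # (J,) average marginal effects

# Check the analytic formula against a central finite difference
h = 1e-5
X_up, X_dn = X.copy(), X.copy()
X_up[:, k] += h
X_dn[:, k] -= h
ame_fd = ((mnl_probs(X_up, B) - mnl_probs(X_dn, B)) / (2 * h)).mean(axis=0)

print("AME:", ame)
print("Sum across alternatives:", ame.sum())
print("Matches finite difference:", np.allclose(ame, ame_fd, atol=1e-6))
```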

In [None]:
# Exercise 3: Your solution here

# Step 1: AME for education
# TODO: me_educ = results_full.marginal_effects(at='overall', variable=0)

# Step 2: Verify sum-to-zero
# TODO: print(f"Sum: {me_educ.sum():.6f}")

# Step 3: Interpret which alternatives lose
# TODO: Print and discuss

# Step 4: Compare 'overall' vs 'mean'
# TODO: me_educ_mean = results_full.marginal_effects(at='mean', variable=0)

---

### Exercise 4: Multinomial vs Conditional Logit (Hard)

Using the transportation dataset from Notebook 05, compare the approaches.

**Task**:
1. Load the transportation data
2. Estimate a Conditional Logit (alternative-specific attributes: cost, time)
3. Estimate a Multinomial Logit (individual-specific: income, distance)
4. Discuss: When is each model appropriate? What are the tradeoffs?

**Hint**: For MNL, you need to reshape the data to wide format (one row per individual-year).
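
A minimal pandas sketch of the reshape, assuming a long layout with one row per (individual, year, alternative) and a 0/1 indicator for the chosen alternative. All column names and values below are hypothetical toy data, not the contents of `transportation_choice.csv`; adapt them to the actual file.

```python
import pandas as pd

# Hypothetical long layout: one row per (id, year, alternative); 'choice' flags the chosen mode
long_df = pd.DataFrame({
    'id':       [1, 1, 1, 2, 2, 2],
    'year':     [2020] * 6,
    'mode':     [0, 1, 2, 0, 1, 2],
    'choice':   [0, 1, 0, 1, 0, 0],
    'income':   [50.0, 50.0, 50.0, 32.0, 32.0, 32.0],  # individual-specific
    'distance': [12.0, 12.0, 12.0, 3.5, 3.5, 3.5],     # individual-specific
    'cost':     [2.0, 5.0, 1.5, 1.0, 3.0, 0.5],        # alternative-specific
})

keys = ['id', 'year']
# Individual-specific covariates must be constant within each choice occasion
assert long_df.groupby(keys)[['income', 'distance']].nunique().eq(1).all().all()
# Exactly one chosen alternative per occasion
assert long_df.groupby(keys)['choice'].sum().eq(1).all()

# Keep the chosen row per occasion; its 'mode' becomes the MNL outcome
wide = (long_df.loc[long_df['choice'] == 1, keys + ['mode', 'income', 'distance']]
        .rename(columns={'mode': 'chosen_mode'})
        .reset_index(drop=True))
print(wide)
```

Alternative-specific variables such as `cost` have no single value per choice occasion, which is why they enter a Conditional Logit but not a standard MNL.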

In [None]:
# Exercise 4: Your solution here

# Step 1: Load transportation data
# TODO: transport = pd.read_csv(DATA_DIR / 'transportation_choice.csv')

# Step 2: Conditional Logit (from Notebook 05)
# TODO: Estimate with cost, time as alt_varying_vars

# Step 3: Multinomial Logit
# TODO: Reshape to wide format, extract individual-level variables
# TODO: Estimate MultinomialLogit with income, distance as covariates

# Step 4: Discussion
# TODO: When to use MNL vs CL?

---

### Exercise 5: Subgroup Analysis (Hard)

Estimate the career choice model separately for men and women.

**Task**:
1. Split data by gender
2. Estimate MNL for each subgroup (using educ, exper, age, urban)
3. Compare coefficients: Are determinants of career choice different by gender?
4. Compare AME across genders
5. Create a visualization showing the differences
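
For step 3, estimates from two separately fitted subsamples are independent, so a coefficient difference can be tested with $z = (b_m - b_f)/\sqrt{se_m^2 + se_f^2}$. The sketch below is a generic helper (the name `compare_independent` is hypothetical) that takes coefficients and standard errors from each subgroup fit. Logit coefficients are identified only up to the scale of the unobserved error, so raw coefficient differences can partly reflect different error variances across groups; the AME comparison in step 4 is less sensitive to this.

```python
import numpy as np
from scipy.stats import norm


def compare_independent(b1, se1, b2, se2):
    """Element-wise z-test of H0: b1 == b2 for estimates from two independent samples."""
    b1, se1, b2, se2 = (np.asarray(a, dtype=float) for a in (b1, se1, b2, se2))
    # Independent samples imply zero covariance, so Var(b1 - b2) = se1^2 + se2^2
    z = (b1 - b2) / np.sqrt(se1**2 + se2**2)
    pval = 2 * norm.sf(np.abs(z))  # two-sided p-value
    return z, pval
```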

In [None]:
# Exercise 5: Your solution here

# Step 1: Split by gender
# TODO: data_male = data[data['female'] == 0]
# TODO: data_female = data[data['female'] == 1]

# Step 2: Estimate for each subgroup
# TODO: model_male = MultinomialLogit(...)
# TODO: model_female = MultinomialLogit(...)

# Step 3: Compare coefficients
# TODO: Create comparison table

# Step 4: Compare AME
# TODO: ame_male = results_male.marginal_effects(at='overall')
# TODO: ame_female = results_female.marginal_effects(at='overall')

# Step 5: Visualize differences
# TODO: Create grouped bar chart

---

## Summary and Key Takeaways

### What We Learned

1. **Multinomial Logit** extends binary logit to J > 2 **unordered** categories

2. **Reference category normalization**: Set $\beta_{\text{base}} = 0$ for identification. This is a normalization, not a substantive choice: predicted probabilities are the same whichever category serves as the base

3. **Coefficients** are **log-odds ratios** relative to the base category:
   - $\beta_{j,k} > 0$: higher $x_k$ increases the odds of $j$ vs. the base
   - $\beta_{j,k}$ is not $\partial P_j / \partial x_k$, and the two can even differ in sign (compute AMEs)

4. **IIA assumption**: $P(j)/P(k)$ does not depend on the presence or attributes of any other alternative. Test with Hausman-McFadden

5. **Marginal effects** in MNL are more complex than in binary logit:
   - Changing $x_k$ affects **all J probabilities** simultaneously
   - AME **sum to zero** across alternatives

6. **MNL vs Conditional Logit**: MNL uses individual-specific $X_i$ with alternative-specific $\beta_j$; CL uses alternative-specific $Z_{ij}$ with common $\gamma$
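
Takeaways 3 and 4 can be checked directly from the MNL probability formula. In the sketch below (synthetic coefficients, hypothetical helper `mnl_probs`), the log-odds against the base are exactly linear in $x$, and dropping an alternative leaves the ratio of the remaining probabilities unchanged. Inside the model IIA therefore holds by construction; the Hausman-McFadden test asks whether the data are consistent with it.

```python
import numpy as np


def mnl_probs(x, B):
    """Choice probabilities for one individual with covariates x (K,) and coefficients B (J, K)."""
    v = B @ x
    e = np.exp(v - v.max())
    return e / e.sum()


rng = np.random.default_rng(2)
x = rng.normal(size=3)
B = np.vstack([np.zeros(3), rng.normal(size=3), rng.normal(size=3)])  # row 0 = base

p = mnl_probs(x, B)

# Takeaway 3: log(P_j / P_base) = x' beta_j, so beta_{j,k} shifts the log-odds, not P_j itself
print(np.allclose(np.log(p[1:] / p[0]), B[1:] @ x))  # True

# Takeaway 4 (IIA): removing alternative 2 leaves the ratio P_1 / P_0 unchanged
p_without_2 = mnl_probs(x, B[:2])
print(np.isclose(p[1] / p[0], p_without_2[1] / p_without_2[0]))  # True
```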

### Common Pitfalls

1. Interpreting $\beta_j$ as marginal effects (they are log-odds ratios)
2. Comparing coefficients across models with different base categories
3. Ignoring IIA when alternatives are similar
4. Treating $\beta_{\text{base}} = 0$ as a finding rather than a normalization

### Next Steps

- **Ordered Logit/Probit**: When alternatives have a natural ordering
- **Nested Logit**: Relaxes IIA by grouping similar alternatives
- **Mixed Logit**: Random coefficients for heterogeneous preferences

---

## References

### Essential Reading

1. Cameron, A. C., & Trivedi, P. K. (2005). *Microeconometrics: Methods and Applications*, Ch. 15. Cambridge University Press.

2. Train, K. (2009). *Discrete Choice Methods with Simulation* (2nd ed.). Cambridge University Press.

### Classic Papers

3. McFadden, D. (1973). "Conditional logit analysis of qualitative choice behavior." In P. Zarembka (Ed.), *Frontiers in Econometrics* (pp. 105-142). Academic Press.

4. Hausman, J., & McFadden, D. (1984). "Specification tests for the multinomial logit model." *Econometrica*, 52(5), 1219-1240.

---

**Thank you for completing this tutorial!**

Questions or feedback? Visit: https://github.com/panelbox/panelbox/issues