# Multinomial Logit for Panel Data: Occupational Choice

This tutorial demonstrates multinomial logit models for panel data, using occupational choice as an example.

## Introduction

### When to Use Multinomial Logit

Multinomial logit is used when the dependent variable has **J > 2 unordered categories**. Examples:
- Occupational choice (white collar, blue collar, unemployed)
- Transportation mode choice (car, bus, train, bike)
- Brand choice (Brand A, B, C)
- Product choice across multiple categories

### Model Specification

The probability of choosing alternative $j$ is:

$$P(y_{it} = j | X_{it}) = \frac{\exp(X_{it}'\beta_j)}{\sum_{k=1}^J \exp(X_{it}'\beta_k)}$$

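where the coefficient vector of one category (the baseline) is normalized to zero, $\beta_1 = 0$, because only differences between coefficient vectors are identified. The example below codes the baseline category as 0.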
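
As a minimal sketch of the formula above, the probabilities can be computed in plain NumPy. The function name `mnl_probabilities` and the coefficient values are illustrative assumptions, not part of Panelbox; the baseline is handled by fixing its coefficient vector at zero.

```python
import numpy as np

def mnl_probabilities(X, betas):
    """Multinomial logit probabilities with a zero-coefficient baseline.

    X     : (n, K) covariate matrix
    betas : (J-1, K) coefficients for the non-baseline alternatives
    Returns an (n, J) array; column 0 is the baseline.
    """
    X = np.asarray(X, dtype=float)
    betas = np.asarray(betas, dtype=float)
    # Stack a row of zeros on top so the baseline has coefficient vector 0
    full_betas = np.vstack([np.zeros(betas.shape[1]), betas])
    v = X @ full_betas.T
    # Subtract the row maximum for numerical stability (does not change the ratio)
    v = v - v.max(axis=1, keepdims=True)
    expv = np.exp(v)
    return expv / expv.sum(axis=1, keepdims=True)

# Hypothetical coefficients for J = 3 alternatives and K = 2 covariates
betas_demo = np.array([[0.5, -0.2],
                       [1.0, 0.3]])
X_demo = np.array([[0.0, 0.0],
                   [1.0, 2.0]])

P_demo = mnl_probabilities(X_demo, betas_demo)
print(P_demo)
print(P_demo.sum(axis=1))  # each row sums to 1
```

With all covariates at zero every alternative has the same index, so the first row assigns equal probability to each alternative.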

### Available Methods in Panelbox

1. **Pooled Multinomial Logit**: Standard MLE, ignores panel structure
2. **Fixed Effects**: Controls for individual heterogeneity (computationally intensive)
3. **Random Effects**: Allows for individual random effects integrated via quadrature

### Marginal Effects

For multinomial logit:
$$\frac{\partial P(y=j)}{\partial x_k} = P(y=j)\left[\beta_{jk} - \sum_m P(y=m)\beta_{mk}\right]$$

**Key property**: $\sum_j \frac{\partial P(y=j)}{\partial x_k} = 0$, because the probabilities always sum to 1, so any gain for one alternative is offset by losses for the others.
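
The formula can be checked numerically. The sketch below compares the analytic expression with central finite differences; the helper names (`choice_probs`, `analytic_marginal_effects`) and the coefficient values are assumptions made for illustration only.

```python
import numpy as np

def choice_probs(x, B):
    """Choice probabilities for one covariate vector x; B has shape (J, K)."""
    v = B @ x
    e = np.exp(v - v.max())
    return e / e.sum()

def analytic_marginal_effects(x, B):
    """Marginal effects P_j * (beta_jk - sum_m P_m * beta_mk), shape (J, K)."""
    p = choice_probs(x, B)
    avg_beta = p @ B  # probability-weighted average coefficient, shape (K,)
    return p[:, None] * (B - avg_beta)

# Hypothetical coefficients; row 0 is the baseline (all zeros)
B_demo = np.array([[0.0, 0.0],
                   [0.4, -0.3],
                   [0.9, 0.2]])
x0 = np.array([0.5, -1.0])

me_analytic = analytic_marginal_effects(x0, B_demo)

# Central finite differences as an independent check
eps = 1e-6
me_numeric = np.zeros_like(me_analytic)
for k in range(B_demo.shape[1]):
    step = np.zeros_like(x0)
    step[k] = eps
    me_numeric[:, k] = (choice_probs(x0 + step, B_demo)
                        - choice_probs(x0 - step, B_demo)) / (2 * eps)

print(np.allclose(me_analytic, me_numeric, atol=1e-8))
print(me_analytic.sum(axis=0))  # approximately zero for each covariate
```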

In [None]:
import numpy as np
import pandas as pd
from panelbox.models.discrete import MultinomialLogit
import matplotlib.pyplot as plt
import seaborn as sns

# Set random seed
np.random.seed(42)
sns.set_style('whitegrid')

## Example: Occupational Choice

We model workers' choices among three occupations:
- **0**: Unemployed (baseline category)
- **1**: Blue Collar
- **2**: White Collar

Factors affecting choice:
- **Education**: years of schooling
- **Experience**: years of work experience
- **Age**: worker's age

### Generate Simulated Data
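
Multinomial logit follows from a random-utility model in which every alternative, the baseline included, receives its own independent Type I extreme value (Gumbel) error. The sketch below, using hypothetical utilities and a separate random generator, checks by simulation that picking the highest-utility alternative reproduces the logit probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical systematic utilities for three alternatives (baseline first)
v_demo = np.array([0.0, 0.3, 0.8])
n_draws = 200_000

# One independent Gumbel(0, 1) error per alternative and draw
errors = rng.gumbel(0.0, 1.0, size=(n_draws, v_demo.size))
sim_choices = np.argmax(v_demo + errors, axis=1)

simulated_shares = np.bincount(sim_choices, minlength=v_demo.size) / n_draws
logit_probs = np.exp(v_demo) / np.exp(v_demo).sum()

print("Simulated shares:", simulated_shares.round(3))
print("Logit formula:   ", logit_probs.round(3))
```

The two rows agree up to simulation noise, which is why data generated this way can be recovered by a multinomial logit estimator.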

In [None]:
# Panel structure
n_workers = 200
n_years = 5
n_obs = n_workers * n_years

# Create panel
workers = np.repeat(np.arange(n_workers), n_years)
years = np.tile(np.arange(n_years), n_workers)

# Generate individual characteristics (varying over time)
education = np.random.randint(8, 18, n_obs)  # 8-17 years
experience = np.random.randint(0, 20, n_obs)  # 0-19 years
age = 18 + experience + np.random.randint(0, 5, n_obs)

# Standardize for better convergence
education_std = (education - education.mean()) / education.std()
experience_std = (experience - experience.mean()) / experience.std()
age_std = (age - age.mean()) / age.std()

# True parameters (Alternative 1: Blue Collar, Alternative 2: White Collar)
# vs. Unemployed (baseline)
beta_blue = np.array([0.3, 0.5, -0.2])    # [education, experience, age]
beta_white = np.array([0.8, 0.2, 0.1])    # More sensitive to education

# Generate choices
choices = []
for i in range(n_obs):
    X_i = np.array([education_std[i], experience_std[i], age_std[i]])
    
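    # Utilities for each occupation. Every alternative, including the
    # baseline, gets its own Type I extreme value (Gumbel) error;
    # otherwise the data do not follow a multinomial logit model.
    util_unemployed = np.random.gumbel(0, 1)  # Baseline: systematic utility is 0
    util_blue = X_i @ beta_blue + np.random.gumbel(0, 1)
    util_white = X_i @ beta_white + np.random.gumbel(0, 1)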
    
    # Choose occupation with highest utility
    utils = [util_unemployed, util_blue, util_white]
    choice = np.argmax(utils)
    choices.append(choice)

# Create DataFrame
df = pd.DataFrame({
    'worker': workers,
    'year': years,
    'occupation': choices,
    'education': education,
    'experience': experience,
    'age': age,
    'education_std': education_std,
    'experience_std': experience_std,
    'age_std': age_std
})

# Add occupation labels
occupation_labels = {0: 'Unemployed', 1: 'Blue Collar', 2: 'White Collar'}
df['occupation_label'] = df['occupation'].map(occupation_labels)

print("Data Summary:")
print(f"Total observations: {len(df)}")
print(f"Number of workers: {n_workers}")
print(f"Years per worker: {n_years}")
print("\nOccupation distribution:")
print(df['occupation_label'].value_counts())
print("\nFirst observations:")
print(df.head(10))

### Descriptive Analysis

In [None]:
# Visualize occupation choices
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Education by occupation
for occ in [0, 1, 2]:
    axes[0].hist(df[df['occupation']==occ]['education'], 
                 alpha=0.5, label=occupation_labels[occ], bins=10)
axes[0].set_xlabel('Years of Education')
axes[0].set_ylabel('Frequency')
axes[0].set_title('Education by Occupation')
axes[0].legend()

# Experience by occupation
for occ in [0, 1, 2]:
    axes[1].hist(df[df['occupation']==occ]['experience'], 
                 alpha=0.5, label=occupation_labels[occ], bins=10)
axes[1].set_xlabel('Years of Experience')
axes[1].set_ylabel('Frequency')
axes[1].set_title('Experience by Occupation')
axes[1].legend()

# Age by occupation
df.boxplot(column='age', by='occupation_label', ax=axes[2])
axes[2].set_xlabel('Occupation')
axes[2].set_ylabel('Age')
axes[2].set_title('Age Distribution by Occupation')
plt.suptitle('')  # Remove default title

plt.tight_layout()
plt.show()

## Model 1: Pooled Multinomial Logit

Standard multinomial logit without accounting for panel structure.

In [None]:
# Prepare data
y = df['occupation'].values
X = df[['education_std', 'experience_std', 'age_std']].values

# Estimate pooled multinomial logit
model_pooled = MultinomialLogit(
    endog=y,
    exog=X,
    n_alternatives=3,
    base_alternative=0,  # Unemployed is baseline
    method='pooled'
)

result_pooled = model_pooled.fit(maxiter=1000)
print(result_pooled.summary())

### Interpret Coefficients

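Coefficients represent the effect on the **log odds** of choosing that occupation rather than unemployment:
- Positive coefficient → raises the odds of that occupation relative to the baseline
- Negative coefficient → lowers the odds relative to the baseline

A coefficient's sign describes the odds against the baseline only; it does not by itself give the direction of the change in that occupation's probability.

**Example interpretation**:
- If $\beta_{\text{education, white}}$ is positive and large → more education strongly raises the odds of white collar relative to unemployment. Because the covariates are standardized, one unit corresponds to one standard deviation.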
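
To make the log-odds reading concrete, the sketch below uses the simulation's true parameter values as if they were estimates and shows that raising standardized education by one unit multiplies the white collar versus unemployed odds by $\exp(\beta)$. The helper `choice_probs` is an illustrative assumption, not a Panelbox function.

```python
import numpy as np

# Coefficient vectors [education, experience, age]; the baseline row is zero.
# The non-baseline rows reuse the true parameters from the simulation.
B_true = np.array([[0.0, 0.0, 0.0],    # unemployed (baseline)
                   [0.3, 0.5, -0.2],   # blue collar
                   [0.8, 0.2, 0.1]])   # white collar

def choice_probs(x):
    v = B_true @ x
    e = np.exp(v - v.max())
    return e / e.sum()

x_low = np.array([0.0, 0.0, 0.0])
x_high = np.array([1.0, 0.0, 0.0])  # education one standard deviation higher

p_low, p_high = choice_probs(x_low), choice_probs(x_high)

# Odds of white collar relative to unemployment before and after the change
odds_low = p_low[2] / p_low[0]
odds_high = p_high[2] / p_high[0]
odds_ratio = odds_high / odds_low

print(f"Odds ratio:       {odds_ratio:.4f}")
print(f"exp(coefficient): {np.exp(B_true[2, 0]):.4f}")
```

The two printed numbers coincide, and the ratio does not depend on the starting values of the covariates.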

In [None]:
# Compare estimated vs. true parameters
print("\nParameter Comparison:")
print("="*60)
print("\nBlue Collar (vs. Unemployed):")
print(f"  True:      {beta_blue}")
print(f"  Estimated: {result_pooled.params_matrix[0]}")
print("\nWhite Collar (vs. Unemployed):")
print(f"  True:      {beta_white}")
print(f"  Estimated: {result_pooled.params_matrix[1]}")

## Marginal Effects

Marginal effects show how a small change in $X$ shifts the **probability** of each occupation. The covariates here are standardized, so one unit corresponds to one standard deviation.

In [None]:
# Compute marginal effects evaluated at the covariate means (at='mean')
me = result_pooled.marginal_effects(at='mean')

print("\nMarginal Effects at the Mean:")
print("="*70)
print("\nShape:", me.shape, "(J alternatives × K covariates)")
print("\nMarginal Effects:")
print(pd.DataFrame(
    me,
    index=['Unemployed', 'Blue Collar', 'White Collar'],
    columns=['Education', 'Experience', 'Age']
))

# Verify they sum to zero
print("\nSum across alternatives (should be ≈ 0):")
print(me.sum(axis=0))

In [None]:
# Visualize marginal effects
me_df = pd.DataFrame(
    me,
    index=['Unemployed', 'Blue Collar', 'White Collar'],
    columns=['Education', 'Experience', 'Age']
)

fig, ax = plt.subplots(figsize=(10, 6))
me_df.T.plot(kind='bar', ax=ax, rot=0)
ax.set_ylabel('Marginal Effect')
ax.set_xlabel('Covariate')
ax.set_title('Marginal Effects at the Mean by Occupation')
ax.axhline(y=0, color='black', linestyle='--', linewidth=0.8)
ax.legend(title='Occupation', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

### Interpretation of Marginal Effects

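**Example**: If ME(Education, White Collar) = 0.15:
- A one-standard-deviation increase in education (the covariates are standardized) → roughly a 15 percentage point increase in the probability of white collar
- This must be offset by decreases in other categories (since probabilities sum to 1)

**Key insight**: Marginal effects can differ in sign from coefficients! The sign depends on whether $\beta_{jk}$ lies above or below the probability-weighted average $\sum_m P(y=m)\beta_{mk}$, not on the sign of $\beta_{jk}$ itself.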
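
A small worked case shows the sign reversal. The sketch below takes the simulation's true parameters as given and evaluates the marginal effects of education at standardized covariates of zero (the sample mean); the variable names are illustrative.

```python
import numpy as np

labels = ['Unemployed', 'Blue Collar', 'White Collar']

# Rows: alternatives; columns: [education, experience, age]
B_true = np.array([[0.0, 0.0, 0.0],
                   [0.3, 0.5, -0.2],
                   [0.8, 0.2, 0.1]])

x_mean = np.zeros(3)  # standardized covariates at their means
v = B_true @ x_mean
p = np.exp(v) / np.exp(v).sum()

# P_j * (beta_jk - sum_m P_m * beta_mk)
me_demo = p[:, None] * (B_true - p @ B_true)

for label, coef, effect in zip(labels, B_true[:, 0], me_demo[:, 0]):
    print(f"{label:<13} coefficient = {coef:+.2f}   marginal effect = {effect:+.4f}")
```

Blue collar has a positive education coefficient, yet its marginal effect is negative: education raises the white collar index by more, so probability mass moves from blue collar to white collar.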

## Predictions

Predict occupation probabilities and choices, starting with the in-sample observations.

In [None]:
# Get predicted probabilities for in-sample data
probs = result_pooled.predict_proba()
choices_pred = result_pooled.predict()

print("Predicted probabilities (first 10 observations):")
print(pd.DataFrame(
    probs[:10],
    columns=['P(Unemployed)', 'P(Blue Collar)', 'P(White Collar)']
))

# Prediction accuracy
accuracy = (choices_pred == y).mean()
print(f"\nPrediction accuracy: {accuracy:.2%}")

# Confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y, choices_pred)
print("\nConfusion Matrix:")
print(pd.DataFrame(
    cm,
    index=['Actual: ' + occupation_labels[i] for i in range(3)],
    columns=['Pred: ' + occupation_labels[i] for i in range(3)]
))

## Predict for New Individuals

Example: What are the predicted occupation probabilities for someone with 16 years of education, 5 years of experience, and age 30?

In [None]:
# Create new observation (standardized)
new_education = (16 - education.mean()) / education.std()
new_experience = (5 - experience.mean()) / experience.std()
new_age = (30 - age.mean()) / age.std()

X_new = np.array([[new_education, new_experience, new_age]])

# Predict
prob_new = result_pooled.predict_proba(X_new)
choice_new = result_pooled.predict(X_new)

print("New individual profile:")
print("  Education: 16 years")
print("  Experience: 5 years")
print("  Age: 30 years")
print("\nPredicted probabilities:")
for i, occ in occupation_labels.items():
    print(f"  {occ}: {prob_new[0, i]:.2%}")
print(f"\nMost likely occupation: {occupation_labels[choice_new[0]]}")

## Sensitivity Analysis

How do probabilities change as education varies?

In [None]:
# Vary education from 8 to 18 years
education_range = np.linspace(8, 18, 50)
probs_by_education = []

# Hold the other covariates fixed, standardized with the sample moments.
# Distinct names avoid overwriting the age_std array created earlier.
exp_fixed_std = (10 - experience.mean()) / experience.std()  # Fix experience at 10
age_fixed_std = (35 - age.mean()) / age.std()  # Fix age at 35

for ed in education_range:
    ed_std = (ed - education.mean()) / education.std()

    X_temp = np.array([[ed_std, exp_fixed_std, age_fixed_std]])
    prob_temp = result_pooled.predict_proba(X_temp)
    probs_by_education.append(prob_temp[0])

probs_by_education = np.array(probs_by_education)

# Plot
plt.figure(figsize=(10, 6))
for i, occ in occupation_labels.items():
    plt.plot(education_range, probs_by_education[:, i], 
             label=occ, linewidth=2)

plt.xlabel('Years of Education')
plt.ylabel('Probability')
plt.title('Occupation Probabilities vs. Education\n(Experience=10, Age=35)')
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

## Summary and Best Practices

### Key Takeaways

1. **Multinomial logit** models unordered categorical outcomes (J > 2)
2. **Coefficients** give the change in log-odds of each alternative relative to the baseline category
3. **Marginal effects** show impact on probabilities (more interpretable)
4. **Probabilities must sum to 1** → marginal effects sum to 0
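5. **IIA assumption**: Independence of Irrelevant Alternatives means the odds between any two alternatives do not depend on which other alternatives are available (can be limiting)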
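
The IIA property can be seen directly from the probability formula. In this sketch, with hypothetical utilities, removing one alternative from the choice set leaves the odds between the remaining two unchanged.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

# Hypothetical systematic utilities: unemployed, blue collar, white collar
v_all = np.array([0.0, 0.4, 1.1])
p_all = softmax(v_all)

# Remove white collar from the choice set and recompute the probabilities
p_reduced = softmax(v_all[:2])

ratio_full = p_all[1] / p_all[0]
ratio_reduced = p_reduced[1] / p_reduced[0]

print(f"Blue/unemployed odds, three alternatives: {ratio_full:.4f}")
print(f"Blue/unemployed odds, two alternatives:   {ratio_reduced:.4f}")
```

Both ratios equal the exponential of the utility difference between the two alternatives. When two alternatives are close substitutes this proportional reallocation is unrealistic, which motivates the extensions listed at the end of this tutorial.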

### When to Use Each Method

- **Pooled**: Simple, fast, ignores panel structure
- **Fixed Effects**: Controls for individual heterogeneity (slow for J > 4)
- **Random Effects**: Middle ground, assumes random individual effects

### Model Diagnostics

- **Prediction accuracy**: How well does the model classify observations?
- **Pseudo R²**: Goodness of fit (McFadden's R²)
- **IIA tests**: Test for violations of independence assumption
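
McFadden's pseudo R² compares the fitted log-likelihood with that of an intercept-only model, which predicts each alternative's sample share for everyone. The helper below is a hand-rolled sketch (the name `mcfadden_r2` and the toy data are assumptions), useful when only predicted probabilities are available.

```python
import numpy as np

def mcfadden_r2(y_obs, pred_probs):
    """McFadden's pseudo R² = 1 - LL_model / LL_null.

    y_obs      : (n,) integer choices coded 0..J-1
    pred_probs : (n, J) predicted probabilities from the fitted model
    """
    y_obs = np.asarray(y_obs)
    pred_probs = np.asarray(pred_probs, dtype=float)
    n, J = pred_probs.shape
    # Log-likelihood of the fitted model: log probability of the observed choice
    ll_model = np.log(pred_probs[np.arange(n), y_obs]).sum()
    # Null model: every observation gets the sample shares as probabilities
    shares = np.bincount(y_obs, minlength=J) / n
    ll_null = np.log(shares[y_obs]).sum()
    return 1.0 - ll_model / ll_null

# Toy data (hypothetical): six observations, three alternatives
y_demo = np.array([0, 1, 2, 2, 1, 0])
probs_demo = np.array([[0.6, 0.2, 0.2],
                       [0.2, 0.5, 0.3],
                       [0.1, 0.2, 0.7],
                       [0.2, 0.2, 0.6],
                       [0.3, 0.5, 0.2],
                       [0.5, 0.3, 0.2]])

r2_demo = mcfadden_r2(y_demo, probs_demo)
print(f"McFadden pseudo R²: {r2_demo:.3f}")
```

In this tutorial the same function could be called with `y` and the array returned by `result_pooled.predict_proba()`.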

### Extensions

- **Nested logit**: Relaxes IIA for grouped alternatives
- **Mixed logit**: Allows random coefficients
- **Conditional logit**: For alternative-specific attributes