# Solutions: Tutorial 02 - Model Specification with Formulas

**Series**: PanelBox - Fundamentals (Solutions)
**Level**: Beginner to Intermediate
**Tutorial**: 02_formulas_specification.ipynb

This notebook contains complete solutions to the exercises in Tutorial 02.

---

## Setup and Imports

In [None]:
# Standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display
from patsy import dmatrices

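# PanelBox library
import sys
# Local development path from the original author's machine; adjust or remove
# this line if panelbox is installed in your environment.
sys.path.append('/home/guhaase/projetos/panelbox')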
import panelbox as pb
from panelbox.core.panel_data import PanelData

# Configuration
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (10, 6)
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 4)

print(f"PanelBox version: {pb.__version__}")
print("Setup complete!")

## Load Data

In [None]:
# Load Grunfeld dataset (adjust data_path to your local copy of the repository)
data_path = '/home/guhaase/projetos/panelbox/examples/datasets/grunfeld.csv'
data = pd.read_csv(data_path)

# Quick preview
print(f"Dataset loaded: {data.shape[0]} observations")
display(data.head())

---

## Exercise 1: Log-Log Model

**Task**: Specify a log-log model (Cobb-Douglas style):
$$
\log(\text{Investment}_{it}) = \beta_0 + \beta_1 \log(\text{Value}_{it}) + \beta_2 \log(\text{Capital}_{it}) + \varepsilon_{it}
$$

In [None]:
print("="*70)
print("SOLUTION 1: LOG-LOG MODEL")
print("="*70)

# Step 1: Write the formula
formula = "np.log(invest) ~ np.log(value) + np.log(capital)"
print(f"\nFormula: '{formula}'")
print(f"\nModel specification:")
print(f"  log(invest) = β₀ + β₁·log(value) + β₂·log(capital) + ε")
print(f"\nInterpretation:")
print(f"  β₁ = elasticity of investment with respect to value")
print(f"  β₂ = elasticity of investment with respect to capital")
print(f"  Example: If β₁ = 0.8, a 10% increase in value → approximately 8% increase in investment")
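Reading β₁ as "percent change per percent change" is a first-order approximation. The sketch below uses only NumPy. It first compares the exact change implied by an elasticity of 0.8 with the shortcut. It then fits a log-log regression to synthetic Cobb-Douglas data with known elasticities, to show that the slope coefficients recover them. The data-generating values (0.8, 0.3, the intercept, and the noise level) are arbitrary assumptions, not Grunfeld estimates.

```python
import numpy as np

# Exact percent change in investment implied by an elasticity of 0.8
# when value rises by 10%, holding capital fixed: (1.10)**0.8 - 1.
beta1 = 0.8
exact_pct = (1.10 ** beta1 - 1) * 100
approx_pct = beta1 * 10  # the usual "elasticity x percent change" shortcut
print(f"exact: {exact_pct:.2f}%  approx: {approx_pct:.2f}%")

# Synthetic Cobb-Douglas data with known elasticities (illustrative only).
rng = np.random.default_rng(0)
n = 2_000
value = rng.uniform(50, 5_000, n)
capital = rng.uniform(1, 2_000, n)
invest = 0.5 * value**0.8 * capital**0.3 * np.exp(rng.normal(0, 0.05, n))

# OLS of log(invest) on [1, log(value), log(capital)]
X = np.column_stack([np.ones(n), np.log(value), np.log(capital)])
coef, *_ = np.linalg.lstsq(X, np.log(invest), rcond=None)
print("estimated elasticities:", coef[1:].round(3))
```

The exact figure is slightly below 8% because the shortcut ignores compounding. The gap grows with the size of the change.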

In [None]:
# Step 2: Create design matrix
y, X = dmatrices(formula, data=data, return_type='dataframe')

print(f"\nDesign matrix created:")
print(f"  y (dependent) shape: {y.shape}")
print(f"  y (dependent) variable: {y.columns[0]}")
print(f"\n  X (predictors) shape: {X.shape}")
print(f"  X (predictors) columns: {list(X.columns)}")

In [None]:
# Step 3: Display first 10 rows
print("\nFirst 10 rows of design matrices:")
print("\nDependent variable (y):")
display(y.head(10))

print("\nPredictors (X):")
display(X.head(10))

In [None]:
# Additional analysis: Verify transformations
print("\n" + "-"*70)
print("VERIFICATION")
print("-"*70)

sample_idx = 0
print(f"\nObservation {sample_idx}:")
print(f"  Original invest: {data['invest'].iloc[sample_idx]:.4f}")
print(f"  log(invest): {np.log(data['invest'].iloc[sample_idx]):.4f}")
print(f"  From formula: {y.iloc[sample_idx, 0]:.4f}")
print(f"  Match: {np.isclose(np.log(data['invest'].iloc[sample_idx]), y.iloc[sample_idx, 0])}")

print(f"\n  Original value: {data['value'].iloc[sample_idx]:.4f}")
print(f"  log(value): {np.log(data['value'].iloc[sample_idx]):.4f}")
print(f"  From formula: {X['np.log(value)'].iloc[sample_idx]:.4f}")
print(f"  Match: {np.isclose(np.log(data['value'].iloc[sample_idx]), X['np.log(value)'].iloc[sample_idx])}")
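The verification above spot-checks a single observation. The sketch below checks whole columns at once instead. To keep it self-contained, it uses a small DataFrame of made-up numbers rather than the Grunfeld file and compares every transformed column returned by `dmatrices` with `np.log` applied directly. Keep in mind that `np.log` needs strictly positive inputs: zero gives `-inf` and negative values give `NaN`.

```python
import numpy as np
import pandas as pd
from patsy import dmatrices

# Hypothetical toy data (arbitrary positive numbers, not Grunfeld values).
toy = pd.DataFrame({
    "invest": [10.0, 20.0, 40.0, 80.0],
    "value": [100.0, 250.0, 400.0, 900.0],
    "capital": [5.0, 15.0, 30.0, 60.0],
})

y_t, X_t = dmatrices("np.log(invest) ~ np.log(value) + np.log(capital)",
                     data=toy, return_type="dataframe")

# Compare every row of each transformed column, not just one observation.
checks = {
    "log(invest)": np.allclose(y_t.iloc[:, 0], np.log(toy["invest"])),
    "log(value)": np.allclose(X_t["np.log(value)"], np.log(toy["value"])),
    "log(capital)": np.allclose(X_t["np.log(capital)"], np.log(toy["capital"])),
}
print(checks)
```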

---

## Exercise 2: Polynomial Model

**Task**: Test for diminishing returns by including value, value², and value³:
$$
\text{Investment}_{it} = \beta_0 + \beta_1 \text{Value}_{it} + \beta_2 \text{Value}_{it}^2 + \beta_3 \text{Value}_{it}^3 + \beta_4 \text{Capital}_{it} + \varepsilon_{it}
$$

In [None]:
print("="*70)
print("SOLUTION 2: POLYNOMIAL MODEL")
print("="*70)

# Step 1: Write the formula (using I() to protect exponents)
formula_poly = "invest ~ value + I(value**2) + I(value**3) + capital"
print(f"\nFormula: '{formula_poly}'")
print(f"\nModel specification:")
print(f"  invest = β₀ + β₁·value + β₂·value² + β₃·value³ + β₄·capital + ε")
print(f"\nInterpretation:")
print(f"  Marginal effect of value = β₁ + 2·β₂·value + 3·β₃·value²")
print(f"  The effect of value depends on its level (non-constant marginal effect)")
print(f"  Curvature = 2·β₂ + 6·β₃·value: diminishing returns where negative, accelerating where positive")
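The marginal effect and curvature formulas can be evaluated directly. The sketch below uses hypothetical coefficients, chosen only to make the arithmetic easy to follow, and cross-checks the analytic derivative against a finite difference. With a cubic term present, β₂ < 0 alone does not guarantee diminishing returns everywhere: here curvature is negative at value = 2 but positive at value = 5.

```python
import numpy as np

def level(v, b1, b2, b3):
    """Contribution of value to invest in the cubic specification."""
    return b1 * v + b2 * v**2 + b3 * v**3

def marginal_effect(v, b1, b2, b3):
    """d(invest)/d(value) = b1 + 2*b2*v + 3*b3*v**2."""
    return b1 + 2 * b2 * v + 3 * b3 * v**2

def curvature(v, b2, b3):
    """Second derivative: negative means diminishing returns at v."""
    return 2 * b2 + 6 * b3 * v

# Hypothetical coefficients (illustration only, not estimates).
b1, b2, b3 = 2.0, -0.5, 0.05
for v in [1.0, 2.0, 5.0]:
    print(f"value={v}: ME={marginal_effect(v, b1, b2, b3):.3f}, "
          f"curvature={curvature(v, b2, b3):.3f}")

# Cross-check the analytic derivative with a central finite difference.
h = 1e-5
numeric = (level(2.0 + h, b1, b2, b3) - level(2.0 - h, b1, b2, b3)) / (2 * h)
print(np.isclose(numeric, marginal_effect(2.0, b1, b2, b3)))
```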

In [None]:
# Step 2: Create design matrix
y_poly, X_poly = dmatrices(formula_poly, data=data, return_type='dataframe')

print(f"\nDesign matrix created:")
print(f"  X shape: {X_poly.shape}")
print(f"  X columns: {list(X_poly.columns)}")
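The `I()` wrapper in the polynomial formula matters. In formula syntax, `**` means term expansion, not arithmetic, and `x:x` collapses to `x`. As a result, `x**2` without `I()` silently produces no squared column. A minimal sketch on a hypothetical four-row DataFrame:

```python
import pandas as pd
from patsy import dmatrices

toy = pd.DataFrame({"y": [1.0, 2.0, 3.0, 4.0], "x": [1.0, 2.0, 3.0, 4.0]})

# Without I(): ** is formula "power" (interaction expansion); x:x is just x.
_, X_no_I = dmatrices("y ~ x**2", toy, return_type="dataframe")

# With I(): the expression is evaluated as ordinary Python arithmetic.
_, X_with_I = dmatrices("y ~ I(x**2)", toy, return_type="dataframe")

print(list(X_no_I.columns))        # intercept and x only
print(X_with_I.iloc[:, 1].tolist())  # squared values
```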

In [None]:
# Step 3: How many columns?
print(f"\n" + "-"*70)
print("COLUMN COUNT")
print("-"*70)

print(f"\nTotal columns in X: {X_poly.shape[1]}")
print(f"\nBreakdown:")
print(f"  - Intercept: 1")
print(f"  - value: 1")
print(f"  - value²: 1")
print(f"  - value³: 1")
print(f"  - capital: 1")
print(f"  - Total: 5")

print(f"\nFirst 10 rows of X:")
display(X_poly.head(10))

In [None]:
# Visualization: the raw relationship the polynomial model will fit (no fitted curve drawn)
fig, ax = plt.subplots(figsize=(10, 6))

ax.scatter(data['value'], data['invest'], alpha=0.5, label='Actual data', s=50)
ax.set_xlabel('Value', fontsize=12, fontweight='bold')
ax.set_ylabel('Investment', fontsize=12, fontweight='bold')
ax.set_title('Investment vs Value (raw data)', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("\nNote: a cubic in value lets the slope change with value and lets the curvature switch sign once")

---

## Exercise 3: Interaction Effects

**Task**: Model how the effect of value depends on a post-war indicator:
$$
\text{Investment}_{it} = \beta_0 + \beta_1 \text{Value}_{it} + \beta_2 \text{Post1945}_t + \beta_3 (\text{Value}_{it} \times \text{Post1945}_t) + \beta_4 \text{Capital}_{it} + \varepsilon_{it}
$$

In [None]:
print("="*70)
print("SOLUTION 3: INTERACTION EFFECTS")
print("="*70)

# Step 1: Create post_1945 dummy
data['post_1945'] = (data['year'] > 1945).astype(int)

print(f"\nCreated post_1945 indicator:")
print(f"  = 1 if year > 1945")
print(f"  = 0 if year ≤ 1945")

print(f"\nDistribution:")
print(data['post_1945'].value_counts().sort_index())
print(f"\nYear range: {data['year'].min()} to {data['year'].max()}")
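Because the indicator uses a strict inequality, 1945 itself falls in the pre-period. A quick check on a few hypothetical years makes the boundary explicit. Use `>= 1945` instead if 1945 should count as post-war.

```python
import pandas as pd

years = pd.Series([1944, 1945, 1946, 1947], name="year")
post_1945 = (years > 1945).astype(int)  # same rule as in the cell above
print(pd.DataFrame({"year": years, "post_1945": post_1945}))
```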

In [None]:
# Step 2: Write formula with interaction (using * operator)
formula_int = "invest ~ value * post_1945 + capital"

print(f"\nFormula: '{formula_int}'")
print(f"\nExpands to: 'invest ~ value + post_1945 + value:post_1945 + capital'")
print(f"\nModel specification:")
print(f"  invest = β₀ + β₁·value + β₂·post_1945 + β₃·(value × post_1945) + β₄·capital + ε")

# Create design matrix
y_int, X_int = dmatrices(formula_int, data=data, return_type='dataframe')

print(f"\nDesign matrix:")
print(f"  X shape: {X_int.shape}")
print(f"  X columns: {list(X_int.columns)}")

print(f"\nFirst 10 rows:")
display(X_int.head(10))
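The interaction column built by `*` is simply the element-wise product of its two parents. patsy may also not keep the column order written in the formula (interaction terms typically come after the main effects), so selecting columns by name is safer than by position. A sketch on a hypothetical six-row DataFrame, where the integer `post_1945` column is treated as numeric:

```python
import numpy as np
import pandas as pd
from patsy import dmatrices

# Hypothetical toy data (not Grunfeld values).
toy = pd.DataFrame({
    "invest": [10.0, 12.0, 15.0, 20.0, 26.0, 30.0],
    "value": [100.0, 120.0, 150.0, 110.0, 160.0, 200.0],
    "post_1945": [0, 0, 0, 1, 1, 1],
    "capital": [5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
})
_, X_t = dmatrices("invest ~ value * post_1945 + capital", toy,
                   return_type="dataframe")

# The interaction column equals value * post_1945 row by row.
print(list(X_t.columns))
print(np.allclose(X_t["value:post_1945"], toy["value"] * toy["post_1945"]))
```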

In [None]:
# Step 3: Interpret β₃
print("\n" + "-"*70)
print("INTERPRETING β₃ (INTERACTION COEFFICIENT)")
print("-"*70)

print(f"\nMarginal effect of value on investment:")
print(f"\nUp to and including 1945 (post_1945 = 0):")
print(f"  ∂invest/∂value = β₁")
print(f"  Effect of value is just β₁")

print(f"\nAfter 1945 (post_1945 = 1):")
print(f"  ∂invest/∂value = β₁ + β₃")
print(f"  Effect of value is β₁ plus the interaction β₃")

print(f"\nInterpretation of β₃:")
print(f"  β₃ = Change in the effect of value after 1945")
print(f"  β₃ > 0 → Value became MORE important after 1945")
print(f"  β₃ < 0 → Value became LESS important after 1945")
print(f"  β₃ = 0 → No change in the effect of value")

print(f"\nEconomic interpretation:")
print(f"  Post-WWII structural change in investment behavior")
print(f"  Did firms' market value become a stronger driver of investment after the war?")

In [None]:
# Visualization: Scatter plot with pre/post 1945
fig, ax = plt.subplots(figsize=(10, 6))

# Pre-1945
pre_data = data[data['post_1945'] == 0]
ax.scatter(pre_data['value'], pre_data['invest'], 
           alpha=0.6, label='Pre-1945', marker='o', s=50, color='blue')

# Post-1945
post_data = data[data['post_1945'] == 1]
ax.scatter(post_data['value'], post_data['invest'], 
           alpha=0.6, label='Post-1945', marker='^', s=50, color='red')

ax.set_xlabel('Value', fontsize=12, fontweight='bold')
ax.set_ylabel('Investment', fontsize=12, fontweight='bold')
ax.set_title('Investment vs Value: Pre- vs Post-1945', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("\nCompare how steeply investment rises with value in each group:")
print("a difference in those slopes is what the interaction term β₃ captures.")

---

## Exercise 4: Two-Way FE with Interaction

**Task**: Combine two-way fixed effects with an interaction:
$$
\text{Investment}_{it} = \beta_1 \text{Value}_{it} + \beta_2 \text{Capital}_{it} + \beta_3 (\text{Value}_{it} \times \text{Capital}_{it}) + \alpha_i + \lambda_t + \varepsilon_{it}
$$

In [None]:
print("="*70)
print("SOLUTION 4: TWO-WAY FIXED EFFECTS WITH INTERACTION")
print("="*70)

# Step 1: Write complete formula
formula_twfe = "invest ~ value * capital + C(firm) + C(year)"

print(f"\nFormula: '{formula_twfe}'")
print(f"\nExpands to:")
print(f"  'invest ~ value + capital + value:capital + C(firm) + C(year)'")
print(f"\nModel specification:")
print(f"  invest = β₁·value + β₂·capital + β₃·(value × capital) + α_firm + λ_year + ε")
print(f"\nComponents:")
print(f"  - Main effect of value: β₁")
print(f"  - Main effect of capital: β₂")
print(f"  - Interaction: β₃")
print(f"  - Firm fixed effects: α_firm (controls for time-invariant firm characteristics)")
print(f"  - Year fixed effects: λ_year (controls for common time shocks)")

In [None]:
# Step 2: Create design matrix and count columns
y_twfe, X_twfe = dmatrices(formula_twfe, data=data, return_type='dataframe')

n_firms = data['firm'].nunique()
n_years = data['year'].nunique()

print(f"\n" + "-"*70)
print("COLUMN COUNT")
print("-"*70)

print(f"\nTotal columns in X: {X_twfe.shape[1]}")
print(f"\nBreakdown:")
print(f"  - Intercept: 1")
print(f"  - value: 1")
print(f"  - capital: 1")
print(f"  - value:capital (interaction): 1")
print(f"  - Firm dummies: {n_firms - 1} (out of {n_firms} firms, 1 is reference)")
print(f"  - Year dummies: {n_years - 1} (out of {n_years} years, 1 is reference)")
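expected_cols = 1 + 3 + (n_firms - 1) + (n_years - 1)
print(f"  - Total: 1 + 3 + {n_firms-1} + {n_years-1} = {expected_cols}")
print(f"\nMatches X_twfe.shape[1] ({X_twfe.shape[1]}): {expected_cols == X_twfe.shape[1]}")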

print(f"\nX columns (first 15):")
print(list(X_twfe.columns[:15]))
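The column count 1 + 3 + (N − 1) + (T − 1) can be checked on a small balanced panel. The sketch below builds a hypothetical 3-firm × 4-year panel with random regressors and confirms that patsy's treatment coding drops one reference level from each set of dummies when an intercept is present.

```python
import numpy as np
import pandas as pd
from patsy import dmatrices

# Hypothetical balanced panel: 3 firms x 4 years, random regressors.
firms, years = [1, 2, 3], [1950, 1951, 1952, 1953]
panel = pd.DataFrame([(f, t) for f in firms for t in years],
                     columns=["firm", "year"])
rng = np.random.default_rng(2)
for col in ["invest", "value", "capital"]:
    panel[col] = rng.uniform(1, 100, len(panel))

_, X_p = dmatrices("invest ~ value * capital + C(firm) + C(year)", panel,
                   return_type="dataframe")

# Intercept + (value, capital, value:capital) + (N-1) firm + (T-1) year dummies
expected = 1 + 3 + (len(firms) - 1) + (len(years) - 1)
print(X_p.shape[1], expected)
```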

In [None]:
# Display first rows (selected columns)
print(f"\nFirst 10 rows (selected columns):")
selected_cols = ['Intercept', 'value', 'capital', 'value:capital']
display(X_twfe[selected_cols].head(10))

In [None]:
# Step 3: Why include interaction?
print("\n" + "-"*70)
print("WHY INCLUDE INTERACTION WITH FIXED EFFECTS?")
print("-"*70)

print(f"\nReason 1: Effect heterogeneity")
print(f"  The effect of value might depend on capital stock level")
print(f"  Marginal effect of value = β₁ + β₃·capital")
print(f"  - At a low capital level K_low: effect is β₁ + β₃·K_low")
print(f"  - At a high capital level K_high: effect is β₁ + β₃·K_high")

print(f"\nReason 2: Complementarity or substitutability")
print(f"  β₃ > 0 → Value and capital are COMPLEMENTS")
print(f"    (Value matters more when capital is high)")
print(f"  β₃ < 0 → Value and capital are SUBSTITUTES")
print(f"    (Value matters less when capital is high)")

print(f"\nReason 3: Fixed effects and the interaction play different roles")
print(f"  Firm FE: Absorb time-invariant firm traits")
print(f"  Year FE: Absorb shocks common to all firms in a given year")
print(f"  Interaction: Identified from the variation that remains within firms over time,")
print(f"    net of the common year effects")
print(f"  All three components can be included together in one model")

print(f"\nEconomic example:")
print(f"  Firms with more capital might respond differently to value shocks")
print(f"  FE controls for baseline differences; interaction captures differential response")
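The point that β₃ is identified from variation left after removing firm and year effects can be made concrete with the Frisch-Waugh-Lovell theorem. In a balanced panel, regressing on firm and year dummies gives the same slope coefficients as regressing two-way demeaned variables (z − z̄ᵢ − z̄ₜ + z̄) on each other. The sketch below uses NumPy and synthetic data. The panel size, true coefficients, and effect distributions are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 5, 6                           # hypothetical balanced panel
firm = np.repeat(np.arange(N), T)     # firm-major ordering
year = np.tile(np.arange(T), N)
value = rng.uniform(1, 10, N * T)
capital = rng.uniform(1, 10, N * T)
inter = value * capital
alpha = rng.normal(0, 5, N)[firm]     # firm effects
lam = rng.normal(0, 5, T)[year]       # year effects
invest = (1.0 * value + 0.5 * capital + 0.2 * inter
          + alpha + lam + rng.normal(0, 1, N * T))

# (1) Dummy-variable regression: intercept + regressors + N-1 firm + T-1 year dummies
D_f = (firm[:, None] == np.arange(1, N)).astype(float)
D_t = (year[:, None] == np.arange(1, T)).astype(float)
X_dummy = np.column_stack([np.ones(N * T), value, capital, inter, D_f, D_t])
b_dummy = np.linalg.lstsq(X_dummy, invest, rcond=None)[0][1:4]

# (2) Two-way demeaning, exact for balanced panels: z - z_i. - z_.t + z_..
def demean(z):
    Z = z.reshape(N, T)
    return (Z - Z.mean(axis=1, keepdims=True)
              - Z.mean(axis=0, keepdims=True) + Z.mean()).ravel()

X_within = np.column_stack([demean(value), demean(capital), demean(inter)])
b_within = np.linalg.lstsq(X_within, demean(invest), rcond=None)[0]
print(np.round(b_dummy, 6))
print(np.round(b_within, 6))
```

Both approaches give identical coefficients on value, capital, and value × capital. For unbalanced panels, the simple two-way demeaning formula no longer reproduces the dummy-variable estimates exactly.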

---

## Summary

In these exercises, you practiced:

- ✅ **Exercise 1**: Log-log model for elasticities
- ✅ **Exercise 2**: Polynomial model with quadratic and cubic terms
- ✅ **Exercise 3**: Interaction effects with dummy variables
- ✅ **Exercise 4**: Combining interactions with two-way fixed effects

### Key Skills Acquired

1. **Transformations**: Using `np.log()` and `I()` for non-linear relationships
2. **Interactions**: Understanding `*` operator and interpretation of β₃
3. **Fixed effects**: Combining `C(entity)` and `C(time)` with other terms
4. **Economic interpretation**: Translating formulas into economic meaning

### Formula Quick Reference

```python
# Log-log (elasticities)
"np.log(y) ~ np.log(x1) + np.log(x2)"

# Polynomial
"y ~ x + I(x**2) + I(x**3)"

# Interaction (full)
"y ~ x1 * x2"  # Expands to: x1 + x2 + x1:x2

# Two-way FE with interaction
"y ~ x1 * x2 + C(entity) + C(time)"
```
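To see how patsy expands any of the quick-reference formulas without building a design matrix, you can parse the formula alone. A minimal sketch using `patsy.ModelDesc`, with the placeholder names `y`, `x`, `x1`, `x2`, `entity`, `time` taken from the reference above:

```python
from patsy import ModelDesc

formulas = [
    "np.log(y) ~ np.log(x1) + np.log(x2)",
    "y ~ x + I(x**2) + I(x**3)",
    "y ~ x1 * x2",
    "y ~ x1 * x2 + C(entity) + C(time)",
]
for f in formulas:
    # describe() prints the expanded term list; the intercept is implicit
    print(f"{f:40s} -> {ModelDesc.from_formula(f).describe()}")
```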

---

### Next Steps

Continue to **Tutorial 03: Estimation and Results Interpretation** to actually estimate these models and interpret the coefficients!

---

In [None]:
print("="*70)
print("SOLUTIONS COMPLETED!")
print("="*70)
print("\nYou've successfully completed all exercises in Tutorial 02.")
print("Next: Tutorial 03 - Estimation and Results Interpretation")
print("\nGreat work!")