# 06 - Systematic Comparison of Panel Estimators (Capstone)

**Level**: Advanced (Capstone)  
**Estimated Duration**: 90-120 minutes  
**Prerequisites**: Notebooks 01-05  

---

## Learning Objectives

By the end of this notebook, you will be able to:

1. **Apply** a complete workflow: EDA → Estimation → Testing → Model Selection
2. **Estimate** all static panel estimators systematically (Pooled, FE, RE, FD, BE, IV)
3. **Compare** coefficients, standard errors, and R² across models
4. **Conduct** specification tests (F-test, Hausman, first-stage F)
5. **Make** informed model selection based on data characteristics and tests
6. **Interpret** results in economic context
7. **Present** findings in professional format (tables, plots)

---

## Overview

This **capstone notebook** integrates all concepts from the static panel models series. We'll work through a complete empirical analysis using best practices:

- Start with a research question
- Explore data thoroughly
- Estimate all relevant models
- Test model assumptions
- Select the appropriate specification
- Interpret results economically
- Present findings professionally

**Research Question**: Does R&D investment increase firm productivity?

This question involves:
- Unobserved firm quality (fixed effects)
- Potential endogeneity (simultaneity)
- Multiple estimation approaches

Let's begin!

---

# Setup

Import required packages and configure settings.

In [None]:
# Standard imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# PanelBox
import panelbox as pb
from panelbox.models.static import PooledOLS, FixedEffectsOLS, RandomEffectsGLS
from panelbox.models.static import FirstDifferenceOLS, BetweenEstimator
from panelbox import PanelIV
from panelbox.tests import HausmanTest

# Visualization settings
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 4)
pd.set_option('display.width', 100)

# Random seed for reproducibility
np.random.seed(42)

print("PanelBox version:", pb.__version__)
print("Setup complete!")

---

# Section 1: Research Question and Data

## 1.1 Applied Problem: R&D and Firm Productivity

### Research Question

**Does R&D investment increase firm productivity?**

This is a fundamental question in innovation economics:

- **Policy relevance**: R&D subsidies and tax credits
- **Firm strategy**: How much to invest in R&D?
- **Growth theory**: Innovation drives long-run growth

### Challenges

1. **Unobserved firm quality** (α_i)
   - High-quality firms may invest more in R&D AND be more productive
   - Creates spurious correlation
   
2. **Simultaneity**
   - Productive firms can afford more R&D
   - R&D increases productivity
   - Direction of causality unclear
   
3. **Measurement issues**
   - R&D stock vs. flow
   - Depreciation rates
   - Spillovers

### Empirical Strategy

We'll address these challenges by:
- Using panel data to control for α_i (fixed effects)
- Instrumenting R&D with lagged values (IV)
- Comparing all estimators systematically
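
For reference, the estimating equation implied by this strategy can be written as below. The notation is an assumption chosen to mirror the synthetic data used in this notebook ($K$, $L$, $RD$ for capital, labor, and R&D stock; $\omega_{it}$ for a time-varying productivity shock); it is not notation taken from PanelBox.

```latex
\log \mathrm{TFP}_{it}
  = \beta_0
  + \beta_k \log K_{it}
  + \beta_l \log L_{it}
  + \beta_r \log RD_{it}
  + \alpha_i + \omega_{it} + \varepsilon_{it}
```

Fixed effects remove $\alpha_i$. If $\log RD_{it}$ also responds to $\omega_{it}$, the within estimator of $\beta_r$ remains biased, which is the motivation for the IV step.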

## 1.2 Load and Describe Data

We'll create a synthetic firm productivity panel based on realistic patterns from manufacturing data.

In [None]:
# Create synthetic firm productivity data
# Based on realistic manufacturing patterns

np.random.seed(42)
N = 150  # Number of firms
T = 10   # Number of years

# Firm-specific characteristics
firm_quality = np.random.normal(0, 1, N)  # Unobserved quality (α_i)
firm_age_base = np.random.randint(5, 30, N)  # Base age in years

# Generate panel
data_list = []

for i in range(N):
    # Firm-specific initial conditions
    log_capital_base = 8 + 0.5 * firm_quality[i] + np.random.normal(0, 0.5)
    log_labor_base = 4 + 0.3 * firm_quality[i] + np.random.normal(0, 0.3)
    log_rd_base = 5 + 0.6 * firm_quality[i] + np.random.normal(0, 0.4)
    
    for t in range(T):
        # Time-varying shock (creates endogeneity)
        omega_it = np.random.normal(0, 0.3)
        
        # Inputs (with growth and shocks)
        log_capital = log_capital_base + 0.02 * t + np.random.normal(0, 0.1)
        log_labor = log_labor_base + 0.01 * t + np.random.normal(0, 0.1)
        
        # R&D (endogenous: responds to productivity shocks)
        log_rd = log_rd_base + 0.03 * t + 0.5 * omega_it + np.random.normal(0, 0.2)
        
        # Productivity (Cobb-Douglas with true R&D effect = 0.15)
        log_tfp = (
            2.0 +                           # Constant
            0.35 * log_capital +            # Capital elasticity
            0.45 * log_labor +              # Labor elasticity
            0.15 * log_rd +                 # R&D elasticity (TRUE EFFECT)
            firm_quality[i] +               # Fixed effect
            omega_it +                      # Time-varying shock
            np.random.normal(0, 0.1)        # Idiosyncratic error
        )
        
        # Firm age
        firm_age = firm_age_base[i] + t
        
        data_list.append({
            'firm': i + 1,
            'year': 2010 + t,
            'log_tfp': log_tfp,
            'log_capital': log_capital,
            'log_labor': log_labor,
            'log_rd': log_rd,
            'firm_age': firm_age,
            'true_quality': firm_quality[i]  # For verification only
        })

# Create DataFrame
data = pd.DataFrame(data_list)

print("="*70)
print("FIRM PRODUCTIVITY PANEL DATA")
print("="*70)
print(f"\nDataset Shape: {data.shape}")
print(f"\nVariables: {data.columns.tolist()}")
print(f"\nPanel Structure:")
print(f"  N (firms):     {data['firm'].nunique()}")
print(f"  T (years):     {data['year'].nunique()}")
print(f"  Total obs:     {len(data)}")
print(f"  Balanced:      {data.groupby('firm').size().nunique() == 1}")

print(f"\nSummary Statistics:")
print(data[['log_tfp', 'log_capital', 'log_labor', 'log_rd', 'firm_age']].describe())

print("\nFirst 10 observations:")
print(data.head(10))

---

# Section 2: Exploratory Data Analysis

Before estimation, we explore:
- Time series patterns
- Cross-sectional variation
- Within vs. between variation

## 2.1 Trajectories - Spaghetti Plot

In [None]:
# Spaghetti plot: TFP trajectories (sample of firms)
sample_firms = data['firm'].unique()[:15]

fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Left: TFP trajectories
for firm in sample_firms:
    firm_data = data[data['firm'] == firm]
    axes[0].plot(firm_data['year'], firm_data['log_tfp'], alpha=0.6, marker='o', linewidth=1.5)

axes[0].set_xlabel('Year', fontsize=12)
axes[0].set_ylabel('Log TFP', fontsize=12)
axes[0].set_title('TFP Trajectories (Sample of 15 Firms)', fontsize=13, fontweight='bold')
axes[0].grid(alpha=0.3)

# Right: R&D trajectories
for firm in sample_firms:
    firm_data = data[data['firm'] == firm]
    axes[1].plot(firm_data['year'], firm_data['log_rd'], alpha=0.6, marker='s', linewidth=1.5)

axes[1].set_xlabel('Year', fontsize=12)
axes[1].set_ylabel('Log R&D Stock', fontsize=12)
axes[1].set_title('R&D Investment Trajectories (Sample of 15 Firms)', fontsize=13, fontweight='bold')
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("Observations:")
print("  - Substantial heterogeneity across firms (different levels)")
print("  - Generally increasing trends over time")
print("  - Both within-firm changes AND between-firm differences")

## 2.2 Variance Decomposition

**Key Question**: Is variation mainly within firms (over time) or between firms (cross-sectional)?

This helps us understand:
- Whether FE will be effective (needs within variation)
- Whether BE will be informative (needs between variation)
- Which estimator is likely most efficient
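
A minimal sketch of the decomposition on a toy panel, assuming a hypothetical helper `decompose_ss` and made-up data. It works with sums of squares rather than sample variances, because the identity total = within + between then holds exactly and the two shares add up to one.

```python
import numpy as np
import pandas as pd


def decompose_ss(df, entity, var):
    """Split the total sum of squares of `var` into within and between parts."""
    overall_mean = df[var].mean()
    entity_mean = df.groupby(entity)[var].transform("mean")
    ss_total = ((df[var] - overall_mean) ** 2).sum()
    ss_within = ((df[var] - entity_mean) ** 2).sum()        # deviations from entity means
    ss_between = ((entity_mean - overall_mean) ** 2).sum()  # entity means around grand mean
    return ss_total, ss_within, ss_between


# Toy panel (illustrative only): 5 firms observed for 4 periods each
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "firm": np.repeat(np.arange(5), 4),
    "x": rng.normal(size=20) + np.repeat(rng.normal(size=5), 4),
})

ss_total, ss_within, ss_between = decompose_ss(toy, "firm", "x")
print(f"within share:  {ss_within / ss_total:.3f}")
print(f"between share: {ss_between / ss_total:.3f}")
```

Shares based on sample variances with different denominators need not sum to 100%, so small discrepancies there do not indicate a bug.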

In [None]:
# Within vs Between variance decomposition
print("="*70)
print("VARIANCE DECOMPOSITION")
print("="*70)

for var in ['log_tfp', 'log_capital', 'log_labor', 'log_rd']:
    # Total variance
    var_total = data[var].var()
    
    # Within variance (deviations from firm means)
    var_within = data.groupby('firm')[var].transform(lambda x: x - x.mean()).var()
    
    # Between variance (variance of firm means)
    var_between = data.groupby('firm')[var].mean().var()
    
    print(f"\n{var}:")
    print(f"  Total:   {var_total:.4f}")
    print(f"  Within:  {var_within:.4f} ({var_within/var_total*100:5.1f}%)")
    print(f"  Between: {var_between:.4f} ({var_between/var_total*100:5.1f}%)")
    
    # Interpretation
    if var_within/var_total > 0.5:
        print(f"  → Mainly within variation (FE effective)")
    else:
        print(f"  → Mainly between variation (BE informative)")

print("\n" + "="*70)
print("INTERPRETATION:")
print("="*70)
print("- R&D has substantial WITHIN variation → FE can identify effect")
print("- TFP also varies within firms → Panel methods useful")
print("- Capital/Labor more between than within → Fixed effects may reduce precision")

## 2.3 Scatter Plots - Relationships

In [None]:
# Scatter plot: R&D vs TFP
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Left: Raw data
axes[0].scatter(data['log_rd'], data['log_tfp'], alpha=0.3, s=20, color='steelblue')
axes[0].set_xlabel('Log R&D Stock', fontsize=12)
axes[0].set_ylabel('Log TFP', fontsize=12)
axes[0].set_title('Raw Data: Pooled Relationship', fontsize=13, fontweight='bold')

# Add OLS line
z = np.polyfit(data['log_rd'], data['log_tfp'], 1)
p = np.poly1d(z)
axes[0].plot(data['log_rd'], p(data['log_rd']), "r-", linewidth=2, label=f'Slope = {z[0]:.3f}')
axes[0].legend()
axes[0].grid(alpha=0.3)

# Right: Demeaned data (within variation)
data['log_rd_dm'] = data.groupby('firm')['log_rd'].transform(lambda x: x - x.mean())
data['log_tfp_dm'] = data.groupby('firm')['log_tfp'].transform(lambda x: x - x.mean())

axes[1].scatter(data['log_rd_dm'], data['log_tfp_dm'], alpha=0.3, s=20, color='darkgreen')
axes[1].set_xlabel('Log R&D Stock (demeaned)', fontsize=12)
axes[1].set_ylabel('Log TFP (demeaned)', fontsize=12)
axes[1].set_title('Demeaned Data: Within-Firm Relationship', fontsize=13, fontweight='bold')

# Add FE line
z_fe = np.polyfit(data['log_rd_dm'], data['log_tfp_dm'], 1)
p_fe = np.poly1d(z_fe)
axes[1].plot(data['log_rd_dm'], p_fe(data['log_rd_dm']), "r-", linewidth=2, label=f'Slope = {z_fe[0]:.3f}')
axes[1].axhline(0, color='black', linestyle='--', alpha=0.5)
axes[1].axvline(0, color='black', linestyle='--', alpha=0.5)
axes[1].legend()
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("Key Insights:")
print(f"  - Pooled slope:  {z[0]:.3f} (between + within)")
print(f"  - Within slope:  {z_fe[0]:.3f} (FE estimate preview)")
print(f"  - True effect:   0.150 (known from data generation)")
print("\n  → Pooled is biased upward (unobserved quality confounds)")
print("  → Within is closer to truth (controls for fixed quality)")

---

# Section 3: Estimate All Models

Now we systematically estimate all major panel estimators:

1. **Pooled OLS** - Baseline (ignores panel structure)
2. **Fixed Effects (One-Way)** - Controls for α_i
3. **Fixed Effects (Two-Way)** - Controls for α_i and time effects
4. **Random Effects** - Efficient if α_i uncorrelated with X
5. **First Difference** - Alternative to FE
6. **Between Estimator** - Cross-sectional variation only
7. **IV-FE** - Addresses endogeneity beyond α_i
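
To make the difference between estimators 1 and 2 concrete, here is a small NumPy/pandas sketch on made-up, noise-free data (all names and numbers are illustrative assumptions, not part of the notebook's dataset). Because `x` is correlated with the firm effect, the pooled slope is pushed away from the true value of 2, while demeaning by firm recovers it.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_firms, n_years = 50, 6

alpha = rng.normal(size=n_firms)                  # unobserved firm effect
firm = np.repeat(np.arange(n_firms), n_years)
x = 0.8 * alpha[firm] + rng.normal(size=n_firms * n_years)  # x correlated with alpha
y = 2.0 * x + alpha[firm]                         # true slope = 2, no idiosyncratic noise

toy = pd.DataFrame({"firm": firm, "x": x, "y": y})

# Pooled OLS slope: ignores the firm effect, so it absorbs part of alpha
pooled_slope = np.polyfit(toy["x"], toy["y"], 1)[0]

# Within (fixed effects) slope: subtract firm means, then regress through the origin
x_dm = toy["x"] - toy.groupby("firm")["x"].transform("mean")
y_dm = toy["y"] - toy.groupby("firm")["y"].transform("mean")
within_slope = (x_dm * y_dm).sum() / (x_dm ** 2).sum()

print(f"pooled slope: {pooled_slope:.3f}")
print(f"within slope: {within_slope:.3f}")
```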

## 3.1 Pooled OLS

In [None]:
print("="*70)
print("ESTIMATOR 1: POOLED OLS")
print("="*70)

pooled = PooledOLS(
    formula="log_tfp ~ log_capital + log_labor + log_rd",
    data=data,
    entity_col='firm',
    time_col='year'
)

res_pooled = pooled.fit(cov_type='clustered')

print(res_pooled.summary())
print("\nNote: Standard errors clustered by firm to account for within-firm correlation")

## 3.2 Fixed Effects (One-Way)

In [None]:
print("="*70)
print("ESTIMATOR 2: FIXED EFFECTS (ONE-WAY)")
print("="*70)

fe = FixedEffectsOLS(
    formula="log_tfp ~ log_capital + log_labor + log_rd",
    data=data,
    entity_col='firm',
    time_col='year'
)

res_fe = fe.fit(cov_type='clustered')

print(res_fe.summary())

## 3.3 Fixed Effects (Two-Way)

In [None]:
print("="*70)
print("ESTIMATOR 3: FIXED EFFECTS (TWO-WAY)")
print("="*70)
print("Controls for entity AND time fixed effects")
print("="*70)

fe_2way = FixedEffectsOLS(
    formula="log_tfp ~ log_capital + log_labor + log_rd",
    data=data,
    entity_col='firm',
    time_col='year',
    time_effects=True
)

res_fe_2way = fe_2way.fit(cov_type='clustered')

print(res_fe_2way.summary())

## 3.4 Random Effects

In [None]:
print("="*70)
print("ESTIMATOR 4: RANDOM EFFECTS (GLS)")
print("="*70)
print("Assumes α_i uncorrelated with regressors")
print("="*70)

re = RandomEffectsGLS(
    formula="log_tfp ~ log_capital + log_labor + log_rd + firm_age",
    data=data,
    entity_col='firm',
    time_col='year'
)

res_re = re.fit()

print(res_re.summary())
print("\nNote: firm_age included (time-varying, so not absorbed by FE)")

## 3.5 First Difference

In [None]:
print("="*70)
print("ESTIMATOR 5: FIRST DIFFERENCE")
print("="*70)
print("Estimates using changes (Δy_it = Δx_it β + Δε_it)")
print("="*70)

fd = FirstDifferenceOLS(
    formula="log_tfp ~ log_capital + log_labor + log_rd",
    data=data,
    entity_col='firm',
    time_col='year'
)

res_fd = fd.fit(cov_type='robust')

print(res_fd.summary())
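
As a cross-check on what first differencing does, the transformation can be reproduced by hand with pandas. This is a sketch on made-up, noise-free data (names and values are illustrative assumptions); it shows that differencing removes the firm effect and costs one observation per firm.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n_firms, n_years = 30, 5

firm = np.repeat(np.arange(n_firms), n_years)
year = np.tile(np.arange(n_years), n_firms)
alpha = rng.normal(size=n_firms)                        # firm effect
x = rng.normal(size=n_firms * n_years) + alpha[firm]    # regressor correlated with alpha
y = 0.5 * x + alpha[firm]                               # true slope = 0.5

toy = pd.DataFrame({"firm": firm, "year": year, "x": x, "y": y})
toy = toy.sort_values(["firm", "year"])

# Difference within each firm; the first year of every firm becomes NaN and is dropped
diffs = toy.groupby("firm")[["x", "y"]].diff().dropna()

# Regression through the origin on the differenced data
fd_slope = (diffs["x"] * diffs["y"]).sum() / (diffs["x"] ** 2).sum()
n_lost = len(toy) - len(diffs)

print(f"FD slope: {fd_slope:.3f}, observations lost: {n_lost}")
```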

## 3.6 Between Estimator

In [None]:
print("="*70)
print("ESTIMATOR 6: BETWEEN ESTIMATOR")
print("="*70)
print("Uses only between-firm variation (firm means)")
print("="*70)

be = BetweenEstimator(
    formula="log_tfp ~ log_capital + log_labor + log_rd + firm_age",
    data=data,
    entity_col='firm',
    time_col='year'
)

res_be = be.fit(cov_type='robust')

print(res_be.summary())
print("\nNote: N = 150 (number of firms, not observations)")

## 3.7 IV-FE (Instrumental Variables with Fixed Effects)

To address potential endogeneity beyond α_i, we instrument current R&D with its lagged value.
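
The mechanics of instrumenting can be sketched with plain NumPy before turning to the estimator class. The example below is a simplified cross-sectional illustration with made-up data and a single instrument `z`; it does not apply the within transformation and is not how PanelBox computes its estimates. The naive second-stage residuals would also give incorrect standard errors, so only the point estimate is shown.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

z = rng.normal(size=n)        # instrument: shifts x but is unrelated to u
u = rng.normal(size=n)        # unobserved shock that drives both x and y
x = z + u                     # endogenous regressor
y = 1.5 * x + 2.0 * u         # true effect of x on y is 1.5

# OLS slope (no intercept; all variables have mean zero by construction)
ols_slope = (x @ y) / (x @ x)

# Two-stage least squares
x_hat = z * ((z @ x) / (z @ z))            # stage 1: fitted x from the instrument
iv_slope = (x_hat @ y) / (x_hat @ x_hat)   # stage 2: regress y on fitted x

print(f"OLS slope:  {ols_slope:.3f}  (biased by the shared shock)")
print(f"2SLS slope: {iv_slope:.3f}  (close to the true value 1.5)")
```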

In [None]:
# Create lagged R&D as instrument
data = data.sort_values(['firm', 'year'])
data['log_rd_lag1'] = data.groupby('firm')['log_rd'].shift(1)

# Drop missing values from lag
data_iv = data.dropna(subset=['log_rd_lag1']).copy()

print(f"Data for IV estimation: {data_iv.shape}")
print(f"Lost {len(data) - len(data_iv)} observations due to lagging")

In [None]:
print("="*70)
print("ESTIMATOR 7: IV-FE (Instrumental Variables with Fixed Effects)")
print("="*70)
print("Instrument: log_rd_lag1 (lagged R&D)")
print("Endogenous: log_rd (current R&D)")
print("="*70)

iv_fe = PanelIV(
    formula="log_tfp ~ log_capital + log_labor + log_rd | log_capital + log_labor + log_rd_lag1",
    data=data_iv,
    entity_col='firm',
    time_col='year',
    model_type='fe'
)

res_iv_fe = iv_fe.fit(cov_type='clustered')

print(res_iv_fe.summary())

# First-stage diagnostics
print("\n" + "="*70)
print("FIRST-STAGE DIAGNOSTICS")
print("="*70)

# Loop variable is named fs_stats so it does not shadow the scipy `stats` module
for endog_var, fs_stats in res_iv_fe.first_stage_results.items():
    print(f"\nEndogenous variable: {endog_var}")
    print(f"  Instrument:  log_rd_lag1")
    print(f"  F-statistic: {fs_stats['f_statistic']:.2f}")
    print(f"  P-value:     {fs_stats['f_pvalue']:.6f}")
    
    if fs_stats['f_statistic'] < 10:
        print("  ⚠️  WARNING: Weak instrument (F < 10)")
        print("      → IV estimates may be biased toward OLS")
    else:
        print("  ✓ Strong instrument (F ≥ 10)")
        print("      → IV estimates reliable")

---

# Section 4: Comparison of Results

Now let's systematically compare all estimators.

## 4.1 Coefficient Comparison Table

In [None]:
# Extract coefficients from all models
coef_dict = {
    'Pooled': res_pooled.params,
    'FE': res_fe.params,
    'FE-2way': res_fe_2way.params,
    'RE': res_re.params,
    'FD': res_fd.params,
    'BE': res_be.params,
    'IV-FE': res_iv_fe.params
}

coef_table = pd.DataFrame(coef_dict)

print("="*100)
print("COEFFICIENT COMPARISON ACROSS ALL ESTIMATORS")
print("="*100)
print(coef_table.to_string(float_format=lambda x: f'{x:.4f}'))

print("\n" + "="*100)
print("KEY VARIABLE: log_rd (R&D effect on productivity)")
print("="*100)
print(f"True effect:      0.1500 (known from data generation)")
print(f"\nPooled:           {coef_table.loc['log_rd', 'Pooled']:.4f}")
print(f"FE (one-way):     {coef_table.loc['log_rd', 'FE']:.4f}")
print(f"FE (two-way):     {coef_table.loc['log_rd', 'FE-2way']:.4f}")
print(f"RE:               {coef_table.loc['log_rd', 'RE']:.4f}")
print(f"FD:               {coef_table.loc['log_rd', 'FD']:.4f}")
print(f"BE:               {coef_table.loc['log_rd', 'BE']:.4f}")
print(f"IV-FE:            {coef_table.loc['log_rd', 'IV-FE']:.4f}")

## 4.2 Standard Error Comparison

In [None]:
# Extract standard errors from all models
se_dict = {
    'Pooled': res_pooled.std_errors,
    'FE': res_fe.std_errors,
    'FE-2way': res_fe_2way.std_errors,
    'RE': res_re.std_errors,
    'FD': res_fd.std_errors,
    'BE': res_be.std_errors,
    'IV-FE': res_iv_fe.std_errors
}

se_table = pd.DataFrame(se_dict)

print("="*100)
print("STANDARD ERROR COMPARISON")
print("="*100)
print(se_table.to_string(float_format=lambda x: f'{x:.4f}'))

print("\n" + "="*100)
print("PRECISION COMPARISON (log_rd standard errors)")
print("="*100)
for estimator in se_table.columns:
    se = se_table.loc['log_rd', estimator]
    print(f"{estimator:12s}: {se:.4f}")

print("\nNote: IV-FE has larger SE (instruments less informative than actual values)")

## 4.3 Coefficient Plot with Confidence Intervals

Visualize the R&D coefficient across all estimators.

In [None]:
# Focus on R&D coefficient
rd_coefs = coef_table.loc['log_rd']
rd_ses = se_table.loc['log_rd']

fig, ax = plt.subplots(figsize=(14, 7))
x_pos = np.arange(len(rd_coefs))

# Bar plot with error bars (95% CI)
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2']
bars = ax.bar(x_pos, rd_coefs, alpha=0.8, yerr=1.96*rd_ses, capsize=6, 
               color=colors, edgecolor='black', linewidth=1.5)

# True value line
ax.axhline(0.15, color='red', linestyle='--', linewidth=2.5, label='True Effect (0.15)', zorder=10)

# Zero line
ax.axhline(0, color='black', linestyle='-', linewidth=1, alpha=0.3)

ax.set_xticks(x_pos)
ax.set_xticklabels(rd_coefs.index, rotation=0, fontsize=12, fontweight='bold')
ax.set_ylabel('Coefficient on log(R&D)', fontsize=13, fontweight='bold')
ax.set_title('R&D Effect on Productivity: Comparison Across Estimators (95% CI)', 
             fontsize=14, fontweight='bold', pad=20)
ax.legend(fontsize=12, loc='upper right')
ax.grid(alpha=0.3, axis='y', linestyle=':', linewidth=1)
ax.set_ylim([0, max(rd_coefs) * 1.3])

plt.tight_layout()
plt.show()

print("\nInterpretation:")
print("  - Pooled OLS is biased upward (unobserved quality confounds)")
print("  - FE estimates are closer to truth (control for α_i)")
print("  - BE is biased (between variation contaminated by quality)")
print("  - FE, FE-2way, RE, FD: compare each with 0.15; a remaining gap points to the omega_it shock, which none of them removes")
print("  - IV-FE has wider CI (less precision due to instrumentation)")

---

# Section 5: Specification Tests

Use formal tests to guide model selection.

## 5.1 F-Test: Fixed Effects vs Pooled OLS

**Null Hypothesis**: All firm fixed effects are zero (Pooled OLS sufficient)  
**Alternative**: At least one firm effect is non-zero (use FE)
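
The statistic behind this test compares the residual sum of squares of the pooled model (restricted) with that of the FE model (unrestricted). The sketch below computes it by hand; `fe_f_test` is a hypothetical helper and the input numbers are made up for illustration, not taken from this notebook's results.

```python
from scipy import stats


def fe_f_test(ssr_pooled, ssr_fe, n_entities, n_obs, n_regressors):
    """F-test for joint significance of entity effects (pooled vs. FE)."""
    df_num = n_entities - 1                      # number of restrictions
    df_den = n_obs - n_entities - n_regressors   # residual df of the FE model
    f_stat = ((ssr_pooled - ssr_fe) / df_num) / (ssr_fe / df_den)
    p_value = stats.f.sf(f_stat, df_num, df_den)
    return f_stat, p_value, (df_num, df_den)


# Illustrative inputs: 11 entities, 110 observations, 2 slope coefficients
f_stat, p_value, dof = fe_f_test(
    ssr_pooled=200.0, ssr_fe=100.0, n_entities=11, n_obs=110, n_regressors=2
)
print(f"F{dof} = {f_stat:.2f}, p = {p_value:.2e}")
```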

In [None]:
print("="*70)
print("F-TEST: Fixed Effects vs Pooled OLS")
print("="*70)
print("H₀: α₁ = α₂ = ... = α_N = 0 (all firm effects zero)")
print("H₁: At least one α_i ≠ 0")
print("="*70)

# Extract F-statistic from FE results
f_stat = res_fe.f_statistic_fe
f_pval = res_fe.f_pvalue_fe

print(f"\nF-statistic:  {f_stat:.2f}")
print(f"P-value:      {f_pval:.6f}")
print(f"\nDecision (α = 0.05):")

if f_pval < 0.05:
    print("  → Reject H₀")
    print("  → Use Fixed Effects (firm heterogeneity is significant)")
else:
    print("  → Fail to reject H₀")
    print("  → Pooled OLS sufficient")

print("\n" + "="*70)
print("INTERPRETATION")
print("="*70)
print("Strong evidence for firm-specific effects.")
print("Pooling all firms together (OLS) is inappropriate.")
print("Proceed with panel methods that control for α_i.")

## 5.2 Hausman Test: Fixed Effects vs Random Effects

**Null Hypothesis**: α_i uncorrelated with regressors (RE consistent and efficient)  
**Alternative**: α_i correlated with regressors (FE consistent, RE inconsistent)
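
The classical Hausman statistic is a quadratic form in the difference between the FE and RE coefficients, weighted by the inverse of the difference of their covariance matrices. A minimal sketch, assuming a hypothetical helper `hausman_statistic` and made-up inputs; only coefficients that appear in both models should be passed in.

```python
import numpy as np
from scipy import stats


def hausman_statistic(b_fe, b_re, cov_fe, cov_re):
    """Classical Hausman statistic for coefficients common to FE and RE."""
    diff = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    cov_diff = np.asarray(cov_fe, dtype=float) - np.asarray(cov_re, dtype=float)
    h_stat = float(diff @ np.linalg.solve(cov_diff, diff))
    dof = diff.size                      # one degree of freedom per compared coefficient
    p_value = float(stats.chi2.sf(h_stat, dof))
    return h_stat, dof, p_value


# Illustrative inputs (not results from this notebook)
h_stat, dof, p_value = hausman_statistic(
    b_fe=[0.5, 0.2],
    b_re=[0.4, 0.1],
    cov_fe=np.diag([0.02, 0.02]),
    cov_re=np.diag([0.01, 0.01]),
)
print(f"chi2({dof}) = {h_stat:.3f}, p = {p_value:.4f}")
```

In finite samples the covariance difference may fail to be positive definite, in which case the statistic can be negative or undefined; that is a known limitation of the classical form.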

In [None]:
print("="*70)
print("HAUSMAN TEST: Fixed Effects vs Random Effects")
print("="*70)
print("H₀: E[α_i | X_it] = 0 (RE consistent and efficient)")
print("H₁: E[α_i | X_it] ≠ 0 (only FE consistent)")
print("="*70)

# Conduct Hausman test
hausman = HausmanTest(res_fe, res_re)

print(f"\nχ² statistic:  {hausman.statistic:.4f}")
print(f"Degrees of freedom: {hausman.df}")
print(f"P-value:       {hausman.pvalue:.6f}")
print(f"\nDecision (α = 0.05):")

if hausman.pvalue < 0.05:
    print("  → Reject H₀")
    print("  → Use Fixed Effects (α_i correlated with regressors)")
    print("  → Random Effects is INCONSISTENT")
else:
    print("  → Fail to reject H₀")
    print("  → Use Random Effects (more efficient)")
    print("  → Both FE and RE consistent, but RE has lower variance")

print("\n" + "="*70)
print("INTERPRETATION")
print("="*70)
print("The test result indicates whether unobserved firm quality (α_i)")
print("is systematically related to R&D, capital, and labor.")
print("\nIf rejected: High-quality firms invest more → need FE to control")
print("If not rejected: Quality independent of inputs → RE valid and preferred")

## 5.3 First-Stage F-Test (IV Relevance)

**Null Hypothesis**: Instrument is irrelevant (not correlated with endogenous variable)  
**Rule of Thumb**: F > 10 suggests the instrument is not weak; values between 10 and 20 still call for caution
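
The first-stage F-statistic is an F-test on the excluded instruments in a regression of the endogenous variable on instruments and controls. The sketch below computes the homoskedastic version with NumPy; `first_stage_f` is a hypothetical helper and the data are made up. For an FE model the variables would be demeaned by firm first and the degrees of freedom adjusted, which this sketch does not do.

```python
import numpy as np


def first_stage_f(x_endog, instruments, controls=None):
    """Homoskedastic F-statistic for the excluded instruments in the first stage."""
    x_endog = np.asarray(x_endog, dtype=float)
    n = x_endog.shape[0]
    Z = np.asarray(instruments, dtype=float).reshape(n, -1)
    W = np.ones((n, 1))                      # constant, plus any exogenous controls
    if controls is not None:
        W = np.column_stack([W, np.asarray(controls, dtype=float).reshape(n, -1)])

    def ssr(design):
        coef, *_ = np.linalg.lstsq(design, x_endog, rcond=None)
        resid = x_endog - design @ coef
        return float(resid @ resid)

    full = np.column_stack([W, Z])
    ssr_restricted = ssr(W)                  # instruments excluded
    ssr_full = ssr(full)                     # instruments included
    q = Z.shape[1]                           # number of excluded instruments
    df_den = n - full.shape[1]
    return ((ssr_restricted - ssr_full) / q) / (ssr_full / df_den)


rng = np.random.default_rng(4)
z = rng.normal(size=500)
x_strong = 1.0 * z + rng.normal(size=500)    # instrument strongly related to x
f_strong = first_stage_f(x_strong, z)
print(f"First-stage F (strong instrument): {f_strong:.1f}")
```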

In [None]:
print("="*70)
print("FIRST-STAGE F-TEST (IV Relevance)")
print("="*70)
print("Tests whether instrument (log_rd_lag1) predicts endogenous variable (log_rd)")
print("="*70)

# Loop variable is named fs_stats so it does not shadow the scipy `stats` module
for endog, fs_stats in res_iv_fe.first_stage_results.items():
    print(f"\nEndogenous variable: {endog}")
    print(f"Instrument:          log_rd_lag1")
    print(f"\nF-statistic:         {fs_stats['f_statistic']:.2f}")
    print(f"P-value:             {fs_stats['f_pvalue']:.6f}")
    
    print("\nAssessment:")
    if fs_stats['f_statistic'] < 10:
        print("  ⚠️  WEAK INSTRUMENT (F < 10)")
        print("      → IV estimates biased toward OLS")
        print("      → Standard errors understate uncertainty")
        print("      → Consider alternative instruments or LIML")
    elif fs_stats['f_statistic'] < 20:
        print("  ⚙️  MODERATE INSTRUMENT (10 ≤ F < 20)")
        print("      → Instrument marginally adequate")
        print("      → Some weak instrument bias may remain")
        print("      → Consider weak-IV-robust inference")
    else:
        print("  ✓ STRONG INSTRUMENT (F ≥ 20)")
        print("      → Instrument highly relevant")
        print("      → IV estimates reliable")
        print("      → Standard inference valid")

print("\n" + "="*70)
print("BACKGROUND: Stock-Yogo (2005) Critical Values")
print("="*70)
print("For 10% maximal IV size (1 endogenous, 1 instrument): F > 16.38")
print("For 10% maximal IV size (1 endogenous, 2 instruments): F > 19.93")
print("\nThe F > 10 rule of thumb is a looser, informal threshold for 'not too weak'")

---

# Section 6: Model Selection Decision Tree

Based on our tests, we systematically choose the best model.
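
The branching logic used in this section can be condensed into a small pure function, which makes the rule easy to test on hypothetical inputs. `select_model` and its parameter names are illustrative assumptions; the thresholds (5% significance, first-stage F above 10) follow the criteria used in this notebook.

```python
def select_model(f_pvalue, hausman_pvalue, first_stage_f, alpha=0.05, f_threshold=10.0):
    """Return the recommended estimator name from the three test results."""
    if f_pvalue >= alpha:
        return "Pooled OLS"          # no evidence of firm effects
    if hausman_pvalue >= alpha:
        return "Random Effects"      # firm effects appear uncorrelated with regressors
    if first_stage_f > f_threshold:
        return "IV-FE"               # instrument passes the relevance rule of thumb
    return "FE (two-way)"            # instrument too weak to rely on IV


# Hypothetical test outcomes
print(select_model(f_pvalue=0.001, hausman_pvalue=0.001, first_stage_f=25.0))
print(select_model(f_pvalue=0.001, hausman_pvalue=0.40, first_stage_f=25.0))
```

A passing first-stage F only speaks to instrument relevance. Whether IV-FE is appropriate also depends on the exclusion restriction, which no test in this function checks.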

## 6.1 Decision Criteria

In [None]:
print("="*70)
print("MODEL SELECTION DECISION TREE")
print("="*70)

# Step 1: Test for fixed effects
print("\n[STEP 1] Are there significant firm fixed effects (α_i)?")
print(f"         F-test p-value = {res_fe.f_pvalue_fe:.6f}")

if res_fe.f_pvalue_fe < 0.05:
    print("         → YES, reject pooling (p < 0.05)")
    print("         → Firm heterogeneity is significant")
    print("         → Must use panel methods\n")

    # Step 2: FE vs RE
    print("[STEP 2] Is α_i correlated with regressors?")
    print(f"         Hausman test p-value = {hausman.pvalue:.6f}")

    if hausman.pvalue < 0.05:
        print("         → YES, reject RE (p < 0.05)")
        print("         → Unobserved quality correlates with inputs")
        print("         → Use Fixed Effects or First Difference\n")

        # Step 3: Endogeneity beyond α_i?
        print("[STEP 3] Is there endogeneity beyond α_i?")
        print("         (Simultaneity, measurement error, etc.)")

        # Get first-stage F
        first_stage_f = list(res_iv_fe.first_stage_results.values())[0]['f_statistic']
        print(f"         First-stage F = {first_stage_f:.2f}")

        if first_stage_f > 10:
            print("         → Relevant instrument available (F > 10)")
            print("         → Consider IV-FE if endogeneity suspected\n")

            print("[DECISION] Two options:")
            print("           (a) Fixed Effects (if only α_i is the problem)")
            print("           (b) IV-FE (if simultaneity/measurement error also present)")
            print("\n         Given R&D simultaneity concern:")
            print("         → RECOMMENDED MODEL: IV-FE")
            final_model = res_iv_fe
            final_name = "IV-FE"
        else:
            print("         → Weak instrument (F ≤ 10)")
            print("         → Cannot reliably use IV\n")
            print("[DECISION] RECOMMENDED MODEL: Fixed Effects (two-way)")
            final_model = res_fe_2way
            final_name = "FE (two-way)"
    else:
        print("         → NO, do not reject RE (p ≥ 0.05)")
        print("         → Unobserved effects uncorrelated with inputs")
        print("         → Random Effects is consistent and efficient\n")
        print("[DECISION] RECOMMENDED MODEL: Random Effects")
        final_model = res_re
        final_name = "Random Effects"
else:
    print("         → NO, do not reject pooling (p ≥ 0.05)")
    print("         → No significant firm heterogeneity\n")
    print("[DECISION] RECOMMENDED MODEL: Pooled OLS (with clustered SE)")
    final_model = res_pooled
    final_name = "Pooled OLS"

print("\n" + "="*70)
print(f"FINAL MODEL SELECTED: {final_name}")
print("="*70)

## 6.2 Final Model Summary

In [None]:
print("="*70)
print(f"FINAL MODEL: {final_name}")
print("="*70)
print(final_model.summary())

---

# Section 7: Economic Interpretation

## 7.1 R&D Effect on Productivity

In [None]:
# Extract R&D coefficient from final model
beta_rd = final_model.params['log_rd']
se_rd = final_model.std_errors['log_rd']
t_stat = beta_rd / se_rd
p_value = 2 * (1 - stats.norm.cdf(abs(t_stat)))

print("="*70)
print("ECONOMIC INTERPRETATION: R&D EFFECT")
print("="*70)

print(f"\nCoefficient:     {beta_rd:.4f}")
print(f"Standard error:  {se_rd:.4f}")
print(f"t-statistic:     {t_stat:.2f}")
print(f"p-value:         {p_value:.4f}")

print("\n" + "-"*70)
print("INTERPRETATION")
print("-"*70)

print(f"\n1. PERCENTAGE EFFECT:")
print(f"   A 10% increase in R&D stock → {beta_rd*10:.2f}% increase in TFP")
print(f"   A 1% increase in R&D stock  → {beta_rd:.2f}% increase in TFP")

print(f"\n2. STATISTICAL SIGNIFICANCE:")
if p_value < 0.01:
    print(f"   ✓ Highly significant (p < 0.01)")
elif p_value < 0.05:
    print(f"   ✓ Significant at 5% level (p < 0.05)")
elif p_value < 0.10:
    print(f"   ~ Marginally significant (p < 0.10)")
else:
    print(f"   ✗ Not statistically significant (p ≥ 0.10)")

# 95% Confidence Interval
ci_lower = beta_rd - 1.96 * se_rd
ci_upper = beta_rd + 1.96 * se_rd

print(f"\n3. CONFIDENCE INTERVAL (95%):")
print(f"   [{ci_lower:.4f}, {ci_upper:.4f}]")
print(f"   We are 95% confident the true R&D elasticity lies in this range")

print(f"\n4. COMPARISON TO TRUTH:")
print(f"   True effect:      0.1500 (from data generation)")
print(f"   Estimated effect: {beta_rd:.4f}")
print(f"   Bias:             {beta_rd - 0.15:.4f}")
print(f"   Relative bias:    {100*(beta_rd - 0.15)/0.15:.1f}%")

print(f"\n5. ECONOMIC MAGNITUDE:")
print(f"   This elasticity is in line with empirical literature on R&D returns.")
print(f"   Studies typically find R&D elasticities between 0.05 and 0.25.")
print(f"   Our estimate of {beta_rd:.2f} is within this plausible range.")

if final_name == "IV-FE":
    print(f"\n6. CAUSAL INTERPRETATION:")
    print(f"   Using IV-FE, this is a CAUSAL effect (under IV assumptions):")
    print(f"   - Fixed effects control for unobserved firm quality")
    print(f"   - IV addresses simultaneity/reverse causality")
    print(f"   - Identifies WITHIN-FIRM effect of exogenous R&D changes")
elif final_name.startswith("FE"):
    print(f"\n6. CAUSAL INTERPRETATION:")
    print(f"   Using FE, this is a within-firm association:")
    print(f"   - Controls for time-invariant firm quality")
    print(f"   - BUT may still reflect simultaneity (productive firms invest more)")
    print(f"   - Caution: Not fully causal without addressing endogeneity")
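
The percentage effects printed above use the linear approximation β × (% change), which is accurate for small changes. For larger changes, the exact effect in a log-log model can be computed directly. A minimal sketch, using a hypothetical helper `pct_effect` and the true elasticity of 0.15 as the example value:

```python
def pct_effect(beta, pct_increase):
    """Exact % change in Y from a pct_increase % change in X in a log-log model."""
    return ((1.0 + pct_increase / 100.0) ** beta - 1.0) * 100.0


approx = 0.15 * 10              # linear approximation for a 10% increase
exact = pct_effect(0.15, 10)    # exact effect implied by the elasticity

print(f"approximate: {approx:.2f}%")
print(f"exact:       {exact:.2f}%")
```

For an elasticity between 0 and 1, the exact figure is slightly below the approximation, and the gap widens as the percentage change grows.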

## 7.2 Other Coefficients

In [None]:
print("="*70)
print("OTHER COEFFICIENT INTERPRETATIONS")
print("="*70)

if 'log_capital' in final_model.params:
    beta_k = final_model.params['log_capital']
    print(f"\nCapital Elasticity: {beta_k:.4f}")
    print(f"  → 1% increase in capital → {beta_k:.2f}% increase in TFP")
    print(f"  → True value: 0.35 (from data generation)")

if 'log_labor' in final_model.params:
    beta_l = final_model.params['log_labor']
    print(f"\nLabor Elasticity: {beta_l:.4f}")
    print(f"  → 1% increase in labor → {beta_l:.2f}% increase in TFP")
    print(f"  → True value: 0.45 (from data generation)")

if 'firm_age' in final_model.params:
    beta_age = final_model.params['firm_age']
    print(f"\nFirm Age Effect: {beta_age:.4f}")
    print(f"  → Each additional year of firm age → {beta_age:.4f} unit change in log(TFP)")
    print(f"  → Approximately {100*beta_age:.2f}% change per year")

# Returns to scale
if 'log_capital' in final_model.params and 'log_labor' in final_model.params:
    rts = beta_k + beta_l
    print(f"\n" + "-"*70)
    print("RETURNS TO SCALE (Capital + Labor elasticities)")
    print("-"*70)
    print(f"Sum of elasticities: {rts:.4f}")
    
    if rts > 1.05:
        print(f"  → INCREASING returns to scale (RTS > 1)")
        print(f"  → Doubling inputs more than doubles output")
    elif rts < 0.95:
        print(f"  → DECREASING returns to scale (RTS < 1)")
        print(f"  → Doubling inputs less than doubles output")
    else:
        print(f"  → CONSTANT returns to scale (RTS ≈ 1)")
        print(f"  → Doubling inputs approximately doubles output")
    
    print(f"\n  Note: True sum = 0.35 + 0.45 = 0.80 (decreasing returns in the data-generating process)")

---

# Section 8: Professional Presentation

## 8.1 Regression Table (Publication Style)

In [None]:
# Create publication-ready table
models_to_report = {
    '(1) Pooled': res_pooled,
    '(2) FE': res_fe,
    '(3) RE': res_re,
    '(4) IV-FE': res_iv_fe
}

# Variables to report
vars_to_report = ['log_capital', 'log_labor', 'log_rd']

print("="*90)
print("Table 1: R&D and Firm Productivity - Comparison of Estimators")
print("="*90)
print("Dependent Variable: log(Total Factor Productivity)")
print("-"*90)

# Header
header = "Variable" + " "*14
for model_name in models_to_report.keys():
    header += f"{model_name:>12s}  "
print(header)
print("-"*90)

# Coefficients and standard errors
for var in vars_to_report:
    # Coefficient row
    row_coef = f"{var:20s}"
    for model_name, result in models_to_report.items():
        if var in result.params:
            coef = result.params[var]
            # Add stars for significance
            pval = 2 * (1 - stats.norm.cdf(abs(coef / result.std_errors[var])))
            stars = '***' if pval < 0.01 else '**' if pval < 0.05 else '*' if pval < 0.10 else ''
            row_coef += f"{coef:>9.4f}{stars:3s}  "
        else:
            row_coef += "      -       "
    print(row_coef)
    
    # Standard error row
    row_se = " "*20
    for model_name, result in models_to_report.items():
        if var in result.std_errors:
            se = result.std_errors[var]
            row_se += f"  ({se:>7.4f})   "
        else:
            row_se += "              "
    print(row_se)
    print()

print("-"*90)

# Model statistics
print(f"{'Observations':20s}", end="")
for model_name, result in models_to_report.items():
    print(f"{result.nobs:>12.0f}  ", end="")
print()

print(f"{'R-squared':20s}", end="")
for model_name, result in models_to_report.items():
    print(f"{result.rsquared:>12.4f}  ", end="")
print()

print(f"{'Firm FE':20s}", end="")
for model_name, result in models_to_report.items():
    # "IV-FE" also contains "FE", so a single substring check covers both columns
    fe_status = "Yes" if "FE" in model_name else "No"
    print(f"{fe_status:>12s}  ", end="")
print()

print(f"{'Clustering':20s}", end="")
for model_name in models_to_report.keys():
    print(f"{'Firm':>12s}  ", end="")
print()

print("="*90)
print("Notes:")
print("  Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.10")
print("  (1) Pooled OLS with clustered SE")
print("  (2) Fixed Effects (within estimator)")
print("  (3) Random Effects (GLS)")
print("  (4) IV-FE with log_rd instrumented by log_rd_lag1")
print("  Sample: 150 firms, 2010-2019, balanced panel")
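
The table loop computes p-values and stars inline. If the same table is built more than once, that logic could be factored into small helpers. The following is a minimal sketch using the same normal approximation as the loop above; `significance_stars` and `format_cell` are hypothetical names, not part of PanelBox.

```python
import math

def significance_stars(coef, se):
    """Return significance stars from a two-sided normal p-value."""
    z = abs(coef / se)
    pval = math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))
    if pval < 0.01:
        return '***'
    if pval < 0.05:
        return '**'
    if pval < 0.10:
        return '*'
    return ''

def format_cell(coef, se):
    """Return the coefficient text (with stars) and the parenthesized SE text."""
    coef_text = f"{coef:.4f}{significance_stars(coef, se)}"
    se_text = f"({se:.4f})"
    return coef_text, se_text

# Illustrative values only
coef_text, se_text = format_cell(0.15, 0.03)
print(coef_text)
print(se_text)
```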

## 8.2 Summary Visualization

In [None]:
# Create comprehensive summary plot
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Panel A: Coefficient comparison
ax1 = axes[0, 0]
estimators = ['Pooled', 'FE', 'FE-2way', 'RE', 'FD', 'BE', 'IV-FE']
coefs_rd = [coef_table.loc['log_rd', est] for est in estimators]
ses_rd = [se_table.loc['log_rd', est] for est in estimators]

x_pos = np.arange(len(estimators))
colors_all = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2']
ax1.bar(x_pos, coefs_rd, yerr=1.96*np.array(ses_rd), capsize=5, 
        color=colors_all, alpha=0.8, edgecolor='black', linewidth=1.2)
ax1.axhline(0.15, color='red', linestyle='--', linewidth=2, label='True Effect')
ax1.axhline(0, color='black', linestyle='-', linewidth=0.8, alpha=0.3)
ax1.set_xticks(x_pos)
ax1.set_xticklabels(estimators, rotation=45, ha='right')
ax1.set_ylabel('R&D Coefficient', fontweight='bold')
ax1.set_title('Panel A: Coefficient Estimates (95% CI)', fontweight='bold', fontsize=12)
ax1.legend()
ax1.grid(alpha=0.3, axis='y')

# Panel B: Standard errors comparison
ax2 = axes[0, 1]
ax2.bar(x_pos, ses_rd, color=colors_all, alpha=0.8, edgecolor='black', linewidth=1.2)
ax2.set_xticks(x_pos)
ax2.set_xticklabels(estimators, rotation=45, ha='right')
ax2.set_ylabel('Standard Error', fontweight='bold')
ax2.set_title('Panel B: Precision Comparison', fontweight='bold', fontsize=12)
ax2.grid(alpha=0.3, axis='y')

# Panel C: R-squared comparison
ax3 = axes[1, 0]
r2_values = [
    res_pooled.rsquared,
    res_fe.rsquared,
    res_fe_2way.rsquared,
    res_re.rsquared,
    res_fd.rsquared,
    res_be.rsquared,
    res_iv_fe.rsquared
]
ax3.bar(x_pos, r2_values, color=colors_all, alpha=0.8, edgecolor='black', linewidth=1.2)
ax3.set_xticks(x_pos)
ax3.set_xticklabels(estimators, rotation=45, ha='right')
ax3.set_ylabel('R-squared', fontweight='bold')
ax3.set_title('Panel C: Model Fit', fontweight='bold', fontsize=12)
ax3.set_ylim([0, 1])
ax3.grid(alpha=0.3, axis='y')

# Panel D: Bias comparison
ax4 = axes[1, 1]
bias_values = [c - 0.15 for c in coefs_rd]
colors_bias = ['red' if b > 0.02 else 'green' if abs(b) <= 0.02 else 'orange' for b in bias_values]
ax4.bar(x_pos, bias_values, color=colors_bias, alpha=0.8, edgecolor='black', linewidth=1.2)
ax4.axhline(0, color='black', linestyle='-', linewidth=1.5)
ax4.set_xticks(x_pos)
ax4.set_xticklabels(estimators, rotation=45, ha='right')
ax4.set_ylabel('Bias (Estimate - True)', fontweight='bold')
ax4.set_title('Panel D: Bias Relative to Truth (0.15)', fontweight='bold', fontsize=12)
ax4.grid(alpha=0.3, axis='y')

plt.suptitle('Figure 1: Comprehensive Comparison of Panel Estimators', 
             fontsize=14, fontweight='bold', y=0.995)
plt.tight_layout()
plt.show()

print("Summary:")
print("  - Pooled and BE are biased upward (Panel A & D)")
print("  - FE methods closer to truth (green bars in Panel D)")
print("  - IV-FE has larger SE (less precision, Panel B)")
print("  - FE/RE have high R² (Panel C)")
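
Panels A and D can also be summarized numerically by asking whether each 95% confidence interval covers the true effect of 0.15 drawn as the reference line in Panel A. The sketch below assumes made-up (coefficient, standard error) pairs; in the notebook these would come from `coef_table` and `se_table`.

```python
TRUE_EFFECT = 0.15  # reference value plotted as 'True Effect' in Panel A

def ci_covers(coef, se, true_value=TRUE_EFFECT, z_crit=1.96):
    """Check whether the normal-approximation 95% CI contains true_value."""
    lower = coef - z_crit * se
    upper = coef + z_crit * se
    return lower <= true_value <= upper

# Made-up estimates for illustration: name -> (coefficient, standard error)
illustrative = {
    'Pooled': (0.30, 0.02),
    'FE': (0.16, 0.03),
    'IV-FE': (0.14, 0.08),
}

coverage = {name: ci_covers(c, s) for name, (c, s) in illustrative.items()}
for name, (c, s) in illustrative.items():
    print(f"{name:8s} bias = {c - TRUE_EFFECT:+.3f}  CI covers truth: {coverage[name]}")
```

A wide interval can cover the truth simply because it is imprecise, so coverage should be read together with the standard errors in Panel B.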

---

# Section 9: Exercises

## Exercise 9.1: Apply Full Workflow to New Data

**Task**: Repeat the complete analysis on a different dataset.

**Instructions**:
1. Load the Grunfeld investment data (`pb.load_grunfeld()`)
2. Research question: Does firm value affect investment?
3. Conduct EDA (trajectories, variance decomposition)
4. Estimate all 7 models
5. Perform specification tests
6. Select final model using decision tree
7. Interpret results economically
8. Create presentation table

In [None]:
# EXERCISE 9.1: YOUR CODE HERE

# Step 1: Load Grunfeld data
# TODO: grunfeld = pb.load_grunfeld()

# Step 2: EDA
# TODO: Trajectories, variance decomposition

# Step 3: Estimate all models
# TODO: Pooled, FE, FE-2way, RE, FD, BE, IV-FE

# Step 4: Tests
# TODO: F-test, Hausman, first-stage F

# Step 5: Model selection
# TODO: Decision tree logic

# Step 6: Interpretation
# TODO: Economic interpretation

# Step 7: Presentation
# TODO: Professional table

pass  # Remove when you add code

## Exercise 9.2: Sensitivity Analysis

**Task**: Assess robustness of results to specification choices.

**Instructions**:
1. Re-estimate FE with different SE types (robust, clustered, Driscoll-Kraay)
2. Compare standard errors and significance levels
3. Add firm age as regressor in FE model
4. Test whether R&D coefficient changes
5. Document sensitivity of conclusions

In [None]:
# EXERCISE 9.2: YOUR CODE HERE

# TODO: Estimate FE with different SE types
# TODO: Compare results
# TODO: Add controls and re-estimate
# TODO: Assess sensitivity

pass  # Remove when you add code

---

# Section 10: Summary and Key Takeaways

## 10.1 Complete Workflow

This notebook demonstrated a **complete empirical workflow**:

1. **Research Question**: Does R&D increase productivity?
2. **EDA**: Trajectories, variance decomposition, scatter plots
3. **Estimation**: All 7 estimators (Pooled, FE, FE-2way, RE, FD, BE, IV-FE)
4. **Testing**: F-test, Hausman, first-stage F
5. **Selection**: Decision tree based on tests
6. **Interpretation**: Economic magnitude and significance
7. **Presentation**: Professional tables and figures
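
The selection step of this workflow can be written as a small function. This is a sketch under stated assumptions: the name `select_estimator`, the 5% significance level, and the first-stage cutoff of 10 are illustrative choices and may differ from the exact decision tree used earlier in the notebook.

```python
def select_estimator(f_test_p, hausman_p, endogeneity_suspected=False,
                     first_stage_f=None, alpha=0.05):
    """Pick an estimator label from specification-test results."""
    # F-test not rejected: no evidence of firm effects, pooled OLS is adequate
    if f_test_p >= alpha:
        return 'Pooled OLS'
    # Hausman rejected: effects correlated with regressors, so RE is inconsistent
    base = 'FE' if hausman_p < alpha else 'RE'
    if endogeneity_suspected:
        # Only move to IV when the instrument passes the relevance rule of thumb
        if first_stage_f is not None and first_stage_f >= 10:
            return 'IV-FE'
        return base + ' (instrument weak or missing, IV not reliable)'
    return base

# Illustrative test results only
print(select_estimator(0.30, 0.50))
print(select_estimator(0.001, 0.50))
print(select_estimator(0.001, 0.01))
print(select_estimator(0.001, 0.01, endogeneity_suspected=True, first_stage_f=25.0))
```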

---

## 10.2 Estimator Comparison Summary

| Estimator | Controls α_i | Uses Within Variation | Efficiency | When to Use |
|-----------|-------------|----------------------|------------|-------------|
| **Pooled OLS** | No | Both (pooled) | High (if valid) | No heterogeneity, time-invariant effects negligible |
| **FE** | Yes | Yes | Moderate | α_i correlated with X, sufficient within variation |
| **RE** | Partially | Both | High (if valid) | α_i uncorrelated with X (Hausman test) |
| **FD** | Yes | Yes | Low (MA(1) errors) | Serial correlation, small T |
| **BE** | No | No | Low (small N) | Only between variation relevant |
| **IV-FE** | Yes | Yes | Low (but consistent) | Endogeneity beyond α_i, valid instruments |

---

## 10.3 Specification Testing

**Always test, never assume!**

1. **F-test** (Pooled vs FE):
   - Tests whether firm effects are jointly significant
   - If rejected → use FE or RE
   
2. **Hausman test** (FE vs RE):
   - Tests whether α_i is correlated with the regressors
   - If rejected → use FE (RE inconsistent)
   - If not rejected → use RE (more efficient)
   
3. **First-stage F** (IV):
   - Tests instrument relevance
   - F < 10 → weak instrument, be cautious
   - F > 20 → strong instrument, proceed confidently
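
For a single coefficient, the Hausman statistic reduces to a scalar that is easy to compute by hand, and the first-stage thresholds above can be encoded directly. The sketch below is standard-library only; `hausman_scalar` and `classify_first_stage` are hypothetical helper names, the input numbers are made up, and the "intermediate" label for F between 10 and 20 is an assumption added to fill the gap between the two rules.

```python
import math

def hausman_scalar(b_fe, b_re, se_fe, se_re):
    """Hausman statistic for one coefficient: (b_FE - b_RE)^2 / (Var_FE - Var_RE)."""
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        raise ValueError("Var(FE) must exceed Var(RE) for the statistic to be defined")
    h_stat = (b_fe - b_re) ** 2 / var_diff
    # Chi-squared(1) upper tail: P(X > h) = erfc(sqrt(h / 2))
    p_value = math.erfc(math.sqrt(h_stat / 2.0))
    return h_stat, p_value

def classify_first_stage(f_stat):
    """Apply the rule-of-thumb thresholds for instrument relevance."""
    if f_stat < 10:
        return 'weak'
    if f_stat > 20:
        return 'strong'
    return 'intermediate'

# Made-up inputs for illustration only
h_stat, p_value = hausman_scalar(b_fe=0.16, b_re=0.22, se_fe=0.03, se_re=0.02)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
print(classify_first_stage(25.0))
```

With several regressors the statistic uses the full coefficient vectors and covariance matrices, so the scalar version is only a teaching device.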

---

## 10.4 Key Insights

1. **Pooled OLS is biased** when unobserved heterogeneity (α_i) is correlated with X
   - Our case: High-quality firms invest more in R&D AND are more productive
   
2. **Fixed Effects removes bias** from time-invariant α_i
   - Identifies effect using within-firm variation
   
3. **IV-FE handles additional endogeneity** (simultaneity, measurement error)
   - But requires strong, valid instruments
   - Always check first-stage F
   
4. **Variance decomposition guides estimator choice**
   - High within variation → FE effective
   - High between variation → BE informative (if no bias)
   
5. **Trade-off: Bias vs. Variance**
   - FE: Unbiased but less efficient (loses between variation)
   - RE: Efficient but biased if assumptions fail
   - IV: Consistent but high variance (especially with weak instruments)
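
The variance decomposition mentioned in insight 4 splits the total sum of squares of a variable into a between-entity part and a within-entity part. The following is a minimal standard-library sketch on a toy panel; the function name and the data are illustrative and not the notebook's own implementation.

```python
from statistics import mean

def variance_decomposition(panel):
    """Split total sum of squares into between and within shares.

    panel: dict mapping entity id -> list of observations over time.
    """
    all_values = [v for values in panel.values() for v in values]
    grand_mean = mean(all_values)
    total_ss = sum((v - grand_mean) ** 2 for v in all_values)
    # Between: entity means around the grand mean, weighted by observations per entity
    between_ss = sum(len(values) * (mean(values) - grand_mean) ** 2
                     for values in panel.values())
    # Within: observations around their own entity mean
    within_ss = sum((v - mean(values)) ** 2
                    for values in panel.values() for v in values)
    return {
        'between_share': between_ss / total_ss,
        'within_share': within_ss / total_ss,
    }

# Toy panel: three firms, three periods each
toy_panel = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
shares = variance_decomposition(toy_panel)
print(shares)
```

In this toy panel most variation is between firms, which is the situation where FE discards the bulk of the identifying variation.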

---

## 10.5 Best Practices for Applied Work

1. **Report multiple specifications**
   - Show Pooled, FE, RE in table (columns 1-3)
   - Helps readers assess robustness
   
2. **Always cluster standard errors**
   - Account for within-entity correlation
   - Panel data violates i.i.d. assumption
   
3. **Conduct and report specification tests**
   - F-test, Hausman, first-stage F
   - Justify your model selection
   
4. **Interpret economically, not just statistically**
   - What is the magnitude in percentage terms?
   - Is it economically significant?
   - Compare to existing literature
   
5. **Acknowledge limitations**
   - IV assumptions (exclusion restriction)
   - Measurement issues
   - External validity
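
For point 4, a coefficient from a log-log specification is an elasticity, so it can be converted into a percentage effect for a given percentage change in the regressor. The sketch below uses the true R&D effect of 0.15 as an example value; the function name `pct_effect` is hypothetical.

```python
def pct_effect(elasticity, pct_change_x):
    """Percentage change in y implied by a pct_change_x percent change in x
    in a log-log model: ((1 + dx)^beta - 1) * 100."""
    return ((1.0 + pct_change_x / 100.0) ** elasticity - 1.0) * 100.0

beta_rd = 0.15  # example elasticity (the true effect used as the benchmark in Figure 1)

effect_10 = pct_effect(beta_rd, 10.0)    # 10% more R&D
effect_100 = pct_effect(beta_rd, 100.0)  # doubling R&D
print(f"10% more R&D  -> {effect_10:.2f}% higher productivity")
print(f"Doubling R&D  -> {effect_100:.2f}% higher productivity")
```

For small changes the exact formula is close to the usual approximation of elasticity times percentage change; for large changes such as a doubling, the exact formula is the safer choice.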

---

## 10.6 Next Steps

**You've completed the Static Panel Models series!**

Advanced topics to explore next:

1. **Dynamic Panels** (Arellano-Bond GMM)
   - When lagged dependent variable is regressor
   - System GMM vs. Difference GMM
   
2. **Nonlinear Panels**
   - Fixed effects logit/probit
   - Poisson/Negative Binomial
   
3. **Advanced IV Diagnostics**
   - Weak-IV-robust inference
   - Overidentification tests
   - Heterogeneous treatment effects
   
4. **Difference-in-Differences**
   - Two-way FE as DiD
   - Parallel trends assumption
   - Staggered treatment

---

## 🎓 Congratulations!

You now have the skills to:
- Estimate and compare all major panel estimators
- Conduct rigorous specification tests
- Select appropriate models based on data and context
- Interpret results in economic terms
- Present findings professionally

**Well done on completing this capstone notebook!**

---

## References

### Key Papers

- **Hausman, J. A. (1978)**. "Specification Tests in Econometrics." *Econometrica*, 46(6), 1251-1271.
- **Stock, J. H., & Yogo, M. (2005)**. "Testing for Weak Instruments in Linear IV Regression." In *Identification and Inference for Econometric Models* (pp. 80-108). Cambridge University Press.

### Textbooks

- **Wooldridge, J. M. (2010)**. *Econometric Analysis of Cross Section and Panel Data* (2nd ed.). MIT Press. [Chapters 10-11]
- **Baltagi, B. H. (2021)**. *Econometric Analysis of Panel Data* (6th ed.). Springer.
- **Angrist, J. D., & Pischke, J.-S. (2009)**. *Mostly Harmless Econometrics*. Princeton University Press. [Chapter 5]

### Software Documentation

- [PanelBox Documentation](https://panelbox.readthedocs.io/)
- [PanelBox Examples](https://github.com/yourorg/panelbox/tree/main/examples)

---

**End of Notebook**