# Tutorial 03: Estimation and Results Interpretation

**Series**: PanelBox - Fundamentals
**Level**: Intermediate
**Estimated Time**: 60-75 minutes
**Prerequisites**: Tutorials 01 (Panel Data Structures) and 02 (Formulas)

## Learning Objectives

By the end of this tutorial, you will be able to:
- Estimate panel data models using PanelBox
- Interpret regression coefficients in economic terms
- Understand standard errors, t-statistics, and p-values
- Compare classical, robust, and clustered standard errors
- Compute and interpret confidence intervals
- Perform hypothesis tests
- Export results to multiple formats (LaTeX, Markdown, HTML, JSON)
- Validate and diagnose model fit

## Table of Contents
1. [Introduction to Estimation](#1-introduction-to-estimation)
2. [Your First Model: Pooled OLS](#2-your-first-model-pooled-ols)
3. [Understanding Results Tables](#3-understanding-results-tables)
4. [Hypothesis Testing](#4-hypothesis-testing)
5. [Model Comparison](#5-model-comparison)
6. [Exporting Results](#6-exporting-results)
7. [Exercises](#7-exercises)
8. [Summary](#8-summary)

---

In [ ]:
# Notebook metadata
__version__ = "1.0.0"
__last_updated__ = "2026-02-16"
__compatible_with__ = "PanelBox >= 0.1.0"

# Standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from IPython.display import display, Markdown, HTML

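# PanelBox library (development mode): adjust the path below to your local
# checkout, or remove these two lines if PanelBox is installed in the environment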
import sys
sys.path.append('/home/guhaase/projetos/panelbox')
import panelbox as pb
from panelbox.core.panel_data import PanelData
from panelbox.models import PooledOLS

# Try to import other estimators (may not all be implemented yet)
try:
    from panelbox.models import FixedEffects, RandomEffects
    FE_AVAILABLE = True
except ImportError:
    FE_AVAILABLE = False
    print("Note: FixedEffects/RandomEffects not available yet")

# Plotting configuration
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (10, 6)
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 4)
pd.set_option('display.width', 100)

# Display library version
print(f"PanelBox version: {pb.__version__}")
print(f"Notebook version: {__version__}")
print("Setup complete!")

In [ ]:
# Load Grunfeld dataset
try:
    from panelbox.datasets import load_grunfeld
    data = load_grunfeld()
    print("✓ Loaded from panelbox.datasets.load_grunfeld()")
except ImportError:
    import os
    data_path = '/home/guhaase/projetos/panelbox/examples/datasets/grunfeld.csv'
    if os.path.exists(data_path):
        data = pd.read_csv(data_path)
        print(f"✓ Loaded from {data_path}")
    else:
        data_path = '/home/guhaase/projetos/panelbox/panelbox/datasets/grunfeld.csv'
        data = pd.read_csv(data_path)
        print(f"✓ Loaded from {data_path}")

# Quick recap
print("\nGrunfeld Investment Data")
print(f"Observations: {data.shape[0]}")
print(f"Variables: {list(data.columns)}")
display(data.head())

## 1. Introduction to Estimation

### The Econometric Workflow

```
1. THEORY          → What determines investment?
2. MODEL           → invest = β₀ + β₁·value + β₂·capital + ε
3. SPECIFICATION   → Formula: "invest ~ value + capital"
4. ESTIMATION      → Find β̂₀, β̂₁, β̂₂ that best fit the data
5. INFERENCE       → Are β̂ statistically significant?
6. INTERPRETATION  → What do the numbers mean economically?
```

So far, you've learned steps 1-3. This tutorial focuses on steps **4-6**.

---

### Pooled OLS: The Simplest Estimator

**Pooled Ordinary Least Squares** (Pooled OLS) treats panel data as one large cross-section:
- Ignores the panel structure (entities and time)
- Estimates β by minimizing the sum of squared residuals over all entity-period observations:
$$
\min_{\beta} \sum_{i=1}^N \sum_{t=1}^T (Y_{it} - X_{it}'\beta)^2
$$
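
The minimization above can be sketched with plain NumPy. This is an illustration on synthetic, noise-free data (the variable names and `true_beta` values are assumptions made for the example), not PanelBox's internal implementation:

```python
import numpy as np

# Synthetic stacked panel: 20 entity-period rows (illustrative data only)
rng = np.random.default_rng(0)
n_obs = 20
value = rng.uniform(100, 1000, size=n_obs)
capital = rng.uniform(10, 500, size=n_obs)
true_beta = np.array([5.0, 0.1, 0.2])  # hypothetical coefficients

# Design matrix with an intercept column; all (i, t) rows are stacked together
X = np.column_stack([np.ones(n_obs), value, capital])
y = X @ true_beta  # no noise, so least squares should recover true_beta

# Least squares: picks beta to minimize sum((y - X @ beta) ** 2)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
ssr = float(np.sum((y - X @ beta_hat) ** 2))
print("beta_hat:", beta_hat)
print("sum of squared residuals:", ssr)
```

Because the entity and time indices never enter the calculation, this is exactly what "ignoring the panel structure" means in practice.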

**Advantages**:
- ✅ Simple, fast, interpretable
- ✅ Efficient if there is no unobserved heterogeneity and the errors are homoskedastic and serially uncorrelated

**Disadvantages**:
- ❌ Biased if entity-specific effects exist and are correlated with the regressors
- ❌ Classical standard errors typically understate uncertainty, because observations within the same entity are not independent
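
To see why the standard-error concern matters, the following sketch simulates a panel in which both the regressor and the error share an entity-level component, then compares classical standard errors with a basic cluster-robust (sandwich) estimate. The data are synthetic, and the sandwich formula is written without any small-sample correction, so the numbers need not match what PanelBox or other libraries report:

```python
import numpy as np

rng = np.random.default_rng(42)
n_entities, n_periods = 100, 10
entity = np.repeat(np.arange(n_entities), n_periods)

# Regressor and error both contain an entity-level component,
# so observations within the same entity are correlated.
x = rng.normal(size=n_entities)[entity] + rng.normal(size=entity.size)
u = rng.normal(size=n_entities)[entity] + rng.normal(size=entity.size)
y = 1.0 + 0.5 * x + u

X = np.column_stack([np.ones_like(x), x])
n, k = X.shape
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

# Classical: sigma^2 * (X'X)^-1, which assumes independent errors
se_classical = np.sqrt(np.diag(XtX_inv) * (resid @ resid) / (n - k))

# Cluster-robust sandwich: sum the outer products of per-entity scores X_g' e_g
meat = np.zeros((k, k))
for g in range(n_entities):
    mask = entity == g
    score = X[mask].T @ resid[mask]
    meat += np.outer(score, score)
se_cluster = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("classical SE:", se_classical)
print("clustered SE:", se_cluster)
```

In this setup the clustered standard errors come out larger than the classical ones, which is the sense in which classical inference is overconfident under within-entity correlation.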

**When to use**:
- Exploratory analysis
- Benchmark before fixed effects
- When entities are truly homogeneous (rare!)

---

### What We'll Estimate

**Model**: Grunfeld investment equation
$$
\text{Investment}_{it} = \beta_0 + \beta_1 \text{Value}_{it} + \beta_2 \text{Capital}_{it} + \varepsilon_{it}
$$

**Research questions**:
1. How does market value affect investment? (β₁)
2. How does existing capital stock affect investment? (β₂)
3. Are these effects statistically significant?
4. How much variation do we explain? (R²)

---

## 2. Your First Model: Pooled OLS

### Step 1: Specify the Formula

We'll estimate:
```python
formula = "invest ~ value + capital"
```

This expands to:
$$
\text{invest}_{it} = \beta_0 + \beta_1 \cdot \text{value}_{it} + \beta_2 \cdot \text{capital}_{it} + \varepsilon_{it}
$$

---

In [ ]:
# Fit Pooled OLS model
print("="*70)
print("ESTIMATING POOLED OLS MODEL")
print("="*70)

# Specify formula
formula = "invest ~ value + capital"
print(f"\nFormula: {formula}")
print("Model: invest = β₀ + β₁·value + β₂·capital + ε")

# Create model instance
model = PooledOLS(formula, data=data)

# Fit the model
results = model.fit()

print("\n✓ Model estimated successfully!")
print(f"  Estimator: {model.__class__.__name__}")
print(f"  Observations: {results.nobs}")
print(f"  Parameters: {results.params.shape[0]}")

In [ ]:
# Display results summary
print("\n" + "="*70)
print("ESTIMATION RESULTS")
print("="*70)

# Print summary table
print(results.summary())

# Alternative: Display as DataFrame
print("\n" + "="*70)
print("COEFFICIENT TABLE")
print("="*70)
display(results.summary_table())

## 3. Understanding Results Tables

### Key Components of Results

A typical econometrics results table contains:

#### 1. Model Information
- **Estimator**: Pooled OLS, Fixed Effects, etc.
- **Formula**: Model specification
- **Observations**: Number of data points (N×T for a balanced panel)
- **Entities/Time**: Panel dimensions

#### 2. Coefficient Estimates
- **Parameter** (β̂): Estimated coefficient
- **Std. Error** (SE): Uncertainty in estimate
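- **t-statistic**: β̂ / SE (tests H₀: β = 0)
- **p-value**: Probability of a t-statistic at least this extreme if H₀ is true
- **Confidence Interval**: Range that, at the chosen confidence level (e.g. 95%), would contain the true β in that share of repeated samples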
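
The quantities in this list are linked by simple arithmetic. The sketch below reproduces them with `scipy.stats` from a coefficient and its standard error; the input numbers are hypothetical and are not estimates from the Grunfeld data:

```python
from scipy import stats

# Hypothetical inputs for illustration only
beta_hat = 0.12   # estimated coefficient
std_err = 0.03    # its standard error
df_resid = 197    # residual degrees of freedom, e.g. 200 observations - 3 parameters

# t-statistic for H0: beta = 0
t_stat = beta_hat / std_err

# Two-sided p-value: probability of |t| at least this large under H0
p_value = 2 * stats.t.sf(abs(t_stat), df_resid)

# 95% confidence interval: beta_hat +/- critical value * SE
t_crit = stats.t.ppf(0.975, df_resid)
ci_lower = beta_hat - t_crit * std_err
ci_upper = beta_hat + t_crit * std_err

print(f"t = {t_stat:.3f}, p = {p_value:.5f}")
print(f"95% CI: [{ci_lower:.4f}, {ci_upper:.4f}]")
```

A 95% interval that excludes zero corresponds to rejecting H₀: β = 0 at the 5% level, so the p-value and the interval always agree on significance.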

#### 3. Model Fit Statistics
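- **R²**: Fraction of the variance in the dependent variable explained by the model (0 to 1)
- **Adjusted R²**: R² penalized for the number of parameters
- **F-statistic**: Test of overall model significance (H₀: all slope coefficients are zero)
- **Log-likelihood**: Goodness of fit (higher = better, for models fit to the same data)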
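
As a rough sketch of how the first three fit statistics are computed from residuals, the following uses NumPy and SciPy on synthetic data. The formulas assume a model with an intercept, and the simulated coefficients are assumptions chosen for the example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)  # synthetic outcome

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
k = X.shape[1]  # number of parameters, including the intercept

ssr = resid @ resid                      # unexplained variation
sst = np.sum((y - y.mean()) ** 2)        # total variation around the mean
r2 = 1 - ssr / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)

# Overall F-test: H0 is that all slope coefficients are zero
f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))
f_pvalue = stats.f.sf(f_stat, k - 1, n - k)

print(f"R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}")
print(f"F = {f_stat:.2f}, p = {f_pvalue:.3g}")
```

Adjusted R² is always below R² when the model has at least one slope and an imperfect fit, because the `(n - 1) / (n - k)` factor exceeds one.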

---

### Interpreting Coefficients

For our model: `invest = β₀ + β₁·value + β₂·capital + ε`

**β₁ (value coefficient)**:
- **Meaning**: Change in investment per unit change in firm value, holding capital constant
- **Units**: If value increases by 1 (million $), predicted investment changes by β₁ (million $)
- **Ceteris paribus**: "All else equal"

**β₂ (capital coefficient)**:
- **Meaning**: Change in investment per unit change in capital stock, holding value constant

**β₀ (intercept)**:
- **Meaning**: Expected investment when value = capital = 0
- **Interpretation**: Often not economically meaningful (extrapolation)
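
A small worked example makes the "holding constant" reading concrete. The coefficients below are hypothetical placeholders, not Grunfeld estimates, and `predict_invest` is a helper defined only for this illustration:

```python
# Hypothetical coefficients for illustration only (not Grunfeld estimates)
beta0, beta1, beta2 = -40.0, 0.10, 0.25

def predict_invest(value, capital):
    """Predicted investment from the linear model invest = b0 + b1*value + b2*capital."""
    return beta0 + beta1 * value + beta2 * capital

# Raise value by 100 while holding capital fixed at 200
base = predict_invest(1000.0, 200.0)
higher_value = predict_invest(1100.0, 200.0)
delta = higher_value - base  # equals beta1 * 100, regardless of the capital level

print(f"Change in predicted investment: {delta:.2f}")
print(f"Prediction at value = capital = 0: {predict_invest(0.0, 0.0):.2f}")
```

The change depends only on β₁ and the size of the change in value, which is what the ceteris paribus interpretation states. The prediction at zero equals β₀, which shows why the intercept is an extrapolation when no firm in the data is near that point.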

---

## 4. Hypothesis Testing

*This section covers hypothesis tests*

**TODO**: Show:
- Testing individual coefficients (t-tests)
- Joint hypothesis tests (F-tests)
- Linear restrictions
- Wald tests
- Practical examples
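
As a minimal sketch of a joint test and a linear restriction, the code below computes Wald/F statistics for hypotheses of the form Rβ = r using NumPy and SciPy. It uses synthetic data and the classical covariance matrix; it does not show PanelBox's own testing API, and the restriction matrices are assumptions chosen for the example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5, 0.5]) + rng.normal(size=n)  # synthetic outcome

k = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
cov = (resid @ resid / (n - k)) * XtX_inv  # classical covariance of beta_hat

def f_test(R, r):
    """F-test of H0: R @ beta = r, returning (F statistic, p-value)."""
    q = R.shape[0]                      # number of restrictions
    diff = R @ beta_hat - r
    wald = diff @ np.linalg.solve(R @ cov @ R.T, diff)
    f_stat = wald / q
    return f_stat, stats.f.sf(f_stat, q, n - k)

# Joint test: both slopes are zero
f_joint, p_joint = f_test(np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]), np.zeros(2))

# Linear restriction: the two slopes are equal (beta1 - beta2 = 0)
f_equal, p_equal = f_test(np.array([[0.0, 1.0, -1.0]]), np.zeros(1))

print(f"Joint test:    F = {f_joint:.2f}, p = {p_joint:.3g}")
print(f"Equality test: F = {f_equal:.2f}, p = {p_equal:.3g}")
```

With a single restriction on one coefficient, the F statistic equals the square of the corresponding t-statistic, so the individual t-test is a special case of this framework.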

---

## 5. Model Comparison

*This section compares different specifications*

**TODO**: Demonstrate:
- Comparing nested models
- Information criteria (AIC, BIC)
- Model selection guidelines
- Creating comparison tables
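
A sketch of comparing nested specifications with information criteria follows. It assumes Gaussian errors and counts only the regression coefficients in the penalty term (some software also counts the error variance, which shifts AIC and BIC by a constant for every model). The helper `gaussian_ic` and the data are constructed for this example:

```python
import numpy as np

def gaussian_ic(y, X):
    """Return (log-likelihood, AIC, BIC) for an OLS fit under Gaussian errors."""
    n, k = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = float(np.sum((y - X @ beta_hat) ** 2))
    # Log-likelihood evaluated at the ML variance estimate ssr / n
    llf = -0.5 * n * (np.log(2 * np.pi) + np.log(ssr / n) + 1)
    aic = -2 * llf + 2 * k
    bic = -2 * llf + k * np.log(n)
    return llf, aic, bic

rng = np.random.default_rng(3)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.8 * x1 + 0.6 * x2 + rng.normal(size=n)  # synthetic outcome

X_small = np.column_stack([np.ones(n), x1])       # restricted model
X_full = np.column_stack([np.ones(n), x1, x2])    # adds x2

llf_s, aic_s, bic_s = gaussian_ic(y, X_small)
llf_f, aic_f, bic_f = gaussian_ic(y, X_full)

print(f"Small model: llf = {llf_s:.1f}, AIC = {aic_s:.1f}, BIC = {bic_s:.1f}")
print(f"Full model:  llf = {llf_f:.1f}, AIC = {aic_f:.1f}, BIC = {bic_f:.1f}")
```

Lower AIC or BIC indicates the preferred model. The log-likelihood alone never falls when a regressor is added, which is why the penalty terms are needed for a fair comparison.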

---

## 6. Exporting Results

*This section shows how to export results*

**TODO**: Cover:
- LaTeX tables for papers
- Markdown tables for reports
- HTML for web display
- JSON for further processing
- Copying to clipboard
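
One library-agnostic route is to place the coefficient table in a pandas DataFrame and use pandas' own exporters. The sketch below uses hypothetical numbers and shows JSON and HTML only; `DataFrame.to_latex` and `DataFrame.to_markdown` follow the same pattern but may require optional dependencies (such as `jinja2` and `tabulate`). PanelBox's own export methods are not shown here:

```python
import json
import pandas as pd

# Hypothetical coefficient table for illustration only
table = pd.DataFrame(
    {"coef": [0.10, 0.25], "std_err": [0.02, 0.05]},
    index=["value", "capital"],
)

# JSON keyed by parameter name, convenient for further processing
json_text = table.to_json(orient="index")

# HTML table for web display, rounded for readability
html_text = table.round(3).to_html()

# Round trip: rebuild the table from the JSON text
restored = pd.DataFrame.from_dict(json.loads(json_text), orient="index")

print(json_text)
print(restored)
```

Keeping the table as a DataFrame until the last step means the same object can feed every output format.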

---

## 7. Exercises

### Exercise 1: Basic Estimation
Estimate a pooled OLS model and interpret all coefficients.

### Exercise 2: Standard Errors
Compare results using different standard error types.

### Exercise 3: Hypothesis Testing
Test whether two coefficients are jointly significant.

### Exercise 4: Model Selection
Compare two model specifications using AIC/BIC.

### Exercise 5: Results Export
Export results to LaTeX for a research paper.

**Solutions**: See `../../solutions/01_fundamentals/03_estimation_solutions.ipynb` (coming soon)

---

## 8. Summary

### Key Takeaways

- Pooled OLS is the starting point for panel analysis
- Robust/clustered standard errors are often necessary for panel data
- Hypothesis testing follows standard regression procedures
- Model comparison helps select appropriate specifications
- Results can be exported to various formats

### Next Steps

**Optional**: Continue to **Tutorial 1.4: Spatial Fundamentals** if you plan to work with spatial data.

**Recommended**: Proceed to **Module 2: Classical Estimators** to learn about Fixed Effects and Random Effects models.

---

**Checkpoint**: Before moving on, ensure you can:
- [ ] Estimate panel data models
- [ ] Interpret coefficients, p-values, and confidence intervals
- [ ] Choose appropriate standard error types
- [ ] Perform hypothesis tests
- [ ] Compare model specifications
- [ ] Export results to LaTeX/Markdown/HTML