# First Difference and Between Estimators

**Level**: Advanced  
**Duration**: 60-75 minutes  
**Prerequisites**: Fixed Effects (Notebook 02), Time series concepts

---

## Learning Objectives

By the end of this notebook, you will be able to:

1. **Understand** the First Difference (FD) transformation and how it eliminates α_i
2. **Distinguish** between FD and FE (demeaning vs differencing)
3. **Identify** when FD is preferable to FE (highly persistent errors, unit roots)
4. **Recognize** the MA(1) structure induced by differencing
5. **Estimate** the Between Estimator and interpret cross-sectional relationships
6. **Decompose** total variance into within and between components
7. **Choose** an appropriate estimator based on data characteristics

---

## Introduction

While Fixed Effects (FE) is the workhorse of panel data analysis, it's not always the best choice. In this notebook, we explore two important alternatives:

- **First Difference (FD)**: Eliminates α_i by differencing; well suited to highly persistent errors
- **Between Estimator (BE)**: Uses cross-sectional variation (entity means)

We'll learn when each estimator is appropriate and how to decompose variance into within and between components.

In [None]:
# Import required packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import panelbox as pb
from scipy import stats

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 4)
np.set_printoptions(precision=4, suppress=True)

# Set plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print(f"PanelBox version: {pb.__version__}")

---

## Section 1: First Difference Transformation

### 1.1 The Problem: Serial Correlation

The efficiency of Fixed Effects and the validity of its conventional standard errors rest on the idiosyncratic errors ε_it being **uncorrelated over time**:

$$
\text{Cov}(\varepsilon_{it}, \varepsilon_{is}) = 0 \quad \text{for } t \neq s
$$

This assumption is violated if:
- Errors follow AR(1): ε_it = ρε_{i,t-1} + η_it
- Random walk: ε_it = ε_{i,t-1} + η_it (unit root)
- Measurement error persists over time

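**Consequences**:
- FE remains **consistent** under strict exogeneity, but it is no longer efficient
- Conventional (non-robust) standard errors are invalid
- With a unit root in ε_it and large T, a regression in levels risks spurious results

**Solution**: Use cluster-robust standard errors with FE or, when the errors are highly persistent, the **First Difference (FD)** estimator.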
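What serial correlation does to each estimator can be checked with a small Monte Carlo. The sketch below is an illustration under stated assumptions, not part of PanelBox: it uses plain NumPy, a strictly exogenous regressor, random-walk errors, and hypothetical names (`fe_draws`, `fd_draws`). The within and first-difference slopes are computed by hand for a single regressor.

```python
import numpy as np

rng = np.random.default_rng(3)
n_entities, n_periods, beta_true, n_reps = 100, 8, 0.5, 200
fe_draws, fd_draws = [], []

for _ in range(n_reps):
    alpha = rng.normal(size=(n_entities, 1))                 # entity effects
    x = alpha + rng.normal(size=(n_entities, n_periods))     # x correlated with alpha_i
    eps = np.cumsum(rng.normal(size=(n_entities, n_periods)), axis=1)  # random walk
    y = beta_true * x + alpha + eps

    # Within (FE) slope: demean by entity, then OLS without intercept
    xw = x - x.mean(axis=1, keepdims=True)
    yw = y - y.mean(axis=1, keepdims=True)
    fe_draws.append((xw * yw).sum() / (xw ** 2).sum())

    # First-difference slope: difference over time, then OLS without intercept
    dx, dy = np.diff(x, axis=1), np.diff(y, axis=1)
    fd_draws.append((dx * dy).sum() / (dx ** 2).sum())

fe_draws, fd_draws = np.array(fe_draws), np.array(fd_draws)
print(f"FE: mean={fe_draws.mean():.3f}, sd={fe_draws.std():.3f}")
print(f"FD: mean={fd_draws.mean():.3f}, sd={fd_draws.std():.3f}")
```

In this design both estimators are centered near the true β, and the difference shows up in the spread: with random-walk errors the FD draws are less dispersed than the FE draws.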

### 1.2 FD Transformation

The FD transformation differences the data:

$$
\begin{align}
y_{it} &= \beta x_{it} + \alpha_i + \varepsilon_{it} \\
y_{i,t-1} &= \beta x_{i,t-1} + \alpha_i + \varepsilon_{i,t-1} \\
\hline
\Delta y_{it} &= \beta \Delta x_{it} + \Delta \varepsilon_{it}
\end{align}
$$

where Δy_it = y_it - y_{i,t-1} and **α_i cancels out** (Δα_i = 0).
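The cancellation of α_i can be seen on simulated data. This is a minimal sketch with a made-up toy panel (the names `toy`, `beta_fd`, and `beta_pooled` are illustrative, not PanelBox objects). Here α_i is deliberately correlated with x_it, so a pooled regression is biased while the differenced regression is not.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_entities, n_periods, beta_true = 200, 5, 0.5

entity = np.repeat(np.arange(n_entities), n_periods)
time = np.tile(np.arange(n_periods), n_entities)
alpha = rng.normal(0, 2, n_entities)[entity]          # alpha_i repeated per row
x = alpha + rng.normal(0, 1, entity.size)             # x_it correlated with alpha_i
y = beta_true * x + alpha + rng.normal(0, 0.5, entity.size)

toy = pd.DataFrame({"entity": entity, "time": time, "x": x, "y": y})
toy = toy.sort_values(["entity", "time"])

# Difference within each entity; the first period of every entity becomes NaN
diffs = toy.groupby("entity")[["x", "y"]].diff().dropna()

# OLS slope of Δy on Δx (no intercept), and the pooled slope for contrast
dx, dy = diffs["x"].to_numpy(), diffs["y"].to_numpy()
beta_fd = (dx @ dy) / (dx @ dx)
beta_pooled = np.polyfit(toy["x"], toy["y"], 1)[0]

print(f"True beta: {beta_true}, FD: {beta_fd:.3f}, Pooled: {beta_pooled:.3f}")
print(f"Rows before/after differencing: {len(toy)} / {len(diffs)}")
```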

**Advantages**:
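1. Efficient when ε_it is highly persistent: if ε_it follows a random walk, Δε_it is serially uncorrelated
2. Handles **unit roots** (non-stationarity) by working with changes rather than levels
3. Often more precise than FE when ε_it ~ AR(1) with ρ close to 1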

**Disadvantage**:
- Loses one observation per entity (T-1 instead of T)

Let's load data and estimate FD:

In [None]:
# Load Grunfeld investment data
data = pb.datasets.load_grunfeld()

print("Grunfeld Investment Data:")
print(data.head(10))
print(f"\nShape: {data.shape}")
print(f"Firms: {data['firm'].nunique()}, Years: {data['year'].nunique()}")
print(f"\nVariable descriptions:")
print("- invest: Gross investment")
print("- value: Market value (beginning of year)")
print("- capital: Capital stock (beginning of year)")

In [None]:
# Estimate First Difference model
print("="*70)
print("FIRST DIFFERENCE ESTIMATOR")
print("="*70)

fd_model = pb.FirstDifferenceEstimator(
    "invest ~ value + capital", 
    data, 
    entity_col='firm', 
    time_col='year'
)
fd_results = fd_model.fit(cov_type='clustered')

print(fd_results.summary())

# Check observations dropped
n_original = len(data)
n_used = fd_results.nobs
n_dropped = n_original - n_used
n_firms = data['firm'].nunique()

print(f"\n{'='*70}")
print("OBSERVATION COUNT:")
print(f"{'='*70}")
print(f"Original observations: {n_original}")
print(f"Used in estimation: {n_used}")
print(f"Dropped: {n_dropped} (first observation for each of {n_firms} firms)")

**Key observations**:
1. FD drops first period for each entity (loses N observations)
2. Coefficients estimated on **changes** (Δy, Δx)
3. Clustered SE account for within-entity correlation, including the serial correlation that differencing induces in Δε_it

### 1.3 FD vs FE Comparison

How do FD and FE compare?

**Theoretical Results**:
- **T=2**: FD ≡ FE (numerically identical)
- **T>2, ε ~ i.i.d.**: FE more efficient (uses all T observations)
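- **T>2, ε ~ AR(1)**: Both consistent under strict exogeneity; FD gains efficiency as ρ approaches 1
- **Unit roots**: FD efficient (Δε_it is serially uncorrelated); FE is inefficient and, with large T, prone to spurious regression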

Let's compare numerically:

In [None]:
# Estimate Fixed Effects for comparison
fe_model = pb.FixedEffects(
    "invest ~ value + capital", 
    data, 
    entity_col='firm', 
    time_col='year'
)
fe_results = fe_model.fit(cov_type='clustered')

# Compare coefficients and SEs
comparison_df = pd.DataFrame({
    'FD_coef': fd_results.params,
    'FE_coef': fe_results.params,
    'Diff_coef': fd_results.params - fe_results.params,
    'FD_se': fd_results.std_errors,
    'FE_se': fe_results.std_errors
})

print("="*70)
print("FD vs FE COMPARISON")
print("="*70)
print(comparison_df)

print("\n" + "="*70)
print("INTERPRETATION:")
print("="*70)
print("1. Coefficients are similar but not identical (T>2)")
print("2. Under i.i.d. errors FE is the more efficient estimator; compare the SE columns")
print("3. With highly persistent errors FD can be more precise; both are consistent under strict exogeneity")

**When to use FD vs FE**:

| Scenario | Preferred Estimator | Reason |
|----------|-------------------|--------|
| ε ~ i.i.d. | FE | More efficient |
| ε ~ AR(1), ρ near 1 | FD | More efficient; both are consistent under strict exogeneity |
| Unit root (random walk) | FD | Δε is serially uncorrelated; FE is inefficient |
| T = 2 | Either | Identical |
| Large T | FE | Efficiency matters |
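The T = 2 row of the table can be checked by hand. The sketch below uses a simulated two-period panel and computes both slopes with NumPy and pandas. It assumes no intercept in the differenced regression and no time effects in the within regression, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_entities = 50
entity = np.repeat(np.arange(n_entities), 2)           # exactly two periods each
alpha = rng.normal(size=n_entities)[entity]
x = alpha + rng.normal(size=entity.size)
y = 0.5 * x + alpha + rng.normal(scale=0.5, size=entity.size)
toy = pd.DataFrame({"entity": entity, "x": x, "y": y})

# Within (FE) slope: subtract entity means, then OLS without intercept
within = toy[["x", "y"]] - toy.groupby("entity")[["x", "y"]].transform("mean")
beta_fe = (within["x"] * within["y"]).sum() / (within["x"] ** 2).sum()

# First-difference slope: one difference per entity, OLS without intercept
diffs = toy.groupby("entity")[["x", "y"]].diff().dropna()
beta_fd = (diffs["x"] * diffs["y"]).sum() / (diffs["x"] ** 2).sum()

print(f"FE slope: {beta_fe:.6f}")
print(f"FD slope: {beta_fd:.6f}")
```

With two periods, each demeaned value is plus or minus half the first difference, so the two ratios coincide up to floating-point error.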

### 1.4 MA(1) Induced by Differencing

**Important technical point**: Even if ε_it are i.i.d., the differenced errors Δε_it have **MA(1) structure**:

$$
\begin{align}
\Delta \varepsilon_{it} &= \varepsilon_{it} - \varepsilon_{i,t-1} \\
\text{Cov}(\Delta\varepsilon_{it}, \Delta\varepsilon_{i,t-1}) &= \text{Cov}(\varepsilon_{it} - \varepsilon_{i,t-1}, \varepsilon_{i,t-1} - \varepsilon_{i,t-2}) \\
&= -\sigma^2_{\varepsilon}
\end{align}
$$

**Implication**: Use standard errors that allow for serial correlation in Δε_it within each entity (clustered by entity, Driscoll-Kraay, or Newey-West).
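The covariance result implies a first-order autocorrelation of −σ²/(2σ²) = −0.5 in Δε_it and zero correlation at longer lags. A quick simulation, sketched here with NumPy on made-up i.i.d. draws, illustrates this:

```python
import numpy as np

rng = np.random.default_rng(2)
eps = rng.normal(0, 1.0, size=(5000, 6))     # i.i.d. errors: 5000 entities, 6 periods
d_eps = np.diff(eps, axis=1)                 # differenced errors, 5 per entity

var_d = np.mean(d_eps ** 2)                              # approx. 2 * sigma^2
cov_lag1 = np.mean(d_eps[:, 1:] * d_eps[:, :-1])         # approx. -sigma^2
cov_lag2 = np.mean(d_eps[:, 2:] * d_eps[:, :-2])         # approx. 0

rho1 = cov_lag1 / var_d
rho2 = cov_lag2 / var_d
print(f"Lag-1 autocorrelation of Δε: {rho1:.3f}")
print(f"Lag-2 autocorrelation of Δε: {rho2:.3f}")
```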

In [None]:
# Estimate FD with different SE types
fd_clustered = fd_model.fit(cov_type='clustered')
fd_dk = fd_model.fit(cov_type='driscoll_kraay', max_lags=1)

# Compare standard errors
se_comparison = pd.DataFrame({
    'Clustered': fd_clustered.std_errors,
    'Driscoll-Kraay': fd_dk.std_errors,
    'Pct_Diff': (fd_dk.std_errors / fd_clustered.std_errors - 1) * 100
})

print("="*70)
print("FD: STANDARD ERROR COMPARISON")
print("="*70)
print(se_comparison)
print("\nNote: Both SE types allow for the MA(1) in differenced errors;")
print("Driscoll-Kraay additionally allows for cross-sectional dependence")

---

## Section 2: Between Estimator

### 2.1 Between Transformation

The **Between Estimator (BE)** uses only **cross-sectional variation** by averaging over time:

$$
\bar{y}_i = \beta \bar{x}_i + \alpha + \bar{u}_i
$$

where $\bar{y}_i = \frac{1}{T_i}\sum_{t=1}^{T_i} y_{it}$ and $\bar{u}_i = \alpha_i + \bar{\varepsilon}_i$.

**Key properties**:
- Sample size: **N** (not NT)
- Uses **between-entity** variation only
- α_i absorbed into error (not eliminated)
- Can include **time-invariant** regressors
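The between transformation amounts to collapsing the panel to one row per entity and running OLS on the means. The sketch below does this by hand on simulated data. The toy variables (`z` as a time-invariant dummy, `mu_i` as an entity-level component of x) are illustrative, and α_i is assumed to be uncorrelated with the regressors, which is what the between regression needs in order to recover the true coefficients.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n_entities, n_periods = 500, 4
entity = np.repeat(np.arange(n_entities), n_periods)

z_i = rng.integers(0, 2, n_entities)          # time-invariant binary regressor
alpha_i = rng.normal(0, 0.5, n_entities)      # entity effect, independent of x and z
mu_i = rng.normal(0, 1, n_entities)           # entity-level component of x

x = mu_i[entity] + rng.normal(0, 1, entity.size)
y = (1.0 + 0.5 * x + 2.0 * z_i[entity] + alpha_i[entity]
     + rng.normal(0, 0.5, entity.size))
toy = pd.DataFrame({"entity": entity, "x": x, "z": z_i[entity], "y": y})

# Collapse to entity means: N rows instead of N*T
means = toy.groupby("entity")[["x", "z", "y"]].mean()

# OLS of y-bar on a constant, x-bar, and z
X = np.column_stack([np.ones(len(means)), means["x"], means["z"]])
coef, *_ = np.linalg.lstsq(X, means["y"].to_numpy(), rcond=None)

print(f"Rows used: {len(means)} (panel has {len(toy)})")
print(f"Intercept: {coef[0]:.3f}, x: {coef[1]:.3f}, z: {coef[2]:.3f}")
```

The coefficient on `z` is estimable here, whereas demeaning or differencing would remove `z` entirely.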

**When to use BE**:
1. Interest in **long-run cross-sectional relationships**
2. Time-invariant variables important (education, location)
3. T small, N large
4. Exploratory analysis

Let's estimate the Between Estimator:

In [None]:
# Estimate Between Estimator
print("="*70)
print("BETWEEN ESTIMATOR")
print("="*70)

be_model = pb.BetweenEstimator(
    "invest ~ value + capital", 
    data, 
    entity_col='firm', 
    time_col='year'
)
be_results = be_model.fit(cov_type='robust')

print(be_results.summary())

print(f"\n{'='*70}")
print("OBSERVATION COUNT:")
print(f"{'='*70}")
print(f"Original observations (NT): {len(data)}")
print(f"BE sample size (N): {be_results.nobs}")
print(f"Average T per firm: {len(data) / data['firm'].nunique():.1f}")

In [None]:
# Inspect entity means used in BE
entity_means = data.groupby('firm')[['invest', 'value', 'capital']].mean()

print("="*70)
print("ENTITY MEANS (used in Between Estimator)")
print("="*70)
print(entity_means.head(10))
print(f"\nShape: {entity_means.shape} (one row per firm)")

### 2.2 When to Use Between Estimator

**Use BE when**:
1. **Long-run relationships**: "Do high-value firms invest more?"
2. **Time-invariant variables**: Education, gender, location
3. **T small**: Limited time variation
4. **Exploratory**: Understand cross-sectional patterns

**Don't use BE when**:
1. **Unobserved heterogeneity**: α_i correlated with x_it (use FE/FD)
2. **Reverse causality**: Need within variation for identification
3. **T large**: FE/FD more efficient

### 2.3 FE vs BE Interpretation

**Critical distinction**:

- **FE (within)**: "When a firm increases value by 1, investment changes by β_FE"
  - Interpretation: **Within-firm changes**
  - Controls for time-invariant α_i

- **BE (between)**: "Firms with 1 unit higher average value have β_BE higher investment"
  - Interpretation: **Cross-sectional differences**
  - Confounded by α_i

**Different questions, different answers!**

In [None]:
# Compare FE and BE coefficients
fe_be_comp = pd.DataFrame({
    'FE (within)': fe_results.params,
    'BE (between)': be_results.params,
    'Ratio (BE/FE)': be_results.params / fe_results.params
})

print("="*70)
print("FE vs BE COEFFICIENT COMPARISON")
print("="*70)
print(fe_be_comp)

print("\n" + "="*70)
print("INTERPRETATION:")
print("="*70)
print("FE (value): When firm increases value by 1, invest increases by", 
      f"{fe_results.params['value']:.4f}")
print("BE (value): Firms with 1 unit higher avg value invest",
      f"{be_results.params['value']:.4f} more")
print("\nA gap between FE and BE is consistent with α_i being correlated with the regressors")

---

## Section 3: Variance Decomposition

### 3.1 Within vs Between Variance

Total variation in x_it can be decomposed:

$$
\text{Var}(x_{it}) = \underbrace{\text{Var}(x_{it} - \bar{x}_i)}_{\text{Within}} + \underbrace{\text{Var}(\bar{x}_i)}_{\text{Between}}
$$

**Within variance**: Fluctuations around entity mean (used by FE, FD)  
**Between variance**: Differences in entity means (used by BE)
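The decomposition can be verified by expanding the total sum of squares. The notation below is added here for the derivation: $\bar{x}$ is the grand mean and $n = \sum_i T_i$ is the total number of observations.

$$
\begin{align}
\sum_{i}\sum_{t}(x_{it}-\bar{x})^2 &= \sum_{i}\sum_{t}\left[(x_{it}-\bar{x}_i) + (\bar{x}_i-\bar{x})\right]^2 \\
&= \sum_{i}\sum_{t}(x_{it}-\bar{x}_i)^2 + \sum_{i} T_i(\bar{x}_i-\bar{x})^2 + 2\sum_{i}(\bar{x}_i-\bar{x})\underbrace{\sum_{t}(x_{it}-\bar{x}_i)}_{=0}
\end{align}
$$

Dividing by $n$ gives Total = Within + Between exactly, with each entity mean weighted by $T_i$. With sample variances (divisors $n-1$ and $N-1$) the identity holds only approximately.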

**Why it matters**:
- FE/FD only work if **within variance > 0**
- Time-invariant variables have **zero within variance**
- High within variance → more precise FE estimates

In [None]:
# Function to compute variance decomposition
def variance_decomposition(data, entity_col, var):
    """
    Decompose total variance into within and between components.
    
    Parameters
    ----------
    data : DataFrame
    entity_col : str
    var : str
        Variable name
    
    Returns
    -------
    dict
    """
    # Population variances (ddof=0) so that Total = Within + Between exactly
    var_total = data[var].var(ddof=0)
    
    # Entity mean repeated on every row, which weights each entity by T_i
    entity_mean = data.groupby(entity_col)[var].transform('mean')
    
    # Within variance: Var(x_it - x̄_i)
    var_within = (data[var] - entity_mean).var(ddof=0)
    
    # Between variance: Var(x̄_i)
    var_between = entity_mean.var(ddof=0)
    
    return {
        'Variable': var,
        'Total': var_total,
        'Within': var_within,
        'Between': var_between,
        'Pct_Within': var_within / var_total * 100,
        'Pct_Between': var_between / var_total * 100
    }

# Compute for all variables
var_decomp_list = []
for var in ['invest', 'value', 'capital']:
    var_decomp_list.append(variance_decomposition(data, 'firm', var))

var_decomp_df = pd.DataFrame(var_decomp_list)

print("="*70)
print("VARIANCE DECOMPOSITION")
print("="*70)
print(var_decomp_df.to_string(index=False))

print("\n" + "="*70)
print("INTERPRETATION:")
print("="*70)
print(f"- 'invest' has {var_decomp_df.loc[0, 'Pct_Within']:.1f}% within variation")
print(f"  → This is the share of variation that FE/FD can use")
print(f"\n- 'value' has {var_decomp_df.loc[1, 'Pct_Within']:.1f}% within variation")
print(f"  → The remainder is between-firm variation (used by BE)")
print(f"\n- 'capital' has {var_decomp_df.loc[2, 'Pct_Within']:.1f}% within variation")
print(f"  → A low within share means less precise FE/FD estimates")

### 3.2 Visualization: Within vs Between Scatter

Let's visualize the difference between within and between variation:

In [None]:
# Create within vs between scatter plots
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Panel A: Between variation (entity means)
firm_means = data.groupby('firm')[['invest', 'value']].mean()
axes[0].scatter(firm_means['value'], firm_means['invest'], 
                s=100, alpha=0.7, edgecolors='black', linewidth=0.5)

# Add regression line
z = np.polyfit(firm_means['value'], firm_means['invest'], 1)
p = np.poly1d(z)
x_line = np.linspace(firm_means['value'].min(), firm_means['value'].max(), 100)
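# The line is a bivariate fit, so it is labeled with its own slope
# (the BE model above also controls for capital)
axes[0].plot(x_line, p(x_line), 'r--', linewidth=2, 
             label=f'Bivariate between slope = {z[0]:.4f}')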

axes[0].set_xlabel('Average Value (x̄_i)', fontsize=11)
axes[0].set_ylabel('Average Investment (ȳ_i)', fontsize=11)
axes[0].set_title('Between-Firm Variation (BE uses this)', fontsize=12, fontweight='bold')
axes[0].legend()
axes[0].grid(alpha=0.3)

# Panel B: Within variation (deviations from entity means)
data_dm = data.copy()
for col in ['invest', 'value']:
    data_dm[col + '_dm'] = data.groupby('firm')[col].transform(lambda x: x - x.mean())

axes[1].scatter(data_dm['value_dm'], data_dm['invest_dm'], 
                alpha=0.5, s=30, edgecolors='none')

# Add regression line
z = np.polyfit(data_dm['value_dm'], data_dm['invest_dm'], 1)
p = np.poly1d(z)
x_line = np.linspace(data_dm['value_dm'].min(), data_dm['value_dm'].max(), 100)
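# The line is a bivariate fit, so it is labeled with its own slope
# (the FE model above also controls for capital)
axes[1].plot(x_line, p(x_line), 'r--', linewidth=2, 
             label=f'Bivariate within slope = {z[0]:.4f}')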

axes[1].axhline(0, color='black', linestyle='-', linewidth=0.8)
axes[1].axvline(0, color='black', linestyle='-', linewidth=0.8)
axes[1].set_xlabel('Value - Firm Mean (x_it - x̄_i)', fontsize=11)
axes[1].set_ylabel('Investment - Firm Mean (y_it - ȳ_i)', fontsize=11)
axes[1].set_title('Within-Firm Variation (FE/FD use this)', fontsize=12, fontweight='bold')
axes[1].legend()
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\nKEY INSIGHT:")
print("- Left panel: Cross-sectional differences (between firms)")
print("- Right panel: Time variation within each firm (centered at 0)")
print("- Slopes differ → α_i is likely correlated with the regressors")

---

## Section 4: Comparison of All Estimators

### 4.1 Comprehensive Comparison

Let's compare **all four estimators**:
1. **Pooled OLS**: Ignores panel structure
2. **Fixed Effects (FE)**: Within transformation
3. **First Difference (FD)**: Differencing transformation
4. **Between Estimator (BE)**: Entity means

This will show how different sources of variation lead to different estimates.

In [None]:
# Estimate all four models
pooled = pb.PooledOLS(
    "invest ~ value + capital", 
    data, 
    entity_col='firm', 
    time_col='year'
).fit(cov_type='clustered')

fe = pb.FixedEffects(
    "invest ~ value + capital", 
    data, 
    entity_col='firm', 
    time_col='year'
).fit(cov_type='clustered')

fd = pb.FirstDifferenceEstimator(
    "invest ~ value + capital", 
    data, 
    entity_col='firm', 
    time_col='year'
).fit(cov_type='clustered')

be = pb.BetweenEstimator(
    "invest ~ value + capital", 
    data, 
    entity_col='firm', 
    time_col='year'
).fit(cov_type='robust')

print("All models estimated successfully!")

In [None]:
# Compare coefficients
coef_comp = pd.DataFrame({
    'Pooled': pooled.params,
    'FE': fe.params,
    'FD': fd.params,
    'BE': be.params
})

print("="*70)
print("COEFFICIENT COMPARISON")
print("="*70)
print(coef_comp)

# Compare standard errors
se_comp = pd.DataFrame({
    'Pooled': pooled.std_errors,
    'FE': fe.std_errors,
    'FD': fd.std_errors,
    'BE': be.std_errors
})

print("\n" + "="*70)
print("STANDARD ERROR COMPARISON")
print("="*70)
print(se_comp)

# Compare R-squared
r2_comp = pd.DataFrame({
    'Estimator': ['Pooled', 'FE', 'FD', 'BE'],
    'R²': [pooled.r2, fe.r2, fd.r2, be.r2],
    'N': [pooled.nobs, fe.nobs, fd.nobs, be.nobs]
})

print("\n" + "="*70)
print("MODEL FIT COMPARISON")
print("="*70)
print(r2_comp.to_string(index=False))

**Key patterns**:
1. **Pooled** has the largest sample, but is biased when α_i is correlated with the regressors
2. **FE** and **FD** similar (both use within variation)
3. **BE** different (uses between variation, smaller N)
4. BE > FE is consistent with **positive correlation** between x_it and α_i
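The last pattern can be reproduced in a simulation where α_i is built to be positively correlated with x_it. This is a sketch on made-up data with hand-computed single-regressor slopes; it does not use the Grunfeld data or PanelBox.

```python
import numpy as np

rng = np.random.default_rng(6)
n_entities, n_periods, beta_true = 300, 5, 0.5

alpha = rng.normal(size=(n_entities, 1))                      # entity effects
x = alpha + rng.normal(size=(n_entities, n_periods))          # Cov(x_it, alpha_i) > 0
y = beta_true * x + alpha + rng.normal(scale=0.5, size=(n_entities, n_periods))

# Between slope: regress entity means of y on entity means of x
x_bar, y_bar = x.mean(axis=1), y.mean(axis=1)
beta_between = np.polyfit(x_bar, y_bar, 1)[0]

# Within slope: regress demeaned y on demeaned x (no intercept)
xw, yw = x - x_bar[:, None], y - y_bar[:, None]
beta_within = (xw * yw).sum() / (xw ** 2).sum()

print(f"True beta:     {beta_true}")
print(f"Within slope:  {beta_within:.3f}")
print(f"Between slope: {beta_between:.3f}")
```

The within slope stays near the true β because demeaning removes α_i, while the between slope absorbs the correlation between x̄_i and α_i and is pushed upward.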

### 4.2 Coefficient Plot with Confidence Intervals

Visualize coefficient estimates with 95% confidence intervals:

In [None]:
# Create coefficient plot with CIs
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

vars_to_plot = ['value', 'capital']
estimators = ['Pooled', 'FE', 'FD', 'BE']
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728']

for i, var in enumerate(vars_to_plot):
    ax = axes[i]
    
    x_pos = np.arange(len(estimators))
    coefs = [coef_comp.loc[var, est] for est in estimators]
    ses = [se_comp.loc[var, est] for est in estimators]
    ci_lower = [c - 1.96*se for c, se in zip(coefs, ses)]
    ci_upper = [c + 1.96*se for c, se in zip(coefs, ses)]
    
    # Plot coefficients
    for j, (x, c, lower, upper, color) in enumerate(zip(x_pos, coefs, ci_lower, ci_upper, colors)):
        ax.plot([x, x], [lower, upper], color=color, linewidth=2)
        ax.scatter(x, c, s=100, color=color, zorder=3, edgecolors='black', linewidth=1)
    
    ax.axhline(0, color='black', linestyle='--', linewidth=0.8, alpha=0.5)
    ax.set_xticks(x_pos)
    ax.set_xticklabels(estimators, fontsize=10)
    ax.set_ylabel('Coefficient Estimate', fontsize=11)
    ax.set_title(f'Variable: {var}', fontsize=12, fontweight='bold')
    ax.grid(alpha=0.3, axis='y')

plt.suptitle('Comparison of Estimators (with 95% CI)', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

print("\nINTERPRETATION:")
print("- Vertical lines represent 95% confidence intervals")
print("- BE estimates often differ from FE/FD (unobserved heterogeneity)")
print("- Pooled may be biased if α_i correlated with x_it")

### 4.3 Summary Table

Let's create a comprehensive summary table:

In [None]:
# Create summary table
summary_table = pd.DataFrame({
    'Estimator': ['Pooled OLS', 'Fixed Effects', 'First Difference', 'Between'],
    'Transformation': ['None', 'Demeaning', 'Differencing', 'Entity means'],
    'Controls α_i': ['No', 'Yes', 'Yes', 'No'],
    'Sample size': [pooled.nobs, fe.nobs, fd.nobs, be.nobs],
    'β_value': [f"{pooled.params['value']:.4f}",
                f"{fe.params['value']:.4f}",
                f"{fd.params['value']:.4f}",
                f"{be.params['value']:.4f}"],
    'SE_value': [f"{pooled.std_errors['value']:.4f}",
                 f"{fe.std_errors['value']:.4f}",
                 f"{fd.std_errors['value']:.4f}",
                 f"{be.std_errors['value']:.4f}"],
    'R²': [f"{pooled.r2:.4f}", 
           f"{fe.r2:.4f}", 
           f"{fd.r2:.4f}", 
           f"{be.r2:.4f}"]
})

print("="*90)
print("COMPREHENSIVE ESTIMATOR COMPARISON")
print("="*90)
print(summary_table.to_string(index=False))

print("\n" + "="*90)
print("WHEN TO USE EACH ESTIMATOR:")
print("="*90)
print("Pooled OLS:        No unobserved heterogeneity (rare in panel data)")
print("Fixed Effects:     Standard choice, ε ~ i.i.d., controls α_i")
print("First Difference:  Highly persistent errors, unit roots, T small")
print("Between:           Long-run cross-sectional, time-invariant vars, exploratory")

---

## Section 5: Exercises

### Exercise 5.1: FD vs FE with Serial Correlation

**Objective**: Compare FD and FE performance when errors have AR(1) structure.

**Instructions**:
1. Simulate panel data with AR(1) errors: ε_it = ρ ε_{i,t-1} + η_it
2. Estimate both FE and FD
3. Compare the bias and the sampling variability of the coefficient estimates
4. Which estimator is more precise at this value of ρ? How does the answer change as ρ approaches 1?

In [None]:
# TODO: Exercise 5.1
# Simulate panel data with AR(1) errors

np.random.seed(42)
N = 100  # firms
T = 10   # time periods
rho = 0.7  # AR(1) coefficient
beta_true = 0.5

# Your code here:
# 1. Simulate x_it, α_i, ε_it with AR(1), y_it
# 2. Create DataFrame
# 3. Estimate FE and FD
# 4. Compare bias: |β̂ - β_true|

# Starter code:
# alpha_i = np.random.normal(0, 1, N)
# x_it = np.random.normal(0, 1, (N, T))
# epsilon_it = np.zeros((N, T))
# for t in range(1, T):
#     epsilon_it[:, t] = rho * epsilon_it[:, t-1] + np.random.normal(0, 0.5, N)
# ...

### Exercise 5.2: Variance Decomposition

**Objective**: Analyze variance decomposition for wage panel data.

**Instructions**:
1. Load `wage_panel` dataset (if available, otherwise use another dataset)
2. Compute variance decomposition for `experience` and `education`
3. Which variable has more within variation?
4. What implications does this have for FE vs BE?

In [None]:
# TODO: Exercise 5.2
# Analyze variance decomposition

# Your code here:
# 1. Load wage_panel data (or use Grunfeld if unavailable)
# 2. Apply variance_decomposition() to relevant variables
# 3. Create bar chart comparing within vs between percentages
# 4. Interpret results

# Starter code:
# data_wage = pb.load_dataset('wage_panel')  # or use another dataset
# var_decomp_wage = [
#     variance_decomposition(data_wage, 'person_id', 'experience'),
#     variance_decomposition(data_wage, 'person_id', 'education')
# ]
# ...

### Exercise 5.3: Hausman Test for FE vs FD

**Objective**: Test whether FE and FD give statistically different results.

**Instructions**:
1. Compute Hausman test statistic: H = (β̂_FD - β̂_FE)' [Var(β̂_FD) - Var(β̂_FE)]^{-1} (β̂_FD - β̂_FE)
2. Under H0 (strict exogeneity and i.i.d. ε_it, so that FE is efficient), H ~ χ²(k)
3. If H0 is rejected, FE and FD differ by more than sampling error, which points to a violation of strict exogeneity (serial correlation alone leaves both estimators consistent)
4. Apply to Grunfeld data

In [None]:
# TODO: Exercise 5.3
# Implement Hausman test for FE vs FD

# Your code here:
# 1. Extract β̂_FE, β̂_FD and covariance matrices
#    (use non-robust covariances: the variance difference is only valid when FE is efficient)
# 2. Compute difference: d = β̂_FD - β̂_FE
# 3. Compute variance: V = Var(β̂_FD) - Var(β̂_FE)
# 4. Compute H = d' V^{-1} d
# 5. Compare to χ²(k) critical value

# Starter code:
# beta_diff = fd_results.params - fe_results.params
# var_fe = fe_results.cov
# var_fd = fd_results.cov
# var_diff = var_fd - var_fe
# H = beta_diff.T @ np.linalg.inv(var_diff) @ beta_diff
# p_value = 1 - stats.chi2.cdf(H, df=len(beta_diff))
# ...

---

## Section 6: Summary

### Key Takeaways

1. **First Difference (FD)**:
   - Eliminates α_i via differencing: Δy_it = β Δx_it + Δε_it
   - Well suited to highly persistent errors and unit roots
   - Induces MA(1) in Δε_it when ε_it is i.i.d. → use clustered or other robust SE
   - Loses first observation per entity

2. **FD vs FE**:
   - T=2: Numerically equivalent
   - ε ~ i.i.d.: FE more efficient
   - ε ~ AR(1): both consistent under strict exogeneity; FD more efficient as ρ approaches 1
   - Unit roots: FD efficient, FE inefficient

3. **Between Estimator (BE)**:
   - Uses entity means: ȳ_i = β x̄_i + α + ū_i
   - Sample size N (not NT)
   - Cross-sectional relationships
   - Can include time-invariant variables

4. **FE vs BE interpretation**:
   - FE: "Within-firm changes"
   - BE: "Between-firm differences"
   - Different questions, different answers!

5. **Variance decomposition**:
   - Total = Within + Between
   - FE/FD need within variation > 0
   - Time-invariant vars have zero within variance

6. **Estimator choice**:
   - Mild serial correlation → FE with clustered SE
   - Highly persistent errors or unit roots → FD
   - Time-invariant vars → BE (or Random Effects)
   - Standard case → FE

### Decision Tree

```
Panel Data Model
│
├─ Unobserved heterogeneity (α_i) present?
│  ├─ No → Pooled OLS
│  └─ Yes
│     │
│     ├─ ε_it highly persistent (near unit root)?
│     │  ├─ Yes → First Difference (FD)
│     │  └─ No
│     │     │
│     │     ├─ T = 2? → FD or FE (equivalent)
│     │     └─ T > 2 → Fixed Effects (FE) with clustered SE
│     │
│     └─ Interest in cross-sectional or time-invariant vars?
│        └─ Yes → Between Estimator (BE) or Random Effects
```

### Next Steps

- **Notebook 05**: Panel IV (endogeneity treatment)
- **Notebook 06**: Random Effects and Hausman test
- **Advanced**: Dynamic panels with lagged dependent variables

---

## References

1. Wooldridge, J. M. (2010). *Econometric Analysis of Cross Section and Panel Data*. MIT Press. Chapter 10.
2. Baltagi, B. H. (2021). *Econometric Analysis of Panel Data*. Springer. Chapter 3.
3. Cameron, A. C., & Trivedi, P. K. (2005). *Microeconometrics: Methods and Applications*. Cambridge University Press. Chapter 21.
4. Hsiao, C. (2014). *Analysis of Panel Data*. Cambridge University Press. Chapter 3.

---

**Notebook complete!** You now understand when to use FD vs FE vs BE and how to decompose variance in panel data.