
# Panel Data Methods — Fixed Effects (FE) & Random Effects (RE) — Python Notebook

**When to Use**  
- You have **repeated observations** for entities over time (stores, geos, users) and need to control for **unobserved, time‑invariant heterogeneity**.  
- You want to estimate the effects of time‑varying regressors (price, promotions, ad spend) while accounting for **entity‑specific baselines**.

**Best Application**  
- Measuring **advertising or price elasticity** across stores/geos over weeks.  
- Policy changes / experiments staggered across entities, with **time fixed effects** to absorb common shocks.

**When Not to Use**  
- Cross‑section only (no panel) → use standard regression.  
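- Coefficients on **time‑invariant regressors** (e.g., store region) are not identified under FE because they are collinear with the entity effects (use RE or between estimators, or interact them with time).  
- A **lagged dependent variable** in a short panel causes **dynamic panel (Nickell) bias** under FE; consider Arellano–Bond/GMM.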
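
The point about time‑invariant regressors can be seen directly from the within transformation. The sketch below uses a small hypothetical panel (the `toy` frame and its `store_size` column are illustrative, not part of this notebook's data) and shows that subtracting store means leaves no variation in a regressor that never changes within a store.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel: 3 stores x 4 weeks; store_size never changes within a store
toy = pd.DataFrame({
    "store": np.repeat(["A", "B", "C"], 4),
    "week": np.tile(np.arange(1, 5), 3),
    "store_size": np.repeat([100.0, 250.0, 400.0], 4),
    "price": [9.0, 9.5, 8.8, 9.2, 10.1, 9.9, 10.4, 10.0, 8.5, 8.7, 8.4, 8.9],
})

cols = ["store_size", "price"]
# Within transformation: subtract each store's own mean from every row
within = toy[cols] - toy.groupby("store")[cols].transform("mean")

# store_size has zero within-store variance, so FE cannot estimate its coefficient
print(within.var())
```

After demeaning, `store_size` is identically zero while `price` still varies, which is why FE can estimate a price effect but not a store‑size effect.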

**How to Interpret Results**  
- **FE coefficients** measure within‑entity effects (e.g., how changing price *within* a store relates to sales).  
- **RE coefficients** combine within & between variation, assuming entity effects are **uncorrelated** with regressors.  
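- Use a **Hausman test** to compare the two: the null hypothesis is that entity effects are uncorrelated with the regressors, so that both estimators are consistent and RE is efficient. If the null is rejected, prefer FE.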
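
To make "within‑entity effects" concrete, here is a minimal sketch on simulated data (all names and parameter values are assumptions chosen for illustration). It builds a regressor that is correlated with the entity effect, then compares the within estimator (demeaning by entity), least squares with entity dummies (LSDV), and pooled OLS.

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_periods = 5, 8
entity = np.repeat(np.arange(n_entities), n_periods)

# Entity effect is correlated with x, so pooled OLS is biased for the true slope of 2.0
alpha = rng.normal(0, 3, n_entities)
x = alpha[entity] + rng.normal(0, 1, entity.size)
y = 2.0 * x + alpha[entity] + rng.normal(0, 0.5, entity.size)

def group_means(v, g):
    """Per-observation mean of v within each group g (integer codes 0..G-1)."""
    return (np.bincount(g, weights=v) / np.bincount(g))[g]

# Within estimator: demean x and y by entity, then regress without an intercept
x_w = x - group_means(x, entity)
y_w = y - group_means(y, entity)
beta_within = (x_w @ y_w) / (x_w @ x_w)

# LSDV: regress y on x plus one dummy per entity
D = (entity[:, None] == np.arange(n_entities)).astype(float)
beta_lsdv = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)[0][0]

# Pooled OLS ignores the entity effect entirely
X_pooled = np.column_stack([np.ones_like(x), x])
beta_pooled = np.linalg.lstsq(X_pooled, y, rcond=None)[0][1]

print(f"within={beta_within:.4f}  LSDV={beta_lsdv:.4f}  pooled={beta_pooled:.4f}")
```

The within and LSDV slopes coincide up to floating‑point error, which is the sense in which FE uses only within‑entity variation.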


In [None]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf

pd.set_option('display.max_columns', 200)
plt.rcParams['figure.figsize'] = (8,4)
rng = np.random.default_rng(2025)


### Data: Synthetic store‑week panel with price, promo, and ad spend

In [None]:

n_stores = 60
n_weeks = 52

store_ids = np.arange(n_stores)
weeks = np.arange(1, n_weeks+1)

rows = []
for s in store_ids:
    store_fe = rng.normal(0, 15)   # unobserved store baseline (e.g., location quality)
    ad_effect = rng.normal(0.08, 0.02)  # drawn but unused: the DGP below applies a common 0.25*sqrt(adsp) response
    for t in weeks:
        time_fe = 10*np.sin(2*np.pi*t/52) + 5*np.cos(2*np.pi*t/26)  # seasonal time effects
        price = 9 + rng.normal(0, 0.7)
        promo = rng.binomial(1, 0.22)
        adsp = max(0, rng.normal(50, 12))
        # DGP: sales depend on price(-), promo(+), adsp(+ diminishing), plus FE and time FE
        sales = (200 + store_fe + time_fe
                 - 8.0*price
                 + 12.0*promo
                 + 0.25*np.sqrt(adsp)
                 + rng.normal(0, 8))
        rows.append({'store': f"S{s:02d}", 'week': t, 'sales': sales,
                     'price': price, 'promo': promo, 'ad_spend': adsp})

df = pd.DataFrame(rows)
df.head()


### Fixed Effects: Entity and Time FE via OLS with Dummies (Two‑Way FE)

In [None]:

# Two-way FE: absorb store and week effects with a full set of dummies (LSDV).
# The slope estimates equal those of the within (demeaning) estimator; SEs are clustered by store.
fe_formula = "sales ~ price + promo + np.sqrt(ad_spend) + C(store) + C(week)"
fe_model = smf.ols(fe_formula, data=df).fit(cov_type='cluster', cov_kwds={'groups': df['store']})
print(fe_model.summary().tables[1])
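
As a check on the claim that dummies and demeaning give the same slopes, the sketch below runs both on a small simulated balanced panel using only NumPy (sizes, seeds, and coefficients are assumptions for illustration). The double‑demeaning formula as written is valid only when every entity is observed in every period.

```python
import numpy as np

rng = np.random.default_rng(1)
n_i, n_t = 6, 10
i_idx = np.repeat(np.arange(n_i), n_t)   # entity code per row (entity-major order)
t_idx = np.tile(np.arange(n_t), n_i)     # period code per row

x = rng.normal(size=n_i * n_t) + 0.5 * i_idx
y = (1.5 * x
     + rng.normal(0, 2, n_i)[i_idx]      # entity effects
     + rng.normal(0, 2, n_t)[t_idx]      # period effects
     + rng.normal(0, 0.3, n_i * n_t))

def two_way_demean(v):
    """v_it - mean_i - mean_t + grand mean; valid as written only for balanced panels."""
    m = v.reshape(n_i, n_t)
    out = m - m.mean(axis=1, keepdims=True) - m.mean(axis=0, keepdims=True) + m.mean()
    return out.ravel()

x_dd, y_dd = two_way_demean(x), two_way_demean(y)
beta_demeaned = (x_dd @ y_dd) / (x_dd @ x_dd)

# Same slope from OLS on x plus entity dummies and period dummies (first period dropped)
D_i = (i_idx[:, None] == np.arange(n_i)).astype(float)
D_t = (t_idx[:, None] == np.arange(1, n_t)).astype(float)
beta_dummies = np.linalg.lstsq(np.column_stack([x, D_i, D_t]), y, rcond=None)[0][0]

print(f"demeaned={beta_demeaned:.4f}  dummies={beta_dummies:.4f}")
```

With many entities, demeaning avoids building a wide dummy matrix, though standard errors then need a degrees‑of‑freedom adjustment for the absorbed effects.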


### Random Effects: Random Intercepts with Mixed Effects (MixedLM)

In [None]:

# Mixed effects with random intercept for store and fixed time dummies
# Use week FE to absorb common shocks, random intercept to capture store heterogeneity
re_formula = "sales ~ price + promo + np.sqrt(ad_spend) + C(week)"
# groups names the column that defines the random-intercept clusters; fit() uses REML by default
md = sm.MixedLM.from_formula(re_formula, groups="store", re_formula="1", data=df)
re_model = md.fit()
re_model.summary()
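
One way to see how RE sits between pooled OLS and FE is the quasi‑demeaning weight of the textbook GLS random‑effects estimator for a balanced panel, $\theta = 1 - \sqrt{\sigma_e^2 / (\sigma_e^2 + T\,\sigma_u^2)}$. The sketch below evaluates it at the standard deviations used in the synthetic DGP above (15 for the store effect, 8 for the noise, 52 weeks). `MixedLM` estimates the model by likelihood methods rather than this explicit transformation, so treat the calculation as an analogy; `re_theta` is a hypothetical helper.

```python
import numpy as np

def re_theta(sigma_u, sigma_e, n_periods):
    """Quasi-demeaning weight for a balanced one-way random-effects model.

    theta = 0 reproduces pooled OLS; theta = 1 reproduces the within (FE) estimator.
    """
    return 1.0 - np.sqrt(sigma_e**2 / (sigma_e**2 + n_periods * sigma_u**2))

# Values taken from the synthetic DGP: store effect sd 15, noise sd 8, 52 weeks
theta = re_theta(sigma_u=15.0, sigma_e=8.0, n_periods=52)
print(f"theta = {theta:.4f}")

# With no entity-level variance the weight collapses to zero (pooled OLS)
print(re_theta(sigma_u=0.0, sigma_e=8.0, n_periods=52))
```

Here the weight is roughly 0.93, so RE subtracts most of each store's mean and should land close to FE on this data.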


### Compare Coefficients (FE vs RE)

In [None]:

coef_fe = fe_model.params[['price','promo','np.sqrt(ad_spend)']]
se_fe = fe_model.bse[['price','promo','np.sqrt(ad_spend)']]

coef_re = re_model.params[['price','promo','np.sqrt(ad_spend)']]
se_re = re_model.bse[['price','promo','np.sqrt(ad_spend)']]

pd.DataFrame({
    'FE_coef': coef_fe.round(4),
    'FE_se(clustered)': se_fe.round(4),
    'RE_coef': coef_re.round(4),
    'RE_se': se_re.round(4),
})


### Hausman Test: Is RE Consistent vs FE?

In [None]:

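# Hausman statistic H = (b_FE - b_RE)' [Var(b_FE) - Var(b_RE)]^{-1} (b_FE - b_RE)
# Under H0 (RE consistent and efficient) Var(b_FE) - Var(b_RE) is positive semi-definite,
# so the difference is FE minus RE, built from full conventional (non-robust) covariances.
from scipy.stats import chi2

names = ['price', 'promo', 'np.sqrt(ad_spend)']
b_fe = coef_fe.values
b_re = coef_re.values

# Refit FE with the default covariance: the clustered SEs above are not valid for this test
fe_plain = smf.ols(fe_formula, data=df).fit()
i_fe = fe_plain.params.index.get_indexer(names)
i_re = re_model.params.index.get_indexer(names)
V_fe = np.asarray(fe_plain.cov_params())[np.ix_(i_fe, i_fe)]
V_re = np.asarray(re_model.cov_params())[np.ix_(i_re, i_re)]
V_diff = V_fe - V_re

diff = b_fe - b_re
df_h = len(b_fe)

try:
    # In finite samples V_diff may fail to be positive definite; a negative H signals that
    H = float(diff @ np.linalg.solve(V_diff, diff))
    pval = float(chi2.sf(H, df_h)) if H >= 0 else np.nan
except np.linalg.LinAlgError:
    H, pval = np.nan, np.nan

{'Hausman_stat': H, 'df': df_h, 'p_value': pval}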
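
The arithmetic of the test is easy to check by hand on small numbers. The sketch below wraps it in a hypothetical helper, `hausman`, and applies it to made‑up coefficients and covariance matrices (not results from this notebook), taking the difference as FE minus RE.

```python
import numpy as np
from scipy.stats import chi2

def hausman(b_fe, b_re, V_fe, V_re):
    """Hausman statistic, degrees of freedom, and p-value for FE vs RE slopes.

    V_fe and V_re should be conventional (non-robust) covariance matrices
    for the same coefficients in the same order.
    """
    d = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    V_diff = np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float)
    stat = float(d @ np.linalg.solve(V_diff, d))
    dof = d.size
    return stat, dof, float(chi2.sf(stat, dof))

# Hypothetical numbers, for illustration only
stat, dof, pval = hausman(
    b_fe=[-8.1, 12.3], b_re=[-7.9, 12.2],
    V_fe=[[0.04, 0.0], [0.0, 0.09]], V_re=[[0.03, 0.0], [0.0, 0.08]],
)
print(f"H={stat:.3f}, df={dof}, p={pval:.4f}")
```

In this example the coefficient gaps are (-0.2, 0.1) and each variance difference is 0.01, so H = 0.04/0.01 + 0.01/0.01 = 5 on 2 degrees of freedom, which does not reject RE at the 5% level.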


### Interpretation: Within‑Store Effects and Practical Use

In [None]:

print("FE interpretation: coefficients reflect within‑store changes holding store and week effects constant.")
print("Price (expected negative), Promo (positive lift), sqrt(ad_spend) (diminishing returns).")

# Example: effect of +$10 ad spend at avg level
avg_adsp = df['ad_spend'].mean()
beta_ad = fe_model.params['np.sqrt(ad_spend)']
d_sales = beta_ad * (np.sqrt(avg_adsp+10) - np.sqrt(avg_adsp))
print(f"Approx ΔSales from +$10 ad spend (at avg level): {d_sales:.3f}")
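
Because ad spend enters through a square root, the same +$10 buys less at higher baseline spend. The sketch below uses a hypothetical helper, `ad_spend_effect`, with the DGP coefficient of 0.25 (rather than the fitted estimate) to compare the exact change with the first‑order approximation $\beta / (2\sqrt{x}) \cdot \Delta$ at several baselines.

```python
import numpy as np

def ad_spend_effect(beta, base, delta):
    """Change in predicted sales when ad spend moves from base to base + delta
    under a beta * sqrt(ad_spend) specification."""
    return beta * (np.sqrt(base + delta) - np.sqrt(base))

beta = 0.25  # coefficient on sqrt(ad_spend) used in the synthetic DGP
effects = {}
for base in (10.0, 50.0, 100.0):
    exact = ad_spend_effect(beta, base, 10.0)
    linear = beta / (2.0 * np.sqrt(base)) * 10.0   # first-order (derivative) approximation
    effects[base] = (exact, linear)
    print(f"base={base:5.1f}  exact={exact:.4f}  linear approx={linear:.4f}")
```

The derivative‑based approximation always overstates the exact change for a positive increment because the square root is concave.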



---

### Practical Guidance
- Prefer **two‑way FE** (entity + time) to absorb unobserved heterogeneity and common shocks.  
- Use **cluster‑robust SE** at the entity level so that inference allows for serial correlation within each entity.  
- Check **Hausman**: if RE is rejected, stick with FE; otherwise RE (or mixed models) may be more efficient.  
- For time‑invariant regressors, consider **RE** or **between** estimators.
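
To illustrate what clustering at the entity level does, here is a minimal NumPy sketch of the cluster‑robust sandwich estimator on simulated data where both the regressor and the error share a group‑level component. The helper `cluster_robust_cov` is hypothetical and omits the small‑sample corrections that statsmodels applies by default, so its numbers will differ slightly from `cov_type='cluster'`.

```python
import numpy as np

def cluster_robust_cov(X, resid, groups):
    """Sandwich covariance (X'X)^-1 [sum_g X_g' u_g u_g' X_g] (X'X)^-1,
    without any small-sample correction."""
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        mask = groups == g
        score = X[mask].T @ resid[mask]   # k-vector of summed scores for cluster g
        meat += np.outer(score, score)
    return bread @ meat @ bread

rng = np.random.default_rng(3)
n_groups, n_per = 100, 10
groups = np.repeat(np.arange(n_groups), n_per)

# Regressor and error both carry a strong group-level component
x = rng.normal(0, 2, n_groups)[groups] + rng.normal(size=groups.size)
u = rng.normal(0, 2, n_groups)[groups] + rng.normal(size=groups.size)
y = 1.0 + 0.5 * x + u

X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

se_cluster = np.sqrt(np.diag(cluster_robust_cov(X, resid, groups)))
# Classical OLS SEs assume independent, homoskedastic errors
se_classic = np.sqrt(np.diag(np.linalg.inv(X.T @ X) * resid.var(ddof=X.shape[1])))

print("clustered SE:", se_cluster.round(4))
print("classical SE:", se_classic.round(4))
```

When errors are correlated within an entity, the classical standard errors tend to be too small, and the clustered ones are the safer default for panel regressions like the ones above.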

