# Dynamic panel regression (Anderson-Hsiao estimator)

In [None]:
import numpy as np
import pandas as pd

from linearmodels import PooledOLS          # Pooled model
from linearmodels import RandomEffects      # Random-effect model
from linearmodels import PanelOLS           # Fixed-effect model
from linearmodels import FirstDifferenceOLS # First difference model
from linearmodels import IVGMM              # IV-method

Consider the `Wages` panel and the regression of **lwage on ed, exp, exp^2, wks**.

Specification of the dynamic panel regression:

$$
	lwage_{it}=\alpha+\gamma lwage_{i,t-1}+\beta_1ed_i+\beta_2exp_{it}+\beta_3exp^2_{it}+\beta_4wks_{it}+\mu_i+\varepsilon_{it}
$$

Anderson-Hsiao's estimation method:

* Rewrite the model in first differences (the FD transformation), which eliminates the time-invariant components $\alpha$, $ed_i$ and $\mu_i$:

$$
	\Delta lwage_{it}=\gamma\Delta lwage_{i,t-1}+\beta_2\Delta exp_{it}+\beta_3\Delta exp^2_{it}+\beta_4\Delta wks_{it}+\Delta\varepsilon_{it}
$$
* Estimate the differenced equation by IV, using $lwage_{i,t-2}$ or $\Delta lwage_{i,t-2}$ as an instrumental variable for $\Delta lwage_{i,t-1}$

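*Remark*: since $ed$ is time-invariant, $\Delta ed=0$ and the term drops out of the FD equation. Moreover, $\Delta exp=1$, so $\Delta exp_{it}$ plays the role of a constant term in the FD equation.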
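
Why an instrument is needed can be written out in one step. The sketch below assumes that the idiosyncratic errors $\varepsilon_{it}$ are serially uncorrelated with variance $\sigma^2_\varepsilon$ and uncorrelated with past values of $lwage$. These are the usual assumptions behind the Anderson-Hsiao estimator, but they are not stated explicitly above.

$$
\begin{aligned}
\Delta lwage_{i,t-1} &= lwage_{i,t-1}-lwage_{i,t-2}, \qquad \Delta\varepsilon_{it}=\varepsilon_{it}-\varepsilon_{i,t-1} \\
\operatorname{Cov}(\Delta lwage_{i,t-1},\Delta\varepsilon_{it}) &= -\operatorname{Cov}(lwage_{i,t-1},\varepsilon_{i,t-1}) = -\sigma^2_\varepsilon \neq 0 \\
\operatorname{Cov}(lwage_{i,t-2},\Delta\varepsilon_{it}) &= 0
\end{aligned}
$$

Under these assumptions OLS on the FD equation is inconsistent for $\gamma$, while $lwage_{i,t-2}$ is uncorrelated with the differenced error and still correlated with $\Delta lwage_{i,t-1}$, which is what makes it usable as an instrument.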

In [None]:
df = pd.read_csv('./panels-plm/Wages.csv')
df.head()

In [None]:
panel_df = df.set_index(['id', 'time'])
panel_df.head()

Let's prepare the variables for the FD equation:

* the dependent variable $\Delta lwage_{it}$ (`d_lwage`)
* the lagged dependent variable $\Delta lwage_{i,t-1}$ (`lag_d_lwage`)
* predictors $\Delta exp_{it},\Delta exp^2_{it},\Delta wks_{it}$ (`d_exp`, `d_exp_sq`, `d_wks`)
* instrumental variable $lwage_{i,t-2}$ (`lag2_lwage`)

In [None]:
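# Squared experience, computed in levels before differencing
panel_df['exp_sq'] = panel_df['exp']**2
# First differences within each individual (groupby on the `id` index level)
panel_df[['d_lwage', 'd_exp', 'd_exp_sq', 'd_wks']] = panel_df.groupby(level=0)[['lwage', 'exp', 'exp_sq', 'wks']].diff()
# Lagged difference (endogenous regressor) and second lag of the level (instrument)
panel_df['lag_d_lwage'] = panel_df.groupby(level=0)['d_lwage'].shift()
panel_df['lag2_lwage'] = panel_df.groupby(level=0)['lwage'].shift(periods=2)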
panel_df.head()
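
To see what the grouped `diff()` and `shift()` calls do, here is a minimal sketch on a made-up two-person panel. The numbers are invented for illustration and are not taken from `Wages.csv`. It checks that differences and lags never cross from one individual to the next, that $\Delta ed=0$ and $\Delta exp=1$, and that the first two periods of each individual are lost once the lags are required.

```python
import pandas as pd

# Hypothetical toy panel: 2 individuals observed over 4 periods each
toy = pd.DataFrame({
    'id':    [1, 1, 1, 1, 2, 2, 2, 2],
    'time':  [1, 2, 3, 4, 1, 2, 3, 4],
    'ed':    [12, 12, 12, 12, 16, 16, 16, 16],
    'exp':   [3, 4, 5, 6, 10, 11, 12, 13],
    'lwage': [5.0, 5.2, 5.3, 5.6, 6.0, 6.1, 6.4, 6.5],
}).set_index(['id', 'time'])

# Differences are taken within each individual (index level 0 = id)
toy[['d_ed', 'd_exp', 'd_lwage']] = toy.groupby(level=0)[['ed', 'exp', 'lwage']].diff()
# Lag of the differenced dependent variable and second lag of its level
toy['lag_d_lwage'] = toy.groupby(level=0)['d_lwage'].shift()
toy['lag2_lwage'] = toy.groupby(level=0)['lwage'].shift(periods=2)

print(toy)
# Rows that remain usable for the FD/IV regression
print(toy.dropna())
```

Each individual contributes `NaN` in its first period for the differences and in its first two periods for `lag_d_lwage` and `lag2_lwage`, which is why the estimation sample is shorter than the original panel.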

We fit the model using the formula interface.

Note how the instrument `lag2_lwage` is introduced for the endogenous regressor `lag_d_lwage`: the pair goes in square brackets, as `[endogenous ~ instrument]`.

*Remark*: we call `.dropna()` to remove rows with missing values (IVGMM does not do this by default!)

In [None]:
mod_dyn = IVGMM.from_formula(formula='d_lwage~[lag_d_lwage~lag2_lwage]+d_exp+d_exp_sq+d_wks', data=panel_df.dropna())

res_dyn = mod_dyn.fit()
res_dyn.params.round(3)
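
With one endogenous regressor and one instrument the model is exactly identified, and in that case the GMM point estimate coincides with the simple IV estimate $(Z'X)^{-1}Z'y$. The sketch below illustrates the idea on simulated data rather than on the Wages panel: it generates a pure AR(1) panel with individual effects, then compares OLS and IV on the first-differenced equation. All names and parameter values (`gamma_true`, sample sizes, error variances) are assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_periods, burn_in = 5000, 8, 50
gamma_true = 0.5  # assumed true coefficient on the lagged dependent variable

# Simulate y_it = gamma * y_i,t-1 + mu_i + eps_it with a burn-in period
mu = rng.normal(scale=0.5, size=n_entities)
y = np.zeros((n_entities, burn_in + n_periods))
for t in range(1, burn_in + n_periods):
    y[:, t] = gamma_true * y[:, t - 1] + mu + rng.normal(size=n_entities)
y = y[:, burn_in:]  # keep the last n_periods columns

# First differences: dy[:, k] = y[:, k + 1] - y[:, k]
dy = np.diff(y, axis=1)
dep = dy[:, 1:].ravel()     # Delta y_it
lag_d = dy[:, :-1].ravel()  # Delta y_i,t-1 (endogenous regressor)
lag2 = y[:, :-2].ravel()    # y_i,t-2 (instrument)

const = np.ones_like(dep)
X = np.column_stack([const, lag_d])  # regressors
Z = np.column_stack([const, lag2])   # instruments (the constant instruments itself)

beta_ols = np.linalg.lstsq(X, dep, rcond=None)[0]
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ dep)  # exactly identified IV estimator

print('OLS on FD equation, gamma:', beta_ols[1].round(3))
print('IV (Anderson-Hsiao), gamma:', beta_iv[1].round(3))
```

In this simulation OLS on the differenced equation is pulled well below the true value, because $\Delta y_{i,t-1}$ is negatively correlated with $\Delta\varepsilon_{it}$, while the IV estimate should land close to `gamma_true`.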

## Dynamic vs FE vs RE vs FD

In [None]:
panel_df['lag_lwage'] = panel_df.groupby(level=0)['lwage'].shift()

In [None]:
# FE & RE & FD estimators
mod_re = RandomEffects.from_formula(formula='lwage~1+exp+exp_sq+wks+lag_lwage', data=panel_df)
mod_fe = PanelOLS.from_formula(formula='lwage~1+exp+exp_sq+wks+lag_lwage+EntityEffects', data=panel_df)
mod_fd = FirstDifferenceOLS.from_formula(formula='lwage~exp+exp_sq+wks+lag_lwage', data=panel_df)

res_re = mod_re.fit()
res_fe = mod_fe.fit()
res_fd = mod_fd.fit()

# compare({'Dyn': res_dyn, 'RE': res_re, 'FE': res_fe, 'FD':res_fd}, stars=True)
print(res_dyn.params.round(3))
print(res_re.params.round(3))
print(res_fe.params.round(3))
print(res_fd.params.round(3))