# Dynamic panel data model

In [1]:
import pandas as pd
from linearmodels import PooledOLS # Pooled model
from linearmodels import PanelOLS # Fixed-effect model
from linearmodels import RandomEffects # Random-effect model
from linearmodels import IVGMM # GMM-method
from linearmodels.panel import compare # model comparison

Consider the `Wages` panel and the regression of **lwage on ed, exp, exp^2, wks**.

Specification of the dynamic model: $lwage_{it}=\alpha+\gamma lwage_{i,t-1}+\beta_1ed_i+\beta_2exp_{it}+\beta_3exp^2_{it}+\beta_4wks_{it}+\mu_i+\varepsilon_{it}$

Anderson-Hsiao estimation method:

* Write the equation in first differences (the FD transformation removes the time-invariant components $\alpha$, $\beta_1 ed_i$ and $\mu_i$):
$$\Delta lwage_{it}=\gamma\Delta lwage_{i,t-1}+\beta_2\Delta exp_{it}+\beta_3\Delta exp^2_{it}+\beta_4\Delta wks_{it}+\Delta\varepsilon_{it}$$
* Estimate by GMM, using $y_{i,t-2}$ or $\Delta y_{i,t-2}$ (here $y=lwage$) as an instrument for $\Delta y_{i,t-1}$, which is correlated with the differenced error through $\varepsilon_{i,t-1}$

*Remark*: since $ed$ is constant over time, $\Delta ed=0$. Moreover, $\Delta exp=1$, so the coefficient on $\Delta exp$ acts as the intercept of the differenced equation.
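To see why the instrument is needed, the following minimal sketch simulates a pure AR(1) panel with individual effects (no other regressors) and compares OLS on the differenced equation with the Anderson-Hsiao IV estimator that uses $y_{i,t-2}$ as the instrument. The panel size, the true $\gamma=0.5$ and all variable names are illustrative assumptions; nothing here comes from the `Wages` data.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, gamma = 5000, 8, 0.5   # hypothetical panel size and true AR coefficient

mu = rng.normal(size=N)      # individual effects mu_i
y = np.empty((N, T))
# Start each series from its stationary distribution around mu_i / (1 - gamma)
y[:, 0] = mu / (1 - gamma) + rng.normal(scale=np.sqrt(1 / (1 - gamma**2)), size=N)
for t in range(1, T):
    y[:, t] = gamma * y[:, t - 1] + mu + rng.normal(size=N)

dy = np.diff(y, axis=1)        # dy[:, k] = y[:, k+1] - y[:, k]
d_y = dy[:, 1:].ravel()        # Delta y_t
d_y_lag = dy[:, :-1].ravel()   # Delta y_{t-1}
y_lag2 = y[:, :-2].ravel()     # y_{t-2}, the instrument

# OLS on the differenced equation (no intercept: mu_i has been differenced out)
gamma_fd_ols = (d_y_lag @ d_y) / (d_y_lag @ d_y_lag)
# Anderson-Hsiao: simple IV with y_{t-2} as the instrument for Delta y_{t-1}
gamma_ah = (y_lag2 @ d_y) / (y_lag2 @ d_y_lag)

print(f"true gamma = {gamma}, FD-OLS = {gamma_fd_ols:.3f}, Anderson-Hsiao = {gamma_ah:.3f}")
```

In this design OLS on first differences should land far from the true value, while the IV estimate should be close to it.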

In [2]:
# Load the data
wages = pd.read_csv('https://raw.githubusercontent.com/artamonoff/Econometrica/master/panel-analysis/panels-csv/Wages.csv')
wages.head()

,exp,wks,bluecol,ind,south,smsa,married,sex,union,ed,black,lwage,id,time
0,3,32,no,0,yes,no,yes,male,no,9,no,5.56068,1,1
1,4,43,no,0,yes,no,yes,male,no,9,no,5.72031,1,2
2,5,40,no,0,yes,no,yes,male,no,9,no,5.99645,1,3
3,6,39,no,0,yes,no,yes,male,no,9,no,5.99645,1,4
4,7,42,no,1,yes,no,yes,male,no,9,no,6.06146,1,5


In [3]:
# Convert to a panel (MultiIndex: individual id, time period)
wages_panel = wages.set_index(['id', 'time'])
wages_panel

id,time,exp,wks,bluecol,ind,south,smsa,married,sex,union,ed,black,lwage
1,1,3,32,no,0,yes,no,yes,male,no,9,no,5.56068
1,2,4,43,no,0,yes,no,yes,male,no,9,no,5.72031
1,3,5,40,no,0,yes,no,yes,male,no,9,no,5.99645
1,4,6,39,no,0,yes,no,yes,male,no,9,no,5.99645
1,5,7,42,no,1,yes,no,yes,male,no,9,no,6.06146
...,...,...,...,...,...,...,...,...,...,...,...,...,...
595,3,3,50,no,0,no,yes,no,female,no,12,no,5.95324
595,4,4,49,no,0,no,yes,no,female,no,12,no,6.06379
595,5,5,50,no,0,no,yes,no,female,no,12,no,6.21461
595,6,6,50,no,0,no,yes,no,female,no,12,no,6.29157


Prepare the variables for the FD equation:
* Dependent variable $\Delta lwage_{it}$ (`d_lwage`)
* Lag of the dependent variable $\Delta lwage_{i,t-1}$ (`lag_d_lwage`)
* Regressors $\Delta exp_{it},\Delta exp^2_{it},\Delta wks_{it}$ (`d_exp`, `d_exp_sq`, `d_wks`)
* Instrument $lwage_{i,t-2}$ (`lag2_lwage`)

In [4]:
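# Squared experience
wages_panel['exp_sq'] = wages_panel['exp']**2
# First differences within each individual (level 0 of the index is id)
wages_panel[['d_lwage', 'd_exp', 'd_exp_sq', 'd_wks']] = wages_panel.groupby(level=0)[['lwage', 'exp', 'exp_sq', 'wks']].diff()
# Lagged difference of lwage (the endogenous regressor)
wages_panel['lag_d_lwage'] = wages_panel.groupby(level=0)['d_lwage'].shift()
# Second lag of the level of lwage (the instrument)
wages_panel['lag2_lwage'] = wages_panel.groupby(level=0)['lwage'].shift(periods=2)
wages_panel.head()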

id,time,exp,wks,bluecol,ind,south,smsa,married,sex,union,ed,black,lwage,exp_sq,d_lwage,d_exp,d_exp_sq,d_wks,lag_d_lwage,lag2_lwage
1,1,3,32,no,0,yes,no,yes,male,no,9,no,5.56068,9,,,,,,
1,2,4,43,no,0,yes,no,yes,male,no,9,no,5.72031,16,0.15963,1.0,7.0,11.0,,
1,3,5,40,no,0,yes,no,yes,male,no,9,no,5.99645,25,0.27614,1.0,9.0,-3.0,0.15963,5.56068
1,4,6,39,no,0,yes,no,yes,male,no,9,no,5.99645,36,0.0,1.0,11.0,-1.0,0.27614,5.72031
1,5,7,42,no,1,yes,no,yes,male,no,9,no,6.06146,49,0.06501,1.0,13.0,3.0,0.0,5.99645
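
The grouped `diff`/`shift` calls above matter because differences and lags must not cross from one individual to the next. A small sketch on a made-up two-person panel (the `toy` frame and its values are purely illustrative) contrasts the grouped difference with a naive one:

```python
import pandas as pd

# Hypothetical toy panel: 2 individuals, 3 periods each
toy = pd.DataFrame(
    {
        "id": [1, 1, 1, 2, 2, 2],
        "time": [1, 2, 3, 1, 2, 3],
        "y": [1.0, 3.0, 6.0, 10.0, 10.5, 12.0],
    }
).set_index(["id", "time"])

g = toy.groupby(level="id")["y"]
toy["d_y"] = g.diff()              # difference within each individual
toy["lag2_y"] = g.shift(2)         # second lag within each individual
toy["d_y_naive"] = toy["y"].diff() # ignores the panel structure

print(toy)
```

The first period of individual 2 is missing in `d_y`, whereas `d_y_naive` subtracts the last value of individual 1 from it, which is meaningless for a panel.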


Estimate the model using a formula specification. Note how the instrument `lag2_lwage` for `lag_d_lwage` is written inside square brackets in the formula.

*Remark*: the `.dropna()` method is used to drop observations with missing values (otherwise the estimation does not work!)

In [5]:
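# [endogenous ~ instrument]: lag_d_lwage is instrumented by lag2_lwage.
# No intercept is added; d_exp equals 1 for every observation and plays that role.
dyn_panel = IVGMM.from_formula(formula='d_lwage~[lag_d_lwage~lag2_lwage]+d_exp+d_exp_sq+d_wks',
                               data=wages_panel.dropna()).fit()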
dyn_panel.params

d_exp          0.103166
d_exp_sq      -0.000346
d_wks         -0.000674
lag_d_lwage    0.081159
Name: parameter, dtype: float64
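
With one instrument for one endogenous regressor the model is exactly identified, and in that case the GMM point estimate coincides with the simple IV estimator $\hat\beta=(Z'X)^{-1}Z'y$ whatever the weighting matrix. The helper below is a hypothetical illustration of that formula on made-up noiseless data (`iv_just_identified` is not part of `linearmodels`):

```python
import numpy as np

def iv_just_identified(y, X, Z):
    """Simple IV estimator (Z'X)^{-1} Z'y; X and Z must have the same number of columns."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)

rng = np.random.default_rng(1)
n = 200
z = rng.normal(size=n)                   # instrument
x_endog = 0.8 * z + rng.normal(size=n)   # regressor correlated with the instrument
x_exog = np.ones(n)                      # exogenous regressor, serves as its own instrument

X = np.column_stack([x_endog, x_exog])   # regressors
Z = np.column_stack([z, x_exog])         # instruments, one per regressor
beta_true = np.array([0.3, 1.0])
y = X @ beta_true                        # noiseless, so IV recovers beta_true up to rounding

beta_hat = iv_just_identified(y, X, Z)
print(beta_hat)
```

For the model estimated above, `X` would hold the columns `lag_d_lwage, d_exp, d_exp_sq, d_wks` and `Z` the columns `lag2_lwage, d_exp, d_exp_sq, d_wks` of `wages_panel.dropna()`.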

In [6]:
# First lag of lwage within each individual, for the FE regression in levels
wages_panel['lag_lwage'] = wages_panel.groupby(level=0)['lwage'].shift()

In [8]:
# Compare with the FE estimate
fe_panel = PanelOLS.from_formula(formula='lwage~exp+exp_sq+wks+lag_lwage+EntityEffects', data=wages_panel).fit()
fe_panel.params

Inputs contain missing values. Dropping rows with missing observations.
  super().__init__(dependent, exog, weights=weights, check_rank=check_rank)


exp          0.087843
exp_sq      -0.000250
lag_lwage    0.172299
wks          0.000544
Name: parameter, dtype: float64