### Double Machine Learning for Static Panel Models with Fixed Effects

The partially linear model is extended to panel data by introducing fixed effects $\alpha_i$, which gives the partially linear panel regression (PLPR) model.

Partialled-out PLPR (PO-PLPR) model:

\begin{align*}
    Y_{it} &= \theta_0 D_{it} + g_0(X_{it}) + \alpha_i + U_{it} \\
    D_{it} &= m_0(X_{it}) + \gamma_i + V_{it}
\end{align*}

- $Y_{it}$ outcome, $D_{it}$ treatment, $X_{it}$ covariates, $\theta_0$ causal treatment effect
- $g_0$ and $m_0$ nuisance functions
- $\alpha_i$, $\gamma_i$ unobserved individual heterogeneity, correlated with covariates
- $U_{it}$, $V_{it}$ error terms

Further note that $E[U_{it} \mid D_{it}, X_{it}, \alpha_i] = 0$ but $E[\alpha_i \mid D_{it}, X_{it}] \neq 0$, and that $E[V_{it} \mid X_{it}, \gamma_i]=0$.

#### 1. Correlated Random Effects Approach

##### 1.1. General case:

- Learning $g_0$ from $\{ Y_{it}, X_{it}, \bar{X}_i : t=1,\dots, T \}_{i=1}^N$
- First learning $\tilde{m}_0({\cdot})$ from $\{ D_{it}, X_{it}, \bar{X}_i : t=1,\dots, T \}_{i=1}^N$ with prediction $\hat{m}_{0it} = \tilde{m}_0 (X_{it}, \bar{X}_i) $
    - Calculate $\hat{\bar{m}}_i = T^{-1} \sum_{t=1}^T \hat{m}_{0it} $
    - Calculate final nuisance part as $ \hat{m}^*_0 (X_{it}, \bar{X}_i, \bar{D}_i) = \hat{m}_{0it} + \bar{D}_i - \hat{\bar{m}}_i $ 
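
The three steps above can be sketched on a toy panel. This is a minimal illustration, not the DoubleML implementation: the simulated data, the column names (`x_bar`, `d_bar`, `m_hat`, `m_star`) and the use of `LinearRegression` as a stand-in learner are all assumptions, and predictions are made in-sample for brevity, whereas DML would use cross-fitted predictions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy balanced panel (hypothetical data): 50 units, 4 periods each
rng = np.random.default_rng(0)
n_units, n_periods = 50, 4
panel = pd.DataFrame({
    "id": np.repeat(np.arange(n_units), n_periods),
    "x": rng.normal(size=n_units * n_periods),
})
panel["d"] = 0.5 * panel["x"] + rng.normal(size=len(panel))

# Unit-level means of the covariate and the treatment
panel["x_bar"] = panel.groupby("id")["x"].transform("mean")
panel["d_bar"] = panel.groupby("id")["d"].transform("mean")

# Step 1: learn m_tilde from (X_it, X_bar_i) and predict m_hat_it
features = panel[["x", "x_bar"]]
m_tilde = LinearRegression().fit(features, panel["d"])
panel["m_hat"] = m_tilde.predict(features)

# Step 2: average the predictions within each unit
panel["m_hat_bar"] = panel.groupby("id")["m_hat"].transform("mean")

# Step 3: final nuisance part m_star = m_hat_it + D_bar_i - m_hat_bar_i
panel["m_star"] = panel["m_hat"] + panel["d_bar"] - panel["m_hat_bar"]
```

By construction, the within-unit average of `m_star` equals `d_bar`, which is the purpose of the correction term $\bar{D}_i - \hat{\bar{m}}_i$.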

##### 1.2. Normal assumption:

Assuming the conditional distribution $ D_{i1}, \dots, D_{iT} \mid X_{i1}, \dots, X_{iT} $ is multivariate normal:
- Learn $m^*_{0}$ from $\{ D_{it}, X_{it}, \bar{X}_i, \bar{D}_i: t=1,\dots, T \}_{i=1}^N$
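
A rough sketch of the feature set used under the normality assumption is given below. The toy data and the `m_` prefix for unit means (chosen to mirror the `m_x1`, ... columns that appear in the printed data summary later on) are assumptions made for illustration, and the learner is fitted in-sample rather than cross-fitted.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV

# Toy balanced panel (hypothetical data): 40 units, 5 periods each
rng = np.random.default_rng(1)
n_units, n_periods = 40, 5
n_obs = n_units * n_periods
df = pd.DataFrame({
    "id": np.repeat(np.arange(n_units), n_periods),
    "x1": rng.normal(size=n_obs),
    "x2": rng.normal(size=n_obs),
})
df["d"] = df["x1"] - 0.5 * df["x2"] + rng.normal(size=n_obs)

x_cols = ["x1", "x2"]

# Unit means of the covariates and of the treatment
x_means = df.groupby("id")[x_cols].transform("mean").add_prefix("m_")
d_mean = df.groupby("id")["d"].transform("mean").rename("m_d")

# m_star is learnt directly from (X_it, X_bar_i, D_bar_i)
features_normal = pd.concat([df[x_cols], x_means, d_mean], axis=1)
m_star = LassoCV(cv=3).fit(features_normal, df["d"])
m_star_hat = m_star.predict(features_normal)
```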

#### 2. Transformation Approaches

##### 2.1. First Difference (FD) Transformation - Exact

Consider the FD transformation $Q(Y_{it})= Y_{it} - Y_{it-1} $. Under Assumptions 3.1-3.5, the transformed nuisance functions can be learnt as follows:

- $ \Delta g_0 (X_{it-1}, X_{it}) $ from $ \{ Y_{it}-Y_{it-1}, X_{it-1}, X_{it} : t=2, \dots , T \}_{i=1}^N $
- $ \Delta m_0 (X_{it-1}, X_{it}) $ from $ \{ D_{it}-D_{it-1}, X_{it-1}, X_{it} : t=2, \dots , T \}_{i=1}^N $
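
The data layout for the FD approach can be illustrated with pandas. This is only a sketch on a hand-made toy panel: the column names `y_diff` and `d_diff` follow the ones used in the code cells below, while `x_lag` is a hypothetical name for $X_{it-1}$; the helper used later in this notebook may name or arrange columns differently.

```python
import numpy as np
import pandas as pd

# Toy panel (hypothetical data): 2 units, 3 periods each
panel = pd.DataFrame({
    "id":   [1, 1, 1, 2, 2, 2],
    "time": [1, 2, 3, 1, 2, 3],
    "y":    [1.0, 3.0, 6.0, 2.0, 2.5, 4.0],
    "d":    [0.0, 1.0, 1.0, 1.0, 0.0, 2.0],
    "x":    [0.5, 0.7, 0.2, 1.0, 1.5, 1.1],
})

# Differences and lags must be taken within each unit, in time order
panel = panel.sort_values(["id", "time"])
grouped = panel.groupby("id")

fd = pd.DataFrame({
    "id": panel["id"],
    "time": panel["time"],
    "y_diff": grouped["y"].diff(),    # Y_it - Y_it-1
    "d_diff": grouped["d"].diff(),    # D_it - D_it-1
    "x": panel["x"],                  # X_it
    "x_lag": grouped["x"].shift(1),   # X_it-1
}).dropna().reset_index(drop=True)    # the first period of each unit is lost
```

Each unit contributes $T-1$ rows, and both the current and the lagged covariates enter the nuisance learners as features.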

##### 2.2. Within Group (WG) Transformation - Approximate

For the WG transformation $Q(X_{it})= X_{it} - \bar{X}_{i} $, where $ \bar{X}_{i} = T^{-1} \sum_{t=1}^T X_{it} $, the approximate model is
\begin{align*}
    Q(Y_{it}) &\approx \theta_0 Q(D_{it}) + g_0 (Q(X_{it})) + Q(U_{it}) \\
    Q(D_{it}) &\approx m_0 (Q(X_{it})) + Q(V_{it})
\end{align*}

- $g_0$ can be learnt from transformed data $ \{ Q(Y_{it}), Q(X_{it}) : t=1,\dots,T \}_{i=1}^N $
- $m_0$ can be learnt from transformed data $ \{ Q(D_{it}), Q(X_{it}) : t=1,\dots,T \}_{i=1}^N $
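
The within-group transformation itself amounts to subtracting unit means. A minimal pandas sketch on a hand-made toy panel (the data and variable names are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Toy panel (hypothetical data): 2 units, 3 periods each
panel = pd.DataFrame({
    "id":   [1, 1, 1, 2, 2, 2],
    "time": [1, 2, 3, 1, 2, 3],
    "y":    [1.0, 3.0, 6.0, 2.0, 2.5, 4.0],
    "d":    [0.0, 1.0, 1.0, 1.0, 0.0, 2.0],
    "x":    [0.5, 0.7, 0.2, 1.0, 1.5, 1.1],
})

cols = ["y", "d", "x"]

# Q(.) subtracts the unit-specific time average from each variable
unit_means = panel.groupby("id")[cols].transform("mean")
wg = panel[["id", "time"]].join(panel[cols] - unit_means)
```

After the transformation every variable averages to zero within each unit, so additive unit effects drop out. The model is only approximate because $g_0(X_{it}) - T^{-1}\sum_t g_0(X_{it})$ is in general not equal to $g_0(Q(X_{it}))$ unless $g_0$ is linear.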

#### Implementation

- Block-k-fold cross-fitting is used: the entire time series of a sampled unit is allocated to one fold, which allows for possible serial correlation within each unit, as is common with panel data

- Cluster-robust standard errors

$\Rightarrow$ the id variable is used as the cluster variable for DML
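
The idea of block-k-fold splitting can be illustrated with scikit-learn's `GroupKFold`, which keeps all rows sharing a group label in the same fold. This is a sketch of the principle only; it is an assumption that it resembles what DoubleML does internally, and the toy dimensions are made up.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy balanced panel index (hypothetical): 10 units, 4 periods each
n_units, n_periods = 10, 4
ids = np.repeat(np.arange(n_units), n_periods)
X = np.zeros((len(ids), 1))  # placeholder features, only the shape matters here

# Each unit's full time series ends up in exactly one test fold
folds = list(GroupKFold(n_splits=5).split(X, groups=ids))

for train_idx, test_idx in folds:
    # No unit appears in both the training and the test part of a fold
    assert set(ids[train_idx]).isdisjoint(ids[test_idx])
```

Because whole units are held out together, nuisance predictions for a unit never depend on other observations of that same unit.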

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from doubleml.data.base_data import DoubleMLData 
from doubleml.data.panel_data import DoubleMLPanelData
from doubleml.plm.plpr import DoubleMLPLPR
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.base import clone
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from doubleml.plm.utils._plpr_util import cre_fct, fd_fct, wd_fct, extend_data
from doubleml.plm.datasets.dgp_static_panel_CP2025 import make_static_panel_CP2025
import warnings
warnings.filterwarnings("ignore")

In [2]:
np.random.seed(1)
data = make_static_panel_CP2025(dgp_type='dgp1')

x_cols = [col for col in data.columns if "x" in col]

X = sm.add_constant(data[['d'] + x_cols])
y = data['y']
clusters = data['id']

ols_model = sm.OLS(y, X).fit(cov_type='cluster', cov_kwds={'groups': clusters})
ols_model.params['d'], ols_model.conf_int().loc['d'][0], ols_model.conf_int().loc['d'][1]

(np.float64(0.6719371174913912),
 np.float64(0.6090488219157397),
 np.float64(0.7348254130670426))

In [None]:
# cre general

# np.random.seed(1)
data = make_static_panel_CP2025(dgp_type='dgp1')
cre_data = cre_fct(data)

x_cols = [col for col in cre_data.columns if "x" in col]

learner = LassoCV()
ml_l = clone(learner)
ml_m = clone(learner)

obj_panel = DoubleMLPanelData(cre_data,
                              y_col='y',
                              d_cols='d',
                              t_col='time',
                              id_col='id',
                              x_cols=x_cols,
                              static_panel=True)

dml_panel_plpr = DoubleMLPLPR(obj_panel, ml_l=ml_l, ml_m=ml_m,
                              approach='cre_general', n_folds=5
                              )
dml_panel_plpr.fit()
print(dml_panel_plpr.summary)

       coef   std err          t          P>|t|     2.5 %    97.5 %
d  0.456003  0.020668  22.063024  7.163019e-108  0.415494  0.496512


In [4]:
data = make_static_panel_CP2025(dgp_type='dgp1')

x_cols = [col for col in data.columns if "x" in col]

learner = LassoCV()
ml_l = clone(learner)
ml_m = clone(learner)

obj_panel = DoubleMLPanelData(data,
                              y_col='y',
                              d_cols='d',
                              t_col='time',
                              id_col='id',
                              x_cols=x_cols,
                              static_panel=True)

dml_panel_plpr = DoubleMLPLPR(obj_panel, ml_l=ml_l, ml_m=ml_m,
                              approach='cre_general', n_folds=5
                              )
dml_panel_plpr.fit()
print(dml_panel_plpr.summary)

       coef   std err          t          P>|t|     2.5 %    97.5 %
d  0.515035  0.020056  25.679326  1.989663e-145  0.475725  0.554345


In [4]:
dml_panel_plpr.smpls

[[(array([   0,    1,    2, ..., 2497, 2498, 2499], shape=(2000,)),
   array([  60,   61,   62,   63,   64,   65,   66,   67,   68,   69,  100,
           101,  102,  103,  104,  105,  106,  107,  108,  109,  110,  111,
           112,  113,  114,  115,  116,  117,  118,  119,  160,  161,  162,
           163,  164,  165,  166,  167,  168,  169,  180,  181,  182,  183,
           184,  185,  186,  187,  188,  189,  270,  271,  272,  273,  274,
           275,  276,  277,  278,  279,  280,  281,  282,  283,  284,  285,
           286,  287,  288,  289,  310,  311,  312,  313,  314,  315,  316,
           317,  318,  319,  360,  361,  362,  363,  364,  365,  366,  367,
           368,  369,  410,  411,  412,  413,  414,  415,  416,  417,  418,
           419,  630,  631,  632,  633,  634,  635,  636,  637,  638,  639,
           720,  721,  722,  723,  724,  725,  726,  727,  728,  729,  770,
           771,  772,  773,  774,  775,  776,  777,  778,  779,  810,  811,
           812,  813

In [12]:
# model rmse

# u_hat = dml_panel_plpr._dml_data.y - dml_panel_plpr.predictions['ml_l'].flatten()
# v_hat = dml_panel_plpr._dml_data.d - dml_panel_plpr.predictions['ml_m'].flatten()

# np.sqrt(np.mean(np.square(u_hat - (dml_panel_plpr.coef[0] * v_hat))))

In [12]:
print(dml_panel_plpr)


------------------ Data Summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9', 'x10', 'x11', 'x12', 'x13', 'x14', 'x15', 'x16', 'x17', 'x18', 'x19', 'x20', 'x21', 'x22', 'x23', 'x24', 'x25', 'x26', 'x27', 'x28', 'x29', 'x30', 'm_x1', 'm_x2', 'm_x3', 'm_x4', 'm_x5', 'm_x6', 'm_x7', 'm_x8', 'm_x9', 'm_x10', 'm_x11', 'm_x12', 'm_x13', 'm_x14', 'm_x15', 'm_x16', 'm_x17', 'm_x18', 'm_x19', 'm_x20', 'm_x21', 'm_x22', 'm_x23', 'm_x24', 'm_x25', 'm_x26', 'm_x27', 'm_x28', 'm_x29', 'm_x30']
Instrument variable(s): None
Time variable: time
Id variable: id
Static panel data: True
No. Unique Ids: 250
No. Observations: 2500


------------------ Score & Algorithm ------------------
Score function: partialling out
Static panel model approach: cre_general

------------------ Machine Learner   ------------------
Learner ml_l: LassoCV()
Learner ml_m: LassoCV()
Out-of-sample Performance:
Regression:
Learner ml_l


In [16]:
# cre normality assumption
data = make_static_panel_CP2025(dgp_type='dgp1')
cre_data = cre_fct(data)

x_cols = [col for col in cre_data.columns if "x" in col]

obj_dml_data_pdml = DoubleMLPanelData(cre_data,
                                      y_col='y',
                                      d_cols='d',
                                      t_col='time',
                                      id_col='id',
                                      x_cols=x_cols,
                                      static_panel=True)

# learner = LassoCV()
learner = make_pipeline(StandardScaler(), LassoCV())

ml_l = clone(learner)
ml_m = clone(learner)

obj_dml_plpr = DoubleMLPLPR(obj_dml_data_pdml, ml_l, ml_m, approach='cre_normal')
obj_dml_plpr.fit()
print(obj_dml_plpr.summary)

       coef   std err          t          P>|t|    2.5 %    97.5 %
d  0.503726  0.022932  21.965879  6.106437e-107  0.45878  0.548672


In [25]:
data = make_static_panel_CP2025(dgp_type='dgp1')
fd_data = fd_fct(data)

obj_dml_data_pdml = DoubleMLPanelData(fd_data,
                                 y_col='y_diff',
                                 d_cols='d_diff',
                                 t_col='time',
                                 id_col='id',
                                 x_cols=[col for col in fd_data.columns if col.startswith("x")],
                                 static_panel=True)

learner = LassoCV()
ml_l = clone(learner)
ml_m = clone(learner)

obj_dml_plpr = DoubleMLPLPR(obj_dml_data_pdml, ml_l, ml_m, approach='fd_exact')
obj_dml_plpr.fit()
print(obj_dml_plpr.summary)

            coef   std err          t         P>|t|     2.5 %    97.5 %
d_diff  0.512959  0.025253  20.313176  9.833638e-92  0.463465  0.562453


In [5]:
data = make_static_panel_CP2025(dgp_type='dgp1')

obj_dml_data_pdml = DoubleMLPanelData(data,
                                 y_col='y',
                                 d_cols='d',
                                 t_col='time',
                                 id_col='id',
                                 x_cols=[col for col in data.columns if col.startswith("x")],
                                 static_panel=True)

learner = LassoCV()
ml_l = clone(learner)
ml_m = clone(learner)

obj_dml_plpr = DoubleMLPLPR(obj_dml_data_pdml, ml_l, ml_m, approach='fd_exact')
obj_dml_plpr.fit()
print(obj_dml_plpr.summary)

AssertionError: 

In [31]:
data = make_static_panel_CP2025(dgp_type='dgp1')
wd_data = wd_fct(data)

obj_dml_data_pdml = DoubleMLPanelData(wd_data,
                                      y_col='y',
                                      d_cols='d',
                                      t_col='time',
                                      id_col='id',
                                      x_cols=[col for col in wd_data.columns if col.startswith("x")],
                                      static_panel=True)

learner = LassoCV()
ml_l = clone(learner)
ml_m = clone(learner)

obj_dml_plpr = DoubleMLPLPR(obj_dml_data_pdml, ml_l, ml_m, approach='wg_approx')
obj_dml_plpr.fit()
print(obj_dml_plpr.summary)

       coef   std err          t          P>|t|     2.5 %    97.5 %
d  0.513121  0.020666  24.828987  4.361657e-136  0.472616  0.553626


In [20]:
n_reps = 100
theta = 0.5

learner = make_pipeline(StandardScaler(), LassoCV())

res_cre_general = np.full((n_reps, 3), np.nan)
res_cre_normal = np.full((n_reps, 3), np.nan)
res_fd = np.full((n_reps, 3), np.nan)
res_wd = np.full((n_reps, 3), np.nan)

np.random.seed(1)

for i in range(n_reps):
    print(f"\rProcessing: {round((i+1)/n_reps*100, 3)} %", end="")
    data = make_static_panel_CP2025(num_n=100, theta=theta, dgp_type='dgp1')

    # CRE general Lasso
    cre_data = cre_fct(data)
    dml_data = DoubleMLPanelData(cre_data, y_col='y', d_cols='d', t_col='time', id_col='id', 
                                 x_cols=[col for col in cre_data.columns if "x" in col],
                                 static_panel=True)
    dml_plpr = DoubleMLPLPR(dml_data, clone(learner), clone(learner), n_folds=5, 
                            approach='cre_general')
    dml_plpr.fit()
    res_cre_general[i, 0] = dml_plpr.coef[0]
    res_cre_general[i, 1] = dml_plpr.coef[0] - theta
    confint = dml_plpr.confint()
    res_cre_general[i, 2] = (confint['2.5 %'].iloc[0] <= theta) & (confint['97.5 %'].iloc[0] >= theta)

    # CRE normality
    dml_plpr = DoubleMLPLPR(dml_data, clone(learner), clone(learner), n_folds=5, 
                            approach='cre_normal')
    dml_plpr.fit()
    res_cre_normal[i, 0] = dml_plpr.coef[0]
    res_cre_normal[i, 1] = dml_plpr.coef[0] - theta
    confint = dml_plpr.confint()
    res_cre_normal[i, 2] = (confint['2.5 %'].iloc[0] <= theta) & (confint['97.5 %'].iloc[0] >= theta)

    # FD approach
    fd_data = fd_fct(data)
    dml_data = DoubleMLPanelData(fd_data, y_col='y_diff', d_cols='d_diff', t_col='time', id_col='id',
                                 x_cols=[col for col in fd_data.columns if col.startswith("x")],
                                 static_panel=True)
    dml_plpr = DoubleMLPLPR(dml_data, clone(learner), clone(learner), n_folds=5, 
                            approach='fd_exact')
    dml_plpr.fit()
    res_fd[i, 0] = dml_plpr.coef[0]
    res_fd[i, 1] = dml_plpr.coef[0] - theta
    confint = dml_plpr.confint()
    res_fd[i, 2] = (confint['2.5 %'].iloc[0] <= theta) & (confint['97.5 %'].iloc[0] >= theta)
    
    # WD approach
    wd_data = wd_fct(data)
    dml_data = DoubleMLPanelData(wd_data, y_col='y', d_cols='d', t_col='time', id_col='id',
                                 x_cols=[col for col in wd_data.columns if "x" in col],
                                 static_panel=True)
    dml_plpr = DoubleMLPLPR(dml_data, clone(learner), clone(learner), n_folds=5, 
                            approach='wg_approx')
    dml_plpr.fit()
    res_wd[i, 0] = dml_plpr.coef[0]
    res_wd[i, 1] = dml_plpr.coef[0] - theta
    confint = dml_plpr.confint()
    res_wd[i, 2] = (confint['2.5 %'].iloc[0] <= theta) & (confint['97.5 %'].iloc[0] >= theta)

Processing: 100.0 %

In [21]:
pd.DataFrame(np.vstack([res_cre_general.mean(axis=0), res_cre_normal.mean(axis=0), 
                        res_fd.mean(axis=0), res_wd.mean(axis=0)]), 
                        columns=['Coef', 'Bias', 'Coverage'], 
                        index=['CRE general', 'CRE normal', 
                               'FD exact', 'WG approx'])

                 Coef      Bias  Coverage
CRE general  0.516684  0.016684      0.92
CRE normal   0.541518  0.041518      0.78
FD exact     0.504094  0.004094      0.94
WG approx    0.502006  0.002006      0.94
