### Double Machine Learning for Static Panel Models with Fixed Effects

The partially linear model is extended to panel data by introducing fixed effects $\alpha_i$, which gives the partially linear panel regression (PLPR) model.

Partialled-out PLPR (PO-PLPR) model:

\begin{align*}
    Y_{it} &= \theta_0 D_{it} + g_0(X_{it}) + \alpha_i + U_{it} \\
    D_{it} &= m_0(X_{it}) + \gamma_i + V_{it}
\end{align*}

- $Y_{it}$ outcome, $D_{it}$ treatment, $X_{it}$ covariates, $\theta_0$ causal treatment effect
- $g_0$ and $m_0$ nuisance functions
- $\alpha_i$, $\gamma_i$ unobserved individual heterogeneity, correlated with covariates
- $U_{it}$, $V_{it}$ error terms

Further, note that $E[U_{it} \mid D_{it}, X_{it}, \alpha_i] = 0$ but $E[\alpha_i \mid D_{it}, X_{it}] \neq 0$, and that $E[V_{it} \mid X_{it}, \gamma_i] = 0$.
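To see why the fixed effects matter, the following minimal sketch simulates a small panel from the model above and compares pooled OLS with a within-group regression. The nuisance functions are taken to be linear ($g_0(x) = x$, $m_0(x) = 0.5x$) purely for illustration, and all names and parameter values are assumptions rather than part of the DoubleML API.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods, theta = 500, 5, 0.5

# Unit-level heterogeneity (assumed design): it drives both fixed effects
# and is correlated with the covariate.
c = rng.normal(size=(n_units, 1))
alpha, gamma = c, c
x = c + rng.normal(size=(n_units, n_periods))
d = 0.5 * x + gamma + rng.normal(size=(n_units, n_periods))       # m_0(x) = 0.5 x
y = theta * d + x + alpha + rng.normal(size=(n_units, n_periods))  # g_0(x) = x


def ols_coef_on_d(y, d, x):
    """Coefficient on d from an OLS regression of y on (1, d, x)."""
    design = np.column_stack([np.ones(y.size), d.ravel(), x.ravel()])
    beta, *_ = np.linalg.lstsq(design, y.ravel(), rcond=None)
    return beta[1]


def demean(a):
    """Subtract the unit-level mean from an array of shape (units, periods)."""
    return a - a.mean(axis=1, keepdims=True)


theta_pooled = ols_coef_on_d(y, d, x)                          # ignores alpha_i
theta_within = ols_coef_on_d(demean(y), demean(d), demean(x))  # removes alpha_i
print(f"pooled OLS: {theta_pooled:.3f}, within estimator: {theta_within:.3f}")
```

In this design the pooled estimate is pulled away from $\theta_0 = 0.5$ because $\alpha_i$ is correlated with $D_{it}$ even after conditioning on $X_{it}$, while the within regression removes the additive unit effect.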

#### 1. Correlated Random Effects (CRE) Approach

##### 1.1. General case:

- Learn $g_0$ from $\{ Y_{it}, X_{it}, \bar{X}_i : t=1,\dots, T \}_{i=1}^N$
- Learn $\tilde{m}_0({\cdot})$ from $\{ D_{it}, X_{it}, \bar{X}_i : t=1,\dots, T \}_{i=1}^N$, with predictions $\hat{m}_{0it} = \tilde{m}_0 (X_{it}, \bar{X}_i)$, then:
    - Calculate the unit-level mean of the predictions, $\hat{\bar{m}}_i = T^{-1} \sum_{t=1}^T \hat{m}_{0it}$
    - Calculate the final nuisance prediction as $\hat{m}^*_0 (X_{it}, \bar{X}_i, \bar{D}_i) = \hat{m}_{0it} + \bar{D}_i - \hat{\bar{m}}_i$
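The three steps for $\hat{m}^*_0$ can be sketched on a balanced panel stored as `(units, periods)` arrays. This is only an illustration: a least-squares fit stands in for the machine learner, predictions are made in-sample (no cross-fitting), and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods = 50, 4
x = rng.normal(size=(n_units, n_periods))
d = 0.5 * x + rng.normal(size=(n_units, 1)) + rng.normal(size=(n_units, n_periods))

# Unit-level means \bar{X}_i (repeated over t) and \bar{D}_i
x_bar = np.repeat(x.mean(axis=1, keepdims=True), n_periods, axis=1)
d_bar = d.mean(axis=1, keepdims=True)

# Step 1: learn m~_0 from (D_it, X_it, Xbar_i); least squares is a placeholder learner
features = np.column_stack([np.ones(x.size), x.ravel(), x_bar.ravel()])
beta, *_ = np.linalg.lstsq(features, d.ravel(), rcond=None)
m_hat = (features @ beta).reshape(n_units, n_periods)

# Step 2: unit-level mean of the predictions
m_hat_bar = m_hat.mean(axis=1, keepdims=True)

# Step 3: final nuisance prediction m*_0 = m_hat + Dbar_i - mean_t(m_hat)
m_star = m_hat + d_bar - m_hat_bar
```

By construction, the unit-level mean of `m_star` equals $\bar{D}_i$, so the correction shifts each unit's predictions to match its observed average treatment.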

##### 1.2. Normal assumption:

(the conditional distribution of $D_{i1}, \dots, D_{iT} \mid X_{i1}, \dots, X_{iT}$ is multivariate normal)
- Learn $m^*_{0}$ from $\{ D_{it}, X_{it}, \bar{X}_i, \bar{D}_i: t=1,\dots, T \}_{i=1}^N$
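Under the normality assumption the learner needs $\bar{X}_i$ and $\bar{D}_i$ as additional features. A minimal pandas sketch of how such unit-level means could be appended is shown below; `add_unit_means` is a hypothetical helper (not the library's `cre_fct`), and the `m_` prefix is only chosen to resemble the column names in the data summary further down.

```python
import pandas as pd


def add_unit_means(df, cols, id_col="id", prefix="m_"):
    """Append the unit-level mean of each column in `cols` as `<prefix><col>`."""
    means = df.groupby(id_col)[cols].transform("mean")
    return df.join(means.add_prefix(prefix))


panel = pd.DataFrame({
    "id":   [1, 1, 2, 2],
    "time": [1, 2, 1, 2],
    "d":    [0.0, 2.0, 1.0, 3.0],
    "x1":   [1.0, 3.0, 5.0, 9.0],
})

# Features for m*_0 would then be x1 (X_it), m_x1 (Xbar_i) and m_d (Dbar_i)
cre_panel = add_unit_means(panel, cols=["d", "x1"])
print(cre_panel)
```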

#### 2. Transformation Approaches

##### 2.1. First Difference (FD) Transformation - Exact

Consider the FD transformation $Q(Y_{it}) = Y_{it} - Y_{it-1}$. Under Assumptions 3.1-3.5, the transformed nuisance functions can be learnt as follows:

- $ \Delta g_0 (X_{it-1}, X_{it}) $ from $ \{ Y_{it}-Y_{it-1}, X_{it-1}, X_{it} : t=2, \dots , T \}_{i=1}^N $
- $ \Delta m_0 (X_{it-1}, X_{it}) $ from $ \{ D_{it}-D_{it-1}, X_{it-1}, X_{it} : t=2, \dots , T \}_{i=1}^N $
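A minimal pandas sketch of building the FD data set described above: differenced outcome and treatment, plus current and lagged covariates as features. `first_difference` and the `_lag` suffix are hypothetical names used for illustration; this is not the library's `fd_fct`.

```python
import pandas as pd


def first_difference(df, y_col="y", d_col="d", x_cols=("x1",),
                     id_col="id", time_col="time"):
    """Return rows t = 2, ..., T with differenced y and d and (X_it, X_it-1)."""
    df = df.sort_values([id_col, time_col])
    grouped = df.groupby(id_col)
    out = df[[id_col, time_col]].copy()
    out[f"{y_col}_diff"] = grouped[y_col].diff()   # Y_it - Y_it-1
    out[f"{d_col}_diff"] = grouped[d_col].diff()   # D_it - D_it-1
    for col in x_cols:
        out[col] = df[col]                         # X_it
        out[f"{col}_lag"] = grouped[col].shift(1)  # X_it-1
    # The first period of each unit has no predecessor and is dropped
    return out.dropna().reset_index(drop=True)


panel = pd.DataFrame({
    "id":   [1, 1, 1, 2, 2, 2],
    "time": [1, 2, 3, 1, 2, 3],
    "y":    [1.0, 3.0, 6.0, 10.0, 10.0, 13.0],
    "d":    [0.0, 1.0, 1.0, 2.0, 4.0, 7.0],
    "x1":   [1.0, 2.0, 4.0, 0.0, 5.0, 5.0],
})
fd_panel = first_difference(panel)
print(fd_panel)
```

Differencing within `groupby` keeps the first observation of one unit from being differenced against the last observation of the previous unit.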

##### 2.2. Within Group (WG) Transformation - Approximate

For the WG transformation $Q(X_{it}) = X_{it} - \bar{X}_{i}$, where $\bar{X}_{i} = T^{-1} \sum_{t=1}^T X_{it}$, the approximate model is
\begin{align*}
    Q(Y_{it}) &\approx \theta_0 Q(D_{it}) + g_0 (Q(X_{it})) + Q(U_{it}) \\
    Q(D_{it}) &\approx m_0 (Q(X_{it})) + Q(V_{it})
\end{align*}

- $g_0$ can be learnt from transformed data $ \{ Q(Y_{it}), Q(X_{it}) : t=1,\dots,T \}_{i=1}^N $
- $m_0$ can be learnt from transformed data $ \{ Q(D_{it}), Q(X_{it}) : t=1,\dots,T \}_{i=1}^N $
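The reason the WG model holds only approximately is that demeaning does not commute with a nonlinear nuisance function: $Q(g_0(X_{it}))$ equals $g_0(Q(X_{it}))$ when $g_0$ is linear but generally not otherwise. The small numerical check below illustrates this with two example functions chosen purely as assumptions.

```python
import numpy as np


def within(a):
    """WG transformation Q(a_it) = a_it - mean_t(a_it) for shape (units, periods)."""
    return a - a.mean(axis=1, keepdims=True)


def g_linear(v):
    return 2.0 * v


def g_nonlinear(v):
    return v ** 2


rng = np.random.default_rng(0)
x = rng.normal(size=(100, 5))

# Largest absolute difference between Q(g(X)) and g(Q(X))
linear_gap = np.abs(within(g_linear(x)) - g_linear(within(x))).max()
nonlinear_gap = np.abs(within(g_nonlinear(x)) - g_nonlinear(within(x))).max()
print(f"linear gap: {linear_gap:.2e}, nonlinear gap: {nonlinear_gap:.2f}")

# An additive unit effect is removed exactly in either case
alpha = rng.normal(size=(100, 1))
effect_removed = np.allclose(within(x + alpha), within(x))
```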

#### Implementation

- Block-k-fold cross-fitting is used: the entire time series of a sampled unit is allocated to one fold, which allows for possible serial correlation within each unit, as is common with panel data

- Cluster-robust standard errors

$\Rightarrow$ use the id variable as the cluster variable for DML
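The fold allocation described above can be sketched with plain NumPy: units, not rows, are shuffled and split, so every observation of a unit lands in the same test fold. `block_k_fold` is a hypothetical helper written for illustration and is not how DoubleML necessarily implements its sample splitting; scikit-learn's `GroupKFold` provides grouping of a similar kind.

```python
import numpy as np


def block_k_fold(ids, n_folds=5, seed=0):
    """Return (train_idx, test_idx) pairs; each unit is in exactly one test fold."""
    ids = np.asarray(ids)
    rng = np.random.default_rng(seed)
    units = rng.permutation(np.unique(ids))  # shuffle units, not rows
    splits = []
    for fold_units in np.array_split(units, n_folds):
        test_mask = np.isin(ids, fold_units)
        splits.append((np.flatnonzero(~test_mask), np.flatnonzero(test_mask)))
    return splits


ids = np.repeat(np.arange(10), 3)  # 10 units observed over 3 periods
splits = block_k_fold(ids, n_folds=5)
for train_idx, test_idx in splits:
    print("test units:", np.unique(ids[test_idx]))
```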

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from doubleml.data.base_data import DoubleMLData
from doubleml.plm.plpr import DoubleMLPLPR
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.base import clone
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from doubleml.plm.utils._plpr_util import cre_fct, fd_fct, wd_fct, extend_data
from doubleml.plm.datasets.dgp_static_panel_CP2025 import make_static_panel_CP2025
import warnings
warnings.filterwarnings("ignore")

In [18]:
np.random.seed(1)
data = make_static_panel_CP2025(dgp_type='dgp1')

x_cols = [col for col in data.columns if "x" in col]

X = sm.add_constant(data[['d'] + x_cols])
y = data['y']
clusters = data['id']

ols_model = sm.OLS(y, X).fit(cov_type='cluster', cov_kwds={'groups': clusters})
ols_model.params['d'], ols_model.conf_int().loc['d'][0], ols_model.conf_int().loc['d'][1]

(np.float64(0.6719371174913912),
 np.float64(0.6090488219157397),
 np.float64(0.7348254130670426))

In [19]:
# cre general
data = make_static_panel_CP2025(dgp_type='dgp1')
cre_data = cre_fct(data)

x_cols = [col for col in cre_data.columns if "x" in col]

obj_dml_data_pdml = DoubleMLData(cre_data,
                                 y_col='y',
                                 d_cols='d',
                                 cluster_cols='id',
                                 x_cols=x_cols)

learner = LassoCV()
ml_l = clone(learner)
ml_m = clone(learner)

obj_dml_plpr = DoubleMLPLPR(obj_dml_data_pdml, ml_l=ml_l, ml_m=ml_m,
                            pdml_approach='cre_general', n_folds=5
                            )

obj_dml_plpr.fit()
print(obj_dml_plpr.summary)

       coef   std err          t          P>|t|     2.5 %    97.5 %
d  0.519934  0.020986  24.774727  1.678998e-135  0.478802  0.561067


In [None]:
# model rmse

# u_hat = obj_dml_plpr._dml_data.y - obj_dml_plpr.predictions['ml_l'].flatten()
# v_hat = obj_dml_plpr._dml_data.d - obj_dml_plpr.predictions['ml_m'].flatten()

# np.sqrt(np.mean((u_hat - (obj_dml_plpr.coef[0] * v_hat))**2))

In [4]:
print(obj_dml_plpr)


------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9', 'x10', 'x11', 'x12', 'x13', 'x14', 'x15', 'x16', 'x17', 'x18', 'x19', 'x20', 'x21', 'x22', 'x23', 'x24', 'x25', 'x26', 'x27', 'x28', 'x29', 'x30', 'm_x1', 'm_x2', 'm_x3', 'm_x4', 'm_x5', 'm_x6', 'm_x7', 'm_x8', 'm_x9', 'm_x10', 'm_x11', 'm_x12', 'm_x13', 'm_x14', 'm_x15', 'm_x16', 'm_x17', 'm_x18', 'm_x19', 'm_x20', 'm_x21', 'm_x22', 'm_x23', 'm_x24', 'm_x25', 'm_x26', 'm_x27', 'm_x28', 'm_x29', 'm_x30']
Instrument variable(s): None
Cluster variable(s): ['id']
Is cluster data: True
No. Observations: 2500

------------------ Score & algorithm ------------------
Score function: partialling out
Static panel model approach: cre_general

------------------ Machine learner   ------------------
Learner ml_l: LassoCV()
Learner ml_m: LassoCV()
Out-of-sample Performance:
Regression:
Learner ml_l RMSE: [[1.63784321]]
Learner m

#### Using the Panel Data Class

In [10]:
from doubleml.data.panel_data import DoubleMLPanelData

data = make_static_panel_CP2025(dgp_type='dgp1')
cre_data = cre_fct(data)

x_cols = [col for col in cre_data.columns if "x" in col]

obj_dml_data_pdml = DoubleMLPanelData(cre_data,
                                      y_col='y',
                                      d_cols='d',
                                      t_col='time',
                                      id_col='id',
                                      x_cols=x_cols,
                                      static_panel=True)

print(obj_dml_data_pdml)


------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9', 'x10', 'x11', 'x12', 'x13', 'x14', 'x15', 'x16', 'x17', 'x18', 'x19', 'x20', 'x21', 'x22', 'x23', 'x24', 'x25', 'x26', 'x27', 'x28', 'x29', 'x30', 'm_x1', 'm_x2', 'm_x3', 'm_x4', 'm_x5', 'm_x6', 'm_x7', 'm_x8', 'm_x9', 'm_x10', 'm_x11', 'm_x12', 'm_x13', 'm_x14', 'm_x15', 'm_x16', 'm_x17', 'm_x18', 'm_x19', 'm_x20', 'm_x21', 'm_x22', 'm_x23', 'm_x24', 'm_x25', 'm_x26', 'm_x27', 'm_x28', 'm_x29', 'm_x30']
Instrument variable(s): None
Time variable: time
Id variable: id
No. Unique Ids: 250
No. Observations: 2500

------------------ DataFrame info    ------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2500 entries, 0 to 2499
Columns: 65 entries, id to m_x30
dtypes: float64(63), int64(2)
memory usage: 1.2 MB



In [5]:
# cre normality assumption
data = make_static_panel_CP2025(dgp_type='dgp1')
cre_data = cre_fct(data)

x_cols = [col for col in cre_data.columns if "x" in col]

obj_dml_data_pdml = DoubleMLData(cre_data,
                                 y_col='y',
                                 d_cols='d',
                                 cluster_cols='id',
                                 x_cols=x_cols)

# learner = LassoCV()
learner = make_pipeline(StandardScaler(), LassoCV())

ml_l = clone(learner)
ml_m = clone(learner)

obj_dml_plpr = DoubleMLPLPR(obj_dml_data_pdml, ml_l, ml_m, pdml_approach='cre')
obj_dml_plpr.fit()
print(obj_dml_plpr.summary)

       coef  std err          t          P>|t|     2.5 %    97.5 %
d  0.548085  0.02097  26.136802  1.392306e-150  0.506985  0.589186


In [6]:
# first-difference (FD) transformation
data = make_static_panel_CP2025(dgp_type='dgp1')
fd_data = fd_fct(data)

obj_dml_data_pdml = DoubleMLData(fd_data,
                                 y_col='y_diff',
                                 d_cols='d_diff',
                                 cluster_cols='id',
                                 x_cols=[col for col in fd_data.columns if col.startswith("x")])

learner = LassoCV()
ml_l = clone(learner)
ml_m = clone(learner)

obj_dml_plpr = DoubleMLPLPR(obj_dml_data_pdml, ml_l, ml_m, pdml_approach='transform')
obj_dml_plpr.fit()
print(obj_dml_plpr.summary)

            coef   std err         t         P>|t|     2.5 %    97.5 %
d_diff  0.492487  0.025352  19.42565  4.684180e-84  0.442797  0.542176


In [10]:
# within-group (WG) transformation
data = make_static_panel_CP2025(dgp_type='dgp1')
wd_data = wd_fct(data)

obj_dml_data_pdml = DoubleMLData(wd_data,
                                 y_col='y',
                                 d_cols='d',
                                 cluster_cols='id',
                                 x_cols=[col for col in wd_data.columns if col.startswith("x")])

learner = LassoCV()
ml_l = clone(learner)
ml_m = clone(learner)

obj_dml_plpr = DoubleMLPLPR(obj_dml_data_pdml, ml_l, ml_m, pdml_approach='transform')
obj_dml_plpr.fit()
print(obj_dml_plpr.summary)

     coef   std err          t          P>|t|     2.5 %    97.5 %
d  0.5089  0.019913  25.555893  4.722065e-144  0.469871  0.547929


In [13]:
n_reps = 100
theta = 0.5

learner = make_pipeline(StandardScaler(), LassoCV())

learner_ols = LinearRegression()

# Each result array stores, per repetition: estimate, bias, CI coverage indicator
res_cre_ols = np.full((n_reps, 3), np.nan)
res_cre_general = np.full((n_reps, 3), np.nan)
res_cre_normal = np.full((n_reps, 3), np.nan)
res_fd = np.full((n_reps, 3), np.nan)
res_fd_cluster = np.full((n_reps, 3), np.nan)  # FD fitted without cluster_cols ("FD no cluster")
res_wd = np.full((n_reps, 3), np.nan)

np.random.seed(1)

for i in range(n_reps):
    print(f"\rProcessing: {round((i+1)/n_reps*100, 3)} %", end="")
    data = make_static_panel_CP2025(num_n=100, theta=theta, dgp_type='dgp1')

    # CRE general OLS
    cre_data = cre_fct(data)
    dml_data = DoubleMLData(cre_data, y_col='y', d_cols='d', cluster_cols='id',
                            x_cols=[col for col in cre_data.columns if "x" in col])
    dml_plpr = DoubleMLPLPR(dml_data, clone(learner_ols), clone(learner_ols), n_folds=5, pdml_approach='cre_general')
    dml_plpr.fit()
    res_cre_ols[i, 0] = dml_plpr.coef[0]
    res_cre_ols[i, 1] = dml_plpr.coef[0] - theta
    confint = dml_plpr.confint()
    res_cre_ols[i, 2] = (confint['2.5 %'].iloc[0] <= theta) & (confint['97.5 %'].iloc[0] >= theta)

    # CRE general Lasso
    dml_plpr = DoubleMLPLPR(dml_data, clone(learner), clone(learner), n_folds=5, pdml_approach='cre_general')
    dml_plpr.fit()
    res_cre_general[i, 0] = dml_plpr.coef[0]
    res_cre_general[i, 1] = dml_plpr.coef[0] - theta
    confint = dml_plpr.confint()
    res_cre_general[i, 2] = (confint['2.5 %'].iloc[0] <= theta) & (confint['97.5 %'].iloc[0] >= theta)

    # CRE normality
    dml_data = DoubleMLData(cre_data, y_col='y', d_cols='d', cluster_cols='id', 
                            x_cols=[col for col in cre_data.columns if "x" in col]
                                   )
    dml_plpr = DoubleMLPLPR(dml_data, clone(learner), clone(learner), n_folds=5, pdml_approach='cre')
    dml_plpr.fit()
    res_cre_normal[i, 0] = dml_plpr.coef[0]
    res_cre_normal[i, 1] = dml_plpr.coef[0] - theta
    confint = dml_plpr.confint()
    res_cre_normal[i, 2] = (confint['2.5 %'].iloc[0] <= theta) & (confint['97.5 %'].iloc[0] >= theta)

    # FD approach
    fd_data = fd_fct(data)
    dml_data = DoubleMLData(fd_data, y_col='y_diff', d_cols='d_diff', cluster_cols='id',
                            x_cols=[col for col in fd_data.columns if col.startswith("x")])
    dml_plpr = DoubleMLPLPR(dml_data, clone(learner), clone(learner), n_folds=5, pdml_approach='transform')
    dml_plpr.fit()
    res_fd[i, 0] = dml_plpr.coef[0]
    res_fd[i, 1] = dml_plpr.coef[0] - theta
    confint = dml_plpr.confint()
    res_fd[i, 2] = (confint['2.5 %'].iloc[0] <= theta) & (confint['97.5 %'].iloc[0] >= theta)
    # no cluster
    dml_data = DoubleMLData(fd_data, y_col='y_diff', d_cols='d_diff',
                            x_cols=[col for col in fd_data.columns if col.startswith("x")])
    dml_plpr = DoubleMLPLPR(dml_data, clone(learner), clone(learner), n_folds=5, pdml_approach='transform')
    dml_plpr.fit()
    res_fd_cluster[i, 0] = dml_plpr.coef[0]
    res_fd_cluster[i, 1] = dml_plpr.coef[0] - theta
    confint = dml_plpr.confint()
    res_fd_cluster[i, 2] = (confint['2.5 %'].iloc[0] <= theta) & (confint['97.5 %'].iloc[0] >= theta)
    
    # WD approach
    wd_data = wd_fct(data)
    dml_data = DoubleMLData(wd_data, y_col='y', d_cols='d', cluster_cols='id',
                            x_cols=[col for col in wd_data.columns if "x" in col])
    dml_plpr = DoubleMLPLPR(dml_data, clone(learner), clone(learner), n_folds=5, pdml_approach='transform')
    dml_plpr.fit()
    res_wd[i, 0] = dml_plpr.coef[0]
    res_wd[i, 1] = dml_plpr.coef[0] - theta
    confint = dml_plpr.confint()
    res_wd[i, 2] = (confint['2.5 %'].iloc[0] <= theta) & (confint['97.5 %'].iloc[0] >= theta)

Processing: 100.0 %

In [16]:
pd.DataFrame(np.vstack([res_cre_ols.mean(axis=0), res_cre_general.mean(axis=0), res_cre_normal.mean(axis=0), 
                        res_fd.mean(axis=0), res_fd_cluster.mean(axis=0), res_wd.mean(axis=0)]), 
                        columns=['Coef', 'Bias', 'Coverage'], 
                        index=['CRE OLS', 'CRE general', 'CRE normality', 
                               'FD', 'FD no cluster', 'WD'])

                   Coef      Bias  Coverage
CRE OLS        0.498516 -0.001484      0.94
CRE general    0.517304  0.017304      0.91
CRE normality  0.540535  0.040535      0.80
FD             0.504695  0.004695      0.95
FD no cluster  0.503954  0.003954      0.87
WD             0.502402  0.002402      0.93
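
In the table above, FD fitted without `cluster_cols` covers the true value less often (0.87) than FD with clustering (0.95). One plausible reason is that first-differenced errors $U_{it} - U_{it-1}$ are serially correlated within a unit by construction, which a variance estimator treating rows as independent ignores. The sketch below contrasts the two variance estimators for a no-intercept regression of outcome residuals on treatment residuals. It is a plain sandwich formula without finite-sample correction, with hypothetical names, and is not DoubleML's implementation.

```python
import numpy as np


def po_se(res_y, res_d, ids=None):
    """Estimate theta in res_y = theta * res_d + error and its standard error.

    ids=None treats every row as its own cluster (heteroskedasticity-robust);
    otherwise score contributions are summed within each cluster first.
    """
    res_y, res_d = np.asarray(res_y, float), np.asarray(res_d, float)
    denom = res_d @ res_d
    theta = (res_d @ res_y) / denom
    psi = res_d * (res_y - theta * res_d)  # per-row score contributions
    if ids is None:
        meat = np.sum(psi ** 2)
    else:
        _, codes = np.unique(ids, return_inverse=True)
        meat = np.sum(np.bincount(codes, weights=psi) ** 2)
    return theta, np.sqrt(meat) / denom


rng = np.random.default_rng(0)
n_units, n_periods = 200, 4
res_d = rng.normal(size=n_units)
res_y = 0.5 * res_d + rng.normal(size=n_units)
_, se_base = po_se(res_y, res_d)

# Extreme within-unit dependence: every row is repeated n_periods times
ids = np.repeat(np.arange(n_units), n_periods)
res_d_rep, res_y_rep = np.repeat(res_d, n_periods), np.repeat(res_y, n_periods)
_, se_naive = po_se(res_y_rep, res_d_rep)         # shrinks by sqrt(n_periods)
_, se_cluster = po_se(res_y_rep, res_d_rep, ids)  # unchanged
print(se_base, se_naive, se_cluster)
```

Repeating rows adds no information, and only the clustered estimator reflects that: its standard error matches the one from the original sample, whereas the row-wise estimator is too small by a factor of $\sqrt{T}$.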
