### Double Machine Learning for Static Panel Models with Fixed Effects

The partially linear model can be extended to panel data by introducing fixed effects $\alpha_i$, which gives the partially linear panel regression (PLPR) model.

Partialled-out PLPR (PO-PLPR) model:

\begin{align*}
    Y_{it} &= \theta_0 D_{it} + g_0(X_{it}) + \alpha_i + U_{it} \\
    D_{it} &= m_0(X_{it}) + \gamma_i + V_{it}
\end{align*}

- $Y_{it}$ outcome, $D_{it}$ treatment, $X_{it}$ covariates, $\theta_0$ causal treatment effect
- $g_0$ and $m_0$ nuisance functions
- $\alpha_i$, $\gamma_i$ unobserved individual heterogeneity, correlated with covariates
- $U_{it}$, $V_{it}$ error terms

Further, note that $E[U_{it} \mid D_{it}, X_{it}, \alpha_i] = 0$, but $E[\alpha_i \mid D_{it}, X_{it}] \neq 0$, and $E[V_{it} \mid X_{it}, \gamma_i]=0$.
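To illustrate why $E[\alpha_i \mid D_{it}, X_{it}] \neq 0$ matters, the following minimal sketch simulates a toy panel and compares pooled OLS with OLS on within-unit demeaned data. The data-generating process is an assumption made for illustration only (linear $g_0$ and $m_0$, $\gamma_i = \alpha_i$, $\alpha_i$ built from the unit mean of $X_{it}$); it is not the `make_static_panel_CP2025` design used in the code cells.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, theta0 = 2000, 5, 0.5

# Toy DGP (assumed for illustration): alpha_i depends on the unit mean of x
x = rng.normal(size=(N, T))
alpha = x.mean(axis=1, keepdims=True) + rng.normal(size=(N, 1))
d = x + alpha + rng.normal(size=(N, T))               # gamma_i = alpha_i
y = theta0 * d + x + alpha + rng.normal(size=(N, T))

def ols_coef_on_d(y, d, x):
    """OLS of y on a constant, d and x; returns the coefficient on d."""
    Z = np.column_stack([np.ones(y.size), d.ravel(), x.ravel()])
    beta, *_ = np.linalg.lstsq(Z, y.ravel(), rcond=None)
    return beta[1]

def demean(a):
    """Subtract the unit (row) mean from each observation."""
    return a - a.mean(axis=1, keepdims=True)

theta_pooled = ols_coef_on_d(y, d, x)                          # ignores alpha_i
theta_within = ols_coef_on_d(demean(y), demean(d), demean(x))  # removes alpha_i
print(theta_pooled, theta_within)
```

In this toy design the pooled estimate is pushed well above $\theta_0 = 0.5$ because the omitted $\alpha_i$ is correlated with the treatment, while demeaning removes $\alpha_i$ before estimation.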

#### 1. Correlated Random Effects Approach

##### 1.1. General case:

- Learn $g_0$ from $\{ Y_{it}, X_{it}, \bar{X}_i : t=1,\dots, T \}_{i=1}^N$
- First learn $\tilde{m}_0({\cdot})$ from $\{ D_{it}, X_{it}, \bar{X}_i : t=1,\dots, T \}_{i=1}^N$, with predictions $\hat{m}_{0it} = \tilde{m}_0 (X_{it}, \bar{X}_i)$
    - Calculate the unit means $\hat{\bar{m}}_i = T^{-1} \sum_{t=1}^T \hat{m}_{0it}$
    - Calculate the final nuisance part as $\hat{m}^*_0 (X_{it}, \bar{X}_i, \bar{D}_i) = \hat{m}_{0it} + \bar{D}_i - \hat{\bar{m}}_i$
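The two calculation steps above can be sketched with NumPy as follows. This is a minimal illustration, not the library's implementation: the helper name `cre_general_m_star` and the toy arrays are hypothetical, and the first-stage predictions `m_hat` are taken as given.

```python
import numpy as np

def cre_general_m_star(m_hat, d, ids):
    """Return m*_it = m_hat_it + D_bar_i - m_bar_i for each observation."""
    m_hat = np.asarray(m_hat, dtype=float)
    d = np.asarray(d, dtype=float)
    # Map every observation to a unit index 0..N-1
    _, unit_idx = np.unique(ids, return_inverse=True)
    counts = np.bincount(unit_idx)
    d_bar = np.bincount(unit_idx, weights=d) / counts      # unit means of D
    m_bar = np.bincount(unit_idx, weights=m_hat) / counts  # unit means of m_hat
    return m_hat + d_bar[unit_idx] - m_bar[unit_idx]

# Toy example: two units with three periods each
ids = np.array([1, 1, 1, 2, 2, 2])
d = np.array([1.0, 2.0, 3.0, 0.0, 1.0, 2.0])
m_hat = np.array([0.5, 1.5, 2.5, 0.5, 0.5, 0.5])
m_star = cre_general_m_star(m_hat, d, ids)
print(m_star)  # [1. 2. 3. 1. 1. 1.]
```

By construction, the unit mean of the adjusted prediction equals $\bar{D}_i$ for every unit.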

##### 1.2. Normal assumption:

Assume that the conditional distribution $D_{i1}, \dots, D_{iT} \mid X_{i1}, \dots, X_{iT}$ is multivariate normal.
- Learn $m^*_{0}$ from $\{ D_{it}, X_{it}, \bar{X}_i, \bar{D}_i: t=1,\dots, T \}_{i=1}^N$

#### 2. Transformation Approaches

##### 2.1. First Difference (FD) Transformation - Exact

Consider the FD transformation $Q(Y_{it})= Y_{it} - Y_{it-1}$. Under Assumptions 3.1-3.5, the transformed nuisance functions can be learnt as follows:

- $ \Delta g_0 (X_{it-1}, X_{it}) $ from $ \{ Y_{it}-Y_{it-1}, X_{it-1}, X_{it} : t=2, \dots , T \}_{i=1}^N $
- $ \Delta m_0 (X_{it-1}, X_{it}) $ from $ \{ D_{it}-D_{it-1}, X_{it-1}, X_{it} : t=2, \dots , T \}_{i=1}^N $
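The data sets used for the FD approach can be built with a grouped difference and a grouped lag. The sketch below uses pandas; the helper name `first_difference_frame` and the `_diff` / `_lag` column suffixes are assumptions for illustration and need not match what the library produces internally.

```python
import pandas as pd

def first_difference_frame(df, id_col, t_col, y_col, d_col, x_cols):
    """Build {Y_it - Y_it-1, D_it - D_it-1, X_it-1, X_it} for t = 2..T."""
    df = df.sort_values([id_col, t_col]).reset_index(drop=True)
    grouped = df.groupby(id_col)
    out = pd.DataFrame({id_col: df[id_col], t_col: df[t_col]})
    out[y_col + "_diff"] = grouped[y_col].diff()
    out[d_col + "_diff"] = grouped[d_col].diff()
    for col in x_cols:
        out[col] = df[col]                           # X_it
        out[col + "_lag"] = grouped[col].shift(1)    # X_it-1
    # The first period of each unit has no lag and is dropped
    return out.dropna().reset_index(drop=True)

toy = pd.DataFrame({
    "id":   [1, 1, 1, 2, 2, 2],
    "time": [1, 2, 3, 1, 2, 3],
    "y":    [1, 3, 6, 2, 2, 5],
    "d":    [0, 1, 1, 1, 0, 1],
    "x1":   [0.1, 0.2, 0.4, 1.0, 0.5, 0.0],
})
fd = first_difference_frame(toy, "id", "time", "y", "d", ["x1"])
print(fd)
```

Each unit contributes $T-1$ rows, so the toy panel with $N=2$ and $T=3$ yields four rows.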

##### 2.2. Within Group (WG) Transformation - Approximate

For the WG transformation $Q(X_{it})= X_{it} - \bar{X}_{i}$, where $\bar{X}_{i} = T^{-1} \sum_{t=1}^T X_{it}$, the approximate model is
\begin{align*}
    Q(Y_{it}) &\approx \theta_0 Q(D_{it}) + g_0 (Q(X_{it})) + Q(U_{it}) \\
    Q(D_{it}) &\approx m_0 (Q(X_{it})) + Q(V_{it})
\end{align*}

- $g_0$ can be learnt from transformed data $ \{ Q(Y_{it}), Q(X_{it}) : t=1,\dots,T \}_{i=1}^N $
- $m_0$ can be learnt from transformed data $ \{ Q(D_{it}), Q(X_{it}) : t=1,\dots,T \}_{i=1}^N $
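The WG transformation as defined above amounts to subtracting unit means. A minimal pandas sketch follows; the helper name `within_group_frame` and the `_demean` suffix are assumptions for illustration.

```python
import pandas as pd

def within_group_frame(df, id_col, cols):
    """Add Q(col) = col - unit mean of col for every column in cols."""
    out = df.copy()
    unit_means = df.groupby(id_col)[cols].transform("mean")
    for col in cols:
        out[col + "_demean"] = df[col] - unit_means[col]
    return out

toy = pd.DataFrame({"id": [1, 1, 1, 2, 2, 2],
                    "y":  [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]})
wg = within_group_frame(toy, "id", ["y"])

# Adding a unit-specific constant (a stand-in for alpha_i) leaves Q(y) unchanged
shifted = toy.assign(y=toy["y"] + toy["id"].map({1: 10.0, 2: -5.0}))
wg_shifted = within_group_frame(shifted, "id", ["y"])
print(wg["y_demean"].tolist())  # [-1.0, 0.0, 1.0, -2.0, 0.0, 2.0]
```

The second call shows why the fixed effect drops out of the transformed outcome: any time-constant unit term is removed by demeaning.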

#### Implementation

- Block-k-fold cross-fitting is used: the entire time series of a sampled unit is allocated to one fold, which allows for possible serial correlation within each unit, as is common with panel data.

- Cluster-robust standard errors are used.

$\Rightarrow$ The id variable is used as the cluster variable for DML.
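Both implementation points can be sketched in a few lines of NumPy. The code below is an illustration under stated assumptions, not the library's code: `block_kfold` and `cluster_robust_se` are hypothetical helpers, the residuals `y_res` and `d_res` are taken as given, and the standard error is a plain cluster sandwich for the partialling-out score without any finite-sample correction.

```python
import numpy as np

def block_kfold(ids, n_folds=5, seed=0):
    """Split observation indices so that all rows of a unit share one fold."""
    ids = np.asarray(ids)
    rng = np.random.default_rng(seed)
    shuffled_units = rng.permutation(np.unique(ids))
    folds = []
    for unit_block in np.array_split(shuffled_units, n_folds):
        in_test = np.isin(ids, unit_block)
        folds.append((np.flatnonzero(~in_test), np.flatnonzero(in_test)))
    return folds

def cluster_robust_se(y_res, d_res, ids):
    """Partialling-out estimate of theta with a cluster sandwich standard error."""
    y_res = np.asarray(y_res, dtype=float)
    d_res = np.asarray(d_res, dtype=float)
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    psi = (y_res - theta * d_res) * d_res            # score per observation
    _, unit_idx = np.unique(ids, return_inverse=True)
    psi_by_unit = np.bincount(unit_idx, weights=psi)  # sum scores within unit
    se = np.sqrt(np.sum(psi_by_unit ** 2)) / np.sum(d_res ** 2)
    return theta, se

# 10 units with 4 periods each, split into 5 folds of 2 units
ids = np.repeat(np.arange(10), 4)
folds = block_kfold(ids, n_folds=5)

# Toy residuals for two units with two periods each
toy_ids = np.array([0, 0, 1, 1])
theta_hat, se_hat = cluster_robust_se([2.0, 0.0, -1.0, -3.0],
                                      [1.0, 1.0, -1.0, -1.0], toy_ids)
print(theta_hat, se_hat)
```

Summing the scores within each unit before squaring is what lets the variance estimate account for serial correlation inside a unit.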

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from doubleml.data.base_data import DoubleMLData 
from doubleml.data.panel_data import DoubleMLPanelData
from doubleml.plm.plpr import DoubleMLPLPR
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.base import clone
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import PolynomialFeatures
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from doubleml.plm.utils._plpr_util import cre_fct, fd_fct, wd_fct, extend_data
from doubleml.plm.datasets.dgp_static_panel_CP2025 import make_static_panel_CP2025
import warnings
warnings.filterwarnings("ignore")

In [3]:
# Pooled OLS benchmark with standard errors clustered by id
np.random.seed(1)
data = make_static_panel_CP2025(dgp_type='dgp1')

x_cols = [col for col in data.columns if "x" in col]

X = sm.add_constant(data[['d'] + x_cols])
y = data['y']
clusters = data['id']

ols_model = sm.OLS(y, X).fit(cov_type='cluster', cov_kwds={'groups': clusters})
ols_model.params['d'], ols_model.conf_int().loc['d'][0], ols_model.conf_int().loc['d'][1]

(np.float64(0.6719371174913908),
 np.float64(0.6090488219157394),
 np.float64(0.7348254130670423))

In [None]:
# CRE approach, general case
data = make_static_panel_CP2025(dgp_type='dgp1')
# Manual CRE transform, kept for reference only: cre_data is not used below
cre_data = cre_fct(data)

learner = LassoCV()
ml_l = clone(learner)
ml_m = clone(learner)

panel_data_obj = DoubleMLPanelData(data,
                                   y_col='y',
                                   d_cols='d',
                                   t_col='time',
                                   id_col='id',
                                   x_cols=[col for col in data.columns if "x" in col],
                                   static_panel=True)

dml_plpr_obj = DoubleMLPLPR(panel_data_obj, ml_l=ml_l, ml_m=ml_m, approach='cre_general', n_folds=5)
                            
dml_plpr_obj.fit()
print(dml_plpr_obj.summary)

       coef   std err          t          P>|t|     2.5 %    97.5 %
d  0.519934  0.020986  24.774727  1.678998e-135  0.478802  0.561067


In [5]:
# model rmse

# u_hat = dml_panel_plpr._dml_data.y - dml_panel_plpr.predictions['ml_l'].flatten()
# v_hat = dml_panel_plpr._dml_data.d - dml_panel_plpr.predictions['ml_m'].flatten()

# np.sqrt(np.mean(np.square(u_hat - (dml_panel_plpr.coef[0] * v_hat))))

In [6]:
print(dml_plpr_obj)


------------------ Data Summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9', 'x10', 'x11', 'x12', 'x13', 'x14', 'x15', 'x16', 'x17', 'x18', 'x19', 'x20', 'x21', 'x22', 'x23', 'x24', 'x25', 'x26', 'x27', 'x28', 'x29', 'x30', 'x1_mean', 'x2_mean', 'x3_mean', 'x4_mean', 'x5_mean', 'x6_mean', 'x7_mean', 'x8_mean', 'x9_mean', 'x10_mean', 'x11_mean', 'x12_mean', 'x13_mean', 'x14_mean', 'x15_mean', 'x16_mean', 'x17_mean', 'x18_mean', 'x19_mean', 'x20_mean', 'x21_mean', 'x22_mean', 'x23_mean', 'x24_mean', 'x25_mean', 'x26_mean', 'x27_mean', 'x28_mean', 'x29_mean', 'x30_mean']
Instrument variable(s): None
Time variable: time
Id variable: id
Static panel data: True
No. Unique Ids: 250
No. Observations: 2500


------------------ Score & Algorithm ------------------
Score function: partialling out
Static panel model approach: cre_general

------------------ Machine Learner   ------------------
Learner m

In [56]:
# cre general, extend features
data = make_static_panel_CP2025(dgp_type='dgp1')

panel_data_obj = DoubleMLPanelData(data,
                                   y_col='y',
                                   d_cols='d',
                                   t_col='time',
                                   id_col='id',
                                   x_cols=[col for col in data.columns if "x" in col],
                                   static_panel=True)

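n_features = len(panel_data_obj.x_cols)
# Column positions in the CRE feature matrix: the first block holds the
# covariates x, the second block holds their unit means x_mean
indices_x = list(range(n_features))
indices_x_mean = list(range(n_features, 2 * n_features))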

preprocessor = ColumnTransformer([
    ('poly_x', make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False, interaction_only=False)
    ), indices_x),       
    ('poly_x_mean', make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False, interaction_only=False)
    ), indices_x_mean) 
])

learner = make_pipeline(
    preprocessor,
    StandardScaler(),
    LassoCV()
)

ml_l = clone(learner)
ml_m = clone(learner)

dml_plpr_obj = DoubleMLPLPR(panel_data_obj, ml_l=ml_l, ml_m=ml_m, approach='cre_general', n_folds=5)
dml_plpr_obj.fit(store_models=True)
print(dml_plpr_obj.summary)

       coef   std err          t          P>|t|     2.5 %    97.5 %
d  0.551055  0.020727  26.586753  9.659490e-156  0.510432  0.591679


In [62]:
dml_plpr_obj.models['ml_l']['d'][0][0].named_steps['lassocv'].n_features_in_

990
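The feature count reported above can be checked by hand. With 30 covariates and 30 unit means, each block is expanded separately with degree-2 polynomials and no bias term, so each block contributes 30 linear terms plus 465 squares and pairwise interactions. The short sketch below reproduces this arithmetic; `n_poly_features` is a hypothetical helper, not part of scikit-learn.

```python
from math import comb

def n_poly_features(p, degree=2):
    """Number of monomials of total degree 1..degree in p variables (no bias)."""
    return comb(p + degree, degree) - 1

n_x = 30                             # covariates x1..x30
per_block = n_poly_features(n_x)     # 30 linear + 465 quadratic terms = 495
total = 2 * per_block                # x block plus x_mean block
print(per_block, total)  # 495 990
```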

In [69]:
# dml_plpr_obj.models['ml_l']['d'][0][0].named_steps['columntransformer']['poly_x'].get_feature_names_out()

In [None]:
# CRE approach under the normality assumption
data = make_static_panel_CP2025(dgp_type='dgp1')
# Manual CRE transform, kept for reference only: cre_data is not used below
cre_data = cre_fct(data)

panel_data_obj = DoubleMLPanelData(data,
                                   y_col='y',
                                   d_cols='d',
                                   t_col='time',
                                   id_col='id',
                                   x_cols=[col for col in data.columns if "x" in col],
                                   static_panel=True)

# learner = LassoCV()
learner = make_pipeline(StandardScaler(), LassoCV())

ml_l = clone(learner)
ml_m = clone(learner)

dml_plpr_obj = DoubleMLPLPR(panel_data_obj, ml_l, ml_m, approach='cre_normal')
dml_plpr_obj.fit()
print(dml_plpr_obj.summary)

       coef  std err          t          P>|t|     2.5 %    97.5 %
d  0.548085  0.02097  26.136802  1.392306e-150  0.506985  0.589186


In [8]:
# First difference approach
data = make_static_panel_CP2025(dgp_type='dgp1')
# Manual FD transform, kept for reference only: fd_data is not used below
fd_data = fd_fct(data)

panel_data_obj = DoubleMLPanelData(data,
                                   y_col='y',
                                   d_cols='d',
                                   t_col='time',
                                   id_col='id',
                                   x_cols=[col for col in data.columns if "x" in col],
                                   static_panel=True)

learner = LassoCV()
ml_l = clone(learner)
ml_m = clone(learner)

dml_plpr_obj = DoubleMLPLPR(panel_data_obj, ml_l, ml_m, approach='fd_exact')

dml_plpr_obj.fit()
print(dml_plpr_obj.summary)

            coef   std err         t         P>|t|     2.5 %    97.5 %
d_diff  0.492487  0.025352  19.42565  4.684180e-84  0.442797  0.542176


In [None]:
# Within group approach
data = make_static_panel_CP2025(dgp_type='dgp1')
# Manual WG transform, kept for reference only: wd_data is not used below
wd_data = wd_fct(data)

panel_data_obj = DoubleMLPanelData(data,
                                   y_col='y',
                                   d_cols='d',
                                   t_col='time',
                                   id_col='id',
                                   x_cols=[col for col in data.columns if "x" in col],
                                   static_panel=True)

learner = LassoCV()
ml_l = clone(learner)
ml_m = clone(learner)

dml_plpr_obj = DoubleMLPLPR(panel_data_obj, ml_l, ml_m, approach='wg_approx')
dml_plpr_obj.fit()
print(dml_plpr_obj.summary)

              coef   std err          t          P>|t|     2.5 %   97.5 %
d_demean  0.545586  0.021263  25.659316  3.327986e-145  0.503912  0.58726


In [19]:
# Within group approach, polynomials
data = make_static_panel_CP2025(dgp_type='dgp1')

panel_data_obj = DoubleMLPanelData(data,
                                   y_col='y',
                                   d_cols='d',
                                   t_col='time',
                                   id_col='id',
                                   x_cols=[col for col in data.columns if "x" in col],
                                   static_panel=True)

# learner = LassoCV()

# preprocessor = ColumnTransformer([
#     ('poly', make_pipeline(
#         PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
#         StandardScaler()
#     ), ['x1', 'x2']),       # Columns to expand
#     ('pass', 'passthrough', ['cat'])  # Columns to keep unchanged
# ])

# learner = make_pipeline(
#     PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
#     StandardScaler(),
#     LassoCV()
# )

preprocessor = ColumnTransformer([
    ('poly', make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False)
    ), [0, 1])
], remainder='passthrough')

learner = make_pipeline(
    preprocessor,
    StandardScaler(),
    LassoCV()
)

ml_l = clone(learner)
ml_m = clone(learner)

dml_plpr_obj = DoubleMLPLPR(panel_data_obj, ml_l, ml_m, approach='wg_approx')
dml_plpr_obj.fit(store_models=True)
print(dml_plpr_obj.summary)

# dml_plpr_obj.transform_cols['x_cols']

              coef   std err          t         P>|t|     2.5 %    97.5 %
d_demean  0.472861  0.022779  20.758218  1.033280e-95  0.428214  0.517507


In [41]:
# Look up the positions of selected demeaned covariates in the transformed feature list
x_cols_transform = dml_plpr_obj.transform_cols['x_cols']

x_cols_for_poly = ['x1_demean', 'x2_demean', 'x12_demean']

indices = [i for i, c in enumerate(x_cols_transform) if c in x_cols_for_poly]
indices

[0, 1, 11]

In [32]:
dml_plpr_obj.models['ml_l']['d_demean'][0][0].named_steps['lassocv'].n_features_in_

33

In [38]:
dml_plpr_obj.models['ml_l']['d_demean'][0][0].named_steps['columntransformer']['poly'].get_feature_names_out()

array(['x0', 'x1', 'x0^2', 'x0 x1', 'x1^2'], dtype=object)

In [42]:
# dml_plpr_obj.models['ml_l']['d_demean'][0][0].named_steps['polynomialfeatures'].get_feature_names_out()

In [None]:
# simulation with built-in transformations

n_reps = 100
theta = 0.5

learner = make_pipeline(StandardScaler(), LassoCV())

res_cre_general = np.full((n_reps, 3), np.nan)
res_cre_normal = np.full((n_reps, 3), np.nan)
res_fd = np.full((n_reps, 3), np.nan)
res_wd = np.full((n_reps, 3), np.nan)

np.random.seed(1)

for i in range(n_reps):
    print(f"\rProcessing: {round((i+1)/n_reps*100, 3)} %", end="")
    data = make_static_panel_CP2025(num_n=100, theta=theta, dgp_type='dgp1')

    dml_data = DoubleMLPanelData(data, y_col='y', d_cols='d', t_col='time', id_col='id', 
                                 x_cols=[col for col in data.columns if "x" in col],
                                 static_panel=True)
    
    # CRE general Lasso
    dml_plpr = DoubleMLPLPR(dml_data, clone(learner), clone(learner), n_folds=5, 
                            approach='cre_general')
    dml_plpr.fit()
    res_cre_general[i, 0] = dml_plpr.coef[0]
    res_cre_general[i, 1] = dml_plpr.coef[0] - theta
    confint = dml_plpr.confint()
    res_cre_general[i, 2] = (confint['2.5 %'].iloc[0] <= theta) & (confint['97.5 %'].iloc[0] >= theta)

    # CRE normality
    dml_plpr = DoubleMLPLPR(dml_data, clone(learner), clone(learner), n_folds=5, 
                            approach='cre_normal')
    dml_plpr.fit()
    res_cre_normal[i, 0] = dml_plpr.coef[0]
    res_cre_normal[i, 1] = dml_plpr.coef[0] - theta
    confint = dml_plpr.confint()
    res_cre_normal[i, 2] = (confint['2.5 %'].iloc[0] <= theta) & (confint['97.5 %'].iloc[0] >= theta)

    # FD approach
    dml_plpr = DoubleMLPLPR(dml_data, clone(learner), clone(learner), n_folds=5, 
                            approach='fd_exact')
    dml_plpr.fit()
    res_fd[i, 0] = dml_plpr.coef[0]
    res_fd[i, 1] = dml_plpr.coef[0] - theta
    confint = dml_plpr.confint()
    res_fd[i, 2] = (confint['2.5 %'].iloc[0] <= theta) & (confint['97.5 %'].iloc[0] >= theta)
    
    # WG approach; dml_data is reused here (original note: a new data object
    # was needed while the FD approach overwrote _cluster_vars)
    dml_plpr = DoubleMLPLPR(dml_data, clone(learner), clone(learner), n_folds=5, 
                            approach='wg_approx')
    dml_plpr.fit()
    res_wd[i, 0] = dml_plpr.coef[0]
    res_wd[i, 1] = dml_plpr.coef[0] - theta
    confint = dml_plpr.confint()
    res_wd[i, 2] = (confint['2.5 %'].iloc[0] <= theta) & (confint['97.5 %'].iloc[0] >= theta)


pd.DataFrame(np.vstack([res_cre_general.mean(axis=0), res_cre_normal.mean(axis=0), 
                    res_fd.mean(axis=0), res_wd.mean(axis=0)]), 
                    columns=['Coef', 'Bias', 'Coverage'], 
                    index=['CRE general', 'CRE normal', 
                            'FD exact', 'WG approx'])

Processing: 100.0 %

                 Coef      Bias  Coverage
CRE general  0.516684  0.016684      0.92
CRE normal   0.541518  0.041518      0.78
FD exact     0.504094  0.004094      0.94
WG approx    0.502006  0.002006      0.94


In [13]:
# simulation with built-in transformations, LinearRegression learners

n_reps = 100
theta = 0.5

learner = LinearRegression()

res_cre_general = np.full((n_reps, 3), np.nan)
res_cre_normal = np.full((n_reps, 3), np.nan)
res_fd = np.full((n_reps, 3), np.nan)
res_wd = np.full((n_reps, 3), np.nan)

np.random.seed(12)

for i in range(n_reps):
    print(f"\rProcessing: {round((i+1)/n_reps*100, 3)} %", end="")
    data = make_static_panel_CP2025(num_n=100, theta=theta, dgp_type='dgp1')

    dml_data = DoubleMLPanelData(data, y_col='y', d_cols='d', t_col='time', id_col='id', 
                                 x_cols=[col for col in data.columns if "x" in col],
                                 static_panel=True)
    
    # CRE general
    dml_plpr = DoubleMLPLPR(dml_data, clone(learner), clone(learner), n_folds=5, 
                            approach='cre_general')
    dml_plpr.fit()
    res_cre_general[i, 0] = dml_plpr.coef[0]
    res_cre_general[i, 1] = dml_plpr.coef[0] - theta
    confint = dml_plpr.confint()
    res_cre_general[i, 2] = (confint['2.5 %'].iloc[0] <= theta) & (confint['97.5 %'].iloc[0] >= theta)

    # CRE normality
    dml_plpr = DoubleMLPLPR(dml_data, clone(learner), clone(learner), n_folds=5, 
                            approach='cre_normal')
    dml_plpr.fit()
    res_cre_normal[i, 0] = dml_plpr.coef[0]
    res_cre_normal[i, 1] = dml_plpr.coef[0] - theta
    confint = dml_plpr.confint()
    res_cre_normal[i, 2] = (confint['2.5 %'].iloc[0] <= theta) & (confint['97.5 %'].iloc[0] >= theta)

    # FD approach
    dml_plpr = DoubleMLPLPR(dml_data, clone(learner), clone(learner), n_folds=5, 
                            approach='fd_exact')
    dml_plpr.fit()
    res_fd[i, 0] = dml_plpr.coef[0]
    res_fd[i, 1] = dml_plpr.coef[0] - theta
    confint = dml_plpr.confint()
    res_fd[i, 2] = (confint['2.5 %'].iloc[0] <= theta) & (confint['97.5 %'].iloc[0] >= theta)
    
    # WG approach; dml_data is reused here (original note: a new data object
    # was needed while the FD approach overwrote _cluster_vars)
    dml_plpr = DoubleMLPLPR(dml_data, clone(learner), clone(learner), n_folds=5, 
                            approach='wg_approx')
    dml_plpr.fit()
    res_wd[i, 0] = dml_plpr.coef[0]
    res_wd[i, 1] = dml_plpr.coef[0] - theta
    confint = dml_plpr.confint()
    res_wd[i, 2] = (confint['2.5 %'].iloc[0] <= theta) & (confint['97.5 %'].iloc[0] >= theta)


pd.DataFrame(np.vstack([res_cre_general.mean(axis=0), res_cre_normal.mean(axis=0), 
                    res_fd.mean(axis=0), res_wd.mean(axis=0)]), 
                    columns=['Coef', 'Bias', 'Coverage'], 
                    index=['CRE general', 'CRE normal', 
                            'FD exact', 'WG approx'])

Processing: 100.0 %

                 Coef      Bias  Coverage
CRE general  0.498318 -0.001682      0.94
CRE normal   0.497383 -0.002617      0.96
FD exact     0.494321 -0.005679      0.93
WG approx    0.497754 -0.002246      0.95
