<a href="https://colab.research.google.com/github/POLSEAN/XTDML/blob/main/examples/01_xtdml_for_cre.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **DML for panel data: CRE approach**

---

*Description*

Estimation of the structural parameter using double machine learning (DML) with partially linear regression (PLR) models in the context of panel data with fixed effects, as in Clarke and Polselli (2023).

The package `XTDML` allows the estimation of the nuisance functions by machine learning methods and the computation of the Neyman orthogonal score functions. `XTDML` is built on the CRAN package `DoubleML` (Bach et al., 2024), which uses the `mlr3` ecosystem and the `R6` package.
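
For orientation, the standard PLR model and its partialling-out score from Chernozhukov et al. (2018) can be sketched as below. The notation is generic and cross-sectional, with $w_i$ standing for whatever conditioning set is used (an assumption made here for brevity). The panel version with fixed effects in Clarke and Polselli (2023) differs in detail, so this display should not be read as the exact estimator implemented in `XTDML`.

\begin{align*}
    y_i & = \theta_0 \, d_i + g_0(w_i) + u_i, & E[u_i \mid d_i, w_i] & = 0\\
    d_i & = m_0(w_i) + v_i, & E[v_i \mid w_i] & = 0\\
    \psi(\theta; l, m) & = \{y_i - l(w_i) - \theta \, (d_i - m(w_i))\} \, (d_i - m(w_i)), & l_0(w_i) & = E[y_i \mid w_i]
\end{align*}

Neyman orthogonality means that small errors in the estimated $l$ and $m$ have only a second-order effect on the estimate of $\theta_0$, which is what makes it possible to learn the nuisance functions with flexible ML methods.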


*References*

[1] Bach, P., Chernozhukov, V., Kurz, M. S., Spindler, M. and Klaassen, S. (2024), DoubleML - An Object-Oriented Implementation of Double Machine Learning in R, *Journal of Statistical Software*, 108(3):1-56.

[2] Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. *The Econometrics Journal*, 21(1):C1-C68.

[3] Clarke, P. and Polselli, A. (2023). Double machine learning for static panel models with fixed effects. *arXiv preprint*, arXiv:2312.08174.

[4] Mundlak, Y. (1978). On the pooling of time series and cross section data. *Econometrica*, pages 69-85.


*Overview*

1. Installation of `XTDML` and other R packages
2. Loading the data (and adding the individual means, if they are missing)
3. Set-up of the DML data environment
4. Set-up of the DML estimation environment
5. Extraction of the DML estimates

### **Installation of the `XTDML` package**
The `XTDML` package can be installed using either of the options below:

1. **Installation directly from GitHub:**
  ```
    #install.packages("devtools")
    library(devtools)

    install_github("POLSEAN/XTDML")
    library(XTDML)
  ```
  *Note that this code works **only in RStudio (desktop)**, not on online platforms such as Google Colab or Kaggle.*


2. **Download all folders in `XTDML`** from `https://github.com/POLSEAN/XTDML` by pressing `<> CODE > Download ZIP`. Rename the downloaded archive to `XTDML.zip` and upload it to Google Colab. Run `!unzip XTDML.zip` in a Python runtime, note the path of the unzipped folder, then change the RUNTIME to R and run
   ```
    #install.packages("devtools")
    library(devtools)

    wd = "~ your-directory/XTDML"
    devtools::load_all(wd)
   ```

For illustration purposes on Google Colab, we follow the second approach, but the first is recommended with RStudio (desktop).

**Set RUNTIME > CHANGE RUNTIME TYPE > Python 3**

The code below unzips the XTDML.zip folder that you have previously uploaded.

In [None]:
!unzip XTDML.zip

Archive:  XTDML.zip
 extracting: XTDML/.gitignore        
  inflating: XTDML/.Rbuildignore     
  inflating: XTDML/.RData            
  inflating: XTDML/.Rhistory         
   creating: XTDML/.Rproj.user/
   creating: XTDML/.Rproj.user/22C44D20/
   creating: XTDML/.Rproj.user/22C44D20/bibliography-index/
 extracting: XTDML/.Rproj.user/22C44D20/cpp-definition-cache  
   creating: XTDML/.Rproj.user/22C44D20/ctx/
   creating: XTDML/.Rproj.user/22C44D20/explorer-cache/
   creating: XTDML/.Rproj.user/22C44D20/pcs/
  inflating: XTDML/.Rproj.user/22C44D20/pcs/files-pane.pper  
 extracting: XTDML/.Rproj.user/22C44D20/pcs/source-pane.pper  
  inflating: XTDML/.Rproj.user/22C44D20/pcs/windowlayoutstate.pper  
  inflating: XTDML/.Rproj.user/22C44D20/pcs/workbench-pane.pper  
   creating: XTDML/.Rproj.user/22C44D20/presentation/
   creating: XTDML/.Rproj.user/22C44D20/profiles-cache/
 extracting: XTDML/.Rproj.user/22C44D20/rmd-outputs  
 extracting: XTDML/.Rproj.user/22C44D20/saved_source_markers  

**From now set RUNTIME > CHANGE RUNTIME TYPE > R**

In [None]:
# 1. Install and import R packages
# Install packages
list.of.packages <- c("datawizard","mlr3","mlr3learners","mlr3tuning","paradox","xgboost","ranger","glmnet","MLmetrics","devtools")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages, repos = "http://cran.us.r-project.org")

# Load general packages
library(devtools)
library(checkmate)
library(dplyr)
library(tibble)  ##for add_column()
library(datawizard)
library(data.table)

# ML packages
library(mlr3)
library(mlr3learners)
library(rpart)
library(xgboost)
library(ranger)
library(glmnet)

# Packages for HP tuning
library(mlr3misc)
library(mlr3tuning)
library(paradox)
library(MLmetrics)

# Show only warnings and errors from the mlr3 and bbotk loggers (hides info messages)
lgr::get_logger("bbotk")$set_threshold("warn")
lgr::get_logger("mlr3")$set_threshold("warn")

Installing packages into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)

also installing the dependencies ‘bitops’, ‘gtools’, ‘caTools’, ‘globals’, ‘listenv’, ‘PRROC’, ‘iterators’, ‘gplots’, ‘insight’, ‘checkmate’, ‘future’, ‘future.apply’, ‘lgr’, ‘mlbench’, ‘mlr3measures’, ‘mlr3misc’, ‘parallelly’, ‘palmerpenguins’, ‘bbotk’, ‘RcppEigen’, ‘foreach’, ‘shape’, ‘ROCR’


Loading required package: usethis


Attaching package: ‘dplyr’


The following object is masked from ‘package:MASS’:

    select


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union



Attaching package: ‘datawizard’


The following object is masked from ‘package:mvtnorm’:

    standardize



Attaching package: ‘data.table’


The following objects are masked from ‘package:dplyr’:

    between, first, last



Attaching package: ‘xgboost’


The following object is masked from ‘package:dplyr’:

    

In [None]:
# Additional package required to install XTDML (not always necessary, depends on the R version)
list.of.packages <- c("mvtnorm","clusterGeneration","readstata13")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages, repos = "http://cran.us.r-project.org")

library(mvtnorm)
library(clusterGeneration)
library(readstata13)

Installing packages into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)

Loading required package: MASS



In [None]:
# Load the XTDML package from the unzipped folder
wd = "/content/XTDML"
devtools::load_all(wd)

ℹ Loading XTDML


### **The Data**

We use simulated data for DGP3 as in Clarke and Polselli (2023). We use a subsample (N=250) of the original dataset (with N=1,000,000), where each unit is observed over $T=10$ periods.

In this dataset, the nuisance functions are generated as follows

\begin{align*}
    m(x_{it}) & = a \, (x_{it,1}\cdot 1[x_{it,1}>0]) + b \, (x_{it,1}\cdot x_{it,3})\\
    l(x_{it}) & = b \, (x_{it,1}\cdot x_{it,3}) + a \, (x_{it,3}\cdot 1[x_{it,3}>0])
\end{align*}

where $a=0.25$ and $b=0.5$.

* $y_{it}$ is the continuous outcome variable.
* $d_{it}$ is the continuous treatment variable.
* $\mathbf{x}_{it} = (x_{it,1}, \dots, x_{it,p}, \overline{x}_{i,1}, \dots, \overline{x}_{i,p})'$ is the set of control variables, built from $p=30$ covariates of which only $s=2$ are relevant; $\overline{x}_{i,k} = T^{-1}\sum_{t=1}^T x_{it,k}$ is the individual mean of variable $k$.
* $\overline{d}_{i}$ is the individual mean of the treatment variable.
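
The two nuisance functions above can be written directly in R. This is a minimal sketch for illustration: the function names `m_fun` and `l_fun` are hypothetical, and the standard normal covariate draws are an assumption, not the design used to generate the tutorial dataset.

```r
# Nuisance functions as stated above, with a = 0.25 and b = 0.5.
# Only x1 and x3 enter, which is why s = 2 covariates are relevant.
a <- 0.25
b <- 0.5

m_fun <- function(x1, x3) a * (x1 * (x1 > 0)) + b * (x1 * x3)
l_fun <- function(x1, x3) b * (x1 * x3) + a * (x3 * (x3 > 0))

# Toy covariate draws (standard normal is an assumption for illustration only)
set.seed(1)
x1 <- rnorm(5)
x3 <- rnorm(5)
nuis <- data.frame(x1 = x1, x3 = x3, m = m_fun(x1, x3), l = l_fun(x1, x3))
print(nuis)
```

Both functions combine a kink at zero with an interaction term, which is the kind of nonlinearity a linear model without a suitable dictionary of terms cannot capture.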



In [None]:
# 2. Load simulated data from GitHub
# The dataset already includes the individual means (m_x)
df = read.csv("https://raw.githubusercontent.com/POLSEAN/XTDML/main/data/dgp4_cre_short.csv")
names(df)

**N.B.** If the dataset does *not* include the individual means of each variable (excluding the outcome variable), these need to be generated. In this case the dataset already includes the means of the variables (prefixed with `m_`).

A sample code to calculate the means and add them to the dataset is shown below:

```
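# Load dplyr for group_by(), mutate(), and across()
library(dplyr)

# Get the names of the covariates
x_vars = paste0("x", 1:30)

# Calculate the individual means for {X,D} within each unit `id`
df = df %>%
       group_by(id) %>%
       mutate(across(all_of(c(x_vars, "d")), ~ mean(.x), .names = "m_{.col}")) %>%
       ungroup()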
```
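
The same individual means can be computed in base R with `ave()`, which may be convenient when `dplyr` is masked by another package. The toy panel below is made up for illustration; only the `m_` prefix follows the tutorial dataset.

```r
# Toy panel: 3 units observed over 2 periods (values are made up)
toy <- data.frame(
  id = rep(1:3, each = 2),
  x1 = c(1, 3, 2, 4, 0, 6),
  d  = c(0.5, 1.5, 1, 1, 2, 4)
)

# ave() returns the group mean repeated on every row of the group
vars <- c("x1", "d")
for (v in vars) {
  toy[[paste0("m_", v)]] <- ave(toy[[v]], toy$id, FUN = mean)
}
print(toy)
```

Each `m_` column is constant within a unit, which is the property the CRE approach relies on.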

## **Estimation and inference with DML for CRE**

This section sets up the DML data and estimation environments and then proceeds with the actual estimation.

### **3. Set up DML data environment**
Initialization of `dml_cre_data` from a `data.frame`. Arguments to pass:

```
dml_cre_data_from_data_frame(data,
                  x_cols = NULL,
                  y_col = NULL,
                  d_cols = NULL,
                  xbar_cols = NULL,
                  dbar_cols = NULL,
                  cluster_cols = NULL
                  )

```              

In [None]:
# 3. Set up DML data environment
x_cols <- paste0("x", 1:30)
xbar_cols <- paste0("m_x", 1:30)

# set up data for DML procedure
obj_dml_data = dml_cre_data_from_data_frame(df,
                            x_cols = x_cols,  y_col = "y", d_cols = "d",
                            xbar_cols = xbar_cols, dbar_cols = "m_d",
                            cluster_cols = "id")
obj_dml_data$print()



------------------ Data summary ------------------
Outcome variable: y
Treatment variable(s): d
Cluster variable(s): id
Covariates: x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14, x15, x16, x17, x18, x19, x20, x21, x22, x23, x24, x25, x26, x27, x28, x29, x30
Mean covariates: m_x1, m_x2, m_x3, m_x4, m_x5, m_x6, m_x7, m_x8, m_x9, m_x10, m_x11, m_x12, m_x13, m_x14, m_x15, m_x16, m_x17, m_x18, m_x19, m_x20, m_x21, m_x22, m_x23, m_x24, m_x25, m_x26, m_x27, m_x28, m_x29, m_x30, m_d
No. Observations: 2500
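
Before building the data object it can help to confirm the panel dimensions. The sketch below uses a small stand-in panel rather than `df`; the column names `id` and `time` match those used in this tutorial, everything else is hypothetical.

```r
# Toy balanced panel standing in for df: N = 4 units, T = 3 periods
panel <- data.frame(id = rep(1:4, each = 3), time = rep(1:3, times = 4))

obs_per_unit <- as.vector(table(panel$id))   # number of rows for each unit
n_units      <- length(obs_per_unit)         # N
is_balanced  <- length(unique(obs_per_unit)) == 1

cat("Units:", n_units, "| balanced:", is_balanced, "| rows:", nrow(panel), "\n")
```

Applied to the tutorial data, one would expect 250 units observed over 10 periods, consistent with the 2500 observations reported in the summary.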


### **4. Set up DML estimation environment**

Arguments to pass to `dml_cre_plr$new()`, which creates a new instance of this R6 class.

```
 dml_cre_plr$new(data,
      ml_l,
      ml_m,
      ml_g = NULL,
      ml_lbar = NULL,
      ml_mbar = NULL,
      ml_gbar = NULL,
      n_folds = 5,
      n_rep = 1,
      score = "orth-PO",                 # or "orth-IV"
      dml_procedure = "dml2",            # or "dml1"
      dml_approach  = "cre",             # or "hybrid"
      dml_type      = "non-separable",   # or "separable"
      dml_transform = "wg",              # or "fd"
      draw_sample_splitting = TRUE,
      apply_cross_fitting = TRUE
      )

```
We use four base learners: regression tree, random forest, gradient boosting, and Lasso with a dictionary of nonlinear terms.
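
To make the roles of `n_folds`, `score = "orth-PO"`, and `dml_procedure = "dml2"` concrete, the following base R sketch runs cross-fitted partialling-out on simulated data. It is an illustration under strong simplifications, not the internal implementation of `XTDML`: `lm()` stands in for the ML learners, no individual means are included, and all variable names and data-generating values are hypothetical.

```r
set.seed(1408)

# Simulated toy panel (all values hypothetical): N units, T periods, true theta = 0.5
N <- 100; T_len <- 5; theta <- 0.5
id <- rep(seq_len(N), each = T_len)
x  <- rnorm(N * T_len)
d  <- 0.8 * x + rnorm(N * T_len)
y  <- theta * d + 1.2 * x + rnorm(N * T_len)
toy <- data.frame(id, x, d, y)

# Assign whole units (clusters) to K folds so that a unit is never split across folds
K <- 5
unit_fold <- sample(rep(seq_len(K), length.out = N))
fold <- unit_fold[toy$id]

# Cross-fitting: fit l(x) = E[y|x] and m(x) = E[d|x] on the other folds,
# then compute residuals on the held-out fold
res_y <- res_d <- numeric(nrow(toy))
for (k in seq_len(K)) {
  train <- toy[fold != k, ]
  test  <- toy[fold == k, ]
  fit_l <- lm(y ~ x, data = train)   # stand-in for ml_l
  fit_m <- lm(d ~ x, data = train)   # stand-in for ml_m
  res_y[fold == k] <- test$y - predict(fit_l, newdata = test)
  res_d[fold == k] <- test$d - predict(fit_m, newdata = test)
}

# Partialling-out estimate: solve sum((res_y - theta * res_d) * res_d) = 0
# over the pooled sample (the "dml2" way of aggregating folds)
theta_hat <- sum(res_d * res_y) / sum(res_d^2)
print(theta_hat)
```

Splitting by unit rather than by row keeps all observations of a unit in the same fold, so the nuisance fits are never evaluated on a unit they were trained on.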



**4.1 CART for learning nuisance parameters**

In [None]:
# 4. Set up DML estimation environment
set.seed(1408)
learner = lrn("regr.rpart")
ml_l = learner$clone()
ml_m = learner$clone()

dml_rpart = dml_cre_plr$new(obj_dml_data, ml_l = ml_l, ml_m = ml_m)

# set up a list of parameter grids
param_grid = list("ml_l" = ps(cp = p_dbl(lower = 0.001, upper = 0.02),
                              maxdepth = p_int(lower = 2, upper = 10)),
                  "ml_m" = ps(cp = p_dbl(lower = 0.001, upper = 0.02),
                              maxdepth = p_int(lower = 2, upper = 10)))

tune_settings = list(terminator = mlr3tuning::trm("evals", n_evals = 10),
                      algorithm = tnr("grid_search"), resolution = 20)

dml_rpart$tune(param_set = param_grid, tune_settings = tune_settings)

# Estimate target/causal parameter
dml_rpart$fit()
dml_rpart$print()
print(dml_rpart$params)

TuningInstanceSingleCrit is deprecated. Use TuningInstanceBatchSingleCrit instead.

TuningInstanceSingleCrit is deprecated. Use TuningInstanceBatchSingleCrit instead.






------------------ Data summary ------------------
Outcome variable: y
Treatment variable: d
Covariates: x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14, x15, x16, x17, x18, x19, x20, x21, x22, x23, x24, x25, x26, x27, x28, x29, x30
Means of covariates: m_x1, m_x2, m_x3, m_x4, m_x5, m_x6, m_x7, m_x8, m_x9, m_x10, m_x11, m_x12, m_x13, m_x14, m_x15, m_x16, m_x17, m_x18, m_x19, m_x20, m_x21, m_x22, m_x23, m_x24, m_x25, m_x26, m_x27, m_x28, m_x29, m_x30, m_d
Cluster variables: id
No. Observations: 2500
No. Groups: 250

------------------ Score & algorithm ------------------
Score function: orth-PO
DML algorithm: dml2
DML approach: cre
DML approach type: non-separable

------------------ Machine learner ------------------
Learner of nuisance ml_l: regr.rpart
RMSE of nuisance ml_l : 7.36675
Learner of nuisance ml_m: regr.rpart
RMSE of nuisance ml_m : 6.42942
Model RMSE: 22.27747

------------------ Resampling ------------------
No. folds: 5
No. folds per cluster: 5
No. repeate
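
The tuning settings in 4.1 combine a grid with `resolution = 20` and a budget of `n_evals = 10`. The base R sketch below mimics that combination without calling `mlr3tuning`. It assumes an equally spaced grid per parameter with integer values rounded and deduplicated, and a random evaluation order; the exact grid construction inside `paradox` may differ.

```r
# Candidate values per hyperparameter (ranges taken from param_grid in 4.1)
resolution <- 20
cp_grid       <- seq(0.001, 0.02, length.out = resolution)
maxdepth_grid <- unique(round(seq(2, 10, length.out = resolution)))

# Cartesian product of the candidates
grid <- expand.grid(cp = cp_grid, maxdepth = maxdepth_grid)

# An "evals" budget of 10 means only 10 grid points get evaluated
set.seed(1408)
n_evals <- 10
evaluated <- grid[sample(nrow(grid), n_evals), ]

cat("Grid points:", nrow(grid), "| evaluated:", nrow(evaluated), "\n")
```

In this sketch the grid holds 180 combinations, so a budget of 10 evaluations explores only a small part of it. Raising `n_evals` makes the search more thorough at a higher computational cost.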

**4.2 Random forest for learning nuisance parameters**

In [None]:
# RF
set.seed(1408)

vars = c(x_cols,xbar_cols)
K = length(vars)

learner = lrn("regr.ranger",  mtry = K, num.trees = 100)  # better 1000 but requires more computational time
ml_l = learner$clone()
learner = lrn("regr.ranger",  mtry = K+1, num.trees =  100)
ml_m = learner$clone()

# Set up DML environment
dml_rf = dml_cre_plr$new(obj_dml_data,
                          ml_l = ml_l, ml_m = ml_m,
                          score = "orth-PO",
                          dml_procedure = "dml2",
                          dml_approach  = "cre")

# Hyperparameter tuning
param_grid = list("ml_l" = ps(max.depth = p_int(lower = 2, upper = 20)),
                  "ml_m" = ps(max.depth = p_int(lower = 2, upper = 20)))

tune_settings = list(terminator = mlr3tuning::trm("evals", n_evals = 10),
                      algorithm = tnr("grid_search"), resolution = 20)

dml_rf$tune(param_set = param_grid, tune_settings = tune_settings)

# Estimate target/causal parameter
dml_rf$fit()
dml_rf$print()
print(dml_rf$params)

TuningInstanceSingleCrit is deprecated. Use TuningInstanceBatchSingleCrit instead.

TuningInstanceSingleCrit is deprecated. Use TuningInstanceBatchSingleCrit instead.






------------------ Data summary ------------------
Outcome variable: y
Treatment variable: d
Covariates: x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14, x15, x16, x17, x18, x19, x20, x21, x22, x23, x24, x25, x26, x27, x28, x29, x30
Means of covariates: m_x1, m_x2, m_x3, m_x4, m_x5, m_x6, m_x7, m_x8, m_x9, m_x10, m_x11, m_x12, m_x13, m_x14, m_x15, m_x16, m_x17, m_x18, m_x19, m_x20, m_x21, m_x22, m_x23, m_x24, m_x25, m_x26, m_x27, m_x28, m_x29, m_x30, m_d
Cluster variables: id
No. Observations: 2500
No. Groups: 250

------------------ Score & algorithm ------------------
Score function: orth-PO
DML algorithm: dml2
DML approach: cre
DML approach type: non-separable

------------------ Machine learner ------------------
Learner of nuisance ml_l: regr.ranger
RMSE of nuisance ml_l : 4.93698
Learner of nuisance ml_m: regr.ranger
RMSE of nuisance ml_m : 3.24057
Model RMSE: 9.98544

------------------ Resampling ------------------
No. folds: 5
No. folds per cluster: 5
No. repeat

**4.3 Gradient boosting for learning nuisance parameters**

In [None]:
# XGBOOST
set.seed(1408)

learner = lrn("regr.xgboost", nrounds = 100) # better 1000 but requires more computational time
ml_m = learner$clone()
ml_l = learner$clone()

dml_xgboost = dml_cre_plr$new(obj_dml_data,
                          ml_l = ml_l, ml_m = ml_m,
                          score = "orth-PO",
                          dml_procedure = "dml2",
                          dml_approach  = "cre")

# Hyperparameter tuning
param_grid = list("ml_l" = ps(max_depth = p_int(lower = 2, upper = 10),
                              lambda = p_dbl(lower = 0, upper = 2)),
                  "ml_m" = ps(max_depth = p_int(lower = 2, upper = 10),
                              lambda = p_dbl(lower = 0, upper = 2)))

tune_settings = list(terminator = mlr3tuning::trm("evals", n_evals = 10),
                      algorithm = tnr("grid_search"), resolution = 20)

dml_xgboost$tune(param_set = param_grid, tune_settings = tune_settings)

# Estimate target/causal parameter
dml_xgboost$fit()
dml_xgboost$print()
print(dml_xgboost$params)

TuningInstanceSingleCrit is deprecated. Use TuningInstanceBatchSingleCrit instead.

TuningInstanceSingleCrit is deprecated. Use TuningInstanceBatchSingleCrit instead.






------------------ Data summary ------------------
Outcome variable: y
Treatment variable: d
Covariates: x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14, x15, x16, x17, x18, x19, x20, x21, x22, x23, x24, x25, x26, x27, x28, x29, x30
Means of covariates: m_x1, m_x2, m_x3, m_x4, m_x5, m_x6, m_x7, m_x8, m_x9, m_x10, m_x11, m_x12, m_x13, m_x14, m_x15, m_x16, m_x17, m_x18, m_x19, m_x20, m_x21, m_x22, m_x23, m_x24, m_x25, m_x26, m_x27, m_x28, m_x29, m_x30, m_d
Cluster variables: id
No. Observations: 2500
No. Groups: 250

------------------ Score & algorithm ------------------
Score function: orth-PO
DML algorithm: dml2
DML approach: cre
DML approach type: non-separable

------------------ Machine learner ------------------
Learner of nuisance ml_l: regr.xgboost
RMSE of nuisance ml_l : 5.32430
Learner of nuisance ml_m: regr.xgboost
RMSE of nuisance ml_m : 3.52113
Model RMSE: 11.82180

------------------ Resampling ------------------
No. folds: 5
No. folds per cluster: 5
No. rep

**4.4 LASSO for learning nuisance parameters**

In [None]:
## __________________________________________________________________________
## Polynomial expansion (for Lasso with extensive dictionary)
## __________________________________________________________________________
polyexp = function(df){
  df.polyexp = df
  colnames = colnames(df)
  for (i in 1:ncol(df)){
    for (j in i:ncol(df)){

      colnames = c(colnames, paste0(names(df)[i],'.',names(df)[j]))
      df.polyexp = cbind(df.polyexp, df[,i]*df[,j])
    }
    colnames = c(colnames,paste0(names(df)[i],'.',names(df)[i],'.',names(df)[i]))
    df.polyexp = cbind(df.polyexp, df[,i]*df[,i]*df[,i])

  }

  names(df.polyexp) = colnames
  return(df.polyexp)
}
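
To see how large the dictionary becomes, the sketch below reproduces only the column names that `polyexp()` generates and counts the terms. The helpers `polyexp_names` and `n_terms` are hypothetical and exist only for this illustration.

```r
# Hypothetical helper: mirrors the loop order of polyexp() but builds names only
polyexp_names <- function(vars) {
  out <- vars
  for (i in seq_along(vars)) {
    for (j in i:length(vars)) {
      out <- c(out, paste0(vars[i], ".", vars[j]))             # squares and interactions
    }
    out <- c(out, paste0(vars[i], ".", vars[i], ".", vars[i])) # cubic term
  }
  out
}

print(polyexp_names(c("a", "b")))

# Number of terms for p base variables: p levels + p(p+1)/2 products + p cubes
n_terms <- function(p) p + p * (p + 1) / 2 + p
print(n_terms(30))
```

With 30 covariates this gives 525 columns, and the same number again for their individual means, so Lasso's variable selection does most of the work in this specification.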

In [None]:
install.packages("tidytable")
library(tidytable)

Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)

This can lead to most dplyr functions being overwritten by tidytable functions.


Attaching package: ‘tidytable’


The following object is masked from ‘package:MASS’:

    select


The following objects are masked from ‘package:mlr3misc’:

    cross_join, enframe, map, map_chr, map_dbl, map_int, map_lgl, pmap,
    pmap_chr, pmap_dbl, pmap_int, pmap_lgl, unnest, walk


The following object is masked from ‘package:xgboost’:

    slice


The following objects are masked from ‘package:data.table’:

    %notin%, between, first, fread, last


The following objects are masked from ‘package:tibble’:

    enframe, tribble


The following objects are masked from ‘package:dplyr’:

    across, add_count, add_tally, anti_join, arrange, between,
    bind_cols, bind_rows, c_across, case_match, case_when, coalesce,
    consecutive_id, count, cross_join, cume_dist, cur_column, cur_data,
    cur_group_id, cur_group_rows, 

In [None]:
# Create a dictionary of nonlinear terms
xlist <- paste0("x", 1:30)
m_xlist <- paste0("m_x", 1:30)

##lasso-augment
dta_x = as.data.frame(select(df, xlist))
dta_xbar = as.data.frame(select(df, m_xlist))
dta2_x = polyexp(dta_x)
dta2_xbar = polyexp(dta_xbar)
aa = as.data.frame(c(dta2_x,dta2_xbar))

aa$y = df$y
aa$id= df$id
aa$time = df$time
aa$d = df$d
aa$m_d = df$m_d

df2 <- aa %>% select(c(id,time,y,d,m_d), everything())

# Replace NAs with 0s
df2[is.na(df2)] <- 0

x_cols = names(select(df2, starts_with("x")))
xbar_cols = names(select(df2, starts_with("m_x")))


“Using an external vector in selections was deprecated in tidyselect 1.1.0.
ℹ Please use `all_of()` or `any_of()` instead.
  # Was:
  data %>% select(xlist)

  # Now:
  data %>% select(all_of(xlist))

See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.”
“Using an external vector in selections was deprecated in tidyselect 1.1.0.
ℹ Please use `all_of()` or `any_of()` instead.
  # Was:
  data %>% select(m_xlist)

  # Now:
  data %>% select(all_of(m_xlist))

See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.”


In [None]:
library(glmnet)

Loading required package: Matrix


Attaching package: ‘Matrix’


The following object is masked from ‘package:tidytable’:

    expand


Loaded glmnet 4.1-8



In [None]:
# Dimensions of the augmented dataset for Lasso
dim(df2)

In [None]:
set.seed(1408)

# set up data for DML procedure
obj_dml_data_lasso = dml_cre_data_from_data_frame(df2,
                            x_cols = x_cols,  y_col = "y", d_cols = "d",
                            xbar_cols = xbar_cols, dbar_cols = "m_d",
                            cluster_cols = "id")
# Choose CV-LASSO
learner = lrn("regr.cv_glmnet", s="lambda.min")
ml_m = learner$clone()
ml_l = learner$clone()

# Initialize DML
dml_lasso = dml_cre_plr$new(obj_dml_data_lasso,
                          ml_l = ml_l, ml_m = ml_m)

# Estimate target/causal parameter
dml_lasso$fit()
dml_lasso$print()
print(dml_lasso$params)

No parameters provided for learners. Default values are used.

No parameters provided for learners. Default values are used.






------------------ Data summary ------------------
Outcome variable: y
Treatment variable: d
Covariates: x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14, x15, x16, x17, x18, x19, x20, x21, x22, x23, x24, x25, x26, x27, x28, x29, x30, x1.x1, x1.x2, x1.x3, x1.x4, x1.x5, x1.x6, x1.x7, x1.x8, x1.x9, x1.x10, x1.x11, x1.x12, x1.x13, x1.x14, x1.x15, x1.x16, x1.x17, x1.x18, x1.x19, x1.x20, x1.x21, x1.x22, x1.x23, x1.x24, x1.x25, x1.x26, x1.x27, x1.x28, x1.x29, x1.x30, x1.x1.x1, x2.x2, x2.x3, x2.x4, x2.x5, x2.x6, x2.x7, x2.x8, x2.x9, x2.x10, x2.x11, x2.x12, x2.x13, x2.x14, x2.x15, x2.x16, x2.x17, x2.x18, x2.x19, x2.x20, x2.x21, x2.x22, x2.x23, x2.x24, x2.x25, x2.x26, x2.x27, x2.x28, x2.x29, x2.x30, x2.x2.x2, x3.x3, x3.x4, x3.x5, x3.x6, x3.x7, x3.x8, x3.x9, x3.x10, x3.x11, x3.x12, x3.x13, x3.x14, x3.x15, x3.x16, x3.x17, x3.x18, x3.x19, x3.x20, x3.x21, x3.x22, x3.x23, x3.x24, x3.x25, x3.x26, x3.x27, x3.x28, x3.x29, x3.x30, x3.x3.x3, x4.x4, x4.x5, x4.x6, x4.x7, x4.x8, x4.x9, x4.x10,

### **5. Extract DML estimates and compare them**

In [None]:
# 5. Display a table that compares the results
library(xtable)

table = matrix(0, 4, 6)
table[1,] = cbind(dml_rpart$coef_theta,dml_rpart$se_theta,dml_rpart$pval_theta,dml_rpart$model_rmse,as.numeric(dml_rpart$rmses["ml_l"]),as.numeric(dml_rpart$rmses["ml_m"]))
table[2,] = cbind(dml_rf$coef_theta,dml_rf$se_theta,dml_rf$pval_theta,dml_rf$model_rmse,as.numeric(dml_rf$rmses["ml_l"]),as.numeric(dml_rf$rmses["ml_m"]))
table[3,] = cbind(dml_xgboost$coef_theta, dml_xgboost$se_theta, dml_xgboost$pval_theta,dml_xgboost$model_rmse,as.numeric(dml_xgboost$rmses["ml_l"]),as.numeric(dml_xgboost$rmses["ml_m"]))
table[4,] = cbind(dml_lasso$coef_theta, dml_lasso$se_theta, dml_lasso$pval_theta,dml_lasso$model_rmse,as.numeric(dml_lasso$rmses["ml_l"]),as.numeric(dml_lasso$rmses["ml_m"]))

colnames(table)= c("Estimate", "Std. Error", "P-value", "Model RMSE", "RMSE of l", "RMSE of m")
rownames(table)= c("DML-CART", "DML-RF" , "DML-XGBOOST", "DML-LASSO")
tab = xtable(table)
tab

| | Estimate | Std. Error | P-value | Model RMSE | RMSE of l | RMSE of m |
|---|---|---|---|---|---|---|
| DML-CART | 0.4481896 | 0.04535392 | 4.98038e-23 | 22.277467 | 7.366749 | 6.429422 |
| DML-RF | 1.1036854 | 0.07120158 | 3.423019e-54 | 9.985436 | 4.936983 | 3.240568 |
| DML-XGBOOST | 0.7686004 | 0.08556428 | 2.641436e-19 | 11.821798 | 5.324304 | 3.521133 |
| DML-LASSO | 0.4846126 | 0.08187007 | 3.233369e-09 | 7.21744 | 2.670172 | 1.576719 |
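
The estimates and standard errors in the table can be turned into 95% confidence intervals under a normal approximation. The sketch below does this in base R with the values copied from the table. The normal approximation is an assumption made here, and the resulting p-values are meant to be compared with the reported column rather than taken as the package's own computation.

```r
# Estimates and standard errors copied from the table above
res <- data.frame(
  learner  = c("DML-CART", "DML-RF", "DML-XGBOOST", "DML-LASSO"),
  estimate = c(0.4481896, 1.1036854, 0.7686004, 0.4846126),
  se       = c(0.04535392, 0.07120158, 0.08556428, 0.08187007)
)

z <- qnorm(0.975)                                        # two-sided 95% critical value
res$ci_lower <- res$estimate - z * res$se
res$ci_upper <- res$estimate + z * res$se
res$p_normal <- 2 * pnorm(-abs(res$estimate / res$se))   # normal-approximation p-value

print(res, digits = 4)
```

Comparing the intervals across learners shows how much the point estimate depends on the choice of learner, which is worth reading alongside the RMSE columns.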
