# R: DoubleML for Difference-in-Differences

In this example, we demonstrate how `DoubleML` can be used in combination with the [`did` package for R](https://bcallaway11.github.io/did/index.html) to estimate group-time average treatment effects in difference-in-differences (DiD) models with multiple periods.

In [None]:
library(DoubleML)
library(did)
library(mlr3learners)

set.seed(1234)
options(warn=-1)

## Demo example from `did`

We demonstrate the use of `DoubleML` for DiD using the [introductory example](https://bcallaway11.github.io/did/articles/did-basics.html) of the `did` package.

In [None]:
# Generate data, original code available at https://github.com/bcallaway11/did/blob/master/vignettes/did-basics.Rmd
# generate dataset with 4 time periods
time.periods <- 4
sp <- reset.sim()
sp$te <- 0

set.seed(1814)

# add dynamic effects
sp$te.e <- 1:time.periods

# generate data set with these parameters
# units treated in time period 1 are dropped here, as they do not help us recover the ATT(g,t)'s
dta <- build_sim_dataset(sp)

# how many observations remain after dropping the "always-treated" units
nrow(dta)
# this is what the data looks like
head(dta)

In [None]:
n <- 10000
decision_effect <- -2
instrument_effect <- 0.7

confounder <- rbinom(n, 1, 0.3)
instrument <- rbinom(n, 1, 0.5)
decision <- as.numeric(runif(n) <= instrument_effect*instrument + 0.4*confounder)
outcome <- 30 + decision_effect*decision + 10 * confounder + rnorm(n, sd=2)
df <- data.frame(instrument, decision, outcome)

### Comparison to `did` package

By default, estimation in `did` is based on (unpenalized) linear and logistic regression. Let's start with this default model first.

In [None]:
# estimate group-time average treatment effects using the att_gt method
example_attgt <- att_gt(yname = "Y",
                        tname = "period",
                        idname = "id",
                        gname = "G",
                        xformla = ~X,
                        data = dta
                        )

# summarize the results
summary(example_attgt)

### Using ML for DiD: Integrating `DoubleML` in `did`

As described in the [section on DiD models in the user guide](https://docs.doubleml.org/stable/guide/models.html#difference-in-differences-models-did), [Sant'Anna and Zhao (2020)](https://linkinghub.elsevier.com/retrieve/pii/S0304407620301901) developed a doubly robust DiD model that is compatible with ML-based estimation. Because `did` uses this doubly robust model internally, `DoubleML` can be used here to obtain valid point estimates and confidence intervals. To do so, we write a wrapper around a `DoubleMLIRM` model and pass it to `did` as a custom estimation approach. Once this is implemented, all features of the `did` package remain available.
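To make the interface concrete, here is a minimal base-R sketch of a custom estimator with the same input and output shape as the wrapper in this notebook: it takes `y1`, `y0`, `D` and `covariates` and returns a list with `ATT` and `att.inf.func`. The function name `did_unconditional`, the simulated data and the `...` argument (meant to absorb any additional arguments `did` might pass) are illustrative assumptions. The estimator deliberately ignores covariates, so it is not the doubly robust estimator discussed above.

```r
# Hypothetical custom estimator: unconditional DiD (difference in mean outcome changes).
# Covariates are ignored; the aim is only to show the input/output shape.
did_unconditional <- function(y1, y0, D, covariates, ...) {
  delta_y <- y1 - y0                 # change in outcomes per unit
  p <- mean(D)                       # share of treated units
  mu1 <- mean(delta_y[D == 1])       # mean change, treated units
  mu0 <- mean(delta_y[D == 0])       # mean change, comparison units
  att <- mu1 - mu0
  # influence function of the difference in means, one value per unit
  inf.func <- D / p * (delta_y - mu1) - (1 - D) / (1 - p) * (delta_y - mu0)
  list(ATT = att, att.inf.func = inf.func)
}

# Small simulated check (all numbers are made up for illustration)
set.seed(42)
n_sim <- 500
D_sim <- rbinom(n_sim, 1, 0.4)
y0_sim <- rnorm(n_sim)
# common trend of 1 plus an effect of 2 for treated units
y1_sim <- y0_sim + 1 + 2 * D_sim + rnorm(n_sim)
res <- did_unconditional(y1_sim, y0_sim, D_sim, covariates = NULL)

res$ATT                                # point estimate
sd(res$att.inf.func) / sqrt(n_sim)     # standard error implied by the influence function
```

A function of this shape is what gets handed to `att_gt()` through its `est_method` argument in place of the default estimator.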

In [None]:
# DoubleML wrapper for did

doubleml_did <- function(y1, y0, D, covariates, ml_g, ml_m, n_folds = 10, ...) {
  # `...` absorbs any further arguments that `did` may pass to a custom estimator
  # Compute difference in outcomes
  delta_y <- y1 - y0
  # Prepare data backend (covariates are converted to a matrix first)
  dml_data <- double_ml_data_from_matrix(X = as.matrix(covariates), y = delta_y, d = D)
  # Compute the ATT
  dml_obj <- DoubleMLIRM$new(dml_data, ml_g = ml_g, ml_m = ml_m, score = "ATTE", n_folds = n_folds)
  dml_obj$fit()
  att <- dml_obj$coef[1]
  # Return results
  # psi is stored as an array; take the first repetition and treatment
  # so that the values are passed on as a plain vector
  inf.func <- dml_obj$psi[, 1, 1]
  output <- list(ATT = att, att.inf.func = inf.func)
  return(output)
}
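
The influence function has to reach `did` as a plain numeric vector with one entry per unit. The sketch below uses a mock array to show the reshaping step, assuming that `psi` is stored as a three-dimensional array with dimensions observations x repetitions x treatments. The object `psi_mock` and its values are purely illustrative and do not come from a fitted model.

```r
# Mock stand-in for the score array (assumed layout: n_obs x n_rep x n_treat)
set.seed(1)
n_obs <- 6
psi_mock <- array(rnorm(n_obs), dim = c(n_obs, 1, 1))

# Select the first repetition and the first treatment; R drops the
# singleton dimensions, leaving a plain numeric vector
inf.func <- psi_mock[, 1, 1]

dim(psi_mock)            # dimensions of the array
is.null(dim(inf.func))   # a plain vector has no dim attribute
length(inf.func)         # one entry per observation
```

With more than one repetition, a single slice would only reflect one of the fits, so this shortcut assumes a single repetition.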

In [None]:
# get data in a way that is used internally in 

# double check how covariates are passed through and handled internally