# Causal Effect Estimation

### Generate data

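We start by generating $n = 200$ i.i.d. observations from the following structural causal model (SCM), where $N_W, N_X, N_Y, N_Z$ are independent standard normal noise variables:
$$
\begin{aligned}
V &\sim \text{Bernoulli}(0.2),\\
W &:= 3V + N_W,\\
X &:= V + N_X,\\
Y &:= X + W^2 + 1 + N_Y,\\
Z &:= X + Y + N_Z.
\end{aligned}
$$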

In [None]:
set.seed(1)

generate_data <- function(){
    n <- 200
    V <- rbinom(n, 1, 0.2)
    W <- 3*V + rnorm(n)
    X <- V + rnorm(n)
    Y <- X + W^2 + 1 + rnorm(n)
    Z <- X + Y + rnorm(n)
    data.obs <- data.frame(V=V, W=W, X=X, Y=Y, Z=Z)
    return(data.obs)
}

data.obs <- generate_data()

# Visualize data set
pairs(data.obs)

Assume now that we know the causal ordering induced by the SCM and that
* $X$ is a treatment variable,
* $Y$ is the response and
* $(V, W, Z)$ are additional covariates.

Furthermore, we assume a partially linear outcome model, i.e.,
$$Y = \theta X + g(V, W) + \epsilon\quad \text{with}\quad\mathbb{E}[\epsilon\mid X, V, W]=0.$$

We are interested in estimating the causal effect of $X$ on $Y$, which corresponds to the parameter $\theta$ in the partially linear model. For the SCM above, $\theta = 1$ and $g(V, W) = W^2 + 1$.
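As a sanity check on the value of $\theta$, the SCM can be simulated under an intervention on $X$. The sketch below uses a hypothetical helper `simulate_do_x` (not part of the original notebook) that re-implements the structural equations with $X$ replaced by a constant. Both interventions reuse the same random seed, so the noise draws coincide and the difference in mean responses isolates the effect of $X$, which should equal $\theta = 1$.

```r
# Hypothetical helper: draw from the SCM under the intervention do(X = x_value)
simulate_do_x <- function(x_value, n = 100000, seed = 42) {
  set.seed(seed)                 # same seed => identical noise draws for every x_value
  V <- rbinom(n, 1, 0.2)
  W <- 3*V + rnorm(n)
  X <- rep(x_value, n)           # X is set externally and no longer depends on V
  Y <- X + W^2 + 1 + rnorm(n)
  mean(Y)                        # Monte Carlo estimate of E[Y | do(X = x_value)]
}

# Average causal effect of increasing X from 0 to 1
effect <- simulate_do_x(1) - simulate_do_x(0)
print(effect)
```

Because the treatment enters the structural equation for $Y$ linearly, any unit shift of $X$ gives the same answer.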

### Confounding and selection bias

Ignoring the causal structure can lead to wrong conclusions. In the following exercise, we will see the two most important types of bias that may occur:
* **Confounding bias:** Bias arising because of unaccounted variables that have an effect on both treatment and response.
* **Selection bias:** Bias arising from conditioning on descendants of the response. This can occur either if we only observe a subset of the entire sample or if we mistakenly include a descendant of the response in the outcome model.
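Both kinds of bias can be reproduced in a few lines of base R. The sketch below uses two hypothetical toy SCMs (separate from the one above), each with a true effect of $X$ on $Y$ equal to 1. In the first, a variable $U$ affects both $X$ and $Y$; in the second, $S$ is a child of $Y$. With a large sample, omitting $U$ roughly doubles the estimate, while adding $S$ to the regression roughly halves it.

```r
# Hypothetical toy SCMs illustrating confounding and selection bias
bias_demo <- function(n = 100000) {
  # Confounding: U -> X, U -> Y, X -> Y (true effect of X on Y is 1)
  U <- rnorm(n)
  X <- U + rnorm(n)
  Y <- X + 2*U + rnorm(n)
  confounded <- coef(lm(Y ~ X))[["X"]]       # U omitted: biased upwards
  adjusted   <- coef(lm(Y ~ X + U))[["X"]]   # U adjusted for: unbiased

  # Selection: X -> Y -> S (true effect 1); S is a descendant of the response
  X <- rnorm(n)
  Y <- X + rnorm(n)
  S <- Y + rnorm(n)
  without_S <- coef(lm(Y ~ X))[["X"]]        # unbiased
  with_S    <- coef(lm(Y ~ X + S))[["X"]]    # conditioning on S: biased towards 0
  c(confounded = confounded, adjusted = adjusted,
    without_S = without_S, with_S = with_S)
}

set.seed(1)
bias_estimates <- bias_demo()
round(bias_estimates, 2)
```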

#### Exercise 1

**(a)** Below, we fit several different outcome models. Compare the resulting coefficients for $X$. Which regressions appear to lead to unbiased estimates of the causal effect?

In [None]:
library(gam)

# linear model of Y on X
lin_YX <- lm(Y ~ X, data=data.obs)
# linear model of Y on X and V
lin_YV <- lm(Y ~ X + V, data=data.obs)
# linear model Y on X and W
lin_YW <- lm(Y ~ X + W, data=data.obs)
# gam model of Y on X and s(W)
gam_YW <- gam(Y ~ X + s(W), data=data.obs)
# gam model of Y on X, V and s(W)
gam_YVW <- gam(Y ~ X + V + s(W), data=data.obs)
# gam model of Y on X, V, s(W), s(Z)
gam_YVWZ <- gam(Y ~ X + V + s(W) + s(Z), data=data.obs)

# Collect and print the estimated coefficient of X from each model
results = list(linear_X = unname(coefficients(lin_YX)['X']),
               linear_V = unname(coefficients(lin_YV)['X']),
               linear_W = unname(coefficients(lin_YW)['X']),
               gam_W = unname(coefficients(gam_YW)['X']),
               gam_VW = unname(coefficients(gam_YVW)['X']),
               gam_VWZ = unname(coefficients(gam_YVWZ)['X']))
results

**(b)** List all valid adjustment sets for this causal structure.

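The valid adjustment sets are $\{V\}$, $\{W\}$ and $\{V, W\}$. Each of them blocks the only backdoor path $X \leftarrow V \rightarrow W \rightarrow Y$, and none of them contains $Z$, which is a descendant of $X$ (and of $Y$).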
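The answer can be sanity-checked by simulation. The sketch below uses a hypothetical helper `simulate_scm`, which re-implements the SCM from `generate_data()` with a larger sample size, and regresses $Y$ on $X$ plus a flexible model of each candidate set. Modelling the covariates with `factor(V)` and a cubic polynomial `poly(W, 3)` is an assumption chosen so that the true $g(V, W) = W^2 + 1$ is representable. Only the valid sets should recover $\theta = 1$.

```r
# Hypothetical helper: same SCM as generate_data(), with a configurable sample size
simulate_scm <- function(n) {
  V <- rbinom(n, 1, 0.2)
  W <- 3*V + rnorm(n)
  X <- V + rnorm(n)
  Y <- X + W^2 + 1 + rnorm(n)
  Z <- X + Y + rnorm(n)
  data.frame(V = V, W = W, X = X, Y = Y, Z = Z)
}

set.seed(123)
big <- simulate_scm(100000)

# One outcome regression per candidate adjustment set
candidate_fits <- list(
  none = lm(Y ~ X, data = big),
  V    = lm(Y ~ X + factor(V), data = big),
  W    = lm(Y ~ X + poly(W, 3), data = big),
  VW   = lm(Y ~ X + factor(V) + poly(W, 3), data = big),
  VWZ  = lm(Y ~ X + factor(V) + poly(W, 3) + Z, data = big)
)

# Coefficient of X for each candidate set
adjustment_check <- sapply(candidate_fits, function(fit) coef(fit)[["X"]])
round(adjustment_check, 2)
```

Note that the linear model `lm(Y ~ X + W)` from part (a) uses a valid adjustment set but a misspecified functional form for $W$, so it need not be unbiased.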

**(c)** Assume now that you only have access to the subset $\texttt{data.cond}$ constructed below. Use a gam regression Y ~ X + s(W) to estimate the causal effect. What do you observe?

In [None]:
# Keep only the observations with Z < 1 (selection on a descendant of Y)
data.cond <- data.obs[data.obs$Z < 1, ]

In [None]:
# Fit outcome model
gam_YW <- gam(Y ~ X + s(W), data=data.cond)
print(unname(coefficients(gam_YW)['X']))

Since the data set $\texttt{data.cond}$ was constructed by conditioning on $Z$, a descendant of the response (in fact a common child of $X$ and $Y$), the causal effect estimate suffers from selection bias, even though $\{W\}$ is a valid adjustment set in the full population.

### Outcome model estimation (aka Adjustment)

In [None]:
library(gam)

In [None]:
# Fit outcome model
gamfit <- gam(Y ~ X + s(W), data=data.obs)
summary(gamfit)

# Extract estimator
ate_outcome <- unname(coefficients(gamfit)['X'])
coefficients(gamfit)
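In a partially linear model the coefficient of $X$ is itself the causal effect, but the same outcome model can also be used for g-computation (standardization): predict every unit's response with $X$ shifted by one unit and average the change in the predictions. The base-R sketch below re-simulates the SCM with a larger sample and uses `lm()` with a cubic polynomial in $W$ as a stand-in for `s(W)`; both are assumptions made so that it runs without extra packages. For an additive model, the g-computation estimate coincides with the coefficient.

```r
# Re-simulate the SCM with a larger sample (stored in a data frame, no global X/Y)
set.seed(7)
n_sim <- 5000
sim <- data.frame(V = rbinom(n_sim, 1, 0.2))
sim$W <- 3*sim$V + rnorm(n_sim)
sim$X <- sim$V + rnorm(n_sim)
sim$Y <- sim$X + sim$W^2 + 1 + rnorm(n_sim)

# Outcome model with the valid adjustment set {W}; poly(W, 3) stands in for s(W)
outcome_fit <- lm(Y ~ X + poly(W, 3), data = sim)

# g-computation: shift X by +1 for every unit and average the change in predictions
shifted <- transform(sim, X = X + 1)
gcomp_effect <- mean(predict(outcome_fit, newdata = shifted) -
                     predict(outcome_fit, newdata = sim))

c(coefficient = coef(outcome_fit)[["X"]], g_computation = gcomp_effect)
```

With interactions between $X$ and the covariates, the two quantities would differ and the g-computation average would be the relevant one.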

### Propensity score matching

In [None]:
library(MatchIt)

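# Create a binary treatment by splitting X at its median
# (more elaborate matching procedures also exist for continuous treatments)
data.matching <- data.obs
T <- as.numeric(data.matching$X > median(data.matching$X))
# Mean of X in the "high" and "low" groups; their difference rescales effects of T to the scale of X
upperT <- mean(data.matching$X[T == 1])
lowerT <- mean(data.matching$X[T == 0])
adjust_factor <- upperT - lowerT
data.matching$T <- T
print(adjust_factor)

# Naive (unadjusted) estimate: difference in mean Y between the groups, rescaled
lmfit <- lm(Y ~ T, data = data.matching)
coefficients(lmfit)['T']/adjust_factor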

In [None]:
# Create a matchit object without matching (method = NULL) to assess the initial imbalance in V
match0 <- matchit(T ~ V, data = data.matching,
                  method = NULL, distance = "glm")
summary(match0)
plot(match0, type="density")

In [None]:
# Nearest-neighbour matching on the propensity score; check whether balance improves
match1 <- matchit(T ~ V, data = data.matching,
                  method = "nearest", distance = "glm")
plot(match1, type="density")

In [None]:
# Coarsened exact matching (CEM), which matches on coarsened covariates rather than
# on the propensity score; check whether balance improves
match2 <- matchit(T ~ V, data = data.matching,
                  method = "cem", distance = "glm")
plot(match2, type="density")
summary(match2)

In [None]:
# Create matched data
data.matched <- match.data(match2)

# Fit outcome model
gamfit_matched <- gam(Y ~ T, data = data.matched, weights=weights)
summary(gamfit_matched)
ate_matching <- unname(coefficients(gamfit_matched)['T'])/adjust_factor
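Conceptually, nearest-neighbour propensity score matching pairs each treated unit with the control unit whose estimated propensity score is closest. The base-R sketch below illustrates this for the median-split treatment used above. It is a simplified stand-in for `matchit()` (1:1 matching with replacement, random tie-breaking, targeting the ATT), it re-simulates the SCM with a larger sample so that it runs on its own, and it rescales by the matched difference in $X$ rather than by `adjust_factor`. Because the propensity score depends only on $V$ here, matching on it amounts to exact matching on $V$.

```r
# Re-simulate the SCM and create the median-split treatment
set.seed(11)
n_sim <- 20000
sim <- data.frame(V = rbinom(n_sim, 1, 0.2))
sim$W <- 3*sim$V + rnorm(n_sim)
sim$X <- sim$V + rnorm(n_sim)
sim$Y <- sim$X + sim$W^2 + 1 + rnorm(n_sim)
sim$T <- as.numeric(sim$X > median(sim$X))

# Propensity score P(T = 1 | V) from a logistic regression
ps <- fitted(glm(T ~ V, family = binomial(), data = sim))

treated  <- which(sim$T == 1)
controls <- which(sim$T == 0)

# For each treated unit, pick a control with the closest propensity score (ties broken at random)
matched_control <- vapply(treated, function(i) {
  ps_gap <- abs(ps[controls] - ps[i])
  candidates <- controls[ps_gap == min(ps_gap)]
  candidates[sample.int(length(candidates), 1)]
}, integer(1))

# ATT of the binary treatment, rescaled by the matched difference in X
att <- mean(sim$Y[treated] - sim$Y[matched_control])
x_gap <- mean(sim$X[treated] - sim$X[matched_control])
theta_matched <- att / x_gap

# Unmatched comparison for reference
theta_naive <- (mean(sim$Y[treated]) - mean(sim$Y[controls])) /
               (mean(sim$X[treated]) - mean(sim$X[controls]))
c(naive = theta_naive, matched = theta_matched)
```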

### Propensity score weighting (aka inverse probability weighting)

In [None]:
library(WeightIt)

In [None]:
weight0 <- weightit(X ~ V + W, data = data.obs, estimand = "ATE", method = "glm")
weight0
summary(weight0)
hist(weight0$weights)
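Besides the histogram, a common one-number summary of how extreme the weights are is Kish's effective sample size, $(\sum_i w_i)^2 / \sum_i w_i^2$, which equals $n$ for constant weights and shrinks as the weights become more unequal; `summary()` of a `weightit` object also reports effective sample sizes. A minimal helper (the name `kish_ess` is hypothetical):

```r
# Kish's effective sample size of a vector of non-negative weights
kish_ess <- function(w) {
  stopifnot(all(w >= 0), sum(w) > 0)
  sum(w)^2 / sum(w^2)
}

kish_ess(rep(1, 200))         # equal weights: ESS equals the sample size (200)
kish_ess(c(10, 1, 1, 1, 1))   # one dominant weight: ESS far below 5
# Applied to the estimated weights: kish_ess(weight0$weights)
```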

In [None]:
# Fit outcome model
gamfit_weighted <- gam(Y ~ X, data = data.obs, weights=weight0$weights)
summary(gamfit_weighted)
ate_weighting <- unname(coefficients(gamfit_weighted)['X'])
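For a continuous treatment, the weights are based on a generalized propensity score, i.e., the conditional density of $X$ given the covariates. The base-R sketch below computes stabilized weights $w_i = \hat f(X_i) / \hat f(X_i \mid V_i, W_i)$, assuming a normal linear model for $X \mid V, W$ and a normal density in the numerator. This mirrors the idea behind `weightit(..., method = "glm")` for continuous treatments but is not WeightIt's exact implementation; it also re-simulates the SCM with a larger sample so that it runs on its own. In the weighted sample, $X$ is independent of $(V, W)$, so a weighted regression of $Y$ on $X$ alone targets $\theta$.

```r
# Re-simulate the SCM with a larger sample
set.seed(5)
n_sim <- 40000
sim <- data.frame(V = rbinom(n_sim, 1, 0.2))
sim$W <- 3*sim$V + rnorm(n_sim)
sim$X <- sim$V + rnorm(n_sim)
sim$Y <- sim$X + sim$W^2 + 1 + rnorm(n_sim)

# Denominator: normal density of X given (V, W) from a linear model
treat_model <- lm(X ~ V + W, data = sim)
dens_cond <- dnorm(sim$X, mean = fitted(treat_model), sd = sigma(treat_model))

# Numerator (stabilization): normal density of X ignoring the covariates
dens_marg <- dnorm(sim$X, mean = mean(sim$X), sd = sd(sim$X))
gps_weights <- dens_marg / dens_cond

# Weighted vs unweighted regression of Y on X
theta_weighted <- coef(lm(Y ~ X, data = sim, weights = gps_weights))[["X"]]
theta_naive    <- coef(lm(Y ~ X, data = sim))[["X"]]
c(naive = theta_naive, weighted = theta_weighted)
```

The numerator only affects efficiency: any function of $X$ alone leaves $X$ independent of the covariates in the weighted sample, whereas a misspecified denominator does not.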

### Double ML

In [None]:
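library(DoubleML)
# DoubleMLData uses all columns other than Y and X as covariates, so remove
# the auxiliary column T (if present) and the descendant Z, keeping only V and W
data.obs$T <- NULL
data.obs$Z <- NULL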

In [None]:
# Format the data (this object encodes the causal structure)
obj_dml_data = DoubleMLData$new(data.obs, y_col = "Y", d_cols = "X")
obj_dml_data

In [None]:
# Initialize the ML learners (using the mlr3 package and its extensions)
library(mlr3)
library(mlr3learners)

# Suppress output during estimation
lgr::get_logger("mlr3")$set_threshold("warn")

# Learner for Y given covariates (V, W)
ml_l = lrn("regr.ranger", num.trees = 100, mtry = 2, min.node.size = 2, max.depth = 5)
#ml_l = lrn("regr.lm")
# Learner for X given covariates (V, W)
ml_m = lrn("regr.ranger", num.trees = 100, mtry = 2, min.node.size = 2, max.depth = 5)
#ml_m = lrn("regr.lm")
# Learner for Y-\theta X given covariates (V, W) - only needed for score=="IV-type"
# ml_g = lrn("regr.ranger", num.trees = 100, mtry = 2, min.node.size = 2, max.depth = 5)

In [None]:
# Setup DML task
doubleml_plr = DoubleMLPLR$new(obj_dml_data,
                               ml_l, ml_m,
                               n_folds = 2,
                               score = "partialling out")

In [None]:
# Fit DML
doubleml_plr$fit()
doubleml_plr$summary()
ate_dml <- unname(doubleml_plr$all_coef[1])
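To see what `DoubleMLPLR` does with `score = "partialling out"`, the base-R sketch below implements the same idea with 2-fold cross-fitting. On each fold, $\ell(V, W) = \mathbb{E}[Y \mid V, W]$ and $m(V, W) = \mathbb{E}[X \mid V, W]$ are learned on the other fold, the residuals of $Y$ and $X$ are formed out-of-fold, and $\theta$ is estimated by regressing the $Y$-residuals on the $X$-residuals. Linear models with `factor(V)` and `poly(W, 3)` replace the random forests purely for illustration, and the SCM is re-simulated so that the block runs on its own; this is a simplified sketch, not the DoubleML implementation.

```r
# Re-simulate the SCM (covariates V and W only)
set.seed(9)
n_sim <- 5000
sim <- data.frame(V = rbinom(n_sim, 1, 0.2))
sim$W <- 3*sim$V + rnorm(n_sim)
sim$X <- sim$V + rnorm(n_sim)
sim$Y <- sim$X + sim$W^2 + 1 + rnorm(n_sim)

# Randomly assign each observation to one of two folds
fold <- sample(rep(1:2, length.out = n_sim))
res_y <- numeric(n_sim)
res_x <- numeric(n_sim)

for (k in 1:2) {
  train_df <- sim[fold != k, ]
  eval_df  <- sim[fold == k, ]
  # Nuisance learners fitted on the other fold only
  l_fit <- lm(Y ~ factor(V) + poly(W, 3), data = train_df)   # l(V, W) = E[Y | V, W]
  m_fit <- lm(X ~ factor(V) + poly(W, 3), data = train_df)   # m(V, W) = E[X | V, W]
  # Out-of-fold residuals
  res_y[fold == k] <- eval_df$Y - predict(l_fit, newdata = eval_df)
  res_x[fold == k] <- eval_df$X - predict(m_fit, newdata = eval_df)
}

# Partialling-out estimate and its standard error from the score psi
theta_dml <- sum(res_x * res_y) / sum(res_x^2)
psi <- (res_y - theta_dml * res_x) * res_x
se_dml <- sqrt(mean(psi^2) / mean(res_x^2)^2 / n_sim)
c(estimate = theta_dml, std_error = se_dml)
```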

### Compare all estimators

In [None]:
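# Collect the estimates next to the true value theta = 1 from the SCM
ate_estimates <- data.frame("True value"=1,
                            "Outcome model"=ate_outcome,
                            "Matching"=ate_matching,
                            "Weighting"=ate_weighting,
                            "DML"=ate_dml,
                            check.names=FALSE)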

In [None]:
print(ate_estimates)