# ----------------------- Econometrics 1 - TD 6 --------------------- #

In [1]:
# load necessary libraries
library(tidyverse)

# clean the environment
rm(list=ls())

# choose a seed to be able to reproduce the results
set.seed(2024)

# set the number of observations to 10,000
n <- 10000

# create a data frame with an index column i; simulated variables are added to it below
data <- data.frame(i = 1:n)


── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


6. Check through simulations (with the parameters µ = (0, 1)′, Σ = identity matrix and
c = 1.5) the value of δ in the two models.

In [2]:


# Q6. Simulations, delta value (ATE) ----------------------------------

## Q6.1. Simulation of potential outcomes ----

# Y(0) and Y(1) are independent normal variables (Sigma = identity matrix)

# Y(0) = outcome if not treated - N(0,1)
data$Y_0 <- rnorm(n, mean = 0, sd = 1)
# Y(1) = outcome if treated - N(1,1)
data$Y_1 <- rnorm(n, mean = 1, sd = 1)

## Q6.2. Average Treatment Effect (ATE) ----

# Delta = Y(1) - Y(0)
data <- data %>% mutate(Delta = Y_1 - Y_0)
summary(data$Delta)

# ATE: delta = E[Y(1) - Y(0)], estimated by the sample mean of Delta
delta <- mean(data$Delta)
cat("delta (ATE) :", round(delta,3), "\n") # 1.004

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -3.812   0.024   1.012   1.004   1.965   6.251 

delta (ATE) : 1.004 
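
Since δ is defined from the potential outcomes alone, it does not depend on how D is assigned, so the same value applies to both models: δ = E[Y(1)] − E[Y(0)] = 1 − 0 = 1. The sketch below is one way to judge whether a simulated value such as 1.004 is compatible with that. It redraws the potential outcomes in base R and attaches a standard error and a 95% confidence interval to the sample mean. The names `delta_hat`, `se_delta` and `ci_95` are illustrative and not part of the original script.

```r
# Sketch: Monte Carlo precision of the simulated ATE (base R only).
set.seed(2024)   # same seed as in the first cell
n <- 10000

# potential outcomes, drawn in the same order as in Q6.1
Y_0 <- rnorm(n, mean = 0, sd = 1)
Y_1 <- rnorm(n, mean = 1, sd = 1)
Delta <- Y_1 - Y_0

# sample mean, its standard error, and a normal-approximation 95% CI
delta_hat <- mean(Delta)
se_delta  <- sd(Delta) / sqrt(n)
ci_95     <- delta_hat + c(-1, 1) * qnorm(0.975) * se_delta

cat("delta_hat      :", round(delta_hat, 3), "\n")
cat("standard error :", round(se_delta, 4), "\n")
cat("95% CI         :", round(ci_95, 3), "\n")
```

With Var(Delta) = 1 + 1 = 2, the theoretical standard error is sqrt(2/n) ≈ 0.014, so a deviation of a few thousandths from 1 is well within sampling noise.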


7. Estimate, through simulations again, the values of δT and β0 in Model 2. If time permits,
check that your simulations are correct by doing analytical computations.

In [5]:
# Q7. Simulations, delta_T (ATT) and beta_D values ------------------

## Q7.1. Definition of treatment and outcome variables ----

# Model 2: treatment is taken when the difference in potential outcomes exceeds a threshold (c = 1.5)

# define treated individuals: D = 1 if Delta > 1.5
data <- data %>% mutate(D = as.numeric(Delta > 1.5))

# compute the share of treated individuals
cat("Share of treated individuals: ", round(mean(data$D) * 100, 1), "%\n", sep="")  # 36.2%

# compute the observed outcome : Y = D*Y(1) + (1-D)*Y(0)
data <- data %>% mutate(Y = D * Y_1 + (1 - D) * Y_0)
summary(data$Y)



## Q7.2. Estimations of treatment effect ----

# (1) delta_T : Average Treatment Effect on Treated (ATT)
delta_T <- mean(data %>% filter(D==1) %>% pull(Delta))
cat("delta_T (ATT) :", round(delta_T,3), "\n")  # 2.483 

# (2) beta_D : difference in average observed outcome between treated and non-treated
# compute the average outcome for the treated and for the non-treated
av_y_treated <- mean(data %>% filter(D==1) %>% pull(Y))
av_y_nontreated <- mean(data %>% filter(D==0) %>% pull(Y))  
cat("Average outcome of treated:", round(av_y_treated,3), "\n") # 1.751
cat("Average outcome of non treated:", round(av_y_nontreated,3), "\n") # 0.405 

# compute the difference between the two groups
beta_D <- av_y_treated - av_y_nontreated
cat("beta_D (outcome difference) :", round(beta_D,3), "\n") # 1.346

# alternative method to compute beta_D: with a regression
# with a binary regressor, the OLS coefficient of D in the regression of Y on D
# (with intercept) equals the difference in group means
reg <- lm(Y ~ D, data=data)
beta_D_reg <- coef(reg)["D"]
cat("beta_D (via regression) :", round(beta_D_reg,3), "\n") # 1.346 

# (3) selection bias B = beta_D - delta_T = E[Y(0)|D=1] - E[Y(0)|D=0]
B <- beta_D - delta_T
cat("Selection bias B :", round(B,3), "\n")  # -1.137

Share of treated individuals: 36.2%


   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-2.3784  0.1175  0.8692  0.8919  1.6475  4.4809 

delta_T (ATT) : 2.483 
Average outcome of treated: 1.751 
Average outcome of non treated: 0.405 
beta_D (outcome difference) : 1.346 
beta_D (via regression) : 1.346 
Selection bias B : -1.137 
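
The analytical check suggested in question 7 can be sketched as follows, assuming Model 2 is the rule used in the code above, D = 1{Delta > c}, with independent Y(0) ~ N(0,1) and Y(1) ~ N(1,1). Then Delta ~ N(1, 2), the conditional means of Delta follow from the truncated-normal formulas, and E[Y(1) | Delta] and E[Y(0) | Delta] are linear in Delta with slopes Cov(Y(1), Delta)/Var(Delta) = 1/2 and Cov(Y(0), Delta)/Var(Delta) = −1/2. All variable names below (`c_thr`, `z`, `ey1_treated`, ...) are illustrative.

```r
# Sketch: closed-form values for Model 2, assuming D = 1{Delta > c}
# with Y(0) ~ N(0,1) and Y(1) ~ N(1,1) independent, and c = 1.5.
mu_0  <- 0
mu_1  <- 1
c_thr <- 1.5

mu_delta <- mu_1 - mu_0          # E[Delta] = 1
sd_delta <- sqrt(2)              # Var(Delta) = 1 + 1 = 2
z <- (c_thr - mu_delta) / sd_delta

# share of treated: P(Delta > c)
share_treated <- 1 - pnorm(z)

# truncated-normal means of Delta above and below the threshold
mean_delta_treated   <- mu_delta + sd_delta * dnorm(z) / (1 - pnorm(z))
mean_delta_untreated <- mu_delta - sd_delta * dnorm(z) / pnorm(z)

# ATT: E[Delta | D = 1]
delta_T <- mean_delta_treated

# conditional means of the potential outcomes (joint normality, slopes +1/2 and -1/2)
ey1_treated   <- mu_1 + 0.5 * (mean_delta_treated   - mu_delta)  # E[Y(1) | D = 1]
ey0_untreated <- mu_0 - 0.5 * (mean_delta_untreated - mu_delta)  # E[Y(0) | D = 0]

# difference in observed means and selection bias
beta_D <- ey1_treated - ey0_untreated
B      <- beta_D - delta_T

cat("Share of treated (theory):", round(share_treated * 100, 1), "%\n")
cat("delta_T (theory):", round(delta_T, 3), "\n")
cat("beta_D  (theory):", round(beta_D, 3), "\n")
cat("B       (theory):", round(B, 3), "\n")
```

Under these assumptions the closed-form values are roughly 36.2% treated, delta_T ≈ 2.46, beta_D ≈ 1.32 and B ≈ −1.15. The simulated values (36.2%, 2.483, 1.346, −1.137) differ from them by amounts consistent with sampling noise at n = 10,000.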
