# Grupo 4 - Laboratorio 6 - R 

Katiuska Olivera Quevedo (20172533) \
Rosemery Fernandez Sanchez (20172635) \
Aliro Cabrera Florez (20152034) \
Jose Uscamayta Quispe (20195674)

## 2. Debiased Machine Learning

## Testing the Convergence Hypothesis


We consider the problem of estimating the effect of gun
ownership on the homicide rate. For this purpose, we estimate the following partially
linear model

$$
 Y_{j,t} = \beta D_{j,(t-1)} + g(Z_{j,t}) + \epsilon_{j,t}.
$$

## Data

$Y_{j,t}$ is growth rate in county $j$ at time $t$, $D_{j, t-1}$ is initial wealth in county $j$ at time $t-1$,  and  $Z_{j,t}$ is a set of characteristics of county $j$ at time $t$. The parameter $\beta$ is the effect of initial wealth on the
growth rate, controlling for county-level characteristics. 

In [1]:
library(hdm)
library(xtable)
library(hdm)

"package 'hdm' was built under R version 3.6.3"

In [2]:
# Export data to read in R
GrowthData <- GrowthData
save(GrowthData, file = "../data/GrowthData.RData")

data <- GrowthData
attach(data)
names(data)




### Preprocessing

Now, we construct the treatment variable ($D$ ) , the outcome variable ($Y$) and the matrix $Z$ that includes the control variables.

In [3]:
# Treatment Variable
d <- data.frame(data$gdpsh465)

# Outcome Variable
y <- data.frame(data$Outcome)

# Construct matrix Z

x1<- data[4:63]
x2<- data[2]
x<-c(x2,x1)
z<- data.frame(x)
dim(z)


We have in total 91 control variables. The control variables $Z_{j,t}$ are from the GrowhtData and contain  characteristics of these counties.

In [4]:
library(lfe)

"package 'lfe' was built under R version 3.6.3"Loading required package: Matrix


## The effect of initial wealth

### OLS

### OLS without including the country characteristics

After preprocessing the data, we first look at simple regression of $Y_{j,t}$ on $D_{j,t-1}$ without controls as a baseline model.

In [5]:
#ols standard errors
baseline.ols <- felm(Outcome ~ gdpsh465,data=data)
est_baseline <- summary(baseline.ols)$coef[2,]
confint(baseline.ols)[2,]
est_baseline

In [6]:
summary(baseline.ols)


Call:
   felm(formula = Outcome ~ gdpsh465, data = data) 

Residuals:
      Min        1Q    Median        3Q       Max 
-0.147387 -0.024088  0.001209  0.027721  0.139357 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.035207   0.047318   0.744    0.459
gdpsh465    0.001317   0.006102   0.216    0.830

Residual standard error: 0.05159 on 88 degrees of freedom
Multiple R-squared(full model): 0.0005288   Adjusted R-squared: -0.01083 
Multiple R-squared(proj model): 0.0005288   Adjusted R-squared: -0.01083 
F-statistic(full model):0.04656 on 1 and 88 DF, p-value: 0.8297 
F-statistic(proj model): 0.04656 on 1 and 88 DF, p-value: 0.8297 



In [7]:
confint(baseline.ols)[2,]

The point estimate is $0.001317$ with the confidence interval ranging from -0.0108 to 0.0134. This
suggests that increases in initial wealth are related to growth rates - if initial wealth increases by 1% relative
to a trend then the predicted growth rate goes up by 0.13%, without controlling for counties' characteristics.

We next include the controls. First, we estimate the model by ols and then by an array of the modern regression methods using the double machine learning approach.

### OLS including the country characteristics

In [8]:
control_formula <- as.formula(paste("Outcome", "~", paste("gdpsh465",paste(colnames(z),collapse="+"),
                                                          sep="+")))
control_formula

Outcome ~ gdpsh465 + intercept + bmp1l + freeop + freetar + h65 + 
    hm65 + hf65 + p65 + pm65 + pf65 + s65 + sm65 + sf65 + fert65 + 
    mort65 + lifee065 + gpop1 + fert1 + mort1 + invsh41 + geetot1 + 
    geerec1 + gde1 + govwb1 + govsh41 + gvxdxe41 + high65 + highm65 + 
    highf65 + highc65 + highcm65 + highcf65 + human65 + humanm65 + 
    humanf65 + hyr65 + hyrm65 + hyrf65 + no65 + nom65 + nof65 + 
    pinstab1 + pop65 + worker65 + pop1565 + pop6565 + sec65 + 
    secm65 + secf65 + secc65 + seccm65 + seccf65 + syr65 + syrm65 + 
    syrf65 + teapri65 + teasec65 + ex1 + im1 + xr65 + tot1

In [9]:
control.ols <- felm(control_formula,data=data)
est_ols <- summary(control.ols)$coef[2,]
confint(control.ols)[2,]
est_ols

"the matrix is either rank-deficient or indefinite"

In [10]:
summary(control.ols)$coef

"the matrix is either rank-deficient or indefinite"

Unnamed: 0,Estimate,Std. Error,t value,Pr(>|t|)
(Intercept),0.247160891,0.78450164,0.31505465,0.755056171
gdpsh465,-0.009377989,0.02988773,-0.31377391,0.756018521
intercept,,,,
bmp1l,-0.068862679,0.03253065,-2.11685513,0.043289718
freeop,0.080068973,0.20786400,0.38519884,0.703000840
freetar,-0.488962603,0.41816285,-1.16931143,0.252136478
h65,-2.362098642,0.85729167,-2.75530339,0.010192435
hm65,0.707143402,0.52314510,1.35171560,0.187285917
hf65,1.693448427,0.50318881,3.36543337,0.002232683
p65,0.265526695,0.16429407,1.61616729,0.117271228


After controlling for a rich set of characteristics, the point estimate of initial wealth reduces to $-0.009$.

# DML algorithm

Here we perform inference of the predictive coefficient $\beta$ in our partially linear statistical model, 

$$
Y = D\beta + g(Z) + \epsilon, \quad E (\epsilon | D, Z) = 0,
$$

using the **double machine learning** approach. 

For $\tilde Y = Y- E(Y|Z)$ and $\tilde D= D- E(D|Z)$, we can write
$$
\tilde Y = \alpha \tilde D + \epsilon, \quad E (\epsilon |\tilde D) =0.
$$

Using cross-fitting, we employ modern regression methods
to build estimators $\hat \ell(Z)$ and $\hat m(Z)$ of $\ell(Z):=E(Y|Z)$ and $m(Z):=E(D|Z)$ to obtain the estimates of the residualized quantities:

$$
\tilde Y_i = Y_i  - \hat \ell (Z_i),   \quad \tilde D_i = D_i - \hat m(Z_i), \quad \text{ for each } i = 1,\dots,n.
$$

Finally, using ordinary least squares of $\tilde Y_i$ on $\tilde D_i$, we obtain the 
estimate of $\beta$.

The following algorithm comsumes $Y, D, Z$, and a machine learning method for learning the residuals $\tilde Y$ and $\tilde D$, where the residuals are obtained by cross-validation (cross-fitting). Then, it prints the estimated coefficient $\beta$ and the corresponding standard error from the final OLS regression.

In [11]:
DML2.for.PLM <- function(z, d, y, dreg, yreg, nfold=10) {
  nobs <- nrow(z) #number of observations
  foldid <- rep.int(1:nfold,times = ceiling(nobs/nfold))[sample.int(nobs)] #define folds indices
  I <- split(1:nobs, foldid)  #split observation indices into folds  
  ytil <- dtil <- rep(NA, nobs)
  cat("fold: ")
  for(b in 1:length(I)){
    dfit <- dreg(z[-I[[b]],], d[-I[[b]]]) #take a fold out
    yfit <- yreg(z[-I[[b]],], y[-I[[b]]]) # take a foldt out
    dhat <- predict(dfit, z[I[[b]],], type="response") #predict the left-out fold 
    yhat <- predict(yfit, z[I[[b]],], type="response") #predict the left-out fold  
    dtil[I[[b]]] <- (d[I[[b]]] - dhat) #record residual for the left-out fold
    ytil[I[[b]]] <- (y[I[[b]]] - yhat) #record residial for the left-out fold
    cat(b," ")
        }
  #rfit <- lm(ytil ~ dtil)    #estimate the main parameter by regressing one residual on the other
  
  rfit <- felm(ytil ~ dtil,data=data) 
  coef.est <- coef(rfit)[2]  #extract coefficient
  #HC <- vcovHC(rfit)
  se    <- summary(rfit,robust=T)$coefficients[2,2] #record robust standard error by County
  cat(sprintf("\ncoef (se) = %g (%g)\n", coef.est , se))  #printing output
  return( list(coef.est =coef.est , se=se, dtil=dtil, ytil=ytil, rfit=rfit) ) #save output and residuals 
}

Now, we apply the Double Machine Learning (DML) approach with different machine learning methods. First, we load the relevant libraries.

In [12]:
library(hdm)
library(glmnet)
library(sandwich)
library(randomForest)

Loading required package: foreach
Loaded glmnet 2.0-16

randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.


In [13]:
#Matrices
y <- as.matrix(y)
d <- as.matrix(d)
z <- as.matrix(z)
head(data.frame(cbind(y,d)))

data.Outcome,data.gdpsh465
-0.02433575,6.591674
0.10047257,6.829794
0.06705148,8.895082
0.06408917,7.565275
0.02792955,7.162397
0.04640744,7.21891


### DML using Lasso 

In [14]:
#DML with Lasso:
set.seed(123)
dreg <- function(z,d){ rlasso(z,d, post=FALSE) } #ML method= lasso from hdm 
yreg <- function(z,y){ rlasso(z,y, post=FALSE) } #ML method = lasso from hdm

In [15]:
DML2.lasso = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10)

fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = -0.0369952 (0.0161759)


### DML using Post-Lasso 

In [16]:
#DML with Post-Lasso:
dreg <- function(z,d){ rlasso(z,d, post=T) } #ML method= lasso from hdm 
yreg <- function(z,y){ rlasso(z,y, post=T) } #ML method = lasso from hdm
DML2.post = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10)

fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = -0.0368285 (0.0141174)


In [17]:
# DML with ols
dreg <- function(z,d){  glmnet(z,d,family="gaussian", lambda=0) } #ML method = ols from glmnet 
yreg <- function(z,y){  glmnet(z,y,family="gaussian", lambda=0) }  #ML method = ols from glmnet 
DML2.ols = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10)

fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = 0.0304986 (0.0118706)


### DML using Elastic Net 

In [18]:
#DML with Elastic Net:

dreg <- function(z,d){ cv.glmnet(z,d,family="gaussian", alpha=0.5) } #ML method = elastic net from glmnet 
yreg <- function(z,y){ cv.glmnet(z,y,family="gaussian", alpha=0.5) }  #ML method = elastic net from glmnet 
DML2.elnet = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10)



fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = -0.0302421 (0.0168363)


### DML using Ridge 

In [19]:
#DML with Ridge:

dreg <- function(z,d){ cv.glmnet(z,d,family="gaussian", alpha=0) } #ML method = ridge from glmnet 
yreg <- function(z,y){ cv.glmnet(z,y,family="gaussian", alpha=0) }  #ML method = ridge from glmnet 

DML2.ridge = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10)

fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = -0.03481 (0.0159616)


### DML using Random Forest


In [20]:
#DML with Random Forest:
dreg <- function(z,d){ randomForest(z, d) } #ML method=Forest 
yreg <- function(z,y){ randomForest(z, y) } #ML method=Forest
set.seed(1)
DML2.RF = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10) # set to 2 due to computation time

fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = -0.0396444 (0.0143189)


In [21]:
if (!is.null(d) && !is.factor(d))
             max(floor(ncol(z)/3), 1) else floor(sqrt(ncol(z)))

In [22]:
if (!is.null(d) && !is.factor(d)) 5 else 1

#### Note that the number of folds applied to each of the DML was 10

### Best method

In [23]:
mods<- list(DML2.ols, DML2.lasso, DML2.post, DML2.ridge, DML2.elnet, DML2.RF)

RMSE.mdl<- function(mdl) {
RMSEY <- sqrt(mean(mdl$ytil)^2) 
RMSED <- sqrt(mean(mdl$dtil)^2) 
return( list(RMSEY=RMSEY, RMSED=RMSED))
}

#RMSE.mdl(DML2.lasso)

#DML2.lasso$ytil

Res<- lapply(mods, RMSE.mdl)


prRes.Y<- c( Res[[1]]$RMSEY,Res[[2]]$RMSEY, Res[[3]]$RMSEY, Res[[4]]$RMSEY, Res[[5]]$RMSEY,  Res[[6]]$RMSEY)
prRes.D<- c( Res[[1]]$RMSED,Res[[2]]$RMSED, Res[[3]]$RMSED, Res[[4]]$RMSED, Res[[5]]$RMSED, Res[[6]]$RMSED)

prRes<- rbind(prRes.Y, prRes.D); 
rownames(prRes)<- c("RMSE D", "RMSE Y");
colnames(prRes)<- c("OLS", "Lasso", "Post-Lasso", "CV Ridge", "CV Elnet", "RF")
print(prRes,digit=6)

               OLS       Lasso  Post-Lasso    CV Ridge   CV Elnet          RF
RMSE D 0.000774823 0.000826522 0.000333837 7.90925e-05 0.00147464 0.000212644
RMSE Y 0.012473827 0.012439662 0.000541272 7.31453e-03 0.01495892 0.014425179


It looks like the best method for predicting D is CV Ridge, and the best method for predicting Y is CV Ridge. We can choose these methods when we compare the RMSE, the best model going to be whic one have the smallest RMSE. 


In [24]:
dreg <- function(z,d){ cv.glmnet(z,d,family="gaussian", alpha=0) } #ML method = ridge from glmnet 
yreg <- function(z,y){ cv.glmnet(z,y,family="gaussian", alpha=0) }  #ML method = ridge from glmnet 
DML2.best= DML2.for.PLM(z, d, y, dreg, yreg, nfold=10)

fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = -0.0392122 (0.0159824)


In [25]:
est_baseline

In [26]:
library(xtable)

table <- matrix(0,8,2)
table[1,1] <- as.numeric(est_baseline[1])
table[2,1] <- as.numeric(est_ols[1])
table[3,1]   <- as.numeric(DML2.lasso$coef.est)
table[4,1]   <- as.numeric(DML2.post$coef.est)
table[5,1] <-as.numeric(DML2.elnet$coef.est)
table[6,1] <-as.numeric(DML2.ridge$coef.est)
table[7,1] <-as.numeric(DML2.RF$coef.est)
table[8,1] <-as.numeric(DML2.best$coef.est)
table[1,2] <- as.numeric(est_baseline[2])
table[2,2] <- as.numeric(est_ols[2])
table[3,2]   <- as.numeric(DML2.lasso$se)
table[4,2]   <- as.numeric(DML2.post$se)
table[5,2] <-as.numeric(DML2.elnet$se)
table[6,2] <-as.numeric(DML2.ridge$se)
table[7,2] <-as.numeric(DML2.RF$se)
table[8,2] <-as.numeric(DML2.best$se)




################################# Print Results #################################

colnames(table) <- c("Estimate","Standard Error")
rownames(table) <- c("Baseline OLS", "Least Squares with controls", "Lasso", "Post-Lasso","CV Elnet", "CV Ridge", "Random Forest", 
                     "Best")

table

Unnamed: 0,Estimate,Standard Error
Baseline OLS,0.001316713,0.0061022
Least Squares with controls,-0.009377989,0.02988773
Lasso,-0.036995153,0.01617587
Post-Lasso,-0.036828539,0.0141174
CV Elnet,-0.03024211,0.01683631
CV Ridge,-0.034809993,0.01596155
Random Forest,-0.0396444,0.01431894
Best,-0.039212236,0.01598242


In [27]:
print(table, digit=3)


                            Estimate Standard Error
Baseline OLS                 0.00132         0.0061
Least Squares with controls -0.00938         0.0299
Lasso                       -0.03700         0.0162
Post-Lasso                  -0.03683         0.0141
CV Elnet                    -0.03024         0.0168
CV Ridge                    -0.03481         0.0160
Random Forest               -0.03964         0.0143
Best                        -0.03921         0.0160
