* Integrante:
* Morales López Erik Brandon (20163041)

# A case of study: Testing the Convergence Hypothesis

# Introduction

We provide an additional empirical example of partialling-out with Lasso to estimate the regression coefficient $\beta_1$ in the high-dimensional linear regression model:
  $$
  Y = \beta_1 D +  \beta_2'W + \epsilon.
  $$
  
Specifically, we are interested in how the rates  at which economies of different countries grow ($Y$) are related to the initial wealth levels in each country ($D$) controlling for country's institutional, educational, and other similar characteristics ($W$).

The outcome $Y$ is the realized annual growth rate of a country's wealth  (Gross Domestic Product per capita). The target regressor ($D$) is the initial level of the country's wealth. The target parameter $\beta_1$ is the speed of convergence, which measures the speed at which poor countries catch up with rich countries. The controls ($W$) include measures of education levels, quality of institutions, trade openness, and political stability in the country.

In [1]:
#install.packages("hdm")
library(hdm)
library(xtable)

"package 'hdm' was built under R version 3.6.3"

In [2]:
# Export data to read in python
GrowthData <- GrowthData

In [3]:
library(hdm)
growth <- GrowthData
attach(growth)
names(growth)

In [4]:
dim(growth)

In [5]:
head(growth)

Outcome,intercept,gdpsh465,bmp1l,freeop,freetar,h65,hm65,hf65,p65,...,seccf65,syr65,syrm65,syrf65,teapri65,teasec65,ex1,im1,xr65,tot1
-0.02433575,1,6.591674,0.2837,0.153491,0.043888,0.007,0.013,0.001,0.29,...,0.04,0.033,0.057,0.01,47.6,17.3,0.0729,0.0667,0.348,-0.014727
0.10047257,1,6.829794,0.6141,0.313509,0.061827,0.019,0.032,0.007,0.91,...,0.64,0.173,0.274,0.067,57.1,18.0,0.094,0.1438,0.525,0.00575
0.06705148,1,8.895082,0.0,0.204244,0.009186,0.26,0.325,0.201,1.0,...,18.14,2.573,2.478,2.667,26.5,20.7,0.1741,0.175,1.082,-0.01004
0.06408917,1,7.565275,0.1997,0.248714,0.03627,0.061,0.07,0.051,1.0,...,2.63,0.438,0.453,0.424,27.8,22.7,0.1265,0.1496,6.625,-0.002195
0.02792955,1,7.162397,0.174,0.299252,0.037367,0.017,0.027,0.007,0.82,...,2.11,0.257,0.287,0.229,34.5,17.6,0.1211,0.1308,2.5,0.003283
0.04640744,1,7.21891,0.0,0.258865,0.02088,0.023,0.038,0.006,0.5,...,1.46,0.16,0.174,0.146,34.3,8.1,0.0634,0.0762,1.0,-0.001747


# Preprocessing


In [33]:

y <- growth[, 1, drop = F] # output variable
z <- as.matrix(growth)[, -c(1, 2,3)] # controls
d <- growth[, 3, drop = F] # target regressor
dim(z)
dim(y)
dim(d)

# OLS without including the country characteristics.

In [34]:
baseline_formula <- as.formula(paste(y, "~", d ))
baseline_formula
baseline.ols <- lm(baseline_formula,data=growth)

est_baseline <- summary(baseline.ols)$coef[2,]
confint(baseline.ols)[2,]
est_baseline


c(-0.024335751, 0.100472567, 0.067051482, 0.064089166, 0.027929548, 
    0.046407439, 0.06733234, 0.02097768, 0.033551236, 0.039146523, 
    0.076126507, 0.127951209, -0.024326089, 0.078293425, 0.112911547, 
    0.052308191, 0.03639089, 0.029738225, -0.056643579, 0.019204802, 
    0.085206004, 0.133982213, 0.173024738, 0.109699147, 0.015989904, 
    0.062249766, 0.109870689, 0.092106277, 0.083376041, 0.076233453, 
    0.084023841, 0.052940778, 0.116989631, 0.067851712, 0.073904906, 
    0.074070649, 0.065837688, 0.094851094, 0.052705361, 0.047180483, 
    0.039024165, 0.01677521, 0.060046086, 0.066629433, -0.063992932, 
    -0.003020401, 0.045473355, 0.031129837, -0.04871243, 0.024477353, 
    0.090677457, -0.019161499, 0.05075726, 0.000407972, -0.01568049, 
    -0.018356539, -0.025639382, 0.014256047, 0.011897288, 0.034172432, 
    -0.034045389, -0.033806346, 0.069914881, -0.081725598, 0.046010052, 
    0.066598094, -0.011384238, -0.100989896, 0.054750874, 0.094618168, 
    0.04571528

In [10]:
summary(baseline.ols)


Call:
lm(formula = baseline_formula, data = growth)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.147387 -0.024088  0.001209  0.027721  0.139357 

Coefficients:
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  

# OLS including the country characteristics

In [11]:
control_formula <- as.formula(paste("Outcome", "~", paste("gdpsh465",paste(colnames(z),collapse="+"), sep = "+")))

control_formula

Outcome ~ gdpsh465 + bmp1l + freeop + freetar + h65 + hm65 + 
    hf65 + p65 + pm65 + pf65 + s65 + sm65 + sf65 + fert65 + mort65 + 
    lifee065 + gpop1 + fert1 + mort1 + invsh41 + geetot1 + geerec1 + 
    gde1 + govwb1 + govsh41 + gvxdxe41 + high65 + highm65 + highf65 + 
    highc65 + highcm65 + highcf65 + human65 + humanm65 + humanf65 + 
    hyr65 + hyrm65 + hyrf65 + no65 + nom65 + nof65 + pinstab1 + 
    pop65 + worker65 + pop1565 + pop6565 + sec65 + secm65 + secf65 + 
    secc65 + seccm65 + seccf65 + syr65 + syrm65 + syrf65 + teapri65 + 
    teasec65 + ex1 + im1 + xr65 + tot1

In [12]:
baseline.ols <- lm(control_formula,data=growth)

est_ols <- summary(baseline.ols)$coef[2,]
confint(baseline.ols)[2,]
est_ols

In [13]:
summary(baseline.ols)$coef

Unnamed: 0,Estimate,Std. Error,t value,Pr(>|t|)
(Intercept),0.247160893,0.78450163,0.31505466,0.755056170
gdpsh465,-0.009377989,0.02988773,-0.31377391,0.756018518
bmp1l,-0.068862679,0.03253065,-2.11685513,0.043289718
freeop,0.080068974,0.20786400,0.38519885,0.703000838
freetar,-0.488962605,0.41816285,-1.16931143,0.252136477
h65,-2.362098638,0.85729167,-2.75530338,0.010192435
hm65,0.707143400,0.52314511,1.35171560,0.187285919
hf65,1.693448425,0.50318881,3.36543337,0.002232683
p65,0.265526695,0.16429407,1.61616729,0.117271229
pm65,0.136952626,0.15121749,0.90566657,0.372840111


# DML algorithm

Here we perform inference of the predictive coefficient $\beta$ in our partially linear statistical model, 

$$
Y = D\beta + g(Z) + \epsilon, \quad E (\epsilon | D, Z) = 0,
$$

using the **double machine learning** approach. 

For $\tilde Y = Y- E(Y|Z)$ and $\tilde D= D- E(D|Z)$, we can write
$$
\tilde Y = \alpha \tilde D + \epsilon, \quad E (\epsilon |\tilde D) =0.
$$

Using cross-fitting, we employ modern regression methods
to build estimators $\hat \ell(Z)$ and $\hat m(Z)$ of $\ell(Z):=E(Y|Z)$ and $m(Z):=E(D|Z)$ to obtain the estimates of the residualized quantities:

$$
\tilde Y_i = Y_i  - \hat \ell (Z_i),   \quad \tilde D_i = D_i - \hat m(Z_i), \quad \text{ for each } i = 1,\dots,n.
$$

Finally, using ordinary least squares of $\tilde Y_i$ on $\tilde D_i$, we obtain the 
estimate of $\beta$.

The following algorithm comsumes $Y, D, Z$, and a machine learning method for learning the residuals $\tilde Y$ and $\tilde D$, where the residuals are obtained by cross-validation (cross-fitting). Then, it prints the estimated coefficient $\beta$ and the corresponding standard error from the final OLS regression.

Warning: This DML´s part is too important, because the cluster argument is not necessary for moddeling the algothrim, so I am going to **drop** it !!!

In [14]:
DML2.for.PLM <- function(z, d, y, dreg, yreg,nfold=2) {
  nobs <- nrow(z) #number of observations
  foldid <- rep.int(1:nfold,times = ceiling(nobs/nfold))[sample.int(nobs)] #define folds indices
  I <- split(1:nobs, foldid)  #split observation indices into folds  
  ytil <- dtil <- rep(NA, nobs)
  cat("fold: ")
  for(b in 1:length(I)){
    dfit <- dreg(z[-I[[b]],], d[-I[[b]]]) #take a fold out
    yfit <- yreg(z[-I[[b]],], y[-I[[b]]]) # take a foldt out
    dhat <- predict(dfit, z[I[[b]],], type="response") #predict the left-out fold 
    yhat <- predict(yfit, z[I[[b]],], type="response") #predict the left-out fold  
    dtil[I[[b]]] <- (d[I[[b]]] - dhat) #record residual for the left-out fold
    ytil[I[[b]]] <- (y[I[[b]]] - yhat) #record residial for the left-out fold
    cat(b," ")
        }
  #rfit <- lm(ytil ~ dtil)    #estimate the main parameter by regressing one residual on the other
  data <- data.frame(cbind(ytil, dtil))
  rfit <- lm(ytil ~ dtil,data=data) 
  coef.est <- coef(rfit)[2]  #extract coefficient
  #HC <- vcovHC(rfit)
  se    <- summary(rfit,robust=T)$coefficients[2,2] #record robust standard error by County
  cat(sprintf("\ncoef (se) = %g (%g)\n", coef.est , se))  #printing output
  return( list(coef.est =coef.est , se=se, dtil=dtil, ytil=ytil, rfit=rfit) ) #save output and residuals 
}

In [15]:
library(hdm)
library(glmnet)
library(sandwich)
library(randomForest)

Loading required package: Matrix
Loading required package: foreach
Loaded glmnet 2.0-16

randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.


In [17]:
y <- as.matrix(y)
d <- as.matrix(d)
z <- as.matrix(z)

head(data.frame(cbind(y,d,z)))

Outcome,gdpsh465,bmp1l,freeop,freetar,h65,hm65,hf65,p65,pm65,...,seccf65,syr65,syrm65,syrf65,teapri65,teasec65,ex1,im1,xr65,tot1
-0.02433575,6.591674,0.2837,0.153491,0.043888,0.007,0.013,0.001,0.29,0.37,...,0.04,0.033,0.057,0.01,47.6,17.3,0.0729,0.0667,0.348,-0.014727
0.10047257,6.829794,0.6141,0.313509,0.061827,0.019,0.032,0.007,0.91,1.0,...,0.64,0.173,0.274,0.067,57.1,18.0,0.094,0.1438,0.525,0.00575
0.06705148,8.895082,0.0,0.204244,0.009186,0.26,0.325,0.201,1.0,1.0,...,18.14,2.573,2.478,2.667,26.5,20.7,0.1741,0.175,1.082,-0.01004
0.06408917,7.565275,0.1997,0.248714,0.03627,0.061,0.07,0.051,1.0,1.0,...,2.63,0.438,0.453,0.424,27.8,22.7,0.1265,0.1496,6.625,-0.002195
0.02792955,7.162397,0.174,0.299252,0.037367,0.017,0.027,0.007,0.82,0.85,...,2.11,0.257,0.287,0.229,34.5,17.6,0.1211,0.1308,2.5,0.003283
0.04640744,7.21891,0.0,0.258865,0.02088,0.023,0.038,0.006,0.5,0.55,...,1.46,0.16,0.174,0.146,34.3,8.1,0.0634,0.0762,1.0,-0.001747


In [18]:
dim(y)

In [19]:
dim(d)

In [20]:
dim(z)

# DML using Lasso to predict y an d.

In [21]:
#DML with Lasso:
set.seed(123)
dreg <- function(z,d){ rlasso(z,d, post=FALSE) } #ML method= lasso from hdm 

yreg <- function(z,y){ rlasso(z,y, post=FALSE) } #ML method = lasso from hdm

In [22]:
DML2.lasso = DML2.for.PLM(z, d, y, dreg,yreg,nfold=10)

fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = -0.0370317 (0.0147678)


# DML using Post-Lasso to predict y an d

In [23]:
#DML with Post-Lasso:
dreg <- function(z,d){ rlasso(z,d, post=T) } #ML method= lasso from hdm 
yreg <- function(z,y){ rlasso(z,y, post=T) } #ML method = lasso from hdm
DML2.post = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10)

fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = -0.0368285 (0.0130791)


# Optional:

In [24]:
#DML with cross-validated Lasso:
dreg <- function(z,d){ cv.glmnet(z,d,family="gaussian", alpha=1) } #ML method = lasso from glmnet 
yreg <- function(z,y){ cv.glmnet(z,y,family="gaussian", alpha=1) }  #ML method = lasso from glmnet 
DML2.lasso.cv = DML2.for.PLM(z, d, y, dreg, yreg, nfold=5)


fold: 1  2  3  4  5  
coef (se) = -0.0404043 (0.0137764)


# DML using Elastic Net to predict y an d

In [25]:
dreg <- function(z,d){ cv.glmnet(z,d,family="gaussian", alpha=0.5) } #ML method = elastic net from glmnet 
yreg <- function(z,y){ cv.glmnet(z,y,family="gaussian", alpha=0.5) }  #ML method = elastic net from glmnet 
DML2.elnet = DML2.for.PLM(z, d, y, dreg, yreg, nfold=5)

fold: 1  2  3  4  5  
coef (se) = -0.037759 (0.0147713)


# DML using Ridge to predict y an d.

In [26]:
dreg <- function(z,d){ cv.glmnet(z,d,family="gaussian", alpha=0) } #ML method = ridge from glmnet 
yreg <- function(z,y){ cv.glmnet(z,y,family="gaussian", alpha=0) }  #ML method = ridge from glmnet 
DML2.ridge = DML2.for.PLM(z, d, y, dreg, yreg, nfold=5)

fold: 1  2  3  4  5  
coef (se) = -0.0377865 (0.0139965)


# Optional: Here we also compute DML with OLS used as the ML method

In [27]:
reg <- function(z,d){  glmnet(z,d,family="gaussian", lambda=0) } #ML method = ols from glmnet 
yreg <- function(z,y){  glmnet(z,y,family="gaussian", lambda=0) }  #ML method = ols from glmnet 
DML2.ols = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10)

fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = 0.0242785 (0.0135431)


# DML using Random Forest to predict y an d.


In [28]:
#DML with Random Forest:
dreg <- function(z,d){ randomForest(z, d) } #ML method=Forest 
yreg <- function(z,y){ randomForest(z, y) } #ML method=Forest
set.seed(1)
DML2.RF = DML2.for.PLM(z, d, y, dreg, yreg, nfold=2) # set to 2 due to computation time

fold: 1  2  
coef (se) = -0.0290514 (0.0115981)


# Run the best method i.e. the best combination of methods to predict y an d.

In [29]:
mods<- list(DML2.ols, DML2.lasso, DML2.post, DML2.lasso.cv, DML2.ridge, DML2.elnet, DML2.RF)

RMSE.mdl<- function(mdl) {
RMSEY <- sqrt(mean(mdl$ytil)^2) 
RMSED <- sqrt(mean(mdl$dtil)^2) 
return( list(RMSEY=RMSEY, RMSED=RMSED))
}

#RMSE.mdl(DML2.lasso)

#DML2.lasso$ytil

Res<- lapply(mods, RMSE.mdl)


prRes.Y<- c( Res[[1]]$RMSEY,Res[[2]]$RMSEY, Res[[3]]$RMSEY, Res[[4]]$RMSEY, Res[[5]]$RMSEY,  Res[[6]]$RMSEY, Res[[7]]$RMSEY)
prRes.D<- c( Res[[1]]$RMSED,Res[[2]]$RMSED, Res[[3]]$RMSED, Res[[4]]$RMSED, Res[[5]]$RMSED, Res[[6]]$RMSED, Res[[7]]$RMSED)

prRes<- rbind(prRes.Y, prRes.D); 
rownames(prRes)<- c("RMSE D", "RMSE Y");
colnames(prRes)<- c("OLS", "Lasso", "Post-Lasso", "CV Lasso", "CV Ridge", "CV Elnet", "RF")
print(prRes,digit=6)

               OLS       Lasso  Post-Lasso    CV Lasso   CV Ridge    CV Elnet
RMSE D 2.96704e-05 0.000831523 0.000333837 3.73815e-18 0.00036176 3.81930e-18
RMSE Y 1.25074e-03 0.012439826 0.000541272 4.16895e-03 0.02094855 1.14092e-02
                RF
RMSE D 0.000844267
RMSE Y 0.029621814


In [30]:
dreg <- function(z,d){ rlasso(z,d, post=T) } #ML method= lasso from hdm 
yreg <- function(z,y){ cv.glmnet(z,y,family="gaussian", alpha=0) }  #ML method = ridge from glmnet 
DML2.best= DML2.for.PLM(z, d, y, dreg, yreg, nfold=10)

fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = -0.0341125 (0.0140251)


# The results

In [31]:
library(xtable)

table <- matrix(0,9,2)
table[1,1] <- as.numeric(est_baseline[1])
table[2,1] <- as.numeric(est_ols[1])
table[3,1]   <- as.numeric(DML2.lasso$coef.est)
table[4,1]   <- as.numeric(DML2.post$coef.est)
table[5,1]  <-as.numeric(DML2.lasso.cv$coef.est)
table[6,1] <-as.numeric(DML2.elnet$coef.est)
table[7,1] <-as.numeric(DML2.ridge$coef.est)
table[8,1] <-as.numeric(DML2.RF$coef.est)
table[9,1] <-as.numeric(DML2.best$coef.est)
table[1,2] <- as.numeric(est_baseline[2])
table[2,2] <- as.numeric(est_ols[2])
table[3,2]   <- as.numeric(DML2.lasso$se)
table[4,2]   <- as.numeric(DML2.post$se)
table[5,2]  <-as.numeric(DML2.lasso.cv$se)
table[6,2] <-as.numeric(DML2.elnet$se)
table[7,2] <-as.numeric(DML2.ridge$se)
table[8,2] <-as.numeric(DML2.RF$se)
table[9,2] <-as.numeric(DML2.best$se)




################################# Print Results #################################

colnames(table) <- c("Estimate","Standard Error")
rownames(table) <- c("Baseline OLS", "Least Squares with controls", "Lasso", "Post-Lasso", "CV Lasso","CV Elnet", "CV Ridge", "Random Forest", 
                     "Best")

table

Unnamed: 0,Estimate,Standard Error
Baseline OLS,0.001316713,0.0061022
Least Squares with controls,-0.009377989,0.02988773
Lasso,-0.037031673,0.01476785
Post-Lasso,-0.036828539,0.01307913
CV Lasso,-0.04040433,0.01377641
CV Elnet,-0.037759028,0.01477132
CV Ridge,-0.037786498,0.01399646
Random Forest,-0.029051448,0.01159808
Best,-0.034112491,0.01402506
