# A Case Study: The Effect of initial wealth on growth rate

We consider the problem of estimating the effect of initial wealth on growth rate for different countries. For this purpose, we estimate the following partially linear model.

$$
 Y_{j,t} = \beta D_{j,(t-1)} + g(Z_{j,t}) + \epsilon_{j,t}.
$$

## Data

In [42]:
#Install necessary packages and libraries
#install.packages("rio")
library(rio)

# Import data: 
rdata<-import("C:/Users/PC-1/Documents/GitHub/ECO224/labs/data/GrowthData.RData")
head(rdata)

Outcome,intercept,gdpsh465,bmp1l,freeop,freetar,h65,hm65,hf65,p65,...,seccf65,syr65,syrm65,syrf65,teapri65,teasec65,ex1,im1,xr65,tot1
-0.02433575,1,6.591674,0.2837,0.153491,0.043888,0.007,0.013,0.001,0.29,...,0.04,0.033,0.057,0.01,47.6,17.3,0.0729,0.0667,0.348,-0.014727
0.10047257,1,6.829794,0.6141,0.313509,0.061827,0.019,0.032,0.007,0.91,...,0.64,0.173,0.274,0.067,57.1,18.0,0.094,0.1438,0.525,0.00575
0.06705148,1,8.895082,0.0,0.204244,0.009186,0.26,0.325,0.201,1.0,...,18.14,2.573,2.478,2.667,26.5,20.7,0.1741,0.175,1.082,-0.01004
0.06408917,1,7.565275,0.1997,0.248714,0.03627,0.061,0.07,0.051,1.0,...,2.63,0.438,0.453,0.424,27.8,22.7,0.1265,0.1496,6.625,-0.002195
0.02792955,1,7.162397,0.174,0.299252,0.037367,0.017,0.027,0.007,0.82,...,2.11,0.257,0.287,0.229,34.5,17.6,0.1211,0.1308,2.5,0.003283
0.04640744,1,7.21891,0.0,0.258865,0.02088,0.023,0.038,0.006,0.5,...,1.46,0.16,0.174,0.146,34.3,8.1,0.0634,0.0762,1.0,-0.001747


In [43]:
names(rdata)

In [44]:
# Outcome variable statistics
summary(rdata$Outcome)

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.10099  0.02104  0.04621  0.04535  0.07403  0.18553 

In [45]:
# Treatment variable statistics
summary(rdata$gdpsh465) 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  5.762   7.132   7.726   7.703   8.442   9.230 

In [46]:
# Treatment Variable
d     <- "gdpsh465"

# Outcome Variable
y     <- "Outcome"

# Treatment Variable
D     <- rdata[which(colnames(rdata) == "gdpsh465")]

# Outcome Variable
Y     <- rdata[which(colnames(rdata) == "Outcome")]

# Construct matrix Z

Z     <- rdata[,-c(which(colnames(rdata)=="Outcome"),which(colnames(rdata)=="gdpsh465"),which(colnames(rdata)=="intercept"))]

## The effect of initial wealth

### OLS

After preprocessing the data, we first look at simple regression of $Y_{j,t}$ on $D_{j,t-1}$ without controls as a baseline model.

#### - OLS without including the country characteristics

In [47]:
baseline_formula <- as.formula(paste(y, "~", d ))
simple.ols <- lm(baseline_formula,data=rdata)

In [48]:
est_simple.ols <- summary(simple.ols)$coef[2,]
confint(simple.ols)[2,]
est_simple.ols

#### - OLS including the country characteristics

In [49]:
control_formula <- as.formula(paste(y, "~", paste(d,paste(colnames(Z),collapse="+"),
                                                          sep="+")))
full.ols <- lm(control_formula,data=rdata)

In [50]:
est_ols <- summary(full.ols)$coef[2,]
confint(full.ols)[2,]
est_ols

In [51]:
summary(full.ols)$coef

Unnamed: 0,Estimate,Std. Error,t value,Pr(>|t|)
(Intercept),0.247160893,0.78450163,0.31505466,0.755056170
gdpsh465,-0.009377989,0.02988773,-0.31377391,0.756018518
bmp1l,-0.068862679,0.03253065,-2.11685513,0.043289718
freeop,0.080068974,0.20786400,0.38519885,0.703000838
freetar,-0.488962605,0.41816285,-1.16931143,0.252136477
h65,-2.362098638,0.85729167,-2.75530338,0.010192435
hm65,0.707143400,0.52314511,1.35171560,0.187285919
hf65,1.693448425,0.50318881,3.36543337,0.002232683
p65,0.265526695,0.16429407,1.61616729,0.117271229
pm65,0.136952626,0.15121749,0.90566657,0.372840111


## DML algorithm

Here we perform inference of the predictive coefficient $\beta$ in our partially linear statistical model, 

$$
Y = D\beta + g(Z) + \epsilon, \quad E (\epsilon | D, Z) = 0,
$$

using the **double machine learning** approach. 

For $\tilde Y = Y- E(Y|Z)$ and $\tilde D= D- E(D|Z)$, we can write
$$
\tilde Y = \alpha \tilde D + \epsilon, \quad E (\epsilon |\tilde D) =0.
$$

Using cross-fitting, we employ modern regression methods
to build estimators $\hat \ell(Z)$ and $\hat m(Z)$ of $\ell(Z):=E(Y|Z)$ and $m(Z):=E(D|Z)$ to obtain the estimates of the residualized quantities:

$$
\tilde Y_i = Y_i  - \hat \ell (Z_i),   \quad \tilde D_i = D_i - \hat m(Z_i), \quad \text{ for each } i = 1,\dots,n.
$$

Finally, using ordinary least squares of $\tilde Y_i$ on $\tilde D_i$, we obtain the 
estimate of $\beta$.

The following algorithm comsumes $Y, D, Z$, and a machine learning method for learning the residuals $\tilde Y$ and $\tilde D$, where the residuals are obtained by cross-validation (cross-fitting). Then, it prints the estimated coefficient $\beta$ and the corresponding standard error from the final OLS regression.

In [52]:
DML2.for.PLM <- function(z, d, y, dreg, yreg, nfold=2) {
  nobs <- nrow(z) #number of observations
  foldid <- rep.int(1:nfold,times = ceiling(nobs/nfold))[sample.int(nobs)] #define folds indices
  I <- split(1:nobs, foldid)  #split observation indices into folds  
  ytil <- dtil <- rep(NA, nobs)
  cat("fold: ")
  for(b in 1:length(I)){
    dfit <- dreg(z[-I[[b]],], d[-I[[b]]]) #take a fold out
    yfit <- yreg(z[-I[[b]],], y[-I[[b]]]) # take a foldt out
    dhat <- predict(dfit, z[I[[b]],], type="response") #predict the left-out fold 
    yhat <- predict(yfit, z[I[[b]],], type="response") #predict the left-out fold  
    dtil[I[[b]]] <- (d[I[[b]]] - dhat) #record residual for the left-out fold
    ytil[I[[b]]] <- (y[I[[b]]] - yhat) #record residial for the left-out fold
    cat(b," ")
        }
  #rfit <- lm(ytil ~ dtil)    #estimate the main parameter by regressing one residual on the other
  data <- data.frame(cbind(ytil, dtil))
  rfit <- lm(ytil ~ dtil,data=data) 
  coef.est <- coef(rfit)[2]  #extract coefficient
  #HC <- vcovHC(rfit)
  se    <- summary(rfit,robust=T)$coefficients[2,2] #record robust standard error 
  cat(sprintf("\ncoef (se) = %g (%g)\n", coef.est , se))  #printing output
  return( list(coef.est =coef.est , se=se, dtil=dtil, ytil=ytil, rfit=rfit) ) #save output and residuals 
}

Now, we apply the Double Machine Learning (DML) approach with different machine learning methods. First, we load the relevant libraries.

In [53]:
library(hdm)
library(glmnet)
library(sandwich)
library(randomForest)

Let us, construct the input matrices.

In [54]:
y <- as.matrix(Y)
d <- as.matrix(D)
z <- as.matrix(Z)
head(data.frame(cbind(y,d)))

Outcome,gdpsh465
-0.02433575,6.591674
0.10047257,6.829794
0.06705148,8.895082
0.06408917,7.565275
0.02792955,7.162397
0.04640744,7.21891


In the following, we apply the DML approach with the differnt versions of lasso.


### Lasso

In [80]:
#DML with Lasso:
set.seed(123)
dreg <- function(z,d){ rlasso(z,d, post=FALSE) } #ML method= lasso from hdm 
yreg <- function(z,y){ rlasso(z,y, post=FALSE) } #ML method = lasso from hdm

In [57]:
DML2.lasso = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10)

fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = -0.0386376 (0.0144816)


### Post-Lasso

In [58]:
#DML with Post-Lasso:
dreg <- function(z,d){ rlasso(z,d, post=T) } #ML method= lasso from hdm 
yreg <- function(z,y){ rlasso(z,y, post=T) } #ML method = lasso from hdm
DML2.post = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10)

fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = -0.0363543 (0.0134263)


### Cross-validated Lasso, elastic net and ridge

In [59]:
#DML with cross-validated Lasso:
dreg <- function(z,d){ cv.glmnet(z,d,family="gaussian", alpha=1) } #ML method = lasso from glmnet 
yreg <- function(z,y){ cv.glmnet(z,y,family="gaussian", alpha=1) }  #ML method = lasso from glmnet 
DML2.lasso.cv = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10)

dreg <- function(z,d){ cv.glmnet(z,d,family="gaussian", alpha=0.5) } #ML method = elastic net from glmnet 
yreg <- function(z,y){ cv.glmnet(z,y,family="gaussian", alpha=0.5) }  #ML method = elastic net from glmnet 
DML2.elnet = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10)

dreg <- function(z,d){ cv.glmnet(z,d,family="gaussian", alpha=0) } #ML method = ridge from glmnet 
yreg <- function(z,y){ cv.glmnet(z,y,family="gaussian", alpha=0) }  #ML method = ridge from glmnet 
DML2.ridge = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10)

fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = -0.0300068 (0.0146343)
fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = -0.0355272 (0.0149189)
fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = -0.0312497 (0.0136185)


### OLS

Here we also compute DML with OLS used as the ML method

In [60]:
dreg <- function(z,d){  glmnet(z,d,family="gaussian", lambda=0) } #ML method = ols from glmnet 
yreg <- function(z,y){  glmnet(z,y,family="gaussian", lambda=0) }  #ML method = ols from glmnet 
DML2.ols = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10)

fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = 0.0274504 (0.0120743)


Next, we also apply Random Forest for comparison purposes.

### Random Forest

In [61]:
#DML with Random Forest:
dreg <- function(z,d){ randomForest(z, d) } #ML method=Forest 
yreg <- function(z,y){ randomForest(z, y) } #ML method=Forest
set.seed(1)
DML2.RF = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10) # set to 2 due to computation time

fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = -0.0365831 (0.0122163)


In [62]:
if (!is.null(d) && !is.factor(d))
             max(floor(ncol(z)/3), 1) else floor(sqrt(ncol(z)))

In [63]:
if (!is.null(d) && !is.factor(d)) 5 else 1

## Compare models

Finally, let's see which method is actually better. We compute RMSE for predicting D and Y, and see which
of the methods works better.


In [67]:
mods<- list(DML2.ols, DML2.lasso, DML2.post, DML2.lasso.cv, DML2.ridge, DML2.elnet, DML2.RF)

RMSE.mdl<- function(mdl) {
RMSEY <- sqrt(mean(mdl$ytil)^2) 
RMSED <- sqrt(mean(mdl$dtil)^2) 
return( list(RMSEY=RMSEY, RMSED=RMSED))
}

#RMSE.mdl(DML2.lasso)

#DML2.lasso$ytil

Res<- lapply(mods, RMSE.mdl)


prRes.Y<- c( Res[[1]]$RMSEY,Res[[2]]$RMSEY, Res[[3]]$RMSEY, Res[[4]]$RMSEY, Res[[5]]$RMSEY,  Res[[6]]$RMSEY, Res[[7]]$RMSEY)
prRes.D<- c( Res[[1]]$RMSED,Res[[2]]$RMSED, Res[[3]]$RMSED, Res[[4]]$RMSED, Res[[5]]$RMSED, Res[[6]]$RMSED, Res[[7]]$RMSED)

prRes<- rbind(prRes.Y, prRes.D); 
rownames(prRes)<- c("RMSE D", "RMSE Y");
colnames(prRes)<- c("OLS", "Lasso", "Post-Lasso", "CV Lasso", "CV Ridge", "CV Elnet", "RF")
print(prRes,digit=10)

                  OLS           Lasso      Post-Lasso       CV Lasso
RMSE D 0.002622469765 0.0001077023406 0.0004845085056 0.001360668824
RMSE Y 0.035840874635 0.0030237868251 0.0040177951661 0.017974024859
              CV Ridge       CV Elnet             RF
RMSE D 0.0002137531629 0.001285107001 0.000133789617
RMSE Y 0.0002625647397 0.014170597094 0.014000371251


## The best model

It looks like the best method for predicting D is Lasso, and the best method for predicting Y is CV Ridge.


In [68]:
dreg <- function(z,d){ rlasso(z,d, post=T) } #ML method= lasso from hdm 
yreg <- function(z,y){ cv.glmnet(z,y,family="gaussian", alpha=0) }  #ML method = ridge from glmnet 
DML2.best= DML2.for.PLM(z, d, y, dreg, yreg, nfold=10)

fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = -0.0352265 (0.0138243)


## Final results

Let's organize the results in a table.

In [77]:
library(xtable)

table <- matrix(0,9,2)
table[1,1] <- as.numeric(est_simple.ols[1])
table[2,1] <- as.numeric(est_ols[1])
table[3,1]   <- as.numeric(DML2.lasso$coef.est)
table[4,1]   <- as.numeric(DML2.post$coef.est)
table[5,1]  <-as.numeric(DML2.lasso.cv$coef.est)
table[6,1] <-as.numeric(DML2.elnet$coef.est)
table[7,1] <-as.numeric(DML2.ridge$coef.est)
table[8,1] <-as.numeric(DML2.RF$coef.est)
table[9,1] <-as.numeric(DML2.best$coef.est)
table[1,2] <- as.numeric(est_simple.ols[2])
table[2,2] <- as.numeric(est_ols[2])
table[3,2]   <- as.numeric(DML2.lasso$se)
table[4,2]   <- as.numeric(DML2.post$se)
table[5,2]  <-as.numeric(DML2.lasso.cv$se)
table[6,2] <-as.numeric(DML2.elnet$se)
table[7,2] <-as.numeric(DML2.ridge$se)
table[8,2] <-as.numeric(DML2.RF$se)
table[9,2] <-as.numeric(DML2.best$se)




################################# Print Results #################################

colnames(table) <- c("Estimate","Standard Error")
rownames(table) <- c("Baseline OLS", "Least Squares with controls", "Lasso", "Post-Lasso", "CV Lasso","CV Elnet", "CV Ridge", "Random Forest", 
                     "Best")

table

Unnamed: 0,Estimate,Standard Error
Baseline OLS,0.001316713,0.0061022
Least Squares with controls,-0.009377989,0.02988773
Lasso,-0.038637624,0.01448156
Post-Lasso,-0.036354255,0.01342625
CV Lasso,-0.030006808,0.01463428
CV Elnet,-0.035527155,0.01491894
CV Ridge,-0.031249703,0.01361846
Random Forest,-0.036583132,0.01221635
Best,-0.035226526,0.01382434


In [79]:
print(table, digit=3)

                            Estimate Standard Error
Baseline OLS                 0.00132         0.0061
Least Squares with controls -0.00938         0.0299
Lasso                       -0.03864         0.0145
Post-Lasso                  -0.03635         0.0134
CV Lasso                    -0.03001         0.0146
CV Elnet                    -0.03553         0.0149
CV Ridge                    -0.03125         0.0136
Random Forest               -0.03658         0.0122
Best                        -0.03523         0.0138
