# A Case Study: The Effect of initial wealth on growth rate

We consider the problem of estimating the effect of initial wealth on growth rate for different countries. For this purpose, we estimate the following partially linear model.

$$
 Y_{j,t} = \beta D_{j,(t-1)} + g(Z_{j,t}) + \epsilon_{j,t}.
$$

## Data

In [39]:
#Install necessary packages and libraries
#install.packages("rio")
library(rio)

# Import data: 
rdata<-import("C:/Users/PC-1/Documents/GitHub/ECO224/labs/data/GrowthData.RData")
head(data)

Outcome,intercept,gdpsh465,bmp1l,freeop,freetar,h65,hm65,hf65,p65,...,seccf65,syr65,syrm65,syrf65,teapri65,teasec65,ex1,im1,xr65,tot1
-0.02433575,1,6.591674,0.2837,0.153491,0.043888,0.007,0.013,0.001,0.29,...,0.04,0.033,0.057,0.01,47.6,17.3,0.0729,0.0667,0.348,-0.014727
0.10047257,1,6.829794,0.6141,0.313509,0.061827,0.019,0.032,0.007,0.91,...,0.64,0.173,0.274,0.067,57.1,18.0,0.094,0.1438,0.525,0.00575
0.06705148,1,8.895082,0.0,0.204244,0.009186,0.26,0.325,0.201,1.0,...,18.14,2.573,2.478,2.667,26.5,20.7,0.1741,0.175,1.082,-0.01004
0.06408917,1,7.565275,0.1997,0.248714,0.03627,0.061,0.07,0.051,1.0,...,2.63,0.438,0.453,0.424,27.8,22.7,0.1265,0.1496,6.625,-0.002195
0.02792955,1,7.162397,0.174,0.299252,0.037367,0.017,0.027,0.007,0.82,...,2.11,0.257,0.287,0.229,34.5,17.6,0.1211,0.1308,2.5,0.003283
0.04640744,1,7.21891,0.0,0.258865,0.02088,0.023,0.038,0.006,0.5,...,1.46,0.16,0.174,0.146,34.3,8.1,0.0634,0.0762,1.0,-0.001747


In [40]:
names(rdata)

In [35]:
summary(rdata$Outcome)

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.10099  0.02104  0.04621  0.04535  0.07403  0.18553 

In [None]:
# Treatment Variable
d     <- "?"

# Outcome Variable
y     <- "Outcome"

# Treatment Variable
D     <- rdata[which(colnames(rdata) == "?")]

# Outcome Variable
Y     <- rdata[which(colnames(rdata) == "Outcome")]

# Construct matrix Z

Z     <- rdata[,-c(which(colnames(rdata)=="Outcome"),which(colnames(rdata)=="?"))]

## The effect of initial wealth

### OLS

After preprocessing the data, we first look at simple regression of $Y_{j,t}$ on $D_{j,t-1}$ without controls as a baseline model.

In [4]:
baseline_formula <- as.formula(paste(y, "~", d ))
simple.ols <- lm(baseline_formula,data=rdata)

ERROR: Error in paste(y, "~", d): objeto 'y' no encontrado


In [78]:
summary(simple.ols)


Call:
   felm(formula = logghomr ~ logfssl | 0 | 0 | CountyCode, data = data) 

Residuals:
    Min      1Q  Median      3Q     Max 
-4.0162 -0.2253  0.0189  0.2561  1.5753 

Coefficients:
             Estimate Cluster s.e. t value Pr(>|t|)    
(Intercept) 5.189e-18    2.895e-16   0.018    0.986    
logfssl     2.823e-01    6.481e-02   4.356 1.36e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4199 on 3898 degrees of freedom
Multiple R-squared(full model): 0.006193   Adjusted R-squared: 0.005938 
Multiple R-squared(proj model): 0.006193   Adjusted R-squared: 0.005938 
F-statistic(full model, *iid*):24.29 on 1 and 3898 DF, p-value: 8.623e-07 
F-statistic(proj model): 18.97 on 1 and 194 DF, p-value: 2.146e-05 



In [79]:
confint(simple.ols)[2,]

""""""""""""""""Comentario""""""""""""""""""""""""""""""

In [81]:
control_formula <- as.formula(paste("logghomr", "~", paste("logfssl",paste(colnames(Z),collapse="+"),
                                                          sep="+"),"|0|0| CountyCode"))
control_formula

logghomr ~ logfssl + logrobr + logburg + burg_missing + robrate_missing + 
    newblack + newfhh + newmove + newdens + newmal + AGE010D + 
    AGE050D + AGE110D + AGE170D + AGE180D + AGE270D + AGE310D + 
    AGE320D + AGE350D + AGE380D + AGE410D + AGE470D + AGE570D + 
    AGE640D + AGE670D + AGE760D + BNK010D + BNK050D + BPS030D + 
    BPS130D + BPS230D + BPS020D + BPS120D + BPS220D + BPS820D + 
    BZA010D + BZA110D + BZA210D + EDU100D + EDU200D + EDU600D + 
    EDU610D + EDU620D + EDU630D + EDU635D + EDU640D + EDU650D + 
    EDU680D + EDU685D + ELE010D + ELE020D + ELE025D + ELE030D + 
    ELE035D + ELE060D + ELE065D + ELE210D + ELE220D + HIS010D + 
    HIS020D + HIS030D + HIS040D + HIS110D + HIS120D + HIS130D + 
    HIS140D + HIS200D + HIS300D + HIS500D + HIS700D + HSD010D + 
    HSD020D + HSD030D + HSD110D + HSD120D + HSD130D + HSD140D + 
    HSD150D + HSD170D + HSD200D + HSD210D + HSD230D + HSD300D + 
    HSD310D + HSG030D + HSG195D + HSG200D + HSG220D + HSG440D + 
    HSG445D + HS

In [82]:
control.ols <- felm(control_formula,data=data)
est_ols <- summary(control.ols)$coef[2,]
confint(control.ols)[2,]
est_ols

"the matrix is either rank-deficient or indefinite"
"the matrix is either rank-deficient or indefinite"


In [83]:
summary(control.ols)$coef

"the matrix is either rank-deficient or indefinite"


Unnamed: 0,Estimate,Cluster s.e.,t value,Pr(>|t|)
(Intercept),1.675988e-15,1.659040e-15,1.0102155,0.3124577934
logfssl,1.906447e-01,5.244756e-02,3.6349591,0.0002817867
logrobr,1.890471e-01,5.385086e-02,3.5105680,0.0004524588
logburg,2.192630e-01,6.282510e-02,3.4900546,0.0004885304
burg_missing,1.529614e+00,4.443831e-01,3.4421076,0.0005835660
robrate_missing,1.133081e+00,3.047595e-01,3.7179522,0.0002038034
newblack,2.434718e+01,4.480200e+01,0.5434397,0.5868597027
newfhh,-6.875928e+01,1.146440e+02,-0.5997632,0.5487006184
newmove,4.211082e+01,2.869630e+01,1.4674649,0.1423341971
newdens,,0.000000e+00,,


After controlling for a rich set of characteristics, the point estimate of gun ownership reduces to $0.19$.

# DML algorithm

Here we perform inference of the predictive coefficient $\beta$ in our partially linear statistical model, 

$$
Y = D\beta + g(Z) + \epsilon, \quad E (\epsilon | D, Z) = 0,
$$

using the **double machine learning** approach. 

For $\tilde Y = Y- E(Y|Z)$ and $\tilde D= D- E(D|Z)$, we can write
$$
\tilde Y = \alpha \tilde D + \epsilon, \quad E (\epsilon |\tilde D) =0.
$$

Using cross-fitting, we employ modern regression methods
to build estimators $\hat \ell(Z)$ and $\hat m(Z)$ of $\ell(Z):=E(Y|Z)$ and $m(Z):=E(D|Z)$ to obtain the estimates of the residualized quantities:

$$
\tilde Y_i = Y_i  - \hat \ell (Z_i),   \quad \tilde D_i = D_i - \hat m(Z_i), \quad \text{ for each } i = 1,\dots,n.
$$

Finally, using ordinary least squares of $\tilde Y_i$ on $\tilde D_i$, we obtain the 
estimate of $\beta$.

The following algorithm comsumes $Y, D, Z$, and a machine learning method for learning the residuals $\tilde Y$ and $\tilde D$, where the residuals are obtained by cross-validation (cross-fitting). Then, it prints the estimated coefficient $\beta$ and the corresponding standard error from the final OLS regression.

In [87]:
DML2.for.PLM <- function(z, d, y, dreg, yreg, nfold=2) {
  nobs <- nrow(z) #number of observations
  foldid <- rep.int(1:nfold,times = ceiling(nobs/nfold))[sample.int(nobs)] #define folds indices
  I <- split(1:nobs, foldid)  #split observation indices into folds  
  ytil <- dtil <- rep(NA, nobs)
  cat("fold: ")
  for(b in 1:length(I)){
    dfit <- dreg(z[-I[[b]],], d[-I[[b]]]) #take a fold out
    yfit <- yreg(z[-I[[b]],], y[-I[[b]]]) # take a foldt out
    dhat <- predict(dfit, z[I[[b]],], type="response") #predict the left-out fold 
    yhat <- predict(yfit, z[I[[b]],], type="response") #predict the left-out fold  
    dtil[I[[b]]] <- (d[I[[b]]] - dhat) #record residual for the left-out fold
    ytil[I[[b]]] <- (y[I[[b]]] - yhat) #record residial for the left-out fold
    cat(b," ")
        }
  #rfit <- lm(ytil ~ dtil)    #estimate the main parameter by regressing one residual on the other
  data <- data.frame(cbind(ytil, dtil))
  rfit <- lm(ytil ~ dtil,data=data) 
  coef.est <- coef(rfit)[2]  #extract coefficient
  #HC <- vcovHC(rfit)
  se    <- summary(rfit,robust=T)$coefficients[2,2] #record robust standard error 
  cat(sprintf("\ncoef (se) = %g (%g)\n", coef.est , se))  #printing output
  return( list(coef.est =coef.est , se=se, dtil=dtil, ytil=ytil, rfit=rfit) ) #save output and residuals 
}

Now, we apply the Double Machine Learning (DML) approach with different machine learning methods. First, we load the relevant libraries.

In [43]:
library(hdm)
library(glmnet)
library(sandwich)
library(randomForest)

Let us, construct the input matrices.

In [89]:
y <- as.matrix(Y)
d <- as.matrix(D)
z <- as.matrix(Z)
head(data.frame(cbind(y,d)))

Unnamed: 0_level_0,logghomr,logfssl,CountyCode
Unnamed: 0_level_1,<dbl>,<dbl>,<dbl>
1,-0.13477752,0.096127077,1073
2,-0.23962152,0.080809373,1073
3,-0.07867716,0.057339916,1073
4,-0.33146546,0.081694483,1073
5,-0.3166398,0.025365514,1073
6,0.1051319,-0.006777264,1073


In the following, we apply the DML approach with the differnt versions of lasso.


## Lasso

In [91]:
#DML with Lasso:
set.seed(123)
dreg <- function(z,d){ rlasso(z,d, post=FALSE) } #ML method= lasso from hdm 
yreg <- function(z,y){ rlasso(z,y, post=FALSE) } #ML method = lasso from hdm

In [92]:
DML2.lasso = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10, clu)

fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = 0.222959 (0.0570325)


In [93]:
#DML with Post-Lasso:
dreg <- function(z,d){ rlasso(z,d, post=T) } #ML method= lasso from hdm 
yreg <- function(z,y){ rlasso(z,y, post=T) } #ML method = lasso from hdm
DML2.post = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10, clu)

fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = 0.226934 (0.0561918)


In [94]:
#DML with cross-validated Lasso:
dreg <- function(z,d){ cv.glmnet(z,d,family="gaussian", alpha=1) } #ML method = lasso from glmnet 
yreg <- function(z,y){ cv.glmnet(z,y,family="gaussian", alpha=1) }  #ML method = lasso from glmnet 
DML2.lasso.cv = DML2.for.PLM(z, d, y, dreg, yreg, nfold=5, clu)

dreg <- function(z,d){ cv.glmnet(z,d,family="gaussian", alpha=0.5) } #ML method = elastic net from glmnet 
yreg <- function(z,y){ cv.glmnet(z,y,family="gaussian", alpha=0.5) }  #ML method = elastic net from glmnet 
DML2.elnet = DML2.for.PLM(z, d, y, dreg, yreg, nfold=5, clu)

dreg <- function(z,d){ cv.glmnet(z,d,family="gaussian", alpha=0) } #ML method = ridge from glmnet 
yreg <- function(z,y){ cv.glmnet(z,y,family="gaussian", alpha=0) }  #ML method = ridge from glmnet 
DML2.ridge = DML2.for.PLM(z, d, y, dreg, yreg, nfold=5, clu)

fold: 1  2  3  4  5  
coef (se) = 0.194926 (0.0569378)
fold: 1  2  3  4  5  
coef (se) = 0.208474 (0.0600804)
fold: 1  2  3  4  5  
coef (se) = 0.200234 (0.0598422)


Here we also compute DML with OLS used as the ML method

In [95]:
dreg <- function(z,d){  glmnet(z,d,family="gaussian", lambda=0) } #ML method = ols from glmnet 
yreg <- function(z,y){  glmnet(z,y,family="gaussian", lambda=0) }  #ML method = ols from glmnet 
DML2.ols = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10, clu)

fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = 0.203079 (0.051136)


Next, we also apply Random Forest for comparison purposes.

### Random Forest


In [66]:
#DML with Random Forest:
dreg <- function(z,d){ randomForest(z, d) } #ML method=Forest 
yreg <- function(z,y){ randomForest(z, y) } #ML method=Forest
set.seed(1)
DML2.RF = DML2.for.PLM(z, d, y, dreg, yreg, nfold=2, clu) # set to 2 due to computation time

fold: 1  2  
coef (se) = 0.153017 (0.0605311)


In [91]:
if (!is.null(d) && !is.factor(d))
             max(floor(ncol(z)/3), 1) else floor(sqrt(ncol(z)))

In [92]:
if (!is.null(d) && !is.factor(d)) 5 else 1

We conclude that the gun ownership rates are related to gun homicide rates - if gun ownership increases by 1% relative
to a trend then the predicted gun homicide rate goes up by about 0.20% controlling for counties' characteristics.

Finally, let's see which method is actually better. We compute RMSE for predicting D and Y, and see which
of the methods works better.


In [96]:
mods<- list(DML2.ols, DML2.lasso, DML2.post, DML2.lasso.cv, DML2.ridge, DML2.elnet, DML2.RF)

RMSE.mdl<- function(mdl) {
RMSEY <- sqrt(mean(mdl$ytil)^2) 
RMSED <- sqrt(mean(mdl$dtil)^2) 
return( list(RMSEY=RMSEY, RMSED=RMSED))
}

#RMSE.mdl(DML2.lasso)

#DML2.lasso$ytil

Res<- lapply(mods, RMSE.mdl)


prRes.Y<- c( Res[[1]]$RMSEY,Res[[2]]$RMSEY, Res[[3]]$RMSEY, Res[[4]]$RMSEY, Res[[5]]$RMSEY,  Res[[6]]$RMSEY, Res[[7]]$RMSEY)
prRes.D<- c( Res[[1]]$RMSED,Res[[2]]$RMSED, Res[[3]]$RMSED, Res[[4]]$RMSED, Res[[5]]$RMSED, Res[[6]]$RMSED, Res[[7]]$RMSED)

prRes<- rbind(prRes.Y, prRes.D); 
rownames(prRes)<- c("RMSE D", "RMSE Y");
colnames(prRes)<- c("OLS", "Lasso", "Post-Lasso", "CV Lasso", "CV Ridge", "CV Elnet", "RF")
print(prRes,digit=6)

               OLS       Lasso  Post-Lasso    CV Lasso    CV Ridge    CV Elnet
RMSE D 0.000407561 3.25471e-05 1.32656e-04 0.000376929 7.24337e-04 9.66559e-04
RMSE Y 0.000134575 3.35791e-05 6.89649e-05 0.000044933 8.41741e-05 5.80479e-19
               RF
RMSE D 0.01086246
RMSE Y 0.00152755


It looks like the best method for predicting D is Lasso, and the best method for predicting Y is CV Ridge.


In [97]:
dreg <- function(z,d){ rlasso(z,d, post=T) } #ML method= lasso from hdm 
yreg <- function(z,y){ cv.glmnet(z,y,family="gaussian", alpha=0) }  #ML method = ridge from glmnet 
DML2.best= DML2.for.PLM(z, d, y, dreg, yreg, nfold=10, clu)

fold: 1  2  3  4  5  6  7  8  9  10  
coef (se) = 0.222066 (0.0565614)


Let's organize the results in a table.

In [84]:
est_baseline

In [18]:
library(xtable)

table <- matrix(0,9,2)
table[1,1] <- as.numeric(est_baseline[1])
table[2,1] <- as.numeric(est_ols[1])
table[3,1]   <- as.numeric(DML2.lasso$coef.est)
table[4,1]   <- as.numeric(DML2.post$coef.est)
table[5,1]  <-as.numeric(DML2.lasso.cv$coef.est)
table[6,1] <-as.numeric(DML2.elnet$coef.est)
table[7,1] <-as.numeric(DML2.ridge$coef.est)
table[8,1] <-as.numeric(DML2.RF$coef.est)
table[9,1] <-as.numeric(DML2.best$coef.est)
table[1,2] <- as.numeric(est_baseline[2])
table[2,2] <- as.numeric(est_ols[2])
table[3,2]   <- as.numeric(DML2.lasso$se)
table[4,2]   <- as.numeric(DML2.post$se)
table[5,2]  <-as.numeric(DML2.lasso.cv$se)
table[6,2] <-as.numeric(DML2.elnet$se)
table[7,2] <-as.numeric(DML2.ridge$se)
table[8,2] <-as.numeric(DML2.RF$se)
table[9,2] <-as.numeric(DML2.best$se)




################################# Print Results #################################

colnames(table) <- c("Estimate","Standard Error")
rownames(table) <- c("Baseline OLS", "Least Squares with controls", "Lasso", "Post-Lasso", "CV Lasso","CV Elnet", "CV Ridge", "Random Forest", 
                     "Best")

table

Unnamed: 0,Estimate,Standard Error
Baseline OLS,0.2823045,0.0648108
Least Squares with controls,0.1906447,0.05244756
Lasso,0.2228074,0.05702673
Post-Lasso,0.2269338,0.05619181
CV Lasso,0.2004742,0.05764115
CV Elnet,0.206117,0.05746222
CV Ridge,0.2013789,0.05790663
Random Forest,0.1921739,0.05814101
Best,0.2190048,0.05721956


In [19]:
print(table, digit=3)


                            Estimate Standard Error
Baseline OLS                   0.282         0.0648
Least Squares with controls    0.191         0.0524
Lasso                          0.223         0.0570
Post-Lasso                     0.227         0.0562
CV Lasso                       0.200         0.0576
CV Elnet                       0.206         0.0575
CV Ridge                       0.201         0.0579
Random Forest                  0.192         0.0581
Best                           0.219         0.0572


In [20]:
tab<- xtable(table, digits=3)
print(tab, type="latex")

% latex table generated in R 3.6.3 by xtable 1.8-4 package
% Sat Feb 13 17:41:19 2021
\begin{table}[ht]
\centering
\begin{tabular}{rrr}
  \hline
 & Estimate & Standard Error \\ 
  \hline
Baseline OLS & 0.282 & 0.065 \\ 
  Least Squares with controls & 0.191 & 0.052 \\ 
  Lasso & 0.223 & 0.057 \\ 
  Post-Lasso & 0.227 & 0.056 \\ 
  CV Lasso & 0.200 & 0.058 \\ 
  CV Elnet & 0.206 & 0.057 \\ 
  CV Ridge & 0.201 & 0.058 \\ 
  Random Forest & 0.192 & 0.058 \\ 
  Best & 0.219 & 0.057 \\ 
   \hline
\end{tabular}
\end{table}
