This notebook contains an example for teaching.

# Double Lasso for Testing the Convergence Hypothesis

## Introduction

We provide an additional empirical example of partialling-out with Lasso to estimate the regression coefficient $\beta_1$ in the high-dimensional linear regression model:
  $$
  Y = \beta_1 D +  \beta_2'W + \epsilon.
  $$

Specifically, we are interested in how the rates at which the economies of different countries grow ($Y$) are related to each country's initial wealth level ($D$), controlling for the country's institutional, educational, and other similar characteristics ($W$).

The relationship is captured by $\beta_1$, the *speed of convergence/divergence*, which measures the speed at which poor countries catch up with $(\beta_1 < 0)$ or fall behind $(\beta_1 > 0)$ rich countries, after controlling for $W$. Our inference question is: do poor countries grow faster than rich countries, controlling for educational and other characteristics? In other words, is the speed of convergence negative, $\beta_1 < 0$? This is the Convergence Hypothesis predicted by the Solow growth model, a structural economic model. Under some strong assumptions that we do not state here, the predictive exercise we carry out can be given a causal interpretation.
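Because the Convergence Hypothesis $\beta_1 < 0$ is one-sided, a natural check is a one-sided test of $H_0: \beta_1 \ge 0$ against $H_1: \beta_1 < 0$. The following is a minimal sketch, assuming a normal approximation to the sampling distribution of the estimate; `convergence_test` is a hypothetical helper (not part of *hdm*), and the numbers in the usage lines are placeholders, not estimates from the data.

```r
# One-sided test of the Convergence Hypothesis.
# H0: beta1 >= 0  versus  H1: beta1 < 0 (convergence).
# Assumption: the estimate is approximately normal with the given standard
# error; estimate and se would come from OLS or Double Lasso.
convergence_test <- function(estimate, se, alpha = 0.05) {
  z <- estimate / se          # test statistic for beta1 = 0
  p_one_sided <- pnorm(z)     # P(Z <= z): small when the estimate is clearly negative
  list(
    z = z,
    p_value = p_one_sided,
    reject_h0 = p_one_sided < alpha  # TRUE means evidence in favour of convergence
  )
}

# Placeholder numbers for illustration only:
res <- convergence_test(estimate = -0.05, se = 0.02)
res$z        # -2.5
res$p_value  # about 0.0062
```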


The outcome $Y$ is the realized annual growth rate of a country's wealth (Gross Domestic Product per capita). The target regressor ($D$) is the initial level of the country's wealth, and the target parameter $\beta_1$ is the speed of convergence defined above. The controls ($W$) include measures of education levels, quality of institutions, trade openness, and political stability in the country.
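The partialling-out procedure mentioned in the introduction can be written out explicitly. This is a sketch in the notation of the model above; the symbols $\hat\gamma_{YW}$, $\hat\gamma_{DW}$, $\tilde Y$, $\tilde D$ and the sample average $\mathbb{E}_n$ are introduced here for illustration, and an intercept is assumed to be absorbed into $W$:

1. Regress $Y$ on $W$ with Lasso, giving coefficients $\hat\gamma_{YW}$, and keep the residual $\tilde Y$.
2. Regress $D$ on $W$ with Lasso, giving coefficients $\hat\gamma_{DW}$, and keep the residual $\tilde D$.
3. Regress $\tilde Y$ on $\tilde D$ by least squares.

$$
\begin{aligned}
\tilde Y &= Y - \hat\gamma_{YW}'W, \qquad \tilde D = D - \hat\gamma_{DW}'W,\\
\hat\beta_1 &= \frac{\mathbb{E}_n[\tilde D\,\tilde Y]}{\mathbb{E}_n[\tilde D^{2}]}.
\end{aligned}
$$

If OLS replaced Lasso in steps 1 and 2, the Frisch-Waugh-Lovell theorem implies that $\hat\beta_1$ would equal the coefficient on $D$ in the full OLS regression of $Y$ on $D$ and $W$; using Lasso instead keeps the first two steps well behaved when $W$ is high-dimensional.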

## Data analysis


In [1]:
install.packages(c("hdm","xtable"))


We use the data set `GrowthData`, which is included in the package *hdm*.

In [2]:
library(hdm)  # package of "high-dimensional models" (hdm) estimators
growth <- GrowthData
attach(growth)
names(growth)

**Exercise 1:** First, get familiar with the data. Determine the dimensions of our data set and calculate the $p/n$ ratio. Do we have a high-dimensional setting?
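One way to approach Exercise 1 is sketched below. `design_dims` is a hypothetical helper; it assumes the outcome column is called `Outcome` and that a constant column called `intercept` should not be counted as a regressor (check this against `names(growth)`). Here $p$ counts the target regressor plus the controls; subtract one if you define $p$ as the number of controls in $W$ only.

```r
# Hypothetical helper for Exercise 1: sample size n, number of regressors p
# (target plus controls), and the ratio p / n.
# Assumption: outcome column "Outcome"; constant column "intercept" is excluded.
design_dims <- function(df, outcome = "Outcome", constant = "intercept") {
  regressors <- setdiff(names(df), c(outcome, constant))
  n <- nrow(df)
  p <- length(regressors)
  c(n = n, p = p, p_over_n = p / n)
}

# Usage with the notebook's data:
# dim(growth)
# design_dims(growth)
```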

**Exercise 2:** As a first step toward checking the convergence hypothesis, analyze the relationship between the country's growth rate $Y$ and its other characteristics by running an ordinary least squares (OLS) regression of $Y$ on $D$ and $W$. Determine the regression coefficient $\beta_1$ of the target regressor *gdpsh465* (initial wealth level, $D$), its standard error, and its 95% confidence interval.
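A possible sketch for Exercise 2 uses base R's `lm()`, `summary()` and `confint()`. The helper name `ols_target` and the column names `Outcome`, `gdpsh465` and `intercept` are assumptions based on the data description; the constant column is dropped because `lm()` adds its own intercept. The standard error is the conventional (homoskedastic) one reported by `lm()`.

```r
# Sketch for Exercise 2: full OLS regression of the outcome on the target
# regressor and all controls; returns the target coefficient, its standard
# error and confidence interval as a named numeric vector.
# Assumption: GrowthData-style column names (see lead-in).
ols_target <- function(df, outcome = "Outcome", target = "gdpsh465",
                       constant = "intercept", level = 0.95) {
  dat <- df[, setdiff(names(df), constant), drop = FALSE]  # lm() adds its own intercept
  rhs <- setdiff(names(dat), outcome)                      # target + all controls
  fit <- lm(reformulate(rhs, response = outcome), data = dat)
  tab <- coef(summary(fit))
  ci  <- confint(fit, parm = target, level = level)
  c(estimate = tab[target, "Estimate"],
    se       = tab[target, "Std. Error"],
    lower    = ci[1, 1],
    upper    = ci[1, 2])
}

# Usage with the notebook's data:
# ols_res <- ols_target(growth)
```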

**Exercise 3:** Next, use the partialling-out approach based on Lasso regression ("Double Lasso"). Again, determine the regression coefficient $\beta_1$ of the target regressor *gdpsh465* (initial wealth level, $D$), its standard error, and its 95% confidence interval.
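The three partialling-out steps can be sketched as a function that takes the residualizing regression as an argument. Plugging in a Lasso residualizer gives Double Lasso; plugging in OLS reproduces the full OLS coefficient on $D$ (Frisch-Waugh-Lovell), which is a convenient sanity check. The helper names are hypothetical; `lasso_residualizer` assumes *hdm*'s `rlasso()` with its default data-driven penalty and is only evaluated when called. The standard error here is the conventional one from `lm()`; heteroskedasticity-robust errors are a common alternative. *hdm* also provides `rlassoEffect()` with `method = "partialling out"`, which may be preferable in practice; consult its documentation for details.

```r
# Sketch of the partialling-out estimator for Exercise 3.
# residualize(W, v) must return the residuals from regressing v on W.
partial_out <- function(y, d, W, residualize, level = 0.95) {
  ry <- residualize(W, y)   # step 1: remove the part of Y predicted by W
  rd <- residualize(W, d)   # step 2: remove the part of D predicted by W
  fit <- lm(ry ~ rd)        # step 3: regress residual on residual
  tab <- coef(summary(fit))
  ci  <- confint(fit, parm = "rd", level = level)
  c(estimate = tab["rd", "Estimate"],
    se       = tab["rd", "Std. Error"],
    lower    = ci[1, 1],
    upper    = ci[1, 2])
}

# Residualizers: OLS (for checking) and Lasso via hdm (assumed installed).
ols_residualizer   <- function(W, v) resid(lm(v ~ W))
lasso_residualizer <- function(W, v) hdm::rlasso(W, v)$residuals

# Usage with the notebook's data (assumed column layout, see Exercise 1):
# y <- growth$Outcome; d <- growth$gdpsh465
# W <- as.matrix(growth[, setdiff(names(growth), c("Outcome", "intercept", "gdpsh465"))])
# lasso_res <- partial_out(y, d, W, lasso_residualizer)
```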

## Summary


**Exercise 4:** Finally, let us have a look at the results. Compare the results of Exercise 2 and Exercise 3 and interpret your findings.
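To compare the two approaches side by side, a small table helps. The sketch below assumes each result is a named numeric vector with entries `estimate`, `se`, `lower` and `upper`; `compare_estimates`, `ols_res` and `lasso_res` are hypothetical names. It adds the width of each confidence interval and whether the interval excludes zero, two quantities that are useful when interpreting the results.

```r
# Hypothetical helper for Exercise 4: stack named result vectors into one
# table, adding the confidence-interval width and whether it excludes zero.
compare_estimates <- function(...) {
  rows <- list(...)  # argument names become row names
  tab <- as.data.frame(do.call(rbind, lapply(rows, function(r)
    r[c("estimate", "se", "lower", "upper")])))
  tab$ci_width      <- tab$upper - tab$lower
  tab$excludes_zero <- tab$upper < 0 | tab$lower > 0
  tab
}

# Usage, with ols_res and lasso_res holding the numbers from Exercises 2 and 3:
# tab <- compare_estimates(OLS = ols_res, `Double Lasso` = lasso_res)
# xtable::xtable(tab)  # optional LaTeX version of the table
```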