# HIGH-DIMENSIONAL METRICS IN JULIA

## 2. How to get started

In [56]:
import Pkg; Pkg.add("HDMjl")

or, to install directly from the GitHub repository:

In [57]:
import Pkg; Pkg.add(url = "https://github.com/d2cml-ai/HDMjl.jl")

In [102]:
using HDMjl, CodecXz, GLM


## 3. Prediction using Approximate Sparsity

### 3.2. A Joint Significance Test for Lasso Regression.

In [95]:
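using Random
Random.seed!(1234);  # fix the seed for reproducibility
n = 100;             # number of observations
p = 100;             # number of regressors
s = 3;               # number of nonzero coefficients
X = randn(n, p);
beta = vcat(fill(5, s), zeros(p - s));  # first s coefficients equal 5, the rest are zero
Y = X * beta + randn(n);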

In [103]:
rlasso(X, Y, post = false)

Dict{String, Any} with 15 entries:
  "tss"          => 8466.4
  "dev"          => [8.56479, -6.50463, -1.51601, 8.77722, -5.1338, -7.07815, 7…
  "model"        => [0.970656 0.262456 … 1.86802 -0.460151; -0.979218 -0.022244…
  "loadings"     => [1.99904, 1.58734, 1.67152, 1.86208, 1.859, 1.8311, 1.45692…
  "sigma"        => 1.56664
  "lambda0"      => 81.3601
  "lambda"       => [162.642, 129.146, 135.995, 151.499, 151.248, 148.978, 118.…
  "intercept"    => -0.196115
  "iter"         => 16
  "residuals"    => [0.119342, -0.25299, -1.17148, 0.282269, -0.643579, -0.5525…
  "rss"          => 242.981
  "index"        => Bool[1, 1, 1, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, …
  "beta"         => [4.31658, 4.39195, 4.45657, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0…
  "options"      => Dict{String, Any}("intercept"=>true, "post"=>false, "meanx"…
  "coefficients" => [-0.196115, 4.31658, 4.39195, 4.45657, 0.0, 0.0, 0.0, 0.0, …

In [89]:
post_lasso_reg = rlasso(X, Y, post = true)  # now use post-lasso
post_lasso_reg["coefficients"]'

1×101 adjoint(::Vector{Float64}) with eltype Float64:
 -0.00682754  5.00958  4.93178  5.17705  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
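The post-lasso coefficients above are much closer to the true value of 5 than the plain lasso estimates, which are shrunk toward zero. The general idea of post-lasso is to refit ordinary least squares on the selected columns only. The following is a minimal sketch of that idea using only the standard library; it is not HDMjl's internal code. The names `X_sim`, `Y_sim`, `support`, and `coef_post` are hypothetical, and the support is assumed to be the first `s` columns, as the `"index"` entry of the lasso output suggests.

```julia
using Random

Random.seed!(1234)
n, p, s = 100, 100, 3
X_sim = randn(n, p)
beta_true = vcat(fill(5.0, s), zeros(p - s))
Y_sim = X_sim * beta_true + randn(n)

# Assumption: the lasso step selected exactly the first s columns.
support = 1:s

# OLS with an intercept, using only the selected columns.
Z = hcat(ones(n), X_sim[:, support])
coef_selected = Z \ Y_sim

# Embed the refitted values in a vector of length p + 1 (intercept first);
# unselected coefficients stay at zero.
coef_post = zeros(p + 1)
coef_post[1] = coef_selected[1]
coef_post[support .+ 1] .= coef_selected[2:end]
```

Because the refit involves no penalty, the estimates on the selected columns are not shrunk, which is consistent with the difference between the two outputs above.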

## 4. Inference on Target Regression Coefficients

### 4.1. Intuition for the Orthogonality Principle in Linear Models via Partialling Out.
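As a low-dimensional illustration of partialling out, the sketch below checks the Frisch-Waugh-Lovell result on simulated data: the coefficient on the target variable in a full least-squares regression equals the coefficient from regressing the residualized outcome on the residualized target variable. All names (`W`, `d_sim`, `y_sim`, `residualize`) are hypothetical, and plain OLS stands in for the lasso steps that a high-dimensional method would use.

```julia
using LinearAlgebra, Random

Random.seed!(1)
n = 200
W = hcat(ones(n), randn(n, 3))                 # controls, including an intercept
d_sim = W * [0.5, 1.0, -1.0, 0.5] + randn(n)   # target variable depends on the controls
y_sim = 2.0 .* d_sim + W * [1.0, 0.5, 0.5, -0.5] + randn(n)  # true effect of d is 2.0

# Full regression of y on (d, W): the first coefficient is the effect of d.
alpha_full = (hcat(d_sim, W) \ y_sim)[1]

# Partialling out: remove the part of each variable explained by W,
# then regress residual on residual.
residualize(v) = v - W * (W \ v)
y_tilde = residualize(y_sim)
d_tilde = residualize(d_sim)
alpha_fwl = dot(d_tilde, y_tilde) / dot(d_tilde, d_tilde)
```

The two estimates `alpha_full` and `alpha_fwl` agree up to floating-point error.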

### 4.2. Inference: Confidence Intervals and Significance Testing. The function rlassoEffects

### 4.3. Application: the effect of gender on wage

### 4.4. Application: Estimation of the treatment effect in a linear model with many confounding factors

In [92]:
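using CodecXz
using RData
using DataFrames
# Download the growth data set shipped with the R package hdm
url = "https://github.com/cran/hdm/raw/master/data/GrowthData.rda";
GrowthData = load(download(url))["GrowthData"];
y = GrowthData[:, 1];                      # outcome: first column
d = GrowthData[:, 3];                      # treatment variable: third column
X = Matrix(GrowthData[:, Not(1, 2, 3)]);   # controls: all remaining columns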

In [93]:
rlassoEffect(X, y, d, method = "double selection")

Dict{String, Any} with 10 entries:
  "alpha"            => -0.0453558
  "t"                => -2.43116
  "se"               => 0.018656
  "no_select"        => 0
  "coefficients_reg" => [0.2247, -0.0453558, -0.064512, 0.215358, -0.0960046, -…
  "sample_size"      => 90
  "coefficient"      => -0.0453558
  "selection_index"  => Bool[1, 1, 1, 1, 1, 1, 1, 1, 0, 0  …  0, 0, 0, 0, 0, 0,…
  "residuals"        => Dict("v"=>[0.210235, 0.201381, 0.00253076, -0.0671351, …
  "coefficients"     => -0.0453558

In [11]:
rlassoEffect(X, y, d, method = "partialling out")

Dict{String, Any} with 9 entries:
  "alpha"            => -0.05333
  "t"                => -3.722
  "se"               => 0.0143283
  "coefficients_reg" => [-0.0845872, -0.0461165, 0.18946, -0.0299086, 0.0, 0.0,…
  "sample_size"      => 90
  "coefficient"      => -0.05333
  "selection_index"  => Any[true, true, true, true, true, true, true, true, fal…
  "residuals"        => Dict("v"=>[0.181221, 0.116676, 0.0971757, -0.133713, 0.…
  "coefficients"     => -0.05333
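The `"alpha"` and `"se"` entries are enough to reproduce the reported t-statistics and to form approximate confidence intervals. The sketch below copies the numbers from the two outputs above and uses a normal approximation with the quantile 1.96; `normal_ci` is a hypothetical helper, not an HDMjl function.

```julia
# Point estimates and standard errors copied from the two outputs above.
results = Dict(
    "double selection" => (alpha = -0.0453558, se = 0.018656),
    "partialling out"  => (alpha = -0.05333, se = 0.0143283),
)

z975 = 1.96  # approximate 97.5% quantile of the standard normal distribution

# Hypothetical helper: t-statistic and two-sided 95% interval under normality.
function normal_ci(alpha, se; z = z975)
    return (lower = alpha - z * se, upper = alpha + z * se, t = alpha / se)
end

for (method, r) in results
    ci = normal_ci(r.alpha, r.se)
    println(method, ": t = ", round(ci.t, digits = 3),
            ", 95% CI = [", round(ci.lower, digits = 4), ", ",
            round(ci.upper, digits = 4), "]")
end
```

Under this approximation both intervals lie entirely below zero, in line with the reported t-statistics.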

## 5. Instrumental Variable Estimation in a High-Dimensional Setting

### 5.2. Application: Economic Development and Institutions.

### 5.3. Application: Impact of Eminent Domain Decisions on Economic Outcomes.

In [43]:
using Statistics
url = "https://github.com/cran/hdm/raw/master/data/EminentDomain.rda";
EminentDomain = load(download(url))["EminentDomain"];
# Instruments z, controls x, endogenous variable d, and outcome y for the logGDP specification
z = EminentDomain["logGDP"]["z"];
x = EminentDomain["logGDP"]["x"];
d = EminentDomain["logGDP"]["d"];
y = EminentDomain["logGDP"]["y"];
# Keep only the columns whose mean exceeds 0.05
x = x[:, (mean(x, dims = 1) .> 0.05)'];
z = z[:, (mean(z, dims = 1) .> 0.05)'];
rlassoIV(x, d, y, z)

Dict{String, Any} with 5 entries:
  "se"           => [0.0801865]
  "sample_size"  => 312
  "vcov"         => [0.00642988;;]
  "residuals"    => [-0.111753; 0.0588269; … ; 0.218765; 0.301602;;]
  "coefficients" => [-0.0449578;;]
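For intuition on why an instrument is needed at all, the following sketch simulates a single-instrument setting with an unobserved confounder and compares the OLS slope with the simple IV (Wald) ratio. It is a textbook illustration with hypothetical names (`z_sim`, `u`, `d_iv`, `y_iv`) and does not reproduce the high-dimensional selection performed by `rlassoIV`.

```julia
using Random, Statistics

Random.seed!(42)
n_iv = 20_000
z_sim = randn(n_iv)                            # instrument, independent of u
u = randn(n_iv)                                # unobserved confounder
d_iv = z_sim + u + randn(n_iv)                 # endogenous variable
y_iv = 1.0 .* d_iv + 2.0 .* u + randn(n_iv)    # true effect of d is 1.0

# OLS slope: biased upward here because d is correlated with u.
alpha_ols = cov(d_iv, y_iv) / var(d_iv)

# IV slope with a single instrument: cov(z, y) / cov(z, d).
alpha_iv = cov(z_sim, y_iv) / cov(z_sim, d_iv)
```

In this design the OLS slope converges to about 1.67 rather than 1.0, while the IV ratio is consistent for the true effect.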

## 6. Inference on Treatment Effects in a High-Dimensional Setting

### 6.3. Application: 401(k) plan participation.

In [18]:
url = "https://github.com/cran/hdm/raw/master/data/pension.rda";
pension = load(download(url))["pension"];
y = pension[:, "tw"];     # outcome: total wealth
d = pension[:, "p401"];   # treatment: 401(k) participation
z = pension[:, "e401"];   # instrument: 401(k) eligibility
# Control variables
X = Matrix(pension[:, ["i2", "i3", "i4", "i5", "i6", "i7",
                       "a2", "a3", "a4", "a5",
                       "fsize", "hs", "smcol", "col", "marr",
                       "twoearn", "db", "pira", "hown"]]);
rlassoATE(X, d, y)

Dict{String, Any} with 5 entries:
  "se"          => 1930.68
  "individual"  => [-30618.3, -57537.6, -71442.9, 21383.3, -2.32925e5, 3.40765e…
  "sample_size" => 9915
  "te"          => 10180.1
  "type"        => "ATE"

In [51]:
rlassoATET(X, d, y)

Dict{String, Any} with 5 entries:
  "se"          => 2944.43
  "individual"  => [-21536.4, -52877.2, -1.44867e5, -2739.29, -307741.0, 7.3912…
  "sample_size" => 9915
  "te"          => 12628.5
  "type"        => "ATET"

In [52]:
rlassoLATE(X, d, y, z)

Dict{String, Any} with 5 entries:
  "se"          => 2326.9
  "individual"  => [-50526.8, -1.39158e5, -1.37102e5, 38508.0, -6.5644e5, 7.943…
  "sample_size" => 9915
  "te"          => 12992.1
  "type"        => "LATE"

In [53]:
rlassoLATET(X, d, y, z)

Dict{String, Any} with 5 entries:
  "se"          => 3645.28
  "individual"  => [-35580.5, -90558.0, -1.83628e5, -5303.13, -8.0766e5, 1.8866…
  "sample_size" => 9915
  "te"          => 15323.2
  "type"        => "LATET"