# HIGH-DIMENSIONAL METRICS IN R

## 2. How to get started


In [445]:
#import Pkg; Pkg.add(url = "https://github.com/d2cml-ai/HDMjl.jl")

In [1]:
using CodecXz, RData, DataFrames, StatsModels, Statistics, Distributions, PrettyTables

In [1]:
import Pkg

In [88]:
include("E:/causal_ml/hdm_paper/prueba/HDMjl.jl/src/HDMjl.jl")



Main.HDMjl

In [444]:
#Pkg.develop(path = "E:/causal_ml/hdm_paper/HDMjl.jl")

## 3. Prediction using Approximate Sparsity

### 3.2. A Joint Significance Test for Lasso Regression.

In [3]:
using Random
Random.seed!(1234);
n = 100;
p = 100;
s = 3;
X = randn(n, p);
beta = vcat(fill(5, s), zeros(p - s));
Y = X * beta + randn(n);

In [4]:
lasso_reg = HDMjl.rlasso(X, Y, post = false);

In [5]:
post_lasso_reg = HDMjl.rlasso(X, Y, post = true) #now use post-lasso
post_lasso_reg["coefficients"]'

1×101 adjoint(::Vector{Float64}) with eltype Float64:
 -0.00682754  5.00958  4.93178  5.17705  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0

## 4. Inference on Target Regression Coefficients

### 4.1. Intuition for the Orthogonality Principle in Linear Models via Partialling Out.

In [1]:
using CSV
using DataFrames

### 4.2. Inference: Confidence Intervals and Significance Testing. The function rlassoEffects

In [1092]:
using Random, Distributions, PrettyTables, DataFrames
Random.seed!(1234);
n = 100;
p = 100;
s = 3;
x = randn(n, p);
beta = vcat(fill(3, s), zeros(p - s));
y = 1 .+ x * beta + randn(n);

In [1093]:
lasso_effects = HDMjl.rlassoEffects(x, y, index = [1,2,3,50]);

In [1094]:
HDMjl.r_print(lasso_effects)

Coefficients:

     X1          X2          X3          X50

    2.925       2.903       3.101      -0.227


In [1096]:
HDMjl.r_summary(lasso_effects)

Estimates and significance testing of the effect of target variables

        Estimate.   Std. Error    t value       Pr(>|t|)

   X1     2.92541     0.103597    28.2384   1.97828e-175
   X2     2.90258     0.105907     27.407   2.26599e-165
   X3     3.10095     0.110626     28.031   6.80893e-173
  X50    -0.22712    0.0910927   -2.49329      0.0126566
---
Signif. codes:
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
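
The columns of the summary above are linked by simple arithmetic: the t value is the estimate divided by its standard error, and a 95% interval under a normal approximation is the estimate plus or minus about 1.96 standard errors. The sketch below recomputes both from the printed numbers. That `r_confint` uses the standard normal quantile is an assumption, although the arithmetic reproduces its printed bounds to the displayed precision. The quantile is hard-coded so that the block needs no packages.

```julia
# Estimates and standard errors copied from the r_summary output above
names_ = ["X1", "X2", "X3", "X50"]
est = [2.92541, 2.90258, 3.10095, -0.22712]
se  = [0.103597, 0.105907, 0.110626, 0.0910927]

# 97.5% quantile of the standard normal distribution (hard-coded assumption;
# with Distributions.jl this would be quantile(Normal(), 0.975))
z975 = 1.959964

tval  = est ./ se            # t statistics
lower = est .- z975 .* se    # lower 95% bounds
upper = est .+ z975 .* se    # upper 95% bounds

for i in eachindex(est)
    println(rpad(names_[i], 4), " t = ", round(tval[i], digits = 3),
            "  CI = [", round(lower[i], digits = 5), ", ", round(upper[i], digits = 5), "]")
end
```

The interval for X50 excludes zero even though its true coefficient in the simulation is zero, which is consistent with its p-value of about 0.013 in the table.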


In [1097]:
HDMjl.r_confint(lasso_effects)

             2.5%       97.5%

   X1     2.72236     3.12845
   X2     2.69501     3.11016
   X3     2.88413     3.31778
  X50   -0.405659   -0.048582


### 4.3. Application: the effect of gender on wage

### 4.4. Application: Estimation of the treatment effect in a linear model with many confounding factors

In [1084]:
include("E:/causal_ml/hdm_paper/prueba/HDMjl.jl/src/HDMjl.jl")



Main.HDMjl

In [1085]:
using CodecXz
using RData
using DataFrames
url = "https://github.com/cran/hdm/raw/master/data/GrowthData.rda";
GrowthData = load(download(url))["GrowthData"];
y = GrowthData[:, 1];
d = GrowthData[:, 3];
X = Matrix(GrowthData[:, Not(1, 2, 3)]);

In [1086]:
lasso_effect = HDMjl.rlassoEffect(X, y, d, method = "partialling out");

In [1087]:
HDMjl.r_print(lasso_effect)

Coefficients:

       1

  -0.053


In [1088]:
HDMjl.r_summary(lasso_effect)

Estimates and significance testing of the effect of target variables
  Row   Estimate.   Std. Error   t value      Pr(>|t|)

    1    -0.05333    0.0143283    -3.722   0.000197655
---
Signif. codes:
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1



In [1089]:
doublesel_effect = HDMjl.rlassoEffect(X, y, d, method = "double selection");

In [1090]:
HDMjl.r_print(doublesel_effect)

Coefficients:

       1

  -0.045


In [1091]:
HDMjl.r_summary(doublesel_effect);

Estimates and significance testing of the effect of target variables
  Row    Estimate.   Std. Error    t value    Pr(>|t|)

    1   -0.0453558     0.018656   -2.43116   0.0150506
---
Signif. codes:
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


## 5. Instrumental Variable Estimation in a High-Dimensional Setting

### 5.2. Application: Economic Development and Institutions.

In [1066]:
include("E:/causal_ml/hdm_paper/prueba/HDMjl.jl/src/HDMjl.jl")



Main.HDMjl

In [1101]:
using Statistics, StatsModels
url = "https://github.com/cran/hdm/raw/master/data/AJR.rda";
AJR = load(download(url))["AJR"];
y = AJR[!,"GDP"]
d = AJR[!,"Exprop"]
z = AJR[!,"logMort"];
x_formula = @formula(GDP ~ -1 + Latitude + Latitude2 + Africa + Asia + Namer + Samer
    + Latitude*Latitude2 + Latitude*Africa + Latitude*Asia + Latitude*Namer + Latitude*Samer
    + Latitude2*Africa + Latitude2*Asia + Latitude2*Namer + Latitude2*Samer
    + Africa*Asia + Africa*Namer + Africa*Samer
    + Asia*Namer + Asia*Samer
    + Namer*Samer)
x_dframe = ModelFrame( x_formula, AJR)
x1 = ModelMatrix(x_dframe)
x = x1.m
size(x)

(64, 21)

In [1102]:
AJR_Xselect  = HDMjl.rlassoIV(x, d, y, z, select_X=false, select_Z=false);

In [1103]:
HDMjl.r_print(AJR_Xselect)

      d1       intercept        x1           x2           x3           x4           x5           x6           x7           x8

    1.267       -60.876      326.819      -429.951      60.911       59.749       63.264       61.148      -14.248      -329.563

      x9          x10          x11          x12          x13          x14          x15          x16          x17          x18

   -330.137     -340.404     -331.822     442.238      452.631      448.365      448.799        0.0          0.0          0.0

     x19          x20          x21

     0.0          0.0          0.0


In [1104]:
HDMjl.r_summary(AJR_Xselect);

Estimates and Significance Testing of the effect of target variables in the IV regression model
                coeff.         se.       t-value    p-value

         d1    1.26721     6341.02   0.000199843   0.999841
  intercept   -60.8759   1.26403e6   -4.81602e-5   0.999962
         x1    326.819   6.69943e6    4.87831e-5   0.999961
         x2   -429.951   9.15873e6   -4.69444e-5   0.999963
         x3    60.9109   1.22669e6    4.96545e-5    0.99996
         x4    59.7488   1.21222e6    4.92887e-5   0.999961
         x5    63.2635    1.2256e6    5.16182e-5   0.999959
         x6     61.148   1.22139e6    5.00644e-5    0.99996
         x7   -14.2483   4.30739e5   -3.30787e-5   0.999974
         x8   -329.563   6.63294e6   -4.96858e-5    0.99996
         x9   -330.137   6.61657e6   -4.98955e-5    0.99996
        x10   -340.404   6.63192e6   -5.13281e-5   0.999959
        x11   -331.822   6.66048e6   -4.98196e-5    0.99996
        x12    442.238

In [1105]:
HDMjl.r_confint(AJR_Xselect);

                    2.5%       97.5%

         d1     -12426.9     12429.4
  intercept   -2.47751e6   2.47739e6
         x1   -1.31303e7    1.3131e7
         x2   -1.79512e7   1.79503e7
         x3   -2.40421e6   2.40433e6
         x4   -2.37585e6   2.37597e6
         x5   -2.40208e6    2.4022e6
         x6   -2.39381e6   2.39394e6
         x7    -844248.0   8.44219e5
         x8   -1.30007e7       1.3e7
         x9   -1.29686e7   1.29679e7
        x10   -1.29987e7    1.2998e7
        x11   -1.30546e7    1.3054e7
        x12   -1.72152e7   1.72161e7
        x13   -1.74247e7   1.74256e7
        x14   -1.71117e7   1.71126e7
        x15   -1.74433e7   1.74442e7
        x16          0.0         0.0
        x17          0.0         0.0
        x18          0.0         0.0
        x19          0.0         0.0
        x20          0.0         0.0
        x21          0.0         0.0


### 5.3. Application: Impact of Eminent Domain Decisions on Economic Outcomes.

In [179]:
using Crayons, Distributions

In [1056]:
include("E:/causal_ml/hdm_paper/prueba/HDMjl.jl/src/HDMjl.jl")



Main.HDMjl

In [1057]:
using Statistics, GLM
url = "https://github.com/cran/hdm/raw/master/data/EminentDomain.rda";
EminentDomain = load(download(url))["EminentDomain"];
z = EminentDomain["logGDP"]["z"];
x = EminentDomain["logGDP"]["x"];
d = EminentDomain["logGDP"]["d"];
y = EminentDomain["logGDP"]["y"];
x = x[:, (mean(x, dims = 1) .> 0.05)'];
z = z[:, (mean(z, dims = 1) .> 0.05)'];

In [1058]:
lasso_IV_Z = HDMjl.rlassoIV(x, d, y, z, select_X = false, select_Z = true);

In [1059]:
HDMjl.r_summary(lasso_IV_Z);

Estimates and Significance Testing of the effect of target variables in the IV regression model
           coeff.         se.     t-value    p-value

  d1   -0.0122757   0.0376474   -0.326069   0.744372
---
Signif. codes:
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


In [1060]:
HDMjl.r_confint(lasso_IV_Z);

             2.5%       97.5%

  d1   -0.0860632   0.0615119


In [1061]:
HDMjl.r_print(lasso_IV_Z)

Coefficients:

      d1

    -0.012


In [1062]:
lasso_IV_XZ = HDMjl.rlassoIV(x, d, y, z, select_X = true, select_Z = true);

In [1063]:
HDMjl.r_summary(lasso_IV_XZ);

Estimates and Significance Testing of the effect of target variables in the IV regression model
           coeff.         se.     t-value    p-value

  d1   -0.0449578   0.0801865   -0.560665   0.575026
---
Signif. codes:
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


In [1064]:
HDMjl.r_confint(lasso_IV_XZ);

            2.5%      97.5%

  d1   -0.202121   0.112205


In [1065]:
HDMjl.r_print(lasso_IV_XZ)

Coefficients:

      d1

    -0.045


## 6. Inference on Treatment Effects in a High-Dimensional Setting

### 6.3. Application: 401(k) plan participation.

In [26]:
url = "https://github.com/cran/hdm/raw/master/data/pension.rda";
pension = load(download(url))["pension"];
y = pension[:, "tw"];
d = pension[:, "p401"];
z = pension[:, "e401"];
X = Matrix(pension[:, ["i2", "i3", "i4", "i5", "i6", "i7", "a2", "a3", "a4", "a5", "fsize", "hs", "smcol", "col", "marr", "twoearn", "db", "pira", "hown"]]);
HDMjl.rlassoATE(X, d, y)

Dict{String, Any} with 5 entries:
  "se"          => 1930.68
  "individual"  => [-30618.3, -57537.6, -71442.9, 21383.3, -2.32925e5, 3.40765e…
  "sample_size" => 9915
  "te"          => 10180.1
  "type"        => "ATE"

In [51]:
HDMjl.rlassoATET(X, d, y)

Dict{String, Any} with 5 entries:
  "se"          => 2944.43
  "individual"  => [-21536.4, -52877.2, -1.44867e5, -2739.29, -307741.0, 7.3912…
  "sample_size" => 9915
  "te"          => 12628.5
  "type"        => "ATET"

In [52]:
HDMjl.rlassoLATE(X, d, y, z)

Dict{String, Any} with 5 entries:
  "se"          => 2326.9
  "individual"  => [-50526.8, -1.39158e5, -1.37102e5, 38508.0, -6.5644e5, 7.943…
  "sample_size" => 9915
  "te"          => 12992.1
  "type"        => "LATE"

In [53]:
HDMjl.rlassoLATET(X, d, y, z)

Dict{String, Any} with 5 entries:
  "se"          => 3645.28
  "individual"  => [-35580.5, -90558.0, -1.83628e5, -5303.13, -8.0766e5, 1.8866…
  "sample_size" => 9915
  "te"          => 15323.2
  "type"        => "LATET"