In [None]:
# Julia


## Introduction

In [None]:
In labor economics, an important question is what determines the wages of workers. This is a causal question, but we can begin to investigate it from a predictive perspective.

In the following wage example, $Y$ is the hourly wage of a worker and $X$ is a vector of the worker's characteristics, e.g., education, experience, gender.
Two main questions here are:

* How to use job-relevant characteristics, such as education and experience, to best predict wages?

* What is the difference in predicted wages between men and women with the same job-relevant characteristics?

In this lab, we focus on the prediction question first.

In [None]:
## Data

In [None]:
The data set we consider is from the March Supplement of the U.S. Current Population Survey, year 2015. We select white non-Hispanic individuals, aged 25 to 64 years, who work more than 35 hours per week during at least 50 weeks of the year. We exclude self-employed workers; individuals living in group quarters; individuals in the military, agricultural or private household sectors; individuals with inconsistent reports on earnings and employment status; individuals with allocated or missing information in any of the variables used in the analysis; and individuals with an hourly wage below $3$ dollars.

The variable of interest $Y$ is the hourly wage rate constructed as the ratio of the annual earnings to the total number of hours worked, which is constructed in turn as the product of number of weeks worked and the usual number of hours worked per week. In our analysis, we also focus on single (never married) workers. The final sample is of size $n = 5150$.
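
As a concrete illustration of this construction, the sketch below computes an hourly wage from made-up annual figures and takes its natural logarithm. The function name `hourly_wage` and the inputs are hypothetical and are not taken from the CPS files; the inputs were chosen so that the result lines up with the rows of the data set that show `wage = 19.2308` and `lwage = 2.95651`.

```julia
# Hourly wage = annual earnings / (weeks worked * usual hours per week).
# `hourly_wage` and the numbers below are illustrative assumptions.
hourly_wage(annual_earnings, weeks_worked, hours_per_week) =
    annual_earnings / (weeks_worked * hours_per_week)

wage = hourly_wage(40_000.0, 52, 40)   # 40000 / 2080
lwage = log(wage)                      # natural log of the hourly wage

println("wage = ", round(wage, digits = 4), ", lwage = ", round(lwage, digits = 5))
```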

In [None]:
## Data Analysis

In [1]:

using CSV
using DataFrames
using Dates
using Plots

In [2]:
# Read the CSV file into a DataFrame.
# We force occ, occ2, ind and ind2 to be parsed as Float64.
data = CSV.File("data/wage2015_subsample_inference.csv"; types = Dict("occ" => Float64,"occ2"=> Float64,"ind"=>Float64,"ind2"=>Float64)) |> DataFrame
println("Number of Rows : ", size(data, 1), "\n", "Number of Columns : ", size(data, 2))

Number of Rows : 5150
Number of Columns : 21


In [6]:
[eltype(col) for col = eachcol(data)]

21-element Vector{DataType}:
 Int64
 Float64
 Float64
 Float64
 Float64
 Float64
 Float64
 Float64
 Float64
 Float64
 Float64
 Float64
 Float64
 Float64
 Float64
 Float64
 Float64
 String
 String
 String
 String

In [7]:
first(data,10)

Row,rownames,wage,lwage,sex,shs,hsg,scl,clg,ad
,Int64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64
1,10,9.61538,2.26336,1.0,0.0,0.0,0.0,1.0,0.0
2,12,48.0769,3.8728,0.0,0.0,0.0,0.0,1.0,0.0
3,15,11.0577,2.40313,0.0,0.0,1.0,0.0,0.0,0.0
4,18,13.9423,2.63493,1.0,0.0,0.0,0.0,0.0,1.0
5,19,28.8462,3.36198,1.0,0.0,0.0,0.0,1.0,0.0
6,30,11.7308,2.46222,1.0,0.0,0.0,0.0,1.0,0.0
7,43,19.2308,2.95651,1.0,0.0,1.0,0.0,0.0,0.0
8,44,19.2308,2.95651,0.0,0.0,1.0,0.0,0.0,0.0
9,47,12.0,2.48491,1.0,0.0,1.0,0.0,0.0,0.0
10,71,19.2308,2.95651,1.0,0.0,0.0,0.0,1.0,0.0


In [8]:
describe(data)

Row,variable,mean,min,median,max,nunique,nmissing,eltype
,Symbol,Union…,Any,Union…,Any,Union…,Nothing,DataType
1,rownames,15636.3,10.0,15260.0,32643.0,,,Int64
2,wage,23.4104,3.02198,19.2308,528.846,,,Float64
3,lwage,2.97079,1.10591,2.95651,6.2707,,,Float64
4,sex,0.444466,0.0,0.0,1.0,,,Float64
5,shs,0.023301,0.0,0.0,1.0,,,Float64
6,hsg,0.243883,0.0,0.0,1.0,,,Float64
7,scl,0.278058,0.0,0.0,1.0,,,Float64
8,clg,0.31767,0.0,0.0,1.0,,,Float64
9,ad,0.137087,0.0,0.0,1.0,,,Float64
10,mw,0.259612,0.0,0.0,1.0,,,Float64


In [5]:
n = size(data)[1]
z = select(data, Not([:rownames, :lwage, :wage]))
p = size(z)[2] 

println("Number of observations : ", n, "\n","Number of raw regressors:", p )

Number of observations : 5150
Number of raw regressors:18


In [10]:
z_subset = select(data, ["lwage","sex","shs","hsg","scl","clg","ad","mw","so","we","ne","exp1"])
describe(z_subset, :mean)

Row,variable,mean
,Symbol,Float64
1,lwage,2.97079
2,sex,0.444466
3,shs,0.023301
4,hsg,0.243883
5,scl,0.278058
6,clg,0.31767
7,ad,0.137087
8,mw,0.259612
9,so,0.296505
10,we,0.216117


In [None]:
## Prediction Question

In [None]:
Now, we will construct a prediction rule for hourly wage $Y$, which depends linearly on job-relevant characteristics $X$:

$$Y = \beta' X + \epsilon$$
 
Our goals are

* Predict wages using various characteristics of workers.

* Assess the predictive performance using the (adjusted) sample MSE, the (adjusted) sample $R^2$ and the out-of-sample $MSE$ and $R^2$.
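
The performance measures listed above can be sketched as follows. This is a minimal illustration on made-up numbers, not the notebook's own evaluation code: the helper names (`sample_mse`, `sample_r2`, `adjusted_mse`, `adjusted_r2`) are hypothetical, `p` is assumed to count all estimated coefficients including the intercept, and the adjusted $R^2$ uses the standard textbook degrees-of-freedom correction. The out-of-sample versions are the same formulas applied to observations that were not used for fitting.

```julia
using Statistics

# In-sample measures (y: observed outcomes, yhat: fitted values).
sample_mse(y, yhat) = mean((y .- yhat) .^ 2)
sample_r2(y, yhat) = 1 - sum((y .- yhat) .^ 2) / sum((y .- mean(y)) .^ 2)

# Degrees-of-freedom adjusted versions; p = number of estimated coefficients (assumption).
adjusted_mse(y, yhat, p) = length(y) / (length(y) - p) * sample_mse(y, yhat)
adjusted_r2(y, yhat, p) = 1 - (1 - sample_r2(y, yhat)) * (length(y) - 1) / (length(y) - p)

# Toy training data and fitted values (made up for illustration).
y_train = [1.0, 2.0, 3.0, 4.0]
yhat_train = [1.5, 1.5, 3.5, 3.5]

# Toy held-out data and predictions: the same formulas give out-of-sample measures.
y_test = [2.0, 5.0]
yhat_test = [2.5, 4.0]

println("in-sample MSE      = ", sample_mse(y_train, yhat_train))
println("adjusted MSE (p=2) = ", adjusted_mse(y_train, yhat_train, 2))
println("in-sample R2       = ", sample_r2(y_train, yhat_train))
println("adjusted R2 (p=2)  = ", adjusted_r2(y_train, yhat_train, 2))
println("out-of-sample MSE  = ", sample_mse(y_test, yhat_test))
println("out-of-sample R2   = ", sample_r2(y_test, yhat_test))
```

The adjustment matters most when `p` is large relative to the sample size, which is the situation the flexible specification moves toward.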

We employ two different specifications for prediction:

- **Basic Model**: $X$ consists of a set of raw regressors (e.g. gender, experience, education indicators, occupation and industry indicators, regional indicators).

- **Flexible Model**: $X$ consists of all raw regressors from the basic model plus transformations of experience (e.g., $exp2$ and $exp3$) and additional two-way interactions of a polynomial in experience with other regressors. An example of a regressor created through a two-way interaction is experience times the indicator of having a college degree.

Using the **Flexible Model** enables us to approximate the real relationship by a more complex regression model and therefore to reduce the bias. The **Flexible Model** increases the range of potential shapes of the estimated regression function. In general, flexible models often deliver good prediction accuracy but are harder to interpret.
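
To make the idea of transformations and two-way interactions concrete, here is a small sketch that builds them by hand on made-up data. The vectors and matrix names are hypothetical, and the raw square of experience is used only for illustration; it is not claimed to match how `exp2` is scaled in the data set.

```julia
# Toy data (made up): years of experience and a college-degree indicator.
exp1 = [7.0, 31.0, 18.0, 25.0]
clg  = [1.0, 1.0, 0.0, 0.0]

# A polynomial transformation of experience.
exp_sq = exp1 .^ 2

# A two-way interaction: experience times the college indicator.
exp1_x_clg = exp1 .* clg

# Design matrices: intercept + raw regressors, then the extra flexible columns.
X_basic = hcat(ones(length(exp1)), exp1, clg)
X_flex  = hcat(X_basic, exp_sq, exp1_x_clg)

println("basic columns: ", size(X_basic, 2), ", flexible columns: ", size(X_flex, 2))
```

The interaction column is zero for workers without a college degree, so its coefficient lets the experience profile differ between the two groups.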

Now, let us fit both models to our data by running ordinary least squares (OLS):

In [8]:

#Pkg.add("Plots")
#Pkg.add("Lathe")
#Pkg.add("GLM")
#Pkg.add("StatsPlots")
#Pkg.add("MLBase")
#Pkg.add("StatsModels")
#Pkg.add("Combinatorics")
# Load the installed packages
using DataFrames
using CSV
using Plots
using Lathe
using GLM
using Statistics
using StatsPlots
using MLBase
using StatsModels
using Combinatorics

### basic model
basic  = @formula(lwage ~ (sex + exp1 + shs + hsg + mw + so + we + occ2+ ind2))
basic_results  = lm(basic, data)

In [34]:
# all combinations of the elements of x, of size 1 up to n
combinations_upto(x, n) = Iterators.flatten(combinations(x, i) for i in 1:n)
# expand (a + b + ...)^n into main effects and interactions up to order n,
# without interacting a variable with itself
expand_exp(args, deg::ConstantTerm) =
    tuple(((&)(terms...) for terms in combinations_upto(args, deg.n))...)

StatsModels.apply_schema(t::FunctionTerm{typeof(^)}, sch::StatsModels.Schema, ctx::Type) =
    apply_schema.(expand_exp(t.args_parsed...), Ref(sch), ctx)
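
To see what the `^2` expansion defined above produces, the following sketch enumerates the same kind of terms by hand with plain comprehensions instead of `Combinatorics`. The names `vars`, `var_pairs` and `term_names` are introduced here for illustration only. For the 11 variables used in the flexible formula it yields 11 main effects and 55 pairwise interactions, with no variable interacted with itself.

```julia
vars = ["exp1", "exp2", "exp3", "exp4", "shs", "hsg", "occ2", "ind2", "mw", "so", "we"]

# All unordered pairs (i < j), so a variable is never paired with itself.
var_pairs = [(vars[i], vars[j]) for i in 1:length(vars) for j in (i + 1):length(vars)]

# Main effects followed by two-way interactions, written in formula notation.
term_names = vcat(vars, ["$a & $b" for (a, b) in var_pairs])

println(length(vars), " main effects + ", length(var_pairs),
        " interactions = ", length(term_names), " terms")
```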


In [35]:
flex_model = @formula(lwage ~ sex + (exp1+exp2+exp3+exp4+shs+hsg+occ2+ind2+mw+so+we)^2)
flex_model = apply_schema(flex_model, schema(flex_model, data))

flex_results = lm(flex_model, data)

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}

lwage ~ 1 + sex + exp1 + exp2 + exp3 + exp4 + shs + hsg + occ2 + ind2 + mw + so + we + exp1 & exp2 + exp1 & exp3 + exp1 & exp4 + exp1 & shs + exp1 & hsg + exp1 & occ2 + exp1 & ind2 + exp1 & mw + exp1 & so + exp1 & we + exp2 & exp3 + exp2 & exp4 + exp2 & shs + exp2 & hsg + exp2 & occ2 + exp2 & ind2 + exp2 & mw + exp2 & so + exp2 & we + exp3 & exp4 + exp3 & shs + exp3 & hsg + exp3 & occ2 + exp3 & ind2 + exp3 & mw + exp3 & so + exp3 & we + exp4 & shs + exp4 & hsg + exp4 & occ2 + exp4 & ind2 + exp4 & mw + exp4 & so + exp4 & we + shs & hsg + shs & occ2 + shs & ind2 + shs & mw + shs & so + shs & we + hsg & occ2 + hsg & ind2 + hsg & mw + hsg & so + hsg & we + occ2 & ind2 + occ2 & mw + occ2 & so + occ2 & we + ind2 & mw + ind2 & so + ind2 & we + mw & so + mw & we + so & we

Coefficients:
────────────────────────────────

In [36]:
flexi = @formula(sex ~ (exp1+exp2+exp3+exp4 +shs+hsg+occ2+ind2+mw+so+we)^2)
flexi = apply_schema(flexi, schema(flexi, data))

flexi_results = lm(flexi, data)

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}

sex ~ 1 + exp1 + exp2 + exp3 + exp4 + shs + hsg + occ2 + ind2 + mw + so + we + exp1 & exp2 + exp1 & exp3 + exp1 & exp4 + exp1 & shs + exp1 & hsg + exp1 & occ2 + exp1 & ind2 + exp1 & mw + exp1 & so + exp1 & we + exp2 & exp3 + exp2 & exp4 + exp2 & shs + exp2 & hsg + exp2 & occ2 + exp2 & ind2 + exp2 & mw + exp2 & so + exp2 & we + exp3 & exp4 + exp3 & shs + exp3 & hsg + exp3 & occ2 + exp3 & ind2 + exp3 & mw + exp3 & so + exp3 & we + exp4 & shs + exp4 & hsg + exp4 & occ2 + exp4 & ind2 + exp4 & mw + exp4 & so + exp4 & we + shs & hsg + shs & occ2 + shs & ind2 + shs & mw + shs & so + shs & we + hsg & occ2 + hsg & ind2 + hsg & mw + hsg & so + hsg & we + occ2 & ind2 + occ2 & mw + occ2 & so + occ2 & we + ind2 & mw + ind2 & so + ind2 & we + mw & so + mw & we + so & we

Coefficients:
────────────────────────────────────────

In [None]:
## Lasso

In [26]:

using Lasso

   Resolving package versions...
  No Changes to `C:\Users\Carol\.julia\environments\v1.7\Project.toml`
  No Changes to `C:\Users\Carol\.julia\environments\v1.7\Manifest.toml`


In [41]:
flex_model = @formula(lwage ~ sex + (exp1+exp2+exp3+exp4+shs+hsg+occ2+ind2+mw+so+we)^2)
flex_model = apply_schema(flex_model, schema(flex_model, data))
lasso_model = fit(LassoModel, flex_model, data, standardize=false)

flexi = @formula(sex ~ (exp1+exp2+exp3+exp4 +shs+hsg+occ2+ind2+mw+so+we)^2)
flexi = apply_schema(flexi, schema(flexi, data))
lasso_model = fit(LassoModel, flexi, data, standardize=false)



│ To include a constant predicator set standardize = false and intercept = false
└ @ Lasso C:\Users\Carol\.julia\packages\Lasso\H8WCl\src\Lasso.jl:360


LoadError: ArgumentError: start and stop must be finite, got NaN and NaN

### Partialling Out using hdmjl.jl

In [7]:
import Pkg
Pkg.add("Distributions")
Pkg.add("DataStructures")
Pkg.add("NamedArrays")
Pkg.add("PrettyTables")
Pkg.add("CodecBzip2")

    Updating registry at `C:\Users\Kenia\.julia\registries\General.toml`
   Resolving package versions...
    Updating `C:\Users\Kenia\.julia\environments\v1.7\Project.toml`
  [31c24e10] + Distributions v0.24.18
  No Changes to `C:\Users\Kenia\.julia\environments\v1.7\Manifest.toml`
   Resolving package versions...
  No Changes to `C:\Users\Kenia\.julia\environments\v1.7\Project.toml`
  No Changes to `C:\Users\Kenia\.julia\environments\v1.7\Manifest.toml`
   Resolving package versions...
  No Changes to `C:\Users\Kenia\.julia\environments\v1.7\Project.toml`
  No Changes to `C:\Users\Kenia\.julia\environments\v1.7\Manifest.toml`
   Resolving package versions...
  No Changes to `C:\Users\Kenia\.julia\environments\v1.7\Project.toml`
  N

In [8]:
Pkg.add("TableOperations")

   Resolving package versions...
    Updating `C:\Users\Kenia\.julia\environments\v1.7\Project.toml`
  [ab02a1b2] + TableOperations v1.2.0
  No Changes to `C:\Users\Kenia\.julia\environments\v1.7\Manifest.toml`


In [9]:
Pkg.add("StatsBase")

   Resolving package versions...
    Updating `C:\Users\Kenia\.julia\environments\v1.7\Project.toml`
  [2913bbd2] + StatsBase v0.33.16
  No Changes to `C:\Users\Kenia\.julia\environments\v1.7\Manifest.toml`


In [10]:
Pkg.add("FreqTables")

   Resolving package versions...
   Installed FreqTables ─ v0.4.5
    Updating `C:\Users\Kenia\.julia\environments\v1.7\Project.toml`
  [da1fdf0e] + FreqTables v0.4.5
    Updating `C:\Users\Kenia\.julia\environments\v1.7\Manifest.toml`
  [da1fdf0e] + FreqTables v0.4.5
Precompiling project...
  ✓ FreqTables
  1 dependency successfully precompiled in 6 seconds (202 already precompiled, 2 skipped during auto due to previous errors)


In [11]:
using RData, LinearAlgebra, GLM, DataFrames, Statistics, Random, Distributions, DataStructures, NamedArrays, PrettyTables
import CodecBzip2

### Including the local hdmjl package for Julia

In [13]:
Pkg.add("Tables")

   Resolving package versions...
    Updating `C:\Users\Kenia\.julia\environments\v1.7\Project.toml`
  [bd369af6] + Tables v1.7.0
  No Changes to `C:\Users\Kenia\.julia\environments\v1.7\Manifest.toml`


In [14]:
include("hdmjl/hdmjl.jl")

## 1. Basic model

In [15]:
#Defining Y
Y = data[!, "lwage"]
Y = DataFrame([Y], [:Y])

Row,Y
,Float64
1,2.26336
2,3.8728
3,2.40313
4,2.63493
5,3.36198
6,2.46222
7,2.95651
8,2.95651
9,2.48491
10,2.95651


In [16]:
# Defining D
D = data[!, "sex"]
D = DataFrame([D], [:D])

Row,D
,Float64
1,1.0
2,0.0
3,0.0
4,1.0
5,1.0
6,1.0
7,1.0
8,0.0
9,1.0
10,1.0


In [17]:
# Defining W
W = DataFrame(select(data, Not(["lwage", "sex", "exp2","exp3","exp4","occ","ind","ad","wage","ne","scl","clg"])))


Row,rownames,shs,hsg,mw,so,we,exp1,occ2,ind2
,Int64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64
1,10,0.0,0.0,0.0,0.0,0.0,7.0,11.0,18.0
2,12,0.0,0.0,0.0,0.0,0.0,31.0,10.0,9.0
3,15,0.0,1.0,0.0,0.0,0.0,18.0,19.0,4.0
4,18,0.0,0.0,0.0,0.0,0.0,25.0,1.0,12.0
5,19,0.0,0.0,0.0,0.0,0.0,22.0,6.0,22.0
6,30,0.0,0.0,0.0,0.0,0.0,1.0,5.0,14.0
7,43,0.0,1.0,0.0,0.0,0.0,42.0,17.0,14.0
8,44,0.0,1.0,0.0,0.0,0.0,37.0,17.0,9.0
9,47,0.0,1.0,0.0,0.0,0.0,31.0,13.0,19.0
10,71,0.0,0.0,0.0,0.0,0.0,4.0,10.0,18.0


In [18]:
res_Y_0 = rlasso_arg( W, Y, nothing, true, true, true, false, false, 
                    nothing, 1.1, nothing, 5000, 15, 10^(-5), -Inf, true, Inf, true )


rlasso_arg(5150×9 DataFrame
  Row │ rownames  shs      hsg      mw       so       we       exp1     occ2   ⋯
      │ Int64     Float64  Float64  Float64  Float64  Float64  Float64  Float6 ⋯
──────┼─────────────────────────────────────────────────────────────────────────
    1 │       10      0.0      0.0      0.0      0.0      0.0      7.0     11. ⋯
    2 │       12      0.0      0.0      0.0      0.0      0.0     31.0     10.
    3 │       15      0.0      1.0      0.0      0.0      0.0     18.0     19.
    4 │       18      0.0      0.0      0.0      0.0      0.0     25.0      1.
    5 │       19      0.0      0.0      0.0      0.0      0.0     22.0      6. ⋯
    6 │       30      0.0      0.0      0.0      0.0      0.0      1.0      5.
    7 │       43      0.0      1.0      0.0      0.0      0.0     42.0     17.
    8 │     

### Next, we call the rlasso function with the arguments declared above

In [19]:
res_Y = rlasso(res_Y_0)

Dict{String, Any} with 19 entries:
  "tss"          => 1675.17
  "dev"          => [-0.707422, 0.902016, -0.56766, -0.335859, 0.39119, -0.5085…
  "model"        => [-15626.3 -0.023301 … -0.670874 4.68311; -15624.3 -0.023301…
  "loadings"     => [5033.26 0.0986192 … 3.53181 2.92093]
  "sigma"        => [0.51999]
  "lambda0"      => 507.737
  "lambda"       => 9×2 DataFrame…
  "intercept"    => 3.42062
  "Xy"           => [247976.0, -40.9142, -306.146, -75.374, -0.513431, 42.5347,…
  "iter"         => 4
  "residuals"    => [-0.680446, 0.595516, -0.353694, -0.781175, 0.234989, -0.64…
  "rss"          => 1675.17
  "index"        => Bool[0, 1, 1, 0, 0, 0, 1, 1, 1]
  "beta"         => 9×2 DataFrame…
  "options"      => Dict{String, Any}("intercept"=>true, "post"=>true, "meanx"=…
  "x1"           => [-0.023301 -0.243883 … -0.670874 4.68311; -0.023301 -0.2438…
  "pen"          => Dict{String, Any}("lambda0"=>507.737, "lambda"=>[2.58707e6;…
  "startingval"  => [-0.749531, 0.6526
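
The value `"lambda0" => 507.737` in the output above can be checked by hand under one assumption: that hdmjl follows the default penalty rule of the R `hdm` package (the hdmjl source is not shown here, so this is not confirmed). With $c = 1.1$ (the value passed to `rlasso_arg`), $n = 5150$ observations, $p = 9$ columns in `W`, and $\gamma = 0.1/\log n$, that rule gives

$$\lambda_0 = 2c\sqrt{n}\,\Phi^{-1}\!\left(1 - \frac{\gamma}{2p}\right) \approx 2 \times 1.1 \times 71.76 \times 3.216 \approx 507.7,$$

where $\Phi^{-1}$ is the standard normal quantile function. The agreement with the reported value is consistent with this assumption but does not prove it.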

In [20]:
res_Y = res_Y["residuals"]

5150-element Vector{Float64}:
 -0.680446330400001
  0.5955162741329553
 -0.35369428399556235
 -0.7811745744255665
  0.23498938686962662
 -0.6433468658054966
  0.10818162637635162
  0.07575656966208404
 -0.3104501350599294
  0.010263623844916955
  0.22351698545515586
 -0.29017389042907893
 -0.44594140493993983
  ⋮
 -0.0959806597139039
 -0.2604701718246917
  0.8649128850182287
  0.5294580777643463
  0.3833467153335651
  0.14085902644197112
 -0.7617953873922643
 -0.26714666846279433
  0.23558963056136373
  0.33072008146945264
  0.6460965297130965
 -0.35287630224592237

## We do the same for the second equation 

In [21]:
res_D_0 = rlasso_arg( W, D, nothing, true, true, true, false, false, 
                    nothing, 1.1, nothing, 5000, 15, 10^(-5), -Inf, true, Inf, true )
res_D = rlasso(res_D_0)["residuals"]
# lm requires X to be a matrix, so we reshape the residual vector into a one-column matrix
res_D = reshape(res_D, length(res_D), 1)

5150×1 Matrix{Float64}:
  0.4394226397849019
 -0.39795301750804835
 -0.16984393059733044
  0.5007478967756236
  0.3396754207736148
  0.48366978526542914
  0.6337653647901771
 -0.2730847441336458
  0.5204336287937236
  0.4343771785548328
  0.539063607008959
 -0.45589093176097195
  0.4394226397849019
  ⋮
  0.41419533363455646
 -0.6669219071614954
  0.4343771785548328
 -0.24785743798330034
 -0.4815080990262637
  0.34976634323375294
 -0.3451668162724239
 -0.3676802501276338
  0.41225360581456927
 -0.3723358502427566
 -0.3507083904046694
 -0.516330214734571
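
The reshaping step matters because the matrix method of `lm` takes the regressors `X` as a matrix and the response `y` as a vector. The following sketch, on a made-up three-element vector, shows how to move between the two shapes with `reshape` and `vec`; the variable names are illustrative only.

```julia
v = [0.5, -0.25, 1.0]

# Vector -> one-column matrix (the shape needed for X).
X = reshape(v, length(v), 1)

# One-column matrix -> vector (the shape needed for y).
y = vec(X)

println(size(X), " ", size(y))
```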

## Regress the residuals

In [22]:
lm(res_D, res_Y)

LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, CholeskyPivoted{Float64, Matrix{Float64}}}}:

Coefficients:
─────────────────────────────────────────────────────────────────
        Coef.  Std. Error      t  Pr(>|t|)  Lower 95%   Upper 95%
─────────────────────────────────────────────────────────────────
x1  -0.100664   0.0151253  -6.66    <1e-10  -0.130316  -0.0710115
─────────────────────────────────────────────────────────────────


Here the HDMJL coefficient (-0.100664) is more negative than the OLS coefficient (-0.073309), so it is smaller in value but larger in magnitude. Its standard error (0.0151) is also smaller than the OLS one (0.031).

    - This means that the HDMJL method finds a stronger negative relation between sex and lwage, estimated with a smaller standard error. One possible explanation is that the OLS specification includes correlated control variables that the lasso penalty (lambda) in HDMJL eliminates.
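
The logic of partialling out can be checked on simulated data. The sketch below uses plain OLS in both residualizing steps instead of the lasso used by hdmjl, so it illustrates the Frisch-Waugh-Lovell identity rather than the notebook's estimator: with OLS, the coefficient from the residual-on-residual regression equals the coefficient on the variable of interest in the full regression. All names (`W_sim`, `D_sim`, `Y_sim`, `ols_residuals`) and the simulated coefficients are made up for illustration.

```julia
using LinearAlgebra, Random

Random.seed!(1)
n_sim = 500
W_sim = hcat(ones(n_sim), randn(n_sim, 3))                 # controls, including a constant
D_sim = W_sim * [0.2, 0.5, -0.3, 0.1] .+ randn(n_sim)      # variable of interest, correlated with controls
Y_sim = -0.1 .* D_sim .+ W_sim * [1.0, 0.4, 0.2, -0.6] .+ randn(n_sim)

# Residuals from an OLS regression of v on the columns of W.
ols_residuals(v, W) = v .- W * (W \ v)

# Steps 1 and 2: partial the controls out of Y and D.
res_y_sim = ols_residuals(Y_sim, W_sim)
res_d_sim = ols_residuals(D_sim, W_sim)

# Step 3: regress the Y-residuals on the D-residuals (both have mean zero).
beta_partial = dot(res_d_sim, res_y_sim) / dot(res_d_sim, res_d_sim)

# Comparison: coefficient on D in the full regression of Y on D and W.
beta_full = (hcat(D_sim, W_sim) \ Y_sim)[1]

println("partialling out: ", beta_partial)
println("full regression: ", beta_full)
```

When lasso replaces OLS in steps 1 and 2, the two numbers no longer coincide exactly, which is one reason the HDMJL estimate can differ from the OLS estimate.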

## 2. Flexible model

In [53]:
flex_y = @formula(lwage ~ (exp1+exp2+exp3+exp4+shs+hsg+occ2+ind2+mw+so+we)^2)

FormulaTerm
Response:
  lwage(unknown)
Predictors:
  (exp1,exp2,exp3,exp4,shs,hsg,occ2,ind2,mw,so,we)->(exp1 + exp2 + exp3 + exp4 + shs + hsg + occ2 + ind2 + mw + so + we) ^ 2

### Regression of Y on W

In [61]:
# rlasso_arg has no method for a formula; it expects DataFrames, as in the basic-model call
W_flex = DataFrame(modelcols(apply_schema(flex_y, schema(flex_y, data)).rhs, data), :auto)
res_f_0 = rlasso_arg( W_flex, Y, nothing, true, true, true, false, false, 
                    nothing, 1.1, nothing, 5000, 15, 10^(-5), -Inf, true, Inf, true )

In [62]:
res_f = rlasso(res_f_0)["residuals"]
# Reshape the residuals into a one-column matrix (note: lm expects the response y as a vector, so res_f must be flattened again, e.g. with vec, before it is used as y)
res_f = reshape(res_f, length(res_f), 1)

5150×1 Matrix{Float64}:
 -0.7389789725609149
  0.662588330925215
 -0.25650250660195495
 -0.7275216622270138
  0.13313154864219778
 -0.6261356571109215
  0.06693609134971468
  0.10564329502946973
 -0.4078170963081189
 -0.04468350107826287
  0.14422187630958844
 -0.37639834292004787
 -0.5042232601427787
  ⋮
 -0.13926077657228098
 -0.3338521805030794
  0.8097985687896655
  0.5476032952422458
  0.2396174553951123
  0.03366925809829102
 -0.5962444627471757
 -0.2149000628816049
  0.14193462368352247
  0.4618174066191475
  0.6770504453516633
 -0.337838580521331

### Regression of D on W

In [63]:
flex_d = @formula(sex ~ (exp1+exp2+exp3+exp4+shs+hsg+occ2+ind2+mw+so+we)^2)

FormulaTerm
Response:
  sex(unknown)
Predictors:
  (exp1,exp2,exp3,exp4,shs,hsg,occ2,ind2,mw,so,we)->(exp1 + exp2 + exp3 + exp4 + shs + hsg + occ2 + ind2 + mw + so + we) ^ 2

In [64]:
# rlasso_arg has no method for a formula; it expects DataFrames, as in the basic-model call
W_flex_d = DataFrame(modelcols(apply_schema(flex_d, schema(flex_d, data)).rhs, data), :auto)
res_f_D_0 = rlasso_arg( W_flex_d, D, nothing, true, true, true, false, false, 
                    nothing, 1.1, nothing, 5000, 15, 10^(-5), -Inf, true, Inf, true )

In [65]:
res_f_D = rlasso(res_f_D_0)["residuals"]
# lm requires X to be a matrix, so we reshape the residual vector into a one-column matrix
res_f_D = reshape(res_f_D, length(res_f_D), 1)

5150×1 Matrix{Float64}:
  0.5182231939074587
 -0.4909311679273292
 -0.2987583057257248
  0.4266795755595798
  0.4724513847335193
  0.46329702289873137
  0.6829329706046995
 -0.3170670293953006
  0.6463155232655479
  0.5090688320726708
  0.6463155232655479
 -0.3445301148996643
  0.5182231939074587
  ⋮
  0.4724513847335193
 -0.5733204244404202
  0.5090688320726708
 -0.27129522022136116
 -0.2896039438909369
  0.49076010840309503
 -0.5641660626056324
 -0.4360049969186019
  0.5365319175770344
 -0.5458573389360566
 -0.39030192407360376
 -0.5367029771012687

### Regress the residuals

In [66]:
# lm(X, y) needs y as a vector, so flatten the 5150×1 matrix res_f with vec
lm(res_f_D, vec(res_f))
[0m  ...