# Part 1: Heterogeneous treatment effects using causal trees and forests
For this part, we will be using experimental data for computing heterogeneous effects through causal trees and forests. For all exercises, the predictors $X$ are all variables that are not the outcome $Y$ or the treatment $D$.

### 1.1. Load the data (1 point). 
This is data from an experiment on the National Supported Work Demonstration (NSW) job-training program. You can find the data here, and read a description of the data here. For further details of the experiment and the program, you can use this link.

In [1]:
using CSV, DataFrames, Statistics, Random
using StatsModels, GLM, FixedEffectModels
using MLJ, MLJModels, MLBase, RData, RDatasets, PrettyTables


In [2]:
# Read the CSV file
df = CSV.read("/Users/gabriel/Documents/GitHub/CausalAI-Course/Labs/Assignment/Assignment_5/data/experimental/experimental_control.csv", DataFrame)
df

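| Row | treat | age | educ | black | hisp | marr | nodegree | re74 | re75 | re78 |
|---|---|---|---|---|---|---|---|---|---|---|
| | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Float64 | Float64 | Float64 |
| 1 | 1 | 37 | 11 | 1 | 0 | 1 | 1 | 0.0 | 0.0 | 9930.05 |
| 2 | 1 | 22 | 9 | 0 | 1 | 0 | 1 | 0.0 | 0.0 | 3595.89 |
| 3 | 1 | 30 | 12 | 1 | 0 | 0 | 0 | 0.0 | 0.0 | 24909.4 |
| 4 | 1 | 27 | 11 | 1 | 0 | 0 | 1 | 0.0 | 0.0 | 7506.15 |
| 5 | 1 | 33 | 8 | 1 | 0 | 0 | 1 | 0.0 | 0.0 | 289.79 |
| 6 | 1 | 22 | 9 | 1 | 0 | 0 | 1 | 0.0 | 0.0 | 4056.49 |
| 7 | 1 | 23 | 12 | 1 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 |
| 8 | 1 | 32 | 11 | 1 | 0 | 0 | 1 | 0.0 | 0.0 | 8472.16 |
| 9 | 1 | 22 | 16 | 1 | 0 | 0 | 0 | 0.0 | 0.0 | 2164.02 |
| 10 | 1 | 33 | 12 | 0 | 0 | 1 | 0 | 0.0 | 0.0 | 12418.1 |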


### 1.2. Find the ATE (1.5 points). 
With `re78` as the outcome variable of interest, find the Average Treatment Effect of participation in the program. Specifically, you should find it by calculating the difference between the means of the treatment group and the control group (the Simple Difference of Means or SDM). What can you say about the program?

In [22]:
using GLM
using DataFrames


# Fit the linear model
lin_model = lm(@formula(re78 ~ treat), df)

# Display the summary of the model
lin_model

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

re78 ~ 1 + treat

Coefficients:
───────────────────────────────────────────────────────────────────────
               Coef.  Std. Error      t  Pr(>|t|)  Lower 95%  Upper 95%
───────────────────────────────────────────────────────────────────────
(Intercept)  4554.8      408.046  11.16    <1e-24   3752.85     5356.75
treat        1794.34     632.853   2.84    0.0048    550.574    3038.11
───────────────────────────────────────────────────────────────────────
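
With a single binary regressor, the OLS coefficient on the treatment indicator is numerically the same as the simple difference of means that the exercise asks for. A minimal sketch of that equivalence, using hypothetical toy vectors `y` and `d` (made up for illustration, not taken from the NSW file):

```julia
using Statistics

# Hypothetical toy data: outcome y and a 0/1 treatment indicator d
y = [10.0, 12.0, 9.0, 15.0, 14.0, 18.0]
d = [0, 0, 0, 1, 1, 1]

# Simple difference of means (SDM): treated mean minus control mean
sdm = mean(y[d .== 1]) - mean(y[d .== 0])

# OLS of y on an intercept and d, solved by least squares
Xmat = hcat(ones(length(y)), d)
beta = Xmat \ y

println("SDM       = ", sdm)
println("OLS slope = ", beta[2])
```

On the notebook's data, the same check would be `mean(df.re78[df.treat .== 1]) - mean(df.re78[df.treat .== 0])`, which should reproduce the `treat` coefficient reported by `lm`.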

### 1.3. Heterogeneous effects with causal trees (3 points). 
Use causal trees like we saw in class. For Python, you should use the `econml` package; for R, use the `grf` package; and for Julia, you will need to create the auxiliary variable $Y^*$ and fit a decision tree regressor. Report the splits the tree finds and interpret them.
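
The exercise does not spell out the auxiliary variable. A common choice, assumed here, is the transformed outcome $Y^* = Y \cdot \frac{D - e(X)}{e(X)(1 - e(X))}$, where $e(X) = P(D = 1 \mid X)$ is the propensity score. Its conditional mean given $X$ equals the conditional average treatment effect, which is why a regression tree fitted to $Y^*$ can recover heterogeneity. The sketch below checks this on simulated data with a known propensity score; every name and number in it is hypothetical.

```julia
using Random, Statistics

rng = MersenneTwister(42)
n = 200_000

# Hypothetical simulated experiment with a known, constant propensity score
p = 0.4                          # P(D = 1), known by design
x = rand(rng, n)                 # one covariate, uniform on [0, 1]
d = Float64.(rand(rng, n) .< p)  # randomized treatment indicator
tau = 1.0 .+ 2.0 .* x            # true heterogeneous effect (average is about 2)
y = 0.5 .+ x .+ tau .* d .+ randn(rng, n)

# Transformed outcome: Y* = Y (D - p) / (p (1 - p))
y_star = y .* (d .- p) ./ (p * (1 - p))

println("mean(Y*)             = ", mean(y_star))
println("mean of true effects = ", mean(tau))
```

The two printed numbers should be close, since averaging $Y^*$ over the sample estimates the ATE.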

In [None]:
using MLJ 

#Acquire our target vector and our feature matrix.
y, X = unpack(df, ==(:re78), !=(:re78));
coerce!(X, Textual => Multiclass)
D, X = unpack(X, ==(:treat), !=(:treat));

In [None]:
coerce!(X, Count => Continuous);

In [43]:
using MLJ, MLJModels, RDatasets, DataFrames

LogisticClassifier = @load LogisticClassifier pkg=MLJScikitLearnInterface verbosity=0

log_model = LogisticClassifier()

log_model_machine = machine(log_model, X, D)

fit!(log_model_machine)

LoadError: DomainError with 1:
Can only convert categorical elements to integers. 

In [27]:
LogisticClassifier = @load LogisticClassifier pkg=MLJScikitLearnInterface verbosity=0  # logistic regression

# The classifier expects a categorical target; an integer `treat` vector
# triggers "DomainError: Can only convert categorical elements to integers".
D = coerce(D, Multiclass)

log_model = LogisticClassifier()

# The machine wraps the model together with the data
log_model_machine = machine(log_model, X, D)

fit!(log_model_machine)

trained Machine; caches model-specific representations of data
  model: LogisticClassifier(penalty = l2, …)
  args: 
    1:	Source @691 ⏎ Table{Union{AbstractVector{Continuous}, AbstractVector{Multiclass{14}}, AbstractVector{Multiclass{2}}}}
    2:	Source @755 ⏎ AbstractVector{Multiclass{2}}


In [None]:
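# Estimated propensity score P(D = 1 | X) from the logistic model
# (assumption: the treated class is stored as the level `1`)
p_hat = pdf.(MLJ.predict(log_model_machine, X), 1)

# Transformed outcome Y* = Y (D - e(X)) / (e(X) (1 - e(X)))
y_star = y .* (df.treat .- p_hat) ./ (p_hat .* (1 .- p_hat))

DecisionTreeRegressor = (@load DecisionTreeRegressor pkg=DecisionTree verbosity=0)
tree_model = DecisionTreeRegressor(max_depth = 2)
tree_machine = machine(tree_model, X, y_star)
fit!(tree_machine);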

In [None]:
fitted_params(tree_machine)[1]
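
To make concrete what the regression tree does with $Y^*$: each split is chosen to minimise squared error, and the mean of $Y^*$ in a leaf is an estimate of the average treatment effect for the units in that leaf. The sketch below is a depth-one tree (a single split) on simulated data, under the assumption of a known propensity score of 0.5. The helper `best_split` is hypothetical and is not part of DecisionTree.jl.

```julia
using Random, Statistics

# Hypothetical helper: threshold on one feature that minimises squared error
function best_split(x::Vector{Float64}, target::Vector{Float64})
    order = sortperm(x)
    xs, ts = x[order], target[order]
    n = length(ts)
    total, total_sq = sum(ts), sum(abs2, ts)
    best_sse, best_thr = Inf, NaN
    left_sum, left_sq = 0.0, 0.0
    for i in 1:(n - 1)
        left_sum += ts[i]
        left_sq += ts[i]^2
        xs[i] == xs[i + 1] && continue   # cannot split between equal values
        right_sum, right_sq = total - left_sum, total_sq - left_sq
        # SSE of a leaf = sum of squares - (sum)^2 / count
        sse = (left_sq - left_sum^2 / i) + (right_sq - right_sum^2 / (n - i))
        if sse < best_sse
            best_sse, best_thr = sse, (xs[i] + xs[i + 1]) / 2
        end
    end
    return best_thr
end

rng = MersenneTwister(1)
n = 50_000
x = rand(rng, n)
d = Float64.(rand(rng, n) .< 0.5)
tau = ifelse.(x .> 0.6, 4.0, 0.0)      # the effect exists only when x > 0.6
y = tau .* d .+ 0.5 .* randn(rng, n)
y_star = y .* (d .- 0.5) ./ 0.25       # transformed outcome with e(X) = 0.5

thr = best_split(x, y_star)
cate_left = mean(y_star[x .<= thr])
cate_right = mean(y_star[x .> thr])
println("split at x = ", thr)
println("leaf effects: ", cate_left, " and ", cate_right)
```

The split should land near 0.6, with leaf means near 0 and 4. Reading the fitted tree in the notebook works the same way: each reported threshold defines subgroups, and each leaf value is the estimated effect for that subgroup.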

### 1.4. Heterogeneous effects with causal forests (3 points). 
Use causal forests like we saw in class. For Python, you should use the `econml` package; for R, use the `grf` package; and for Julia, you will need to use the auxiliary variable $Y^*$ computed in the previous exercise and fit a random forest regressor. Report the importance of the prediction variables.

### 1.5. Plot heterogeneous effects (1.5 points). 
Plot how the predicted treatment effect changes depending on a variable of your choice. (You can see the last example in PD11 for clarification of what you should do in this exercise)
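
One way to prepare such a plot is to average the predicted effects within bins of the chosen variable and then plot bin centers against bin means. A minimal sketch follows; `tau_hat` stands in for the forest's predictions and is simulated here, and the helper `binned_means` is hypothetical.

```julia
using Random, Statistics

# Hypothetical helper: mean of `values` within equal-width bins of `x`
function binned_means(x, values; nbins = 5)
    lo, hi = extrema(x)
    width = (hi - lo) / nbins
    # Bin index in 1:nbins (the maximum of x is assigned to the last bin)
    idx = clamp.(floor.(Int, (x .- lo) ./ width) .+ 1, 1, nbins)
    centers = [lo + (b - 0.5) * width for b in 1:nbins]
    means = [mean(values[idx .== b]) for b in 1:nbins]
    return centers, means
end

rng = MersenneTwister(7)
age = rand(rng, 18:55, 5_000)                                 # simulated covariate
tau_hat = 100.0 .* (age .- 18) .+ 50.0 .* randn(rng, 5_000)   # stand-in predictions

centers, means = binned_means(age, tau_hat)
for (c, m) in zip(centers, means)
    println("age ≈ ", round(c; digits = 1), "   mean predicted effect = ", round(m; digits = 1))
end
```

The resulting `centers` and `means` vectors can be passed to a plotting package, for example `plot(centers, means)` with Plots.jl.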

# Part 2: Double/Debiased machine learning in observational data
In this part, we will use observational data to compute the average treatment effect of the same program as in Part 1. The data keep the treatment group from Part 1 but replace the control group with observations drawn entirely from the Current Population Survey. Therefore, the treatment and control groups may not be comparable. To tackle this issue, we can use Double/Debiased machine learning.

### 2.1. Load the data (1 point). 
You can find the data here, and read a description of the data here. For further details on how this data was created, you can use this link.

In [5]:
df2 = CSV.read("/Users/gabriel/Documents/GitHub/CausalAI-Course/Labs/Assignment/Assignment_5/data/observational/biased_control.csv", DataFrame)
coerce!(df2, Count => Continuous);

### 2.2. Group comparisons (1.5 points). 
For the treatment and control group separately, report summary statistics of three variables of your choice. Can you spot any big differences between the treatment and control groups?

In [6]:
# Group the DataFrame by the `treat` column
grouped = groupby(df2, :treat)

# The first group is treat == 0 (control), the second is treat == 1 (treated)
control = DataFrame(grouped[1])
treat = DataFrame(grouped[2])

| Row | treat | age | educ | black | hisp | marr | nodegree | re74 | re75 | re78 | agesq | agecube | educsq | u74 | u75 | interaction1 | re74sq | re75sq |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 1.0 | 37.0 | 11.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 9.93005 | 1369.0 | 50653.0 | 121.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 2 | 1.0 | 22.0 | 9.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 3.59589 | 484.0 | 10648.0 | 81.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 3 | 1.0 | 30.0 | 12.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 24.9094 | 900.0 | 27000.0 | 144.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 4 | 1.0 | 27.0 | 11.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 7.50615 | 729.0 | 19683.0 | 121.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 5 | 1.0 | 33.0 | 8.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.28979 | 1089.0 | 35937.0 | 64.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 6 | 1.0 | 22.0 | 9.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 4.05649 | 484.0 | 10648.0 | 81.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 7 | 1.0 | 23.0 | 12.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 529.0 | 12167.0 | 144.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 8 | 1.0 | 32.0 | 11.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 8.47216 | 1024.0 | 32768.0 | 121.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 9 | 1.0 | 22.0 | 16.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.16402 | 484.0 | 10648.0 | 256.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 10 | 1.0 | 33.0 | 12.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 12.4181 | 1089.0 | 35937.0 | 144.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |


In [7]:
using Statistics

# Compute descriptive statistics for every numeric column of a DataFrame
function calc_stats(df::DataFrame)
    stats = DataFrame()  # empty DataFrame that collects one row per variable

    for col in names(df)
        # Skip non-numeric columns
        if eltype(df[!, col]) <: Number
            mean_val = mean(skipmissing(df[!, col]))
            sd_val = std(skipmissing(df[!, col]))
            min_val = minimum(skipmissing(df[!, col]))
            max_val = maximum(skipmissing(df[!, col]))

            # Store the results for this column as a NamedTuple and append it
            row = (variable=col, mean=mean_val, sd=sd_val, min=min_val, max=max_val)
            push!(stats, row)
        end
    end
    return stats
end

# Make sure both groups are plain DataFrames before calling calc_stats
control_df = DataFrame(control)
treat_df = DataFrame(treat)

# Descriptive statistics for the control group
control_stats = calc_stats(control_df[:, [:agesq, :educsq, :u75]])

# Descriptive statistics for the treatment group
treat_stats = calc_stats(treat_df[:, [:agesq, :educsq, :u75]])

# Show the results
println("\nNumber of control observations: ", nrow(control))
println(control_stats)
println("\nNumber of treatment observations: ", nrow(treat))
println(treat_stats)


Number of control observations: 15992
3×5 DataFrame
 Row │ variable  mean         sd          min      max     
     │ String    Float64      Float64     Float64  Float64 
─────┼─────────────────────────────────────────────────────
   1 │ agesq     1225.91      784.738       256.0   3025.0
   2 │ educsq     152.902      67.1663        0.0    324.0
   3 │ u75          0.109305    0.312031      0.0      1.0

Number of treatment observations: 185
3×5 DataFrame
 Row │ variable  mean     sd          min      max     
     │ String    Float64  Float64     Float64  Float64 
─────┼─────────────────────────────────────────────────
   1 │ agesq     717.395  431.252       289.0   2304.0
   2 │ educsq    111.059   39.3039       16.0    256.0
   3 │ u75         0.6      0.491227      0.0  

#### The control group is far larger than the treatment group (15,992 versus 185 observations). Squared age and squared years of education are on average much higher in the control group, so the treatment group is younger and less educated. The share of individuals who were unemployed in 1975 (`u75`) is also much higher in the treatment group (0.60) than in the control group (0.11).
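
Because the three variables are on very different scales, it can help to express each gap in standard-deviation units. The sketch below computes a standardized difference, (treated mean minus control mean) divided by the square root of the average of the two variances, from the means and standard deviations reported in the summary tables. The helper name `std_diff` is hypothetical, and this particular normalisation is one common convention rather than something the assignment prescribes.

```julia
# Means and standard deviations copied from the summary tables
stats = [
    (variable = "agesq",  mean_t = 717.395, sd_t = 431.252,  mean_c = 1225.91,  sd_c = 784.738),
    (variable = "educsq", mean_t = 111.059, sd_t = 39.3039,  mean_c = 152.902,  sd_c = 67.1663),
    (variable = "u75",    mean_t = 0.6,     sd_t = 0.491227, mean_c = 0.109305, sd_c = 0.312031),
]

# Standardized difference: (mean_t - mean_c) / sqrt((sd_t^2 + sd_c^2) / 2)
std_diff(s) = (s.mean_t - s.mean_c) / sqrt((s.sd_t^2 + s.sd_c^2) / 2)

for s in stats
    println(rpad(s.variable, 8), round(std_diff(s); digits = 2))
end
```

The values come out at roughly -0.80, -0.76 and 1.19, that is, gaps of around one standard deviation on all three variables.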

### 2.3. Compute the SDM (1.5 points). 
Find the simple difference of means, which we can use as a naive estimate of the ATE. How does the result in this case compare to the result in point 1.2?

In [8]:
y = df2[!, "re78"]
d = df2[!, "treat"]
x = df2[!, Not([:treat, :re78])]

| Row | age | educ | black | hisp | marr | nodegree | re74 | re75 | agesq | agecube | educsq | u74 | u75 | interaction1 | re74sq | re75sq |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 45.0 | 11.0 | 0.0 | 0.0 | 1.0 | 1.0 | 21.5167 | 25.2436 | 2025.0 | 91125.0 | 121.0 | 0.0 | 0.0 | 236.683 | 462.967 | 637.237 |
| 2 | 21.0 | 14.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.17597 | 5.85256 | 441.0 | 9261.0 | 196.0 | 0.0 | 0.0 | 44.4636 | 10.0868 | 34.2525 |
| 3 | 38.0 | 12.0 | 0.0 | 0.0 | 1.0 | 0.0 | 23.039 | 25.1308 | 1444.0 | 54872.0 | 144.0 | 0.0 | 0.0 | 276.468 | 530.796 | 631.555 |
| 4 | 48.0 | 6.0 | 0.0 | 0.0 | 1.0 | 1.0 | 24.9944 | 25.2436 | 2304.0 | 110592.0 | 36.0 | 0.0 | 0.0 | 149.966 | 624.718 | 637.237 |
| 5 | 18.0 | 8.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.6693 | 10.7276 | 324.0 | 5832.0 | 64.0 | 0.0 | 0.0 | 13.3544 | 2.78655 | 115.082 |
| 6 | 22.0 | 11.0 | 0.0 | 0.0 | 1.0 | 1.0 | 16.3658 | 18.4493 | 484.0 | 10648.0 | 121.0 | 0.0 | 0.0 | 180.023 | 267.838 | 340.376 |
| 7 | 48.0 | 10.0 | 0.0 | 0.0 | 1.0 | 1.0 | 16.8046 | 16.3546 | 2304.0 | 110592.0 | 100.0 | 0.0 | 0.0 | 168.046 | 282.396 | 267.473 |
| 8 | 18.0 | 11.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.14421 | 3.62003 | 324.0 | 5832.0 | 121.0 | 0.0 | 0.0 | 12.5863 | 1.30922 | 13.1046 |
| 9 | 48.0 | 9.0 | 0.0 | 0.0 | 1.0 | 1.0 | 25.8623 | 25.2436 | 2304.0 | 110592.0 | 81.0 | 0.0 | 0.0 | 232.761 | 668.86 | 637.237 |
| 10 | 45.0 | 12.0 | 0.0 | 0.0 | 1.0 | 0.0 | 25.8623 | 0.0 | 2025.0 | 91125.0 | 144.0 | 0.0 | 1.0 | 310.348 | 668.86 | 0.0 |


In [9]:
println("\n length of y is \n", size(y,1) )
println("\n num features x is \n", size(x,2 ) )

# Naive OLS
print( "\n Naive OLS that uses all features w/o cross-fitting \n" )
fm = term(:re78) ~ term(:treat) +sum(term.(Symbol.(names(x))));
lres = reg(df2, fm);
first(DataFrame(GLM.coeftable(lres)))


 length of y is 
16177

 num features x is 
16

 Naive OLS that uses all features w/o cross-fitting 


| Row | Name | Estimate | Std. Error | t-stat | Pr(>\|t\|) | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|---|
| | String | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | treat | 1.5716 | 0.557171 | 2.82067 | 0.00479818 | 0.479479 | 2.66371 |


#### Earnings in this dataset are expressed in thousands of dollars, so the estimate of 1.57 corresponds to roughly 1,572 dollars, somewhat below the 1,794 dollars found in point 1.2. Note that this estimate comes from an OLS regression that controls for all covariates rather than from a raw difference of means; controlling for the covariates removes part of the bias caused by the non-comparable groups. The smaller effect is also consistent with the composition of the sample. In the first part, with similar treatment and control groups, we found that the ATE was higher for older, more educated individuals with better outcomes in 1975. In the observational data, the treatment group is much smaller than the control group and consists of younger individuals with fewer years of education and a high probability of unemployment in 1975, so a reduced effect is to be expected, especially once the other covariates are controlled for.

### 2.4. Using DML (6 points). 
Use the DML procedure as we saw in the Lab to find a better estimate of the ATE. You may use the `DoubleML` packages for Python and R; no such package exists for Julia, so you will have to build your own procedure like we saw in class. You will be rewarded extra points for using more than one method for predictions. At the end, report the treatment effect you found, as well as the MSE for $D$ and $Y$ achieved by the method(s) you used.
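
For reference, the residual-on-residual procedure can be written out under the assumption of a partially linear model, which is the usual setting for this form of DML (the notation below is introduced here and is not taken from the assignment):

$$
Y = \theta D + g(X) + \varepsilon, \qquad D = m(X) + \nu, \qquad E[\varepsilon \mid D, X] = 0, \quad E[\nu \mid X] = 0 .
$$

With $\hat{\ell}(X)$ and $\hat{m}(X)$ denoting out-of-fold predictions of $E[Y \mid X]$ and $E[D \mid X]$, the residuals and the estimator are

$$
\tilde{Y}_i = Y_i - \hat{\ell}(X_i), \qquad \tilde{D}_i = D_i - \hat{m}(X_i), \qquad \hat{\theta} = \frac{\sum_i \tilde{D}_i \tilde{Y}_i}{\sum_i \tilde{D}_i^{2}} .
$$

The root mean squared errors of the two prediction steps are $\sqrt{\tfrac{1}{n}\sum_i \tilde{Y}_i^{2}}$ and $\sqrt{\tfrac{1}{n}\sum_i \tilde{D}_i^{2}}$; squaring them gives the MSE for $Y$ and $D$.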

In [10]:
# Return the training indices (every fold except `test_sample_index`) and the test fold
function training_sample_append(cv_split, test_sample_index)
    training_indices = Int[]
    for vector in cv_split[Not(test_sample_index)]
        append!(training_indices, vector)
    end
    return training_indices, cv_split[test_sample_index]
end

# Cross-fitted DML: predict y and d out of fold, then regress residuals on residuals
function dml(x, d, y, dreg, yreg, nfold)
    n = length(y)
    # Shuffle the row indices and split them into `nfold` folds
    cv = [partition(eachindex(y), fill(1/nfold, nfold-1)..., shuffle = true, rng = 1234)...]
    machine_y = machine(yreg, x, y, scitype_check_level=0)
    machine_d = machine(dreg, x, d, scitype_check_level=0)
    y_hat = zeros(n)
    d_hat = zeros(n)

    # Cross-fitting: train on the other folds, predict on the held-out fold
    for fold in 1:nfold
        training_fold, test_fold = training_sample_append(cv, fold)
        y_hat[test_fold] = MLJ.predict(MLJ.fit!(machine_y, rows = training_fold), x[test_fold, :])
        d_hat[test_fold] = MLJ.predict(MLJ.fit!(machine_d, rows = training_fold), x[test_fold, :])
    end

    # Regress the outcome residuals on the treatment residuals (no intercept)
    resy = y .- y_hat
    resd = reshape(d .- d_hat, (n, 1))
    estimate = lm(resd, resy)
    coef_est = GLM.coef(estimate)[1]
    se = GLM.coeftable(estimate).cols[2][1]
    println(" coef (se) = ", coef_est ,"(",se,")")
    return coef_est, se, resy, resd
end

# One-row summary: point estimate, standard error, and out-of-fold RMSE for y and d
function summarize(point, stderr, resy, resd, name)
    return DataFrame(
        model = [name],
        estimate = [point], stderr = [stderr],
        rmse_y = [sqrt(mean(resy .^ 2))],
        rmse_d = [sqrt(mean(resd .^ 2))]
    )
end

summarize (generic function with 1 method)
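
The standard error returned by `dml` comes from the ordinary OLS fit of residuals on residuals, which assumes homoskedastic errors. A heteroskedasticity-robust (sandwich) alternative is sketched below on simulated residuals. The helper `dml_final_stage` is hypothetical and is not part of the notebook's code; it only illustrates the final stage once out-of-fold residuals are available.

```julia
using Random, Statistics

# Hypothetical helper: final DML stage from out-of-fold residuals
function dml_final_stage(resy::Vector{Float64}, resd::Vector{Float64})
    n = length(resy)
    # Residual-on-residual slope without an intercept
    theta = sum(resd .* resy) / sum(abs2, resd)
    eps_hat = resy .- theta .* resd
    # Sandwich variance: mean(resd^2 * eps^2) / mean(resd^2)^2 / n
    var_theta = mean(abs2.(resd) .* abs2.(eps_hat)) / mean(abs2, resd)^2 / n
    return theta, sqrt(var_theta)
end

rng = MersenneTwister(123)
n = 20_000
resd = randn(rng, n)                    # simulated treatment residuals
resy = 1.5 .* resd .+ randn(rng, n)     # simulated outcome residuals, true theta = 1.5

theta, se = dml_final_stage(resy, resd)
println("theta = ", theta, "   robust se = ", se)
```

With the notebook's objects, the call would be `dml_final_stage(resy, vec(resd))`, since `dml` returns `resd` as an n×1 matrix.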

In [None]:
# Pkg.add("MLJScikitLearnInterface")
# Pkg.add("LIBSVM")
# Pkg.add("NearestNeighborModels")
# Pkg.add("MLJXGBoostInterface")

In [12]:
# OLS
LinearRegressor = @load LinearRegressor pkg=MLJScikitLearnInterface verbosity=0
dreg_ols = Standardizer() |> LinearRegressor()
yreg_ols = Standardizer() |> LinearRegressor()
result_ols = dml(x, d, y, dreg_ols, yreg_ols, 10)
table_ols = summarize(result_ols..., "OLS")

 coef (se) = 1.5709386603565003(0.5566576621515648)

| Row | model | estimate | stderr | rmse_y | rmse_d |
|---|---|---|---|---|---|
| | String | Float64 | Float64 | Float64 | Float64 |
| 1 | OLS | 1.57094 | 0.556658 | 6.9873 | 0.0986685 |


In [14]:
# Lasso
LassoCVRegressor = @load LassoCVRegressor pkg=MLJScikitLearnInterface verbosity=0
dreg_lasso = Standardizer() |> LassoCVRegressor(max_iter=200000)
yreg_lasso = Standardizer() |> LassoCVRegressor(max_iter=200000)
result_lasso = dml(x, d, y, dreg_lasso, yreg_lasso, 10)
table_lasso = summarize(result_lasso..., "Lasso")

 coef (se) = 1.4553455503963617(0.5561085672312804)

| Row | model | estimate | stderr | rmse_y | rmse_d |
|---|---|---|---|---|---|
| | String | Float64 | Float64 | Float64 | Float64 |
| 1 | Lasso | 1.45535 | 0.556109 | 6.99038 | 0.0988129 |


In [15]:
# Random Forest
RandomForestRegressor = @load RandomForestRegressor pkg=MLJScikitLearnInterface verbosity=0
dreg_rf = RandomForestRegressor()
yreg_rf = RandomForestRegressor()
result_rf = dml(x, d, y, dreg_rf, yreg_rf, 10)
table_rf = summarize(result_rf..., "RF")

 coef (se) = 1.5053645786584204(0.6426445504973509)

| Row | model | estimate | stderr | rmse_y | rmse_d |
|---|---|---|---|---|---|
| | String | Float64 | Float64 | Float64 | Float64 |
| 1 | RF | 1.50536 | 0.642645 | 7.40085 | 0.0905319 |


In [17]:
# Elastic Net
ElasticNetRegressor = @load ElasticNetRegressor pkg=MLJScikitLearnInterface verbosity=0
dreg_en = Standardizer() |> ElasticNetRegressor(alpha=0.5, l1_ratio=0.1)  # `l1_ratio` is used in place of `lambda`
yreg_en = Standardizer() |> ElasticNetRegressor(alpha=0.5, l1_ratio=0.1)
result_en = dml(x, d, y, dreg_en, yreg_en, 10)
table_en = summarize(result_en..., "ElasticNet")

 coef (se) = 0.18783548777207718(0.5231771458798286)

| Row | model | estimate | stderr | rmse_y | rmse_d |
|---|---|---|---|---|---|
| | String | Float64 | Float64 | Float64 | Float64 |
| 1 | ElasticNet | 0.187835 | 0.523177 | 7.07545 | 0.106333 |


In [18]:
# K Nearest Neighbors
KNNRegressor = @load KNNRegressor pkg=NearestNeighborModels verbosity=0
dreg = KNNRegressor()
yreg = KNNRegressor()
results_knn = dml(x, d, y, dreg, yreg, 10)
table_knn = summarize(results_knn..., "KNN")

 coef (se) = 0.7429484013047497(0.5816572915893975)

| Row | model | estimate | stderr | rmse_y | rmse_d |
|---|---|---|---|---|---|
| | String | Float64 | Float64 | Float64 | Float64 |
| 1 | KNN | 0.742948 | 0.581657 | 7.63472 | 0.103197 |


In [None]:
# Gradient Boosting Machine (GBM) using XGBoost
XGBoostRegressor = @load XGBoostRegressor pkg=XGBoost verbosity=0
dreg_gbm = XGBoostRegressor(
    num_round=100,      # number of boosting iterations
    max_depth=3,        # maximum depth of each tree
    eta=0.1,            # learning rate
    booster="gbtree"    # base learner (trees)
)
yreg_gbm = XGBoostRegressor(
    num_round=100, 
    max_depth=3, 
    eta=0.1, 
    booster="gbtree"
)

result_gbm = dml(x, d, y, dreg_gbm, yreg_gbm, 10)
table_gbm = summarize(result_gbm..., "GBM")

 coef (se) = 0.9613631113558706(0.6455538783392295)

| Row | model | estimate | stderr | rmse_y | rmse_d |
|---|---|---|---|---|---|
| | String | Float64 | Float64 | Float64 | Float64 |
| 1 | GBM | 0.961363 | 0.645554 | 6.9665 | 0.0848431 |


In [21]:
# Combine all results into a single table
pretty_table([table_ols; table_lasso; table_rf; table_en; table_knn; table_gbm])

┌────────────┬──────────┬──────────┬─────────┬───────────┐
│      model │ estimate │   stderr │  rmse_y │    rmse_d │
│     String │  Float64 │  Float64 │ Float64 │   Float64 │
├────────────┼──────────┼──────────┼─────────┼───────────┤
│        OLS │  1.57094 │ 0.556658 │  6.9873 │ 0.0986685 │
│      Lasso │  1.45535 │ 0.556109 │ 6.99038 │ 0.0988129 │
│         RF │  1.50536 │ 0.642645 │ 7.40085 │ 0.0905319 │
│ ElasticNet │ 0.187835 │ 0.523177 │ 7.07545 │  0.106333 │
│        KNN │ 0.742948 │ 0.581657 │ 7.63472 │  0.103197 │
│        GBM │ 0.961363 │ 0.645554 │  6.9665 │ 0.0848431 │
└────────────┴──────────┴──────────┴─────────┴───────────┘
