# Part 1: Heterogeneous treatment effects using causal trees and forests

For this part, we will be using experimental data for computing heterogeneous effects through causal trees and forests. For all exercises, the predictors X are all variables that are not the outcome Y or the treatment D.

1.1. Load the data (1 points). This is data for and experiment regarding the National Supported Work Demonstration (NSW) job-training program. You can find the data here, and read a description of the data here. For further details of the experiment and the program, you can use this link

In [49]:
using CSV
using DataFrames  

url = "https://raw.githubusercontent.com/d2cml-ai/CausalAI-Course/main/Labs/Assignment/Assignment_5/data/experimental/experimental_control.csv"
df = CSV.read(download(url), DataFrame)

first(df, 5)

Row,treat,age,educ,black,hisp,marr,nodegree,re74,re75,re78
Unnamed: 0_level_1,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Float64,Float64,Float64
1,1,37,11,1,0,1,1,0.0,0.0,9930.05
2,1,22,9,0,1,0,1,0.0,0.0,3595.89
3,1,30,12,1,0,0,0,0.0,0.0,24909.4
4,1,27,11,1,0,0,1,0.0,0.0,7506.15
5,1,33,8,1,0,0,1,0.0,0.0,289.79


In [50]:
using Pkg
Pkg.add("ScientificTypes")  
using ScientificTypes  

[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `C:\Users\KARL\.julia\environments\v1.11\Project.toml`
[32m[1m  No Changes[22m[39m to `C:\Users\KARL\.julia\environments\v1.11\Manifest.toml`


In [51]:

summary_table = DataFrame(
    names = names(df),
    scitypes = scitype.(eachcol(df)),  
    types = eltype.(eachcol(df))       

)


Row,names,scitypes,types
Unnamed: 0_level_1,String,DataType,DataType
1,treat,AbstractVector{Count},Int64
2,age,AbstractVector{Count},Int64
3,educ,AbstractVector{Count},Int64
4,black,AbstractVector{Count},Int64
5,hisp,AbstractVector{Count},Int64
6,marr,AbstractVector{Count},Int64
7,nodegree,AbstractVector{Count},Int64
8,re74,AbstractVector{Continuous},Float64
9,re75,AbstractVector{Continuous},Float64
10,re78,AbstractVector{Continuous},Float64


1.2. Find the ATE (1.5 points). With re78 as the outcome variable of interest, find the Average Treatment Effect of participation in the program. Specifically, you should find it by calculating the difference between the means of the treatment group and the control group (the Simple Difference of Means or SDM). What can you say about the program?

In [52]:
# First method
using Statistics 

mean_treat = mean(df[df.treat .== 1, :re78])
mean_control = mean(df[df.treat .== 0, :re78])

# Calcular el ATE
ATE = mean_treat - mean_control

println("El Average Treatment Effect (ATE) es: $ATE")

El Average Treatment Effect (ATE) es: 1794.3423818501024


In [53]:
# Second method
using GLM

model = lm(@formula(re78 ~ treat), df)

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

re78 ~ 1 + treat

Coefficients:
───────────────────────────────────────────────────────────────────────
               Coef.  Std. Error      t  Pr(>|t|)  Lower 95%  Upper 95%
───────────────────────────────────────────────────────────────────────
(Intercept)  4554.8      408.046  11.16    <1e-24   3752.85     5356.75
treat        1794.34     632.853   2.84    0.0048    550.574    3038.11
───────────────────────────────────────────────────────────────────────

The program had a positive impact on the participants' income, increasing their earnings by an average of 1794.34 monetary units compared to the control group. It can be concluded that the program is effective in improving participants' income.

1.3. Heterogeneous effects with causal trees (3 points). Use causal trees like we saw in class. For Python, you should use the econml package; for R, use the grf package; and for Julia, you will need to create the auxiliary variable 
Y∗ and fit a decision tree regressor. Report the splits the tree finds and interpret them.

Utiliza árboles causales como se explicó en clase. Para Python, deberías usar el paquete econml; para R, utiliza el paquete grf; y para Julia, necesitarás crear la variable auxiliar 𝑌∗Y ∗  y ajustar un árbol de decisión regresor. Reporta las divisiones que los árboles encuentran e interpreta los resultados.

In [54]:
using Pkg
Pkg.add("MLJ")
Pkg.add("MLJModels")
Pkg.add("RDatasets")
Pkg.add("MLJScikitLearnInterface")

[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `C:\Users\KARL\.julia\environments\v1.11\Project.toml`
[32m[1m  No Changes[22m[39m to `C:\Users\KARL\.julia\environments\v1.11\Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `C:\Users\KARL\.julia\environments\v1.11\Project.toml`
[32m[1m  No Changes[22m[39m to `C:\Users\KARL\.julia\environments\v1.11\Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `C:\Users\KARL\.julia\environments\v1.11\Project.toml`
[32m[1m  No Changes[22m[39m to `C:\Users\KARL\.julia\environments\v1.11\Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `C:\Users\KARL\.julia\environments\v1.11\Project.toml`
[32m[1m  No Changes[22m[39m to `C:\Users\KARL\.julia\environments\v1.11\Manifest.toml`


In [56]:
using MLJ, MLJModels, RDatasets

coerce!(df, 
    :age => Continuous,  
    :educ => Continuous,
)

coerce!(df, Count => Multiclass)

y, X = unpack(df, ==(:re78), !=(:re78));
coerce!(X, Count => Multiclass)

D, X = unpack(X, ==(:treat), !=(:treat));

In [57]:
onehotencoder = @load OneHotEncoder pkg=MLJModels verbosity=0

ohe = onehotencoder(features = [:black,:hisp,:marr,:nodegree])
ohe_machine = machine(ohe, X)
fit!(ohe_machine);
X = MLJ.transform(ohe_machine, X);

[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mTraining machine(OneHotEncoder(features = [:black, :hisp, :marr, :nodegree], …), …).
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mSpawning 2 sub-features to one-hot encode feature :black.
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mSpawning 2 sub-features to one-hot encode feature :hisp.
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mSpawning 2 sub-features to one-hot encode feature :marr.
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mSpawning 2 sub-features to one-hot encode feature :nodegree.


In [58]:
summary_table = DataFrame(
    names = names(df),
    scitypes = scitype.(eachcol(df)),  
    types = eltype.(eachcol(df))       

)


Row,names,scitypes,types
Unnamed: 0_level_1,String,DataType,DataType
1,treat,AbstractVector{Multiclass{2}},"CategoricalValue{Int64, UInt32}"
2,age,AbstractVector{Continuous},Float64
3,educ,AbstractVector{Continuous},Float64
4,black,AbstractVector{Multiclass{2}},"CategoricalValue{Int64, UInt32}"
5,hisp,AbstractVector{Multiclass{2}},"CategoricalValue{Int64, UInt32}"
6,marr,AbstractVector{Multiclass{2}},"CategoricalValue{Int64, UInt32}"
7,nodegree,AbstractVector{Multiclass{2}},"CategoricalValue{Int64, UInt32}"
8,re74,AbstractVector{Continuous},Float64
9,re75,AbstractVector{Continuous},Float64
10,re78,AbstractVector{Continuous},Float64


In [59]:
LogisticClassifier = @load LogisticClassifier pkg=MLJScikitLearnInterface verbosity=0

log_model = LogisticClassifier()

log_model_machine = machine(log_model, X, D)

fit!(log_model_machine)

[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mTraining machine(LogisticClassifier(penalty = l2, …), …).
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


trained Machine; caches model-specific representations of data
  model: LogisticClassifier(penalty = l2, …)
  args: 
    1:	Source @899 ⏎ Table{AbstractVector{Continuous}}
    2:	Source @403 ⏎ AbstractVector{Multiclass{2}}


In [61]:
coerce!(df, :treat => Count)

pscore = pdf.(MLJ.predict(log_model_machine, X),1)
y_star = df.re78./ (df.treat .* pscore .- (1 .- df.treat) .* (1 .- pscore));

In [63]:
using Pkg
Pkg.add("MLJDecisionTreeInterface")

[32m[1m   Resolving[22m[39m package versions...
[32m[1m   Installed[22m[39m ScikitLearnBase ────────── v0.5.0
[32m[1m   Installed[22m[39m AbstractTrees ──────────── v0.4.5
[32m[1m   Installed[22m[39m MLJDecisionTreeInterface ─ v0.4.2
[32m[1m   Installed[22m[39m DecisionTree ───────────── v0.12.4
[32m[1m    Updating[22m[39m `C:\Users\KARL\.julia\environments\v1.11\Project.toml`
  [90m[c6f25543] [39m[92m+ MLJDecisionTreeInterface v0.4.2[39m
[32m[1m    Updating[22m[39m `C:\Users\KARL\.julia\environments\v1.11\Manifest.toml`
  [90m[1520ce14] [39m[92m+ AbstractTrees v0.4.5[39m
  [90m[7806a523] [39m[92m+ DecisionTree v0.12.4[39m
  [90m[c6f25543] [39m[92m+ MLJDecisionTreeInterface v0.4.2[39m
  [90m[6e75b9c4] [39m[92m+ ScikitLearnBase v0.5.0[39m
[92m[1mPrecompiling[22m[39m project...
   3352.5 ms[32m  ✓ [39m[90mAbstractTrees[39m
   2211.5 ms[32m  ✓ [39m[90mScikitLearnBase[39m
   2068.8 ms[32m  ✓ [39m[90mDecisionTree[39m
   2818.0 

In [65]:
first(df, 5)

Row,treat,age,educ,black,hisp,marr,nodegree,re74,re75,re78
Unnamed: 0_level_1,Int64,Float64,Float64,Cat…,Cat…,Cat…,Cat…,Float64,Float64,Float64
1,2,37.0,11.0,1,0,1,1,0.0,0.0,9930.05
2,2,22.0,9.0,0,1,0,1,0.0,0.0,3595.89
3,2,30.0,12.0,1,0,0,0,0.0,0.0,24909.4
4,2,27.0,11.0,1,0,0,1,0.0,0.0,7506.15
5,2,33.0,8.0,1,0,0,1,0.0,0.0,289.79


In [67]:
X = select!(X, Not(:black__0, :hisp__0, :marr__0, :nodegree__0));

In [68]:
forest_model = RandomForestRegressor(n_trees = 100)
forest_machine = machine(forest_model, X, y_star)
fit!(forest_machine);

[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mTraining machine(RandomForestRegressor(max_depth = -1, …), …).


In [69]:
feature_importances(forest_machine)

8-element Vector{Pair{Symbol, Float64}}:
         :age => 0.3108334995673283
        :re74 => 0.23504385459407331
        :re75 => 0.20351408601314336
        :educ => 0.13293173259217456
    :black__1 => 0.03938920370651372
     :marr__1 => 0.03859921629377362
     :hisp__1 => 0.023855838991200553
 :nodegree__1 => 0.015832568241792636

1.4. Heterogeneous effects with causal forests (3 points). Use causal forests like we saw in class. For Python, you should use the econml package; for R, use the grf package; and for Julia, you will need to use the auxiliary variable Y∗ computed in the previous exercise and fit a random forest regressor. Report the importance of the prediction variables.

Usa bosques causales como se explicó en clase. Para Python, deberías usar el paquete econml; para R, utiliza el paquete grf; y para Julia, necesitarás usar la variable auxiliar 𝑌∗Y ∗  calculada en el ejercicio anterior y ajustar un regresor aleatorio de bosques. Reporta la importancia de las variables de predicción.

1.5. Plot heterogeneous effects (1.5 points). Plot how the predicted treatment effect changes depending on a variable of your choice. (You can see the last example in PD11 for clarification of what you should do in this exercise)

Traza cómo cambia el efecto predicho del tratamiento dependiendo de una variable de tu elección. (Puedes ver el último ejemplo en PD11 para una aclaración de lo que debes hacer en este ejercicio).