# Question 2 - DML (Debiased Machine Learning)
## Julia Implementation

**Compatible with Flux v0.16.5**

**All methods from the R version are included**:
- ✅ OLS + Logistic Regression
- ✅ Lasso
- ✅ Random Forest
- ✅ Neural Network (using the new Flux 0.16.5 API)

## 0) Setup and imports

In [1]:
using DataFrames, CSV, Downloads
using Statistics, Random, LinearAlgebra
using GLM          # For OLS and Logistic Regression
using GLMNet       # For Lasso
using DecisionTree # For Random Forest
using Flux         # For Neural Networks
using Distributions # For Normal distribution (only for p-values)
using Printf

Random.seed!(12345)
println("✓ Packages loaded successfully!")
#println("✓ Flux version: ", Flux.version)

✓ Packages loaded successfully!


## 1) Load and Clean Data

In [2]:
# Download the data
url = "https://raw.githubusercontent.com/CausalAIBook/MetricsMLNotebooks/main/data/penn_jae.dat"
data_path = Downloads.download(url)
DT = CSV.read(data_path, DataFrame, delim=' ', ignorerepeated=true)

# Normalize column names to lowercase (rename! applies the function to every name)
rename!(lowercase, DT)

# Filter tg == 0 or tg == 4
DT = DT[in.(DT.tg, Ref([0, 4])), :]

# Create treatment variable
DT.T4 = Int.(DT.tg .== 4)

# Create outcome variable
DT.y = log.(DT.inuidur1)

# Create dep dummies
DT.dep = Int.(DT.dep)
DT.dep_0 = Int.(DT.dep .== 0)
DT.dep_1 = Int.(DT.dep .== 1)
DT.dep_2 = Int.(DT.dep .== 2)

# Handle age variables
if !hasproperty(DT, :agelt35) && hasproperty(DT, :age)
    DT.agelt35 = Int.(DT.age .< 35)
    DT.agegt54 = Int.(DT.age .> 54)
end

# Define X variables
x_vars = [:female, :black, :othrace, 
          :dep_1, :dep_2,
          :q2, :q3, :q4, :q5, :q6,
          :recall, :agelt35, :agegt54,
          :durable, :nondurable, :lusd, :husd]

# Select columns and remove missing
use_cols = vcat([:y, :T4], x_vars)
DT = DT[:, use_cols]
DT = dropmissing(DT)

# Extract vectors and matrix
y = DT.y
d = DT.T4
X = Matrix{Float64}(DT[:, x_vars])
n, p = size(X)

println("✓ Final sample: $n rows, $p predictors")

✓ Final sample: 5099 rows, 17 predictors


## 2) Utility Functions

In [3]:
"""Calculate Root Mean Squared Error"""
rmse(a::Vector, b::Vector) = sqrt(mean((a .- b).^2))

"""Calculate theta and standard error for PLM"""
function plm_theta_se(y_tilde::Vector, d_tilde::Vector)
    theta = sum(d_tilde .* y_tilde) / sum(d_tilde .* d_tilde)
    psi = (y_tilde .- d_tilde .* theta) .* d_tilde
    se = sqrt(mean(psi.^2) / (length(y_tilde) * mean(d_tilde.^2)^2))
    return (theta=theta, se=se)
end

println("✓ Utility functions defined")

✓ Utility functions defined


## 3) Learners

### 3.1) OLS and Logistic Regression

In [4]:
# --- OLS and Logistic Regression ---
function fit_y_ols(X::Matrix, y::Vector)
    n_features = size(X, 2)
    col_names = [Symbol("x$i") for i in 1:n_features]
    df = DataFrame(X, col_names)
    df.y = y
    formula_str = "y ~ " * join(string.(col_names), " + ")
    return lm(eval(Meta.parse("@formula($formula_str)")), df)
end

function pred_y_ols(fit, X::Matrix)
    n_features = size(X, 2)
    col_names = [Symbol("x$i") for i in 1:n_features]
    df = DataFrame(X, col_names)
    return GLM.predict(fit, df)
end

function fit_d_logit(X::Matrix, d::Vector)
    n_features = size(X, 2)
    col_names = [Symbol("x$i") for i in 1:n_features]
    df = DataFrame(X, col_names)
    df.d = d
    formula_str = "d ~ " * join(string.(col_names), " + ")
    return glm(eval(Meta.parse("@formula($formula_str)")), df, Binomial(), LogitLink())
end

function pred_d_logit(fit, X::Matrix)
    n_features = size(X, 2)
    col_names = [Symbol("x$i") for i in 1:n_features]
    df = DataFrame(X, col_names)
    return GLM.predict(fit, df)
end

println("✓ OLS and Logistic functions defined")

✓ OLS and Logistic functions defined


### 3.2) Lasso

In [5]:
# --- Lasso ---
function fit_y_lasso(X::Matrix, y::Vector)
    return glmnetcv(X, y, alpha=1.0)
end

function pred_y_lasso(fit, X::Matrix)
    return vec(GLMNet.predict(fit, X))
end

function fit_d_lasso(X::Matrix, d::Vector)
    return glmnetcv(X, Float64.(d), alpha=1.0)
end

function pred_d_lasso(fit, X::Matrix)
    preds = GLMNet.predict(fit, X)
    return clamp.(vec(preds), 0.0, 1.0)
end

println("✓ Lasso functions defined")

✓ Lasso functions defined


### 3.3) Random Forest

In [14]:
# --- Random Forest ---
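function fit_y_rf(X::Matrix, y::Vector)
    # build_forest takes its hyperparameters positionally, not as keywords:
    # n_subfeatures, n_trees, partial_sampling, max_depth,
    # min_samples_leaf, min_samples_split, min_purity_increase
    return build_forest(y, X,
                        -1,     # n_subfeatures: package default (sqrt of #features)
                        1000,   # n_trees
                        0.7,    # partial_sampling (package default)
                        -1,     # max_depth: no limit
                        5,      # min_samples_leaf
                        2,      # min_samples_split (package default)
                        0.0;    # min_purity_increase (package default)
                        rng=Random.MersenneTwister(1))
end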

function pred_y_rf(fit, X::Matrix)
    return apply_forest(fit, X)
end

function pred_d_rf(fit, X::Matrix{Float64})
    # Binary classification with labels "0" and "1"
    probs = apply_forest_proba(fit, X, ["0", "1"])  # n × 2
    # column 2 = probability of class "1"
    return vec(probs[:, 2])
end


function pred_d_rf(fit, X::Matrix)
    preds = apply_forest_proba(fit, X, ["0", "1"])
    return preds[:, 2]
end

println("✓ Random Forest functions defined")

✓ Random Forest functions defined


In [None]:
# --- Random Forest (compatible with the current DecisionTree.jl) ---
# using DecisionTree

function fit_y_rf(X::Matrix{Float64}, y::Vector{Float64})
    n_features = size(X, 2)
    # number of features tried per split (typical choice: sqrt(p))
    n_subfeatures       = max(1, round(Int, sqrt(n_features)))
    n_trees             = 1000
    partial_sampling    = 0.7      # fraction of rows sampled for each tree
    max_depth           = -1       # no limit
    min_samples_leaf    = 5
    min_samples_split   = 2
    min_purity_increase = 0.0
    rng                 = Random.MersenneTwister(1)

    return build_forest(
        y, X,
        n_subfeatures,
        n_trees,
        partial_sampling,
        max_depth,
        min_samples_leaf,
        min_samples_split,
        min_purity_increase;
        rng = rng,
    )
end

function pred_y_rf(fit, X::Matrix{Float64})
    return apply_forest(fit, X)
end

function fit_d_rf(X::Matrix{Float64}, d::Vector)
    # classification: labels as strings
    d_labels = string.(d)

    n_features = size(X, 2)
    n_subfeatures       = max(1, round(Int, sqrt(n_features)))
    n_trees             = 1000
    partial_sampling    = 0.7
    max_depth           = -1
    min_samples_leaf    = 5
    min_samples_split   = 2
    min_purity_increase = 0.0
    rng                 = Random.MersenneTwister(1)

    return build_forest(
        d_labels, X,
        n_subfeatures,
        n_trees,
        partial_sampling,
        max_depth,
        min_samples_leaf,
        min_samples_split,
        min_purity_increase;
        rng = rng,
    )
end

function pred_d_rf(fit, X::Matrix{Float64})
    # Returns an n×K matrix of class probabilities, with columns in the
    # order of the labels passed in. For labels "0","1", column 2 is P(d=1|X).
    probs = apply_forest_proba(fit, X, ["0", "1"])
    return vec(probs[:, 2])
end


pred_d_rf (generic function with 2 methods)

In [23]:
using DecisionTree
using Random

# ------------------------------------------------------------------
# 1) Compatibility wrapper for apply_forest_proba (2 arguments)
#    -> forwards internally to the 3-argument method
# ------------------------------------------------------------------
import DecisionTree: apply_forest_proba, Ensemble

function apply_forest_proba(forest::Ensemble{S,T},
                            X::AbstractMatrix{S}) where {S<:AbstractFloat, T}
    # d is binary (0/1) in this exercise, so the labels are "0" and "1"
    return DecisionTree.apply_forest_proba(forest, X, ["0", "1"])
end

# ------------------------------------------------------------------
# 2) Random Forest for y (regression)
# ------------------------------------------------------------------
function fit_y_rf(X::AbstractMatrix, y::AbstractVector)
    Xf = Array{Float64}(X)
    yf = Array{Float64}(y)

    n_subfeatures       = -1      # use the default (sqrt(#features))
    n_trees             = 1000
    partial_sampling    = 0.7
    max_depth           = -1      # no limit
    min_samples_leaf    = 5
    min_samples_split   = 2
    min_purity_increase = 0.0
    seed                = 1

    return build_forest(
        yf, Xf,
        n_subfeatures,
        n_trees,
        partial_sampling,
        max_depth,
        min_samples_leaf,
        min_samples_split,
        min_purity_increase;
        rng = seed,
    )
end

function pred_y_rf(fit, X::AbstractMatrix)
    Xf = Array{Float64}(X)
    return apply_forest(fit, Xf)
end

# ------------------------------------------------------------------
# 3) Random Forest for d (binary classification)
# ------------------------------------------------------------------
function fit_d_rf(X::AbstractMatrix, d::AbstractVector)
    Xf       = Array{Float64}(X)
    d_labels = string.(d)         # "0","1"

    n_subfeatures       = -1
    n_trees             = 1000
    partial_sampling    = 0.7
    max_depth           = -1
    min_samples_leaf    = 5
    min_samples_split   = 2
    min_purity_increase = 0.0
    seed                = 1

    return build_forest(
        d_labels, Xf,
        n_subfeatures,
        n_trees,
        partial_sampling,
        max_depth,
        min_samples_leaf,
        min_samples_split,
        min_purity_increase;
        rng = seed,
    )
end

function pred_d_rf(fit, X::AbstractMatrix)
    Xf = Array{Float64}(X)

    # NOTE: this calls the 2-argument wrapper defined above
    probs = apply_forest_proba(fit, Xf)   # n × 2, columns: "0","1"

    # Probability of d=1 (column for label "1")
    return vec(probs[:, 2])
end


pred_d_rf (generic function with 3 methods)

### 3.4) Neural Network - ✅ FLUX 0.16.5 API

**Compatible with Flux 0.16.5**: uses the explicit-state training API (`Flux.setup`, `Flux.withgradient`, `Flux.update!`)

In [16]:
# --- Neural Network - API FLUX 0.16.5 ---
function fit_y_nn(X::Matrix, y::Vector; epochs=100, hidden_size=32)
    X_t = Float32.(X')
    y_t = Float32.(reshape(y, 1, :))
    
    n_features = size(X, 2)
    model = Chain(
        Dense(n_features => hidden_size, relu),
        Dense(hidden_size => hidden_size, relu),
        Dense(hidden_size => 1)
    )
    
    # ✅ Flux 0.16.5: Setup optimizer state
    opt_state = Flux.setup(Adam(0.001), model)
    
    # ✅ Flux 0.16.5: Training loop (no deprecation warnings)
    for epoch in 1:epochs
        loss, grads = Flux.withgradient(model) do m
            ŷ = m(X_t)
            Flux.mse(ŷ, y_t)
        end
        
        Flux.update!(opt_state, model, grads[1])
    end
    
    return model
end

function pred_y_nn(fit, X::Matrix)
    X_t = Float32.(X')
    return vec(fit(X_t))
end

function fit_d_nn(X::Matrix, d::Vector; epochs=100, hidden_size=32)
    X_t = Float32.(X')
    d_t = Float32.(reshape(d, 1, :))
    
    n_features = size(X, 2)
    model = Chain(
        Dense(n_features => hidden_size, relu),
        Dense(hidden_size => hidden_size, relu),
        Dense(hidden_size => 1, σ)
    )
    
    # ✅ Flux 0.16.5: Setup optimizer state
    opt_state = Flux.setup(Adam(0.001), model)
    
    # ✅ Flux 0.16.5: Training loop
    for epoch in 1:epochs
        loss, grads = Flux.withgradient(model) do m
            ŷ = m(X_t)
            Flux.Losses.binarycrossentropy(ŷ, d_t)
        end
        
        Flux.update!(opt_state, model, grads[1])
    end
    
    return model
end

function pred_d_nn(fit, X::Matrix)
    X_t = Float32.(X')
    return vec(fit(X_t))
end

println("✓ Neural Network functions defined (Flux 0.16.5 API)")

✓ Neural Network functions defined (Flux 0.16.5 API)


## 4) Learner Dictionary

In [24]:
learners = Dict(
    "OLS+LOGIT" => (
        ml_y = (fit = fit_y_ols,   pred = pred_y_ols),
        ml_d = (fit = fit_d_logit, pred = pred_d_logit)
    ),
    "LASSO" => (
        ml_y = (fit = fit_y_lasso, pred = pred_y_lasso),
        ml_d = (fit = fit_d_lasso, pred = pred_d_lasso)
    ),
    "RF" => (
        ml_y = (fit = fit_y_rf, pred = pred_y_rf),
        ml_d = (fit = fit_d_rf, pred = pred_d_rf)
    ),
    "NN" => (
        ml_y = (fit = fit_y_nn, pred = pred_y_nn),
        ml_d = (fit = fit_d_nn, pred = pred_d_nn)
    )
)

println("✓ Learners dictionary created (4 methods - same as R)")

✓ Learners dictionary created (4 methods - same as R)


## 5) DML Functions

### 5.1) DML with Cross-Fitting

In [18]:
function dml_plm(y::Vector, d::Vector, X::Matrix, K::Int;
                 ml_y, ml_d, return_nuisance_rmse::Bool=true)
    
    n = length(y)
    indices = shuffle(1:n)
    fold_size = div(n, K)
    
    y_tilde = zeros(n)
    d_tilde = zeros(n)
    y_pred_all = zeros(n)
    d_pred_all = zeros(n)
    
    for k in 1:K
        test_start = (k-1) * fold_size + 1
        test_end = k == K ? n : k * fold_size
        
        test_idx = indices[test_start:test_end]
        train_idx = setdiff(indices, test_idx)
        
        X_train, X_test = X[train_idx, :], X[test_idx, :]
        y_train, y_test = y[train_idx], y[test_idx]
        d_train, d_test = d[train_idx], d[test_idx]
        
        # Fit and predict for y
        fit_y = ml_y.fit(X_train, y_train)
        y_pred = ml_y.pred(fit_y, X_test)
        
        # Fit and predict for d
        fit_d = ml_d.fit(X_train, d_train)
        d_pred = ml_d.pred(fit_d, X_test)
        
        y_tilde[test_idx] = y_test .- y_pred
        d_tilde[test_idx] = d_test .- d_pred
        
        if return_nuisance_rmse
            y_pred_all[test_idx] = y_pred
            d_pred_all[test_idx] = d_pred
        end
    end
    
    est = plm_theta_se(y_tilde, d_tilde)
    
    if return_nuisance_rmse
        rmse_y = rmse(y, y_pred_all)
        rmse_d = rmse(Float64.(d), d_pred_all)
        return (theta=est.theta, se=est.se, rmse_y=rmse_y, rmse_d=rmse_d)
    else
        return est
    end
end

println("✓ DML with cross-fitting defined")

✓ DML with cross-fitting defined


### 5.2) DML without Cross-Fitting

In [19]:
function dml_plm_no_cf(y::Vector, d::Vector, X::Matrix, K::Int;
                       ml_y, ml_d, return_nuisance_rmse::Bool=true)
    
    # Train on full data
    fit_y = ml_y.fit(X, y)
    fit_d = ml_d.fit(X, d)
    
    # Predict on full data (in-sample)
    y_pred = ml_y.pred(fit_y, X)
    d_pred = ml_d.pred(fit_d, X)
    
    # Residualize
    y_tilde = y .- y_pred
    d_tilde = d .- d_pred
    
    est = plm_theta_se(y_tilde, d_tilde)
    
    if return_nuisance_rmse
        rmse_y = rmse(y, y_pred)
        rmse_d = rmse(Float64.(d), d_pred)
        return (theta=est.theta, se=est.se, rmse_y=rmse_y, rmse_d=rmse_d)
    else
        return est
    end
end

println("✓ DML without cross-fitting defined")

✓ DML without cross-fitting defined


## 6) Run Both Methods

In [12]:
function run_block(fun, y::Vector, d::Vector, X::Matrix, K::Int, 
                   learners::Dict)
    results = DataFrame(
        Method = String[],
        theta = Float64[],
        se = Float64[],
        pval = Float64[],
        rmse_y = Float64[],
        rmse_d = Float64[]
    )
    
    for (name, ml) in learners
        println("  Running $name...")
        Random.seed!(42)
        
        est = fun(y, d, X, K; ml_y=ml.ml_y, ml_d=ml.ml_d)
        
        pval = 2 * cdf(Normal(0, 1), -abs(est.theta / est.se))
        
        push!(results, (
            Method = name,
            theta = est.theta,
            se = est.se,
            pval = pval,
            rmse_y = est.rmse_y,
            rmse_d = est.rmse_d
        ))
    end
    
    return results
end

println("✓ Execution functions defined")

✓ Execution functions defined


In [25]:
println("="^70)
println("Running DML with Cross-Fitting...")
println("="^70)

K = 2
tab_cf = run_block(dml_plm, y, d, X, K, learners)
tab_cf.CrossFitting .= "Yes"

println("\n✓ Cross-fitting completed!")
tab_cf

Running DML with Cross-Fitting...
  Running LASSO...


  Running NN...
  Running OLS+LOGIT...
  Running RF...

✓ Cross-fitting completed!


| Row | Method | theta | se | pval | rmse_y | rmse_d | CrossFitting |
|---|---|---|---|---|---|---|---|
| 1 | LASSO | -0.0706055 | 0.03534 | 0.0457281 | 1.20175 | 0.475281 | Yes |
| 2 | NN | -0.0831491 | 0.0370581 | 0.0248484 | 1.26961 | 0.479523 | Yes |
| 3 | OLS+LOGIT | -0.0657776 | 0.0350882 | 0.0608426 | 1.20197 | 0.478827 | Yes |
| 4 | RF | -0.0227687 | 0.0331474 | 0.492151 | 1.21583 | 0.514819 | Yes |


In [26]:
println("="^70)
println("Running DML WITHOUT Cross-Fitting...")
println("="^70)

tab_nocf = run_block(dml_plm_no_cf, y, d, X, K, learners)
tab_nocf.CrossFitting .= "No"

println("\n✓ No cross-fitting completed!")
tab_nocf

Running DML WITHOUT Cross-Fitting...
  Running LASSO...
  Running NN...
  Running OLS+LOGIT...
  Running RF...

✓ No cross-fitting completed!


| Row | Method | theta | se | pval | rmse_y | rmse_d | CrossFitting |
|---|---|---|---|---|---|---|---|
| 1 | LASSO | -0.0726661 | 0.0351212 | 0.0385449 | 1.19057 | 0.473694 | No |
| 2 | NN | -0.0876378 | 0.0368471 | 0.017387 | 1.24319 | 0.472044 | No |
| 3 | OLS+LOGIT | -0.0726457 | 0.0351153 | 0.0385672 | 1.19047 | 0.473362 | No |
| 4 | RF | -0.0684385 | 0.0331128 | 0.0387504 | 1.13257 | 0.481151 | No |


In [27]:
# Combine results
results_all = vcat(tab_cf, tab_nocf)
sort!(results_all, [:CrossFitting, :Method])

println("="^70)
println("ALL RESULTS")
println("="^70)
results_all

ALL RESULTS


| Row | Method | theta | se | pval | rmse_y | rmse_d | CrossFitting |
|---|---|---|---|---|---|---|---|
| 1 | LASSO | -0.0726661 | 0.0351212 | 0.0385449 | 1.19057 | 0.473694 | No |
| 2 | NN | -0.0876378 | 0.0368471 | 0.017387 | 1.24319 | 0.472044 | No |
| 3 | OLS+LOGIT | -0.0726457 | 0.0351153 | 0.0385672 | 1.19047 | 0.473362 | No |
| 4 | RF | -0.0684385 | 0.0331128 | 0.0387504 | 1.13257 | 0.481151 | No |
| 5 | LASSO | -0.0706055 | 0.03534 | 0.0457281 | 1.20175 | 0.475281 | Yes |
| 6 | NN | -0.0831491 | 0.0370581 | 0.0248484 | 1.26961 | 0.479523 | Yes |
| 7 | OLS+LOGIT | -0.0657776 | 0.0350882 | 0.0608426 | 1.20197 | 0.478827 | Yes |
| 8 | RF | -0.0227687 | 0.0331474 | 0.492151 | 1.21583 | 0.514819 | Yes |


## 7) OLS with Controls as Benchmark

In [34]:
import Pkg; Pkg.add("StatsModels")

[32m[1m   Resolving[22m[39m package versions...
[32m[1m    Updating[22m[39m `C:\Users\ASUS\.julia\environments\v1.11\Project.toml`
  [90m[3eaba693] [39m[92m+ StatsModels v0.7.7[39m
[32m[1m  No Changes[22m[39m to `C:\Users\ASUS\.julia\environments\v1.11\Manifest.toml`


In [35]:
using StatsBase
using GLM
using StatsModels



In [37]:
println("="^70)
println("Running OLS with Controls...")
println("="^70)

# Keep a DataFrame copy of the data for inspection:
df_full = DataFrame(X, x_vars)
df_full.y = y
df_full.d = d

n, p = size(X)

# ---------------- OLS: y ~ d + X (intercept + d + controls) ----------------
X_ols = hcat(ones(n), d, X)   # columns: 1=intercept, 2=d, 3..=controls

ols_full = lm(X_ols, y)       # GLM matrix interface

ols_coef_table = GLM.coeftable(ols_full)
d_idx = 2                     # the second column is d

theta_ols_controls = GLM.coef(ols_full)[d_idx]
se_ols_controls    = GLM.stderror(ols_full)[d_idx]
pval_ols_controls  = ols_coef_table.cols[4][d_idx]

y_pred_ols = GLM.predict(ols_full, X_ols)
rmse_y_ols = rmse(y, y_pred_ols)

# ---------------- LOGIT: d ~ X (intercept + controls X) ----------------
X_logit = hcat(ones(n), X)    # d is NOT included as a regressor here

logit_d = glm(X_logit, d, Binomial(), LogitLink())
d_pred_logit = GLM.predict(logit_d, X_logit)
rmse_d_ols = rmse(Float64.(d), d_pred_logit)

# ---------------- Append the row to results_all ----------------
ols_row = DataFrame(
    Method       = ["OLS with controls"],
    theta        = [theta_ols_controls],
    se           = [se_ols_controls],
    pval         = [pval_ols_controls],
    rmse_y       = [rmse_y_ols],
    rmse_d       = [rmse_d_ols],
    CrossFitting = ["N/A"],
)

results_all = vcat(results_all, ols_row)
sort!(results_all, [:CrossFitting, :Method])

println("✓ OLS with controls completed!")


Running OLS with Controls...
✓ OLS with controls completed!


## 8) Model Selection

In [38]:
println("="^70)
println("Model Selection...")
println("="^70)

tab_cf_sorted = sort(tab_cf, :se)
best_cf = tab_cf_sorted[1, :]

println("\n✓ Best model (smallest SE with cross-fitting):")
println("  Method: ", best_cf.Method)
@printf("  theta=%.4f, se=%.4f, pval=%.4g\n", 
        best_cf.theta, best_cf.se, best_cf.pval)

best_cf

Model Selection...

✓ Best model (smallest SE with cross-fitting):
  Method: RF
  theta=-0.0228, se=0.0331, pval=0.4922


| Row | Method | theta | se | pval | rmse_y | rmse_d | CrossFitting |
|---|---|---|---|---|---|---|---|
| 1 | RF | -0.0227687 | 0.0331474 | 0.492151 | 1.21583 | 0.514819 | Yes |


## 9) Print Readable Tables

In [39]:
function print_table(tab::DataFrame, title::String)
    println("\n $title ")
    println("-"^70)
    
    tab_display = select(tab, 
        :CrossFitting, :Method,
        :theta => (x -> round.(x, digits=4)) => :theta,
        :se => (x -> round.(x, digits=4)) => :se,
        :pval => (x -> round.(x, sigdigits=3)) => :pval,
        :rmse_y => (x -> round.(x, digits=4)) => :rmse_y,
        :rmse_d => (x -> round.(x, digits=4)) => :rmse_d
    )
    
    println(tab_display)
    println("-"^70)
end

print_table(filter(row -> row.CrossFitting == "Yes", results_all), 
            "Table A. DML with cross-fitting")

print_table(filter(row -> row.CrossFitting == "No", results_all), 
            "Table B. DML without cross-fitting")

print_table(results_all, 
            "Appendix. All models (including OLS with controls)")


 Table A. DML with cross-fitting 
----------------------------------------------------------------------
4×7 DataFrame
 Row │ CrossFitting  Method     theta    se       pval     rmse_y   rmse_d  
     │ String        String     Float64  Float64  Float64  Float64  Float64 
─────┼──────────────────────────────────────────────────────────────────────
   1 │ Yes           LASSO      -0.0706   0.0353   0.0457   1.2017   0.4753
   2 │ Yes           NN         -0.0831   0.0371   0.0248   1.2696   0.4795
   3 │ Yes           OLS+LOGIT  -0.0658   0.0351   0.0608   1.202    0.4788
   4 │ Yes           RF         -0.0228   0.0331   0.492    1.2158   0.5148
----------------------------------------------------------------------

 Table B. DML without cross-fitting 
----------------------------------------------------------------------
4×7 DataFrame
 Row 

## 10) Answers

### PLM and DML
We estimate the partially linear model:
$$y = \theta d + g_0(X) + \varepsilon, \quad d = m_0(X) + \nu$$

DML uses cross-fitting to build out-of-sample residuals from the two nuisance regressions $\ell_0(X) = E[y \mid X]$ and $m_0(X) = E[d \mid X]$:
$$\tilde{y} = y - \hat{\ell}(X), \; \tilde{d} = d - \hat{m}(X)$$

and estimates:
$$\hat{\theta} = \frac{\sum_i \tilde{d}_i \tilde{y}_i}{\sum_i \tilde{d}_i^2}$$

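with standard errors based on the influence function (IF), as implemented in `plm_theta_se`:
$$\widehat{\mathrm{se}}(\hat{\theta}) = \sqrt{\frac{\frac{1}{n}\sum_i \hat{\psi}_i^2}{n \left(\frac{1}{n}\sum_i \tilde{d}_i^2\right)^2}}, \quad \hat{\psi}_i = (\tilde{y}_i - \hat{\theta}\,\tilde{d}_i)\,\tilde{d}_i$$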
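As a sanity check on the estimator and its standard error, the following minimal sketch simulates a partially linear model with a known $\theta$ and residualizes with the true conditional means (an oracle that is not available in practice). The data-generating process, the sample size, the continuous treatment, and the name `theta_true` are illustrative assumptions and are not part of the analysis above.

```julia
using Random, Statistics

Random.seed!(1)
n = 20_000
theta_true = 0.5                 # assumed true effect (illustrative)

x  = randn(n)
m0 = 0.5 .* x                    # assumed E[d | x]; d is continuous here for simplicity
g0 = sin.(x)                     # assumed g_0(x)
d  = m0 .+ randn(n)
y  = theta_true .* d .+ g0 .+ randn(n)

# Oracle residuals: subtract E[y | x] = theta * m0 + g0 and E[d | x] = m0
y_tilde = y .- (theta_true .* m0 .+ g0)
d_tilde = d .- m0

# Same formulas as plm_theta_se
theta_hat = sum(d_tilde .* y_tilde) / sum(d_tilde .^ 2)
psi = (y_tilde .- theta_hat .* d_tilde) .* d_tilde
se  = sqrt(mean(psi .^ 2) / (n * mean(d_tilde .^ 2)^2))

println("theta_hat = ", round(theta_hat, digits=4), ", se = ", round(se, digits=4))
```

With unit-variance errors in both equations, the standard error should be close to $1/\sqrt{n}$, and `theta_hat` should land within a few standard errors of `theta_true`.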

### Cross-fitting vs no cross-fitting
- RMSE for predicting $y$ and $d$ is **smaller** without cross-fitting for every learner, because of in-sample optimism. The gap is largest for RF (rmse_y 1.133 vs 1.216; rmse_d 0.481 vs 0.515).
- Lower RMSE there does **not** mean better causal inference; it reflects **overfitting** of the nuisance functions.
- Without cross-fitting, regularization bias leaks into the estimate of $\theta$, producing **bias** and **non-conservative inference**.
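
The in-sample optimism described in this section can be reproduced with a small simulation. This is a toy sketch under stated assumptions: the outcome is pure noise, the 50 regressors are irrelevant, and least squares via `\` stands in for a flexible learner. None of these choices come from the analysis above.

```julia
using Random, Statistics, LinearAlgebra

Random.seed!(1)
n, p, K = 400, 50, 2
X = randn(n, p)          # irrelevant regressors (illustrative)
y = randn(n)             # pure noise: no learner can beat an RMSE of about 1

rmse(a, b) = sqrt(mean((a .- b) .^ 2))

# In-sample: fit and predict on the same observations
beta_full = X \ y
rmse_in = rmse(y, X * beta_full)

# Cross-fitted: each fold is predicted by a model trained on the other fold(s)
idx = shuffle(1:n)
folds = [idx[k:K:n] for k in 1:K]
pred_cf = zeros(n)
for test in folds
    train = setdiff(idx, test)
    beta = X[train, :] \ y[train]
    pred_cf[test] = X[test, :] * beta
end
rmse_cf = rmse(y, pred_cf)

println("in-sample RMSE = ", round(rmse_in, digits=3),
        ", cross-fitted RMSE = ", round(rmse_cf, digits=3))
```

The in-sample RMSE looks better only because the model has fit the noise in the same observations it is scored on; the cross-fitted figure is the honest measure of predictive quality.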

### Selected model
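Choose the cross-fitted method with the smallest SE in Table A and report its $\hat{\theta}$ as the final effect. In this run that is RF, with $\hat{\theta} = -0.0228$, SE $= 0.0331$, and p-value $= 0.492$.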
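
To report the selected estimate together with its uncertainty, a 95% confidence interval can be formed under the usual normal approximation. The sketch below hard-codes the RF row from the cross-fitting results above and uses 1.96 as an approximation to the 0.975 standard-normal quantile; the variable names are illustrative.

```julia
theta_hat = -0.0227687   # RF with cross-fitting (from the results table)
se        = 0.0331474

z = 1.96                 # approximate 0.975 quantile of N(0, 1)
ci = (theta_hat - z * se, theta_hat + z * se)

println("95% CI: [", round(ci[1], digits=4), ", ", round(ci[2], digits=4), "]")
```

The interval covers zero, which agrees with the reported p-value of 0.492. Because the outcome is `log(inuidur1)`, $\hat{\theta}$ reads approximately as a proportional change in the outcome.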