### Double Machine Learning (DML)

DML is an algorithm that applies machine learning methods to fit the treatment and response, then uses a linear model to predict the response residuals from the treatment residuals.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/WinterSchool2026/ch09-causal-inference-extremes/blob/main/notebooks/04_causal_models.ipynb)

In [None]:
# Upgrade pip first for better dependency resolution
!pip install -U pip

# Install packages, ensuring numpy is at a version compatible with most 2024-2025 builds
!pip install -q pycaret econml numba xarray zarr fsspec aiohttp geopandas dask netcdf4 h5netcdf "numpy<2.0"

In [None]:
import numpy as np
import os
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
import sys
from google.colab import drive

from econml.dml import LinearDML, CausalForestDML

%matplotlib inline
import warnings
warnings.filterwarnings("ignore")

In [None]:
# 1. Mount drive if you haven't already
drive.mount('/content/drive')

In [None]:
# 2. Append the PARENT directory (notebooks), not the utils folder itself
path_to_parent = '/content/drive/MyDrive/09_challenge_EllisWinterSchool'
if path_to_parent not in sys.path:
    sys.path.append(path_to_parent)

# 3. Now Python sees 'utils' as a package inside 'notebooks'
import utils.utils
from utils.utils import *

print("✅ Success! Functions imported.")

Load sample data (trimmed)

In [None]:
samples = pd.read_csv("/content/drive/MyDrive/09_challenge_EllisWinterSchool/df_ps_trimmed.csv")

print(f"Shape of sampled data: {samples.shape}")

Set the variables: Outcome (target), Treatment, Heterogeneity (zones), Confounders.

In [None]:
target = ['DI_agri_extreme_M7']

treatment = ['SMA_2']

zones = ['basin_lv2']

vars_list = ['E_gleam_ds','S_gleam_ds','H_gleam_ds',
            'pev_ds','sro_ds','sp_ds','tp_ds','d2m_ds',
            'agri_irri', 'agri_mix', 'agri_rain',
            'soil_clay', 'soil_oc', 'soil_roots','soil_sand', 'soil_tawc',
            'lst_night_ds','ndvi_ds','ndwi_ds',
            'pop','road','hand','lc2','lc3','lc5','lc8',
            'censo','soi_long','pdo_timeseries_sstens','noaa_globaltmp_comb']

Encode $X$ (the heterogeneity features): create one-hot (dummy) variables, one per zone.

In [None]:
zones_encoded = encode_categorical_raster(samples[zones[0]], prefix='zone')
samples_zones = samples.join(zones_encoded)
zone_vars = [v for v in samples_zones.columns if v.startswith(('zone'))]
samples_zones.head()
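
`encode_categorical_raster` comes from the course `utils` module, whose implementation is not shown here. As a rough stand-in, assuming it behaves like a standard one-hot encoder, the same kind of zone indicators can be built with `pandas.get_dummies`. The toy `basin_lv2` ids below are made up for illustration.

```python
import pandas as pd

# Toy stand-in for samples[zones[0]]; the basin ids are made up for illustration
toy = pd.DataFrame({"basin_lv2": [11, 12, 11, 13, 12]})

# One indicator column per zone, named zone_<id>, stored as 0/1 integers
toy_encoded = pd.get_dummies(toy["basin_lv2"], prefix="zone", dtype=int)
toy_zones = toy.join(toy_encoded)

# Collect the indicator columns the same way the notebook does
toy_zone_vars = [c for c in toy_zones.columns if c.startswith("zone")]
print(toy_zone_vars)
print(toy_zones)
```

Each row has exactly one indicator set to 1, which is what lets a linear CATE model assign one coefficient per zone.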

### Show counts of positives/negatives for treatment and outcome

In [None]:
# Cross-tabulate treatment against outcome
# Rows = Treatment, Columns = Target
ct = pd.crosstab(samples_zones[treatment[0]], samples_zones[target[0]])

# Ensure all four cells exist, even if one combination never occurs
ct = ct.reindex(index=[0, 1], columns=[0, 1], fill_value=0)

# Extract values using .loc[row, col]
NN = ct.loc[0, 0] # Treatment 0, Outcome 0 (T=0, O=0)
NP = ct.loc[1, 0] # Treatment 1, Outcome 0 (T=1, O=0)
PN = ct.loc[0, 1] # Treatment 0, Outcome 1 (T=0, O=1)
PP = ct.loc[1, 1] # Treatment 1, Outcome 1 (T=1, O=1)

results = []

results.append({
    'Treatment': treatment[0],
    '(T=0, O=0)': NN,
    '(T=0, O=1)': PN,
    '(T=1, O=0)': NP,
    '(T=1, O=1)': PP,
    'Tot_P_Treat': NP + PP,
    'Tot_P_Out': PN + PP,
})

# Display the summary dataframe
summary_df = pd.DataFrame(results)
print("Outcome:")
print(samples_zones[target[0]].value_counts())
print("\nContingency table (Treatment x Outcome):")
display(summary_df)

#### Visualize Treatment and Outcome for each heterogeneous zone:

In [None]:
# 1. Initialize a list to store results for all zones
target_col = target[0] if isinstance(target, list) else target
all_zone_results = []

# 2. Loop through each zone and cross-tabulate treatment against outcome
for zone in zone_vars:
    # Keep only the samples that fall inside this zone
    in_zone = samples_zones[samples_zones[zone] == 1]

    # Generate crosstab: Row = Treatment (0 or 1), Column = Outcome (0 or 1)
    ct = pd.crosstab(in_zone[treatment[0]], in_zone[target_col])

    # Ensure all quadrants exist (2x2)
    ct = ct.reindex(index=[0, 1], columns=[0, 1], fill_value=0)

    # Append the results for this specific zone
    all_zone_results.append({
        'Zone': zone,
        '(T=0, O=0)': ct.loc[0, 0],  # Untreated, no outcome
        '(T=0, O=1)': ct.loc[0, 1],  # Untreated, outcome exists
        '(T=1, O=0)': ct.loc[1, 0],  # Treated, no outcome
        '(T=1, O=1)': ct.loc[1, 1]   # Treated, outcome exists
    })

# 3. Create the consolidated summary DataFrame
summary_df = pd.DataFrame(all_zone_results).set_index('Zone')

# 4. Define colors
# Blue/Light Blue for Negatives, Orange/Gold for Positives
colors = ['#aec7e8', '#1f77b4', '#ebb95e', '#d18b00']

# 5. Plot everything in one figure
fig, ax = plt.subplots(figsize=(9, 5), dpi=100)

summary_df[['(T=0, O=0)', '(T=0, O=1)', '(T=1, O=0)', '(T=1, O=1)']].plot(
    kind='bar', 
    stacked=True, 
    color=colors, 
    ax=ax, 
    edgecolor='white',
    width=0.8
)

# Formatting
plt.title('Data Imbalance across Regions (Zones)', fontsize=12, fontweight='bold')
plt.ylabel('Number of Observations', fontsize=12)
plt.xlabel('Zone (heterogeneity)', fontsize=12)
plt.legend(title='(Treatment,Outcome)', bbox_to_anchor=(1.02, 1), loc='upper left')
plt.xticks(rotation=45, ha='right')
plt.grid(axis='y', linestyle='--', alpha=0.3)
plt.tight_layout()

plt.show()

#### Visualize spatial bias (T/O) to check if we have enough representation

In [None]:
plot_spatial_bias(samples_zones, treatment[0], target[0])

---

## Double Machine Learning (EconML)

Double Machine Learning (DML) is an algorithm that applies machine learning methods to fit the treatment and response, then uses a model to predict the response residuals from the treatment residuals.

The EconML library provides several DML estimators, including:

- **LinearDML**: estimates heterogeneous treatment effects under the assumption that the effect is linear in $X$.

- **CausalForestDML**: estimates heterogeneous treatment effects (Conditional Average Treatment Effect, CATE) by combining Double Machine Learning (DML) with Causal Forests. It is a popular estimator for causal inference, designed to identify how a treatment effect varies across segments of a population based on individual characteristics, even in high-dimensional settings.

$Y$ – Outcomes for each sample

$T$ – Treatments for each sample

$X$ – Features for each sample (here: regions), used to find variation in the effect (CATE). These are the variables along which you suspect the treatment effect might differ. They are one-hot encoded.

$W$ – Controls for each sample (confounders), used to remove bias. These are variables that affect both the treatment ($T$) and the outcome ($Y$). We include them to ensure we are comparing "apples to apples." However, we do not care whether the treatment effect changes with these variables.

In Double ML, the confounders ($W$) are used in the "first stage" to clean the data (residualization).

Once the data is cleaned, the model only looks at $X$ to see how the effect changes.

The model does not calculate "causal importance" for confounders because, by definition, we are "controlling" for them, not measuring their treatment effect.

#### Define your data structure

In [None]:
# Define your variables
Y = samples_zones[target].values.ravel()
T = samples_zones[treatment].values.ravel()
W = samples_zones[vars_list]
X = samples_zones[zone_vars]

print(f"Dataset shapes: Y={Y.shape}, T={T.shape}, W={W.shape}, X={X.shape}")


#### Train estimator using LinearDML

In [None]:
## TODO: train your model here without the heterogeneity variables (X)

In causal inference, the Average Treatment Effect (ATE) is the average difference in the outcome ($Y$) caused by the treatment ($T$) across the sampled population.

The ATE tells you what would happen if you took your entire group and forced everyone to take the treatment, versus what would happen if you forced everyone to take the control.

Mathematically: $ATE = E[Y(1) - Y(0)]$

On average, the treatment changes the outcome by $ATE$ units compared to not having the treatment.

Since $Y$ is binary, the ATE is the change in the probability of a drought impact: an ATE of $0.05$ means an increase of 5 percentage points.
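
To make the definition concrete, here is a small simulation. It is only a sketch: the impact probabilities (0.10 without treatment, 0.25 with it) and the randomized treatment are assumptions chosen for illustration, not values from the drought data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical potential outcomes: impact probability 0.10 without treatment, 0.25 with it
y0 = rng.binomial(1, 0.10, size=n)   # Y(0)
y1 = rng.binomial(1, 0.25, size=n)   # Y(1)

# True ATE: average of the individual differences Y(1) - Y(0)
true_ate = np.mean(y1 - y0)

# In practice only one potential outcome is observed per unit.
# With a randomized treatment, the difference in observed means estimates the ATE.
t = rng.binomial(1, 0.5, size=n)
y_obs = np.where(t == 1, y1, y0)
est_ate = y_obs[t == 1].mean() - y_obs[t == 0].mean()

print(f"True ATE:      {true_ate:.3f}")
print(f"Estimated ATE: {est_ate:.3f} (about {100 * est_ate:.0f} percentage points)")
```

In observational data the treatment is not randomized, so a raw difference in means is biased by confounders. That is the gap the residualization on $W$ is meant to close.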

This ATE is specifically the causal effect after "cleaning out" the influence of your confounders ($W$).

By using the Frisch-Waugh-Lovell approach, we have isolated the impact of the treatment itself.
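
The Frisch-Waugh-Lovell idea can be checked numerically. The sketch below uses synthetic data and plain least squares in place of the machine learning nuisance models (and without cross-fitting), so it illustrates the principle rather than what EconML does internally. All names ending in `_sim` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5_000

# Synthetic data: W_sim confounds both T_sim and Y_sim; the true effect of T on Y is 2.0
W_sim = rng.normal(size=(n, 3))
T_sim = W_sim @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=n)
Y_sim = 2.0 * T_sim + W_sim @ np.array([0.7, 0.2, -1.0]) + rng.normal(size=n)

def ols(A, b):
    """Least-squares coefficients of b on the columns of A."""
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Design matrix for the controls, with an intercept column
Wc = np.column_stack([np.ones(n), W_sim])

# (a) Full regression of Y on [T, controls]; the first coefficient belongs to T
beta_full = ols(np.column_stack([T_sim, Wc]), Y_sim)[0]

# (b) FWL: residualize Y and T on the controls, then regress residual on residual
Y_res = Y_sim - Wc @ ols(Wc, Y_sim)
T_res = T_sim - Wc @ ols(Wc, T_sim)
beta_fwl = (T_res @ Y_res) / (T_res @ T_res)

print(f"Full regression:      {beta_full:.4f}")
print(f"Residual-on-residual: {beta_fwl:.4f}")
```

Both routes give the same coefficient. DML keeps step (b) but swaps the linear residualization for flexible learners, which is why the quality of the nuisance models matters.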

In [None]:
est_linear.summary()

When we add $X$, we include the regions so that the model can estimate a conditional ATE (CATE) for each of them.

In [None]:
## TODO: train your Linear model here with the heterogeneity variables (X)

In [None]:
est_linear.summary()

Since we are using a linear model (*LinearDML*), the *summary()* is showing the coefficients ($\beta$) of the linear equation used to calculate those effects.

We've told the model that the effect of the treatment ($T$) varies with $X$ (regions). Because we are using a linear parametric model, EconML is fitting this specific equation behind the scenes:
$$\text{Effect}(X) = \beta_{1}(\text{zone\_1}) + \beta_{2}(\text{zone\_2}) + \dots + \text{cate\_intercept}$$

- $\text{cate\_intercept}$: This is the baseline effect, i.e. the effect when all $X$ variables are zero (for example, a region without its own indicator).
- $\text{point\_estimate}$: These are the coefficients ($\beta$), read as shifts relative to the baseline. A coefficient of $0.105$ for one zone means that the effect in that zone is $0.105$ higher than the baseline.

When we run *est_linear.effect(unique_X)*, EconML does the addition for us. For a pixel in zone 1, the calculation looks like this:

$$\text{CATE}_{\text{zone\_1}} = \text{cate\_intercept} + \beta_{1}$$

where $\beta_{1}$ is the $\text{point\_estimate}$ reported for $\text{zone\_1}$.
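
The addition can be reproduced by hand. In this sketch the intercept and coefficients are hypothetical numbers picked for illustration (only $0.105$ echoes the example above); they are not output from the fitted model.

```python
import numpy as np

# Hypothetical summary values, chosen only to illustrate the arithmetic
cate_intercept = 0.20
coefs = np.array([0.105, -0.030, 0.000])   # beta_1, beta_2, beta_3

# One row per zone: the identity matrix is the one-hot encoding of each zone
unique_X_demo = np.eye(3)

# Effect(X) = X @ beta + cate_intercept
cate_per_zone = unique_X_demo @ coefs + cate_intercept

for i, cate in enumerate(cate_per_zone, start=1):
    print(f"zone_{i}: CATE = {cate:.3f}")
```

With these made-up numbers, zone 1 gets $0.20 + 0.105 = 0.305$, while a zone with a zero coefficient stays at the baseline of $0.20$.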

#### Train estimator using CausalForestDML

In [None]:
## TODO: train your CausalForest model here with the heterogeneity variables (X)

In causal inference, we cannot use a standard $R^2$ or accuracy, because we never observe the "ground truth" (the counterfactual).
The RScorer in EconML gets around this using what is called the R-loss.

Instead of comparing the model to the true effect (which is invisible), the RScorer measures how well the model explains the "leftover" variation in the data, i.e. the outcome residuals after the nuisance models have been applied.

The resulting score corresponds to the extra variance of the outcome explained by introducing heterogeneity in the effect as captured by the CATE model, as opposed to always predicting a constant effect. A negative score means that the CATE model performs even worse than a constant-effect model and hints at overfitting during training of the CATE model.
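
The score can be written out in a few lines. The sketch below follows the formula as described in the EconML `RScorer` documentation, $1 - \text{loss}(\text{cate}) / \text{loss}(\text{constant})$, but assumes the residuals are already available and skips the cross-fitting that the library performs. The data and the helper `r_score` are made up for illustration.

```python
import numpy as np

def r_score(y_res, t_res, cate_pred):
    """R^2-style score: 1 - loss(cate) / loss(best constant effect)."""
    # R-loss of the candidate CATE model
    loss_cate = np.mean((y_res - cate_pred * t_res) ** 2)
    # The best constant effect theta minimizes mean((y_res - theta * t_res)^2)
    theta = (t_res @ y_res) / (t_res @ t_res)
    loss_base = np.mean((y_res - theta * t_res) ** 2)
    return 1.0 - loss_cate / loss_base

rng = np.random.default_rng(1)
n = 20_000

# Synthetic residuals: the true effect is 1.0 in group 0 and 3.0 in group 1
group = rng.integers(0, 2, size=n)
true_cate = np.where(group == 1, 3.0, 1.0)
t_res = rng.normal(size=n)
y_res = true_cate * t_res + rng.normal(size=n)

score_good = r_score(y_res, t_res, true_cate)                      # captures the heterogeneity
score_bad = r_score(y_res, t_res, np.where(group == 1, 1.0, 3.0))  # heterogeneity reversed

print(f"Score of a good CATE model: {score_good:.3f}")
print(f"Score of a bad CATE model:  {score_bad:.3f}")
```

The model that captures the true heterogeneity scores above zero, while the one that assigns the effects to the wrong groups scores below zero, i.e. worse than a constant effect.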

In [None]:
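# Note: for DML estimators, score() reports the mean squared error of the final
# residual-on-residual stage (lower is better); the R²-style score described
# above is computed by econml.score.RScorer instead.
score = cf.score(Y, T, X=X, W=W)
print(f"\nFinal-stage MSE: {score}")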

Visualizing the CATE by Climate Region

In [None]:
## TODO: visualize CATE with confidence intervals, prepare the data for plotting

In [None]:
summary['zone_name'] = summary[zone_vars].idxmax(axis=1)

plot_causal_effects(summary, title=f'Causal Impact of {treatment[0]} by zone')

---
# Validation - Sensitivity analysis

Since we don’t have a “causal test set” with a ground truth, we must use refutation tests.

- **Placebo treatment**: Replace the treatment variable with a random noise variable. The model should then find a causal effect close to zero. If it finds a significant effect, the model is flawed.

- **Omitted variable test**: “How strong would an unmeasured confounder (one I forgot to include) have to be to make my causal effect go to zero?” If the answer is unrealistically strong, your finding is robust.


**1. Validate the Nuisance Models** (The "Residualizers")

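We must ensure that the nuisance models (*model_y* and *model_t*) actually learned something. If they predict no better than chance, the residuals still carry the confounding they were supposed to remove.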
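
One way to run this check, sketched on synthetic data: score a classifier with out-of-fold predictions, so that the metrics reflect generalization rather than memorization. The random forest, the 5-fold split and the names `features_demo` and `label` are assumptions for this example. In the notebook, the classifier would be the one passed as nuisance model, and the label would be $Y$ or $T$.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 2_000

# Synthetic features and a binary label that depends on the first two columns
features_demo = rng.normal(size=(n, 5))
logits = 1.5 * features_demo[:, 0] - 1.0 * features_demo[:, 1]
label = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

# Out-of-fold probabilities, so the metrics are not inflated by overfitting
clf = RandomForestClassifier(n_estimators=50, random_state=0)
proba = cross_val_predict(clf, features_demo, label, cv=5, method="predict_proba")[:, 1]

acc = accuracy_score(label, (proba >= 0.5).astype(int))
auc = roc_auc_score(label, proba)

print(f"Accuracy: {acc:.3f}")
print(f"ROC AUC:  {auc:.3f} (0.5 means no better than chance)")
```

With imbalanced outcomes, ROC AUC is usually the more informative of the two numbers, because accuracy can look high for a model that always predicts the majority class.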

In [None]:
# Prepare the features
# In EconML, when both X and W are provided, the nuisance models are trained on the concatenation of [X, W]
features = np.hstack([X.values, W.values])

In [None]:
## TODO: measure accuracy and ROC AUC for residual models (nuisance functions)

**2. The Placebo Validation (or "Permutation Test")**

This checks whether the model is just finding patterns in noise. If we shuffle the treatment assignments, the relationship between $T$ and $Y$ is broken, so a robust Causal Forest should then report an ATE close to zero.

If the placebo ATE is similar to the real ATE, the model is likely picking up a "spurious" correlation.

We shuffle the treatment vector $T$, re-fit the model (or just the effect part), and compare the "Fake ATE" to the "Real ATE."
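
The logic of the test can be seen on synthetic data. This sketch uses a linear residual-on-residual estimator as a stand-in for the causal forest. The helper `partialled_out_effect` and the `_sim` arrays are hypothetical names introduced here.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5_000

# Synthetic data: W_sim confounds T_sim and Y_sim; the true effect of T is 0.5
W_sim = rng.normal(size=(n, 2))
T_sim = W_sim @ np.array([0.8, -0.4]) + rng.normal(size=n)
Y_sim = 0.5 * T_sim + W_sim @ np.array([1.0, 0.5]) + rng.normal(size=n)

def partialled_out_effect(y, t, w):
    """Residualize y and t on w (linear stand-in for the ML nuisance models),
    then regress residual on residual."""
    wc = np.column_stack([np.ones(len(w)), w])
    y_res = y - wc @ np.linalg.lstsq(wc, y, rcond=None)[0]
    t_res = t - wc @ np.linalg.lstsq(wc, t, rcond=None)[0]
    return (t_res @ y_res) / (t_res @ t_res)

real_effect = partialled_out_effect(Y_sim, T_sim, W_sim)

# Placebo: shuffling T breaks its link with both Y and W
T_placebo = rng.permutation(T_sim)
placebo_effect = partialled_out_effect(Y_sim, T_placebo, W_sim)

print(f"Real effect:    {real_effect:.4f}")
print(f"Placebo effect: {placebo_effect:.4f}")
```

A single shuffle gives one draw from the placebo distribution. Repeating the shuffle several times and looking at the spread of placebo estimates gives a better sense of how far from zero "noise" can land.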

In [None]:
## TODO: compare ATE estimates between the real treatment and a placebo treatment

print(f"Real ATE:    {real_ate_val:.6f}")
print(f"Placebo ATE: {placebo_ate_val:.6f}")

# 5. Calculation of the 'Causal Signal-to-Noise' Ratio
signal_to_noise = abs(real_ate_val / placebo_ate_val) if placebo_ate_val != 0 else np.inf
print(f"Signal-to-Noise Ratio: {signal_to_noise:.2f}x")

**3. Subset Refuter or Leave-One-Confounder-Out check**

In a stable causal model, removing a single confounder should not wildly swing the ATE unless that specific variable was the only thing holding the model's logic together.

If the ATE changes drastically when you drop one variable, it suggests that the model is "leaning" too heavily on that feature to explain the relationship, which often points back to the overfitting issues we saw in the R-score.

Implementation: The "Leave-One-Out" Sensitivity Check. We will loop through the list of control variables ($W$), remove one at a time, re-fit the forest, and record the new ATE.
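
The loop can be prototyped on synthetic data before running it on the forest, which is much slower to re-fit. In this sketch the three controls (`strong`, `weak`, `noise`), the linear estimator and the extra `Shift` column are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 5_000

# Synthetic controls: 'strong' confounds T and Y, 'weak' barely matters, 'noise' is irrelevant
W_sim = pd.DataFrame(rng.normal(size=(n, 3)), columns=["strong", "weak", "noise"])
T_sim = (1.0 * W_sim["strong"] + 0.1 * W_sim["weak"] + rng.normal(size=n)).to_numpy()
Y_sim = 0.5 * T_sim + (2.0 * W_sim["strong"] + 0.1 * W_sim["weak"]).to_numpy() + rng.normal(size=n)

def partialled_out_effect(y, t, w):
    """Linear residual-on-residual estimate of the effect of t on y given w."""
    wc = np.column_stack([np.ones(len(w)), w])
    y_res = y - wc @ np.linalg.lstsq(wc, y, rcond=None)[0]
    t_res = t - wc @ np.linalg.lstsq(wc, t, rcond=None)[0]
    return (t_res @ y_res) / (t_res @ t_res)

baseline = partialled_out_effect(Y_sim, T_sim, W_sim.to_numpy())

# Drop one control at a time and record how far the estimate moves
rows = []
for col in W_sim.columns:
    w_reduced = W_sim.drop(columns=col).to_numpy()
    effect = partialled_out_effect(Y_sim, T_sim, w_reduced)
    rows.append({"Dropped Confounder": col, "New ATE": effect, "Shift": effect - baseline})

df_demo = pd.DataFrame(rows)
print(f"Baseline effect: {baseline:.3f}")
print(df_demo.round(3))
```

Dropping the strong confounder moves the estimate far from the baseline, while dropping the irrelevant column barely changes it. A large shift is therefore a sign that the variable is an important confounder, not necessarily that the model is broken.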

In [None]:
# 1. Identify your confounders
confounders = vars_list  # This is your W columns list
results = []

# 2. Iterate through each confounder and re-estimate ATE
for col in confounders:

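    ## TODO: drop a confounder, re-train your model, and re-estimate ATE
    ## (append {'Dropped Confounder': col, 'New ATE': ...} to results)
    pass  # placeholder so the loop parses until the TODO is filled in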

# 3. Create a summary DataFrame
df_sens = pd.DataFrame(results)
df_sens.loc[len(df_sens)] = {'Dropped Confounder': 'None (Original)', 'New ATE': real_ate_val}

# 4. Visualize the sensitivity
plt.figure(figsize=(10, 6))
plt.axvline(x=real_ate_val, color='red', linestyle='--', label='Original ATE')
plt.barh(df_sens['Dropped Confounder'], df_sens['New ATE'], color='skyblue')
plt.xlabel('Estimated ATE')
plt.title('Sensitivity Analysis: Dropping One Confounder')
plt.legend()
plt.show()