# Weighted doubly robust learning: A simulation

This notebook reviews _weighted doubly robust learning_ (WDRL), a technique by Zhan et al. for recovering causal effects when several treatments are applied together and their effects are mixed. For example, a company may apply several promotions to the same customers during a large advertising campaign. Much as Shapley values attribute a prediction to individual features, the method isolates the conditional average treatment effect (CATE) of each individual treatment.

I set up a simulation very similar to that in the paper to demonstrate the method's effectiveness.

**Reference**

Zhan, B., Liu, C., Li, Y. and Wu, C., 2024. Weighted doubly robust learning: An uplift modeling technique for estimating mixed treatments' effect. _Decision Support Systems_, 176, p.114060. https://doi.org/10.1016/j.dss.2023.114060

In [4]:
from itertools import chain, combinations

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression

np.random.seed(1804)

## Set up simulation

Generate a dataset of 10,000 observations with 40 features and three binary treatments (A, B, C). Only the first nine features matter: X1 to X6 drive treatment assignment and X7 to X9 affect the outcome. The remaining features are noise.

In [6]:
# Parameters
n_units = 10_000
n_features = 40

# Generate covariates (X variables)
X = np.random.normal(size=(n_units, n_features))
X_columns = [f'X{i+1}' for i in range(n_features)]
df = pd.DataFrame(X, columns=X_columns)

# Treatment assignment probabilities (influenced by covariates)
p_TA = 1 / (1 + np.exp(-X[:, 0] + 0.5 * X[:, 1]))  # Sigmoid function
p_TB = 1 / (1 + np.exp(-X[:, 2] - 0.3 * X[:, 3]))
p_TC = 1 / (1 + np.exp(X[:, 4] + 0.7 * X[:, 5]))

# Assign treatments (discrete binary values, independent but influenced by X)
df['TA'] = np.random.binomial(1, p_TA)
df['TB'] = np.random.binomial(1, p_TB)
df['TC'] = np.random.binomial(1, p_TC)
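Each logit above is a zero-mean, symmetric combination of independent standard normal covariates, so each treatment should reach about half of the units, and the eight treatment combinations should each cover roughly one eighth of the data. A minimal check of the marginal rates, assuming a fresh generator `rng` with its own seed (neither is part of the notebook):

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed, independent of the notebook's
n_units = 10_000
X = rng.normal(size=(n_units, 6))  # only the six assignment covariates are needed

# Same functional forms as the assignment probabilities above
p_TA = 1 / (1 + np.exp(-X[:, 0] + 0.5 * X[:, 1]))
p_TB = 1 / (1 + np.exp(-X[:, 2] - 0.3 * X[:, 3]))
p_TC = 1 / (1 + np.exp(X[:, 4] + 0.7 * X[:, 5]))

# Share of units that receive each treatment
rates = {
    name: rng.binomial(1, p).mean()
    for name, p in [('TA', p_TA), ('TB', p_TB), ('TC', p_TC)]
}
print(rates)
```

Each rate should come out close to 0.5, up to sampling noise.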

The true treatment effects are set directly below. They are constant across units, so each treatment's CATE equals its ATE, and the algorithm's estimates should land close to these values:

In [7]:
# Treatment effects
effect_TA = 2.0
effect_TB = -1.5
effect_TC = 3.0

Finish the simulation by generating the outcome:

In [8]:
# Generate outcome Y (additive model)
noise = np.random.normal(0, 1, n_units)
df['y'] = (
    effect_TA * df['TA'] +
    effect_TB * df['TB'] +
    effect_TC * df['TC'] +
    0.3 * X[:, 6] - 0.2 * X[:, 7] + 0.5 * X[:, 8] +  # Covariate effects
    noise
)


Create a label listing all treatments applied to an observation:

In [9]:
# Create a joined treatment column using letters
def combine_treatments(row):
    treatments = []
    if row['TA'] == 1:
        treatments.append('A')
    if row['TB'] == 1:
        treatments.append('B')
    if row['TC'] == 1:
        treatments.append('C')
    return ''.join(treatments) if treatments else '0'

# Apply the function to each row and create a new column 't'
df['t'] = df.apply(combine_treatments, axis=1)

# Preview
print(df.head())

         X1        X2        X3        X4        X5        X6        X7  \
0 -0.214325 -0.553342 -0.571716 -0.821458 -0.976144 -0.287905 -1.255918   
1 -1.093172 -1.085448  0.735668  1.975224  1.032347  0.126879  1.747678   
2  0.846787  0.031084  1.183459 -1.463539 -0.679032  0.661826  0.764342   
3  1.299280  1.335320  0.326247  1.134321 -0.204174 -0.715236  0.408855   
4 -0.433620  0.538348 -0.528164  2.055443 -0.392551 -1.185478 -0.465252   

         X8        X9       X10  ...       X36       X37       X38       X39  \
0 -2.048305 -0.838211  0.783283  ...  0.706155  1.044931  0.162699  0.368533   
1  0.580236  0.640855 -0.362036  ...  0.123426  0.913608  0.493342  1.291280   
2  0.281379  0.042616 -0.798705  ... -0.080010 -1.334000  0.541168  0.492100   
3  0.064756 -0.881248  0.934774  ... -0.052189  0.383240 -0.188075  0.203655   
4  1.790214  1.176643 -0.138594  ... -1.685168 -1.247496 -1.884123  1.194776   

        X40  TA  TB  TC         y    t  
0 -0.251308   0   0   0 -1.
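Row-wise `apply` calls a Python function once per row, which is fine at this size but slows down on larger data. As a possible alternative, the same labels can be built column by column. A sketch on a four-row toy frame (`toy` and `flag_to_letter` are illustrative names, not from the notebook):

```python
import pandas as pd

# Toy treatment indicators covering '0', 'A', 'BC', and 'ABC'
toy = pd.DataFrame({'TA': [0, 1, 0, 1], 'TB': [0, 0, 1, 1], 'TC': [0, 0, 1, 1]})

flag_to_letter = {'TA': 'A', 'TB': 'B', 'TC': 'C'}

# Concatenate one letter per active treatment, column by column
labels = pd.Series('', index=toy.index)
for col, letter in flag_to_letter.items():
    labels = labels + toy[col].map({1: letter, 0: ''})

# Units with no treatment get the control label '0'
labels = labels.replace('', '0')
print(labels.tolist())  # ['0', 'A', 'BC', 'ABC']
```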

In [10]:
# Split into train/test
df_train = df.iloc[:7000]
df_test = df.iloc[7000:]

The output below shows that all eight treatment combinations are present in the training set in roughly equal proportions (about 12% to 13% each):

In [12]:
# Create treatment weights
treatment_prob = df_train['t'].value_counts(normalize=True).to_dict()

treatment_prob

{'A': 0.12757142857142856,
 'B': 0.12742857142857142,
 'AC': 0.127,
 'BC': 0.12571428571428572,
 'ABC': 0.125,
 'C': 0.12414285714285714,
 'AB': 0.12214285714285714,
 '0': 0.121}

In [13]:
TREATMENTS = ['A', 'B', 'C']

all_combinations = list(chain.from_iterable(combinations(TREATMENTS, r) for r in range(len(TREATMENTS) + 1)))
all_combinations = [''.join(i) for i in all_combinations]

C = ['0' if i == '' else i for i in all_combinations]
C

['0', 'A', 'B', 'C', 'AB', 'AC', 'BC', 'ABC']
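The WDRL code later in this notebook compares each combination that contains a treatment with the same combination without it, and it pairs the two lists with `zip`, which relies on both lists being in matching order. As a hedged alternative, the pairing can be made explicit by deriving each control label from its treated label. The helper names `power_set_labels` and `paired_combinations` are illustrative and not part of the notebook:

```python
from itertools import chain, combinations

TREATMENTS = ['A', 'B', 'C']

def power_set_labels(treatments):
    """All treatment combinations as strings, with '0' for no treatment."""
    combos = chain.from_iterable(
        combinations(treatments, r) for r in range(len(treatments) + 1)
    )
    return [''.join(c) or '0' for c in combos]

def paired_combinations(treatment, labels):
    """Pair each label containing `treatment` with the label that omits it."""
    pairs = []
    for label in labels:
        if treatment in label:
            control = label.replace(treatment, '') or '0'
            pairs.append((label, control))
    return pairs

labels = power_set_labels(TREATMENTS)
pairs_B = paired_combinations('B', labels)
print(pairs_B)  # [('B', '0'), ('AB', 'A'), ('BC', 'C'), ('ABC', 'AC')]
```

This assumes single-letter treatment names, as in this notebook. Longer names would need set-based labels instead of string replacement.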

## Perform WDRL

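The functions below implement WDRL. For each treatment, every combination that contains it is paired with the matching combination without it, a doubly robust score is computed for each pair, and the pairwise estimates are averaged using the observed treatment-combination frequencies as weights. The resulting estimates, printed at the end (1.93, -1.48, and 3.00 for A, B, and C), are close to the true effects set above (2.0, -1.5, and 3.0).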
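In symbols, and as a reading of the code in this notebook rather than a restatement of the paper: for a treatment $k$, let $W_k$ be the combinations that contain $k$, and for each $w \in W_k$ let $s$ be the same combination with $k$ removed. Among units that received $w$ or $s$, set $T_i = 1$ if unit $i$ received $w$ and $T_i = 0$ otherwise. With fitted outcome models $\hat m_1, \hat m_0$ and propensity score $\hat e$, the doubly robust score for unit $i$ is

$$
\tilde y_i^{(w)} = \hat m_1(x_i) - \hat m_0(x_i) + \frac{T_i \, \bigl(y_i - \hat m_1(x_i)\bigr)}{\hat e(x_i)} - \frac{(1 - T_i) \, \bigl(y_i - \hat m_0(x_i)\bigr)}{1 - \hat e(x_i)}
$$

These scores are regressed on $x$ to give a pairwise effect model $\hat\tau_w(x)$. The pairwise estimates are then combined with weights $p_w$, the share of training units that received combination $w$:

$$
\hat\tau_k(x) = \frac{\sum_{w \in W_k} p_w \, \hat\tau_w(x)}{\sum_{w \in W_k} p_w},
\qquad
\widehat{\mathrm{ATE}}_k = \frac{1}{n} \sum_{i=1}^{n} \hat\tau_k(x_i)
$$

The symbols $\hat m$, $\hat e$, $p_w$, and $\hat\tau$ are introduced here for illustration and may differ from the paper's notation.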

In [14]:
# Function to calculate propensity scores
def calculate_propensity_scores(X, T):
    model_t = LogisticRegression(penalty=None).fit(X, T)
    return model_t.predict_proba(X)[:, 1]

# Function to fit outcome models and compute doubly robust scores
def calculate_doubly_robust_scores(X, y, T, ps):
    m0 = LinearRegression().fit(X[T == 0], y[T == 0])
    m1 = LinearRegression().fit(X[T == 1], y[T == 1])
    
    m0_hat = m0.predict(X)
    m1_hat = m1.predict(X)
    
    y_dr_1 = m1_hat + (T * (y - m1_hat)) / ps
    y_dr_0 = m0_hat + ((1 - T) * (y - m0_hat)) / (1 - ps)
    
    return y_dr_1 - y_dr_0

# Function to compute weighted doubly robust scores
def compute_weighted_dr_scores(dr_scores, treatment_prob):
    valid_weights = {col: treatment_prob[col] for col in dr_scores.columns if col in treatment_prob}
    weighted_sum = sum(dr_scores[col] * w for col, w in valid_weights.items())
    return weighted_sum / sum(valid_weights.values())

# Main loop to calculate ATE for each treatment
def calculate_ate(df_train, C, treatment_prob, treatments):
    treatment_cols = [f'T{t}' for t in treatments]
    X = df_train.drop(columns=['y', 't'] + treatment_cols)
    results = {}
    
    for treatment in treatments:
        print(f"\nProcessing treatment {treatment}...")
        
        # Split combinations into those with and without the current treatment
        W_k = [comb for comb in C if treatment in comb]
        S__k = [comb for comb in C if treatment not in comb]
        
        y_hat_dr_list = []
        
        for w_k, s__k in zip(W_k, S__k):
            # Keep only units that received w_k (treated) or s__k (control)
            df_subset = df_train.query('t == @w_k | t == @s__k').copy()
            df_subset['t_binary'] = np.where(df_subset['t'] == w_k, 1, 0)
            
            X_pair = df_subset.drop(columns=['y', 't', 't_binary'] + treatment_cols)
            y_pair = df_subset['y']
            T_pair = df_subset['t_binary']
            
            # Calculate propensity scores and doubly robust scores
            ps = calculate_propensity_scores(X_pair, T_pair)
            dr_scores = calculate_doubly_robust_scores(X_pair, y_pair, T_pair, ps)
            
            # Regress the DR scores on covariates within this pair,
            # then predict the effect for the full training set
            dr_model = LinearRegression().fit(X_pair, dr_scores)
            y_hat_dr = dr_model.predict(X)
            y_hat_dr_list.append(y_hat_dr)
        
        # Aggregate results
        dr_df = pd.DataFrame({w_k: scores for w_k, scores in zip(W_k, y_hat_dr_list)})
        aggregated_dr_scores = compute_weighted_dr_scores(dr_df, treatment_prob)
        
        # Fit a regression on the aggregated scores; the mean of its predictions is the ATE estimate
        ate_model = LinearRegression().fit(X, aggregated_dr_scores)
        ate = np.mean(ate_model.predict(X))
        
        results[treatment] = ate
        print(f"ATE for treatment {treatment}: {ate:.4f}")
    
    return results

# Run WDRL on the training set, using C and treatment_prob defined above
treatments = ['A', 'B', 'C']  # List of all treatment labels

ate_results = calculate_ate(df_train, C, treatment_prob, treatments)
print("Final ATE results:", ate_results)


Processing treatment A...
ATE for treatment A: 1.9316

Processing treatment B...
ATE for treatment B: -1.4757

Processing treatment C...
ATE for treatment C: 3.0026
Final ATE results: {'A': np.float64(1.931633198660085), 'B': np.float64(-1.4756925591041896), 'C': np.float64(3.0026359906402273)}
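The doubly robust score inside `calculate_doubly_robust_scores` can also be checked in isolation. The sketch below is a separate one-treatment toy example in which a single covariate affects both assignment and outcome, so a naive difference in means is biased. It makes several simplifying assumptions that are not in the notebook: the true propensity is treated as known instead of being fitted, the outcome models are plain least-squares lines fitted with NumPy, and the seed and the helper `fit_predict` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)  # hypothetical seed
n = 20_000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))               # true propensity, assumed known here
t = rng.binomial(1, e)
y = 2.0 * t + x + rng.normal(size=n)   # true effect is 2.0; x confounds t and y

def fit_predict(x_fit, y_fit, x_all):
    """Least-squares line fitted on (x_fit, y_fit), evaluated at x_all."""
    design = np.column_stack([np.ones_like(x_fit), x_fit])
    coef, *_ = np.linalg.lstsq(design, y_fit, rcond=None)
    return coef[0] + coef[1] * x_all

# Separate outcome models for treated and control units
m1_hat = fit_predict(x[t == 1], y[t == 1], x)
m0_hat = fit_predict(x[t == 0], y[t == 0], x)

# Same score construction as in calculate_doubly_robust_scores
dr_1 = m1_hat + t * (y - m1_hat) / e
dr_0 = m0_hat + (1 - t) * (y - m0_hat) / (1 - e)

naive_ate = y[t == 1].mean() - y[t == 0].mean()
dr_ate = (dr_1 - dr_0).mean()
print(f"naive: {naive_ate:.2f}, doubly robust: {dr_ate:.2f}")
```

In this toy setup the naive estimate should overshoot the true effect of 2.0, because treated units tend to have larger `x`, while the doubly robust estimate should stay close to 2.0.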
