In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf

from sklearn.neighbors import NearestNeighbors


## Questions 1-4

In this dataset, match treated items (X = 1) to untreated items (X = 0) on the confounder Z. First, find the average treatment effect (ATE), where every item is paired with one counterfactual: its nearest neighbor in the other group (you can use NearestNeighbors for this). Then find the average treatment effect on the treated (ATT), where each treated item is paired with a counterfactual untreated item and the untreated items are otherwise ignored. Next, find the average treatment effect on the untreated (ATU), where each untreated item is paired with a counterfactual treated item and the treated items are otherwise ignored. Finally, find the marginal treatment effect, defined here as the maximum treatment effect across all untreated items (that is, it ends up considering only a single untreated item and its single counterfactual).

In [4]:
df_6_1 = pd.read_csv('homework_6.1.csv')

df_6_1.head()

Unnamed: 0.1,Unnamed: 0,Z,X,Y
0,0,0.548814,0,-0.82322
1,1,0.715189,1,0.842405
2,2,0.602763,1,0.898618
3,3,0.544883,0,-0.817325
4,4,0.423655,0,-0.635482


In [5]:
#Split the data into treated and untreated groups

treated = df_6_1[df_6_1['X'] == 1].reset_index(drop=True)
untreated = df_6_1[df_6_1['X'] == 0].reset_index(drop=True)

Z_treated = treated[['Z']]
Z_untreated = untreated[['Z']]

In [7]:
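# Fit nearest-neighbor models on Z.
# Each model is named for the group that queries it: nn_treated is fit on the
# untreated items and queried with treated items; nn_untreated is the reverse.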

nn_treated = NearestNeighbors(n_neighbors=1, algorithm='auto').fit(Z_untreated)
nn_untreated = NearestNeighbors(n_neighbors=1, algorithm='auto').fit(Z_treated)

In [9]:
#Match Each Treated to Closest Untreated (ATT)
dist_tu, idx_tu = nn_treated.kneighbors(Z_treated)
Y_untreated_match = untreated.loc[idx_tu.flatten(), 'Y'].values
ATT = np.mean(treated['Y'].values - Y_untreated_match)

print(f"Average Treatment Effect on the Treated (ATT): {ATT:.4f}")

Average Treatment Effect on the Treated (ATT): 1.8464


In [10]:
#Match Each Untreated to Closest Treated (ATU)
dist_ut, idx_ut = nn_untreated.kneighbors(Z_untreated)
Y_treated_match = treated.loc[idx_ut.flatten(), 'Y'].values
ATU = np.mean(Y_treated_match - untreated['Y'].values)

print(f"Average Treatment Effect on the Untreated (ATU): {ATU:.4f}")

Average Treatment Effect on the Untreated (ATU): 1.5495


In [12]:
# Average Treatment Effect (ATE)

ATE = np.mean(np.concatenate([
    treated['Y'].values - Y_untreated_match,
    Y_treated_match - untreated['Y'].values
]))
print(f"Average Treatment Effect (ATE): {ATE:.4f}")

Average Treatment Effect (ATE): 1.6953
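
Because the ATE above pools the treated-side and untreated-side differences into one array before averaging, it equals the average of the ATT and the ATU weighted by group size. The sketch below checks that identity on made-up differences; the arrays `att_diffs` and `atu_diffs` and their sizes are illustrative assumptions, not values from `homework_6.1.csv`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-unit matched differences (NOT the homework data)
att_diffs = rng.normal(loc=1.8, scale=0.2, size=40)  # treated Y minus matched untreated Y
atu_diffs = rng.normal(loc=1.5, scale=0.2, size=60)  # matched treated Y minus untreated Y

att, atu = att_diffs.mean(), atu_diffs.mean()
n_t, n_u = att_diffs.size, atu_diffs.size

# Pooled mean, as computed in the cell above
ate_pooled = np.concatenate([att_diffs, atu_diffs]).mean()

# Size-weighted average of the two group-level effects
ate_weighted = (n_t * att + n_u * atu) / (n_t + n_u)

print("Pooled and weighted ATE agree:", np.isclose(ate_pooled, ate_weighted))
```

One consequence is that the ATE always lies between the ATT and the ATU, which is consistent with the three values printed above.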


In [13]:
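# Marginal Treatment Effect: the largest matched difference.
# Note: the differences below are the treated-side ones (treated Y minus matched
# untreated Y), the same ones used for the ATT. The question defines the MTE over
# untreated items, which would instead use Y_treated_match - untreated['Y'].values.
marginal_effects = treated['Y'].values - Y_untreated_match
marginal_effect = np.max(marginal_effects)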
print(f"Marginal Treatment Effect: {marginal_effect:.4f}")

Marginal Treatment Effect: 2.1763


## Reflection Questions

1. What is a potential problem with computing the Marginal Treatment Effect simply by comparing each untreated item to its counterfactual and taking the maximum difference?  (Hint: think of statistics here.  Consider that only the most extreme item ends up being used to estimate the MTE.  That's not necessarily a bad thing; the MTE is supposed to come from the untreated item that will produce the maximum effect.  But there is nevertheless a problem.)
Possible answer: We are likely to find the item with the most extreme difference, which may be high simply due to randomness.
(Please explain / justify this answer, or give a different one if you can think of one.)


The marginal treatment effect (MTE), as defined in this exercise, is the effect for the single untreated unit that would benefit most from treatment. Estimating it as the maximum of the matched differences is a problem because each difference is a noisy estimate: it mixes the unit's true effect with sampling variability and matching error. Taking the maximum selects the unit whose noise happened to be most favorable, so the estimate is biased upward. This is a form of selection bias on extreme values. For example, the maximum of 100 dice rolls is almost certainly 6 even though the average roll is 3.5, so the maximum says little about a typical roll. In the same way, the largest observed difference tends to confuse noise with signal and overstate the true maximum effect. It is also unstable: it rests on a single observation and is unlikely to replicate in a new sample. A better estimate of the marginal treatment effect accounts for the variability of the individual effects, for instance by modeling their distribution, rather than relying solely on the raw maximum.
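
The upward bias of the maximum can be seen in a small simulation. The sketch below assumes, purely for illustration, that every unit has the same true effect and that each matched difference adds independent normal noise; `true_effect`, `noise_sd`, `n_units`, and `n_repeats` are made-up values, not estimates from the homework data.

```python
import numpy as np

rng = np.random.default_rng(0)

true_effect = 2.0   # assumed: identical true effect for every unit
noise_sd = 0.5      # assumed: noise in each matched difference
n_units = 100       # untreated units per hypothetical study
n_repeats = 2000    # number of hypothetical studies

# Each row is one study: n_units noisy estimates of the same true effect
estimates = true_effect + rng.normal(0.0, noise_sd, size=(n_repeats, n_units))

mean_of_means = estimates.mean(axis=1).mean()  # stays close to the true effect
mean_of_maxima = estimates.max(axis=1).mean()  # systematically above it

print(f"True effect:           {true_effect:.2f}")
print(f"Average of the means:  {mean_of_means:.2f}")
print(f"Average of the maxima: {mean_of_maxima:.2f}")
```

Even though no unit truly responds more than any other in this setup, the maximum lands well above the true effect in study after study, because it always picks the unit with the luckiest noise.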


2. Propose a solution that remedies this problem and write some code that implements your solution.  It's very important here that you clearly explain what your solution will do.
Possible answer: maybe we could take the 90th percentile of the treatment effect and use it as a proxy for the Marginal Treatment Effect.
(Either code this answer or choose a different one.)

In [2]:
# Simulate some untreated units
np.random.seed(42)  # for reproducibility
num_units = 100

# Assume untreated outcomes (Y0) and predicted treated outcomes (Y1_hat)
Y0 = np.random.normal(loc=50, scale=10, size=num_units)          # observed untreated outcomes
Y1_hat = Y0 + np.random.normal(loc=5, scale=5, size=num_units)  # predicted treated outcomes

# Compute individual treatment effects (ITE)
ITE = Y1_hat - Y0

# Compute 90th percentile as a robust proxy for MTE
mte_estimate = np.percentile(ITE, 90)

print("Estimated Marginal Treatment Effect (90th percentile):", mte_estimate)


Estimated Marginal Treatment Effect (90th percentile): 10.867251482674538
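
The cell above uses simulated outcomes, so one possible next step is to wrap the percentile in a helper that also reports how uncertain it is. The sketch below is one way this might look: `percentile_effect` is a hypothetical helper (not part of any library), the 95% interval comes from a simple percentile bootstrap, and `demo_effects` is made-up data standing in for the matched differences.

```python
import numpy as np

def percentile_effect(effects, q=90, n_boot=2000, seed=0):
    """Return the q-th percentile of `effects` and a 95% bootstrap interval for it."""
    effects = np.asarray(effects, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample units with replacement; each row of idx is one bootstrap sample
    idx = rng.integers(0, effects.size, size=(n_boot, effects.size))
    boot = np.percentile(effects[idx], q, axis=1)
    low, high = np.percentile(boot, [2.5, 97.5])
    return np.percentile(effects, q), (low, high)

# Hypothetical stand-in for the per-unit matched differences
rng = np.random.default_rng(42)
demo_effects = rng.normal(loc=5.0, scale=5.0, size=100)

estimate, (low, high) = percentile_effect(demo_effects)
print(f"90th percentile effect: {estimate:.3f}")
print(f"95% bootstrap interval: [{low:.3f}, {high:.3f}]")
print(f"Raw maximum:            {demo_effects.max():.3f}")
```

In the notebook itself, the same helper could be called on the untreated-side differences, for example `percentile_effect(Y_treated_match - untreated['Y'].values)`. The 90th percentile depends on roughly the top tenth of the units rather than on one extreme observation, which is why it is less sensitive to a single noisy match. Note that it targets a high quantile of the effect distribution, not the true maximum, so it is a proxy rather than a direct estimate.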
