Question

In this dataset, match treated items (X = 1) to untreated items (X = 0) on the confounder (Z). First, find the average treatment effect (ATE): every item, treated or untreated, is paired with one counterfactual, namely the nearest item by Z in the other group (you can use NearestNeighbors for this). Then find the average treatment effect on the treated (ATT): each treated item is paired with a counterfactual untreated item, and the untreated items are otherwise ignored. Then find the average treatment effect on the untreated (ATU): each untreated item is paired with a counterfactual treated item, and the treated items are otherwise ignored. Finally, find the marginal treatment effect, defined here as the maximum treatment effect across all untreated items (i.e., it ends up considering only a single untreated item with its single counterfactual).

Question 1
Which is closest to the average treatment effect? 

Option A
2.014

Option B
1.695

Option C
1.583

Option D
1.832

Question 2
Which is closest to the average treatment effect on the treated? 

Option A
1.503

Option B
1.620

Option C
1.846

Option D
1.714

Question 3
Which is closest to the average treatment effect on the untreated? 

Option A
1.689

Option B
1.843

Option C
1.549

Option D
2.305

Question 4
Which is closest to the marginal treatment effect? 

Option A
0.8935

Option B
1.134 

Option C
2.172

Option D
1.480 



In [3]:
import pandas as pd
import numpy as np
from sklearn.neighbors import NearestNeighbors

df = pd.read_csv('homework_6.1.csv')

# Basic guard: the required columns must be present
if not {'X','Z','Y'}.issubset(df.columns):
    raise ValueError("The CSV must contain the columns X, Z, and Y.")

treated   = df[df['X'] == 1].copy().reset_index(drop=True)
untreated = df[df['X'] == 0].copy().reset_index(drop=True)

if len(treated)==0 or len(untreated)==0:
    raise ValueError("Matching requires both a treated and an untreated group.")

Z_t = treated[['Z']].to_numpy()
Z_u = untreated[['Z']].to_numpy()

# Nearest-neighbor matching on Z (with replacement: a control can be reused)
nn_u = NearestNeighbors(n_neighbors=1).fit(Z_u)  # treated -> nearest untreated
nn_t = NearestNeighbors(n_neighbors=1).fit(Z_t)  # untreated -> nearest treated

# For each treated item: effect = own Y minus Y of the matched untreated item
dist_t, idx_t = nn_u.kneighbors(Z_t)
cf_u_for_t = untreated.iloc[idx_t.flatten()].reset_index(drop=True)
effects_treated = treated['Y'].to_numpy() - cf_u_for_t['Y'].to_numpy()

# For each untreated item: effect = Y of the matched treated item minus own Y
dist_u, idx_u = nn_t.kneighbors(Z_u)
cf_t_for_u = treated.iloc[idx_u.flatten()].reset_index(drop=True)
effects_untreated = cf_t_for_u['Y'].to_numpy() - untreated['Y'].to_numpy()

# ATE: overall mean over the effects from both matching directions
ate = np.concatenate([effects_treated, effects_untreated]).mean()

# ATT/ATU
att = effects_treated.mean()
atu = effects_untreated.mean()

# MTE: maximum effect among the untreated items
mte = effects_untreated.max()

print(f"ATE : {ate:.4f}")
print(f"ATT : {att:.4f}")
print(f"ATU : {atu:.4f}")
print(f"MTE : {mte:.4f}")


ATE : 1.6953
ATT : 1.8464
ATU : 1.5495
MTE : 2.1725
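
To check the matching logic independently of homework_6.1.csv, the same steps can be run on a tiny hand-made dataset whose matches are easy to verify by eye. This is only a sketch: the helper name matched_effects and the toy values are made up for illustration and are not part of the assignment. The toy outcome is Y = 2*X + Z, and the Z values are chosen so that no item has two equally near neighbors.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def matched_effects(z_t, y_t, z_u, y_u):
    """1-nearest-neighbor matching on a single confounder, with replacement.

    Hypothetical helper for illustration. Returns the per-item effects for
    the treated group and for the untreated group.
    """
    z_t = np.asarray(z_t, dtype=float).reshape(-1, 1)
    z_u = np.asarray(z_u, dtype=float).reshape(-1, 1)
    y_t = np.asarray(y_t, dtype=float)
    y_u = np.asarray(y_u, dtype=float)

    # Index of the nearest untreated item for every treated item
    idx_t = NearestNeighbors(n_neighbors=1).fit(z_u).kneighbors(
        z_t, return_distance=False).ravel()
    # Index of the nearest treated item for every untreated item
    idx_u = NearestNeighbors(n_neighbors=1).fit(z_t).kneighbors(
        z_u, return_distance=False).ravel()

    eff_t = y_t - y_u[idx_t]  # treated outcome minus matched untreated outcome
    eff_u = y_t[idx_u] - y_u  # matched treated outcome minus untreated outcome
    return eff_t, eff_u


# Toy data (assumed values): Y = 2*X + Z
z_t = [0.0, 1.0, 2.0]
y_t = [2.0, 3.0, 4.0]
z_u = [0.1, 1.1, 2.1, 3.1]
y_u = [0.1, 1.1, 2.1, 3.1]

eff_t, eff_u = matched_effects(z_t, y_t, z_u, y_u)

att = eff_t.mean()
atu = eff_u.mean()
ate = np.concatenate([eff_t, eff_u]).mean()
max_untreated = eff_u.max()  # the question's "marginal treatment effect"

print(f"ATE : {ate:.4f}")
print(f"ATT : {att:.4f}")
print(f"ATU : {atu:.4f}")
print(f"MTE : {max_untreated:.4f}")
```

Because the ATE pools both sets of per-item effects, it always equals the size-weighted average of ATT and ATU, (n_t * ATT + n_u * ATU) / (n_t + n_u). Applied to the printed results above, this means the ATE of 1.6953 must lie between the ATU of 1.5495 and the ATT of 1.8464, which it does.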
