# Nearest-Neighbor Matching — Homework 6.1

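This notebook estimates the average treatment effect (ATE), the average effect on the treated (ATT), and the average effect on the untreated (ATU), plus an **Optimal Treatment Effect**, using 1-nearest-neighbor (1-NN) matching on the confounder **Z** in `homework_6.1.csv`. Here the Optimal Treatment Effect is taken to be the largest matched individual effect among untreated units.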

**Columns:**
- `Z`: confounder used for matching
- `X`: treatment indicator (1 = treated, 0 = untreated)
- `Y`: outcome

## Steps
1. Load data
2. Run basic checks
3. Split treated vs. untreated and fit 1-NN on `Z` to find counterfactual matches
4. Compute ATE, ATT, ATU, and Optimal TE
5. Print results
6. (Optional) Plot `Y` against `Z` by treatment group
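
For reference, the step-4 quantities can be written out explicitly. The notation ($j(i)$ for the matched unit, $\hat\tau_i$ for an individual effect, $n_1$ and $n_0$ for the treated and untreated group sizes) is introduced here for illustration and follows the code: 1-NN matching on $Z$, with replacement.

$$
j(i) = \arg\min_{k:\,X_k \neq X_i} \lvert Z_i - Z_k \rvert,
\qquad
\hat\tau_i =
\begin{cases}
Y_i - Y_{j(i)}, & X_i = 1,\\
Y_{j(i)} - Y_i, & X_i = 0.
\end{cases}
$$

$$
\widehat{\mathrm{ATT}} = \frac{1}{n_1}\sum_{i:\,X_i=1}\hat\tau_i,
\qquad
\widehat{\mathrm{ATU}} = \frac{1}{n_0}\sum_{i:\,X_i=0}\hat\tau_i,
\qquad
\widehat{\mathrm{ATE}} = \frac{1}{n}\sum_{i=1}^{n}\hat\tau_i = \frac{n_1\,\widehat{\mathrm{ATT}} + n_0\,\widehat{\mathrm{ATU}}}{n},
$$

with $n = n_0 + n_1$. The notebook's Optimal TE is $\max_{i:\,X_i=0}\hat\tau_i$.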


In [None]:

import pandas as pd
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Display options
pd.set_option("display.float_format", lambda x: f"{x:0.6f}")


## 1) Load dataset

In [None]:

# Update path if needed
csv_path = "homework_6.1.csv"
df = pd.read_csv(csv_path)
df.head()


## 2) Basic checks

In [None]:

df.info()
df.describe()
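
Because matching uses only `Z`, each counterfactual is only as credible as the overlap between the treated and untreated `Z` distributions: a unit whose `Z` lies outside the other group's range is still matched, just to a distant neighbor. A minimal sketch of a per-group summary and a common-support check follows. The helper names `group_overlap_summary` and `common_support` are hypothetical, and the toy frame is made up for illustration; on the real data, call the helpers on `df`.

```python
import pandas as pd


def group_overlap_summary(df: pd.DataFrame, z: str = "Z", x: str = "X") -> pd.DataFrame:
    """Hypothetical helper: count, range, and mean of the matching variable per group."""
    return df.groupby(x)[z].agg(["count", "min", "max", "mean"])


def common_support(df: pd.DataFrame, z: str = "Z", x: str = "X"):
    """Hypothetical helper: interval of z covered by both groups (no overlap if lo > hi)."""
    g = df.groupby(x)[z]
    lo = g.min().max()  # largest of the group minima
    hi = g.max().min()  # smallest of the group maxima
    return lo, hi


# Toy data for illustration only (not homework_6.1.csv)
toy = pd.DataFrame({
    "Z": [0.1, 0.4, 0.9, 1.5, 0.3, 0.8, 1.2, 2.0],
    "X": [0, 0, 0, 0, 1, 1, 1, 1],
    "Y": [1.0, 1.2, 1.9, 2.4, 3.1, 3.5, 4.0, 5.2],
})

# Sanity checks that are also worth running on the real df
assert set(toy["X"].unique()) <= {0, 1}, "X must be binary"
assert toy[["Z", "X", "Y"]].notna().all().all(), "missing values present"

summary = group_overlap_summary(toy)
print(summary)

lo, hi = common_support(toy)
outside = ~toy["Z"].between(lo, hi)  # between() is inclusive on both ends
print(f"common support: [{lo}, {hi}]; units outside: {outside.sum()}")
```

Units flagged as outside common support are not dropped here; whether to trim them is a modeling choice the homework does not specify.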


## 3) Split groups and fit 1-NN on Z

In [None]:

treated = df[df["X"] == 1].reset_index(drop=True)
untreated = df[df["X"] == 0].reset_index(drop=True)

# Fit 1-NN indexes on the confounder Z only.
# Each index is fit on one group and queried with the other group's Z values.
nn_on_untreated = NearestNeighbors(n_neighbors=1).fit(untreated[["Z"]])  # finds untreated matches for treated units
nn_on_treated = NearestNeighbors(n_neighbors=1).fit(treated[["Z"]])      # finds treated matches for untreated units

# Positional index of the nearest match for each unit (matching with replacement)
treated_to_untreated_idx = nn_on_untreated.kneighbors(treated[["Z"]], return_distance=False).ravel()
untreated_to_treated_idx = nn_on_treated.kneighbors(untreated[["Z"]], return_distance=False).ravel()

# Counterfactual outcomes: Y of the matched unit in the opposite group
treated_cf = untreated["Y"].to_numpy()[treated_to_untreated_idx]  # estimated Y(0) for treated units
untreated_cf = treated["Y"].to_numpy()[untreated_to_treated_idx]  # estimated Y(1) for untreated units
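
To make the `kneighbors` output concrete, here is a tiny made-up example (values are illustrative, not from `homework_6.1.csv`; names are prefixed `toy_` so they do not overwrite the notebook's `treated` and `untreated`). It shows that the returned indices are row positions in the frame the model was fit on, that one untreated unit can be reused for several treated units (matching with replacement), and that for a single 1-D confounder the result agrees with a brute-force `argmin` of $|Z_t - Z_u|$ when there are no ties.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Tiny illustrative groups (made-up values)
toy_untreated = pd.DataFrame({"Z": [0.0, 1.0, 5.0], "Y": [10.0, 11.0, 15.0]})
toy_treated = pd.DataFrame({"Z": [0.9, 4.0, 4.2], "Y": [13.0, 20.0, 21.0]})

# Fit on untreated Z, query with treated Z: returns arrays of shape (n_treated, 1)
toy_nn = NearestNeighbors(n_neighbors=1).fit(toy_untreated[["Z"]])
toy_dist, toy_idx = toy_nn.kneighbors(toy_treated[["Z"]])
toy_idx = toy_idx.ravel()

# Brute-force equivalent for one confounder: pairwise |Z_t - Z_u|, argmin per treated row
z_t = toy_treated["Z"].to_numpy()
z_u = toy_untreated["Z"].to_numpy()
brute_idx = np.abs(z_t[:, None] - z_u[None, :]).argmin(axis=1)

print(toy_idx)  # [1 2 2]: untreated row 2 is reused
print(toy_dist.ravel())

# Treated effects: observed Y minus the matched untreated Y
toy_cf = toy_untreated["Y"].to_numpy()[toy_idx]
toy_effects = toy_treated["Y"].to_numpy() - toy_cf
print(toy_effects)  # [2. 5. 6.]
```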


## 4) Compute effects

In [None]:

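# Individual (per-unit) effect estimates
te_treated = treated["Y"].to_numpy() - treated_cf        # observed Y(1) minus matched Y(0), treated units
te_untreated = untreated_cf - untreated["Y"].to_numpy()  # matched Y(1) minus observed Y(0), untreated units

# Aggregates: ATE averages over all units, i.e. a group-size-weighted mix of ATT and ATU
ate = np.mean(np.concatenate([te_treated, te_untreated]))
att = np.mean(te_treated)
atu = np.mean(te_untreated)
optimal_te = np.max(te_untreated)  # "Optimal TE" as defined here: largest matched effect among untreated units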

results = {
    "ATE": ate,
    "ATT": att,
    "ATU": atu,
    "Optimal TE (max over untreated)": optimal_te
}
results
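
Steps 3 and 4 can be wrapped in one function and checked on simulated data where the true effect is known. This is a sketch under stated assumptions: `nn_match_effects` is a hypothetical helper, and the data-generating process (Z standard normal, treatment probability logistic in Z, Y = 2Z + 3X + noise, so every unit's true effect is 3) is invented purely for testing. The naive difference in means is printed for contrast, since Z raises both the chance of treatment and the outcome.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors


def nn_match_effects(df: pd.DataFrame, z: str = "Z", x: str = "X", y: str = "Y") -> dict:
    """Hypothetical helper: 1-NN matching with replacement on the single column `z`."""
    t = df[df[x] == 1]
    u = df[df[x] == 0]
    # Positional index of each treated unit's nearest untreated neighbor, and vice versa
    t_idx = NearestNeighbors(n_neighbors=1).fit(u[[z]]).kneighbors(t[[z]], return_distance=False).ravel()
    u_idx = NearestNeighbors(n_neighbors=1).fit(t[[z]]).kneighbors(u[[z]], return_distance=False).ravel()
    y_t, y_u = t[y].to_numpy(), u[y].to_numpy()
    te_t = y_t - y_u[t_idx]  # treated: observed Y(1) minus matched Y(0)
    te_u = y_t[u_idx] - y_u  # untreated: matched Y(1) minus observed Y(0)
    return {
        "ATE": np.concatenate([te_t, te_u]).mean(),
        "ATT": te_t.mean(),
        "ATU": te_u.mean(),
        "Optimal TE (max over untreated)": te_u.max(),
        "n_treated": len(te_t),
        "n_untreated": len(te_u),
    }


# Simulated data (assumed DGP, for testing only): true effect is 3 for every unit
rng = np.random.default_rng(0)
n = 2000
z_sim = rng.normal(size=n)
x_sim = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-z_sim))).astype(int)  # higher Z -> more likely treated
y_sim = 2.0 * z_sim + 3.0 * x_sim + rng.normal(scale=0.5, size=n)
sim = pd.DataFrame({"Z": z_sim, "X": x_sim, "Y": y_sim})

est = nn_match_effects(sim)
naive = sim.loc[sim["X"] == 1, "Y"].mean() - sim.loc[sim["X"] == 0, "Y"].mean()
# ATE over all units equals the group-size-weighted average of ATT and ATU
weighted = (est["n_treated"] * est["ATT"] + est["n_untreated"] * est["ATU"]) / len(sim)

print(f"naive difference in means: {naive:.3f}")
for k, v in est.items():
    print(f"{k}: {v:.3f}" if isinstance(v, float) else f"{k}: {v}")
print("ATE == weighted(ATT, ATU):", np.isclose(est["ATE"], weighted))
```

Applied to the homework data, `nn_match_effects(df)` uses the same positional matching as steps 3 and 4, so it should reproduce the ATE, ATT, ATU, and Optimal TE computed above.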


## 5) Pretty print results

In [None]:

for k, v in results.items():
    print(f"{k}: {v:0.6f}")


## 6) (Optional) Visualization

In [None]:

import matplotlib.pyplot as plt

plt.figure()
plt.scatter(df.loc[df.X==0, "Z"], df.loc[df.X==0, "Y"], label="Untreated (X=0)", alpha=0.6)
plt.scatter(df.loc[df.X==1, "Z"], df.loc[df.X==1, "Y"], label="Treated (X=1)", alpha=0.6)
plt.xlabel("Z (confounder)")
plt.ylabel("Y (outcome)")
plt.title("Outcome vs Confounder by Treatment Group")
plt.legend()
plt.tight_layout()
plt.show()
