# Linear balancing weights

A demonstration that OLS is doubly robust [Robins et al. 2007, Kline 2011].

In [1]:
import cbpys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.preprocessing import MinMaxScaler


This formulation makes it clear that balancing weights can be solved for using only summary data from the target group (i.e. the covariate means for the treatment group for the ATT, or the covariate means for the target population under covariate shift).
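
As a minimal sketch of this point, the snippet below computes minimum-norm weights on the control rows from nothing but the control covariate matrix and a vector of target means, then checks that they balance exactly and reproduce the OLS prediction at the target means. The function name `linear_balancing_weights` and the simulated data are hypothetical, and whether `cbpys.lbw` computes exactly these weights is an assumption, not something confirmed here.

```python
import numpy as np


def linear_balancing_weights(X0, target_means):
    """Minimum-norm weights w on control rows with X0.T @ w == target_means.

    Hypothetical helper for illustration; X0 must contain a column of ones
    so that the balance constraint on that column forces sum(w) == 1.
    """
    X0 = np.asarray(X0, dtype=float)
    target_means = np.asarray(target_means, dtype=float)
    # w = X0 (X0'X0)^{-1} target_means, using a linear solve instead of an inverse
    return X0 @ np.linalg.solve(X0.T @ X0, target_means)


# Simulated control sample (made-up data, intercept in the first column)
rng = np.random.default_rng(0)
n0 = 500
X0 = np.c_[np.ones(n0), rng.normal(size=(n0, 3))]
y0 = X0 @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n0)

# The only information used from the target group: its covariate means
target_means = np.array([1.0, 0.3, -0.2, 0.1])

w = linear_balancing_weights(X0, target_means)
beta0 = np.linalg.lstsq(X0, y0, rcond=None)[0]  # OLS fit on controls

balance_gap = np.abs(X0.T @ w - target_means).max()  # exact balance check
weighted_mean = w @ y0                # reweighted control mean
ols_prediction = target_means @ beta0  # OLS prediction at the target means
```

The reweighted control mean and the OLS prediction agree up to floating-point error, which is the sense in which regression adjustment on the controls is itself a reweighting estimator.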

## Kline's examples

The 'OLS is doubly robust' result: the regression estimate is consistent for the ATT if either the conditional mean of the untreated outcome $Y^0$ or the selection odds $\frac{\pi(X)}{1-\pi(X)}$ is linear in $X$.
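
The two conditions can be made explicit with a short derivation. This is a sketch that assumes selection on observables; the symbols $D$ (the treatment indicator), $p = P(D=1)$, $\beta_0$, $\delta$ and $\omega(X)$ are introduced here for illustration. The regression estimator of the ATT imputes the treated group's untreated mean from a regression fit on controls:

$$
\hat\theta = \bar Y_1 - \bar X_1'\hat\beta_0, \qquad \hat\beta_0 = (X_0'X_0)^{-1}X_0'Y_0 .
$$

In large samples the imputed mean is a weighted average of control outcomes:

$$
\bar X_1'\hat\beta_0 \to E[X \mid D=1]'\,E[XX' \mid D=0]^{-1}\,E[XY \mid D=0] = E[\omega(X)\,Y \mid D=0], \qquad \omega(X) = E[X \mid D=1]'\,E[XX' \mid D=0]^{-1}X .
$$

If $E[Y^0 \mid X] = X'\beta_0$, then $E[XY \mid D=0] = E[XX' \mid D=0]\,\beta_0$ and the limit is $E[X \mid D=1]'\beta_0 = E[Y^0 \mid D=1]$. Alternatively, since

$$
E[X \mid D=1] = \frac{1-p}{p}\,E\!\left[\frac{\pi(X)}{1-\pi(X)}\,X \,\Big|\, D=0\right],
$$

$\omega(X)$ is the linear projection of the true ATT weight $\frac{1-p}{p}\frac{\pi(X)}{1-\pi(X)}$ on $X$ among controls. If the odds are linear, $\frac{\pi(X)}{1-\pi(X)} = X'\delta$, the projection is exact and $E[\omega(X)\,Y \mid D=0] = E[Y^0 \mid D=1]$ regardless of the outcome model.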

Replicating examples from [Kline (2011)](https://eml.berkeley.edu/~pkline/papers/OB_reweighting.pdf). Data and code [here](https://eml.berkeley.edu/~pkline/papers/Oaxaca_web.zip).

In [9]:
df = pd.read_stata("nswre74.dta")
yn, wn = "re78", "treat"
xn = df.columns.drop([yn, wn]).tolist()
n = df.shape[0]
y, w, X = df[yn].values, df[wn].values, np.c_[np.ones(n), df[xn].values]

# control covariate matrix and target moments (treated-group covariate means)
X0, X1 = X[w == 0], X[w == 1].mean(axis=0)
# estimate the ATT by reweighting controls to match the treated means
lin_weights = cbpys.lbw(X0, X1)
print(f"DiM: {y[w == 1].mean() - y[w == 0].mean():.2f}")
print(
  f"Reweighted: {y[w == 1].mean() - np.average(y[w == 0], weights=lin_weights):.2f}"
)


DiM: 1794.34
Reweighted: 1784.79


Applying this to experimental data makes little difference, since covariate imbalances are small and arise only by chance.

### Observational

In [11]:
df = pd.read_stata("cps3re74.dta")
yn, wn = "re78", "treat"
xn = df.columns.drop([yn, wn]).tolist()
n = df.shape[0]
y, w, X = df[yn].values, df[wn].values, np.c_[np.ones(n), df[xn].values]

# control covariate matrix and target moments (treated-group covariate means)
X0, X1 = X[w == 0], X[w == 1].mean(axis=0)

lin_weights = cbpys.lbw(X0, X1)
print(f"DiM: {y[w == 1].mean() - y[w == 0].mean():.2f}")
print(
  f"Reweighted: {y[w == 1].mean() - np.average(y[w == 0], weights=lin_weights):.2f}"
)


DiM: -635.03
Reweighted: 1701.17


CPS3 has 'mild' selection bias [Smith and Todd (2005)], so reweighting alone gets close to the experimental estimate (1701.17 versus 1784.79).

In [12]:
df = pd.read_csv("lalonde_psid.csv")
yn, wn = "re78", "treat"
xn = df.columns.drop([yn, wn]).tolist()
n = df.shape[0]
y, w, X = df[yn].values, df[wn].values, np.c_[np.ones(n), df[xn].values]
# control covariate matrix and target moments (treated-group covariate means)
X0, X1 = X[w == 0], X[w == 1].mean(axis=0)

lin_weights = cbpys.lbw(X0, X1)
print(f"DiM: {y[w == 1].mean() - y[w == 0].mean():.2f}")
print(
  f"Reweighted: {y[w == 1].mean() - np.average(y[w == 0], weights=lin_weights):.2f}"
)

DiM: -15204.78
Reweighted: 687.82


PSID has more severe selection bias, which is harder to undo with reweighting alone: the reweighted estimate (687.82) remains well below the experimental one (1784.79).
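
One way to gauge how hard the weights are working is to inspect them directly. The sketch below is not part of the original analysis: `weight_diagnostics` is a hypothetical helper that reports the Kish effective sample size, the share of negative weights (linear weights are not constrained to be positive), and the largest absolute weight. It is shown on made-up weight vectors; in the cells above it could be applied to `lin_weights`.

```python
import numpy as np


def weight_diagnostics(weights):
    """Summarize a weight vector (hypothetical helper for illustration)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so that the weights sum to one
    return {
        "ess": float(1.0 / np.sum(w**2)),         # Kish effective sample size
        "share_negative": float(np.mean(w < 0)),  # fraction of negative weights
        "max_abs_weight": float(np.abs(w).max()),
    }


# Uniform weights: every observation counts equally
uniform = weight_diagnostics(np.ones(100))

# Highly concentrated weights: one observation carries most of the mass
skewed = weight_diagnostics(np.r_[np.full(99, 0.001), 0.901])
```

A small effective sample size or a large share of negative weights suggests that the reweighted estimate relies on a few control units or on extrapolation, which is what one would expect to see when selection bias is severe.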